
Integrating Video Conferencing in Your Product


October 21, 2022 · 6 min read



Live interactive video, built into the products where people live, work, play, and learn together, is the future. The onset of the global pandemic pushed us toward video conferencing tools, such as Zoom, for interacting with people.


Settling into this new reality, builders (aka developers) are coming up with great ideas in which video conferencing is not the central fabric of the product but an enabler within a shared environment.

Are you a Flutter developer? Check out our recent guide on Flutter WebRTC.

Some live interactive video ideas

Let's 'zoom' in a bit to understand how developers are thinking beyond Zoom:


  1. A yoga app with real-time pose detection and a workout score would assist the yoga teacher in a way that Zoom, where teachers struggle to track each student, cannot.
  2. A virtual office app that enables water-cooler moments and all-day audio-only connections.
  3. An online school app with special tools for class management and learning. E.g., attendance management, polls, quizzes, and universal annotations built within the app make the lives of both students and teachers less stressful.

Read more about designing an EdTech app with 100ms and translating needs into features.

Gold-quality audio/video call

Developers want high-quality video conferencing even before the thought of building custom applications. The complexity of video conferencing, however, is higher than just streaming video.

Also, the complexity increases exponentially with the number of people interacting via video. Let’s look at some simple math.

If 98% of people experience a good-quality stream when a single person is streaming, then the probability of a good-quality video call with 2 people = 0.98 × 0.98 ≈ 0.96 = 96%.

But the probability of a good-quality video call with 20 people, each streaming, = 0.98 × 0.98 × 0.98 ... (20 times) ≈ 0.67 = 67%.

Notice that in a shared experience, everyone's error rates multiply, so the probability of a good experience drops as more people join.
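This compounding effect is easy to verify, assuming each participant's stream independently succeeds with probability 0.98:

```python
# Probability that *everyone* in an n-stream call has a good experience,
# assuming each stream independently succeeds with probability p.
def good_call_probability(p: float, n: int) -> float:
    return p ** n

print(round(good_call_probability(0.98, 2), 2))   # 0.96
print(round(good_call_probability(0.98, 20), 2))  # 0.67
```

Even a 2% per-stream failure rate compounds into a one-in-three chance of a degraded 20-person call.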

Let's dive deeper into the issues:

  1. I can't connect: The most common problems are network issues and frequent disconnections. Corporate VPN or firewall issues make it worse.
  2. I can't select the right microphone or camera: Using an external camera or microphone aggravates the issue.
  3. I can't hear/see you: Another common problem with video calls is not being able to see/hear the participants. Words getting garbled and video getting frozen is also fairly common.
  4. I am hearing myself: Echo is a very annoying problem where one bad device can spoil the whole conference.
  5. I can't see what you are sharing: Blurred screen-share.
  6. My device died: Device gets heated up, and consumes too much battery.
  7. Let's switch off our videos so that we can talk: One participant's network is bad, forcing everyone in the room to switch off their videos.
  8. I got a phone call and the video conference ended: Common interruptions include the app going into the background, other apps taking control of the microphone and speaker, or a music app and the conference playing audio simultaneously.

Developers need to spend a significant amount of time discovering and fixing these issues. Often the issues are device-specific and much harder to fix. The time spent on making video conferencing work could be better spent on growing the product.

100ms abstracts all of these complexities away behind its APIs so that developers don't spend time fixing the same issues over and over.


Tuning audio/video parameters

In order to support a good-quality video conference, developers need to spend a significant amount of time tuning the parameters for bandwidth availability and device peculiarities.

Imagine a 10-party conference where every participant sends video at 256 kbps. Now let's look at some end-device constraints.

Network bandwidth required >  what is available

This is a very common problem where not everyone's network has the required download bandwidth. Take a case of:

  • A DSL link with 1000 kbps of downlink available
  • Downlink bandwidth required = 9 other parties × 256 kbps = 2304 kbps

This ultimately results in a bad video conferencing experience.
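The shortfall is simple to compute: a receiver must download one stream per other participant.

```python
# Downlink bandwidth one participant needs to receive everyone else's video.
def required_downlink_kbps(participants: int, bitrate_kbps: int) -> int:
    return (participants - 1) * bitrate_kbps

needed = required_downlink_kbps(10, 256)  # 9 streams x 256 kbps = 2304 kbps
available = 1000                          # the DSL downlink from the example
print(needed, needed > available)         # 2304 True
```

The DSL participant needs more than double the bandwidth their link can supply.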

How to fix:

The brute-force fix is to reduce everyone's video upload bitrate to fit the bandwidth of the participant with the worst internet speed. In our case, that means capping everyone at roughly 100 kbps (1000 kbps ÷ 9 streams ≈ 111 kbps). However, this degrades the experience for every participant, not just the constrained one.

The right solution is to degrade the experience only for the participant with the slowest connection while the rest of the participants continue to enjoy good-quality video. However, designing such a solution requires significant effort.
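As a hypothetical sketch (not 100ms's actual algorithm), per-receiver adaptation might cap what each participant downloads per stream based on their own downlink, within assumed bitrate bounds:

```python
# Hypothetical sketch: cap what each receiver *downloads* per sender to fit
# its own downlink, instead of capping every sender at the slowest link.
# The 256/64 kbps bounds are illustrative assumptions, not real defaults.
def per_sender_bitrate(downlink_kbps: int, senders: int,
                       max_kbps: int = 256, min_kbps: int = 64) -> int:
    fair_share = downlink_kbps // senders
    return max(min_kbps, min(max_kbps, fair_share))

print(per_sender_bitrate(1000, 9))    # slow DSL receiver: 111 kbps per stream
print(per_sender_bitrate(10000, 9))   # fast receiver keeps the full 256 kbps
```

Only the constrained receiver sees reduced quality; everyone else keeps receiving full-bitrate streams. In practice this is implemented with techniques like simulcast or scalable video coding on the server side.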

CPU/memory required > what is available

This problem happens especially with mobile devices. Take a case of:

  • An Android phone with a CPU available to decode and render 4 videos only
  • Since it's a 10-party conference, the phone will try to decode all 10 videos

This will lead to the phone overheating and crashing, which in turn will spoil the video conferencing experience.

How to fix the problem:

  • Decode and render the video of only 4 active speakers.

The right solution requires constant active-speaker detection. To save bandwidth, the client should also avoid downloading the video of participants who are not active speakers. This means developers have to write complicated logic for determining who the active speakers are and showing/hiding their videos in real time.
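The core of that selection logic can be sketched like this, assuming the client receives a recent audio level (0.0 to 1.0) for each peer (the names and threshold here are illustrative):

```python
# Illustrative only: pick the n loudest participants to decode and render,
# based on recently reported audio levels per peer.
def active_speakers(audio_levels: dict[str, float], n: int = 4) -> list[str]:
    ranked = sorted(audio_levels, key=audio_levels.get, reverse=True)
    return ranked[:n]

levels = {"alice": 0.9, "bob": 0.1, "carol": 0.7,
          "dave": 0.0, "erin": 0.5, "frank": 0.3}
print(active_speakers(levels))  # ['alice', 'carol', 'erin', 'frank']
```

The hard part in production is not the ranking itself but re-running it continuously, smoothing out rapid speaker changes, and tearing down/re-subscribing to video tracks as the set changes.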

A significant amount of time needs to be spent building logic for handling diverse network and device conditions. 100ms abstracts out these constructs as "auto-tune" and changes these parameters dynamically such that everyone in the conference will be able to enjoy a high-quality video experience.

Real-time database: Building an engaging interactive video product

Instead of solving audio/video issues, building user engagement is where developers want to spend time. However, they find the primitives missing for synchronizing the audio/video with the state of the room and participants. Let’s take a few examples:

  1. Building a hand-raise feature needs a synchronized real-time db
  2. Showing a reaction or emoji will again need some real-time db synchronized among participants of the room
  3. Even a simple count of participants needs a globally synchronized room variable
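A minimal sketch of the room state these features share might look like the following. This is an in-memory stand-in only; a real implementation would replicate this state over the network to every participant.

```python
# Minimal in-memory sketch of synchronized room state (names are illustrative).
# A real-time db would replicate these mutations to all participants.
class RoomState:
    def __init__(self) -> None:
        self.participants: set[str] = set()
        self.raised_hands: set[str] = set()

    def join(self, peer: str) -> None:
        self.participants.add(peer)

    def raise_hand(self, peer: str) -> None:
        if peer in self.participants:
            self.raised_hands.add(peer)

    def count(self) -> int:
        return len(self.participants)

room = RoomState()
room.join("alice"); room.join("bob")
room.raise_hand("alice")
print(room.count(), sorted(room.raised_hands))  # 2 ['alice']
```

The missing primitive is not the data structure itself but keeping every participant's copy consistent in real time, alongside the audio/video streams.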

At 100ms, we understand these challenges and hence plan to build a real-time database that is supported along with audio/video infra.


Metrics: Grow your product

Suppose things worked great in internal dogfooding sessions. Now metrics are needed to measure user engagement. Typically, metrics fall into two areas: growth metrics and error metrics. One should trend upward, while the other pulls the product down. Questions you are trying to answer:

  1. Growth - Did users discover the feature?
  2. Growth - How much time did it take for the other party to join after the invite was sent?
  3. Growth - How much time is spent on the call? Did it increase after adding audio/video?
  4. Error - Were participants able to start or join an audio/video call without errors?
  5. Error - How is the quality of the call? Does it work on low-end devices and mobile networks?

The other kinds of issues are more social or related to business logic.

  1. Low invite-to-join ratio: For synchronous interactions, people need to come together at the same time. It's not easy to get everyone to tune in at the same time.
  2. Low engagement: People don't want to turn their videos or mics on. We need to make online conferencing more engaging.

Integrating Video Conferencing using 100ms

How can 100ms help in building interactive videos?

Being a video-first team, we understand how time-consuming and painful it is to build scalable video applications that work flawlessly. 100ms has been built on the principle that complex problems that are common for all video applications must get abstracted in the SDK. The abstraction, however, shouldn't limit developers to an opinionated user experience or box them into just a few common use cases.

Build your live app with 100ms and get your first 10,000 minutes for free - Try it now

In the next blog, we will go into the details of how we are helping developers build live applications with a 10x reduction in the lines of code required.

Until then, try one of our step-by-step guides to building a Clubhouse-like app 👋


