Comparing Call Quality: 100ms vs. Twilio in 1:1 Web Calls

December 15, 2023 · 6 min read

Use case: 1:1 Call Quality

As Twilio announces the end of life for its Programmable Video product, many companies are left with the critical decision of finding an alternative. Going beyond the usual feature rundowns and the migration guides saturating the web, we performed a focused benchmarking exercise comparing the 100ms stack with Twilio's. Our mission: unearth real data that empowers prospective customers to make informed decisions.

This post inaugurates our series by evaluating audio and video quality on 1:1 calls over the web. Before we delve into the finer details of the test setup, let’s take a quick look at the media server architectures used on both platforms and how they shape media quality and the end-user experience.

P2P vs SFU architecture

In the Peer-to-Peer (P2P) model, clients form direct mesh connections to exchange media. While this setup works well for one-on-one calls, it scales poorly beyond 3-4 clients and lacks support for on-cloud recording and post-processing. For larger group calls, the Selective Forwarding Unit (SFU) comes into play: each client uploads its media once to the server, which redistributes it to all connected clients. Although the SFU optimizes uplink efficiency, the downlink remains resource-intensive, with each client receiving N-1 streams from the other publishing clients. To alleviate the demand on download bandwidth, many SFU architectures implement Simulcast: the publishing client sends multiple renditions of the media stream, and the SFU forwards the optimal rendition to each client based on its available bandwidth.

[Figure: P2P mesh routing architecture]

[Figure: SFU routing architecture]

There are many nuances in tuning an SFU for quality and efficiency. One of the hardest things to get right is handling varying network conditions: in the real world, a client's upload and download speeds keep changing. When the network is good, the SFU should send media at the highest quality possible; when the network fluctuates, it should respond quickly by reducing quality so that video and audio don't freeze. An effective congestion control system, with simulcast support, handling of subscribe-side degradation, and temporal scaling, makes this the most critical component to benchmark for a high-quality end-user experience.
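
To make the forwarding decision concrete, here is a minimal sketch of the per-subscriber layer selection an SFU performs, using the layer set from this test. The function and its 10% headroom margin are illustrative assumptions, not either platform's implementation.

```python
# Simulcast renditions available to the SFU, highest quality first.
# Bitrates mirror the test configuration described below.
LAYERS = [
    ("720p/24fps", 2500),  # Kbps
    ("360p/24fps", 800),
    ("180p/24fps", 100),
]

def select_layer(estimated_downlink_kbps: float) -> str | None:
    """Pick the best rendition that fits the subscriber's estimated bandwidth."""
    for name, bitrate_kbps in LAYERS:
        # Keep ~10% headroom so audio and RTCP traffic are not starved.
        if bitrate_kbps * 1.1 <= estimated_downlink_kbps:
            return name
    # Below the lowest layer, the SFU can only drop frames (temporal
    # scaling) or pause video entirely to keep audio flowing.
    return None

print(select_layer(1200))  # -> "360p/24fps"
```

A real SFU re-runs this decision continuously as its bandwidth estimator updates, which is exactly the behavior the test below tries to exercise.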

Test setup

In this test, our primary focus is evaluating how the Twilio and 100ms SFUs perform against each other under simple constraints. Twilio supports both P2P rooms and Group rooms (SFU-based), while 100ms exclusively supports SFU-based media routing. For this test, we created SFU-based rooms with Simulcast enabled on both platforms. To keep the source video identical, we used the OBS virtual camera and played a local video file in a loop.

We precisely matched simulcast layer settings, including resolution, frame rate, and bitrate, between Twilio and 100ms for a fair comparison. These settings play a significant role, making their uniformity crucial. The complete set of input parameters for the test is listed in the table below.

Parameters            Values
SFU                   Enabled
Simulcast             Enabled
Video codec           VP8
Audio codec           Opus
Simulcast layer #1    720p / 24 fps / 2500 Kbps
Simulcast layer #2    360p / 24 fps / 800 Kbps
Simulcast layer #3    180p / 24 fps / 100 Kbps
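
For reference, the same layer set can be expressed as plain data; the dictionary keys and `rid` labels below are our own naming (following the common WebRTC full/half/quarter convention), not either vendor's API.

```python
# Illustrative representation of the layer settings held constant
# on both platforms; field names are ours, not a vendor SDK's.
SIMULCAST_LAYERS = [
    {"rid": "f", "resolution": (1280, 720), "fps": 24, "max_bitrate_kbps": 2500},
    {"rid": "h", "resolution": (640, 360), "fps": 24, "max_bitrate_kbps": 800},
    {"rid": "q", "resolution": (320, 180), "fps": 24, "max_bitrate_kbps": 100},
]
```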

Network Bandwidth Shaping Test

In our quest to mimic real-world network fluctuations, we decided to keep things simple rather than getting bogged down in complex tests. We started with a test that isolated network changes alone.

  1. We focused solely on 1:1 calls over the web.
  2. Throttling was applied only to receiver bandwidth, while upload bandwidth remained unlimited.
  3. Both delay and packet loss were consistently set to 0.

Even with this stripped-down approach, we can gain valuable insights into the SFU’s congestion control capabilities. We wrote a Python script that performs the traffic shaping by stepping through a predetermined series of bitrates (refer to the chart below); a condensed sketch follows. WebRTC exposes a wealth of metrics that make this kind of quality comparison easy, so the script also queries WebRTC stats every second, aggregates the data, and prints the values to stdout for later analysis.
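
The shaping loop condenses to a few lines. This is an illustrative sketch assuming a Linux machine with `tc` available; the interface name and the abbreviated schedule are placeholders, and properly shaping ingress (download) traffic would redirect it through an `ifb` device rather than apply a simple egress filter as shown. The full script is linked under Resources.

```python
# Illustrative traffic-shaping loop. Assumptions: Linux, `tc` installed,
# run as root. IFACE and SCHEDULE are placeholders, and ingress shaping
# in practice needs an ifb redirect; a plain egress TBF is shown for brevity.
import subprocess
import time

IFACE = "eth0"  # placeholder network interface

# (duration in seconds, bandwidth in Kbps) - abbreviated, not the real schedule
SCHEDULE = [(20, 10_000), (300, 150), (300, 300), (120, 500), (60, 1_200), (60, 1_500)]

def set_bandwidth(kbps: int) -> None:
    # Replace any existing token-bucket filter with the new rate limit.
    subprocess.run(
        ["tc", "qdisc", "replace", "dev", IFACE, "root", "tbf",
         "rate", f"{kbps}kbit", "burst", "32kbit", "latency", "400ms"],
        check=True,
    )

for duration, kbps in SCHEDULE:
    set_bandwidth(kbps)
    time.sleep(duration)

subprocess.run(["tc", "qdisc", "del", "dev", IFACE, "root"], check=True)  # restore
```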

[Chart: bandwidth (Kbps) vs. time (seconds)]

As the chart above shows, the test starts with very high receiver bandwidth, and the SFU sends the highest-quality stream available. After 20 seconds, the download bandwidth is throttled to 150 Kbps, testing how quickly the SFU adjusts to a sharp fall. A phase of sustained low download speed follows, marked by some variation. At the 740-second mark, the download bandwidth increases to 1200 Kbps and then 1500 Kbps; this segment tests how quickly the SFU can raise quality as congestion eases. The majority of the test runs between 150 Kbps and 500 Kbps, providing insight into the SFU's performance under sustained low download speeds.
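
The per-second stats poll can be sketched as follows, assuming the call page runs in Selenium-driven Chrome and exposes its RTCPeerConnection as `window.pc` (both are assumptions for illustration); the fields read are standard WebRTC `inbound-rtp` stats members.

```python
# Poll WebRTC stats once per second and print them to stdout.
# Assumptions: Selenium-driven Chrome, call page exposes `window.pc`,
# and the page URL below is a placeholder.
import json
import time
from selenium import webdriver

JS_GET_STATS = """
const done = arguments[arguments.length - 1];
window.pc.getStats().then((report) => {
  const out = {};
  report.forEach((s) => {
    if (s.type === 'inbound-rtp' && s.kind === 'video') {
      out.freezeCount = s.freezeCount;
      out.totalFreezesDuration = s.totalFreezesDuration;
      out.frameWidth = s.frameWidth;
      out.frameHeight = s.frameHeight;
      out.framesPerSecond = s.framesPerSecond;
    } else if (s.type === 'inbound-rtp' && s.kind === 'audio') {
      out.concealedSamples = s.concealedSamples;
    }
  });
  done(JSON.stringify(out));
});
"""

driver = webdriver.Chrome()
driver.get("https://example.com/call")  # placeholder test-page URL

while True:
    sample = json.loads(driver.execute_async_script(JS_GET_STATS))
    print(time.time(), sample, flush=True)  # one line per second for later analysis
    time.sleep(1)
```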

Results

To compare the impact on end-user quality, we analyzed the following key metrics.

Video Freeze Count

This metric quantifies the instances when the video stream appears to freeze or stall during the call. End users perceive these as small stutters in the video.

[Chart: number of freezes]

Video Freeze Duration

This metric serves as a comprehensive measure of the overall user experience, influenced by how quickly an SFU can adjust to bad network conditions.

[Chart: duration of freezes (seconds)]

Audio Concealed Samples

Audio sample concealment, when excessive, is heard as distorted audio and reduces overall audio quality.

[Chart: audio samples concealed]

Amount of Time Spent in Each Resolution, Frame Rate Combination

More time at higher resolutions and frame rates indicates higher quality video received in the call.

[Chart: time spent in each layer]

Compared to 100ms, Twilio spent the most time in the lowest-quality layer (180p12), though it held the mid-quality layer (360p12) slightly longer.
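
The per-layer times in the chart above can be derived from the per-second samples with a simple bucketing pass. This sketch assumes one sample per second and snaps frame heights to the nearest simulcast rung; the helper names are ours.

```python
# Aggregate seconds spent at each resolution/frame-rate combination
# from per-second stats samples (illustrative helper, names are ours).
from collections import Counter

def layer_label(sample: dict) -> str:
    """Map one per-second sample to a coarse label like '360p12'."""
    height = sample.get("frameHeight") or 0
    fps = sample.get("framesPerSecond") or 0
    # Snap to the nearest simulcast rung used in the test.
    rung = min((180, 360, 720), key=lambda h: abs(h - height))
    return f"{rung}p{round(fps)}"

def time_per_layer(samples: list[dict]) -> Counter:
    # One sample per second, so each occurrence counts as one second.
    return Counter(layer_label(s) for s in samples)
```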

Videos

Video recordings of the initial 30 seconds, when the bitrate drops sharply, show the effect on audio and video during sharp network transitions.

[Video: Twilio]

[Video: 100ms]

Conclusion

Analyzing the key metrics from the test results, it becomes apparent that 100ms exhibits a clear advantage over Twilio. With fewer freezes and shorter freeze durations, 100ms keeps video more stable and smooth. Audio quality is also noticeably better with 100ms, which required fewer packet-loss concealments. In terms of visible picture quality (resolution) and frames per second (how smoothly the video runs), 100ms performs marginally better in extremely low-bandwidth conditions. Though Twilio rendered a higher-resolution video layer a little longer than 100ms, the bigger difference between the two providers is at the lowest quality tier: Twilio had longer stretches of very poor video, with low clarity and longer freezes, whereas 100ms suffered fewer freezes even when clarity was poor.

Overall, 100ms performs better than Twilio in very bad network conditions and is comparable when network conditions are stable.

Resources

GitHub repo for the quality benchmarking test

Video
