I attended Meta’s RTC@Scale 2024 conference, where Meta presented two major changes it made while revamping its core audio processing stack: BERYL, Meta's new Acoustic Echo Cancellation (AEC), and MLOW, a new low-bitrate audio codec written entirely in software. This blog contains notes on BERYL. A PDF of the handwritten notes can be found here.
BERYL: full software AEC (by Sriram Srinivasan & Hoang Do)
Meta achieved a 20% reduction in “No Audio” or “Audio device reliability” issues on iOS & Android
15% reduction in P50 mouth-to-ear latency on Android
Revamp of the core audio processing stack for WhatsApp, Instagram and Messenger
Very diverse user base
Different kinds of handsets
Different Geography
Noisy conditions
Both high-end and low-end phones (more than 20% are low-end ARMv7 devices)
Based on telemetry and user feedback, Meta decided to tackle (1) echo and (2) audio quality on low-bitrate networks
High-end devices use ML to suppress echo
To accommodate low-end devices that cannot run ML, a baseline solution for echo cancellation is needed
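For context on what a classical, non-ML baseline can look like, here is a minimal sketch of a normalized least-mean-squares (NLMS) adaptive filter, the textbook starting point for acoustic echo cancellation. This is not BERYL's algorithm, which these notes do not describe. The echo path, filter length and step size are made-up values chosen for illustration.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the far-end echo from the mic signal.

    Returns the residual (echo-reduced) signal, one sample per mic sample.
    """
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate  # what remains after removing the estimate
        norm = sum(h * h for h in history) + eps
        # NLMS update: step along the input, normalized by input energy.
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return residual


def energy(signal):
    return sum(s * s for s in signal)


# Hypothetical test signal: white noise played by the loudspeaker and picked
# up by the microphone through a short, made-up echo path (no near-end speech).
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.6, 0.3, -0.2, 0.1]
mic = [
    sum(echo_path[k] * far_end[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(far_end))
]

residual = nlms_echo_cancel(far_end, mic)

# Echo energy before and after cancellation over the last 500 samples.
print(energy(mic[-500:]), energy(residual[-500:]))
```

A production canceller also has to cope with double talk, non-linear loudspeaker distortion and varying delay, none of which this sketch attempts.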
WebRTC (Web Real-Time Communication) has revolutionized the way web applications handle communication. It empowers developers to embed real-time audio, video, and data exchange directly within web pages and apps, eliminating the need for plugins or additional downloads. This blog’s attempt at demystifying WebRTC is a first step in learning the basics of this technology.
Signaling: The Orchestrator of Connections
WebRTC itself doesn’t define how peers find each other or exchange the information needed to set up a connection. That job falls to signaling, the first act in the WebRTC play. It involves exchanging information about the communication session between peers. This information typically includes:
Session Description Protocol (SDP): An SDP carries details about the media streams (audio/video) each peer intends to send or receive, along with the codecs they support.
ICE Candidates: These describe the network addresses and ports a peer can use for communication.
Offer/Answer Model: The initiating peer sends an SDP (offer) outlining its capabilities. The receiving peer responds with an SDP (answer) indicating its acceptance and potentially modifying the offer.
Several signaling mechanisms can be employed, including WebSockets, Server-Sent Events (SSE), or even custom solutions. The choice depends on the application’s specific needs and desired level of real-time interaction.
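To make the offer/answer exchange concrete, here is a minimal sketch of a signaling channel as plain message passing. The `InMemorySignaling` class, the peer names and the JSON envelope fields are all hypothetical. A real application would carry the same kind of messages over WebSockets or another transport, and the SDP strings would come from the WebRTC stack rather than placeholders.

```python
import json
from collections import defaultdict, deque

ALLOWED_TYPES = {"offer", "answer", "candidate"}


class InMemorySignaling:
    """Toy stand-in for a signaling server: one inbox per peer."""

    def __init__(self):
        self.inboxes = defaultdict(deque)

    def send(self, sender, recipient, msg_type, payload):
        if msg_type not in ALLOWED_TYPES:
            raise ValueError(f"unknown signaling message type: {msg_type}")
        # Messages travel as JSON text, as they might over a WebSocket.
        envelope = json.dumps({"from": sender, "type": msg_type, "payload": payload})
        self.inboxes[recipient].append(envelope)

    def receive(self, peer):
        """Return the next message for `peer`, or None if the inbox is empty."""
        inbox = self.inboxes[peer]
        return json.loads(inbox.popleft()) if inbox else None


signaling = InMemorySignaling()

# Alice initiates: her offer carries the SDP describing what she can send/receive.
signaling.send("alice", "bob", "offer", {"sdp": "(offer SDP placeholder)"})
offer = signaling.receive("bob")

# Bob accepts, replies with his own SDP, then sends an ICE candidate.
signaling.send("bob", offer["from"], "answer", {"sdp": "(answer SDP placeholder)"})
signaling.send("bob", offer["from"], "candidate", {"candidate": "(ICE candidate placeholder)"})

answer = signaling.receive("alice")
candidate = signaling.receive("alice")
print(offer["type"], answer["type"], candidate["type"])
```

The point of the sketch is that signaling only needs to move small text messages between two peers. How it does so is left to the application.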
NAT Traversal: Hurdles and Leapfrogs
WebRTC connections often face the obstacle of Network Address Translation (NAT). NAT devices on home networks hide private IP addresses behind a single public address. Direct communication between peers behind NATs becomes a challenge. WebRTC employs a combination of techniques to overcome this hurdle:
STUN (Session Traversal Utilities for NAT): A peer sends a STUN request to a public server, which reveals the public IP and port the NAT maps the request to. This helps a peer learn its own public facing address.
TURN (Traversal Using Relays around NAT): When a direct connection isn’t feasible due to restrictive firewalls, TURN servers act as relays. Peers send their media streams to the TURN server, which then forwards them to the destination peer. While TURN provides a reliable fallback, it introduces latency and may not be suitable for bandwidth-intensive applications.
NAT Traversal in WebRTC
Image Credit: García, Boni & Gallego, Micael & Gortázar, Francisco & Bertolino, Antonia. (2019). Understanding and estimating quality of experience in WebRTC applications. Computing. 101. 10.1007/s00607-018-0669-7.
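The STUN exchange described above can be sketched at the byte level. The snippet below builds a STUN Binding Request and decodes the XOR-MAPPED-ADDRESS attribute of a response, following the message layout in the STUN specification (RFC 8489). To stay runnable offline it never touches the network: `fake_binding_response` is a hypothetical helper that hand-builds what a server would send back, and the IP address and port are documentation placeholders.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Return (message, transaction_id) for an attribute-free Binding Request."""
    transaction_id = transaction_id or os.urandom(12)
    # Header: type, body length (0 here), magic cookie, 96-bit transaction ID.
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + transaction_id, transaction_id


def parse_xor_mapped_address(message):
    """Extract (ip, port) from an IPv4 XOR-MAPPED-ADDRESS attribute, or None.

    A real client would also check that the transaction ID matches its request.
    """
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[offset:offset + 4])
        value = message[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            # Port and address are XORed with the magic cookie on the wire.
            port = xport ^ (MAGIC_COOKIE >> 16)
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        # Attributes are padded to a multiple of 4 bytes.
        offset += 4 + attr_len + (-attr_len % 4)
    return None


def fake_binding_response(transaction_id, ip, port):
    """Hand-built response standing in for a real STUN server (illustration only)."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    attr = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01, xport, xaddr)
    header = struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE)
    return header + transaction_id + attr


request, txid = build_binding_request()
response = fake_binding_response(txid, "203.0.113.7", 54321)
print(len(request), parse_xor_mapped_address(response))
```

In a real client the request would be sent over UDP to a STUN server, and the decoded address would become a candidate for ICE.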
ICE: The Candidate for Connectivity
The Interactive Connectivity Establishment (ICE) framework plays a pivotal role in NAT traversal. Here’s how it works:
Gathering Candidates: Each peer gathers potential connection points (IP addresses and ports) it can use for communication. These include host addresses on local network interfaces, public addresses obtained via STUN, and relay addresses allocated on a TURN server.
Candidate Exchange: Peers exchange their gathered candidates with each other through the signaling channel.
Connectivity Checks: Each peer attempts to establish a connection with the other using the received candidates. This might involve trying different combinations of local and remote candidates.
Best Path Selection: ICE ranks candidate pairs by a computed priority that favors direct paths over relayed ones. Among the pairs that pass the connectivity checks, the peers settle on one to use for the session.
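The ranking ICE uses to order its checks can be sketched with the priority formulas from the ICE specification (RFC 8445). The type preference values are the ones that specification recommends. The local preference of 65535 and the single component are assumptions for a simple one-interface setup.

```python
# Recommended type preferences from RFC 8445: direct paths rank above relayed ones.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type, local_preference=65535, component_id=1):
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    return (
        (TYPE_PREFERENCE[cand_type] << 24)
        + (local_preference << 8)
        + (256 - component_id)
    )


def pair_priority(controlling, controlled):
    """Pair priority, with G the controlling and D the controlled agent's priority."""
    g, d = controlling, controlled
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


# Assume this side is the controlling agent and both sides gathered three types.
local = {t: candidate_priority(t) for t in ("host", "srflx", "relay")}
remote = {t: candidate_priority(t) for t in ("host", "srflx", "relay")}

# Form every local/remote pair and sort from highest to lowest priority.
pairs = sorted(
    (
        (pair_priority(lp, rp), lt, rt)
        for lt, lp in local.items()
        for rt, rp in remote.items()
    ),
    reverse=True,
)

for priority, local_type, remote_type in pairs[:3]:
    print(local_type, remote_type, priority)
```

Connectivity checks are attempted roughly in this order, so a host-to-host path is tried before anything that goes through a TURN relay.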
SDP: The Session Description
The Session Description Protocol (SDP) acts as a blueprint for the WebRTC session. It’s a text-based format that conveys essential information about the media streams involved:
Media types: Whether it’s audio, video, or data communication.
Codecs: The specific compression formats used for encoding and decoding media.
Transport protocols: The underlying protocols used for media transport (e.g., RTP for real-time data).
ICE candidates: The potential connection points offered by each peer.
The SDP is exchanged during the signaling phase, allowing peers to negotiate and agree upon a mutually supported configuration for the communication session.
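Because SDP is line-oriented text, a few lines of code are enough to pull out the media types and codecs listed above. The sketch below parses a hand-written, heavily trimmed SDP. `SAMPLE_SDP` is an assumption for illustration and omits most of the attributes a real browser-generated description contains.

```python
# Hand-written, trimmed-down SDP (illustration only, not a complete description).
SAMPLE_SDP = """\
v=0
o=- 4611731400430051336 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111 0
a=rtpmap:111 opus/48000/2
a=rtpmap:0 PCMU/8000
m=video 9 UDP/TLS/RTP/SAVPF 96
a=rtpmap:96 VP8/90000
"""


def parse_media_sections(sdp):
    """Return one dict per m= line, with the codecs declared via a=rtpmap."""
    sections = []
    for line in sdp.splitlines():
        if line.startswith("m="):
            # m=<media> <port> <proto> <payload types...>
            kind, port, proto, *formats = line[2:].split()
            sections.append(
                {
                    "kind": kind,
                    "port": int(port),
                    "protocol": proto,
                    "payload_types": formats,
                    "codecs": {},
                }
            )
        elif line.startswith("a=rtpmap:") and sections:
            # a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
            payload_type, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            sections[-1]["codecs"][payload_type] = encoding
    return sections


sections = parse_media_sections(SAMPLE_SDP)
for section in sections:
    print(section["kind"], section["protocol"], section["codecs"])
```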
WebRTC prioritizes secure communication. Two key protocols ensure data integrity and confidentiality:
Secure Real-time Transport Protocol (SRTP): SRTP encrypts the media content (audio/video) being transmitted between peers. This safeguards the content from eavesdroppers on the network.
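Datagram Transport Layer Security (DTLS): DTLS runs directly between the peers. Its handshake, authenticated with certificates whose fingerprints are carried in the SDP, produces the keys used by SRTP, and DTLS also encrypts data channel traffic. The signaling channel that carries the SDP and ICE candidates is outside WebRTC's scope and is typically protected by the application, for example with HTTPS or secure WebSockets.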
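One concrete link between the SDP and the DTLS handshake is the certificate fingerprint: each peer advertises a hash of its certificate in an `a=fingerprint` line, and the other side compares it with the certificate presented during the handshake. The sketch below shows how such a line is formed with SHA-256. The certificate bytes are a placeholder, not a real DER-encoded certificate, and the helper names are hypothetical.

```python
import hashlib


def sdp_fingerprint(cert_der: bytes) -> str:
    """Format the SHA-256 hash of a certificate as an SDP fingerprint line."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    # Colon-separated pairs of hex digits, e.g. "AB:CD:...".
    pairs = ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
    return f"a=fingerprint:sha-256 {pairs}"


def fingerprint_matches(cert_der: bytes, sdp_line: str) -> bool:
    """Check the certificate seen in the handshake against the signaled line."""
    return sdp_fingerprint(cert_der).lower() == sdp_line.strip().lower()


# Placeholder bytes standing in for a DER-encoded certificate.
placeholder_cert = b"not a real certificate, just placeholder bytes"
line = sdp_fingerprint(placeholder_cert)

print(line)
print(fingerprint_matches(placeholder_cert, line))
print(fingerprint_matches(b"a different certificate", line))
```

This check is only as trustworthy as the channel that delivered the SDP, which is one reason the signaling path should itself be protected.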
SCTP: Streamlining Data Delivery
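While WebRTC relies on RTP for media transport, it uses the Stream Control Transmission Protocol (SCTP), running over DTLS, for data channels that carry arbitrary application data. SCTP offers several features that RTP does not:
Ordered Delivery: SCTP can guarantee the order in which messages are delivered, which is crucial for reliable data communication. Unordered delivery is also available when ordering is not needed.
Multihoming: SCTP as a protocol lets a peer use multiple network interfaces, improving reliability and redundancy. WebRTC data channels, which run SCTP over a single DTLS association, do not use this feature.
Partial Reliability: SCTP can limit how many times, or for how long, a lost message is retransmitted, which avoids spending bandwidth on data that is already stale.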
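The effect of capping retransmissions can be illustrated with a toy simulation. This is not SCTP itself: the `deliver` function, the message names and the loss pattern are hypothetical, and the sketch only models the sender giving up on a message after a fixed number of retries.

```python
def deliver(messages, drops_before_success, max_retransmits=None):
    """Return the messages that reach the receiver.

    drops_before_success maps a message to how many consecutive transmissions
    of it the simulated network loses. max_retransmits=None means fully
    reliable: keep retransmitting until the message gets through.
    """
    delivered = []
    for msg in messages:
        losses = drops_before_success.get(msg, 0)
        # One initial transmission plus up to max_retransmits retries.
        if max_retransmits is None or losses <= max_retransmits:
            delivered.append(msg)
        # Otherwise the sender gives up on this message and moves on.
    return delivered


# Hypothetical stream of position updates, where only the latest one matters.
messages = ["pos-1", "pos-2", "pos-3", "pos-4"]
losses = {"pos-2": 1, "pos-3": 3}

reliable = deliver(messages, losses)
one_retry = deliver(messages, losses, max_retransmits=1)
no_retry = deliver(messages, losses, max_retransmits=0)

print(reliable)
print(one_retry)
print(no_retry)
```

For data such as game state or cursor positions, dropping a stale update is often better than delaying every later message behind its retransmission.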
WebRTC might look complex to a beginner; however, it is not a new technology. It is in fact a combination of existing protocols, codecs, networking mechanisms and transports that enables two clients behind firewalls to start a P2P session and exchange media and data. The beauty of WebRTC is displayed in two humans able to exchange the bond of love despite being continents apart. Look out for future blogs for more on this amazing technology.