Demystifying WebRTC

WebRTC (Web Real-Time Communication) has revolutionized the way web applications handle communication. It lets developers embed real-time audio, video, and data exchange directly within web pages and apps, eliminating the need for plugins or additional downloads. This post is a first step toward demystifying WebRTC and covers the basics of the technology.

Signaling: The Orchestrator of Connections

WebRTC itself doesn’t define how two browsers find each other before a direct connection exists. That job falls to signaling, the first act in the WebRTC play. It involves exchanging information about the communication session between peers, usually through a server that both of them can reach. This information typically includes:

  • Session Description Protocol (SDP): An SDP carries details about the media streams (audio/video) each peer intends to send or receive, along with the codecs they support.
  • ICE Candidates: These describe the network addresses and ports a peer can use for communication.
  • Offer/Answer Model: The initiating peer sends an SDP (offer) outlining its capabilities. The receiving peer responds with an SDP (answer) indicating its acceptance and potentially modifying the offer.

Several signaling mechanisms can be employed, including WebSockets, Server-Sent Events (SSE), or even custom solutions. The choice depends on the application’s specific needs and desired level of real-time interaction.
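
To make the exchange concrete, here is a minimal sketch of a signaling relay. WebRTC does not prescribe a message format, so the JSON envelope (type, from, to, payload), the function make_message and the class InMemoryRelay are all hypothetical names chosen for illustration. The relay is an in-memory stand-in for whatever WebSocket or SSE server an application would actually use.

```python
import json
from collections import defaultdict

ALLOWED_TYPES = ("offer", "answer", "candidate")


def make_message(kind, sender, recipient, payload):
    """Serialize one signaling message (hypothetical envelope, not a standard)."""
    if kind not in ALLOWED_TYPES:
        raise ValueError(f"unknown signaling message type: {kind}")
    return json.dumps({"type": kind, "from": sender, "to": recipient, "payload": payload})


class InMemoryRelay:
    """Stand-in for a signaling server: queues each message for its recipient."""

    def __init__(self):
        self.inboxes = defaultdict(list)

    def send(self, raw):
        message = json.loads(raw)
        self.inboxes[message["to"]].append(message)

    def receive(self, peer):
        # Hand over everything queued for this peer and clear its inbox.
        messages, self.inboxes[peer] = self.inboxes[peer], []
        return messages


relay = InMemoryRelay()
relay.send(make_message("offer", "alice", "bob", {"sdp": "v=0 ..."}))
relay.send(make_message("answer", "bob", "alice", {"sdp": "v=0 ..."}))
relay.send(make_message("candidate", "alice", "bob",
                        {"candidate": "candidate:1 1 udp 2130706431 192.0.2.10 54321 typ host"}))

bob_messages = relay.receive("bob")
alice_messages = relay.receive("alice")
print([m["type"] for m in bob_messages])
print([m["type"] for m in alice_messages])
```

The relay never inspects the payload. That mirrors real deployments, where the signaling server only needs to route opaque offers, answers, and candidates between peers.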

NAT Traversal: Hurdles and Leapfrogs

WebRTC connections often face the obstacle of Network Address Translation (NAT). NAT devices on home networks hide private IP addresses behind a single public address. Direct communication between peers behind NATs becomes a challenge. WebRTC employs a combination of techniques to overcome this hurdle:

  • STUN (Session Traversal Utilities for NAT): A peer sends a STUN request to a public server, which replies with the public IP and port the NAT mapped the request to. This is how a peer learns its own public-facing address.
  • TURN (Traversal Using Relays around NAT): When a direct connection isn’t feasible due to restrictive firewalls, TURN servers act as relays. Peers send their media streams to the TURN server, which then forwards them to the destination peer. While TURN provides a reliable fallback, it introduces latency and may not be suitable for bandwidth-intensive applications.

NAT Traversal in WebRTC

Image Credit : García, Boni & Gallego, Micael & Gortázar, Francisco & Bertolino, Antonia. (2019). Understanding and estimating quality of experience in WebRTC applications. Computing. 101. 10.1007/s00607-018-0669-7.
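
The STUN exchange described above is small enough to sketch by hand. The snippet below builds a Binding Request header and decodes an IPv4 XOR-MAPPED-ADDRESS attribute, using the constants from the STUN specification (RFC 5389). The function names are illustrative, the address 203.0.113.7:54321 is a made-up documentation address, and nothing is sent over the network: the server's reply is simulated locally.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442   # fixed value defined by the STUN specification
BINDING_REQUEST = 0x0001    # message type for a Binding Request


def build_binding_request(transaction_id=None):
    """Return a 20-byte STUN Binding Request with no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    # Header: message type, message length (0 = no attributes),
    # magic cookie, 96-bit transaction ID.
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, transaction_id)


def decode_xor_mapped_address(value):
    """Decode the value of an IPv4 XOR-MAPPED-ADDRESS attribute into (ip, port)."""
    _reserved, family, xport, xaddr = struct.unpack("!BBHI", value[:8])
    if family != 0x01:
        raise ValueError("only IPv4 is handled in this sketch")
    # Port is XORed with the top 16 bits of the cookie, address with the whole cookie.
    port = xport ^ (MAGIC_COOKIE >> 16)
    ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
    return ip, port


request = build_binding_request()
print(len(request), request[:8].hex())

# Simulate the attribute a server would return for 203.0.113.7:54321.
raw_ip = struct.unpack("!I", socket.inet_aton("203.0.113.7"))[0]
attribute = struct.pack("!BBHI", 0, 0x01, 54321 ^ (MAGIC_COOKIE >> 16), raw_ip ^ MAGIC_COOKIE)
print(decode_xor_mapped_address(attribute))
```

The address is XORed with the magic cookie so that NAT devices that rewrite IP addresses found in packet payloads do not corrupt the answer.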

ICE: The Candidate for Connectivity

The Interactive Connectivity Establishment (ICE) framework plays a pivotal role in NAT traversal. Here’s how it works:

  1. Gathering Candidates: Each peer gathers potential connection points (IP address and port pairs) it can use for communication. These include host candidates from local network interfaces, public addresses obtained via STUN, and relay addresses allocated on a TURN server.
  2. Candidate Exchange: Peers exchange their gathered candidates with each other through the signaling channel.
  3. Connectivity Checks: Each peer attempts to establish a connection with the other using the received candidates. This might involve trying different combinations of local and remote candidates.
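  4. Best Path Selection: Of the candidate pairs that pass the connectivity checks, the peers select one to carry the session, generally the pair with the highest priority. Priorities favor direct paths over relayed ones, which in practice tends to mean lower latency.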
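
How candidates are ranked can be shown with the priority formulas from the ICE specification (RFC 8445). The sketch below uses the type preference values that the RFC recommends; real implementations are free to pick different local preferences, so treat the exact numbers as one possible outcome. The function names are illustrative.

```python
# Recommended type preferences from RFC 8445 (higher is preferred).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(candidate_type, local_preference=65535, component_id=1):
    """priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)."""
    return (TYPE_PREFERENCE[candidate_type] << 24) + (local_preference << 8) + (256 - component_id)


def pair_priority(controlling, controlled):
    """Combine the priorities of the controlling (G) and controlled (D) candidates."""
    g, d = controlling, controlled
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


local = {kind: candidate_priority(kind) for kind in ("host", "srflx", "relay")}
remote_host = candidate_priority("host")

# Order the local candidates by the priority of the pair they form with the remote host.
check_order = sorted(local, key=lambda kind: pair_priority(local[kind], remote_host), reverse=True)
print(local)
print(check_order)
```

With these defaults a host candidate outranks a STUN-derived one, which in turn outranks a TURN relay, so the relay is only used when nothing more direct passes the checks.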

SDP: The Session Description

The Session Description Protocol (SDP) acts as a blueprint for the WebRTC session. It’s a text-based format that conveys essential information about the media streams involved:

  • Media types: Whether it’s audio, video, or data communication.
  • Codecs: The specific compression formats used for encoding and decoding media.
  • Transport protocols: The underlying protocols used for media transport (e.g., RTP for real-time data).
  • ICE candidates: The potential connection points offered by each peer.

The SDP is exchanged during the signaling phase, allowing peers to negotiate and agree upon a mutually supported configuration for the communication session.

v=0
o=- 487255629242026503 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE audio video
a=msid-semantic: WMS 6x9ZxQZqpo19FRr3Q0xsWC2JJ1lVsk2JE0sG
m=audio 9 RTP/SAVPF 111 103 104 9 0 8 106 105 13 126
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:8a1/LJqQMzBmYtes
a=ice-pwd:sbfskHYHACygyHW1wVi8GZM+
a=ice-options:google-ice
a=fingerprint:sha-256 28:4C:19:10:97:56:FB:22:57:9E:5A:88:28:F3:04:DF:37:D0:7D:55:C3:D1:59:B0:B2:81:FB:9D:DF:CB:15:A8
a=setup:actpass
a=mid:audio
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=sendrecv
a=rtcp-mux
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:9 G722/8000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:126 telephone-event/8000
a=maxptime:60
a=ssrc:3607952327 cname:v1SBHP7c76XqYcWx
a=ssrc:3607952327 msid:6x9ZxQZqpo19FRr3Q0xsWC2JJ1lVsk2JE0sG 9eb1f6d5-c3b2-46fe-b46b-63ea11c46c74
a=ssrc:3607952327 mslabel:6x9ZxQZqpo19FRr3Q0xsWC2JJ1lVsk2JE0sG
a=ssrc:3607952327 label:9eb1f6d5-c3b2-46fe-b46b-63ea11c46c74
m=video 9 RTP/SAVPF 100 116 117 96
c=IN IP4 0.0.0.0
a=rtcp:9 IN IP4 0.0.0.0
a=ice-ufrag:8a1/LJqQMzBmYtes
a=ice-pwd:sbfskHYHACygyHW1wVi8GZM+
a=ice-options:google-ice
a=fingerprint:sha-256 28:4C:19:10:97:56:FB:22:57:9E:5A:88:28:F3:04:DF:37:D0:7D:55:C3:D1:59:B0:B2:81:FB:9D:DF:CB:15:A8
a=setup:actpass
a=mid:video
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
a=extmap:3 http://www.webrtc.org/experiments/rtp-hdrext/abs-send-time
a=sendrecv
a=rtcp-mux
a=rtpmap:100 VP8/90000
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 nack pli
a=rtcp-fb:100 goog-remb
a=rtpmap:116 red/90000
a=rtpmap:117 ulpfec/90000
a=rtpmap:96 rtx/90000
a=fmtp:96 apt=100
a=ssrc-group:FID 1175220440 3592114481
a=ssrc:1175220440 cname:v1SBHP7c76XqYcWx
a=ssrc:1175220440 msid:6x9ZxQZqpo19FRr3Q0xsWC2JJ1lVsk2JE0sG 43d2eec3-7116-4b29-ad33-466c9358bfb3
a=ssrc:1175220440 mslabel:6x9ZxQZqpo19FRr3Q0xsWC2JJ1lVsk2JE0sG
a=ssrc:1175220440 label:43d2eec3-7116-4b29-ad33-466c9358bfb3
a=ssrc:3592114481 cname:v1SBHP7c76XqYcWx
a=ssrc:3592114481 msid:6x9ZxQZqpo19FRr3Q0xsWC2JJ1lVsk2JE0sG 43d2eec3-7116-4b29-ad33-466c9358bfb3
a=ssrc:3592114481 mslabel:6x9ZxQZqpo19FRr3Q0xsWC2JJ1lVsk2JE0sG
a=ssrc:3592114481 label:43d2eec3-7116-4b29-ad33-466c9358bfb3

SDP Example
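
Because SDP is line-oriented text, pulling the media sections and codecs out of a description like the one above takes only a few lines. The parser below is a minimal sketch that understands just the m= and a=rtpmap lines. The function name parse_sdp and the shortened sample are assumptions made for illustration, and a production application would rely on a full SDP library instead.

```python
def parse_sdp(sdp_text):
    """Split an SDP description into media sections with a payload-type-to-codec map."""
    sections = []
    for raw_line in sdp_text.splitlines():
        line = raw_line.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <payload types...>
            media, port, proto, *formats = line[2:].split()
            sections.append({"media": media, "port": int(port), "proto": proto,
                             "formats": formats, "codecs": {}})
        elif line.startswith("a=rtpmap:") and sections:
            # a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
            payload_type, encoding = line[len("a=rtpmap:"):].split(maxsplit=1)
            sections[-1]["codecs"][payload_type] = encoding
    return sections


sample = """v=0
m=audio 9 RTP/SAVPF 111 0
a=rtpmap:111 opus/48000/2
a=rtpmap:0 PCMU/8000
m=video 9 RTP/SAVPF 100
a=rtpmap:100 VP8/90000
"""

parsed = parse_sdp(sample)
for section in parsed:
    print(section["media"], section["proto"], section["codecs"])
```

The order of payload types on the m= line expresses the sender's codec preference, which is why the sketch keeps the formats list alongside the codec map.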

Security: Guarding the Communication Channel

WebRTC prioritizes secure communication. Two key protocols ensure data integrity and confidentiality:

  • Secure Real-time Transport Protocol (SRTP): SRTP encrypts the media content (audio/video) being transmitted between peers. This safeguards the content from eavesdroppers on the network.
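  • Datagram Transport Layer Security (DTLS): DTLS runs directly between the peers over the path that ICE selects. Its handshake authenticates each side against the certificate fingerprint carried in the SDP and derives the keys that SRTP uses; it also encrypts data channel traffic. The signaling channel itself is outside WebRTC's scope and is normally protected separately, for example with HTTPS or secure WebSockets.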
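
The a=fingerprint line in an SDP ties signaling to the DTLS handshake: it carries a hash of the certificate the peer will present, and the other side compares the two. The sketch below shows only that formatting and comparison step. The helper names are hypothetical and the "certificate" is a placeholder byte string, not a real DER-encoded certificate.

```python
import hashlib


def sdp_fingerprint(certificate_der):
    """Format the SHA-256 digest of a certificate the way a=fingerprint expects."""
    digest = hashlib.sha256(certificate_der).hexdigest().upper()
    pairs = [digest[i:i + 2] for i in range(0, len(digest), 2)]
    return "sha-256 " + ":".join(pairs)


def fingerprint_matches(advertised_value, certificate_der):
    """Compare the value from the SDP with the certificate seen in the handshake."""
    return advertised_value.strip().lower() == sdp_fingerprint(certificate_der).lower()


# Placeholder bytes standing in for a DER-encoded certificate (assumption for the demo).
fake_certificate = b"not a real certificate"
advertised = sdp_fingerprint(fake_certificate)

print(advertised)
print(fingerprint_matches(advertised, fake_certificate))
print(fingerprint_matches(advertised, b"certificate swapped by an attacker"))
```

This check is only as trustworthy as the channel that delivered the SDP, which is one reason the signaling connection should itself be encrypted.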

SCTP: Streamlining Data Delivery

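While WebRTC relies on RTP for media transport, its data channels use the Stream Control Transmission Protocol (SCTP), carried over DTLS. SCTP offers several features that RTP does not provide for arbitrary application data:

  • Ordered Delivery: SCTP can guarantee the order in which messages are delivered, which is crucial for reliable data communication. Ordering can also be switched off per channel when fresh data matters more than sequence.
  • Multihoming: As a protocol, SCTP lets an endpoint use multiple network interfaces for reliability and redundancy. WebRTC data channels tunnel SCTP through a single DTLS association, so path selection there is left to ICE.
  • Partial Reliability: SCTP can cap how long or how many times a lost message is retransmitted, so stale data is dropped instead of holding up newer data.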
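
These delivery modes surface in the browser as options on a data channel (ordered, maxRetransmits, maxPacketLifeTime). The sketch below models them in a small Python class to show how they combine. DataChannelOptions is a hypothetical class written for this post, not part of any WebRTC library, and its field names only mirror the browser options.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataChannelOptions:
    """Hypothetical model of the delivery options of a WebRTC data channel."""
    ordered: bool = True
    max_retransmits: Optional[int] = None         # retry limit per message
    max_packet_lifetime_ms: Optional[int] = None  # time limit for retries

    def __post_init__(self):
        # The two partial-reliability limits are mutually exclusive.
        if self.max_retransmits is not None and self.max_packet_lifetime_ms is not None:
            raise ValueError("set max_retransmits or max_packet_lifetime_ms, not both")

    @property
    def reliable(self):
        """True when lost messages are retransmitted without any limit."""
        return self.max_retransmits is None and self.max_packet_lifetime_ms is None


file_transfer = DataChannelOptions()                                # ordered and reliable
game_state = DataChannelOptions(ordered=False, max_retransmits=0)   # latest value wins

print(file_transfer.reliable, file_transfer.ordered)
print(game_state.reliable, game_state.ordered)
```

A file transfer wants every byte in order, while a stream of position updates is better served by dropping a lost message than by waiting for its retransmission.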

WebRTC might look complex to a beginner, but it is not an entirely new technology. It is in fact a combination of existing protocols, codecs, networking mechanisms, and transports that lets two clients behind firewalls start a P2P session to exchange media and data. The beauty of WebRTC shows when two people can share a bond of love despite being continents apart. Look out for future posts for more on this amazing technology.

Bibliography: