Understanding WebRTC: Real-Time Communication for the Web and Beyond

Sandip SabaleSandip Sabale
6 min read

WebRTC (Web Real-Time Communication) is an open-source technology that enables real-time communication (RTC) over the web and on various native platforms without requiring third-party plugins. It allows applications to exchange audio, video, and arbitrary data directly between peers, offering powerful capabilities for building real-time communications, such as video calls, audio calls, and even file transfers.

WebRTC is supported by most modern web browsers and can be integrated into native platforms like Android, iOS, Windows, macOS, and Linux, making it a versatile tool for cross-platform communication.

Key WebRTC APIs Summary

WebRTC relies on three main APIs to enable real-time communication:

getUserMedia: Grants applications access to the user's local media devices (e.g., camera and microphone) to capture video and audio streams, which can be shared over a WebRTC connection.

RTCPeerConnection: Manages the connection between peers, handling tasks like network traversal, media encoding/decoding, and ensuring a stable, secure exchange of audio and video streams.

RTCDataChannel: Facilitates peer-to-peer data exchange, allowing arbitrary data (e.g., text, files) to be sent directly between peers with low latency, without requiring an intermediary server.

These APIs collectively enable WebRTC to provide direct and efficient peer-to-peer communication.

How WebRTC Works

The working of WebRTC can be understood by breaking it down into several essential components and processes: Signaling, SDP (Session Description Protocol), and ICE (Interactive Connectivity Establishment). Let's go through each of these steps to understand the flow.

1. Signaling

Signaling is the process by which WebRTC clients exchange information required to establish a peer-to-peer connection. While WebRTC handles media transfer between peers, it does not specify how peers discover each other. That's where signaling comes in.

WebRTC relies on signaling servers to help clients exchange metadata necessary to establish the connection, such as media configurations and network information. This process typically happens over a protocol like WebSocket, which supports bidirectional communication. Once both clients exchange their metadata, they can begin establishing a peer-to-peer connection.

Some key functions of signaling include:

  • Exchanging Session Information: Information like SDP offers/answers is exchanged to describe the media configurations between the peers.

  • Network Setup: Information about each peer's network environment is exchanged to facilitate network traversal, allowing peers to connect even behind firewalls or NATs (Network Address Translators).

It’s important to note that WebRTC itself does not define a signaling mechanism, so developers can use WebSocket, SIP, XMPP, or other signaling protocols to implement the discovery and setup of peers.

2. SDP (Session Description Protocol)

SDP is a protocol that describes the multimedia communication session information exchanged between peers. It specifies crucial details about the media session, such as the codecs used, the media formats supported, encryption methods, and the network information necessary to initiate and maintain the session.

Here’s how SDP works in the WebRTC context:

  • SDP Offer/Answer Model: One peer (say, Client A) generates an SDP offer and sends it to the other peer (Client B). This offer includes information such as the media formats supported by Client A (e.g., video and audio codecs) and the network configurations. Client B responds with an SDP answer, either accepting or modifying the offer. This SDP exchange ensures that both peers agree on the media formats and network setup before starting the connection.

SDP defines several parameters:

  • Session Information: Information like the session ID and expiration time.

  • Media Descriptions: Details about the media types (audio/video) and codecs used (e.g., VP8 for video or Opus for audio).

  • Network Information: IP addresses, ports, and other details to route the media streams.

3. ICE (Interactive Connectivity Establishment)

Once signaling is complete, the next challenge is establishing a peer-to-peer connection. However, connecting two devices on the internet is not straightforward due to firewalls, NATs, and private IP addresses. This is where ICE comes into play.

ICE is a framework used by WebRTC to handle network traversal and establish a connection between peers. It helps peers discover the best path for direct communication, even if they are behind NATs or firewalls.

ICE relies on two key components:

  • STUN (Session Traversal Utilities for NAT): This protocol allows devices to determine their public IP addresses and the type of NAT they are behind. It helps clients share this public information with other peers, allowing them to establish a connection directly when possible.

  • TURN (Traversal Using Relays around NAT): If a direct peer-to-peer connection is not possible (e.g., due to strict firewalls), TURN servers act as intermediaries, relaying the data between the peers. Here’s how ICE works:

Both peers start by gathering possible connection candidates. These can include the local IP address, public IP address (from the STUN server), or a relay address (from a TURN server). The peers exchange these candidates through signaling. The peers then test these candidates to find the best possible connection path. Once an optimal path is found, the peer-to-peer connection is established, and media or data can flow directly between the peers.

Summary of WebRTC Workflow

In a WebRTC communication between two peers (a caller and a callee) and a signaling server, the connection is established through the following steps:

  1. Network Information Gathering: Both the caller and callee connect with STUN or TURN servers to gather their respective network information, such as public IP addresses and port numbers, required for connection setup.

  2. Offer Creation: The caller generates an SDP offer (session description with media configuration) and sends it to the callee via the signaling server.

  3. Offer Acceptance: The callee receives the offer, processes it, and sends an SDP answer (accepting or modifying the offer) back to the caller via the signaling server.

  4. ICE Candidate Generation (Caller): The caller generates ICE candidates (possible network routes) and sends them to the callee through the signaling server.

  5. ICE Candidate Generation (Callee): Similarly, the callee generates its own ICE candidates and sends them back to the caller via the signaling server.

  6. Peer-to-Peer Communication: Once a valid connection is established using the best ICE candidate, the caller and callee begin exchanging data (audio, video, or other) directly over the peer-to-peer connection, without needing further involvement from the signaling server.

WebRTC’s inner workings make it quite fascinating to experiment with. One interesting and fun way one might thing to explore WebRTC is by manually exchanging SDP (Session Description Protocol) between peers to start a video call.

However, while you can manually exchange SDP for fun and learning, the process of sharing ICE candidates (used for network traversal) cannot be done manually. ICE candidates handle the complexities of bypassing NATs and firewalls, even in a local network setting. Without sharing ICE candidates, peers won’t be able to establish a connection. This is why signaling mechanisms are essential.

Conclusion

WebRTC simplifies the creation of applications that involve real-time audio, video, and data communication, making it ideal for building applications like video conferencing, online collaboration tools, and even gaming platforms.

0
Subscribe to my newsletter

Read articles from Sandip Sabale directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Sandip Sabale
Sandip Sabale