WebRTC: Steps for starting video/audio calling. PART-A

TARANG SHARMA
7 min read

Hey, hi! Welcome to another blog. As promised, I am bringing you three blogs in the WebRTC series, and this is the second part. While writing, I realized this part needs its own sub-parts, so this is Part A. In this blog, we will learn in depth about every step involved in signaling; Part B will discuss data transfer. I will try to explain each concept in the simplest way possible.

Let's list the steps in WebRTC peer-to-peer communication again:

  1. Signaling.

  2. Data transfer.

    SIGNALING

    To send a message to a friend on WhatsApp, you need their phone number and both of you must have WhatsApp installed. If these conditions aren't met, you can't communicate.

    Similarly, for WebRTC peer-to-peer communication, you need to establish a connection that meets specific requirements to support the communication. This is what signaling is. In this step, we establish a connection for proper data transfer.

    Let's discuss the steps involved in signaling.

    1. SDP OFFER / ANSWER

    2. ICE CANDIDATES transfer (STUN/TURN)

    1. SDP OFFER / ANSWER

      SDP stands for Session Description Protocol. An SDP object is shared between the two peers to set up the initial connection. It contains several pieces of information, such as:

      • Codec: The codec used for the connection

      • Source address: The address of the source

      • Timing information: The timing information for audio and video

      • Version: The version of the SDP protocol being used

      • Origin: The username, session ID, session version, and network address of the session's originator

      • Session name: The name of the session

      • Connection information: The network type, address type, and connection address for the session

      • Session activity time: The time the session is active
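To make these fields concrete, here is a minimal sketch that splits an SDP text into its one-letter line types (`v=` for version, `o=` for origin, `s=` for session name, `c=` for connection information, `t=` for timing). The sample SDP and the `parseSdp` helper are made up for illustration and are not part of the WebRTC API; a real offer produced by createOffer() is much longer.

```javascript
// Hypothetical, shortened SDP text; a real offer contains many more lines.
const sampleSdp = [
  'v=0',
  'o=- 4611731400430051336 2 IN IP4 127.0.0.1',
  's=-',
  't=0 0',
  'm=audio 9 UDP/TLS/RTP/SAVPF 111',
  'c=IN IP4 0.0.0.0',
  'a=rtpmap:111 opus/48000/2',
].join('\r\n');

// Illustrative helper: groups SDP lines by their one-letter type.
function parseSdp(sdp) {
  const fields = {};
  for (const line of sdp.split(/\r?\n/)) {
    // Every SDP line has the form <type>=<value>; skip anything else.
    if (line.length < 2 || line[1] !== '=') continue;
    const type = line[0];
    if (!fields[type]) fields[type] = [];
    fields[type].push(line.slice(2));
  }
  return fields;
}

const parsedSdp = parseSdp(sampleSdp);
console.log('version:', parsedSdp.v[0]);
console.log('origin:', parsedSdp.o[0]);
console.log('connection:', parsedSdp.c[0]);
```

In a browser you could pass `offer.sdp` to the same helper to inspect what your own offer contains.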

NOTE:

What is a CODEC? 🧐

CODEC stands for coder + decoder (or compressor + decompressor). Raw video and audio can be very large, so they cannot be sent over the network efficiently. We therefore use processes (implemented in hardware or software) that reduce the size of the data so that it can be streamed. These processes are called codecs.

Examples: audio codec: Opus; video codecs: VP8, H.264.
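Codecs show up in the SDP as `a=rtpmap` lines. As a rough sketch, the helper below lists the codecs advertised in an SDP string. The `listCodecs` name and the sample SDP text are assumptions made for this example.

```javascript
// Illustrative helper: lists the codecs advertised in an SDP string via a=rtpmap lines.
function listCodecs(sdp) {
  const codecs = [];
  for (const line of sdp.split(/\r?\n/)) {
    // Format: a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
    const match = /^a=rtpmap:(\d+) ([^/]+)\/(\d+)/.exec(line);
    if (match) {
      codecs.push({
        payloadType: Number(match[1]),
        name: match[2],
        clockRate: Number(match[3]),
      });
    }
  }
  return codecs;
}

// Hypothetical SDP fragment with one audio and two video codecs.
const codecSdp = [
  'm=audio 9 UDP/TLS/RTP/SAVPF 111',
  'a=rtpmap:111 opus/48000/2',
  'm=video 9 UDP/TLS/RTP/SAVPF 96 102',
  'a=rtpmap:96 VP8/90000',
  'a=rtpmap:102 H264/90000',
].join('\r\n');

const advertisedCodecs = listCodecs(codecSdp);
console.log(advertisedCodecs.map(codec => codec.name));
```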

Steps :

1. To initiate the peer connection setup from the calling side, we create an RTCPeerConnection object and then call createOffer() to create an RTCSessionDescription object. This session description is set as the local description using setLocalDescription() and is then sent over our signaling channel to the receiving side. (All these functions are provided by WebRTC, but WebRTC itself does not act as a signaling server.)

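async function makeCall() {
  // Public Google STUN server, used to discover this peer's public address
  const configuration = {'iceServers': [{'urls': 'stun:stun.l.google.com:19302'}]};
  const peerConnection = new RTCPeerConnection(configuration);

  // When the answer comes back over the signaling channel, store it as the remote description
  signalingChannel.addEventListener('message', async message => {
    if (message.answer) {
      const remoteDesc = new RTCSessionDescription(message.answer);
      await peerConnection.setRemoteDescription(remoteDesc);
    }
  });

  // Create the offer, store it as the local description, and send it to the other peer
  const offer = await peerConnection.createOffer();
  await peerConnection.setLocalDescription(offer);
  signalingChannel.send({'offer': offer});
}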

Local description: an object that describes the configuration and media information of the local peer.

Remote description: an object that describes the configuration and media information of the remote peer.

2. On the receiving side, we wait for an incoming offer before we create our RTCPeerConnection instance. Once that is done we set the received offer using setRemoteDescription(). Next, we call createAnswer() to create an answer to the received offer. This answer is set as the local description using setLocalDescription() and then sent to the calling side over our signaling server.

const peerConnection = new RTCPeerConnection(configuration);

signalingChannel.addEventListener('message', async message => {
  if (message.offer) {
    // Store the caller's offer, then create and send back our answer
    await peerConnection.setRemoteDescription(new RTCSessionDescription(message.offer));
    const answer = await peerConnection.createAnswer();
    await peerConnection.setLocalDescription(answer);
    signalingChannel.send({'answer': answer});
  }
});

Once the two peers have set both the local and remote session descriptions they know the capabilities of the remote peer. This doesn't mean that the connection between the peers is ready. For this to work we need to collect the ICE candidates at each peer and transfer (over the signaling channel) to the other peer.
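The offer, the answer, and later the ICE candidates all travel over a signaling channel that you have to supply yourself. As a rough sketch of what such an object could look like, here is a hypothetical in-memory pair of channels with the same `send()` and `addEventListener('message', ...)` shape used in this blog. The `createSignalingPair` name is an assumption; in a real application the messages would go through a server, for example over WebSocket.

```javascript
// Hypothetical in-memory signaling channel pair, for illustration only.
// In a real application the messages would travel through a signaling server.
function createSignalingPair() {
  const makeEnd = () => ({
    listeners: [],
    peer: null,
    // Same shape as used in this blog: addEventListener('message', handler)
    addEventListener(type, handler) {
      if (type === 'message') this.listeners.push(handler);
    },
    // Deliver the message to every handler registered on the other end.
    send(message) {
      for (const handler of this.peer.listeners) handler(message);
    },
  });
  const first = makeEnd();
  const second = makeEnd();
  first.peer = second;
  second.peer = first;
  return [first, second];
}

const [callerChannel, calleeChannel] = createSignalingPair();
const receivedByCallee = [];
calleeChannel.addEventListener('message', message => receivedByCallee.push(message));

// The caller sends a (fake) offer; the callee's handler receives it.
callerChannel.send({ offer: { type: 'offer', sdp: 'v=0' } });
console.log(receivedByCallee.length, receivedByCallee[0].offer.type);
```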

  2. ICE candidates

    The ICE (Interactive Connectivity Establishment) protocol involves the peers sharing IP addresses and ports (ICE candidates) with each other so that they can establish a connection.

    But why?

    Good question. To answer it, we first have to discuss NAT.

    NAT (Network address translation)

    NAT (Network Address Translation) is a way to map private IP addresses and ports (local) to public IP addresses before sending information over the internet. NAT is important because it provides security and allows multiple local devices to share a single public IP address.
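A toy model can make the mapping idea easier to picture. The sketch below is a simplification: the `SimpleNat` class, the addresses, and the starting port are all assumptions made for this example, and a real NAT device also tracks protocols, timeouts, and more.

```javascript
// Simplified model of a NAT device: it rewrites private (ip, port) pairs
// so that several local devices share one public IP address.
class SimpleNat {
  constructor(publicIp, firstPort = 40000) {
    this.publicIp = publicIp;
    this.nextPort = firstPort;
    this.table = new Map(); // "privateIp:privatePort" -> public port
  }

  // Returns the public address that outside hosts see for this private endpoint.
  translate(privateIp, privatePort) {
    const key = `${privateIp}:${privatePort}`;
    if (!this.table.has(key)) {
      this.table.set(key, this.nextPort++);
    }
    return { ip: this.publicIp, port: this.table.get(key) };
  }
}

const homeRouter = new SimpleNat('203.0.113.7');
const laptopPublic = homeRouter.translate('192.168.1.10', 5000);
const phonePublic = homeRouter.translate('192.168.1.11', 5000);

// Both devices share the public IP but get different public ports.
console.log(laptopPublic, phonePublic);
```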

    Types :

    There are three main types of NAT:

    1. Full Cone NAT:

    2. Port-Preserving NAT:

    3. Symmetric NAT:

    You can go and learn more about these types by yourself, but I will explain them later when needed.

    So, where does this mapping happen? 🧐

    NAT mapping occurs at the router or firewall, before your traffic reaches the public internet. A private IP address and port are only meaningful inside your local network, so a peer outside that network cannot reach you with them. That's why the ICE candidates must include the public IP and port mapped by NAT, not only the private ones.

    An example of an ICE candidate might look like this: "a=candidate:1020301000 1 udp 2122260223 192.168.1.10 5000 typ host", where "1020301000" is the foundation, "1" is the component ID, "udp" is the transport protocol, "2122260223" is the priority, "192.168.1.10" is the IP address, "5000" is the port number, and "host" indicates the candidate type (in this case, an address taken directly from the host machine's own network interface).
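If you want to inspect candidates while debugging, a small parser helps. The sketch below assumes the standard field order of a candidate line (foundation, component, transport, priority, address, port, then `typ` and the type). The `parseCandidate` helper and the sample line are illustrative and not part of the WebRTC API.

```javascript
// Illustrative parser for a candidate line in the standard field order:
// candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type>
function parseCandidate(line) {
  const text = line.replace(/^a=/, '').replace(/^candidate:/, '');
  const parts = text.trim().split(/\s+/);
  if (parts.length < 8 || parts[6] !== 'typ') {
    throw new Error('Not a well-formed ICE candidate line');
  }
  return {
    foundation: parts[0],
    component: Number(parts[1]),
    transport: parts[2].toLowerCase(),
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[7], // host, srflx (learned via STUN), prflx, or relay (via TURN)
  };
}

const parsedCandidate = parseCandidate(
  'a=candidate:1020301000 1 udp 2122260223 192.168.1.10 5000 typ host'
);
console.log(parsedCandidate.address, parsedCandidate.port, parsedCandidate.type);
```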

    What is TRICKLE ICE? 🧐

    Trickle ICE is a technique used in WebRTC to speed up the process of establishing a real-time connection between two devices. In traditional ICE (Interactive Connectivity Establishment), all possible connection paths (called ICE candidates) are gathered before checking for a successful connection. This can be time-consuming, especially in complex network environments.

    Trickle ICE improves upon this by allowing ICE candidates to be exchanged incrementally as they become available. This means that as soon as a new candidate is discovered, it can be shared with the other device, and a connection attempt can be made. This can significantly reduce the time it takes to establish a connection, especially in scenarios where network conditions are dynamic or unpredictable.
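In code, Trickle ICE comes down to two event handlers. The sketch below follows the pattern from the WebRTC guide linked in this blog: `peerConnection` is expected to be an RTCPeerConnection, `signalingChannel` is your own channel, and the `setupTrickleIce` name and the `iceCandidate` message key are assumptions for this example.

```javascript
// Sketch of Trickle ICE wiring. `peerConnection` is expected to be an RTCPeerConnection
// and `signalingChannel` the application's own channel (both are passed in by the caller).
function setupTrickleIce(peerConnection, signalingChannel) {
  // Send each local candidate to the remote peer as soon as it is discovered.
  peerConnection.addEventListener('icecandidate', event => {
    if (event.candidate) {
      signalingChannel.send({ iceCandidate: event.candidate });
    }
  });

  // Add each remote candidate to the connection as soon as it arrives.
  signalingChannel.addEventListener('message', async message => {
    if (message.iceCandidate) {
      try {
        await peerConnection.addIceCandidate(message.iceCandidate);
      } catch (error) {
        console.error('Error adding received ICE candidate', error);
      }
    }
  });
}
```

You would call `setupTrickleIce(peerConnection, signalingChannel)` on both peers right after creating the RTCPeerConnection, so that no early candidate is missed.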

    https://webrtc.org/getting-started/peer-connections

    STUN AND TURN SERVERS

    Let's now talk about these two types of servers and why we need them.

    As we now know, for communication to happen between two peers we need IP addresses and ports, but a private IP and port are of no use to a peer outside our network. This is where NAT mapping comes in.

    The catch is that our device cannot directly read the NAT mappings held by the router or firewall. This is where STUN comes into the picture.

    STUN (Session Traversal Utilities for NAT)

    A STUN server tells the device which public IP and port its traffic appears to come from. The device can then send this address to the other peer in its ICE candidates.
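The idea behind STUN can be sketched in a few lines: the NAT rewrites the source address of an outgoing request, and the STUN server simply reports back the source address it observed. Everything below is a toy model with made-up names and addresses; a real STUN exchange uses binary messages, typically over UDP port 3478.

```javascript
// Toy model of a STUN exchange. All names and addresses are illustrative.

// The NAT replaces the private source address with its public mapping.
function natRewrite(packet, publicIp, publicPort) {
  return { ...packet, srcIp: publicIp, srcPort: publicPort };
}

// The STUN server answers with the source address it saw on the request.
function stunServerRespond(packet) {
  return { mappedAddress: { ip: packet.srcIp, port: packet.srcPort } };
}

const bindingRequest = {
  srcIp: '192.168.1.10', // private address of the device
  srcPort: 5000,
  dstIp: '198.51.100.1', // the STUN server
  dstPort: 3478,
};

const requestOnInternet = natRewrite(bindingRequest, '203.0.113.7', 40000);
const stunResponse = stunServerRespond(requestOnInternet);

// The device now knows its public address and can put it in an ICE candidate.
console.log(stunResponse.mappedAddress);
```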

    Now let's go through the whole process so far once more to get some clarity.

    We're nearly finished establishing the connection. Once complete, you'll be able to share audio and video. However, you might have noticed we haven't discussed TURN.

    Let's address that now. 🥳

    TURN (Traversal Using Relays around NAT)

    There's a specific type of NAT that restricts communication: symmetric NAT. STUN, a tool for discovering the public address that NAT assigns to a device, doesn't work with symmetric NAT.

    What is symmetric NAT? 🧐

    It's a type of NAT that creates a separate mapping for every destination. Requests from the same internal IP address and port get a different external port (and possibly a different external IP address) for each destination they are sent to.

    By contrast, a cone NAT (such as a full cone NAT) maps all requests from a single internal IP address and port to a single external IP address and port, whatever the destination.

    Imagine your device asks a STUN server for its public address. Behind a symmetric NAT, the address the STUN server reports is only valid for traffic between your device and that STUN server. When the remote peer sends data to that address, the NAT drops it because the mapping was not created for that peer, so the connection fails. This is where a TURN server becomes essential.
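Here is a minimal sketch of why an address learned through STUN does not help behind a symmetric NAT. It assumes a toy NAT model in which the public port depends on the destination as well as on the private endpoint. The `SymmetricNat` class and all addresses are made up for illustration.

```javascript
// Toy symmetric NAT: the public port depends on the destination
// as well as on the private endpoint that sends the traffic.
class SymmetricNat {
  constructor(publicIp, firstPort = 50000) {
    this.publicIp = publicIp;
    this.nextPort = firstPort;
    this.table = new Map(); // "private endpoint -> destination" => public port
  }

  translate(privateIp, privatePort, destIp, destPort) {
    const key = `${privateIp}:${privatePort}->${destIp}:${destPort}`;
    if (!this.table.has(key)) {
      this.table.set(key, this.nextPort++);
    }
    return { ip: this.publicIp, port: this.table.get(key) };
  }
}

const symmetricNat = new SymmetricNat('203.0.113.7');

// Mapping the STUN server observes (and reports back to the device):
const seenByStun = symmetricNat.translate('192.168.1.10', 5000, '198.51.100.1', 3478);

// Mapping used when the same local socket talks to the remote peer:
const seenByPeer = symmetricNat.translate('192.168.1.10', 5000, '192.0.2.44', 6000);

// The ports differ, so the address advertised via STUN is not the one the peer needs.
console.log(seenByStun.port, seenByPeer.port);
```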

    TURN (Traversal Using Relays around NAT) works by acting as a mediator between peers. When a peer requests a connection through a TURN server, the server allocates a relay address and forwards the data between the peers, so they communicate indirectly. This circumvents the limitations imposed by symmetric NAT.

    I'll provide a detailed explanation of TURN and STUN in an upcoming blog post.
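On the WebRTC side, using a TURN server mostly means adding it to the `iceServers` list of the RTCPeerConnection configuration, next to the STUN server. Here is a minimal sketch: the `buildIceConfiguration` helper, the TURN URL, and the credentials are placeholders, so substitute the values of your own TURN server.

```javascript
// Builds an RTCPeerConnection configuration with one STUN and one TURN server.
// The TURN URL and credentials passed in below are placeholders, not a real server.
function buildIceConfiguration({ turnUrl, username, credential }) {
  return {
    iceServers: [
      { urls: 'stun:stun.l.google.com:19302' },
      { urls: turnUrl, username, credential },
    ],
  };
}

const turnConfiguration = buildIceConfiguration({
  turnUrl: 'turn:turn.example.org:3478',
  username: 'demo-user',
  credential: 'demo-password',
});

console.log(JSON.stringify(turnConfiguration, null, 2));
// In a browser: const peerConnection = new RTCPeerConnection(turnConfiguration);
```

ICE will still try direct candidates first and only fall back to the relay when nothing else works, so adding a TURN server does not slow down peers that can connect directly.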

    Now the last thing !

    Here is the link to the WebRTC guide, where you can find the WebRTC API used for the implementation.

    https://webrtc.org/getting-started/overview

    NOTE:

    1. WebRTC needs a signaling server to function, but it does not provide one itself.

    2. The inner workings of TURN and STUN are a topic of their own, so I will discuss them in another blog.

    3. In the next blog I will go deeper into the WebRTC architecture and other things that I might have missed in this blog.

    4. Add your comments and suggestions.

Thank you for reading my blog. Stay tuned for upcoming blogs, take care, and keep learning. 😇
