WebRTC — Building Peer-to-Peer Video Call Architecture in the Browser
Posted on: 4/27/2026 10:18:36 AM
Table of contents
- 1. What is WebRTC and How Does It Work?
- 2. Signaling — The First Step WebRTC Doesn't Define
- 3. NAT Traversal — STUN, TURN and the ICE Framework
- 4. Media Pipeline — Codecs, Encryption and Adaptive Bitrate
- 5. Production Architecture — P2P, SFU and MCU
- 6. Open-Source SFUs — LiveKit, mediasoup and Janus
- 7. Encoded Transform — True End-to-End Encryption
- 8. Production Deployment — Reference Architecture
- 9. WebRTC Performance Optimization
- 10. Real-World Use Cases Beyond Video Calls
- Conclusion
Every day, billions of minutes of video calls happen on Google Meet, Zoom, Discord and hundreds of other apps — all running on the same foundation: WebRTC. This set of browser APIs enables direct peer-to-peer transmission of audio, video and arbitrary data without plugins, without Flash, without installing anything. This article dives deep into WebRTC architecture from network protocols to production deployment with SFU, helping you understand how to build large-scale real-time communication systems.
1. What is WebRTC and How Does It Work?
WebRTC (Web Real-Time Communication) is a collection of W3C/IETF standard APIs and protocols that enable browsers and native apps to establish peer-to-peer connections for media and data transmission. Unlike the traditional client-server model, WebRTC allows two devices to communicate directly — reducing latency, saving server bandwidth and simplifying architecture for real-time use cases.
Three core WebRTC APIs in the browser:
- MediaStream (getUserMedia) — Access camera, microphone and screen capture
- RTCPeerConnection — Establish P2P connections, handle codecs, SRTP encryption and ICE candidate management
- RTCDataChannel — Arbitrary data channel (text, files, game state) over SCTP with reliable or unreliable mode
graph TD
A["getUserMedia()
Camera + Mic"] --> B["MediaStream
Audio/Video Tracks"]
B --> C["RTCPeerConnection
Encryption + ICE + DTLS"]
C --> D{"NAT Traversal"}
D -->|STUN succeeds| E["P2P Direct
~85% of cases"]
D -->|STUN fails| F["TURN Relay
~15% of cases"]
E --> G["Remote Peer
Receives stream"]
F --> G
C --> H["RTCDataChannel
Arbitrary data"]
H --> G
style A fill:#e94560,stroke:#fff,color:#fff
style C fill:#2c3e50,stroke:#fff,color:#fff
style G fill:#4CAF50,stroke:#fff,color:#fff
style D fill:#ff9800,stroke:#fff,color:#fff
WebRTC architecture overview — from MediaStream to P2P connection
2. Signaling — The First Step WebRTC Doesn't Define
WebRTC intentionally does not specify a signaling protocol. This is by design — allowing developers to choose any transport channel that fits: WebSocket, HTTP long-polling, Firebase Realtime Database, or even email. Signaling does one thing: exchange the information needed for two peers to find each other and negotiate codecs.
The signaling process involves 3 main steps:
sequenceDiagram
participant A as Peer A (Caller)
participant S as Signaling Server
participant B as Peer B (Callee)
A->>S: 1. Create Offer (SDP)
S->>B: Forward Offer
B->>S: 2. Create Answer (SDP)
S->>A: Forward Answer
A->>S: 3. Send ICE Candidates
S->>B: Forward ICE Candidates
B->>S: Send ICE Candidates
S->>A: Forward ICE Candidates
A-->>B: P2P Connection Established!
Signaling flow — exchanging SDP and ICE Candidates through an intermediary server
SDP (Session Description Protocol) is a text format describing each peer's media capabilities: supported codecs (VP9, H.264, Opus), bandwidth, IP/port addresses. When Peer A creates an offer and Peer B responds with an answer, both sides have agreed on codec and encryption parameters.
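To make the SDP format concrete, here is a minimal sketch that pulls the codec list out of an SDP blob. The sample SDP fragment below is hypothetical but shaped like a real browser offer:

```javascript
// Extract codec names from the rtpmap attributes of an SDP blob.
// SDP is plain text: one "a=rtpmap:<payload> <codec>/<clockrate>" line per codec.
function extractCodecs(sdp) {
  const codecs = [];
  for (const line of sdp.split(/\r?\n/)) {
    const match = line.match(/^a=rtpmap:\d+ ([^/]+)\/(\d+)/);
    if (match) codecs.push({ name: match[1], clockRate: Number(match[2]) });
  }
  return codecs;
}

// Hypothetical SDP fragment for illustration
const sampleSdp = [
  'v=0',
  'm=audio 9 UDP/TLS/RTP/SAVPF 111',
  'a=rtpmap:111 opus/48000/2',
  'm=video 9 UDP/TLS/RTP/SAVPF 96 98',
  'a=rtpmap:96 VP8/90000',
  'a=rtpmap:98 H264/90000'
].join('\r\n');

console.log(extractCodecs(sampleSdp).map(c => c.name)); // [ 'opus', 'VP8', 'H264' ]
```

In practice you rarely parse SDP by hand — the browser does the negotiation — but seeing the rtpmap lines demystifies what the offer/answer exchange actually carries.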
Signaling Server Implementation Tips
For small apps (<1,000 concurrent users), a simple WebSocket server on Node.js or ASP.NET Core SignalR is sufficient. When scaling up, use Redis Pub/Sub as a message broker between signaling nodes to ensure all peers receive ICE candidates on time.
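The core of any signaling server is just room-scoped message fan-out. Here is a transport-agnostic sketch — plug in a WebSocket or SignalR send function; all names are illustrative:

```javascript
// Minimal room-based signaling router: tracks who is in which room
// and forwards offers/answers/candidates to everyone else in the room.
class SignalingRouter {
  constructor() {
    this.rooms = new Map(); // roomId -> Map(peerId -> send callback)
  }
  join(roomId, peerId, send) {
    if (!this.rooms.has(roomId)) this.rooms.set(roomId, new Map());
    this.rooms.get(roomId).set(peerId, send);
  }
  // Forward a signaling message to every peer in the room except the sender
  relay(roomId, fromPeerId, message) {
    const peers = this.rooms.get(roomId);
    if (!peers) return 0;
    let delivered = 0;
    for (const [peerId, send] of peers) {
      if (peerId !== fromPeerId) { send({ from: fromPeerId, ...message }); delivered++; }
    }
    return delivered;
  }
}

const router = new SignalingRouter();
const inboxB = [];
router.join('room-1', 'peer-a', () => {});
router.join('room-1', 'peer-b', msg => inboxB.push(msg));
router.relay('room-1', 'peer-a', { type: 'offer', sdp: 'v=0...' });
console.log(inboxB[0].type); // 'offer'
```

With multiple signaling nodes, `relay` is where the Redis Pub/Sub hop goes: publish the message to the room's channel so the node holding the target peer's socket can deliver it.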
3. NAT Traversal — STUN, TURN and the ICE Framework
The biggest challenge for P2P is NAT (Network Address Translation). Most devices sit behind NAT routers without direct public IPs. WebRTC solves this with ICE (Interactive Connectivity Establishment) — a framework that tries all possible paths and selects the best one.
3.1 STUN — Discovering Your Public IP
STUN (Session Traversal Utilities for NAT) servers help clients discover their public IP and port mapping. The client sends a request to the STUN server, which responds with the public address it sees. This process is lightweight — just a few UDP packets. Google provides free STUN servers at stun:stun.l.google.com:19302.
STUN works with approximately 85% of standard NAT configurations (Full Cone, Restricted Cone, Port Restricted Cone). However, Symmetric NAT — common in enterprise networks — blocks STUN because each different destination gets NAT-mapped to a different port.
3.2 TURN — Relay When P2P Fails
TURN (Traversal Using Relays around NAT) is the fallback: all media passes through a TURN server as a relay. This consumes significant server bandwidth — each 720p video stream uses ~1.5 Mbps, doubled through relay — so TURN is only used when STUN fails.
TURN Costs Are Not Cheap
A TURN server handling 500 concurrent 1-on-1 video calls needs ~1.5 Gbps bandwidth. At average cloud pricing of $0.08/GB, bandwidth costs can reach $500–800/day. Always prioritize STUN and only fall back to TURN when necessary. Use coturn (open-source) and deploy close to users to reduce latency.
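The numbers above are easy to sanity-check. A rough back-of-envelope estimator, using the figures from this callout (~3 Mbps relayed per 1-on-1 call, $0.08/GB); the utilization factor is an assumption, since calls are rarely saturated around the clock:

```javascript
// Rough TURN relay cost estimate. mbpsPerCall = 3 assumes a 1-on-1 call
// relaying ~1.5 Mbps in each direction through the server.
function turnCostPerDay({ calls, mbpsPerCall = 3, pricePerGB = 0.08, utilization = 1 }) {
  // Mbps -> MB/s -> GB/day, scaled by average utilization
  const gbPerDay = (calls * mbpsPerCall / 8) * 86400 / 1024 * utilization;
  return { gbPerDay: Math.round(gbPerDay), usdPerDay: Math.round(gbPerDay * pricePerGB) };
}

// 500 concurrent calls at ~50% average utilization over a day
console.log(turnCostPerDay({ calls: 500, utilization: 0.5 })); // { gbPerDay: 7910, usdPerDay: 633 }
```

At 50% utilization this lands in the $500–800/day range quoted above; at full saturation it would roughly double.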
3.3 ICE — Finding the Best Path
The ICE framework collects all ICE candidates (possible connection addresses) from 3 sources: host candidates (local IP), server reflexive candidates (from STUN) and relay candidates (from TURN). ICE then performs connectivity checks in priority order — preferring direct P2P, falling back through TURN if needed.
graph LR
A["ICE Agent"] --> B["Host Candidate
Local IP: 192.168.1.5:4532"]
A --> C["Server Reflexive
STUN: 203.0.113.5:6789"]
A --> D["Relay Candidate
TURN: 198.51.100.2:3478"]
B --> E{"Connectivity
Check"}
C --> E
D --> E
E -->|Priority 1| F["Direct P2P"]
E -->|Priority 2| G["STUN-assisted P2P"]
E -->|Priority 3| H["TURN Relay"]
style A fill:#e94560,stroke:#fff,color:#fff
style E fill:#ff9800,stroke:#fff,color:#fff
style F fill:#4CAF50,stroke:#fff,color:#fff
ICE Framework — gathering candidates and selecting the optimal connection path
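The priority order shown in the diagram is not ad hoc — ICE computes a numeric priority per candidate. A sketch of the RFC 8445 formula (the `localPreference` and `componentId` defaults here are the spec's recommended maximums, used for illustration):

```javascript
// ICE candidate priority per RFC 8445 §5.1.2:
// priority = 2^24 * typePreference + 2^8 * localPreference + (256 - componentId)
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

function icePriority(type, localPreference = 65535, componentId = 1) {
  return (2 ** 24) * TYPE_PREFERENCE[type] + (2 ** 8) * localPreference + (256 - componentId);
}

// Higher priority is tried first: host beats STUN-derived, STUN beats TURN relay
const candidates = ['relay', 'srflx', 'host'].sort((a, b) => icePriority(b) - icePriority(a));
console.log(candidates); // [ 'host', 'srflx', 'relay' ]
```

The large 2^24 multiplier on the type preference guarantees that candidate type always dominates the ordering — a relay candidate can never outrank a direct one, no matter its local preference.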
4. Media Pipeline — Codecs, Encryption and Adaptive Bitrate
4.1 Audio & Video Codecs
WebRTC mandates support for the following codecs:
| Type | Mandatory Codec | Optional (Common) | Characteristics |
|---|---|---|---|
| Audio | Opus, G.711 (PCMU/PCMA) | G.722, iSAC (legacy) | Opus: 6–510 kbps, adaptive bitrate, 48kHz. Best for voice + music |
| Video | VP8, H.264 | VP9, AV1, H.265 | VP9 saves 30-50% bandwidth vs VP8. AV1 newest but CPU-intensive encoding |
4.2 Mandatory Encryption
All WebRTC connections are encrypted by default — there is no option to disable it. The encryption stack consists of:
- DTLS (Datagram Transport Layer Security) — Handshake and key exchange, similar to TLS but for UDP
- SRTP (Secure Real-time Transport Protocol) — Encrypts audio/video payload with AES-128
- SCTP over DTLS — Encrypts data on RTCDataChannel
4.3 Adaptive Bitrate & Congestion Control
WebRTC uses the GCC (Google Congestion Control) algorithm to automatically adjust bitrate based on network conditions. When packet loss or increased latency is detected, the encoder reduces resolution/framerate/bitrate. When the network improves, quality automatically increases. This is why video calls sometimes go "blurry" for a few seconds then clear up — GCC at work.
Modern browsers also support Simulcast — sending multiple quality layers simultaneously (e.g., 1080p + 720p + 360p). The receiver or SFU selects the appropriate layer for the current bandwidth, avoiding CPU-intensive re-encoding on the server.
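The per-receiver decision an SFU makes with simulcast is simple to sketch. The layer bitrates below mirror the simulcast encodings configured later in this article; the selection logic itself is illustrative, not any particular SFU's implementation:

```javascript
// Sketch of simulcast layer selection: pick the highest layer whose
// bitrate fits the receiver's estimated available bandwidth.
const LAYERS = [
  { rid: 'high', maxBitrate: 2_500_000 },
  { rid: 'mid',  maxBitrate:   700_000 },
  { rid: 'low',  maxBitrate:   200_000 },
];

function selectLayer(availableBps, layers = LAYERS) {
  // Layers are ordered high -> low; take the first that fits, else the lowest
  return layers.find(l => l.maxBitrate <= availableBps) ?? layers[layers.length - 1];
}

console.log(selectLayer(3_000_000).rid); // 'high'
console.log(selectLayer(1_000_000).rid); // 'mid'
console.log(selectLayer(100_000).rid);   // 'low'
```

Real SFUs refine this with hysteresis (to avoid flapping between layers) and with temporal layers inside each spatial layer, but the core idea is exactly this lookup.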
5. Production Architecture — P2P, SFU and MCU
Pure P2P only works well for 1-on-1 calls. With 3+ participants, the full mesh model (every peer connects to every other peer) doesn't scale — N participants need N×(N-1)/2 connections. With 10 people, each device must encode and send 9 separate streams.
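The mesh math above in one small function:

```javascript
// Full-mesh scaling: every peer connects to every other peer.
function meshStats(participants) {
  return {
    connections: participants * (participants - 1) / 2, // total P2P links in the mesh
    uplinkStreamsPerPeer: participants - 1,             // streams each device must encode and send
  };
}

console.log(meshStats(3));  // { connections: 3, uplinkStreamsPerPeer: 2 }
console.log(meshStats(10)); // { connections: 45, uplinkStreamsPerPeer: 9 }
```

The quadratic `connections` term is why mesh tops out around 4-5 participants: the per-peer encode cost and uplink bandwidth grow linearly, but the total connection count grows with the square of the room size.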
graph TD
subgraph "P2P Mesh — Max 4-5 people"
P1["Peer 1"] <--> P2["Peer 2"]
P1 <--> P3["Peer 3"]
P2 <--> P3
end
subgraph "SFU — Hundreds of people"
S1["Peer 1"] --> SFU["SFU Server
Forward streams"]
S2["Peer 2"] --> SFU
S3["Peer 3"] --> SFU
S4["Peer N"] --> SFU
SFU --> S1
SFU --> S2
SFU --> S3
SFU --> S4
end
subgraph "MCU — Low bandwidth"
M1["Peer 1"] --> MCU["MCU Server
Mix + Re-encode"]
M2["Peer 2"] --> MCU
M3["Peer 3"] --> MCU
MCU --> M1
MCU --> M2
MCU --> M3
end
style SFU fill:#e94560,stroke:#fff,color:#fff
style MCU fill:#2c3e50,stroke:#fff,color:#fff
Three WebRTC architectures: Mesh (P2P), SFU (Selective Forwarding) and MCU (Mixing)
5.1 SFU — The #1 Choice for Production in 2026
SFU (Selective Forwarding Unit) is the dominant architecture for WebRTC production. Each participant sends 1 stream to the SFU, which forwards it to all other participants — no decoding, no re-encoding. Advantages:
- Low server CPU — only forwards packets, no media processing
- Low latency — no intermediate decode/encode step
- Scales well — each SFU node handles 500-1000 concurrent streams
- Simulcast compatible — SFU selects appropriate layer for each receiver
5.2 MCU — When Client Bandwidth Is the Issue
MCU (Multipoint Control Unit) decodes all incoming streams, mixes them into a single layout, re-encodes and sends the result to each participant. Clients receive only one stream — saving downstream bandwidth. But an MCU consumes massive server CPU and adds 200-500ms of latency from the decode/encode cycle. MCU fits low-powered IoT devices, 3G mobile connections, and recording/broadcasting pipelines.
5.3 Detailed SFU vs MCU Comparison
| Criteria | SFU | MCU |
|---|---|---|
| Server CPU | Low (forward only) | Very high (decode + mix + encode) |
| Added Latency | ~10-50ms | ~200-500ms |
| Client Bandwidth (downstream) | High (receives N-1 streams) | Low (receives 1 stream) |
| Scale | Good — 500-1000 streams/node | Limited — 50-100 participants/node |
| Video Quality | Original (no re-encoding) | Reduced (through re-encoding) |
| Simulcast | Native support | Not needed (already mixed) |
| Best Use Case | Video conferencing, live streaming | IoT, legacy devices, recording |
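The downstream-bandwidth row of the table can be quantified with the article's ~1.5 Mbps figure for a 720p stream (an illustrative average, not a fixed constant):

```javascript
// Downstream bandwidth per participant under each architecture,
// assuming ~1.5 Mbps per 720p stream.
function downstreamMbps(architecture, participants, mbpsPerStream = 1.5) {
  switch (architecture) {
    case 'mesh':
    case 'sfu': return (participants - 1) * mbpsPerStream; // one stream per remote peer
    case 'mcu': return mbpsPerStream;                      // a single pre-mixed stream
    default: throw new Error(`unknown architecture: ${architecture}`);
  }
}

console.log(downstreamMbps('sfu', 10)); // 13.5
console.log(downstreamMbps('mcu', 10)); // 1.5
```

A 10-person SFU room costs each client ~13.5 Mbps down without simulcast — which is exactly the pressure that simulcast relieves by letting the SFU downgrade some tiles to a lower layer.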
6. Open-Source SFUs — LiveKit, mediasoup and Janus
The three most popular open-source SFUs, each suited for different contexts:
6.1 LiveKit — Modern SFU Written in Go
LiveKit has emerged as the top choice for teams wanting to ship fast. It is written in Go, using goroutines to handle large numbers of concurrent connections. It ships with SDKs for JavaScript, React, Swift, Kotlin, Flutter, Unity and server-side SDKs for Node.js, Python, Go, .NET. LiveKit includes signaling, room management and recording out of the box.
// LiveKit JavaScript Client — connect to room
import { Room, RoomEvent } from 'livekit-client';
const room = new Room();
await room.connect('wss://your-livekit-server.com', token);
room.on(RoomEvent.TrackSubscribed, (track, publication, participant) => {
const element = track.attach();
document.getElementById('remote-video').appendChild(element);
});
// Publish local camera
const localTracks = await room.localParticipant.enableCameraAndMicrophone();
6.2 mediasoup — High-Performance SFU with C++ Core
mediasoup has its core written in C++ for optimal media processing performance, with a Node.js signaling layer. Worker-based architecture: each CPU core runs a Worker process, handling media routing for multiple rooms. mediasoup provides fine-grained control over every transport, producer and consumer — ideal for teams wanting deep customization.
6.3 Janus Gateway — Versatile Plugin Architecture
Janus is written in C, released in 2014, making it the oldest and most versatile SFU. Its plugin architecture enables extension: VideoRoom (SFU), AudioBridge (audio mixing), Streaming (one-to-many), SIP Gateway, Record/Play. Janus fits when you need integration with legacy VoIP/SIP systems.
| Criteria | LiveKit | mediasoup | Janus |
|---|---|---|---|
| Language | Go | C++ (core) + Node.js | C |
| Setup | Fast — all-in-one SDK | Medium — build signaling yourself | Medium — choose plugins |
| Customization | Medium | Very high | High (plugin system) |
| Scalability | Built-in multi-node | Self-managed | Self-managed |
| .NET SDK | Yes (server-side) | No official support | No |
| Recording | Built-in (Egress) | Self-implement | Record/Play plugin |
| Best For | Startups, ship fast | Custom platforms, large scale | SIP/VoIP, legacy integration |
7. Encoded Transform — True End-to-End Encryption
By default, WebRTC encrypts hop-by-hop with DTLS-SRTP — meaning SFU servers can see media in plaintext when forwarding. For sensitive applications (healthcare, finance), this isn't sufficient.
The WebRTC Encoded Transform API (W3C Working Draft, updated 02/2026) allows inserting a processing step into the pipeline between the encoder and packetizer. Developers can encrypt the payload with a private key before sending to the SFU — the SFU can only forward encrypted payload, unable to read content. This is true E2EE (End-to-End Encryption).
// Encoded Transform — encrypt frames before sending.
// Chrome's legacy API requires the RTCPeerConnection to be created with
// { encodedInsertableStreams: true }; newer browsers expose the spec's
// RTCRtpScriptTransform instead (feature-detect both).
const sender = peerConnection.getSenders()[0];
const { readable, writable } = sender.createEncodedStreams();
const transformStream = new TransformStream({
  transform(encodedFrame, controller) {
    // Encrypt payload with AES-GCM (encryptFrame is app-defined)
    encodedFrame.data = encryptFrame(encodedFrame.data, sharedKey);
    controller.enqueue(encodedFrame);
  }
});
readable.pipeThrough(transformStream).pipeTo(writable);
Encoded Transform Browser Support (04/2026)
Chrome/Edge: the legacy createEncodedStreams() API has shipped since Chrome 86. Safari (15.4+) and Firefox (117+) implement the spec's RTCRtpScriptTransform interface instead. Feature-detect both code paths in production, and fall back to standard hop-by-hop DTLS-SRTP where neither is available.
8. Production Deployment — Reference Architecture
Below is a production architecture for a video conferencing system supporting 10,000+ concurrent users:
graph TD
Client["Client App
Vue.js + LiveKit SDK"] -->|WebSocket| LB["Load Balancer
Geographic DNS"]
LB --> SIG["Signaling Cluster
3+ nodes"]
SIG -->|Redis Pub/Sub| SIG
SIG --> SFU1["SFU Node 1
Region: Asia"]
SIG --> SFU2["SFU Node 2
Region: EU"]
SIG --> SFU3["SFU Node 3
Region: US"]
SFU1 --> TURN1["TURN Server
coturn — Asia"]
SFU2 --> TURN2["TURN Server
coturn — EU"]
SFU3 --> TURN3["TURN Server
coturn — US"]
SFU1 --> REC["Recording Service
Egress to S3/R2"]
SFU1 --> MON["Monitoring
Prometheus + Grafana"]
style Client fill:#e94560,stroke:#fff,color:#fff
style SIG fill:#2c3e50,stroke:#fff,color:#fff
style SFU1 fill:#16213e,stroke:#fff,color:#fff
style SFU2 fill:#16213e,stroke:#fff,color:#fff
style SFU3 fill:#16213e,stroke:#fff,color:#fff
style MON fill:#4CAF50,stroke:#fff,color:#fff
Multi-region production architecture for WebRTC — SFU cluster with geographic routing
8.1 Deployment Checklist
- TURN server — Deploy coturn in each region, configure TLS (port 443) to bypass corporate firewalls
- Bandwidth planning — Each 720p video participant: ~1.5 Mbps up + 1.5×(N-1) Mbps down (SFU mode). With simulcast: downstream drops 40-60%
- Monitoring — Track metrics: ICE connection time, packet loss rate, bitrate adaptation, TURN usage percentage
- Fallback strategy — When an SFU node is overloaded, redirect participants to another node. LiveKit has built-in load balancing
- Recording — Use composite recording (MCU-style) for archives, or individual track recording for post-processing
8.2 Code Sample — Signaling Server with ASP.NET Core
// SignalingHub.cs — WebRTC signaling via SignalR
public class SignalingHub : Hub
{
public async Task JoinRoom(string roomId)
{
await Groups.AddToGroupAsync(Context.ConnectionId, roomId);
await Clients.OthersInGroup(roomId).SendAsync("user-joined", Context.ConnectionId);
}
public async Task SendOffer(string targetId, string sdp)
{
await Clients.Client(targetId).SendAsync("offer", Context.ConnectionId, sdp);
}
public async Task SendAnswer(string targetId, string sdp)
{
await Clients.Client(targetId).SendAsync("answer", Context.ConnectionId, sdp);
}
public async Task SendIceCandidate(string targetId, string candidate)
{
await Clients.Client(targetId).SendAsync("ice-candidate", Context.ConnectionId, candidate);
}
}
// Client — Vue.js + WebRTC
const pc = new RTCPeerConnection({
iceServers: [
{ urls: 'stun:stun.l.google.com:19302' },
{ urls: 'turn:turn.example.com:443', username: 'user', credential: 'pass' }
]
});
const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
stream.getTracks().forEach(track => pc.addTrack(track, stream));
pc.onicecandidate = ({ candidate }) => {
if (candidate) signalR.invoke('SendIceCandidate', targetId, JSON.stringify(candidate));
};
// Create offer and send via signaling
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
signalR.invoke('SendOffer', targetId, JSON.stringify(offer));
9. WebRTC Performance Optimization
9.1 Simulcast & SVC
Send multiple quality layers simultaneously, letting the SFU select the right layer for each receiver. Configuring simulcast in WebRTC:
// Simulcast layers must be declared when the transceiver is created —
// setParameters() cannot add or remove encodings afterwards
const transceiver = pc.addTransceiver(videoTrack, {
  direction: 'sendonly',
  sendEncodings: [
    { rid: 'low', maxBitrate: 200000, scaleResolutionDownBy: 4 },
    { rid: 'mid', maxBitrate: 700000, scaleResolutionDownBy: 2 },
    { rid: 'high', maxBitrate: 2500000 }
  ]
});
// Later adjustments to existing layers (e.g. toggling one off) go through
// transceiver.sender.setParameters()
9.2 Bandwidth Estimation
Use RTCPeerConnection.getStats() for real-time monitoring:
setInterval(async () => {
const stats = await pc.getStats();
stats.forEach(report => {
if (report.type === 'outbound-rtp' && report.kind === 'video') {
console.log(`Bytes sent (cumulative): ${report.bytesSent}, Frames: ${report.framesEncoded}`);
console.log(`Quality limit: ${report.qualityLimitationReason}`);
}
});
}, 2000);
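Note that `bytesSent` in outbound-rtp stats is cumulative, so the actual bitrate has to be derived from the delta between two samples. A small helper (sample objects here are stand-ins for two successive getStats() reports):

```javascript
// Derive bitrate in kbps from two successive outbound-rtp samples.
// WebRTC stats timestamps are in milliseconds.
function bitrateKbps(prevSample, currSample) {
  const deltaBytes = currSample.bytesSent - prevSample.bytesSent;
  const deltaSec = (currSample.timestamp - prevSample.timestamp) / 1000;
  return Math.round((deltaBytes * 8) / deltaSec / 1000);
}

// 375 kB sent over 2 seconds -> 1500 kbps
const prev = { bytesSent: 1_000_000, timestamp: 10_000 };
const curr = { bytesSent: 1_375_000, timestamp: 12_000 };
console.log(bitrateKbps(prev, curr)); // 1500
```

Feed this per-track bitrate, together with `qualityLimitationReason`, into your dashboards to see whether GCC is throttling for bandwidth or the client is CPU-bound.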
9.3 Network Quality Indicator
Measure packet loss and round-trip time to display a quality indicator for users:
- Good: packet loss < 1%, RTT < 150ms
- Fair: packet loss 1-5%, RTT 150-300ms
- Poor: packet loss > 5%, RTT > 300ms → automatically reduce resolution
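The thresholds above translate directly into a classifier. Combining the two dimensions with "worst one wins" is a design choice here, not a standard:

```javascript
// Map measured packet loss and round-trip time to the quality buckets above.
function networkQuality({ packetLossPct, rttMs }) {
  if (packetLossPct > 5 || rttMs > 300) return 'poor';   // trigger resolution downgrade
  if (packetLossPct >= 1 || rttMs >= 150) return 'fair';
  return 'good';
}

console.log(networkQuality({ packetLossPct: 0.5, rttMs: 80 }));  // 'good'
console.log(networkQuality({ packetLossPct: 3,   rttMs: 200 })); // 'fair'
console.log(networkQuality({ packetLossPct: 8,   rttMs: 400 })); // 'poor'
```

In practice, smooth the inputs over a few seconds before classifying, so a single lost burst doesn't flash the "poor connection" badge at users.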
10. Real-World Use Cases Beyond Video Calls
WebRTC isn't just for video conferencing:
- Screen Sharing — The getDisplayMedia() API captures the screen, a window or a specific tab
- P2P File Transfer — RTCDataChannel enables direct file transfer between browsers, bypassing servers. Speed can reach 100+ Mbps on LAN
- Cloud Gaming — Stream gameplay from server, receive input from client via DataChannel. Google Stadia (now shut down) and Xbox Cloud Gaming both used WebRTC
- IoT & Robotics — Control robots/drones via DataChannel, receive video feed through media streams
- Live Streaming — WHIP (WebRTC HTTP Ingestion Protocol) enables publishing live streams to CDN via WebRTC, replacing traditional RTMP
WHIP & WHEP — New Standards for Live Streaming
WHIP (WebRTC HTTP Ingestion Protocol) standardizes how to publish streams to servers. WHEP (WebRTC HTTP Egress Protocol) standardizes how viewers subscribe to streams. Both are already supported by Cloudflare Stream, AWS IVS and many CDNs. This is the future replacement for RTMP in live streaming with sub-second latency.
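What makes WHIP attractive is its simplicity: publishing is a single HTTP POST of the local SDP offer with Content-Type application/sdp; the response body carries the answer, and the Location header identifies the session (DELETE it to stop publishing). A sketch that builds such a request — the endpoint and token are hypothetical placeholders:

```javascript
// Build the single HTTP request that WHIP publishing consists of.
// The caller passes this to fetch(); the 201 response body is the SDP answer.
function buildWhipRequest(endpoint, sdpOffer, bearerToken) {
  return {
    url: endpoint,
    method: 'POST',
    headers: {
      'Content-Type': 'application/sdp',
      ...(bearerToken ? { Authorization: `Bearer ${bearerToken}` } : {}),
    },
    body: sdpOffer,
  };
}

const req = buildWhipRequest('https://cdn.example.com/whip/stream-1', 'v=0...', 'secret');
console.log(req.method, req.headers['Content-Type']); // POST application/sdp
```

Compare this with RTMP, which needs a stateful TCP handshake and a custom protocol stack — WHIP rides on plain HTTPS, so it works through proxies and existing CDN ingest infrastructure.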
Conclusion
WebRTC has matured from a Google experiment into the standard platform for all real-time communication applications on the web. With SFU architecture, you can build video conferencing systems supporting thousands of concurrent users. The Encoded Transform API enables true end-to-end encryption. And with WHIP/WHEP, WebRTC is expanding into live streaming territory.
Key takeaways for deployment: start with LiveKit if you need to ship fast, mediasoup if you need deep customization, and Janus if you need legacy VoIP integration. Always deploy TURN servers in each region and monitor ICE connection metrics to ensure the best user experience.
References:
WebRTC Official — webrtc.org
MDN Web Docs — WebRTC API
W3C WebRTC Specification
W3C WebRTC Encoded Transform
LiveKit Documentation
mediasoup Documentation
Janus Gateway Documentation
BlogGeek.me — WebRTC Open Source Media Servers
Disclaimer: The opinions expressed in this blog are solely my own and do not reflect the views or opinions of my employer or any affiliated organizations. The content provided is for informational and educational purposes only and should not be taken as professional advice. While I strive to provide accurate and up-to-date information, I make no warranties or guarantees about the completeness, reliability, or accuracy of the content. Readers are encouraged to verify the information and seek independent advice as needed. I disclaim any liability for decisions or actions taken based on the content of this blog.