What Is WebRTC
WebRTC is the browser-native framework for real-time audio, video, and data. This guide explains what WebRTC actually means in production: precise latency thresholds, when to choose it, architecture budgets, configuration targets, three production recipes, common pitfalls, and a rollout checklist you can apply to video calls and webinars.
What it means
WebRTC is not a single protocol but a stack you get in browsers and many native SDKs: getUserMedia to capture, RTCPeerConnection for RTP transport and negotiation, RTCDataChannel for arbitrary data, and ICE (STUN/TURN) for NAT traversal. Media is encrypted (DTLS + SRTP) and delivered over RTP/RTCP. Typical codecs are Opus for audio and VP8/VP9 or H.264 for video. WebRTC includes congestion control, retransmission (NACK/RTX), and forward error correction (FEC) for packet repair.
Practical thresholds you should use when planning systems that rely on WebRTC:
- Interactive (real-time): end-to-end latency < 500 ms — expectation for 1:1 calls, small meetings where lip-sync and instant feedback matter.
- Near-real-time: 500 ms – 2 s — acceptable for moderated Q&A, some webinar scenarios, or where sub-second is nice but not mandatory.
- Live broadcast: > 2 s — use CDNs/HLS/LL-HLS when you need thousands of simultaneous viewers and can tolerate seconds of latency.
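These thresholds can be encoded as a small helper for alerting or routing decisions; a minimal sketch using the cutoffs above (the function name and return labels are our own):

```typescript
// Classify measured end-to-end latency against the planning thresholds above.
type LatencyClass = "interactive" | "near-real-time" | "live-broadcast";

function classifyLatency(endToEndMs: number): LatencyClass {
  if (endToEndMs < 500) return "interactive";      // 1:1 calls, small meetings
  if (endToEndMs <= 2000) return "near-real-time"; // moderated Q&A, some webinars
  return "live-broadcast";                         // CDN/HLS/LL-HLS territory
}
```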
Typical observed end-to-end (capture → render) values in production:
- Good LAN/wired: 50–150 ms one-way (100–300 ms round-trip).
- Good mobile (4G or Wi‑Fi): 100–300 ms one-way.
- Poor mobile/3G/high-loss: 300–1000+ ms one-way unless you adjust bitrate and buffers.
Decision guide
Use this quick decision guide to pick WebRTC vs other streaming options and to choose a topology.
- If the primary requirement is sub-500 ms interactive audio/video (calls, collaboration, gaming): use WebRTC.
- If you need very large audiences (hundreds of thousands) where two-way interactivity is not required for all viewers: ingest with WebRTC (for hosts/panelists), then restream to CDN as HLS/LL-HLS for the audience.
- For small groups (up to ~4 participants) a mesh (peer-to-peer) architecture may be acceptable. For larger groups choose an SFU (Selective Forwarding Unit).
- Audience-size thresholds (rule-of-thumb):
- Mesh: up to 3–4 participants (each peer uploads N−1 streams).
- SFU: up to tens or low hundreds of active participants depending on server capacity and whether you use simulcast/SVC.
- SFU + CDN/transcoding: for thousands to millions of passive viewers (host uses WebRTC ingest; audience served via HLS/LL-HLS or CDN).
- If firewall/NAT traversal is a concern, plan TURN capacity — WebRTC will fall back to TURN (relayed media) if direct UDP/TCP paths fail.
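The audience-size rules of thumb above can be expressed as a topology selector; a minimal sketch (the 1,000 passive-viewer cutoff for switching to CDN delivery is our own assumption, and the names are illustrative):

```typescript
// Pick a topology from participant counts, following the rules of thumb above.
function chooseTopology(
  activeParticipants: number,
  passiveViewers: number
): "mesh" | "sfu" | "sfu+cdn" {
  // Large passive audience: hosts ingest via WebRTC, viewers get HLS/LL-HLS.
  if (passiveViewers > 1000) return "sfu+cdn"; // cutoff is an assumption
  if (activeParticipants <= 4) return "mesh";  // each peer uploads N-1 streams
  return "sfu";                                // publish once, forward selectively
}
```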
Latency budget / architecture budget
Design latency budgets per role and topology so you can reason about where to optimize.
Breakdown of typical components (milliseconds):
- Capture (camera/microphone + OS): 10–30 ms
- Encode (hardware encoder): 10–40 ms; (software encoder): 30–120 ms
- Packetization, encryption, RTCP timing: 5–20 ms
- Network (one-way): depends on RTT/peering — 20–200+ ms (but plan for typical 30–100 ms on modern broadband)
- Jitter buffer in receiver: adaptive, typically 50–200 ms
- Decode + render: 10–40 ms
Examples of architecture budgets:
- Ultra-low WebRTC call target: 150 ms end-to-end
- Capture 20 ms, Encode 30 ms, Network 50 ms one-way, Decode+Render 20 ms, Jitter buffer 30 ms
- Practical production target: 250–350 ms end-to-end (more conservative for mobile and NATs)
- Capture 20 ms, Encode 50 ms, Network 100 ms, Jitter buffer 50 ms, Decode 30 ms
- Webinar using WebRTC ingest + CDN (LL-HLS): 1–3 s depending on part size and CDN propagation
When you target a latency number, budget each component and measure. If network jitter or packet loss dominates, increase retransmission and FEC budget or consider lowering resolution/bitrate.
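A budget is easiest to audit when each component is a named field; a sketch encoding the two call targets above so each allocation can be checked against measurements (the field names are our own):

```typescript
// Per-component latency budget, mirroring the breakdown in this section.
interface LatencyBudget {
  captureMs: number;
  encodeMs: number;
  networkMs: number;      // one-way
  jitterBufferMs: number;
  decodeRenderMs: number;
}

function totalLatency(b: LatencyBudget): number {
  return b.captureMs + b.encodeMs + b.networkMs + b.jitterBufferMs + b.decodeRenderMs;
}

// Ultra-low WebRTC call target (150 ms end-to-end).
const ultraLowCall: LatencyBudget = {
  captureMs: 20, encodeMs: 30, networkMs: 50, jitterBufferMs: 30, decodeRenderMs: 20,
};

// Practical production target (more conservative for mobile and NATs).
const practicalCall: LatencyBudget = {
  captureMs: 20, encodeMs: 50, networkMs: 100, jitterBufferMs: 50, decodeRenderMs: 30,
};
```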
Practical recipes
The recipes below are production-ready blueprints. Each recipe lists components, configuration targets, and quick reasons why this shape works.
Recipe A — One-to-one browser call (P2P)
- When to use: 1:1 calls, telepresence, support chats where low latency and minimal infra are priorities.
- Components:
- Browser A & Browser B with RTCPeerConnection
- STUN server list + TURN server(s) for fallback
- Signaling server (WebSocket or HTTPS) for SDP and ICE exchange
- Configuration targets:
- getUserMedia: video 1280x720 @30fps if device allows; else scale down to 640x360
- video maxBitrate: 1.5–3 Mbps at 720p, 300–700 kbps at 360p
- audio: Opus, 24–48 kbps for speech
- keyframe interval: 1 second (force keyframe on resolution/quality change)
- MTU / packet payload: target ~1,200 bytes to avoid fragmentation over the internet
- Operational notes:
- Provide at least two geographically distributed TURN servers to avoid regional failures.
- Make ICE restarts cheap and fast for network changes (early detection of candidate-pair failures).
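The Recipe A targets can be kept in one config object and translated into getUserMedia constraints; a minimal sketch (the object shape and function name are our own; the `width`/`height`/`frameRate` keys follow the standard MediaStreamConstraints format):

```typescript
// Recipe A targets collected in one place (values from this guide; shape is ours).
const recipeAConfig = {
  video: { idealWidth: 1280, idealHeight: 720, idealFps: 30, maxBitrateKbps: 3000 },
  fallbackVideo: { width: 640, height: 360, maxBitrateKbps: 700 },
  audio: { codec: "opus", bitrateKbps: 32 },
  keyframeIntervalSec: 1,
  maxRtpPayloadBytes: 1200, // avoid fragmentation over the internet
};

// Build getUserMedia-style constraints from the config. In a browser you would
// pass the result to navigator.mediaDevices.getUserMedia(...); the bitrate cap
// is applied separately via RTCRtpSender.setParameters (encodings[0].maxBitrate).
function buildGumConstraints(cfg: typeof recipeAConfig) {
  return {
    audio: true,
    video: {
      width: { ideal: cfg.video.idealWidth },
      height: { ideal: cfg.video.idealHeight },
      frameRate: { ideal: cfg.video.idealFps },
    },
  };
}
```

If the device cannot deliver 720p, re-request with the `fallbackVideo` dimensions, per the scale-down target above.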
Recipe B — Small-group meetings (SFU)
- When to use: 3–50 participants where each person needs to see multiple video streams with low latency.
- Components:
- Clients publish single uplink to an SFU (no mixing), SFU forwards streams selectively.
- Signaling + server-side logic to select which tracks/encodings each subscriber receives.
- Optional transcoders for recording or legacy codecs.
- Configuration targets:
- Enable simulcast with 3 layers (low/med/high) or SVC if supported by clients.
- Layer bitrates (example): low 150–250 kbps (180p), med 400–800 kbps (360p), high 1.5–3 Mbps (720p).
- Use server-side bandwidth estimation and active layer switching on the SFU.
- Ensure SFU handles RTCP feedback (NACK/PLI) and pacing to prevent bursts.
- Operational notes:
- Plan SFU CPU: if forwarding decoded frames (MCU-like), CPU rises quickly. Prefer selective forwarding without decoding.
- Design for 95th-percentile load: factor peak concurrent streams, and allow headroom (20–30%).
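On the SFU side, active layer switching per subscriber reduces to picking the largest simulcast layer that fits the bandwidth estimate; a sketch using the example layer bitrates above (the 15% headroom factor and all names are our own):

```typescript
// Simulcast ladder matching the example layer bitrates in Recipe B.
interface SimulcastLayer {
  rid: string;
  maxKbps: number;
  height: number;
}

const ladder: SimulcastLayer[] = [
  { rid: "low", maxKbps: 200, height: 180 },
  { rid: "med", maxKbps: 600, height: 360 },
  { rid: "high", maxKbps: 2500, height: 720 },
];

// Pick the highest layer that fits the subscriber's bandwidth estimate,
// keeping ~15% headroom (assumption); fall back to the lowest layer.
function pickLayer(estimatedKbps: number, layers: SimulcastLayer[]): SimulcastLayer {
  const budget = estimatedKbps * 0.85;
  const fit: SimulcastLayer | undefined = layers
    .filter((l) => l.maxKbps <= budget)
    .sort((a, b) => b.maxKbps - a.maxKbps)[0];
  return fit ?? layers.reduce((a, b) => (a.maxKbps < b.maxKbps ? a : b));
}
```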
Recipe C — Interactive webinar (panelists via WebRTC, audience via CDN)
- When to use: a small set of interactive hosts and large passive audiences (hundreds to millions).
- Components:
- Hosts publish via WebRTC to an SFU/ingest.
- SFU or dedicated transcoder converts host streams to H.264/AAC and produces LL-HLS (CMAF parts) or HLS for the CDN.
- A global CDN serves viewers. Use LL-HLS with part sizes of 200–600 ms for 1–3 s viewer latency.
- Optional: CDN origin at the SFU for back-to-back scaling.
- Configuration targets:
- Ingest: WebRTC with 720p @ 2–4 Mbps for panelists.
- Transcoding: output 3 CDN renditions, e.g., 360p@700 kbps, 720p@2.5 Mbps, 1080p@5 Mbps.
- LL-HLS part size: 200–400 ms to balance player startup vs CDN overhead; aim for 1–3 s viewer latency end-to-end.
- Operational notes:
- TURN relaying for hosts increases operational cost; prefer TURN only when needed.
- Record hosts at the SFU before transcoding to preserve maximum fidelity for VOD.
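The transcoder's rendition ladder can be declared as data and used to size origin egress per viewer session; a sketch based on Recipe C's renditions (the 720p bitrate is assumed at 2.5 Mbps, and the 128 kbps audio default is our own):

```typescript
// CDN rendition ladder for the webinar recipe (720p bitrate is an assumption).
const webinarRenditions = [
  { name: "360p", height: 360, videoKbps: 700 },
  { name: "720p", height: 720, videoKbps: 2500 },
  { name: "1080p", height: 1080, videoKbps: 5000 },
];

// Total bitrate the origin must push to the CDN for one full rendition set
// (video plus one audio track per rendition).
function totalOriginEgressKbps(
  renditions: { videoKbps: number }[],
  audioKbps = 128
): number {
  return renditions.reduce((sum, r) => sum + r.videoKbps + audioKbps, 0);
}
```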
Practical configuration targets
Use these measurable defaults as starting points in production. Tune per use-case and monitor.
- Audio
- Codec: Opus
- Bitrate (speech): 16–48 kbps; (music/high-fidelity): 64–128 kbps
- Sample rate: 48 kHz
- Frame size: 20 ms typical
- Video
- Codec priorities: VP8 or H.264 for wide compatibility; VP9 where supported; AV1 for experimental/high-compression if client support exists.
- Resolutions and bitrate (production targets):
- 360p (640x360) @ 15–30 fps: 300–700 kbps
- 480p (854x480) @ 30 fps: 500–1200 kbps
- 720p (1280x720) @ 30 fps: 1.5–3 Mbps
- 1080p (1920x1080) @ 30 fps: 3–6 Mbps
- Keyframe (IDR) interval: 1 second
- Packet payload target: <= 1200 bytes
- Enable NACK and RTX; enable FEC in lossy networks if supported (overhead ~10–20%).
- Transport & network
- MTU: target 1200 bytes to avoid IP fragmentation through the internet
- Jitter buffer: adaptive, 50–200 ms depending on target latency; increase on lossy networks
- Congestion control: use built-in WebRTC BWE/TWCC behavior and monitor bitrate estimates
- Packet loss tolerance: target <1% for high quality; allow up to 3–5% with FEC/Retransmit
- TURN capacity planning
- Estimate outbound TURN bandwidth = sum of client uplinks that are relayed. Example: 50 concurrent hosts at 2 Mbps each → 100 Mbps egress; allow 20–30% headroom.
- Use geographically distributed TURN servers and autoscaling for peaks.
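The TURN capacity rule above is simple enough to encode directly; a sketch reproducing the worked example (the function name is our own):

```typescript
// Estimate TURN egress: sum of relayed client uplinks, plus headroom for peaks.
function estimateTurnEgressMbps(
  relayedClients: number,
  avgUplinkMbps: number,
  headroom = 0.25 // 20-30% headroom per the guideline above
): number {
  return relayedClients * avgUplinkMbps * (1 + headroom);
}

// Example from this section: 50 concurrent hosts at 2 Mbps each is 100 Mbps
// of raw egress; with 25% headroom, provision 125 Mbps.
const provisionedMbps = estimateTurnEgressMbps(50, 2);
```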
Limitations and trade-offs
WebRTC gives low latency and wide browser support but has trade-offs you must account for:
- Scalability vs latency: pure WebRTC scales poorly for very large audiences without a server-side conversion to CDN formats.
- TURN costs: when clients cannot NAT-traverse, relayed media doubles bandwidth cost and adds latency.
- Codec support variability: mobile devices and Safari may prefer H.264 hardware; Chrome/Firefox prefer VP8/VP9. Negotiate conservatively.
- Firewall restrictions: some enterprise/firewalled environments allow only TCP/443 — expect higher latency and overhead when WebRTC falls back to TCP or TLS relays.
- Battery/CPU: mobile software encoding at high resolution is CPU intensive — consider 720p as the upper practical limit for many mobile uplinks.
- No delivery guarantees: WebRTC uses best-effort UDP; it adapts to loss but cannot guarantee perfectly ordered, lossless delivery like TCP.
Common mistakes and fixes
- Mistake: Not providing TURN. Symptom: calls fail behind some NATs/firewalls. Fix: provide at least one TURN per region, monitor TURN usage, autoscale.
- Mistake: No simulcast or SVC for SFU. Symptom: SFU has to transcode or send identical streams, wasting bandwidth. Fix: enable simulcast (3 layers) and have the SFU pick the right layer per subscriber.
- Mistake: MTU too large causing fragmentation and packet loss. Fix: limit RTP payload to ~1200 bytes.
- Mistake: Keyframe interval too long (e.g., 5s). Symptom: slow recovery after packet loss and long stalls. Fix: use 1s keyframe interval and request keyframes on significant packet loss or resolution change.
- Mistake: Ignoring getStats. Symptom: blind to real network conditions. Fix: poll RTCPeerConnection.getStats() and store metrics: bytesSent/Received, packetsLost, jitter, currentRoundTripTime.
- Mistake: No graceful degradation. Symptom: frozen video under congestion. Fix: implement encoder adaptation (reduce resolution/fps, lower bitrate) and server-side layer switching.
- Mistake: Recording from edge clients. Symptom: heterogeneous quality and missing tracks. Fix: record at the server side (SFU/ingest) to guarantee consistent, full-quality recordings.
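The getStats mistake above is worth automating: the raw numbers come from RTCPeerConnection.getStats(), and once parsed into a plain object a health check can drive alerts. The loss-percentage and RTT thresholds below follow this guide; the jitter threshold, the simplified loss ratio, and all names are our own assumptions:

```typescript
// Metrics parsed from RTCPeerConnection.getStats() reports (field names are ours).
interface CallStats {
  packetsLost: number;
  packetsTotal: number; // simplified: lost + delivered over the same window
  jitterMs: number;
  rttMs: number;        // currentRoundTripTime, converted to ms
}

function healthIssues(s: CallStats): string[] {
  const issues: string[] = [];
  const lossPct = s.packetsTotal > 0 ? (100 * s.packetsLost) / s.packetsTotal : 0;
  if (lossPct > 1) issues.push("packet-loss");      // guide targets <1% for high quality
  if (s.rttMs > 300) issues.push("high-rtt");       // checklist alerts at RTT > 300 ms
  if (s.jitterMs > 100) issues.push("high-jitter"); // assumed threshold
  return issues;
}
```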
Rollout checklist
Before you go live, run this checklist in staging and monitor continuously after launch.
- Infrastructure
- Deploy STUN/TURN in multiple regions. Verify connectivity from target networks (corporate, carrier-grade NATs).
- Capacity plan SFU/ingest/transcoder with 20–30% headroom for peak concurrency.
- Quality & compatibility testing
- Test on Chrome, Firefox, Edge, Safari (desktop + iOS/Android web views where applicable).
- Test on representative networks: wired, Wi‑Fi, LTE, congested mobile with packet loss/emulated jitter.
- Metrics & monitoring
- Collect RTCP metrics (bitrate estimates, packet loss, RTT, jitter) and application metrics (join time, reconnections, CPU, memory).
- Set alerts for packet loss > 2–5%, RTT > 300 ms, and high TURN egress cost.
- Fallbacks & UX
- Implement fallback to HLS/LL-HLS for large audiences and poor networks.
- Show clear user messages when bandwidth is low and suggest switching to audio-only.
- Security & compliance
- Ensure DTLS-SRTP is used and keys are rotated as needed.
- If recording, implement consent flows and secure storage.
Example architectures
Concrete examples with components and expected latency ranges.
1) P2P one-to-one
- Clients <--> direct ICE connection (STUN + TURN fallback)
- Signaling server: minimal (SDP exchange)
- Expected latency: 100–300 ms end-to-end on good networks
- When to choose: low infra cost, very low participant counts
2) SFU-based meeting
- Clients → SFU (publish once) → SFU forwards selected encodings → Clients
- Optional recording at SFU and optional transcode for legacy viewers
- Expected latency: 150–400 ms depending on server placement and encoder latency
- When to choose: efficient scaling to dozens/hundreds of active participants
3) Webinar — WebRTC ingest + CDN
- Hosts publish via WebRTC → SFU/ingest → Transcoder → CDN (LL-HLS or HLS)
- Audience consumes via CDN HTML5 players
- Expected latency: 1–3 s with LL-HLS (dependent on part size and CDN)
- When to choose: interactive hosts + massive audience
4) Global low-latency distribution
- Edge SFUs in regions; peered distribution between SFUs for lower network hops; origin for recordings and VOD
- Expected latency: 200–500 ms for regional calls, 500–1000+ ms across hemispheres depending on routing
- When to choose: enterprise/telepresence with strict regional performance requirements
For practical implementation you can map these architectures to products and services such as a managed Video API, SFU hosting and TURN services. See our product pages for managed Video APIs, solutions and pricing for production deployments: Products – Video API, Products, Pricing.
For engineering reference and step-by-step setup see the docs: Docs – Getting started, Docs – WebRTC, Docs – Latency optimization.
Troubleshooting quick wins
If you see quality problems in production, try these quick checks in order. Each item is actionable and targets the most common causes.
- Check ICE connectivity and TURN usage
- Use the browser's getStats() and check candidatePair states. If many sessions are using TURN, verify TURN capacity and region placement.
- Inspect RTCP stats
- Look at packetsLost, jitter, and currentRoundTripTime. If packetsLost > 1–2%, reduce bitrate or enable FEC/NACK policies.
- Force a keyframe
- If video is stalled, request an IDR frame (via RTCP PLI) to recover quickly.
- Lower resolution or frame rate
- Drop to 640x360 or reduce fps to 15–20 if CPU or network are constrained.
- Check encoder fallback and CPU
- On mobile, prefer hardware encoders and cap resolution to reduce CPU usage and battery drain.
- Review signaling and reconnection behavior
- Ensure ICE restarts and quick reconnections are implemented to handle network transitions (Wi‑Fi → Cellular).
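The "lower resolution or frame rate" quick win can be made systematic with a degradation ladder; a sketch using the values from this section (the rungs and names are our own). In a browser, applying a rung would go through MediaStreamTrack.applyConstraints or RTCRtpSender.setParameters:

```typescript
// Degradation ladder: index 0 is full quality; higher indexes trade quality
// for stability under CPU or network constraints.
const degradationLadder = [
  { width: 1280, height: 720, fps: 30 },
  { width: 640, height: 360, fps: 30 }, // first: drop resolution to 640x360
  { width: 640, height: 360, fps: 15 }, // then: reduce frame rate to 15
];

// Move one rung down (toward lower quality), clamped at the bottom.
function stepDown(currentRung: number): number {
  return Math.min(currentRung + 1, degradationLadder.length - 1);
}
```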
Next step
If you want to try WebRTC at production scale quickly, use a managed Video API that includes SFU hosting, TURN servers and transcoding for CDN output. Review product capabilities and pricing, and try a small PoC with realistic network conditions:
- Start with our managed Video API: https://callaba.io/products/video-api
- Follow the engineering guides: Getting started and WebRTC guide
- Plan latency and sizing using the latency optimization guide: Latency optimization
Need help mapping an architecture to your product requirements? Contact our engineering team or request a demo on the pricing page to discuss capacity planning and a migration plan from proof-of-concept to global scale: https://callaba.io/pricing.
See also Why Is HLS Better Than MP4.

