
Stream Live

Mar 09, 2026

This guide is for engineers and operations teams who need to stream live reliably with low latency. It focuses on production‑grade, measurable techniques: when to use SRT, WebRTC or chunked CMAF (LL‑HLS/DASH), exact configuration targets (GOP, part sizes, buffers and bitrates) and step‑by‑step recipes you can run in staging and push to production. For this workflow, teams usually combine Player & embed, Ingest & route, and Video platform API. Before a full production rollout, run a test and QA pass with generated test videos, a streaming quality check and a video preview for end‑to‑end validation.

What it means (definitions and thresholds)

"Stream live" is ambiguous unless we define latency targets and expectations. Below are practical latency classes used in production and which protocols typically meet them.

  • Sub‑second / interactive — end‑to‑end 150–500 ms. Use case: video calls, live auctions, two‑way remote control. Typical tech: WebRTC (peer connection / SFU). Player buffer: 100–300 ms.
  • Near real‑time — 500 ms–3 s. Use case: contribution from field cameras to cloud, remote production. Typical tech: SRT for contribution, WebRTC for small scales. Player buffer: 300 ms–1.5 s.
  • Low‑latency distribution — 2–8 s. Use case: live sports, streaming to thousands with limited interactivity. Typical tech: chunked CMAF / LL‑HLS or low‑latency DASH. Player buffer: 2–6 s.
  • Standard HLS/DASH — 8+ s. Use case: legacy streaming and maximum compatibility, wide CDN caching. Player buffer: 8–30 s.

Important protocol notes: SRT is a reliable UDP‑based contribution protocol that uses retransmission and a receiver buffer configured by the latency parameter (value in milliseconds). Typical SRT latency settings for production live contribution range from 200 ms (very stable links) up to 2000 ms (unreliable public internet). CMAF chunking uses fMP4 parts ("parts") with recommended part sizes of 200–500 ms for LL‑HLS.
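SRT deployment guidance commonly suggests starting with a latency of roughly 3–4× the measured RTT to the ingest host, then clamping to the 200–2000 ms production range above. A small shell sketch (the 4× multiplier and the RTT value are illustrative assumptions, not fixed rules):

```shell
# Rule-of-thumb SRT latency from measured RTT, clamped to the
# 200-2000 ms production range discussed above.
rtt_ms=120                          # measured round-trip time to ingest (example)
latency_ms=$((rtt_ms * 4))          # common starting point: ~4x RTT
[ "$latency_ms" -lt 200 ] && latency_ms=200
[ "$latency_ms" -gt 2000 ] && latency_ms=2000
echo "srt://ingest.example.com:9000?mode=caller&latency=${latency_ms}"
```

Re-measure after any link change: a rule of thumb is only a starting point, and lossy links may need a larger buffer than the multiplier suggests.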

Decision guide

Choose the right tool for the right job — each choice implies different costs and architecture changes.

  • Use WebRTC when you need sub‑second, two‑way interactivity. Architect as a browser/native client → SFU → clients. Expect higher CPU on server (SFU) per stream and limited horizontal scaling without an SFU fleet or selective forwarding strategies.
  • Use SRT when you need robust field contribution over the public internet and can tolerate 0.3–3 s of latency. SRT gives packet retransmission and jitter handling. It is the right choice for remote cameras, OB vans and encoders that cannot use WebRTC directly.
  • Use chunked CMAF / LL‑HLS for distribution to large audiences where clients are mainly players or browsers and you need consistent multi‑CDN caching. Target 2–6 s delivery latency with careful packaging and CDN support.
  • Use RTMP only for legacy encoders. It is widely supported for ingest but lacks built‑in reliability and is being phased out in favor of SRT or WebRTC for contribution.

Operational mapping to product pages: use /products/video-api to integrate custom ingest or player control; use /products/multi-streaming to send the same live output to social platforms; store or process the live output into VOD via /products/video-on-demand. For self‑hosting plans see /self-hosted-streaming-solution and, if you use marketplace images, consider the appliance on AWS: https://aws.amazon.com/marketplace/pp/prodview-npubds4oydmku.

Latency budget / architecture budget

Always build a latency budget (architecture budget): break the end‑to‑end path into measurable blocks and allocate milliseconds to each. Below are three sample budgets you can use as templates — measure and adjust them in staging.

Target: Sub‑second interactive (300–500 ms)

  • Capture + encoder latency (hardware encoder): 50–150 ms
  • Network (client ↔ SFU): 30–150 ms
  • SFU processing (forwarding): 30–80 ms
  • Decode + render (client): 30–120 ms
  • Jitter buffer/resync: 20–50 ms
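As a quick sanity check, the five block allocations above can be summed in a few lines of shell (values copied from the budget). The best case lands at 160 ms and the worst case at 550 ms, so if the 500 ms ceiling must hold, not every block can sit at its maximum at once:

```shell
# Sum the per-block minimums and maximums of the sub-second budget (ms).
mins=(50 30 30 30 20)               # capture, network, SFU, decode, jitter
maxs=(150 150 80 120 50)
total_min=0; total_max=0
for v in "${mins[@]}"; do total_min=$((total_min + v)); done
for v in "${maxs[@]}"; do total_max=$((total_max + v)); done
echo "budget range: ${total_min}-${total_max} ms"
```

The same arithmetic applies to the SRT and CMAF budgets below: sum the blocks, compare against the target, and reallocate where the totals overshoot.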

Target: Near‑real‑time SRT contribution (1–3 s)

  • Encoder + capture: 100–300 ms
  • SRT network + retransmit buffer: 200–1500 ms (dependent on latency param)
  • Transcoding / composition: 100–400 ms (hardware accelerated with NVENC/QuickSync)
  • Packaging (CMAF parts): 50–300 ms
  • Player buffer: 300–800 ms

Target: Low‑latency distribution via chunked CMAF (3–6 s)

  • Ingest (SRT/RTMP): 200–600 ms
  • Transcode + ABR generation: 200–800 ms
  • Chunked CMAF segmentation (parts = 200–400 ms, segment = 1 s): 1–2 s (accumulated)
  • CDN edge propagation: 100–400 ms (depending on CDN and push/pull)
  • Player buffer & startup: 1–2 s

Measure and instrument each block (see rollout checklist for metrics). If any block exceeds its allocation, you must either increase the total budget or optimize that component (e.g., lower encoder latency, increase SRT latency, faster packaging).

Practical recipes

The following recipes are proven patterns. Each recipe includes minimal configuration targets and quick validation steps.

Recipe 1 — Field contribution: SRT -> Cloud packager -> LL‑HLS

  1. On field encoder (hardware or ffmpeg): send via SRT. Example ffmpeg send command (example values):
    ffmpeg -re -i camera.mkv \
      -c:v libx264 -preset veryfast -tune zerolatency -g 60 -keyint_min 60 \
      -b:v 3500k -maxrate 4200k -bufsize 7000k \
      -c:a aac -b:a 128k -ar 48000 \
      -f mpegts "srt://ingest.example.com:9000?pkt_size=1316&mode=caller&latency=800"
          
    Notes: use -tune zerolatency and set -g (GOP length, in frames) to 1–2 seconds of video (60 frames at 30 fps = 2 s). The latency query value is in milliseconds; 800 ms is a conservative default for public internet links.
  2. Receiver / ingest: run an SRT listener on the ingest host and hand the stream to a transcode node. Ensure the ingest SRT listener uses the same latency (or a compatible receiver buffer) and packet size (pkt_size=1316 is typical to avoid fragmentation behind some routers).
  3. Transcode / ABR ladder: create renditions with aligned GOP/keyframe intervals. Example target ladder:
    • 1080p30: 4500–6000 kbps (GOP 1–2 s)
    • 720p30: 2500–3500 kbps
    • 480p30: 800–1500 kbps
    • 360p: 400–700 kbps
    Use CBR or constrained VBR with a buffer size roughly 1.5× maxrate to reduce bitrate spikes.
  4. Packaging: write CMAF fMP4 segments with parts sized 200–400 ms and segment durations 1 s (or 2 s). Use a packager that supports chunked CMAF (Shaka Packager or similar). Target a playlist window of 3–6 s for LL‑HLS.
  5. CDN: choose a CDN with support for low‑latency CMAF or configure short TTLs and support for partial content. Test origin capacity under expected concurrent transcodes.
  6. Validation: measure contribution packet loss (SRT stats), end‑to‑end latency (capture timestamp -> player render), and rebuffer rate. If packet loss is high, increase SRT latency by 200–500 ms or improve network link.
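Step 2 of this recipe can be sketched with ffmpeg acting as the SRT listener and relaying the feed unchanged to a transcode node. Host names and ports are placeholders, and this assumes an ffmpeg build with libsrt:

```shell
# Minimal SRT ingest listener: receive the contribution feed and hand it to
# an internal transcode node without re-encoding (-c copy). The listener's
# latency and pkt_size match the sender's settings from step 1.
ffmpeg -i "srt://0.0.0.0:9000?mode=listener&latency=800&pkt_size=1316" \
  -c copy -f mpegts "udp://transcode.internal:5000?pkt_size=1316"
```

In production you would run this under a supervisor with restart-on-exit, and export SRT statistics for the validation step.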

Recipe 2 — Interactive: Browser/mobile WebRTC (SFU) with scalable fallback

  1. Use a browser/native WebRTC stack with an SFU for scaling. Select codecs: Opus for audio; VP8 or H.264 (baseline/main) for video depending on client hardware.
  2. Encoder settings: target 150–300 ms encoder latency; for software encoders use -tune zerolatency and low GOP (1 s) where possible. Hardware encoders (NVENC, QuickSync) reduce CPU and give predictable latency.
  3. SFU: forward media without re‑encoding when possible for lowest latency; if you must transcode for distribution, perform it asynchronously and provide a CMAF distribution path for large audiences.
  4. Fallback: for viewers who cannot use WebRTC, provide a transcoded LL‑HLS output with slightly higher latency (2–6 s) to scale to large audiences.
  5. Validation: run simulated clients at scale, monitor P90 latency, packet loss, and SFU CPU. WebRTC targets are 150–500 ms end‑to‑end under good networks.

Recipe 3 — Large audience: Chunked CMAF (LL‑HLS) with SRT ingress

  1. Ingest via SRT from contribution sources, transcode to ABR renditions with aligned keyframe intervals.
  2. Segment into fMP4 parts of 200–400 ms and produce a CMAF master playlist. Typical packaging: parts=250 ms, segment=1 s, playlist target window 3–6 s to keep latency 2–6 s.
  3. Push origin segments to CDN (push model) or rely on a CDN with low TTL and partial object caching for chunked CMAF. Avoid long origin fetch times.
  4. Implement player logic for LL‑HLS using a player with CMAF/LL support; tune player buffer to 2–4 s depending on CDN edge behavior.
  5. Validation: run scale tests to expected concurrency, test regional edge latencies, and monitor edge cache hit rates and origin request rates.
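The packaging numbers in step 2 translate into origin load roughly as follows. This is a back-of-the-envelope sketch: the four-rendition count is an assumption, and playlist refreshes and CDN request collapsing are ignored:

```shell
# Origin request-rate estimate for parts=250 ms, segment=1 s, 4 ABR renditions.
part_ms=250; segment_ms=1000; renditions=4
parts_per_segment=$((segment_ms / part_ms))
part_requests_per_sec=$((renditions * 1000 / segment_ms * parts_per_segment))
echo "${parts_per_segment} parts/segment, ~${part_requests_per_sec} part requests/s at origin"
```

Halving the part size doubles the per-rendition request rate, which is the trade-off behind the "more frequent origin requests" warning in the configuration targets below.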

Recipe 4 — Reliable multi‑destination distribution (socials, endpoints)

  1. From your origin or packager generate RTMP or SRT outputs to social destinations. Use /products/multi-streaming if you need productized multi‑destination routing and monitoring.
  2. Keep a single canonical low‑latency source and re‑encode per destination as required (social platforms often demand RTMP with specific bitrates).
  3. Monitor outgoing connection health and auto‑retry within a 500–2000 ms window for transient failures.
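The retry logic in step 3 can be sketched as a bounded-backoff loop. Here push() is a stub that fails twice and then succeeds, standing in for the real ffmpeg/RTMP push to a social destination:

```shell
# Bounded retry with backoff capped at the 2000 ms window edge.
attempts=0
max_attempts=4
delay_ms=500                       # start of the 500-2000 ms retry window
push() {
  attempts=$((attempts + 1))
  [ "$attempts" -ge 3 ]            # stub: two transient failures, then success
}
until push; do
  [ "$attempts" -ge "$max_attempts" ] && break
  sleep "$(awk "BEGIN{print $delay_ms/1000}")"
  delay_ms=$((delay_ms * 2))
  [ "$delay_ms" -gt 2000 ] && delay_ms=2000   # cap backoff at the window edge
done
echo "attempts=$attempts"
```

A real implementation should also distinguish transient network errors from permanent rejections (bad stream key, policy block), which should alert rather than retry.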

Practical configuration targets

Below are concrete, repeatable targets you can use when provisioning encoders, packagers and players. These are starting points — measure and adapt to your network conditions.

  • Encoder
    • GOP / keyframe interval: 1–2 seconds. Example: 30 fps → GOP = 30 (1 s) or 60 (2 s). Align across all ABR renditions.
    • B‑frames: 0 for sub‑second workflows; 0–2 for 1–6 s workflows. B‑frames increase decode lag.
    • Rate control: CBR or constrained VBR. Set bufsize ≈ 1.5× maxrate.
    • Encoder tune: zerolatency for x264/x265 in low latency settings.
    • Audio: AAC or Opus. For live music use 128–256 kbps; for speech 64–96 kbps.
  • SRT
    • latency: 200–1200 ms for stable networks; 1200–2000 ms for lossy public internet. If you see frequent retransmits, increase by 200–500 ms and re‑measure.
    • pkt_size: 1316 is a safe default to avoid IP fragmentation in many networks.
    • Mode: the field encoder connects as caller; the ingest host runs as listener.
  • Packaging (CMAF / LL‑HLS)
    • Part (chunk) size: 200–400 ms (250 ms is a common sweet spot).
    • Segment size: 1 s or 2 s (1 s lowers latency but increases origin request rate).
    • Playlist window: 3–6 s for low latency. For extreme low latency, reduce parts/playlist at the cost of more frequent origin requests.
  • ABR ladder (example)
    • 1080p30: 4.5–6 Mbps
    • 720p30: 2.5–3.5 Mbps
    • 480p30: 800–1400 kbps
    • 360p: 400–700 kbps
  • Player
    • WebRTC buffer: aim for 100–300 ms
    • SRT‑based native players: 300–1500 ms depending on network
    • LL‑HLS player: 2–6 s depending on CDN and part/segment settings
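The encoder targets above can be derived mechanically. A small sketch using example values consistent with this section (30 fps, a 2 s GOP and the 4200 kbps maxrate from Recipe 1):

```shell
# Derive encoder parameters from the configuration targets above.
fps=30; gop_seconds=2; maxrate_kbps=4200
gop_frames=$((fps * gop_seconds))        # keyframe interval in frames
bufsize_kbps=$((maxrate_kbps * 3 / 2))   # bufsize ~= 1.5x maxrate
echo "-g ${gop_frames} -keyint_min ${gop_frames} -maxrate ${maxrate_kbps}k -bufsize ${bufsize_kbps}k"
```

Run the same derivation for every rendition in the ABR ladder so that the GOP length (in frames) stays identical across renditions.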

Limitations and trade-offs

Every optimization has trade‑offs. Be explicit about what you accept and what you don’t.

  • Latency vs reliability: Retransmission (SRT) reduces visible artifacts but increases required buffer and therefore latency. To lower latency, you accept more unrecovered packet loss or use stronger forward error correction (which increases bandwidth).
  • CPU vs latency: Lower encoder latency requires faster presets (lower quality) or hardware acceleration (higher cost). Software encoders at ultrafast presets trade quality for lower CPU use.
  • CDN caching vs low latency: Deep CDN caches are incompatible with very small part durations unless the CDN supports chunked CMAF and partial requests. You may need specialized CDN features for LL‑HLS.
  • Compatibility: Not all browsers/players support LL‑HLS; you may need a hybrid approach (WebRTC for interactive, LL‑HLS for mass audience).

Common mistakes and fixes

  1. Mismatched keyframes across renditions
    • Problem: players see stalls or quality shift artifacts during ABR switch.
    • Fix: enforce identical GOP/keyframe intervals across all ABR renditions and align encoder settings.
  2. Using VBR with large buffer and spikes
    • Problem: network spikes or burstiness cause rebuffering.
    • Fix: use CBR or constrained VBR with a bufsize ≈ 1.5× maxrate.
  3. Wrong SRT latency / pkt_size
    • Problem: jitter and out‑of‑order packets cause stalls or high retransmit rates.
    • Fix: match sender/receiver latency, use pkt_size=1316, and increase latency by 200–500 ms for lossy networks.
  4. Too many B‑frames
    • Problem: added decoder buffering increases end‑to‑end latency.
    • Fix: reduce or set B‑frames to 0 for sub‑second targets.
  5. Packager not supporting chunked CMAF
    • Problem: LL‑HLS fails to achieve low latency, or playlists have long delays.
    • Fix: use a packager with chunked CMAF/LL‑HLS support (see /docs/packaging-llhls) and confirm CDN support for partial content.
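For mistake 1, keyframe alignment can be spot-checked with ffprobe; identical timestamp lists across renditions indicate aligned GOPs. File names here are placeholders:

```shell
# Print keyframe timestamps for each rendition. If GOPs are aligned,
# the timestamp lists should match across all files.
for f in 1080p.mp4 720p.mp4 480p.mp4; do
  echo "== $f =="
  ffprobe -v error -select_streams v:0 -skip_frame nokey \
    -show_entries frame=pts_time -of csv=p=0 "$f"
done
```

Redirect each list to a file and diff them pairwise; any divergence points at a rendition with a drifting or mismatched keyframe interval.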

Rollout checklist

Use this checklist when moving a low‑latency stream from lab to production.

  1. Define latency SLO (e.g., P90 end‑to‑end < 3 s) and acceptable error thresholds (rebuffer < 0.5%, startup < 4 s).
  2. Instrument metrics and logs for each block: capture→encoder time, SRT stats (packet loss, RTT), transcoder durations, packaging latency, CDN edge latency, player rebuffer events.
  3. Run network emulation tests: 0%/0.5%/1% packet loss; jitter 20/50/100 ms; bandwidth caps to simulate mobile networks. Verify SRT behavior under these conditions.
  4. Run scale tests to expected peak concurrency and 2× headroom. Monitor origin CPU, memory, network and disk I/O.
  5. Test ABR transitions under bandwidth drop scenarios and validate keyframe alignment across renditions.
  6. Confirm CDN configuration — partial content and small TTLs must be validated for chunked CMAF.
  7. Validate player behavior on representative devices and browsers (desktop and mobile; iOS Safari's LL‑HLS behavior has quirks).
  8. Prepare rollback plan: ability to fall back to standard HLS or increase SRT latency quickly if conditions degrade.
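Checklist item 3 can be run on Linux with tc/netem. The interface name and the loss/jitter values are examples; the commands require root:

```shell
# Emulate 0.5% packet loss and 50 ms +/- 20 ms jitter on the egress interface
# ("eth0" is a placeholder; pick the interface carrying the SRT traffic).
tc qdisc add dev eth0 root netem delay 50ms 20ms loss 0.5%
# run the SRT contribution test, then remove the emulation:
tc qdisc del dev eth0 root
```

Step through the loss/jitter matrix from the checklist (0%/0.5%/1% loss; 20/50/100 ms jitter) by replacing the qdisc between runs, and record SRT retransmit stats at each point.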

Example architectures

Illustrative architectures you can adapt to your environment. Each contains the most important metrics to measure.

Architecture A — Field SRT contribution for a broadcast event

  • Flow: Camera → hardware encoder → SRT (latency=800 ms) → Ingest cluster → Transcoder (NVENC) → Packager (CMAF chunking) → Origin → Multi‑CDN → Player
  • Targets to monitor: SRT packet loss < 0.5%, transcoder latency < 300 ms, origin CPU headroom 50% during peak.
  • Use cases: live sports, multi‑camera OB.

Architecture B — Interactive show with WebRTC + LL‑HLS fallback

  • Flow: Browser/mobile → SFU farm (forwarding) → low‑latency recording → packager to produce LL‑HLS for overflow and VOD
  • Targets: SFU P90 forwarding latency < 200 ms, fallback LL‑HLS latency 2–4 s, VOD generation within 5 minutes post event.

Architecture C — Self‑hosted streaming with hybrid distribution

  • Flow: On‑prem SRT ingest → self‑hosted transcode cluster → origin that pushes to a managed multi‑CDN for global scale. Use /self-hosted-streaming-solution for reference deployment patterns.
  • Targets: test the AWS AMI or marketplace appliance for quick bootstrap: https://aws.amazon.com/marketplace/pp/prodview-npubds4oydmku.

Troubleshooting quick wins

  1. If end‑to‑end latency spikes: temporarily increase SRT latency by 500 ms. If latency drops, you were losing packets that required retransmit time.
  2. If viewers see frequent quality switches: verify ABR ladders and keyframe alignment across renditions.
  3. If players fail on mobile Safari with LL‑HLS: verify CMAF fMP4 compatibility and that the CDN supports Apple’s LL‑HLS requirements; consult /docs/packaging-llhls.
  4. If encoder CPU is pegged: move to hardware encode (NVENC/QuickSync) or switch to a faster software preset to reduce CPU; trade a small quality delta for stability.
  5. Check clock sync (NTP) on all ingest/transcode servers — timestamp drift causes trimming and incorrect segment alignment.
  6. Collect SRT stats and look for sustained retransmits or high RTT. SRT exposes stats you can log on the receiver; correlate with rebuffer events in player telemetry.

Next step

Pick one recipe and test it end‑to‑end in a controlled staging environment. If you need programmatic control over ingest and playback start/stop, or need to integrate with SDKs, evaluate /products/video-api. If you need to stream to multiple social endpoints simultaneously, consider /products/multi-streaming. For converting live streams into searchable VOD assets, see /products/video-on-demand.

If you plan to run the full stack yourself, review /self-hosted-streaming-solution for deployment patterns and the AWS marketplace appliance at https://aws.amazon.com/marketplace/pp/prodview-npubds4oydmku for quick provisioning.

Operational next steps (short list):

  1. Implement one recipe in staging and baseline your latency budget end‑to‑end.
  2. Run network emulation tests to validate SRT behavior at 0.5%–1% packet loss and jitter up to 100 ms.
  3. Scale the origin and CDN path to your expected concurrency and instrument P90/P99 latency, rebuffer and startup times.

For implementation details, consult the integration guides: /docs/ingest, /docs/srt-setup, /docs/packaging-llhls and /docs/player-setup. If you want help designing the architecture for your event, contact our engineering team through the /products/video-api page or evaluate the self‑hosted option linked above.