Video streaming at YouTube scale touches every interesting problem: massive uploads, transcoding pipelines, petabyte storage, global CDN delivery, adaptive bitrate, and ML-driven recommendations. Here’s how I’d design it end-to-end.

Requirements

Functional

  • Upload a video.
  • Watch a video at the best bitrate the network supports.
  • Search.
  • Recommendations.
  • Likes, comments, subscribers.

Non-functional

  • Read-heavy. Watch:upload ratio of 1000:1 is conservative.
  • Sub-second playback start at p95.
  • Smooth bitrate adaptation as network conditions change.
  • High availability — broken playback is fatal.
  • Cheap storage at PB scale.

Out of scope

  • Auth, billing, ad serving.

Capacity

Metric                                  Number
MAU                                     2.7B
DAU                                     1.5B
Hours uploaded per minute               500
Hours watched per day                   1B+
Concurrent live viewers (peak event)    100M
Avg upload size                         ~500 MB
Storage growth per day                  ~50 PB (raw + variants)

Yes, that’s “petabytes per day.” Storage is the most expensive line item; CDN bandwidth is second.

API surface (sketch)

POST /api/upload (multipart or chunked)
   returns video_id + upload_url for resumable upload

GET /api/video/{id}
   metadata + manifest URL

GET /watch/{id}.m3u8
   HLS manifest (segment list)

GET /segments/{id}/{quality}/seg-{n}.ts
   individual video chunk

GET /api/recommend?video_id={id}
   list of recommended videos

The watch path is dominated by static segment files served from CDN.

Upload pipeline

Client
  ↓  chunked upload (resumable)
Edge upload gateway
  ↓
Raw object store (S3 / GCS / R2)   ← single source of truth for the original
  ↓  (event)
Transcode queue (Kafka / SQS)
  ↓
Transcoder workers (GPU)
  ↓  per quality variant + per segment
Variant object store (HLS/DASH segments)
  ↓
CDN origin shield → CDN PoPs → users

Steps:

  1. Resumable upload via tus.io or signed multipart, so a 500 MB upload over flaky 4G can resume from the last acknowledged chunk instead of restarting.
  2. Antivirus / format validation on the raw upload.
  3. Transcode fan-out. One source video → variants:
    • 144p / 240p / 360p / 480p / 720p / 1080p / 4K
    • HLS (.m3u8 manifest + .ts segments) and DASH variants
    • 2–6 second segments
  4. Generate manifests. Master playlist points to per-quality playlists; each per-quality is a list of segments.
  5. Push to CDN origin shield. Pre-warm popular regions.
  6. Update video metadata (status: ready).

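The transcoder pool is the largest ongoing compute cost (storage and CDN bandwidth still dominate the overall bill). Run it on spot GPU instances; jobs are idempotent and retryable, so a preempted worker only costs a retry.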
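To make "idempotent and retryable" concrete, here is a minimal sketch of the fan-out in step 3. It assumes a hypothetical rendition ladder and a deterministic job ID derived from the video ID, the source object's ETag, and the rendition name. The names `plan_jobs`, `enqueue`, and the `variants/...` key layout are illustrative, not part of any real pipeline.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical rendition ladder: (name, height in px, video bitrate in kbit/s).
LADDER = [
    ("240p", 240, 400),
    ("480p", 480, 1000),
    ("720p", 720, 3000),
    ("1080p", 1080, 6000),
]


@dataclass(frozen=True)
class TranscodeJob:
    job_id: str         # deterministic, so a retry maps to the same job
    video_id: str
    rendition: str
    height: int
    bitrate_kbps: int
    output_prefix: str  # where the worker writes this rendition's segments


def plan_jobs(video_id: str, source_etag: str) -> list:
    """Fan one source video out into one job per rendition."""
    jobs = []
    for name, height, kbps in LADDER:
        # Same inputs always hash to the same ID, so re-enqueueing is harmless.
        digest = hashlib.sha256(f"{video_id}:{source_etag}:{name}".encode()).hexdigest()
        jobs.append(TranscodeJob(
            job_id=digest[:16],
            video_id=video_id,
            rendition=name,
            height=height,
            bitrate_kbps=kbps,
            output_prefix=f"variants/{video_id}/{name}/",
        ))
    return jobs


def enqueue(jobs, completed_ids):
    """Skip jobs already marked done; everything else is safe to (re)run."""
    return [job for job in jobs if job.job_id not in completed_ids]
```

Because the output prefix is a pure function of the inputs, a worker that dies mid-job can be replaced by another that writes the same keys, and nothing downstream needs to know a retry happened.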

Storage tiers

Tier                                      Latency         Cost     Used for
CDN edge cache                            <10 ms          High     Hot videos
CDN origin shield                         50 ms           Medium   Recently popular
Hot object store (S3 / GCS Standard)      100 ms          Medium   Last 90 days, long tail of popular
Cold object store (S3 IA / Glacier-ish)   seconds         Low      Old videos rarely watched
Archive                                   minutes-hours   Lowest   Long-tail backups, originals

Movement between tiers is automated based on view counts. A video unwatched for 6 months drops to cold; a viral spike re-promotes to hot.
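A tiering rule like this can be sketched as a small state transition. The 180-day demotion matches the six months mentioned above; the promotion threshold is a made-up number, and `next_tier` is a hypothetical helper rather than a feature of any particular object store.

```python
COLD_AFTER_DAYS = 180            # "unwatched for 6 months" from the text
PROMOTE_VIEWS_PER_DAY = 10_000   # assumed threshold for a "viral spike"


def next_tier(current: str, days_since_last_view: int, views_last_24h: int) -> str:
    """Decide where a video's variants should live after the daily sweep."""
    if current == "hot" and days_since_last_view >= COLD_AFTER_DAYS:
        return "cold"
    if current == "cold" and views_last_24h >= PROMOTE_VIEWS_PER_DAY:
        return "hot"
    # A handful of views on a cold video is served from cold; it is not
    # worth paying to move the data back for a trickle of traffic.
    return current
```

Using different conditions for demotion and promotion gives some hysteresis, so a video near the boundary does not bounce between tiers on every sweep.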

Adaptive Bitrate (HLS / DASH)

master.m3u8:
  #EXTM3U
  #EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=426x240
  240p/playlist.m3u8
  #EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480
  480p/playlist.m3u8
  #EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720
  720p/playlist.m3u8
  #EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
  1080p/playlist.m3u8

Each per-quality playlist is a sequence of segments:

240p/playlist.m3u8:
  #EXTM3U
  #EXT-X-TARGETDURATION:6
  #EXTINF:6.0,
  seg-001.ts
  #EXTINF:6.0,
  seg-002.ts
  ...

The client (HTML5 <video> + hls.js, or AVPlayer on iOS, or ExoPlayer on Android) measures download speed per segment and picks the next quality. The server doesn’t decide bitrate — the client does.

This is why ABR is so robust: every client adapts to its own conditions.
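As a rough illustration of the client-side decision, here is a throughput-based selector using the ladder from the master playlist above. It is a simplified sketch: the EWMA weight and safety factor are assumed values, `ThroughputAbr` is a hypothetical class, and production players typically also factor in buffer level.

```python
# Ladder from the master playlist: (name, BANDWIDTH in bit/s), lowest first.
LADDER = [
    ("240p", 400_000),
    ("480p", 1_000_000),
    ("720p", 3_000_000),
    ("1080p", 6_000_000),
]


class ThroughputAbr:
    def __init__(self, alpha: float = 0.3, safety: float = 0.8):
        self.alpha = alpha        # EWMA weight given to the newest sample
        self.safety = safety      # only spend this fraction of the estimate
        self.estimate_bps = None  # no measurement yet

    def on_segment_downloaded(self, size_bytes: int, seconds: float) -> None:
        """Fold the throughput of the segment just fetched into the estimate."""
        sample = size_bytes * 8 / seconds
        if self.estimate_bps is None:
            self.estimate_bps = sample
        else:
            self.estimate_bps = self.alpha * sample + (1 - self.alpha) * self.estimate_bps

    def next_rendition(self) -> str:
        """Pick the highest rendition that fits within the safety budget."""
        if self.estimate_bps is None:
            return LADDER[0][0]  # start low for a fast first frame
        budget = self.estimate_bps * self.safety
        chosen = LADDER[0][0]
        for name, bandwidth in LADDER:
            if bandwidth <= budget:
                chosen = name
        return chosen
```

The smoothing means one slow segment does not immediately drop the quality, while a sustained slowdown does.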

CDN strategy

Three layers:

  1. CDN edge — closest to the user. Caches popular segments; <50ms.
  2. Mid-tier / regional cache — caches per region; reduces origin load.
  3. Origin — your object store. Last resort.

For YouTube specifically, the layered model is augmented with ISP-level caches (Google Global Cache servers placed inside ISPs). For most teams, S3 + CloudFront / Cloudflare Stream / Bunny.net is enough.

Cache keys include video ID + quality + segment number — heavily cacheable. A single popular video’s 4K stream might be served from cache 99.99% of the time.
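A sketch of how such a cache key might be derived from a segment URL, using the `/segments/{id}/{quality}/seg-{n}.ts` path from the API sketch. Dropping the query string is an assumption here: it presumes tokens are validated before the cache lookup, so they must not fragment the cache. `cache_key` and the example hostname are hypothetical.

```python
import re
from urllib.parse import urlsplit

SEGMENT_PATH = re.compile(
    r"^/segments/(?P<video_id>[^/]+)/(?P<quality>[^/]+)/seg-(?P<n>\d+)\.ts$"
)


def cache_key(url: str) -> str:
    """Map a segment URL to video ID + quality + segment number."""
    path = urlsplit(url).path  # the query string (tokens, tracking) is ignored
    match = SEGMENT_PATH.match(path)
    if match is None:
        raise ValueError(f"not a segment URL: {url}")
    return f"{match['video_id']}:{match['quality']}:{int(match['n'])}"
```

Two viewers holding different tokens for `/segments/abc123/720p/seg-42.ts` then resolve to the same key, `abc123:720p:42`, and share one cached object.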

Live streaming

Adds two complications:

  • Latency target. “Low-latency HLS” gets you ~3-5s glass-to-glass. WebRTC gets you sub-second.
  • Fan-out. A single ingest source feeds millions of viewers.

Architecture:

Streamer (RTMP / WebRTC)
  ↓
Ingest (RTMP server / SFU)
  ↓
Transcoder (per-quality variants)
  ↓
Manifest writer (LL-HLS / DASH)
  ↓
Origin → CDN PoPs (edge transmuxing for LL-HLS)
  ↓
Viewers

Live ingest is one streamer; live distribution is millions of viewers. The fan-out is at the CDN; ingest is a small fleet.
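A back-of-the-envelope calculation shows why the fan-out has to live at the CDN. The viewer count is the peak-event figure from the capacity table; the 3 Mbps average bitrate, the number of shields, and the assumption that each shield coalesces concurrent requests into one origin fetch per segment are illustrative guesses.

```python
def live_egress_tbps(viewers: int, avg_bitrate_mbps: float) -> float:
    """Total edge egress in Tbit/s for a live event."""
    return viewers * avg_bitrate_mbps / 1_000_000  # Mbit/s -> Tbit/s


def origin_fetches_per_second(renditions: int, shields: int, segment_seconds: float) -> float:
    """Origin load if each shield pulls each new segment exactly once,
    regardless of how many viewers sit behind it (assumes request coalescing)."""
    return renditions * shields / segment_seconds


# 100M concurrent viewers at an assumed 3 Mbit/s average.
peak_tbps = live_egress_tbps(100_000_000, 3.0)

# 7 renditions, 20 origin shields, 2-second segments (all assumed).
origin_rps = origin_fetches_per_second(7, 20, 2.0)

print(f"edge egress: {peak_tbps:.0f} Tbit/s, origin fetches: {origin_rps:.0f}/s")
```

Under these assumptions the edge carries hundreds of terabits per second while the origin sees only tens of requests per second, which is why ingest and origin can stay small.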

Recommendations

The classic two-stage model:

1. Candidate generation

From billions of videos, narrow to ~thousands the user might want. Multiple sources:

  • Collaborative filtering. Users similar to you watched these.
  • Content-based. Videos similar to what you’ve watched.
  • Trending. Currently popular.
  • Subscriptions. Channels you follow.

These run offline (precomputed) or as fast online services. Each contributes a candidate pool.

2. Ranking

A learned ranker model (gradient-boosted trees or a small neural net) scores each candidate using features such as:

  • Recency, watch time, like ratio, similarity to user history.
  • Cross features (user × video).

Top N from the ranker → user’s home feed.
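The two stages can be sketched in a few lines. The hand-picked weights below stand in for a learned ranker, and the `Candidate` fields are a small, hypothetical subset of the features listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    video_id: str
    recency: float     # 0..1, newer is higher
    like_ratio: float  # 0..1
    similarity: float  # 0..1, similarity to the user's watch history


# Hand-picked weights standing in for a trained model.
WEIGHTS = {"recency": 0.2, "like_ratio": 0.3, "similarity": 0.5}


def merge_candidates(*pools):
    """Stage 1: union the candidate pools, keeping one entry per video."""
    seen = {}
    for pool in pools:
        for candidate in pool:
            seen.setdefault(candidate.video_id, candidate)
    return list(seen.values())


def score(candidate: Candidate) -> float:
    """Stage 2: a linear scorer in place of the learned ranker."""
    return (
        WEIGHTS["recency"] * candidate.recency
        + WEIGHTS["like_ratio"] * candidate.like_ratio
        + WEIGHTS["similarity"] * candidate.similarity
    )


def recommend(pools, n: int):
    """Merge, rank, and return the top-n video IDs."""
    ranked = sorted(merge_candidates(*pools), key=score, reverse=True)
    return [candidate.video_id for candidate in ranked[:n]]
```

The split matters for cost: candidate generation has to be cheap because it touches the whole catalogue, while the ranker can afford richer features because it only sees thousands of items.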

Feedback loop

User watches → events to Kafka → train next model. Online learning + nightly retrains. Models versioned, A/B tested.

For the broader patterns, see Distributed Systems Fundamentals and the LLM-side rendering of similar ideas in Self-Hosted LLMs in 2026.

Search

  • Index metadata (title, description, channel) in a search engine (Elasticsearch / OpenSearch).
  • Embeddings for semantic queries such as “show me videos about X.” See Build a RAG App with pgvector and FastAPI for the underlying pattern, applied at billion-row scale.
  • Personalization by injecting user-affinity features into ranking.

Comments and likes

These are write-heavy social features. Pattern:

  • Comments: Cassandra-style wide-row store keyed by video_id. Append-only writes. Pagination via last_comment_id.
  • Likes: counter incremented in Redis (write-behind to durable store every minute).
  • Notification fan-out for “your video has been commented on”: async via Kafka / NATS.
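The write-behind counter for likes can be sketched with plain dictionaries. Here `pending` stands in for Redis and `durable` for the database; both are in-memory placeholders, and `WriteBehindCounter` is a hypothetical class, not a Redis client API.

```python
from collections import defaultdict


class WriteBehindCounter:
    """Absorb likes in a fast store, flush deltas to durable storage in batches."""

    def __init__(self):
        self.pending = defaultdict(int)  # stand-in for Redis counters
        self.durable = defaultdict(int)  # stand-in for the durable store

    def like(self, video_id: str) -> None:
        self.pending[video_id] += 1

    def count(self, video_id: str) -> int:
        """Readers see durable total plus not-yet-flushed likes."""
        return self.durable.get(video_id, 0) + self.pending.get(video_id, 0)

    def flush(self) -> int:
        """Run periodically (the text suggests every minute).
        Returns the number of durable rows written."""
        batch, self.pending = self.pending, defaultdict(int)
        for video_id, delta in batch.items():
            self.durable[video_id] += delta
        return len(batch)
```

A thousand likes on one video within a flush interval collapse into a single durable write. The trade-off is that a crash of the fast store can lose up to one interval of likes, which is usually acceptable for a counter.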

DRM and access control

For paid content:

  • Encrypted segments. Different keys per video; key delivery gated by license server.
  • Widevine / PlayReady / FairPlay for the major platforms.
  • Signed URLs for time-limited segment access.
  • Token rotation. Manifest URLs include a short-lived token.

Real DRM is complicated. For most non-premium use cases, signed URLs plus encryption at rest are enough.
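For the signed-URL option, a minimal sketch using an HMAC over the path and expiry time. The query parameter names (`expires`, `sig`), the secret, and the helper names are assumptions for illustration; each CDN has its own signing scheme, so treat this as the general shape rather than any vendor's format.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"demo-secret-rotate-me"  # placeholder; load from a secret store in practice


def _signature(path: str, expires: int) -> str:
    return hmac.new(SECRET, f"{path}:{expires}".encode(), hashlib.sha256).hexdigest()


def sign_url(path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return the path with an expiry and signature appended."""
    current = time.time() if now is None else now
    expires = int(current + ttl_seconds)
    query = urlencode({"expires": expires, "sig": _signature(path, expires)})
    return f"{path}?{query}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    """Accept only unexpired URLs whose signature matches the path."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        expires = int(query["expires"][0])
        sig = query["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    if current > expires:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sig, _signature(parts.path, expires))
```

Because the path is part of the signed message, a token issued for one segment cannot be reused for a different quality or video.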

Operational realities

  • Storage is the dominant cost. Plan tier transitions aggressively.
  • CDN bandwidth is line item #2. Negotiate; consider multi-CDN.
  • Transcoding is bursty. Use spot capacity for the workers draining the queue.
  • Hot-key problem. A viral video → CDN cache-miss storms. Pre-warm, and stagger cache expiry so a video’s segments don’t all miss at once.
  • Geo-blocking. Compliance requires per-country availability rules. Enforce them at the manifest layer.

Capacity arithmetic

For 1B watch hours/day, average bitrate ~3 Mbps:

  • ~1.35 EB/day transferred (10^9 h × 3,600 s/h × 3 × 10^6 bit/s ÷ 8 ≈ 1.35 × 10^18 bytes). Most served from CDN edge cache.
  • Even at 99% cache hit, that’s ~13.5 PB/day from origin → CDN. Plan accordingly.

For 50 PB/day stored, at $0.02/GB/month for hot tier, each day of uploads adds ~$1M/month to the bill. After 30 days (~1.5 EB) hot storage alone runs ~$30M/month, and it keeps climbing. Tiering and deduplication are how the line item stays sustainable.
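The unit conversions are easy to get wrong, so here is the arithmetic redone as a short script from the stated inputs (1B watch hours/day, ~3 Mbps, 99% cache hit, 50 PB/day, $0.02/GB/month). It uses decimal units (1 PB = 10^15 bytes); the function names are illustrative.

```python
SECONDS_PER_HOUR = 3600
GB = 10**9
PB = 10**15
EB = 10**18


def bytes_per_day(watch_hours: float, bitrate_bps: float) -> float:
    """Bytes delivered per day: hours -> seconds -> bits -> bytes."""
    return watch_hours * SECONDS_PER_HOUR * bitrate_bps / 8


def origin_bytes_per_day(total_bytes: float, cache_hit_ratio: float) -> float:
    """Only cache misses reach the origin."""
    return total_bytes * (1 - cache_hit_ratio)


def hot_storage_cost_per_month(stored_bytes: float, usd_per_gb_month: float) -> float:
    return stored_bytes / GB * usd_per_gb_month


egress = bytes_per_day(1e9, 3e6)                                 # ≈ 1.35 EB/day
origin = origin_bytes_per_day(egress, 0.99)                      # ≈ 13.5 PB/day
one_day = hot_storage_cost_per_month(50 * PB, 0.02)              # ≈ $1M/month
thirty_days = hot_storage_cost_per_month(30 * 50 * PB, 0.02)     # ≈ $30M/month

print(f"egress:  {egress / EB:.2f} EB/day")
print(f"origin:  {origin / PB:.1f} PB/day")
print(f"storage: ${one_day / 1e6:.0f}M/month per day of uploads, "
      f"${thirty_days / 1e6:.0f}M/month after 30 days")
```

Note that the storage bill is cumulative: every day of uploads left in the hot tier adds another increment, which is what makes tiering a first-order concern rather than an optimization.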

What interviewers love to dig into

  • “What if a video goes viral?” → CDN auto-scales; mid-tier caches absorb origin pressure; pre-warm popular regions.
  • “How do you prevent a single video from saturating CDN?” → Anycast; per-PoP rate limiting; diversion to peer caches.
  • “How is comment ordering handled?” → Either chronological with cursor pagination, or “top comments” via a learned ranker on engagement features.
  • “What if a transcoder fails mid-job?” → Idempotent jobs; retry with checkpointing; partial outputs discarded.
  • “How do you handle takedowns / DMCA?” → Soft delete in metadata; CDN invalidation; per-region geo-blocks.

What I’d actually build today

For a small video product (1k creators, 100k viewers):

  • Cloudflare Stream or Mux for video hosting + transcode + delivery (managed).
  • Postgres for metadata.
  • Redis for counters.
  • Kafka for events.
  • Postgres + pgvector for semantic search.

Skip the build-from-scratch transcoder until you outgrow managed. Mux/Stream get you to 1M users without thinking about transcoding pipelines.

Read this next

If you want a “small YouTube” reference architecture (Mux + Postgres + Redis + Hono backend), it’s at rajpoot.dev.
