Video streaming at YouTube scale touches every interesting problem: massive uploads, transcoding pipelines, petabyte storage, global CDN delivery, adaptive bitrate, and ML-driven recommendations. Here’s how I’d design it end-to-end.
Requirements
Functional
- Upload a video.
- Watch a video at the best bitrate the network supports.
- Search.
- Recommendations.
- Likes, comments, subscribers.
Non-functional
- Read-heavy. Watch:upload ratio of 1000:1 is conservative.
- Sub-second playback start at p95.
- Smooth bitrate adaptation as network conditions change.
- High availability — broken playback is fatal.
- Cheap storage at PB scale.
Out of scope
- Auth, billing, ad serving.
Capacity
| Metric | Value |
|---|---|
| MAU | 2.7B |
| DAU | 1.5B |
| Hours uploaded per minute | 500 |
| Hours watched per day | 1B+ |
| Concurrent live viewers (peak event) | 100M |
| Avg upload size | ~500 MB |
| Storage growth per day | ~50 PB (raw + variants) |
Yes, that’s “petabytes per day.” Storage is the most expensive line item; CDN bandwidth is second.
API surface (sketch)
POST /api/upload (multipart or chunked)
→ returns video_id + upload_url for resumable upload
GET /api/video/{id}
→ metadata + manifest URL
GET /watch/{id}.m3u8
→ HLS manifest (segment list)
GET /segments/{id}/{quality}/seg-{n}.ts
→ individual video chunk
GET /api/recommend?video_id={id}
→ list of recommended videos
The watch path is dominated by static segment files served from CDN.
Upload pipeline
Client
│ chunked upload (resumable)
▼
Edge upload gateway
│
▼
Raw object store (S3 / GCS / R2) ← single source of truth for the original
│
▼ (event)
Transcode queue (Kafka / SQS)
│
▼
Transcoder workers (GPU)
│ per quality variant + per segment
▼
Variant object store (HLS/DASH segments)
│
▼
CDN origin shield → CDN PoPs → users
Steps:
- Resumable upload via tus.io or signed multipart, so a 500 MB upload survives a flaky 4G connection.
- Antivirus / format validation on the raw upload.
- Transcode fan-out. One source video → variants:
  - 144p / 240p / 360p / 480p / 720p / 1080p / 4K
  - HLS (.m3u8 manifest + .ts segments) and DASH variants
  - 2–6 second segments
- Generate manifests. Master playlist points to per-quality playlists; each per-quality is a list of segments.
- Push to CDN origin shield. Pre-warm popular regions.
- Update video metadata (status: ready).
The transcoder pool is the largest ongoing compute cost. Run it on spot GPU instances; jobs are idempotent and retryable, so a reclaimed instance only costs a retry.
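To make the fan-out and the "idempotent and retryable" property concrete, here is a minimal sketch. The names (`TranscodeJob`, `fan_out`, `run_once`), the pixel heights (4K taken as 2160), and the rule of never upscaling past the source are my assumptions for illustration, not part of any particular pipeline.

```python
import hashlib
from dataclasses import dataclass

# Rendition ladder from the list above, as pixel heights (assumption: 4K = 2160).
LADDER = [144, 240, 360, 480, 720, 1080, 2160]
FORMATS = ["hls", "dash"]


@dataclass(frozen=True)
class TranscodeJob:
    video_id: str
    height: int
    fmt: str
    segment_seconds: int

    @property
    def job_key(self) -> str:
        """Deterministic key: the same inputs always map to the same job."""
        raw = f"{self.video_id}:{self.height}:{self.fmt}:{self.segment_seconds}"
        return hashlib.sha256(raw.encode()).hexdigest()[:16]


def fan_out(video_id: str, source_height: int, segment_seconds: int = 6) -> list:
    """One job per (quality, format). Assumption: skip rungs above the source."""
    if not 2 <= segment_seconds <= 6:
        raise ValueError("segment length should stay in the 2-6 second range")
    return [
        TranscodeJob(video_id, height, fmt, segment_seconds)
        for height in LADDER
        if height <= source_height
        for fmt in FORMATS
    ]


def run_once(job: TranscodeJob, completed: set) -> bool:
    """Skip work already recorded as done, so a retry is harmless."""
    if job.job_key in completed:
        return False
    # ... the actual transcode would run here ...
    completed.add(job.job_key)
    return True
```

In a real system `completed` would live in a durable store rather than a Python set; the point is only that a worker killed mid-job can be replaced by another without producing duplicate output.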
Storage tiers
| Tier | Latency | Cost | Used for |
|---|---|---|---|
| CDN edge cache | <10ms | High | Hot videos |
| CDN origin shield | 50ms | Medium | Recently popular |
| Hot object store (S3 / GCS Standard) | 100ms | Medium | Last 90 days, long tail of popular |
| Cold object store (S3 IA / Glacier-ish) | seconds | Low | Old videos rarely watched |
| Archive | minutes-hours | Lowest | Long-tail backups, originals |
Movement between tiers is automated based on view counts. A video unwatched for 6 months drops to cold; a viral spike re-promotes to hot.
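A tiering rule like this fits in a few lines. The sketch below covers only the hot/cold boundary; the 180-day cutoff comes from the "6 months" above, while the promotion threshold and all names are hypothetical.

```python
from datetime import datetime, timedelta
from enum import Enum


class Tier(Enum):
    HOT = "hot"
    COLD = "cold"


COLD_AFTER = timedelta(days=180)   # "unwatched for 6 months"
PROMOTE_VIEWS_PER_DAY = 10_000     # hypothetical threshold for a viral spike


def next_tier(current: Tier, last_viewed: datetime,
              views_last_24h: int, now: datetime) -> Tier:
    """Decide where a video should live after this evaluation pass."""
    if current is Tier.HOT and now - last_viewed >= COLD_AFTER:
        return Tier.COLD
    if current is Tier.COLD and views_last_24h >= PROMOTE_VIEWS_PER_DAY:
        return Tier.HOT
    # Otherwise stay put: a handful of views should not bounce a cold video.
    return current
```

Promoting only on a spike, rather than on any view, avoids paying retrieval and rewrite costs for long-tail videos that get one nostalgic click a year.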
Adaptive Bitrate (HLS / DASH)
master.m3u8:
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=426x240
240p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1000000,RESOLUTION=854x480
480p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
Each per-quality playlist is a sequence of segments:
240p/playlist.m3u8:
#EXTM3U
#EXT-X-TARGETDURATION:6
#EXTINF:6.0,
seg-001.ts
#EXTINF:6.0,
seg-002.ts
...
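A per-quality playlist like this is simple enough to generate with string formatting. Here is a rough sketch, assuming VOD content and segment files named `seg-001.ts`, `seg-002.ts`, and so on, with URIs written relative to the playlist's own directory; `media_playlist` is a hypothetical helper, not a library function.

```python
import math


def media_playlist(durations: list) -> str:
    """Render a VOD media playlist for segments seg-001.ts, seg-002.ts, ..."""
    lines = [
        "#EXTM3U",
        # The target duration must be an integer no smaller than any segment.
        f"#EXT-X-TARGETDURATION:{math.ceil(max(durations))}",
        "#EXT-X-PLAYLIST-TYPE:VOD",
    ]
    for n, seconds in enumerate(durations, start=1):
        lines.append(f"#EXTINF:{seconds:.1f},")  # trailing comma is part of the tag
        lines.append(f"seg-{n:03d}.ts")          # resolved relative to the playlist URL
    lines.append("#EXT-X-ENDLIST")               # tells the player the list is complete
    return "\n".join(lines) + "\n"
```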
The client (HTML5 <video> + hls.js, or AVPlayer on iOS, or ExoPlayer on Android) measures download speed per segment and picks the next quality. The server doesn’t decide bitrate — the client does.
This is why ABR is so robust: every client adapts to its own conditions.
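The client-side decision can be sketched as a throughput estimate plus a safety margin. This is a simplified illustration using the bandwidth values from the master playlist above; real players such as hls.js also weigh buffer level and other signals, and the smoothing factor and 0.8 margin here are assumptions.

```python
# (bandwidth in bits/s, playlist URI) pairs from the master playlist above.
VARIANTS = [
    (400_000, "240p/playlist.m3u8"),
    (1_000_000, "480p/playlist.m3u8"),
    (3_000_000, "720p/playlist.m3u8"),
    (6_000_000, "1080p/playlist.m3u8"),
]


def measured_bps(segment_bytes: int, download_seconds: float) -> float:
    """Throughput observed while downloading one segment."""
    return segment_bytes * 8 / download_seconds


def ewma(previous: float, sample: float, alpha: float = 0.3) -> float:
    """Smooth the estimate so one slow segment does not cause a quality drop."""
    return alpha * sample + (1 - alpha) * previous


def pick_variant(estimate_bps: float, safety: float = 0.8) -> str:
    """Highest variant that fits within a fraction of the estimated throughput."""
    budget = estimate_bps * safety
    chosen = VARIANTS[0][1]  # always fall back to the lowest rung
    for bandwidth, uri in VARIANTS:
        if bandwidth <= budget:
            chosen = uri
    return chosen
```

For example, `pick_variant(5_000_000)` returns `"720p/playlist.m3u8"`: the 0.8 margin leaves a 4 Mbps budget, which fits the 3 Mbps rung but not the 6 Mbps one.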
CDN strategy
Three layers:
- CDN edge — closest to the user. Caches popular segments; <50ms.
- Mid-tier / regional cache — caches per region; reduces origin load.
- Origin — your object store. Last resort.
For YouTube specifically, the layered model is augmented with ISP-level caches (Google Global Cache servers placed inside ISPs). For most teams, S3 + CloudFront / Cloudflare Stream / Bunny.net is enough.
Cache keys include video ID + quality + segment number — heavily cacheable. A single popular video’s 4K stream might be served from cache 99.99% of the time.
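Here is a toy model of the three layers, mostly to show why origin sees so little traffic. The dictionaries stand in for real caches (no eviction, no TTLs), and `TieredCache` and `cache_key` are hypothetical names; the key layout mirrors the segment path from the API sketch.

```python
from typing import Callable


def cache_key(video_id: str, quality: str, segment: int) -> str:
    """Video ID + quality + segment number, as described above."""
    return f"{video_id}/{quality}/seg-{segment}.ts"


class TieredCache:
    def __init__(self, fetch_origin: Callable[[str], bytes]):
        self.edge = {}       # closest to the user
        self.regional = {}   # mid-tier, shields the origin
        self.fetch_origin = fetch_origin
        self.origin_hits = 0

    def get(self, key: str) -> bytes:
        if key in self.edge:
            return self.edge[key]
        if key in self.regional:
            data = self.regional[key]
        else:
            data = self.fetch_origin(key)  # last resort
            self.origin_hits += 1
            self.regional[key] = data
        self.edge[key] = data  # fill the edge on the way back
        return data
```

Even if the edge entry is evicted, the regional layer answers the refill, so a popular segment reaches the object store roughly once per region rather than once per PoP.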
Live streaming
Adds two complications:
- Latency target. “Low-latency HLS” gets you ~3-5s glass-to-glass. WebRTC gets you sub-second.
- Fan-out. A single ingest source feeds millions of viewers.
Architecture:
Streamer (RTMP / WebRTC)
↓
Ingest (RTMP server / SFU)
↓
Transcoder (per-quality variants)
↓
Manifest writer (LL-HLS / DASH)
↓
Origin → CDN PoPs (edge transmuxing for LL-HLS)
↓
Viewers
Live ingest is one streamer; live distribution is millions of viewers. The fan-out is at the CDN; ingest is a small fleet.
Recommendations
The classic two-stage model:
1. Candidate generation
From billions of videos, narrow to ~thousands the user might want. Multiple sources:
- Collaborative filtering. Users similar to you watched these.
- Content-based. Videos similar to what you’ve watched.
- Trending. Currently popular.
- Subscriptions. Channels you follow.
These run offline (precomputed) or as fast online services. Each contributes a candidate pool.
2. Ranking
A learned ranker model (for example gradient-boosted trees or a small neural net) scores each candidate using features:
- Recency, watch time, like ratio, similarity to user history.
- Cross features (user × video).
Top N from the ranker → user’s home feed.
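The two stages compose like this. The sketch is deliberately small: a fixed linear scorer stands in for the learned ranker, the weights are made up, and all names (`Candidate`, `merge_pools`, `rank`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    video_id: str
    recency: float             # 1.0 = just uploaded, 0.0 = old
    like_ratio: float          # share of positive reactions
    history_similarity: float  # similarity to the user's watch history


# Hypothetical weights; a real ranker learns these from watch data.
WEIGHTS = {"recency": 0.2, "like_ratio": 0.3, "history_similarity": 0.5}


def merge_pools(*pools) -> list:
    """Stage 1: union the candidate pools, dropping duplicate videos."""
    seen = {}
    for pool in pools:
        for candidate in pool:
            seen.setdefault(candidate.video_id, candidate)
    return list(seen.values())


def score(c: Candidate) -> float:
    """Stage 2: a linear stand-in for the learned ranker."""
    return (WEIGHTS["recency"] * c.recency
            + WEIGHTS["like_ratio"] * c.like_ratio
            + WEIGHTS["history_similarity"] * c.history_similarity)


def rank(candidates: list, n: int) -> list:
    """Return the top-n candidates by score, best first."""
    return sorted(candidates, key=score, reverse=True)[:n]
```

The split matters for cost: candidate generation has to be cheap because it touches billions of videos, while the ranker can afford richer features because it only sees thousands.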
Feedback loop
User watches → events to Kafka → train next model. Online learning + nightly retrains. Models versioned, A/B tested.
For the broader patterns see Distributed Systems Fundamentals and the LLM-side rendering of similar ideas in Self-Hosted LLMs in 2026.
Search
- Indexed by metadata (title, description, channel) into a search engine (Elasticsearch / OpenSearch).
- Embeddings for semantic search of “show me videos about X.” See Build a RAG App with pgvector and FastAPI for the underlying pattern, applied at billion-row scale.
- Personalization by injecting user-affinity features into ranking.
Comments and likes
These are write-heavy social features. Pattern:
- Comments: Cassandra-style wide-row store keyed by video_id. Append-only writes. Pagination via last_comment_id.
- Likes: counter incremented in Redis (write-behind to durable store every minute).
- Notification fan-out for “your video has been commented on”: async via Kafka / NATS.
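Both patterns are easy to prototype in memory. In this sketch plain dictionaries stand in for Redis and the durable store, and a sorted list stands in for one video_id partition; `WriteBehindLikes` and `page_comments` are hypothetical names.

```python
from collections import defaultdict


class WriteBehindLikes:
    def __init__(self):
        self.pending = defaultdict(int)  # stand-in for Redis counters
        self.durable = defaultdict(int)  # stand-in for the durable store

    def like(self, video_id: str) -> None:
        self.pending[video_id] += 1      # cheap in-memory increment

    def count(self, video_id: str) -> int:
        return self.durable.get(video_id, 0) + self.pending.get(video_id, 0)

    def flush(self) -> int:
        """Persist buffered increments; meant to run about once a minute."""
        batch, self.pending = self.pending, defaultdict(int)
        for video_id, delta in batch.items():
            self.durable[video_id] += delta  # one write per video, not per like
        return len(batch)


def page_comments(comments: list, last_comment_id: int = 0, limit: int = 20):
    """comments: (comment_id, text) tuples for one video, ascending by id."""
    page = [c for c in comments if c[0] > last_comment_id][:limit]
    next_cursor = page[-1][0] if page else None
    return page, next_cursor
```

The trade-off with write-behind is durability: a crash between flushes loses up to a minute of likes, which is usually acceptable for a counter and not for a payment.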
DRM and access control
For paid content:
- Encrypted segments. Different keys per video; key delivery gated by license server.
- Widevine / PlayReady / FairPlay for the major platforms.
- Signed URLs for time-limited segment access.
- Token rotation. Manifest URLs include a short-lived token.
Real DRM is complicated. For most non-premium use cases, signed URLs + encrypted-at-rest is enough.
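For the signed-URL half, the idea is an HMAC over the path and an expiry timestamp. This is a generic sketch, not the URL format of any specific CDN; the secret, the query parameter names, and both function names are assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"replace-me"  # hypothetical; load from a secret manager in practice


def _signature(path: str, expires: int) -> str:
    message = f"{path}:{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(path: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return the path with an expiry and signature attached."""
    issued = time.time() if now is None else now
    expires = int(issued + ttl_seconds)
    return f"{path}?expires={expires}&sig={_signature(path, expires)}"


def verify(path: str, expires: int, sig: str, now: Optional[float] = None) -> bool:
    """Reject expired links and any link whose path or expiry was altered."""
    current = time.time() if now is None else now
    if current > expires:
        return False
    # compare_digest avoids leaking the signature through timing differences.
    return hmac.compare_digest(_signature(path, expires), sig)
```

Because the signature covers the path, a token issued for one segment cannot be replayed against another, and because it covers the expiry, the client cannot extend its own access.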
Operational realities
- Storage is the dominant cost. Plan tier transitions aggressively.
- CDN bandwidth is line item #2. Negotiate; consider multi-CDN.
- Transcoding is bursty. Run the workers that drain the queue on spot capacity.
- Hot-key problem. A viral video → CDN cache miss storms. Pre-warm + stagger keys.
- Geo-blocking. Compliance requires per-country availability rules. Enforce them at the manifest layer.
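For the hot-key item, one technique that pairs well with pre-warming is request coalescing (sometimes called single-flight): concurrent misses for the same key share one origin fetch. The list above does not name it, so treat this as my addition; `Coalescer` is a hypothetical class built on asyncio.

```python
import asyncio


class Coalescer:
    """Collapse concurrent requests for the same key into one upstream fetch."""

    def __init__(self, fetch):
        self._fetch = fetch   # async callable: key -> bytes
        self._inflight = {}   # key -> asyncio.Task currently fetching it

    async def get(self, key: str) -> bytes:
        task = self._inflight.get(key)
        if task is None:
            # First caller for this key starts the fetch...
            task = asyncio.create_task(self._fetch(key))
            self._inflight[key] = task
            # ...and the entry is removed once the fetch finishes.
            task.add_done_callback(lambda _: self._inflight.pop(key, None))
        # Every caller, first or not, waits on the same task.
        return await task
```

With this in front of origin, a thousand simultaneous misses on a newly viral segment turn into a single object-store read.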
Capacity arithmetic
For 1B watch hours/day, average bitrate ~3 Mbps:
- ~1.35 EB/day transferred (1.35 × 10^18 bytes; 3 Mbps is 0.375 MB/s, or 1.35 GB per watch hour). Most served from CDN edge cache.
- Even at 99% cache hit, that’s ~13.5 PB/day from origin → CDN. Plan accordingly.
For 50 PB/day stored, at $0.02/GB/month for hot tier, each month of uploads (about 1.5 EB) adds roughly $30M/month to the hot-storage bill. Tiering and deduplication are how the line item stays sustainable.
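These numbers are easy to get wrong by a factor of 8 (bits versus bytes), so it helps to write the conversion out. A small sketch using decimal units throughout; the function names are hypothetical and the 30-day retention figure is my assumption for "one month of uploads".

```python
SECONDS_PER_HOUR = 3600


def egress_bytes_per_day(watch_hours: float, avg_mbps: float) -> float:
    """Total bytes delivered to viewers in a day."""
    bits = watch_hours * SECONDS_PER_HOUR * avg_mbps * 1e6
    return bits / 8  # bits -> bytes


def origin_bytes_per_day(egress_bytes: float, cache_hit_ratio: float) -> float:
    """Bytes that miss the CDN and must come from origin."""
    return egress_bytes * (1 - cache_hit_ratio)


def hot_storage_cost_per_month(pb_per_day: float, days_retained: int,
                               usd_per_gb_month: float) -> float:
    """Monthly bill for keeping `days_retained` days of uploads in the hot tier."""
    gigabytes = pb_per_day * 1e6 * days_retained  # 1 PB = 1e6 GB
    return gigabytes * usd_per_gb_month
```

With the inputs from this section, `egress_bytes_per_day(1e9, 3.0)` evaluates to 1.35e18 bytes, and `hot_storage_cost_per_month(50, 30, 0.02)` to about 3e7 dollars.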
What interviewers love to dig into
- “What if a video goes viral?” → CDN auto-scales; mid-tier caches absorb origin pressure; pre-warm popular regions.
- “How do you prevent a single video from saturating CDN?” → Anycast; per-PoP rate limiting; diversion to peer caches.
- “How is comment ordering handled?” → Either chronological with cursor pagination, or “top comments” via a learned ranker on engagement features.
- “What if a transcoder fails mid-job?” → Idempotent jobs; retry with checkpointing; partial outputs discarded.
- “How do you handle takedowns / DMCA?” → Soft delete in metadata; CDN invalidation; per-region geo-blocks.
What I’d actually build today
For a small video product (1k creators, 100k viewers):
- Cloudflare Stream or Mux for video hosting + transcode + delivery (managed).
- Postgres for metadata.
- Redis for counters.
- Kafka for events.
- Postgres + pgvector for semantic search.
Skip the build-from-scratch transcoder until you outgrow managed. Mux/Stream get you to 1M users without thinking about transcoding pipelines.
Read this next
- Distributed Systems Fundamentals
- Design Twitter / News Feed — recommendation patterns.
- Caching Strategies in 2026 — the cache layer.
- Kafka vs NATS vs RabbitMQ — the event bus.
If you want a “small YouTube” reference architecture (Mux + Postgres + Redis + Hono backend), it’s at rajpoot.dev.