Should I use fanout-on-write or fanout-on-read for a news feed?

Hybrid is the production answer. Use fanout-on-write for normal users (push each tweet into all followers' feeds) and fanout-on-read for celebrities with millions of followers (merge their tweets into the feed at read time, to avoid millions of writes per tweet).

How do you rank a feed in 2026?

Most production feeds in 2026 use a learned ranker (small ML model) over a candidate pool, plus a few hard rules (fresh > old, friends > strangers, hide blocked). Pure reverse-chronological is increasingly rare for engagement-driven products.

How do you handle a million writes per second?

Partition by user_id, use a write-optimized store (Cassandra, ScyllaDB, or partitioned Postgres) for the timeline, async fanout via Kafka, and a separate read path that merges fanout output with celebrity pull. The architecture in this post handles it.

Design Twitter / News Feed — A System Design Walkthrough

“Design Twitter” is the system design interview classic for a reason. It touches every interesting tradeoff: read vs write asymmetry, fanout, caching, ranking, and the celebrity problem. Here’s how I’d actually design it.

Requirements

Functional

Post a tweet (≤280 chars).
Follow other users.
See a home timeline (tweets from people you follow).
See a profile timeline (a user’s own tweets).
Like, retweet, reply.

Non-functional

Read-heavy. Roughly 100:1 reads-to-writes typical.
Low latency for the home timeline — sub-200ms p99.
Eventual consistency acceptable — a tweet doesn’t need to appear in followers’ feeds in the same second.

Out of scope

Auth, billing, abuse detection. Pretend they exist.

Capacity

Metric	Number
MAU	500M
DAU	200M
Tweets per day	500M (avg 2.5/DAU)
Reads per day	50B (avg 250/DAU)
Avg followers per user	200
Top user followers	100M+
Avg tweet size	~1 KB (text + metadata)
Daily storage growth	~500 GB

The interesting numbers:

500M tweets/day — ~6,000 writes/sec sustained, peak ~30,000/sec.
50B reads/day — ~600,000 reads/sec sustained, peak ~3M/sec.
The average author has 200 followers → 200 timeline writes per tweet → ~6,000 × 200 = 1.2M timeline writes/sec if we naively fan out on write.
One celebrity tweet could fan out to 100M timelines. That’s a problem.
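The arithmetic behind those numbers is easy to reproduce. This is a small sketch using only the figures from the capacity table; the variable names are mine. The unrounded sustained write rate is about 5,800/sec, which the list above rounds up to ~6,000.

```python
SECONDS_PER_DAY = 86_400

tweets_per_day = 500_000_000
reads_per_day = 50_000_000_000
avg_followers = 200
avg_tweet_bytes = 1_000          # ~1 KB per tweet, from the capacity table

writes_per_sec = tweets_per_day / SECONDS_PER_DAY
reads_per_sec = reads_per_day / SECONDS_PER_DAY
naive_fanout_per_sec = writes_per_sec * avg_followers   # one timeline write per follower
storage_gb_per_day = tweets_per_day * avg_tweet_bytes / 1e9

print(f"tweet writes/sec:    {writes_per_sec:,.0f}")
print(f"timeline reads/sec:  {reads_per_sec:,.0f}")
print(f"naive fanout/sec:    {naive_fanout_per_sec:,.0f}")
print(f"storage growth/day:  {storage_gb_per_day:,.0f} GB")
```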

API

POST /api/tweet
  body: {"text": "..."}
  → 201 {"id": "...", "created_at": "..."}

GET /api/timeline/home?cursor=<opaque>
  → 200 {"tweets": [...], "next_cursor": "..."}

GET /api/timeline/user/{user_id}?cursor=<opaque>
  → 200 {"tweets": [...], "next_cursor": "..."}

POST /api/follow/{user_id}
DELETE /api/follow/{user_id}

POST /api/tweet/{id}/like
POST /api/tweet/{id}/retweet

Cursor pagination, not page-numbered. Page-numbered breaks under inserts at the head of the feed.
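To make the cursor point concrete, here is a minimal sketch. It assumes the cursor encodes the (created_at, id) of the last tweet the client saw; the encoding and the function names are illustrative, not part of the API above. Because the next page is defined as "strictly older than the cursor", new tweets arriving at the head don't shift it.

```python
import base64
import json

def encode_cursor(created_at_ts: int, tweet_id: int) -> str:
    """Pack the position of the last returned tweet into an opaque token."""
    raw = json.dumps([created_at_ts, tweet_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(cursor: str) -> tuple[int, int]:
    ts, tweet_id = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return ts, tweet_id

def page(tweets, cursor=None, limit=2):
    """tweets: list of (created_at_ts, id) tuples, newest first."""
    if cursor is not None:
        boundary = decode_cursor(cursor)
        tweets = [t for t in tweets if t < boundary]   # strictly older than the cursor
    chunk = tweets[:limit]
    next_cursor = encode_cursor(*chunk[-1]) if len(chunk) == limit else None
    return chunk, next_cursor

feed = [(105, 5), (104, 4), (103, 3), (102, 2), (101, 1)]
first_page, cursor = page(feed)
# A new tweet lands at the head between the two requests.
second_page, _ = page([(106, 6)] + feed, cursor)
print(first_page, second_page)
```

With offset-based pages, the second request would have returned (104, 4) again, because the insert pushed every row down by one.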

Storage layout

Tweet store

CREATE TABLE tweets (
    id          BIGINT PRIMARY KEY,        -- snowflake-style, time-ordered
    user_id     BIGINT NOT NULL,
    text        TEXT NOT NULL,
    media_id    UUID,
    reply_to    BIGINT,
    retweet_of  BIGINT,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX tweets_user_created ON tweets (user_id, created_at DESC);

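Partition by user_id hash for horizontal scale. A single Postgres instance can’t hold this; shard it (Citus for Postgres, Vitess if you prefer MySQL) or move to Cassandra/ScyllaDB. Note that once the table is partitioned on user_id, Postgres requires the partition key to be part of the primary key, so the key becomes (user_id, id).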
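The "snowflake-style, time-ordered" id in the schema is what lets every shard mint IDs without a central sequence. A minimal sketch follows, assuming the commonly described layout of 41 bits of milliseconds, 10 bits of worker ID, and 12 bits of per-millisecond sequence. The epoch and class name are arbitrary choices for illustration. When a millisecond's sequence is exhausted, this version borrows the next millisecond rather than waiting for the clock, which is a simplification.

```python
import threading
import time

EPOCH_MS = 1_577_836_800_000   # 2020-01-01T00:00:00Z, an arbitrary custom epoch

class SnowflakeGenerator:
    def __init__(self, worker_id: int):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = int(time.time() * 1000)
            if now < self._last_ms:
                now = self._last_ms              # clock went backwards: stay on last ms
            if now == self._last_ms:
                self._seq = (self._seq + 1) & 0xFFF
                if self._seq == 0:               # 4096 IDs used in this ms
                    now += 1                     # simplification: borrow the next ms
            else:
                self._seq = 0
            self._last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self._seq

gen = SnowflakeGenerator(worker_id=7)
ids = [gen.next_id() for _ in range(10_000)]
print(ids[0], ids[-1])
```

Because the timestamp occupies the high bits, sorting by id is the same as sorting by creation time, which is what the timeline code relies on.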

Follow store

CREATE TABLE follows (
    follower_id BIGINT NOT NULL,
    followee_id BIGINT NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (follower_id, followee_id)
);

CREATE INDEX follows_followee ON follows (followee_id, follower_id);

You query both directions:

“who do I follow?” → WHERE follower_id = ?
“who follows me?” → WHERE followee_id = ?

Each query needs its own index: the primary key (follower_id, followee_id) serves the first, and follows_followee serves the second.
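Here is a runnable sketch of the two lookups. It uses SQLite from the Python standard library as a stand-in for Postgres, with the created_at column dropped for brevity; the helper names are mine. It prints the query plans so you can check which index each direction uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE follows (
        follower_id INTEGER NOT NULL,
        followee_id INTEGER NOT NULL,
        PRIMARY KEY (follower_id, followee_id)
    );
    CREATE INDEX follows_followee ON follows (followee_id, follower_id);
""")
conn.executemany("INSERT INTO follows VALUES (?, ?)",
                 [(1, 2), (1, 3), (2, 3), (4, 3)])

WHO_DO_I_FOLLOW = "SELECT followee_id FROM follows WHERE follower_id = ? ORDER BY followee_id"
WHO_FOLLOWS_ME = "SELECT follower_id FROM follows WHERE followee_id = ? ORDER BY follower_id"

def following(user_id):
    return [row[0] for row in conn.execute(WHO_DO_I_FOLLOW, (user_id,))]

def followers(user_id):
    return [row[0] for row in conn.execute(WHO_FOLLOWS_ME, (user_id,))]

# Inspect which index SQLite picks for each direction.
for sql in (WHO_DO_I_FOLLOW, WHO_FOLLOWS_ME):
    for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (3,)):
        print(row[-1])
```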

Timeline store

This is where the design choice happens.

Fanout-on-write (push)

When user X tweets, write the tweet ID into each follower’s home timeline at write time. Reads are effectively O(1) in the number of followees: one range read of the precomputed timeline.

Tweet posted by X (200 followers):
  for f in followers(X):
    redis.zadd(f"timeline:{f}", {tweet.id: tweet.created_at_ts})   # {member: score}

Read home timeline for user U:
  ids = redis.zrevrange(f"timeline:{U}", 0, 99)
  return load_tweets(ids)

Reads are blazingly fast. Redis ZREVRANGE on a sorted set is microseconds.
Writes are heavy. A tweet with 200 followers = 200 timeline writes.
Killer problem: a user with 100M followers tweets → 100M timeline writes. That’s a few minutes of write storm even with parallelism.

Fanout-on-read (pull)

When user X tweets, only write to the tweet store. Compute the home timeline at read time by querying tweets from each followee.

Read home timeline for user U:
  followees = follows(U)
  tweets = []
  for f in followees:
    tweets += latest_tweets(f, n=100)
  return merge_sort_by_created_at(tweets)[:100]

Writes are cheap. One row per tweet, regardless of follower count.
Reads are expensive. Following 5,000 users? You query 5,000 tweet streams.
Latency spikes for users with many followees.
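The merge step in the pull model is a k-way merge over per-author streams that are already sorted. This sketch uses a hypothetical in-memory dictionary in place of the tweet store. The standard library's heapq.merge does the merge lazily, so only as many tweets are consumed as the page needs.

```python
import heapq
from itertools import islice

# Hypothetical in-memory tweet store: user -> [(created_at_ts, tweet_id)], newest first.
TWEETS_BY_USER = {
    "alice": [(105, "a3"), (102, "a2"), (100, "a1")],
    "bob":   [(104, "b2"), (101, "b1")],
    "carol": [(103, "c1")],
}

def pull_timeline(followees, limit=100):
    """Merge each followee's newest-first stream and keep the newest `limit` IDs."""
    streams = (TWEETS_BY_USER.get(f, []) for f in followees)
    merged = heapq.merge(*streams, reverse=True)   # inputs must already be newest-first
    return [tweet_id for _, tweet_id in islice(merged, limit)]

print(pull_timeline(["alice", "bob", "carol"], limit=4))
```

The cost that matters in production is not the merge itself but the one store query per followee that feeds it.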

For most apps, fanout-on-read is the wrong default. For high-fanout (celebrity-heavy) accounts, it’s the right exception.

Hybrid (the production answer)

The real design uses both:

For normal users: fanout-on-write. Tweet goes into all followers’ timelines.
For celebrities (>1M followers, configurable threshold): don’t fan out. Mark them special.
At read time for user U: read U’s precomputed timeline (fast) + pull recent tweets from each celebrity U follows + merge.

def get_home_timeline(user_id):
    # Fast path: precomputed from fanout-on-write
    base = redis.zrevrange(f"timeline:{user_id}", 0, 199)
    base_tweets = load_tweets(base)

    # Slow path: recent tweets from each celebrity the user follows
    celebs = [f for f in follows(user_id) if is_celebrity(f)]
    celeb_ids = []
    for c in celebs:
        celeb_ids += redis.zrevrange(f"profile:{c}", 0, 99)
    celeb_tweets = load_tweets(celeb_ids)

    # Merge by score (created_at), dedupe
    return merge_dedupe_by_score(base_tweets + celeb_tweets)[:100]

The threshold for “celebrity” is tunable. Twitter reportedly used something in the 10k–100k range; the >1M figure above is just one illustrative setting. Below the threshold: fanout-on-write. Above it: pull at read time.

This eliminates the celebrity write storm and keeps read latency bounded (a user follows a few hundred celebrities at most).
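The merge_dedupe_by_score helper called in get_home_timeline is not defined in this post. One possible implementation is sketched below, assuming each hydrated tweet is a dict with id and created_at_ts keys (those field names are my assumption). Deduping matters because a retweet or a reclassified account can put the same tweet in both the pushed and the pulled set.

```python
def merge_dedupe_by_score(tweets):
    """Sort newest-first by (created_at_ts, id) and drop repeated tweet IDs."""
    seen = set()
    merged = []
    ordered = sorted(tweets, key=lambda t: (t["created_at_ts"], t["id"]), reverse=True)
    for tweet in ordered:
        if tweet["id"] not in seen:
            seen.add(tweet["id"])
            merged.append(tweet)
    return merged

base_tweets = [{"id": 3, "created_at_ts": 30}, {"id": 1, "created_at_ts": 10}]
celeb_tweets = [{"id": 2, "created_at_ts": 20}, {"id": 3, "created_at_ts": 30}]
timeline = merge_dedupe_by_score(base_tweets + celeb_tweets)
print([t["id"] for t in timeline])
```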

Fanout pipeline

Tweet write:
  ↓
  Postgres / Cassandra
  ↓
  Kafka topic: tweet.created
  ↓
  Fanout consumer (many parallel)
  ↓
  Redis ZADD into each follower's timeline:{follower_id}

The fanout consumer:

import json

# Assumes `kafka`, `db`, and `redis` (redis.asyncio) clients are configured elsewhere.
async def fanout_worker():
    async for msg in kafka.consume("tweet.created"):
        tweet = json.loads(msg.value)
        if is_celebrity(tweet["user_id"]):
            continue                                  # don't fan out; readers pull these
        followers = await db.fetch_followers(tweet["user_id"])
        async with redis.pipeline(transaction=False) as p:
            for f in followers:
                key = f"timeline:{f}"
                p.zadd(key, {tweet["id"]: tweet["created_at_ts"]})
                p.zremrangebyrank(key, 0, -801)       # keep only the newest 800
            await p.execute()

Two important details:

Cap timeline length. No user reads beyond ~500 tweets back. Keep timeline:{user} bounded so memory is predictable.
Pipeline writes. A pipelined Redis call sends all the writes in one round trip.
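Two details of those points are easy to get wrong, so here is a small pure-Python sketch with no Redis required. First, an account just under the celebrity threshold can still have a very large follower list, so it is reasonable to send one pipeline per bounded batch; the chunked helper is my own. Second, ZREMRANGEBYRANK ranks are inclusive, so keeping the newest N entries means removing ranks 0 through -(N+1). The zremrangebyrank function below is a stand-in that mimics those semantics on a list sorted by ascending score.

```python
from itertools import islice

def chunked(items, size):
    """Yield lists of at most `size` items, e.g. one Redis pipeline per batch."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch

def zremrangebyrank(entries, start, stop):
    """Stand-in for Redis ZREMRANGEBYRANK on a score-ascending list (inclusive ranks)."""
    n = len(entries)
    if start < 0:
        start = max(n + start, 0)
    if stop < 0:
        stop = n + stop
    if stop < start:
        return list(entries)              # empty range: nothing removed
    return entries[:start] + entries[stop + 1:]

timeline = list(range(1000))              # 1000 entries, oldest first
capped = zremrangebyrank(timeline, 0, -801)
print(len(capped), capped[0])
print([len(b) for b in chunked(range(2500), 1000)])
```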

Ranking

Pure reverse-chronological feeds are fine for some products. Most engagement-optimized feeds use a learned ranker:

Candidates (chronological merge, top 500)
  ↓
  Per-tweet feature extraction
  (recency, engagement, follower affinity, mutual interest, ...)
  ↓
  Lightweight ML model scores each
  ↓
  Top 100 by score, returned to user

The ranker is small enough to score 500 candidates in a few ms. It runs in the timeline read path on a serving GPU/CPU pool. See Self-Hosted LLMs in 2026 for the inference patterns.

For a system design interview, knowing this exists and that the ranker is offline-trained, online-served, and feature-store-backed is enough. You don’t need to derive the model in 45 minutes.
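To show the shape of the scoring step, here is a deliberately tiny sketch: a hand-weighted linear scorer over three features. The features, weights, and names are hypothetical; a real ranker would learn its weights offline and read features from a feature store, as described above.

```python
import math
from dataclasses import dataclass

@dataclass
class Candidate:
    tweet_id: str
    age_seconds: float
    likes: int
    author_affinity: float   # 0..1, how often the viewer interacts with the author

# Hypothetical weights; a production system learns these offline.
WEIGHTS = {"recency": 2.0, "engagement": 0.5, "affinity": 3.0}

def score(c: Candidate) -> float:
    recency = math.exp(-c.age_seconds / 3600.0)   # decays over roughly an hour
    engagement = math.log1p(c.likes)              # dampen viral outliers
    return (WEIGHTS["recency"] * recency
            + WEIGHTS["engagement"] * engagement
            + WEIGHTS["affinity"] * c.author_affinity)

def rank(candidates, k=100):
    """Score every candidate and return the top k, best first."""
    return sorted(candidates, key=score, reverse=True)[:k]

candidates = [
    Candidate("stale", age_seconds=86_400, likes=3, author_affinity=0.0),
    Candidate("viral", age_seconds=86_400, likes=1_000, author_affinity=0.0),
    Candidate("friend", age_seconds=60, likes=0, author_affinity=0.9),
]
print([c.tweet_id for c in rank(candidates, k=2)])
```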

Caching

Aggressive layers of caching:

Layer	TTL	What
CDN / edge	30s for celebrity profile pages	High-traffic public timelines
Application-level cache	1–5s	Hot user metadata, follower counts
Redis	Persistent (timeline structures)	Per-user timeline, profile timeline, like counts
Postgres / Cassandra	n/a	Source of truth

Cache patterns from Caching Strategies in 2026 — single-flight on hot keys, stale-while-revalidate on profile reads.
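As an illustration of single-flight on a hot key, here is a minimal asyncio sketch. Concurrent callers asking for the same key share one in-flight load instead of each hitting the backing store. The class and the load_profile loader are hypothetical, and a production version would also need to handle errors and caller cancellation.

```python
import asyncio

class SingleFlight:
    def __init__(self):
        self._inflight = {}   # key -> asyncio.Task for the load in progress

    async def do(self, key, loader):
        task = self._inflight.get(key)
        if task is None:
            task = asyncio.create_task(loader())
            self._inflight[key] = task
            # Forget the task once it finishes so later calls reload fresh data.
            task.add_done_callback(lambda _: self._inflight.pop(key, None))
        return await task

async def demo():
    calls = 0

    async def load_profile():
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)          # pretend this is a slow database read
        return {"user": "celebrity"}

    sf = SingleFlight()
    results = await asyncio.gather(
        *(sf.do("profile:celebrity", load_profile) for _ in range(50))
    )
    return calls, results
```

Running asyncio.run(demo()) issues 50 concurrent requests for the same key; the loader should run once.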

Reads — the read path summarized

GET /timeline/home?cursor=...
  ↓
  Auth (validate session) — ~1ms
  ↓
  Read base timeline IDs from Redis ZREVRANGEBYSCORE — ~1ms
  ↓
  Pull celeb tweets + merge — ~5ms
  ↓
  Hydrate tweet metadata (Postgres / Cassandra; cached) — ~5ms
  ↓
  Rank — ~5ms
  ↓
  Return — total ~15–20ms

For p99 < 200ms the budget has plenty of headroom for fanout misses, ranker stalls, and one slow downstream.

Writes — the write path

POST /tweet
  ↓
  Auth — 1ms
  ↓
  Validate (length, content rules) — <1ms
  ↓
  Persist to tweet store — 5–20ms
  ↓
  Emit to Kafka tweet.created — 2ms
  ↓
  Return 201 — total ~10–25ms

The fanout happens asynchronously off the response. The user gets their 201 fast. Followers see the tweet seconds later (acceptable).

What if Redis dies

Hot path is Redis. Plan:

Replicas with read scaling. Reads can fail over to a replica.
Postgres / Cassandra fallback. Compute the timeline live from the tweet store. Slower (10× latency) but the service stays up.
Circuit breaker to fail fast and use the fallback after sustained errors.
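The circuit breaker in that plan can be sketched in a few lines. This is a minimal version under stated assumptions: the thresholds are hypothetical defaults, and primary and fallback stand for "read the timeline from Redis" and "compute it from the tweet store". After the reset window, one call is allowed through to probe the primary.

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def call(self, primary, fallback):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                return fallback()              # open: fail fast, skip the primary
            self._opened_at = None             # half-open: let this call probe
        try:
            result = primary()
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            return fallback()
        self._failures = 0                     # success closes the circuit
        return result

now = [0.0]                                    # fake clock for the demo
breaker = CircuitBreaker(failure_threshold=2, reset_after=10, clock=lambda: now[0])
redis_attempts = []

def broken_redis():
    redis_attempts.append(1)
    raise ConnectionError("redis down")

def from_tweet_store():
    return "timeline-from-db"

outcomes = [breaker.call(broken_redis, from_tweet_store) for _ in range(3)]
print(outcomes, len(redis_attempts))
```

The third call never touches Redis: the breaker opened after the second failure, so it goes straight to the fallback.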

Operational notes

Snowflake IDs for tweets — sortable, partition-friendly, no central coordinator.
Outbox pattern to ensure tweet write + Kafka emit are both-or-neither. See Idempotency, Retries, and Exactly-Once Illusions.
Backpressure on the fanout consumers — if Redis is slow, the queue grows; alert before it explodes.
Eventual deletion of tweets requires garbage-collecting timeline entries that point to deleted tweets. Tombstones + lazy expiry, not synchronous delete-fanout.
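The outbox pattern from the list above is sketched here, using SQLite from the standard library as a stand-in for the tweet store and a plain callback in place of a Kafka producer; both are assumptions, as are the table layout and function names. The tweet row and the outbox row commit in one transaction, and a separate relay publishes whatever is pending.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweets (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, text TEXT NOT NULL);
    CREATE TABLE outbox (
        seq       INTEGER PRIMARY KEY AUTOINCREMENT,
        topic     TEXT NOT NULL,
        payload   TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")

def post_tweet(tweet_id, user_id, text):
    with conn:   # one transaction: both rows commit, or neither does
        conn.execute("INSERT INTO tweets VALUES (?, ?, ?)", (tweet_id, user_id, text))
        conn.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)",
                     ("tweet.created", json.dumps({"id": tweet_id, "user_id": user_id})))

def relay(publish):
    """Publish pending outbox rows in order; returns how many were sent."""
    rows = conn.execute(
        "SELECT seq, topic, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, topic, payload in rows:
        publish(topic, payload)   # a crash before the UPDATE means a re-send later
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)

sent = []
post_tweet(1, 42, "hello")
first = relay(lambda topic, payload: sent.append((topic, payload)))
second = relay(lambda topic, payload: sent.append((topic, payload)))
print(first, second, sent)
```

Delivery is at-least-once, so the fanout consumer has to be idempotent. ZADD with the same member and score already is.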

What interviewers love to dig into

“What happens if a celebrity goes from 1M to 10M followers overnight?” → Configurable threshold; reclassify; switch to pull-at-read for them; bulk-clear their fanout entries from existing timelines if you care about consistency.
“How do you handle blocked / muted users?” → Filter at read time, not at fanout. Fanout would be wrong if the muting is added after the fanout occurred.
“What if you need to delete a tweet?” → Soft-delete in tweet store; readers skip soft-deleted IDs. Garbage-collect from timelines later.
“How do you handle replies and threading?” → Replies are tweets with reply_to. Threading is a separate read API that walks the reply graph; cache hot threads.
“What about edits?” → An edit is just an UPDATE on the tweet row. Because timelines store only tweet IDs, readers pick up the new text the next time the tweet is hydrated; no re-fanout is needed, only invalidation of the cached tweet body.
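The read-time filtering in the blocked/muted and deletion answers above comes down to one pass over the hydrated candidates. A minimal sketch, assuming tweets are dicts with id and user_id keys and that the viewer's block/mute set and the soft-deleted IDs have already been loaded (all names are illustrative):

```python
def visible(tweets, blocked_or_muted, deleted_ids):
    """Drop tweets from blocked/muted authors and soft-deleted tweets at read time."""
    return [
        t for t in tweets
        if t["user_id"] not in blocked_or_muted and t["id"] not in deleted_ids
    ]

candidates = [
    {"id": 10, "user_id": "alice"},
    {"id": 11, "user_id": "spammer"},   # muted after the tweet was already fanned out
    {"id": 12, "user_id": "bob"},       # soft-deleted, ID still sitting in timelines
    {"id": 13, "user_id": "bob"},
]
shown = visible(candidates, blocked_or_muted={"spammer"}, deleted_ids={12})
print([t["id"] for t in shown])
```

Filtering here rather than at fanout means a mute takes effect on the next read, with nothing to rewrite in the stored timelines.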

What I’d actually build today

For a small-to-mid scale (1M users):

Postgres with partitioning for tweets and follows.
Redis cluster for timelines.
Kafka (or NATS JetStream; see Kafka vs NATS vs RabbitMQ) for fanout.
A small Go service for the timeline read path.
Cloudflare in front for static and DDoS.

For Twitter scale (500M users) the above evolves to Cassandra/ScyllaDB for tweets, Vitess for follows, and a multi-region replication strategy. The core architecture stays the same.

Read this next

If you want a worked-out hybrid-fanout proof-of-concept (Postgres + Redis + Kafka + Python workers), it’s at rajpoot.dev.

