“Design Twitter” is the system design interview classic for a reason. It touches every interesting tradeoff: read vs write asymmetry, fanout, caching, ranking, and the celebrity problem. Here’s how I’d actually design it.

Requirements

Functional

  • Post a tweet (≤280 chars).
  • Follow other users.
  • See a home timeline (tweets from people you follow).
  • See a profile timeline (a user’s own tweets).
  • Like, retweet, reply.

Non-functional

  • Read-heavy: roughly 100 reads for every write.
  • Low latency for the home timeline — sub-200ms p99.
  • Eventual consistency acceptable — a tweet doesn’t need to appear in followers’ feeds in the same second.

Out of scope

  • Auth, billing, abuse detection. Pretend they exist.

Capacity

Metric                     Number
MAU                        500M
DAU                        200M
Tweets per day             500M (avg 2.5/user)
Reads per day              50B (avg 250/DAU)
Avg followers per user     200
Top user followers         100M+
Avg tweet size             ~1 KB (text + metadata)
Daily storage growth       ~500 GB

The interesting numbers:

  • 500M tweets/day — ~6,000 writes/sec sustained, peak ~30,000/sec.
  • 50B reads/day — ~600,000 reads/sec sustained, peak ~3M/sec.
  • The average author has 200 followers → 200 timeline writes per tweet → ~1.2M timeline writes/sec (6,000 × 200) if we naively fan out on write.
  • One celebrity tweet could fan out to 100M timelines. That’s a problem.
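
The arithmetic behind those bullets is easy to re-run when an interviewer changes an input. A small sketch using only the figures from the capacity table (the variable names are mine):

```python
SECONDS_PER_DAY = 86_400

tweets_per_day = 500_000_000
reads_per_day = 50_000_000_000
avg_followers = 200
avg_tweet_bytes = 1_000  # ~1 KB of text + metadata

writes_per_sec = tweets_per_day / SECONDS_PER_DAY
reads_per_sec = reads_per_day / SECONDS_PER_DAY
fanout_writes_per_sec = writes_per_sec * avg_followers
storage_gb_per_day = tweets_per_day * avg_tweet_bytes / 1e9

print(f"tweet writes/sec:     {writes_per_sec:,.0f}")
print(f"timeline reads/sec:   {reads_per_sec:,.0f}")
print(f"naive fanout writes:  {fanout_writes_per_sec:,.0f}")
print(f"storage growth (GB):  {storage_gb_per_day:,.0f}")
```

The exact sustained values come out slightly below the rounded figures above (about 5,787 writes/sec rather than 6,000), which is fine for back-of-envelope sizing.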

API

POST /api/tweet
  body: {"text": "..."}
  → 201 {"id": "...", "created_at": "..."}

GET /api/timeline/home?cursor=<opaque>
  → 200 {"tweets": [...], "next_cursor": "..."}

GET /api/timeline/user/{user_id}?cursor=<opaque>
  → 200 {"tweets": [...], "next_cursor": "..."}

POST /api/follow/{user_id}
DELETE /api/follow/{user_id}

POST /api/tweet/{id}/like
POST /api/tweet/{id}/retweet

Cursor pagination, not page numbers. Page numbers shift whenever new tweets land at the head of the feed, so a client paging through sees duplicates or skips entries.
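
One way to build the opaque cursor, as a minimal sketch: encode the ID of the last tweet the client saw and return only older tweets on the next call. The names `encode_cursor`, `decode_cursor`, `page`, and the `max_id` field are illustrative assumptions, not part of the API above.

```python
import base64
import json

def encode_cursor(last_tweet_id: int) -> str:
    # Opaque to the client: base64 of a small JSON payload.
    raw = json.dumps({"max_id": last_tweet_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(cursor: str) -> int:
    raw = base64.urlsafe_b64decode(cursor.encode())
    return int(json.loads(raw)["max_id"])

def page(tweet_ids_desc, cursor=None, limit=3):
    """Return (ids, next_cursor) from a newest-first list of time-ordered IDs."""
    if cursor is not None:
        max_id = decode_cursor(cursor)
        tweet_ids_desc = [t for t in tweet_ids_desc if t < max_id]
    ids = tweet_ids_desc[:limit]
    next_cursor = encode_cursor(ids[-1]) if len(ids) == limit else None
    return ids, next_cursor

feed = [50, 40, 30, 20, 10]
first, cursor = page(feed)
feed = [60] + feed              # a new tweet arrives at the head
second, _ = page(feed, cursor)  # still continues from where page one ended
```

Because the cursor pins a position in ID order rather than an offset, the insert at the head does not shift the second page.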

Storage layout

Tweet store

CREATE TABLE tweets (
    id          BIGINT PRIMARY KEY,        -- snowflake-style, time-ordered
    user_id     BIGINT NOT NULL,
    text        TEXT NOT NULL,
    media_id    UUID,
    reply_to    BIGINT,
    retweet_of  BIGINT,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX tweets_user_created ON tweets (user_id, created_at DESC);

Partitioned by a hash of user_id for horizontal scale, which keeps each user’s tweets on one shard so the profile-timeline query stays single-shard. A single Postgres node can’t hold this; shard it (Citus for Postgres, Vitess for MySQL) or move to Cassandra/ScyllaDB.
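
To make the routing concrete, here is a minimal sketch of hash partitioning in application code. The shard count and the `shard_for_user` helper are assumptions for illustration; Citus, Vitess, and Cassandra each do this routing for you with their own hash functions.

```python
import hashlib

NUM_SHARDS = 64  # assumed shard count, for illustration only

def shard_for_user(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    # Use a stable hash: Python's built-in hash() is not stable across processes for all types.
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo hashing remaps most keys when the shard count changes, so real deployments usually layer consistent hashing or a shard map on top.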

Follow store

CREATE TABLE follows (
    follower_id BIGINT NOT NULL,
    followee_id BIGINT NOT NULL,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (follower_id, followee_id)
);

CREATE INDEX follows_followee ON follows (followee_id, follower_id);

You query both directions:

  • “who do I follow?” → WHERE follower_id = ?
  • “who follows me?” → WHERE followee_id = ?

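Each direction needs its own index: the primary key (follower_id, followee_id) serves the first query, and follows_followee serves the second.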
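
Both lookups can be tried against an in-memory SQLite database. This is only a sketch: the schema is simplified (no created_at, SQLite types instead of Postgres types), and the `following` and `followers` helpers are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE follows (
    follower_id INTEGER NOT NULL,
    followee_id INTEGER NOT NULL,
    PRIMARY KEY (follower_id, followee_id)
);
CREATE INDEX follows_followee ON follows (followee_id, follower_id);
""")
conn.executemany(
    "INSERT INTO follows (follower_id, followee_id) VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 3), (4, 3)],
)

def following(user_id):
    # "Who do I follow?" is served by the primary key prefix.
    rows = conn.execute(
        "SELECT followee_id FROM follows WHERE follower_id = ? ORDER BY followee_id",
        (user_id,),
    )
    return [row[0] for row in rows]

def followers(user_id):
    # "Who follows me?" is served by the follows_followee index.
    rows = conn.execute(
        "SELECT follower_id FROM follows WHERE followee_id = ? ORDER BY follower_id",
        (user_id,),
    )
    return [row[0] for row in rows]
```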

Timeline store

This is where the design choice happens.

Fanout-on-write (push)

When user X tweets, write the tweet ID into each follower’s home timeline at write time. Reads are cheap: one range read of the precomputed timeline, no matter how many accounts the reader follows.

def on_tweet_posted(tweet, author_id):      # e.g. X with 200 followers
    for f in followers(author_id):
        # redis-py signature: zadd(name, {member: score}); the score must be numeric
        redis.zadd(f"timeline:{f}", {tweet.id: tweet.created_at_ts})

def read_home_timeline(user_id):
    ids = redis.zrevrange(f"timeline:{user_id}", 0, 99)   # newest 100 tweet IDs
    return load_tweets(ids)

  • Reads are fast. ZREVRANGE on a sorted set is O(log N + M), typically well under a millisecond server-side.
  • Writes are heavy. A tweet with 200 followers = 200 timeline writes.
  • Killer problem: a user with 100M followers tweets → 100M timeline writes. That’s a few minutes of write storm even with parallelism.

Fanout-on-read (pull)

When user X tweets, only write to the tweet store. Compute the home timeline at read time by querying tweets from each followee.

def read_home_timeline(user_id):
    followees = follows(user_id)
    tweets = []
    for f in followees:
        tweets += latest_tweets(f, n=100)
    return merge_sort_by_created_at(tweets)[:100]

  • Writes are cheap. One row per tweet, regardless of follower count.
  • Reads are expensive. Following 5,000 users? You query 5,000 tweet streams.
  • Latency spikes for users with many followees.
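
The merge step of the pull model does not need a full sort. If each followee’s stream is already newest-first, a k-way merge yields the top entries lazily. A minimal sketch with the standard library; `merge_newest_first` and the (timestamp, tweet_id) tuple shape are assumptions for illustration:

```python
import heapq
from itertools import islice

def merge_newest_first(streams, limit=100):
    """streams: iterables of (created_at_ts, tweet_id), each sorted newest-first."""
    # reverse=True tells heapq.merge the inputs are in descending order.
    merged = heapq.merge(*streams, reverse=True)
    return list(islice(merged, limit))

alice = [(30, "a3"), (10, "a1")]
bob = [(25, "b2"), (20, "b1")]
top3 = merge_newest_first([alice, bob], limit=3)
```

This trims the CPU cost of merging, but the dominant cost is still the per-followee fetch, which is why pull stays expensive for users who follow thousands of accounts.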

For most apps, fanout-on-read is the wrong default. For high-fanout (celebrity-heavy) accounts, it’s the right exception.

Hybrid (the production answer)

The real design uses both:

  1. For normal users: fanout-on-write. Tweet goes into all followers’ timelines.
  2. For celebrities (>1M followers, configurable threshold): don’t fan out. Mark them special.
  3. At read time for user U: read U’s precomputed timeline (fast) + pull recent tweets from each celebrity U follows + merge.

def get_home_timeline(user_id):
    # Fast path: IDs precomputed by fanout-on-write
    base_ids = redis.zrevrange(f"timeline:{user_id}", 0, 199)

    # Slow path: pull recent tweet IDs from each celebrity the user follows
    celebs = [f for f in follows(user_id) if is_celebrity(f)]
    celeb_ids = []
    for c in celebs:
        celeb_ids += redis.zrevrange(f"profile:{c}", 0, 99)

    # Hydrate once, then merge by score (created_at) and dedupe
    tweets = load_tweets(base_ids + celeb_ids)
    return merge_dedupe_by_score(tweets)[:100]

The threshold for “celebrity” is tunable, and the >1M figure above is on the conservative side; Twitter reportedly used something closer to 10k–100k. Below the threshold: fanout-on-write. Above it: pull at read time.

This eliminates the celebrity write storm and keeps read latency bounded, since a user follows a few hundred celebrities at most.
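
The `merge_dedupe_by_score` helper in the read path above is left undefined. One possible shape, as a sketch: the `Tweet` dataclass and its fields are assumptions, and deduping matters because a tweet can show up in both the fanned-out timeline and a celebrity profile (for example, right after an account is reclassified).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tweet:
    id: int
    created_at_ts: float

def merge_dedupe_by_score(tweets):
    """Return tweets newest-first with one entry per tweet ID."""
    unique = {t.id: t for t in tweets}
    # Tie-break on ID so the order is deterministic for equal timestamps.
    return sorted(unique.values(), key=lambda t: (t.created_at_ts, t.id), reverse=True)

base = [Tweet(3, 30.0), Tweet(1, 10.0)]
celeb = [Tweet(2, 20.0), Tweet(3, 30.0)]   # tweet 3 appears in both sources
merged = merge_dedupe_by_score(base + celeb)
```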

Fanout pipeline

Tweet write
  → Postgres / Cassandra
  → Kafka topic: tweet.created
  → Fanout consumer (many parallel)
  → Redis ZADD into each follower's timeline:{follower_id}

The fanout consumer:

import json

async def fanout_worker():
    # kafka, db, and redis are assumed to be pre-configured async clients
    async for msg in kafka.consume("tweet.created"):
        tweet = json.loads(msg.value)
        if is_celebrity(tweet["user_id"]):
            continue                                  # celebrities are pulled at read time
        followers = await db.fetch_followers(tweet["user_id"])
        async with redis.pipeline(transaction=False) as p:
            for f in followers:
                # member = tweet ID, score = creation timestamp
                p.zadd(f"timeline:{f}", {tweet["id"]: tweet["created_at_ts"]})
                # keep the newest 800: remove ranks 0 through -801 (the oldest entries)
                p.zremrangebyrank(f"timeline:{f}", 0, -801)
            await p.execute()

Two important details:

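  • Cap timeline length. Almost no one scrolls more than ~500 tweets back, so keep timeline:{user} bounded (800 entries here) and memory stays predictable.
  • Pipeline writes. A pipelined Redis call batches many writes into one round trip. For authors with many followers, send the pipeline in chunks rather than as one giant batch.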
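
The trim index is easy to get off by one. ZREMRANGEBYRANK removes an inclusive range of ranks counted from the oldest entry, so keeping the newest N entries means removing ranks 0 through -(N+1). A plain-Python sketch of that semantics, with a list standing in for the sorted set (`zadd_capped` is a hypothetical helper, not a Redis call):

```python
import bisect

def zadd_capped(timeline, score, member, cap=800):
    """timeline: list of (score, member) kept ascending by score, like Redis ranks."""
    bisect.insort(timeline, (score, member))
    # Same effect as ZREMRANGEBYRANK key 0 -(cap+1): keep only the last `cap` entries.
    excess = len(timeline) - cap
    if excess > 0:
        del timeline[:excess]
    return timeline

timeline = []
for ts in range(1, 11):
    zadd_capped(timeline, ts, f"t{ts}", cap=3)
```

Unlike a real sorted set, this list does not dedupe members; it only illustrates the rank arithmetic.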

Ranking

Pure reverse-chronological feeds are fine for some products. Most engagement-optimized feeds use a learned ranker:

Candidates (chronological merge, top 500)
  → Per-tweet feature extraction
    (recency, engagement, follower affinity, mutual interest, ...)
  → Lightweight ML model scores each
  → Top 100 by score, returned to user

The ranker is small enough to score 500 candidates in a few ms. It runs in the timeline read path on a serving GPU/CPU pool. See Self-Hosted LLMs in 2026 for the inference patterns.

For a system design interview, knowing this exists and that the ranker is offline-trained, online-served, and feature-store-backed is enough. You don’t need to derive the model in 45 minutes.
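
To show the shape of the scoring step, here is a toy sketch with hand-set linear weights standing in for the trained model. Everything in it is an assumption for illustration: the `Candidate` fields, the weights, and the 6-hour recency half-life. A real ranker would load learned parameters and read features from a feature store.

```python
import math
from dataclasses import dataclass

@dataclass
class Candidate:
    tweet_id: int
    age_seconds: float
    likes: int
    author_affinity: float  # 0..1, how often the reader engages with this author

# Hypothetical weights; a trained model would replace these.
W_RECENCY, W_ENGAGEMENT, W_AFFINITY = 1.0, 0.5, 2.0
HALF_LIFE_SECONDS = 6 * 3600

def score(c: Candidate) -> float:
    recency = 0.5 ** (c.age_seconds / HALF_LIFE_SECONDS)   # exponential decay
    engagement = math.log1p(c.likes)                        # dampen viral outliers
    return W_RECENCY * recency + W_ENGAGEMENT * engagement + W_AFFINITY * c.author_affinity

def rank(candidates, k=100):
    return sorted(candidates, key=score, reverse=True)[:k]

fresh = Candidate(1, age_seconds=60, likes=0, author_affinity=0.0)
old_popular = Candidate(2, age_seconds=86_400, likes=1000, author_affinity=0.0)
close_friend = Candidate(3, age_seconds=3_600, likes=2, author_affinity=0.9)
ranked = rank([fresh, old_popular, close_friend])
```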

Caching

Aggressive layers of caching:

Layer                     TTL                                 What
CDN / edge                30s for celebrity profile pages     High-traffic public timelines
Application-level cache   1–5s                                Hot user metadata, follower counts
Redis                     Persistent (timeline structures)    Per-user timeline, profile timeline, like counts
Postgres / Cassandra      n/a                                 Source of truth

The cache patterns come from Caching Strategies in 2026: single-flight on hot keys and stale-while-revalidate on profile reads.
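
Single-flight means that when many requests miss on the same hot key at once, only one of them hits the backing store and the rest wait for its result. A minimal asyncio sketch; the `SingleFlight` class and the `profile:42` key are illustrative, not taken from a specific library:

```python
import asyncio

class SingleFlight:
    """Coalesce concurrent loads of the same key into one call."""

    def __init__(self):
        self._in_flight = {}

    async def do(self, key, loader):
        task = self._in_flight.get(key)
        if task is None:
            # First caller starts the load; later callers share the same task.
            task = asyncio.create_task(loader())
            self._in_flight[key] = task
            task.add_done_callback(lambda _: self._in_flight.pop(key, None))
        return await task

async def demo():
    calls = 0

    async def load_profile():
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)   # stand-in for a database read
        return {"user_id": 42, "followers": 200}

    sf = SingleFlight()
    results = await asyncio.gather(*(sf.do("profile:42", load_profile) for _ in range(10)))
    return calls, results

calls, results = asyncio.run(demo())
```

One caveat of this simple version: cancelling any one waiter cancels the shared task, so production code would typically wrap the await in `asyncio.shield`.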

Reads — the read path summarized

GET /api/timeline/home?cursor=...
  Auth (validate session) — ~1ms
  Read base timeline IDs from Redis ZREVRANGEBYSCORE — ~1ms
  Pull celeb tweets + merge — ~5ms
  Hydrate tweet metadata (Postgres / Cassandra; cached) — ~5ms
  Rank — ~5ms
  Return — total ~15–20ms

For p99 < 200ms the budget has plenty of headroom for fanout misses, ranker stalls, and one slow downstream.

Writes — the write path

POST /api/tweet
  Auth — 1ms
  Validate (length, content rules) — <1ms
  Persist to tweet store — 5–20ms
  Emit to Kafka tweet.created — 2ms
  Return 201 — total ~10–25ms

The fanout happens asynchronously off the response. The user gets their 201 fast. Followers see the tweet seconds later (acceptable).

What if Redis dies

Hot path is Redis. Plan:

  • Replicas with read scaling. Reads can fail over to a replica.
  • Postgres / Cassandra fallback. Compute the timeline live from the tweet store. Slower (10× latency) but the service stays up.
  • Circuit breaker to fail fast and use the fallback after sustained errors.
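
A circuit breaker for this fallback can be quite small. The sketch below is one possible shape, not a production implementation: the thresholds, the injectable clock, and the `call(primary, fallback)` interface are all assumptions.

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, primary, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()       # open: skip the primary and fail fast
            self.opened_at = None       # half-open: let one trial call through
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0               # success closes the breaker
        return result
```

In the read path, `primary` would be the Redis timeline read and `fallback` the live computation from the tweet store.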

Operational notes

  • Snowflake IDs for tweets — sortable, partition-friendly, no central coordinator.
  • Outbox pattern to ensure tweet write + Kafka emit are both-or-neither. See Idempotency, Retries, and Exactly-Once Illusions.
  • Backpressure on the fanout consumers — if Redis is slow, the queue grows; alert before it explodes.
  • Eventual deletion of tweets requires garbage-collecting timeline entries that point to deleted tweets. Tombstones + lazy expiry, not synchronous delete-fanout.
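
For reference, a snowflake-style generator fits in a few lines. This sketch uses the commonly described layout of 41 bits of millisecond timestamp, 10 bits of worker ID, and 12 bits of per-millisecond sequence. The epoch constant is an arbitrary assumption, and worker IDs still have to be assigned uniquely once at deploy time, even though no coordinator is needed per ID.

```python
import threading
import time

EPOCH_MS = 1_600_000_000_000  # arbitrary custom epoch, assumed for this sketch

class SnowflakeGenerator:
    def __init__(self, worker_id, clock_ms=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.clock_ms = clock_ms
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF
                if self.sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self.clock_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence
```

Because the timestamp occupies the high bits, sorting by ID sorts by creation time, which is what lets the tweet ID double as a pagination cursor.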

What interviewers love to dig into

  • “What happens if a celebrity goes from 1M to 10M followers overnight?” → Configurable threshold; reclassify; switch to pull-at-read for them; bulk-clear their fanout entries from existing timelines if you care about consistency.
  • “How do you handle blocked / muted users?” → Filter at read time, not at fanout. A mute added after fanout would not remove tweets already written into the timeline, so filtering only at fanout gives wrong results.
  • “What if you need to delete a tweet?” → Soft-delete in tweet store; readers skip soft-deleted IDs. Garbage-collect from timelines later.
  • “How do you handle replies and threading?” → Replies are tweets with reply_to. Threading is a separate read API that walks the reply graph; cache hot threads.
  • “What about edits?” → Timelines store only tweet IDs, so an edit is just an UPDATE on the tweet row. Readers pick up the new text the next time the tweet is hydrated, subject to cache TTLs.
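
The mute and delete answers share one mechanism: a filter applied after hydration, at read time. A minimal sketch, assuming hydrated tweets are dicts with `id` and `user_id` keys (`filter_timeline` is an illustrative name):

```python
def filter_timeline(tweets, muted_or_blocked, deleted_ids):
    """Drop tweets from muted/blocked authors and soft-deleted tweets at read time."""
    return [
        t for t in tweets
        if t["user_id"] not in muted_or_blocked and t["id"] not in deleted_ids
    ]

tweets = [
    {"id": 1, "user_id": 10},
    {"id": 2, "user_id": 11},   # author muted after this tweet was fanned out
    {"id": 3, "user_id": 12},   # soft-deleted after fanout
]
visible = filter_timeline(tweets, muted_or_blocked={11}, deleted_ids={3})
```

Since the filter runs on every read, a new mute or delete takes effect immediately, with no need to rewrite any precomputed timeline.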

What I’d actually build today

For small-to-mid scale (~1M users):

  • Postgres with partitioning for tweets and follows.
  • Redis cluster for timelines.
  • Kafka (or NATS JetStream; see Kafka vs NATS vs RabbitMQ) for fanout.
  • A small Go service for the timeline read path.
  • Cloudflare in front for static and DDoS.

For Twitter scale (500M users) the above evolves to Cassandra/ScyllaDB for tweets, Vitess for follows, and a multi-region replication strategy. The core architecture stays the same.

Read this next

If you want a worked-out hybrid-fanout proof-of-concept (Postgres + Redis + Kafka + Python workers), it’s at rajpoot.dev.

