Postgres or Cassandra for messages?

Postgres up to ~100M messages/day with partitioning. Cassandra/ScyllaDB for write-heavy at higher scale (WhatsApp, Discord-class). Most apps never need to graduate.

WebSocket or polling?

WebSocket for active conversations. SSE for read-only feeds. Polling as fallback for restrictive networks. Hybrid is common: WS first, fall back to long-polling.

Design a Chat System — From DM to a Million Users

Chat systems trick teams into thinking they’re simple. Then a million-member group, offline delivery, read receipts, end-to-end encryption, and global users meet you. This post is the working design at every scale.

Phase 1: simple chat

CREATE TABLE messages (
    id            bigserial PRIMARY KEY,
    conversation_id bigint NOT NULL,
    sender_id     bigint NOT NULL,
    body          text NOT NULL,
    created_at    timestamptz DEFAULT now()
);
CREATE INDEX ON messages (conversation_id, created_at);

CREATE TABLE conversation_members (
    conversation_id bigint,
    user_id         bigint,
    last_read_at    timestamptz,
    PRIMARY KEY (conversation_id, user_id)
);

Send: insert. Fetch: query by conversation_id. WebSocket pushes new messages. Up to ~100k MAU on a single Postgres.

WebSocket fanout

# Each user connects via WS
connections: dict[int, set[WebSocket]] = defaultdict(set)

async def on_connect(user_id, ws):
    connections[user_id].add(ws)

async def send_message(conv_id, msg):
    # Persist
    await db.execute(...)
    # Fanout
    members = await db.fetch("SELECT user_id FROM conversation_members WHERE conversation_id = $1", conv_id)
    for m in members:
        for ws in connections[m["user_id"]]:
            await ws.send_json({"type": "msg", "msg": msg})

Single-process. For multi-process: pub/sub (Redis) between processes.

Multi-process fanout

[Client A] ──WS──▶ [App-1]
[Client B] ──WS──▶ [App-2]
                       │
                  Both subscribe to Redis pub/sub

# On send
await redis.publish(f"conv:{conv_id}", json.dumps(msg))

# On WS server
async for message in redis.subscribe(f"conv:{conv_id}"):
    for ws in conv_connections[conv_id]:
        await ws.send_text(message)

Now any app process delivers messages to its locally-connected clients. See Django Channels and FastAPI Streaming .

Phase 2: scale the message store

-- Partition by month
CREATE TABLE messages (...) PARTITION BY RANGE (created_at);
CREATE TABLE messages_2026_05 PARTITION OF messages FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');

See Postgres Partitioning . Drops old partitions; keeps hot partitions tiny.

For really high write rates: ScyllaDB / Cassandra. Discord migrated to ScyllaDB at billion-message scale; before that, MongoDB and Cassandra. Most apps don’t reach this.

Conversation list

SELECT
    c.id,
    c.last_message_at,
    (SELECT body FROM messages WHERE conversation_id = c.id ORDER BY created_at DESC LIMIT 1) AS preview,
    cm.last_read_at
FROM conversations c
JOIN conversation_members cm ON cm.conversation_id = c.id AND cm.user_id = $1
ORDER BY c.last_message_at DESC
LIMIT 50;

Common N+1 trap. Either denormalize last_message_id and last_message_at on the conversation row, or use a separate “conversation_summary” view that’s kept up to date.

Read receipts / unread counts

-- Simple
SELECT COUNT(*) FROM messages
WHERE conversation_id = $1 AND created_at > $2;

For high-scale: maintain a counter per (user, conversation) updated on each new message, decremented on mark-read.

Presence

async def on_connect(user_id):
    await redis.set(f"presence:{user_id}", "online", ex=60)
    asyncio.create_task(heartbeat(user_id))

async def heartbeat(user_id):
    while connected:
        await redis.expire(f"presence:{user_id}", 60)
        await asyncio.sleep(30)

async def on_disconnect(user_id):
    await redis.delete(f"presence:{user_id}")

Auto-expires if user crashes. For status to friends: use Redis pub/sub on presence:user:* events.

Offline delivery

CREATE TABLE pending_deliveries (
    user_id    bigint,
    message_id bigint,
    created_at timestamptz DEFAULT now(),
    PRIMARY KEY (user_id, message_id)
);

If user is offline when message sent: insert to pending_deliveries. On connect: deliver pending. On ack: delete.

Or more simply: client tracks last-seen-message-id; on connect, fetch all messages > that id.

Push notifications

When a user is offline (no WS, no presence) and gets a message: push via APNs / FCM.

async def deliver(user_id, msg):
    if user_id in connections:
        for ws in connections[user_id]:
            await ws.send_json(msg)
    else:
        await push_notification(user_id, msg)

See Notification System .

Group messages

For groups of N members, fanout is N writes to deliver-states. For a 100k-member group, this is significant.

Options:

Pull model: clients fetch new messages on connect. Cheap to write; requires polling/connect.
Push model with capped fanout: push to active users; offline users see messages on connect.
Topic-based (Discord): clients subscribe to channels; servers fanout per channel.

End-to-end encryption (E2EE)

For Signal-class privacy:

Client-side keys (X3DH, double ratchet).
Server stores ciphertext only.
Search and previews become impossible (or local-only).

Heavy lift. For most chat apps: TLS in transit + at-rest encryption is enough. E2EE only when privacy is the product.

Search

For “find that message about X”:

Postgres FTS for moderate scale.
Elasticsearch / Meilisearch for big.
Per-user search (each user’s index) for E2EE.

See Search System Design .

Common mistakes

1. WebSocket alone, no offline delivery

User goes offline; misses messages; they never appear when they reconnect. Always have a persistent message log + sync on connect.

2. Single-process WS

Three replicas; messages from app-1’s WS users don’t reach app-2’s WS users. Use pub/sub.

3. No pagination on history

SELECT * FROM messages WHERE conversation_id = X returns 1M messages. Use cursor pagination.

4. Polling everything

CPU + DB load explodes. WebSocket for active; HTTP for sync.

5. Sync push notifications

Send-API → push API in foreground. Push API timeout → message slow. Always async.

What I’d ship today

For a new chat app (≤1M MAU):

Postgres for messages with monthly partitions.
WebSocket + Redis pub/sub for fanout.
Presence in Redis.
Push notifications for offline.
Cursor pagination for history.
Postgres FTS for search initially.
TLS in transit + at-rest encryption; not E2EE unless required.
Last-message denormalization on conversation row.

Read this next

If you want my chat reference architecture (FastAPI + Postgres + Redis), it’s at rajpoot.dev .

Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .

Phase 1: simple chat#

WebSocket fanout#

Multi-process fanout#

Phase 2: scale the message store#

Conversation list#

Read receipts / unread counts#

Presence#

Offline delivery#

Push notifications#

Group messages#

End-to-end encryption (E2EE)#

Search#

Common mistakes#

1. WebSocket alone, no offline delivery#

2. Single-process WS#

3. No pagination on history#

4. Polling everything#

5. Sync push notifications#

What I’d ship today#

Read this next#