A distributed cache is the system design interview classic. Even if you’ll never build one, designing it teaches you consistent hashing, replication, and the cache-coherence tradeoffs that show up everywhere.

Requirements

  • Get / set / delete by key.
  • Sub-millisecond latency.
  • Survive node failure.
  • Scale horizontally.
  • Eviction when full.

Architecture

Client SDK (consistent-hashes key → server)
Cache nodes (in-memory store)
(optional) Replica per shard for HA

Stateless clients. Stateful servers. No central coordinator.

Consistent hashing

Naive: shard = hash(key) % N. Adding a node moves nearly all keys.

Consistent hash: keys mapped to a virtual ring; each node owns an arc. Adding a node moves only its arc’s worth of keys.

class ConsistentHash:
    def __init__(self, nodes, virtual=150):
        self.ring = {}
        for node in nodes:
            for v in range(virtual):
                key = hash(f"{node}:{v}")
                self.ring[key] = node
        self.sorted_keys = sorted(self.ring)

    def get(self, key):
        h = hash(key)
        # find first ring position >= h
        for k in self.sorted_keys:
            if k >= h:
                return self.ring[k]
        return self.ring[self.sorted_keys[0]]   # wrap

Virtual nodes (150 per real node) smooth out distribution.

Replication

For HA: each shard has a primary + 1–2 replicas. On primary failure, a replica is promoted.

For Memcached-style ephemeral cache: often no replication. A node failure → those keys are lost; clients refetch from origin. Cheap and simple.

For Redis-style: Sentinel or Cluster handles failover.

Eviction policies

When memory fills:

  • LRU (least recently used): evict the oldest-touched key. Simple; good for typical caches.
  • LFU (least frequently used): evict the least-used key. Better for skewed access.
  • FIFO: oldest first. Rare.
  • Random: pick a random key. Simple; surprisingly effective.

Memcached uses slab-based LRU. Redis exposes policies via maxmemory-policy .

TTL handling

Each entry has an expiry timestamp. Active expiration (sweep periodically) + lazy expiration (check on access). Both Memcached and Redis use a hybrid.

Hot keys

A single key receives 90% of traffic; one node bears the load. Solutions:

  • Client-side cache (small in-process LRU) catches most reads before the server.
  • Replicate hot keys to multiple nodes; client picks randomly.
  • Sub-shard a hot key: key:{key}:{0..N}; aggregate.

Same idea as Design a Leaderboard at Scale .

Cache stampede

Hot key expires; thousands of concurrent misses hit origin. Mitigations:

  • Single-flight: only one fetch per key in flight; others wait.
  • Probabilistic early refresh: refresh near (not at) TTL.
  • Stale-while-revalidate: serve stale; refresh in background.

For the patterns see Caching Strategies in 2026 .

Capacity

For a typical cache:

  • Per-node: 64–256 GB RAM, 32–64 vCPUs.
  • Items: ~10–100M per node.
  • Throughput: 100k–1M ops/sec/node.

A 16-node cluster → 1B items, 16M ops/sec.

What I’d actually use

For a real product:

  • Memcached for ephemeral caching (sessions, computed values).
  • Redis / Valkey when data structures matter (queues, sorted sets, streams).
  • Application-side LRU in-process for the hottest 1k items.

Don’t build a custom cache. The tradeoff space is solved.

Read this next

If you want a Python consistent-hashing client + Redis cache library, it’s at rajpoot.dev .


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .