A distributed cache is the system design interview classic. Even if you’ll never build one, designing it teaches you consistent hashing, replication, and the cache-coherence tradeoffs that show up everywhere.
Requirements
- Get / set / delete by key.
- Sub-millisecond latency.
- Survive node failure.
- Scale horizontally.
- Eviction when full.
Architecture
Client SDK (consistent-hashes key → server)
↓
Cache nodes (in-memory store)
↓
(optional) Replica per shard for HA
Stateless clients. Stateful servers. No central coordinator.
Consistent hashing
Naive: shard = hash(key) % N. Adding a node moves nearly all keys.
Consistent hash: keys mapped to a virtual ring; each node owns an arc. Adding a node moves only its arc’s worth of keys.
class ConsistentHash:
def __init__(self, nodes, virtual=150):
self.ring = {}
for node in nodes:
for v in range(virtual):
key = hash(f"{node}:{v}")
self.ring[key] = node
self.sorted_keys = sorted(self.ring)
def get(self, key):
h = hash(key)
# find first ring position >= h
for k in self.sorted_keys:
if k >= h:
return self.ring[k]
return self.ring[self.sorted_keys[0]] # wrap
Virtual nodes (150 per real node) smooth out distribution.
Replication
For HA: each shard has a primary + 1–2 replicas. On primary failure, a replica is promoted.
For Memcached-style ephemeral cache: often no replication. A node failure → those keys are lost; clients refetch from origin. Cheap and simple.
For Redis-style: Sentinel or Cluster handles failover.
Eviction policies
When memory fills:
- LRU (least recently used): evict the oldest-touched key. Simple; good for typical caches.
- LFU (least frequently used): evict the least-used key. Better for skewed access.
- FIFO: oldest first. Rare.
- Random: pick a random key. Simple; surprisingly effective.
Memcached uses slab-based LRU. Redis exposes policies via maxmemory-policy
.
TTL handling
Each entry has an expiry timestamp. Active expiration (sweep periodically) + lazy expiration (check on access). Both Memcached and Redis use a hybrid.
Hot keys
A single key receives 90% of traffic; one node bears the load. Solutions:
- Client-side cache (small in-process LRU) catches most reads before the server.
- Replicate hot keys to multiple nodes; client picks randomly.
- Sub-shard a hot key:
key:{key}:{0..N}; aggregate.
Same idea as Design a Leaderboard at Scale .
Cache stampede
Hot key expires; thousands of concurrent misses hit origin. Mitigations:
- Single-flight: only one fetch per key in flight; others wait.
- Probabilistic early refresh: refresh near (not at) TTL.
- Stale-while-revalidate: serve stale; refresh in background.
For the patterns see Caching Strategies in 2026 .
Capacity
For a typical cache:
- Per-node: 64–256 GB RAM, 32–64 vCPUs.
- Items: ~10–100M per node.
- Throughput: 100k–1M ops/sec/node.
A 16-node cluster → 1B items, 16M ops/sec.
What I’d actually use
For a real product:
- Memcached for ephemeral caching (sessions, computed values).
- Redis / Valkey when data structures matter (queues, sorted sets, streams).
- Application-side LRU in-process for the hottest 1k items.
Don’t build a custom cache. The tradeoff space is solved.
Read this next
- Caching Strategies in 2026
- Design a Leaderboard / Counter System
- Distributed Systems Fundamentals
- Design a Distributed Rate Limiter at Scale
If you want a Python consistent-hashing client + Redis cache library, it’s at rajpoot.dev .
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .