A cache is an optimization that becomes a hazard the day it gets stale, runs out of memory, develops a hot key, or experiences a stampede. This post is the working set of caching strategies and the gotchas that bite at scale.
I’ll also cover the Redis/Valkey split that landed in 2024 and stabilized through 2026 — relevant to almost every team running Redis.
Redis vs Valkey — what changed
In 2024 Redis switched away from BSD-3 to a source-available license. The Linux Foundation forked Redis 7.2.4 as Valkey, BSD-3 forever. Two years later:
- Valkey is now the default in major Linux distros, AWS ElastiCache, GCP Memorystore.
- Redis 8 added native vector search, JSON, time-series, hybrid search, and semantic caching. That is a broader feature set, but it ships under RSALv2/SSPLv1 or AGPLv3 rather than a permissive license.
- API compatibility for the core command set is essentially identical. Most clients work with both.
For new projects in 2026, Valkey is the default unless you specifically need Redis 8’s vector/JSON/TS features. If you run on a managed service, check whether your provider has already moved you to Valkey or offers an in-place upgrade.
I’ll say “Redis” generically below — substitute Valkey in your head for new deployments.
The caching mental model
A cache is a temporary, faster copy of data whose source of truth is somewhere slower. Three things every cache decision touches:
- Coherence — when does cache content become stale, and what do we do about it?
- Eviction — what gets thrown out when memory fills?
- Failure — what happens when the cache is unavailable?
If a design ignores any of these, it’ll fail at the worst time.
Pattern 1 — Cache-aside (lazy loading)
Most common. Application checks cache, falls back to DB, populates cache.
async def get_user(user_id: int) -> User | None:
    cached = await redis.get(f"u:{user_id}")
    if cached:
        return User.parse_raw(cached)
    user = await db.fetch_one("SELECT * FROM users WHERE id = $1", user_id)
    if user:
        await redis.set(f"u:{user_id}", user.json(), ex=300)
    return user
- Pros: Simple. Cache contains only what was actually requested.
- Cons: First request after expiry is slow. Doesn’t survive partial cache outages well without thought.
90% of caching is cache-aside. Start here.
Negative caching
Don’t forget to cache misses:
async def get_user(user_id: int) -> User | None:
    cached = await redis.get(f"u:{user_id}")
    if cached == b"":  # tombstone
        return None
    if cached:
        return User.parse_raw(cached)
    user = await db.fetch_one(...)
    if user:
        await redis.set(f"u:{user_id}", user.json(), ex=300)
    else:
        await redis.set(f"u:{user_id}", b"", ex=60)  # short TTL: the DB might add the row soon
    return user
Without negative caching, every repeated lookup of a missing ID goes straight to the DB. Examples include a retrying client or a scanner hammering nonexistent IDs.
Pattern 2 — Write-through
App writes go to cache and DB synchronously.
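async def update_user(user: User):
    async with db.transaction():
        await db.execute("UPDATE users SET ... WHERE id = $1", user.id)
    # Write the cache only after the commit succeeds, so a rolled-back
    # transaction can't leave uncommitted data in Redis.
    await redis.set(f"u:{user.id}", user.json(), ex=300)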
- Pros: Cache is always coherent.
- Cons: Every write costs an extra round trip. Doesn’t help reads warmed by other paths.
Use when reads vastly outnumber writes and staleness is unacceptable.
Pattern 3 — Write-behind (write-back)
App writes to cache; cache asynchronously persists to DB.
- Pros: Lowest write latency.
- Cons: Real risk of data loss on cache failure. Hard to get right.
I almost never use write-behind for primary state. It’s appropriate for high-frequency, lossy data: counters, telemetry, analytics buffers.
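For the counters-and-telemetry case, a minimal write-behind sketch buffers increments in process memory and flushes them in batches. The `sink` callable (for example, a batched UPSERT) and the flush interval are assumptions for illustration, not a specific library's API. Anything still in the buffer when the process dies is lost, which is exactly why this only suits lossy data.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable

Sink = Callable[[dict[str, int]], Awaitable[None]]


class WriteBehindCounter:
    """Buffers increments in memory; a periodic flush persists them in one batch."""

    def __init__(self, sink: Sink) -> None:
        self._sink = sink  # hypothetical: e.g. a batched UPSERT into the DB
        self._pending: Counter[str] = Counter()

    def incr(self, key: str, by: int = 1) -> None:
        self._pending[key] += by  # the write path touches memory only

    async def flush(self) -> None:
        if not self._pending:
            return
        # Swap buffers so increments arriving during the await go to a fresh one.
        batch, self._pending = dict(self._pending), Counter()
        try:
            await self._sink(batch)
        except Exception:
            self._pending.update(batch)  # keep the batch for the next attempt
            raise


async def flush_forever(buf: WriteBehindCounter, every_seconds: float = 5.0) -> None:
    """Background loop; whatever is buffered when the process dies is lost."""
    while True:
        await asyncio.sleep(every_seconds)
        try:
            await buf.flush()
        except Exception:
            pass  # a real version would log and alert here
```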
Pattern 4 — Read-through
The cache itself fetches from DB on miss. Some libraries support this (DataLoader-style); pure Redis does not. Effectively the same as cache-aside but the cache mediates.
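A read-through layer can be sketched as a small wrapper that owns the loader, so callers only ever ask the cache. This illustrative version assumes a single process with no expiry or eviction. `ReadThroughCache` and the `loader` callable are hypothetical names, not a Redis feature.

```python
import asyncio
from typing import Awaitable, Callable, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class ReadThroughCache(Generic[K, V]):
    """Callers only call get(); on a miss the cache itself invokes the loader."""

    def __init__(self, loader: Callable[[K], Awaitable[V]]) -> None:
        self._loader = loader  # hypothetical: e.g. a function that queries the DB
        self._data: dict[K, V] = {}  # no TTL or eviction in this sketch

    async def get(self, key: K) -> V:
        if key not in self._data:
            self._data[key] = await self._loader(key)  # miss: the cache fetches
        return self._data[key]


async def demo() -> None:
    async def load_user(user_id: int) -> str:  # stand-in for a DB query
        return f"user-{user_id}"

    users: ReadThroughCache[int, str] = ReadThroughCache(load_user)
    print(await users.get(42))  # miss: the loader runs
    print(await users.get(42))  # hit: served from memory


if __name__ == "__main__":
    asyncio.run(demo())
```

The difference from cache-aside is only where the miss logic lives. Here the calling code never sees the DB.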
TTL design
The single most important caching decision. Three rules:
1. Don’t pick TTL by guess
Pick by how stale users tolerate this data:
- Public profile: 1–5 min.
- Pricing: 30s–2min.
- Search index: minutes to hours.
- User session: hours.
- Static reference data: hours to days.
Then check it against your write frequency. A 5-min TTL on data that changes every 10 seconds means most reads serve a stale value.
2. Add jitter
ttl = base_ttl + random.randint(0, base_ttl // 5)
Without jitter, every key written in the same minute (say, during a load test or a cache warm-up) also expires in the same minute. The resulting misses arrive together as a stampede. Jitter spreads expirations out.
3. Stale-while-revalidate (SWR)
If staleness is mildly tolerable, return stale data while refreshing in the background:
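_refreshes: set[asyncio.Task] = set()  # keep references so background tasks aren't garbage-collected

async def get_user(user_id: int) -> User:
    key = f"u:{user_id}"
    cached = await redis.get(key)
    if cached:
        ttl = await redis.ttl(key)
        if ttl < 30:  # nearing expiry
            task = asyncio.create_task(refresh_user(user_id))  # background refresh
            _refreshes.add(task)
            task.add_done_callback(_refreshes.discard)
        return User.parse_raw(cached)
    return await refresh_user(user_id)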
Without SWR, every expiry forces some request to pay the full miss latency (say, 200ms at the tail). With SWR, that request gets slightly stale data instead and the refresh happens off the request path. Almost always worth it.
Stampede prevention
A hot key expires; 1000 simultaneous misses hit the DB. Defenses:
1. Single-flight (request coalescing)
Only one request per key in flight; others wait for it.
inflight: dict[str, asyncio.Future] = {}

async def get_user_coalesced(user_id: int) -> User:
    key = f"u:{user_id}"
    cached = await redis.get(key)
    if cached:
        return User.parse_raw(cached)
    if key in inflight:
        return await inflight[key]  # piggyback on the load already in flight
    fut = asyncio.get_running_loop().create_future()
    inflight[key] = fut
    try:
        user = await db.fetch_one(...)
        await redis.set(key, user.json(), ex=300)
        fut.set_result(user)
        return user
    except Exception as exc:
        fut.set_exception(exc)  # wake waiters with the error instead of leaving them hanging
        raise
    finally:
        inflight.pop(key, None)
This coalesces only within a single process. For cross-process coalescing, see #2.
2. Distributed locks for refresh
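import uuid

# Delete the lock only if it still holds our token (atomic check-and-delete).
RELEASE_LOCK = """
if redis.call('get', KEYS[1]) == ARGV[1] then
    return redis.call('del', KEYS[1])
end
return 0
"""

async def refresh_with_lock(user_id: int) -> User | None:
    key = f"u:{user_id}"
    lock = f"lock:{key}"
    token = uuid.uuid4().hex  # unique per caller
    if await redis.set(lock, token, nx=True, ex=10):  # got the lock
        try:
            user = await db.fetch_one(...)
            await redis.set(key, user.json(), ex=300)
            return user
        finally:
            # A plain DEL could remove a lock another caller took after ours expired.
            await redis.eval(RELEASE_LOCK, 1, lock, token)
    await asyncio.sleep(0.05)  # someone else is refreshing
    cached = await redis.get(key)
    if cached:
        return User.parse_raw(cached)
    return await db.fetch_one(...)  # fallback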
For high-traffic refreshes, this is essential.
3. Probabilistic early expiration
Refresh some requests slightly before the TTL hits, weighted by remaining TTL. Smooths out the expiration cliff.
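One well-known way to do this is the XFetch rule. Each reader refreshes when `now - delta * beta * ln(rand())` reaches the expiry time, where `delta` is how long the last recompute took. The sketch below assumes you store the expiry timestamp and the recompute duration alongside the cached value. The function name and parameters are illustrative.

```python
import math
import random
import time
from typing import Optional


def should_refresh_early(
    expires_at: float,
    recompute_seconds: float,
    beta: float = 1.0,
    now: Optional[float] = None,
) -> bool:
    """Refresh with rising probability as expiry approaches.

    recompute_seconds: duration of the last recompute (slow loads start earlier).
    beta > 1 refreshes earlier on average; beta < 1 later.
    """
    if now is None:
        now = time.time()
    u = 1.0 - random.random()  # uniform in (0, 1], so log(u) is defined
    # -log(u) is an Exponential(1) sample: each request "looks ahead" a random
    # amount, so most refresh close to expiry and a few refresh much earlier.
    return now - recompute_seconds * beta * math.log(u) >= expires_at
```

A request that gets `True` refreshes the key (ideally under the lock from #2), and everyone else keeps serving the cached value.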
Hot keys
A handful of keys serve 90% of traffic, and the instance hosting them is overwhelmed.
Mitigations:
- Local cache in front of Redis. A 10ms TTL in-process cache cuts hot-key load by an order of magnitude with negligible staleness.
- Replicating the value: store N copies of a popular item under suffixed keys so they hash to different shards. Read one copy at random per request, and write all N on update.
- Replicas with read scaling — Redis replicas serve reads; primary serves writes. Read-mostly hot keys go to replicas.
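The value-replication idea might look like the helpers below. A hot key gets N suffixed copies, all written on update, and each read picks one at random. The `:r{i}` suffix scheme is a hypothetical convention. In Redis Cluster, keys without a {hash tag} hash to slots independently, so the copies usually spread across nodes, but that is not guaranteed for small N. The suffix must also stay outside any hash tag in the base key.

```python
import random


def replica_keys(key: str, copies: int) -> list[str]:
    """Every suffixed copy of a hot key; an update must write all of them."""
    return [f"{key}:r{i}" for i in range(copies)]


def read_key(key: str, copies: int) -> str:
    """Pick one copy at random per read so load spreads across shards."""
    return f"{key}:r{random.randrange(copies)}"
```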
Cache key design
- Namespace: {service}:{type}:{id}, e.g. payments:txn:42. Keeps multi-service Redis tidy.
- Versioned: {namespace}:v3:{id}. Bumping the version invalidates everything cleanly during deploys.
- Avoid embedding raw query strings: q=foo&page=2&sort=date produces an unbounded key space. Normalize parameter order and hash the result, so keys stay short and equivalent queries share one key.
- Bound cardinality: don't SET once per unique query if your tail is huge. Cache the popular ones; let the rest be slow.
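Here is a sketch of a versioned, hashed key for search results that makes these rules concrete. The `search` type segment, the 16-character digest length, and the function name are assumptions. Sorting parameters before hashing maps equivalent queries to one key and keeps key length fixed.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode


def search_cache_key(service: str, version: int, query_string: str) -> str:
    """Build {service}:search:v{version}:{digest} from a raw query string."""
    # Sort parameters so ?page=2&q=foo and ?q=foo&page=2 produce the same key.
    params = sorted(parse_qsl(query_string, keep_blank_values=True))
    normalized = urlencode(params)
    # A truncated digest keeps keys short no matter how long the query is.
    digest = hashlib.sha256(normalized.encode()).hexdigest()[:16]
    return f"{service}:search:v{version}:{digest}"
```

Bumping `version` on deploy moves every reader to a fresh key space. The old keys simply age out under their TTLs.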
Eviction policies
Redis policies:
- noeviction: reject writes when memory is full. This is the default maxmemory-policy in open-source Redis and Valkey, which surprises people in production.
- allkeys-lru: evict the least-recently-used key across all keys. A sane default for caches.
- allkeys-lfu: evict the least-frequently-used key. Better for skewed access (a few hot keys).
- volatile-ttl: evict the keys closest to expiry, considering only keys that have a TTL. Useful when keys without TTLs hold state that must never be evicted.
For pure caching: allkeys-lru or allkeys-lfu. The choice depends on whether your access pattern is recency-biased or frequency-biased.
Multi-layer caching
Browser cache → CDN → Edge → Application local cache → Redis → DB
Each layer:
- Is faster than the next.
- Has shorter TTLs (typically).
- Absorbs most of the traffic that reaches it.
The art is making layers compose:
- HTTP Cache-Control headers govern browser + CDN.
- Application sets its own per-key TTL.
- Redis sits between application instances for cross-instance coherence.
A 1-second in-process cache backed by Redis backed by Postgres can serve 99% of GET traffic from somewhere fast. The in-process layer adds at most one second of staleness on top of whatever Redis holds.
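The in-process layer can be as small as a dict of (expiry, value) pairs in front of a shared lookup. In this sketch, `remote_get` is a hypothetical stand-in for "Redis, falling back to Postgres". The local dict is unbounded, so a real version would also cap its size.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional

RemoteGet = Callable[[str], Awaitable[Optional[bytes]]]


class TwoTierCache:
    """Short-TTL in-process layer in front of a shared remote lookup."""

    def __init__(self, remote_get: RemoteGet, local_ttl: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._remote_get = remote_get  # hypothetical: Redis, then Postgres on a miss
        self._ttl = local_ttl
        self._clock = clock
        # key -> (local expiry, value); unbounded here, so cap it in real use
        self._local: dict[str, tuple[float, Optional[bytes]]] = {}

    async def get(self, key: str) -> Optional[bytes]:
        now = self._clock()
        entry = self._local.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh local copy: no network hop
        value = await self._remote_get(key)
        self._local[key] = (now + self._ttl, value)  # adds at most local_ttl of staleness
        return value


async def demo() -> None:
    async def fake_remote(key: str) -> Optional[bytes]:
        return b"value-for-" + key.encode()

    cache = TwoTierCache(fake_remote, local_ttl=1.0)
    print(await cache.get("u:42"))  # goes to the remote layer
    print(await cache.get("u:42"))  # served from process memory


if __name__ == "__main__":
    asyncio.run(demo())
```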
Failure modes — what happens when Redis dies
You must have a story. Three options:
1. Cache-bypass (fail open)
try:
    cached = await redis.get(key)
    if cached:
        return cached
except RedisError:
    redis_circuit_breaker.trip()
return await db.fetch_one(...)
DB takes the load. Latency degrades. Application keeps serving.
2. Stale serve (last-known-good)
A separate, slower fallback (e.g., Postgres cached_responses table) serves stale-but-acceptable data when Redis is down.
3. Fail-closed
Refuse traffic. Rare and brutal; only appropriate when stale data would be unsafe (financial, security).
I default to #1 with a circuit breaker that closes after a brief recovery window. Most apps survive a Redis incident with degraded latency rather than an outage.
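A circuit breaker for the fail-open path can be sketched as a three-state object. It is closed (use Redis), open (skip Redis after repeated failures), or half-open (try Redis again once the recovery window passes). The class, thresholds, and method names here are hypothetical, not a particular library's API.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens after `threshold` consecutive failures. Once `recovery_seconds`
    pass, calls go through again (half-open): the next failure reopens it,
    and a success closes it."""

    def __init__(self, threshold: int = 5, recovery_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.threshold = threshold
        self.recovery_seconds = recovery_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self._opened_at is None:
            return True  # closed: use Redis normally
        if self._clock() - self._opened_at >= self.recovery_seconds:
            return True  # half-open: let requests probe Redis again
        return False  # open: skip Redis and go straight to the DB

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None  # close the breaker

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.threshold:
            self._opened_at = self._clock()  # (re)open and restart the window
```

At the call site, check `allow()` before touching Redis. Call `record_success()` after a good response and `record_failure()` in the `except RedisError` branch.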
Coherence — the real hard part
Cache-aside makes the DB the source of truth. But how do you invalidate the cache on writes? Three strategies:
1. Invalidate on write
async def update_user(user: User):
    await db.execute("UPDATE users SET ... WHERE id = $1", user.id)
    await redis.delete(f"u:{user.id}")  # next read repopulates
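The race: a reader misses, loads the old row, and stalls. Meanwhile the writer commits and deletes the key. The reader then writes the old value back, and it stays there until the TTL expires. This is real but rare. Mitigations include double-delete (delete, sleep ~100ms, delete again) or short TTLs that bound staleness.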
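A delayed double delete can be sketched as below. It takes the DB write and the cache delete as callables (for Redis, something like `lambda: redis.delete(key)`), an assumption made to keep the example self-contained. The 100ms default mirrors the figure in the text and should exceed your typical read-then-populate window.

```python
import asyncio
from typing import Awaitable, Callable


async def update_then_double_delete(
    write_db: Callable[[], Awaitable[object]],
    delete_key: Callable[[], Awaitable[object]],
    delay: float = 0.1,
) -> None:
    """Invalidate right after the write, then again after `delay`, to evict a
    stale value that a slow concurrent reader may have written back."""
    await write_db()
    await delete_key()          # first delete: the usual invalidate-on-write
    await asyncio.sleep(delay)  # give in-flight readers time to finish
    await delete_key()          # second delete: catches a stale repopulation
```

Awaiting the second delete inline delays the writer's response. Scheduling it as a background task is a common alternative.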
2. Update on write (write-through)
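Already covered. It narrows the race but doesn't eliminate it: two concurrent writers can still land their cache writes in the opposite order from their DB commits.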
3. PubSub / change-data-capture invalidation
The DB or an outbox emits change events; consumers invalidate caches. Best for cross-region or service-wide caches. Needs infrastructure (Debezium, Postgres logical replication).
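On the consumer side, most of the work is mapping a change event to the keys it affects. The sketch assumes a simplified event shape of `{"table": ..., "row": {...}}`. Real Debezium or logical-replication payloads are richer and need unpacking first, and the `orders` mapping is hypothetical.

```python
from typing import Any, Callable

Row = dict[str, Any]

# Hypothetical mapping from table name to the cache keys a changed row affects.
KEY_BUILDERS: dict[str, Callable[[Row], list[str]]] = {
    "users": lambda row: [f"u:{row['id']}"],
    "orders": lambda row: [f"order:{row['id']}", f"u:{row['user_id']}:orders"],
}


def keys_to_invalidate(event: dict[str, Any]) -> list[str]:
    """Map one change event (assumed shape: {"table": ..., "row": {...}}) to cache keys."""
    builder = KEY_BUILDERS.get(event["table"])
    if builder is None:
        return []  # table isn't cached: nothing to invalidate
    return builder(event["row"])
```

The consumer then deletes the returned keys (for example, `await redis.delete(*keys)` when the list is non-empty). Deletes are idempotent, which suits the at-least-once delivery most change streams provide.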
Things you shouldn’t cache
- Ultra-short-lived data that changes every request.
- Personalized content without the user in the cache key (or use server-side cache with a per-user namespace).
- Sensitive data without thinking about TTLs and security boundaries.
- The very thing you’re trying to scale, when scaling the underlying store would be easier.
The “just add a cache” instinct often hides the real problem.
A checklist for adding caching
- Why are we caching this? (Latency? DB load? Cost?)
- What’s the TTL, and what’s the worst stale-data outcome?
- What happens when the cache is down?
- How do we invalidate on writes?
- Stampede defense?
- Hot-key plan?
- Metrics: hit rate, miss rate, p95 latency, evictions?
- Memory budget set, eviction policy chosen?
- How do we observe staleness in production?
If you can answer these, you’re ready to ship the cache.
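For the metrics item, Redis and Valkey expose cumulative `keyspace_hits` and `keyspace_misses` counters in `INFO stats` (redis-py returns them from `await redis.info("stats")`). The counters run from server start and cover all keys. This sketch therefore diffs two snapshots to get a recent, server-wide hit rate. The function name is illustrative.

```python
from typing import Mapping, Optional


def hit_rate(prev: Mapping[str, int], curr: Mapping[str, int]) -> Optional[float]:
    """Server-wide hit rate between two INFO stats snapshots.

    Returns None when there were no lookups in the interval, or when the
    counters went backwards (e.g. the server restarted or stats were reset).
    """
    hits = curr["keyspace_hits"] - prev["keyspace_hits"]
    misses = curr["keyspace_misses"] - prev["keyspace_misses"]
    if hits < 0 or misses < 0:
        return None
    total = hits + misses
    return hits / total if total else None
```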
If you want a small Python+Redis library wrapping cache-aside + SWR + single-flight + circuit breaker, it’s at rajpoot.dev.
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work; see my projects and ways to reach me at rajpoot.dev.