A cache is an optimization that becomes a hazard the day it gets stale, runs out of memory, develops a hot key, or experiences a stampede. This post is the working set of caching strategies and the gotchas that bite at scale.

I’ll also cover the Redis/Valkey split that landed in 2024 and stabilized through 2026 — relevant to almost every team running Redis.

Redis vs Valkey — what changed

In 2024 Redis switched away from BSD-3 to a source-available license. The Linux Foundation forked Redis 7.2.4 as Valkey, BSD-3 forever. Two years later:

  • Valkey is now the default in major Linux distros, AWS ElastiCache, GCP Memorystore.
  • Redis 8 added native vector search, JSON, time-series, hybrid search, semantic caching — broader feature set, but commercial license at scale.
  • API compatibility for the core command set is essentially identical. Most clients work with both.

For new projects in 2026: Valkey is the default unless you specifically need Redis 8’s vector/JSON/TS features. Existing Redis users on managed services have already been migrated transparently.

I’ll say “Redis” generically below — substitute Valkey in your head for new deployments.

The caching mental model

A cache is a temporary, faster copy of data whose source of truth is somewhere slower. Three things every cache decision touches:

  1. Coherence — when does cache content become stale, and what do we do about it?
  2. Eviction — what gets thrown out when memory fills?
  3. Failure — what happens when the cache is unavailable?

If a design ignores any of these, it’ll fail at the worst time.

Pattern 1 — Cache-aside (lazy loading)

Most common. Application checks cache, falls back to DB, populates cache.

async def get_user(user_id: int) -> User | None:
    cached = await redis.get(f"u:{user_id}")
    if cached:                                   # hit: serve from cache
        return User.parse_raw(cached)
    user = await db.fetch_one("SELECT * FROM users WHERE id = $1", user_id)
    if user:
        await redis.set(f"u:{user_id}", user.json(), ex=300)   # populate with a 5-minute TTL
    return user                                  # None if the row does not exist
  • Pros: Simple. Cache contains only what was actually requested.
  • Cons: First request after expiry is slow. Cache outages need explicit handling (see the failure modes section below).

90% of caching is cache-aside. Start here.

Negative caching

Don’t forget to cache misses:

async def get_user(user_id: int) -> User | None:
    cached = await redis.get(f"u:{user_id}")
    if cached == b"":             # tombstone
        return None
    if cached:
        return User.parse_raw(cached)
    user = await db.fetch_one(...)
    if user:
        await redis.set(f"u:{user_id}", user.json(), ex=300)
    else:
        await redis.set(f"u:{user_id}", b"", ex=60)    # short TTL — DB might add the row soon
    return user

Without negative caching, a brute-force scanner pummels your DB on missing IDs.

Pattern 2 — Write-through

App writes go to cache and DB synchronously.

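async def update_user(user: User):
    async with db.transaction():
        await db.execute("UPDATE users SET ... WHERE id = $1", user.id)
    # Write the cache only after the transaction commits, so a rollback can't leave a phantom value behind.
    await redis.set(f"u:{user.id}", user.json(), ex=300)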
  • Pros: Reads that follow a write see the new value without waiting for a TTL to expire.
  • Cons: Every write costs an extra round trip. Keys that are read but never written still need cache-aside to get populated.

Use when reads vastly outnumber writes and staleness is unacceptable.

Pattern 3 — Write-behind (write-back)

App writes to cache; cache asynchronously persists to DB.

  • Pros: Lowest write latency.
  • Cons: Real risk of data loss on cache failure. Hard to get right.

I almost never use write-behind for primary state. It’s appropriate for high-frequency, lossy data: counters, telemetry, analytics buffers.
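
As a rough illustration of the lossy-counter case, here is a minimal in-process sketch. The class name `WriteBehindCounter` and the `sink` callback are my own placeholders, not part of any library; `sink` stands in for whatever batched write your database needs.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable, Dict


class WriteBehindCounter:
    """Buffers increments in memory and persists them in batches (illustrative)."""

    def __init__(self, sink: Callable[[Dict[str, int]], Awaitable[None]]):
        self._sink = sink                 # hypothetical: e.g. one batched UPDATE per flush
        self._pending: Counter = Counter()

    def incr(self, key: str, amount: int = 1) -> None:
        # Fast path, no I/O. Anything still buffered is lost if the process dies.
        self._pending[key] += amount

    async def flush(self) -> None:
        if not self._pending:
            return
        batch, self._pending = dict(self._pending), Counter()
        try:
            await self._sink(batch)
        except Exception:
            self._pending.update(batch)   # put the batch back so a later flush retries it
            raise

    async def run(self, interval: float = 1.0) -> None:
        # Background loop: persist whatever accumulated every `interval` seconds.
        while True:
            await asyncio.sleep(interval)
            await self.flush()
```

The write path never touches the database, which is where the latency win comes from, and the comment in `incr` is the whole risk: a crash between flushes drops the buffered increments.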

Pattern 4 — Read-through

The cache itself fetches from DB on miss. Some libraries support this (DataLoader-style); pure Redis does not. Effectively the same as cache-aside but the cache mediates.
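
To make the difference from cache-aside concrete, here is a small sketch in which the cache object owns the loader. `ReadThroughCache`, `loader`, and `load_user` are illustrative names of mine, and the store is an in-process dict rather than Redis, so treat it as the shape of the pattern and nothing more.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, Tuple


class ReadThroughCache:
    """The cache owns the loader; callers only ever call get()."""

    def __init__(self, loader: Callable[[str], Awaitable[str]], ttl: float = 300.0):
        self._loader = loader             # hypothetical: e.g. a function that queries the DB
        self._ttl = ttl
        self._store: Dict[str, Tuple[float, str]] = {}

    async def get(self, key: str) -> str:
        entry = self._store.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]               # hit: the loader is not involved
        value = await self._loader(key)   # miss: the cache fetches on the caller's behalf
        self._store[key] = (time.monotonic() + self._ttl, value)
        return value


async def demo() -> list:
    async def load_user(key: str) -> str:     # stand-in for a DB query
        return f"row for {key}"
    cache = ReadThroughCache(load_user, ttl=60)
    return [await cache.get("u:1"), await cache.get("u:1")]   # second call is a hit
```

Application code never sees the miss path, which is the whole point: the fallback logic lives in one place instead of in every caller.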

TTL design

The single most important caching decision. Three rules:

1. Don’t pick TTL by guess

Pick by how stale users tolerate this data:

  • Public profile: 1–5 min.
  • Pricing: 30s–2min.
  • Search index: minutes to hours.
  • User session: hours.
  • Static reference data: hours to days.

Then cap it by your write frequency: with a 5-min TTL on data that updates every 10 seconds, almost every cached read is stale.

2. Add jitter

ttl = base_ttl + random.randint(0, base_ttl // 5)

Without jitter, every key set in the same minute (a deploy, a cache warm-up, a load test) expires together → stampede. Jitter spreads expirations.

3. Stale-while-revalidate (SWR)

If staleness is mildly tolerable, return stale data while refreshing in the background:

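background_tasks: set[asyncio.Task] = set()    # hold references so tasks aren't garbage-collected

async def get_user(user_id: int) -> User:
    key = f"u:{user_id}"
    cached = await redis.get(key)
    if cached:
        ttl = await redis.ttl(key)
        if 0 <= ttl < 30:                     # nearing expiry (negative means no TTL or key gone)
            task = asyncio.create_task(refresh_user(user_id))     # refresh in the background
            background_tasks.add(task)
            task.add_done_callback(background_tasks.discard)
        return User.parse_raw(cached)
    return await refresh_user(user_id)        # cold miss: refresh inline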

SWR trades a little freshness for latency: users see slightly stale data instead of waiting on a ~200ms DB round trip every time a key expires. Almost always worth it.

Stampede prevention

A hot key expires; 1000 simultaneous misses hit the DB. Defenses:

1. Single-flight (request coalescing)

Only one request per key in flight; others wait for it.

inflight: dict[str, asyncio.Future] = {}

async def get_user_coalesced(user_id: int) -> User | None:
    key = f"u:{user_id}"
    cached = await redis.get(key)
    if cached:
        return User.parse_raw(cached)

    if key in inflight:
        return await inflight[key]                # piggyback on the request already in flight

    fut = asyncio.get_running_loop().create_future()
    inflight[key] = fut
    try:
        user = await db.fetch_one(...)
        if user:
            await redis.set(key, user.json(), ex=300)
        fut.set_result(user)
        return user
    except Exception as exc:
        fut.set_exception(exc)                    # wake waiters instead of leaving them hanging
        raise
    finally:
        inflight.pop(key, None)

This only coalesces requests within a single process. For cross-process coalescing, see #2.

2. Distributed locks for refresh

async def refresh_with_lock(user_id: int) -> User | None:
    key = f"u:{user_id}"
    lock = f"lock:{key}"
    if await redis.set(lock, "1", nx=True, ex=10):    # got the lock; ex=10 frees it if we crash
        try:
            user = await db.fetch_one(...)
            if user:
                await redis.set(key, user.json(), ex=300)
            return user
        finally:
            # Caveat: if the refresh outlives the 10s lock TTL, this can delete another holder's lock.
            await redis.delete(lock)
    else:
        await asyncio.sleep(0.05)                     # someone else is refreshing
        cached = await redis.get(key)
        if cached:
            return User.parse_raw(cached)
        return await db.fetch_one(...)                # fallback

For high-traffic refreshes, this is essential.

3. Probabilistic early expiration

Refresh some requests slightly before the TTL hits, weighted by remaining TTL. Smooths out the expiration cliff.
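
One well-known formulation of this idea is often referred to as XFetch. The sketch below is my reading of it, not a drop-in implementation: it assumes you store the absolute expiry time and the duration of the last recompute (`delta`) alongside each value, and the function name and parameters are mine.

```python
import math
import random
import time
from typing import Optional


def should_refresh_early(expiry: float, delta: float, beta: float = 1.0,
                         now: Optional[float] = None,
                         rand: Optional[float] = None) -> bool:
    """Decide whether this request should recompute the value before it expires.

    expiry: absolute timestamp at which the cached entry expires
    delta:  seconds the last recompute took (stored next to the value)
    beta:   > 1 refreshes earlier, < 1 refreshes later
    now, rand: injectable for tests; default to the clock and random.random()
    """
    now = time.time() if now is None else now
    rand = random.random() if rand is None else rand
    rand = max(rand, 1e-12)                      # guard against log(0)
    # -log(rand) is a positive, exponentially distributed gap, so the chance of
    # an early refresh rises as `now` approaches `expiry`.
    return now - delta * beta * math.log(rand) >= expiry
```

Each request makes this decision independently, so no coordination is needed: far from expiry almost nobody refreshes, and close to it a few requests do, ideally before the key ever goes missing.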

Hot keys

A handful of keys serve 90% of traffic, and the instances hosting those keys are overwhelmed.

Mitigations:

  • Local cache in front of Redis. A 10ms TTL in-process cache cuts hot-key load by an order of magnitude with negligible staleness.
  • Sharding the value: store copies of a popular item under N keys and read one at random per request. On update, write all N copies.
  • Replicas with read scaling — Redis replicas serve reads; primary serves writes. Read-mostly hot keys go to replicas.
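
The second mitigation is the least obvious, so here is a sketch of the key arithmetic. The `:r{i}` suffix and the helper names are my own convention, and a plain dict stands in for Redis; in a clustered deployment the differently named copies can land on different shards, which is what spreads the load.

```python
import random
from typing import List


def replica_keys(key: str, copies: int) -> List[str]:
    """All physical keys that hold a copy of one logical hot key."""
    return [f"{key}:r{i}" for i in range(copies)]


def pick_replica(key: str, copies: int) -> str:
    """Key to read for this request; a uniform choice spreads load evenly."""
    return f"{key}:r{random.randrange(copies)}"


def write_all(store: dict, key: str, value: str, copies: int) -> None:
    """On update, overwrite every copy so readers converge on the new value."""
    for replica in replica_keys(key, copies):
        store[replica] = value   # with Redis this would be one SET per replica key
```

The cost is write amplification (N writes per update) and a brief window where copies disagree, so this fits read-heavy keys that change rarely.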

Cache key design

  • Namespace: {service}:{type}:{id}, e.g. payments:txn:42. Keeps multi-service Redis tidy.
  • Versioned: {namespace}:v3:{id} — bumping the version invalidates everything cleanly during deploys.
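  • Avoid embedding raw query strings: q=foo&page=2&sort=date produces an unbounded key space. Normalize the parameters (sort them, drop irrelevant ones) and hash the result so equivalent queries share one fixed-length key.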
  • Bound cardinality: don’t SET once per unique query if your tail is huge. Cache the popular ones; let the rest be slow.
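
A small sketch of the first three rules, using only the standard library. The function names, the `q` type segment, and the 16-character digest length are arbitrary choices of mine for illustration.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode


def cache_key(service: str, kind: str, ident: str, version: int = 1) -> str:
    """Namespaced, versioned key, e.g. payments:txn:v3:42."""
    return f"{service}:{kind}:v{version}:{ident}"


def query_key(service: str, query_string: str, version: int = 1) -> str:
    """Fixed-length key for a query string, independent of parameter order."""
    # Sort the parameters so q=foo&page=2 and page=2&q=foo share one entry.
    params = sorted(parse_qsl(query_string, keep_blank_values=True))
    digest = hashlib.sha256(urlencode(params).encode()).hexdigest()[:16]
    return cache_key(service, "q", digest, version)
```

Bumping `version` changes every key the function produces, so old entries are simply never read again and age out through TTL or eviction.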

Eviction policies

Redis policies:

  • noeviction: fail writes when out of memory. This is the default maxmemory-policy, which is surprising in production for a cache.
  • allkeys-lru: evict least-recently-used across all keys. Sane default for caches.
  • allkeys-lfu: evict least-frequently-used across all keys. Better for skewed access (a few hot keys).
  • volatile-ttl: among keys that have a TTL, evict those closest to expiry. Useful when keys without TTLs hold state that must never be evicted.

For pure caching: allkeys-lru or allkeys-lfu. The choice depends on whether your access pattern is recency-biased or frequency-biased.
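
To see why the choice matters, here is a toy simulation of the two policies. It is deliberately exact and simplified; Redis approximates both by sampling keys, and its LFU counters decay over time, so this only illustrates the tendency. The `simulate` function is mine.

```python
from collections import Counter, OrderedDict
from typing import List, Set


def simulate(policy: str, capacity: int, accesses: List[str]) -> Set[str]:
    """Return the keys left in a toy cache after replaying `accesses`."""
    order: OrderedDict = OrderedDict()   # recency order, oldest first
    hits: Counter = Counter()            # access counts, used by LFU
    for key in accesses:
        hits[key] += 1
        if key in order:
            order.move_to_end(key)       # mark as most recently used
            continue
        if len(order) >= capacity:
            if policy == "lru":
                victim = next(iter(order))                    # least recently used
            else:
                victim = min(order, key=lambda k: hits[k])    # least frequently used
            del order[victim]
        order[key] = None
    return set(order)
```

With a capacity of 2 and the trace `hot, hot, hot, a, b`, LRU ends up holding `a` and `b` (the hot key is evicted because it was touched least recently), while LFU keeps `hot` and `b`. A burst of one-off keys is exactly the situation where LFU protects your hot set.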

Multi-layer caching

Browser cache → CDN → Edge → Application local cache → Redis → DB

Each layer:

  • Is faster than the next.
  • Has shorter TTLs (typically).
  • Catches a majority of traffic.

The art is making layers compose:

  • HTTP Cache-Control headers govern browser + CDN.
  • Application sets its own per-key TTL.
  • Redis sits between application instances for cross-instance coherence.
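
For the browser and CDN layers, the composition mostly comes down to one response header. A minimal sketch of building it, where the function and parameter names are mine and the directives (`max-age`, `s-maxage`, `stale-while-revalidate`, `private`) are standard Cache-Control directives; check that your CDN honors `stale-while-revalidate` before relying on it.

```python
def cache_control(browser_ttl: int, cdn_ttl: int, swr: int = 0,
                  private: bool = False) -> str:
    """Build a Cache-Control header value with separate browser and CDN lifetimes."""
    if private:
        # Per-user responses must not be stored by shared caches such as a CDN.
        return f"private, max-age={browser_ttl}"
    parts = ["public", f"max-age={browser_ttl}", f"s-maxage={cdn_ttl}"]
    if swr:
        # Lets a cache serve a stale copy for `swr` seconds while it refetches.
        parts.append(f"stale-while-revalidate={swr}")
    return ", ".join(parts)
```

For example, `cache_control(60, 300, swr=30)` yields `public, max-age=60, s-maxage=300, stale-while-revalidate=30`: browsers keep the response for a minute, the CDN for five.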

A 1-second in-process cache backed by Redis backed by Postgres serves the bulk of GET traffic from somewhere fast, and the in-process layer adds at most one second of staleness on top of whatever Redis already holds.

Failure modes — what happens when Redis dies

You must have a story. Three options:

1. Cache-bypass (fail open)

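from redis.exceptions import RedisError

async def get_with_fallback(key: str):
    if not redis_circuit_breaker.is_open():      # is_open() is an assumed method on the breaker
        try:
            cached = await redis.get(key)
            if cached:
                return cached
        except RedisError:
            redis_circuit_breaker.trip()         # skip Redis until the breaker closes again
    return await db.fetch_one(...)               # fail open: the DB serves the request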

DB takes the load. Latency degrades. Application keeps serving.

2. Stale serve (last-known-good)

A separate, slower fallback (e.g., Postgres cached_responses table) serves stale-but-acceptable data when Redis is down.

3. Fail-closed

Refuse traffic. Rare and brutal; only appropriate when stale data would be unsafe (financial, security).

I default to #1 with a circuit breaker that closes after a brief recovery window. Most apps survive a Redis incident with degraded latency rather than an outage.
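
A minimal sketch of the kind of breaker I mean, assuming the simplest possible policy: one failure opens it, and it closes again once the recovery window has passed. `trip()` matches the call in the snippet above; `is_open()`, `recovery_seconds`, and the injectable `clock` are names I made up for the example. Production breakers usually add a failure threshold and a half-open probe state.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Opens on trip(); closes by itself after `recovery_seconds`."""

    def __init__(self, recovery_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._recovery = recovery_seconds
        self._clock = clock                      # injectable so tests can control time
        self._opened_at: Optional[float] = None  # None means closed (cache is in use)

    def trip(self) -> None:
        # Called on a cache error: stop sending traffic to the cache.
        self._opened_at = self._clock()

    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        if self._clock() - self._opened_at >= self._recovery:
            self._opened_at = None               # window elapsed: close and retry the cache
            return False
        return True
```

While the breaker is open, requests skip Redis entirely instead of each paying a connection timeout, which is what keeps a cache incident from turning into a latency incident.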

Coherence — the real hard part

Cache-aside makes the DB the source of truth. But what about cache invalidation on writes? Three strategies:

1. Invalidate on write

async def update_user(user: User):
    await db.execute("UPDATE users SET ... WHERE id = $1", user.id)
    await redis.delete(f"u:{user.id}")          # next read repopulates

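The race: a reader misses the cache and reads the old row just before the write commits; the writer then commits and deletes the key; finally the reader stores the old value, which stays cached until its TTL expires. Real but rare; mitigations include double-delete (delete, sleep 100ms, delete again) or short TTLs that bound staleness.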
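
The double-delete mitigation can be sketched as follows. The helper takes the DB write and the cache delete as callables so it stays storage-agnostic; those parameter names and the function name are mine, and the 100ms default is just the figure mentioned above, not a tuned value.

```python
import asyncio
from typing import Awaitable, Callable


async def update_with_double_delete(
    write_db: Callable[[], Awaitable[None]],
    cache_delete: Callable[[str], Awaitable[None]],
    key: str,
    delay: float = 0.1,
) -> "asyncio.Task[None]":
    """Write, delete the key, then delete it again after `delay` seconds."""
    await write_db()
    await cache_delete(key)              # first delete: the next read repopulates

    async def second_delete() -> None:
        await asyncio.sleep(delay)       # give racing readers time to finish their stale set
        await cache_delete(key)          # second delete: removes a stale value they wrote

    # Return the task so the caller can keep a reference to it (or await it).
    return asyncio.create_task(second_delete())
```

The delay should comfortably exceed a typical read-then-set round trip; anything a slow reader writes after the second delete is still bounded by the key's TTL.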

2. Update on write (write-through)

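Already covered under Pattern 2. It avoids the delete-then-repopulate race above, although two concurrent writers can still set the cache in a different order than their commits, so keep a TTL as a backstop.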

3. PubSub / change-data-capture invalidation

The DB or an outbox emits change events; consumers invalidate caches. Best for cross-region or service-wide caches. Needs infrastructure (Debezium, Postgres logical replication).
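
The consumer side is usually small. Here is a sketch under heavy assumptions: the event shape (`{"table": ..., "id": ...}`) is invented for illustration and is not what Debezium or logical replication actually emit, `KEY_TEMPLATES` is a hypothetical mapping, and an `asyncio.Queue` stands in for the real event stream.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

# Hypothetical mapping from table name to the cache key that mirrors a row.
KEY_TEMPLATES: Dict[str, str] = {"users": "u:{id}"}


def keys_for_event(event: dict) -> List[str]:
    """Cache keys to invalidate for one change event (assumed shape)."""
    template = KEY_TEMPLATES.get(event.get("table", ""))
    if template is None or "id" not in event:
        return []                        # a table we don't cache: nothing to do
    return [template.format(id=event["id"])]


async def invalidate_from_events(
    events: "asyncio.Queue",
    cache_delete: Callable[[str], Awaitable[None]],
) -> None:
    """Consume change events until a None sentinel arrives, deleting affected keys."""
    while True:
        event = await events.get()
        if event is None:                # sentinel used here to stop the loop
            break
        for key in keys_for_event(event):
            await cache_delete(key)
```

Because the events come from the database's own change stream, writers no longer need to know which caches exist, which is what makes this approach scale across services and regions.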

Things you shouldn’t cache

  • Ultra-short-lived data that changes every request.
  • Personalized content, unless the user is part of the cache key (for example, a per-user namespace in a server-side cache).
  • Sensitive data without thinking about TTLs and security boundaries.
  • The very thing you’re trying to scale, when scaling the underlying store would be easier.

The “just add a cache” instinct often hides the real problem.

A checklist for adding caching

  • Why are we caching this? (Latency? DB load? Cost?)
  • What’s the TTL, and what’s the worst stale-data outcome?
  • What happens when the cache is down?
  • How do we invalidate on writes?
  • Stampede defense?
  • Hot-key plan?
  • Metrics: hit rate, miss rate, p95 latency, evictions?
  • Memory budget set, eviction policy chosen?
  • How do we observe staleness in production?

If you can answer these, you’re ready to ship the cache.

Read this next

If you want a small Python+Redis library wrapping cache-aside + SWR + single-flight + circuit breaker, it’s at rajpoot.dev.

