Caching Strategies in 2026 — Redis, Valkey, and the Patterns That Actually Work

A cache is an optimization that becomes a hazard the day it gets stale, runs out of memory, develops a hot key, or experiences a stampede. This post is the working set of caching strategies and the gotchas that bite at scale.

I’ll also cover the Redis/Valkey split that landed in 2024 and stabilized through 2026 — relevant to almost every team running Redis.

Redis vs Valkey — what changed

In March 2024 Redis moved from BSD-3 to a dual source-available license (RSALv2 / SSPLv1). Valkey was forked from Redis 7.2.4 under the Linux Foundation and stays on BSD-3. Two years later:

Valkey is now the default in major Linux distros, AWS ElastiCache, GCP Memorystore.
Redis 8 added native vector search, JSON, time-series, hybrid search, semantic caching. Broader feature set, but the license (RSALv2, SSPLv1, or AGPLv3) is more restrictive than BSD, which matters most if you host or redistribute it.
API compatibility for the core command set is essentially identical. Most clients work with both.

For new projects in 2026: Valkey is the default unless you specifically need Redis 8’s vector/JSON/TS features. Existing Redis users on managed services have already been migrated transparently.

I’ll say “Redis” generically below — substitute Valkey in your head for new deployments.

The caching mental model

A cache is a temporary, faster copy of data whose source of truth is somewhere slower. Three things every cache decision touches:

Coherence — when does cache content become stale, and what do we do about it?
Eviction — what gets thrown out when memory fills?
Failure — what happens when the cache is unavailable?

If a design ignores any of these, it’ll fail at the worst time.

Pattern 1 — Cache-aside (lazy loading)

Most common. Application checks cache, falls back to DB, populates cache.

async def get_user(user_id: int) -> User | None:
    key = f"u:{user_id}"
    cached = await redis.get(key)
    if cached is not None:                          # hit
        return User.parse_raw(cached)
    # Miss: read the source of truth, then populate the cache.
    user = await db.fetch_one("SELECT * FROM users WHERE id = $1", user_id)
    if user:
        await redis.set(key, user.json(), ex=300)   # 5-minute TTL
    return user                                     # None if the row doesn't exist

Pros: Simple. Cache contains only what was actually requested.
Cons: First request after expiry is slow. Cache outages need explicit handling, or every miss path turns into an error path.

90% of caching is cache-aside. Start here.

Negative caching

Don’t forget to cache misses:

async def get_user(user_id: int) -> User | None:
    cached = await redis.get(f"u:{user_id}")
    if cached == b"":             # tombstone
        return None
    if cached:
        return User.parse_raw(cached)
    user = await db.fetch_one(...)
    if user:
        await redis.set(f"u:{user_id}", user.json(), ex=300)
    else:
        await redis.set(f"u:{user_id}", b"", ex=60)    # short TTL — DB might add the row soon
    return user

Without negative caching, a brute-force scanner pummels your DB on missing IDs.

Pattern 2 — Write-through

App writes go to cache and DB synchronously.

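async def update_user(user: User):
    async with db.transaction():
        await db.execute("UPDATE users SET ... WHERE id = $1", user.id)
    # Write the cache only after the transaction has committed, so a rollback
    # never leaves uncommitted data in the cache.
    await redis.set(f"u:{user.id}", user.json(), ex=300)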

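Pros: The cache is refreshed on every write, so reads rarely see stale data.
Cons: Every write costs an extra round trip. Only covers writes that go through this path; rows changed by batch jobs or other services still go stale until the TTL expires.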

Use when reads vastly outnumber writes and staleness is unacceptable.

Pattern 3 — Write-behind (write-back)

App writes to cache; cache asynchronously persists to DB.

Pros: Lowest write latency.
Cons: Real risk of data loss on cache failure. Hard to get right.

I almost never use write-behind for primary state. It’s appropriate for high-frequency, lossy data: counters, telemetry, analytics buffers.
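
A minimal sketch of write-behind for that kind of lossy data, assuming counters buffered in process memory and flushed in batches. WriteBehindCounter and flush_fn are hypothetical names, and flush_fn stands in for whatever persists a batch to your DB:

```python
from collections import Counter


class WriteBehindCounter:
    """Buffers increments in memory and flushes them to a slower store in batches."""

    def __init__(self, flush_fn):
        self._pending = Counter()      # increments not yet persisted
        self._flush_fn = flush_fn      # hypothetical: async callable taking dict[str, int]

    def incr(self, key: str, amount: int = 1) -> None:
        # Fast path: no I/O on the write.
        self._pending[key] += amount

    async def flush(self) -> int:
        """Persist everything buffered so far; returns the number of keys flushed."""
        if not self._pending:
            return 0
        batch, self._pending = dict(self._pending), Counter()
        try:
            await self._flush_fn(batch)
        except Exception:
            # Put the batch back so a failed flush does not drop it.
            self._pending.update(batch)
            raise
        return len(batch)
```

Call flush() from a periodic task. Anything still in the buffer when the process dies is gone, which is exactly the data-loss risk above.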

Pattern 4 — Read-through

The cache itself fetches from DB on miss. Some libraries support this (DataLoader-style); pure Redis does not. Effectively the same as cache-aside but the cache mediates.
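
To make the difference concrete, here is a small in-process sketch where the cache owns the loader and callers never touch the DB directly. ReadThroughCache and loader are illustrative names, not a real library API:

```python
import time
from typing import Awaitable, Callable, Generic, TypeVar

V = TypeVar("V")


class ReadThroughCache(Generic[V]):
    """In-process read-through cache: callers ask the cache, never the DB."""

    def __init__(self, loader: Callable[[str], Awaitable[V]], ttl_seconds: float):
        self._loader = loader          # hypothetical: async function that reads the DB
        self._ttl = ttl_seconds
        self._entries = {}             # key -> (expires_at, value)

    async def get(self, key: str) -> V:
        entry = self._entries.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]                      # hit
        value = await self._loader(key)          # miss: the cache does the fetch
        self._entries[key] = (time.monotonic() + self._ttl, value)
        return value
```

The logic is the same as cache-aside; what changes is ownership. The miss path lives in one place instead of in every call site.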

TTL design

The single most important caching decision. Three rules:

1. Don’t pick TTL by guess

Pick by how stale users tolerate this data:

Public profile: 1–5 min.
Pricing: 30s–2min.
Search index: minutes to hours.
User session: hours.
Static reference data: hours to days.

Then sanity-check it against your write frequency: a 5-min TTL on data that changes every 10 seconds serves stale values almost all the time, so either shorten the TTL or invalidate on write.

2. Add jitter

ttl = base_ttl + random.randint(0, base_ttl // 5)

Without jitter, every key populated in the same minute (after a deploy, a cache flush, or a load test) expires together and causes a stampede. Jitter spreads expirations.
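
The same idea as a helper, assuming up to 20% extra on top of the base TTL. jittered_ttl and its spread parameter are illustrative:

```python
import random
from typing import Optional


def jittered_ttl(base_ttl: int, spread: float = 0.2,
                 rng: Optional[random.Random] = None) -> int:
    """Return base_ttl plus a random extra of up to spread * base_ttl seconds."""
    source = rng if rng is not None else random
    return base_ttl + source.randint(0, int(base_ttl * spread))
```

With base_ttl=300, keys written in the same second expire anywhere between 300 and 360 seconds later instead of all at once.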

3. Stale-while-revalidate (SWR)

If staleness is mildly tolerable, return stale data while refreshing in the background:

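_refresh_tasks: set[asyncio.Task] = set()     # hold references so tasks aren't garbage-collected

async def get_user(user_id: int) -> User:
    key = f"u:{user_id}"
    cached = await redis.get(key)
    if cached:
        ttl = await redis.ttl(key)            # -1 = no expiry, -2 = key already gone
        if 0 <= ttl < 30:                     # nearing expiry
            # Every request in this window starts a refresh; pair with single-flight below.
            task = asyncio.create_task(refresh_user(user_id))     # background
            _refresh_tasks.add(task)
            task.add_done_callback(_refresh_tasks.discard)
        return User.parse_raw(cached)
    return await refresh_user(user_id)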

SWR trades a little freshness for latency: users get a slightly stale value immediately instead of paying 200ms of tail latency whenever a key expires. Almost always worth it.

Stampede prevention

A hot key expires; 1000 simultaneous misses hit the DB. Defenses:

1. Single-flight (request coalescing)

Only one request per key in flight; others wait for it.

inflight: dict[str, asyncio.Future] = {}

async def get_user_coalesced(user_id: int) -> User | None:
    key = f"u:{user_id}"
    cached = await redis.get(key)
    if cached:
        return User.parse_raw(cached)

    if key in inflight:
        return await inflight[key]                # piggyback

    fut = asyncio.get_running_loop().create_future()
    inflight[key] = fut
    try:
        user = await db.fetch_one(...)
        if user:
            await redis.set(key, user.json(), ex=300)
        fut.set_result(user)
        return user
    except Exception as exc:
        fut.set_exception(exc)                    # fail the waiters instead of hanging them
        raise
    finally:
        inflight.pop(key, None)
        if not fut.done():                        # leader was cancelled mid-fetch
            fut.cancel()

This coalesces within a single process only. For cross-process coalescing, see #2.

2. Distributed locks for refresh

async def refresh_with_lock(user_id: int) -> User | None:
    key = f"u:{user_id}"
    lock = f"lock:{key}"
    if await redis.set(lock, "1", nx=True, ex=10):    # got the lock
        try:
            user = await db.fetch_one(...)
            if user:
                await redis.set(key, user.json(), ex=300)
            return user
        finally:
            # Caveat: if the refresh outlives the 10 s lock TTL, this deletes a lock
            # someone else now holds. Store a unique token and compare before deleting.
            await redis.delete(lock)
    else:
        await asyncio.sleep(0.05)                     # someone else is refreshing
        cached = await redis.get(key)
        if cached:
            return User.parse_raw(cached)
        return await db.fetch_one(...)                # fallback

For high-traffic refreshes, this is essential.

3. Probabilistic early expiration

Refresh some requests slightly before the TTL hits, weighted by remaining TTL. Smooths out the expiration cliff.
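
One published way to do this is the formula usually called XFetch: refresh when now - delta * beta * ln(u) reaches the expiry time, where delta is how long a recompute takes and u is a uniform random draw. The sketch below assumes that variant; should_refresh_early and its parameters are illustrative names:

```python
import math
import random
import time
from typing import Callable, Optional


def should_refresh_early(expires_at: float, recompute_seconds: float,
                         beta: float = 1.0, now: Optional[float] = None,
                         rng: Callable[[], float] = random.random) -> bool:
    """Decide whether this request should refresh the key before it expires.

    The closer we are to expires_at, and the slower the recompute, the more
    likely the answer is True. beta > 1 refreshes earlier, beta < 1 later.
    """
    current = time.time() if now is None else now
    u = 1.0 - rng()                      # in (0, 1], so log() never sees 0
    return current - recompute_seconds * beta * math.log(u) >= expires_at
```

Store expires_at and the measured recompute time alongside the cached value, and call this on every hit. Far from expiry it almost never fires; in the last moments it fires for some unlucky request, which refreshes the key before the crowd sees a miss.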

Hot keys

A handful of keys serve most of the traffic, and the instance hosting one of them is overwhelmed.

Mitigations:

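Local cache in front of Redis. A very short in-process TTL caps how often each process touches Redis: with a 10ms TTL, a process refetches a hot key at most 100 times per second no matter how many requests it serves, with negligible staleness.
Sharding the value. Store N copies of a popular item under different keys and read one at random per request, so the load spreads across instances. Write all N copies on update.
Replicas with read scaling. Redis replicas serve reads; the primary serves writes. Read-mostly hot keys go to replicas, at the cost of replication lag.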
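
A sketch of the sharded-value mitigation, assuming a plain dict stands in for the cache client and an ":r{i}" suffix distinguishes the copies. Both the suffix scheme and the function names are made up for illustration:

```python
import random


def replica_keys(key: str, copies: int) -> list:
    """All physical keys that hold a copy of the logical key."""
    return [f"{key}:r{i}" for i in range(copies)]


def write_all(store: dict, key: str, value, copies: int) -> None:
    # Updates must touch every copy, otherwise readers see mixed versions.
    for physical in replica_keys(key, copies):
        store[physical] = value


def read_one(store: dict, key: str, copies: int):
    # Each request picks a random copy, so no single key takes all the reads.
    return store.get(random.choice(replica_keys(key, copies)))
```

In a cluster the copies only help if they land on different nodes, so check where the suffixed keys actually hash.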

Cache key design

Namespace: {service}:{type}:{id} — payments:txn:42. Keeps multi-service Redis tidy.
Versioned: {namespace}:v3:{id} — bumping the version invalidates everything cleanly during deploys.
Avoid embedding raw query strings: q=foo&page=2&sort=date produces an unbounded key space, and parameter order alone creates duplicates. Normalize the parameters and hash them so keys stay short and canonical.
Bound cardinality: don’t SET once per unique query if your tail is huge. Cache the popular ones; let the rest be slow.
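
These rules fit in two small helpers. This is a sketch under my own naming (cache_key, query_key), with a truncated SHA-256 as one reasonable choice of hash:

```python
import hashlib
from urllib.parse import parse_qsl, urlencode


def cache_key(service: str, kind: str, ident: str, version: int = 1) -> str:
    """Namespaced, versioned key: {service}:{type}:v{N}:{id}."""
    return f"{service}:{kind}:v{version}:{ident}"


def query_key(service: str, query_string: str, version: int = 1) -> str:
    """Key for a query string, independent of parameter order."""
    # Sort parameters so "a=1&b=2" and "b=2&a=1" share one cache entry.
    params = sorted(parse_qsl(query_string, keep_blank_values=True))
    digest = hashlib.sha256(urlencode(params).encode()).hexdigest()[:16]
    return cache_key(service, "q", digest, version)
```

For example, cache_key("payments", "txn", "42", version=3) gives payments:txn:v3:42. Hashing fixes key length and ordering; it does not bound how many distinct queries exist, which is what the cardinality rule is for.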

Eviction policies

Redis policies:

noeviction: writes fail when memory is full. This is the default maxmemory-policy in open-source Redis and Valkey. Surprising in production.
allkeys-lru: evict least-recently-used across all keys. Sane default for caches.
allkeys-lfu: least-frequently-used. Better for skewed access (a few hot keys).
volatile-ttl: evict only keys that have a TTL, shortest remaining TTL first. Useful when keys without TTLs are state that must never be evicted.

For pure caching: allkeys-lru or allkeys-lfu. The choice depends on whether your access pattern is recency-biased or frequency-biased.

Multi-layer caching

Browser cache → CDN → Edge → Application local cache → Redis → DB

Each layer:

Is faster than the next.
Has shorter TTLs (typically).
Catches a majority of traffic.

The art is making layers compose:

HTTP Cache-Control headers govern browser + CDN.
Application sets its own per-key TTL.
Redis sits between application instances for cross-instance coherence.

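A 1-second in-process cache backed by Redis backed by Postgres can serve the vast majority of GET traffic from somewhere fast. The in-process layer adds at most one second of staleness on top of whatever Redis holds, so total staleness stays near a second only if writes also update or invalidate the Redis entry.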
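
A sketch of the two innermost layers composing, assuming the shared cache is reachable through two async callables. TwoTierCache, shared_get, shared_set, and loader are placeholder names, not a real client API:

```python
import time


class TwoTierCache:
    """Tiny in-process TTL layer in front of a shared cache (e.g. Redis)."""

    def __init__(self, shared_get, shared_set, local_ttl=1.0, shared_ttl=300,
                 clock=time.monotonic):
        self._shared_get = shared_get    # async (key) -> value or None
        self._shared_set = shared_set    # async (key, value, ttl_seconds) -> None
        self._local_ttl = local_ttl
        self._shared_ttl = shared_ttl
        self._clock = clock
        self._local = {}                 # key -> (expires_at, value); unbounded in this sketch

    async def get(self, key, loader):
        hit = self._local.get(key)
        if hit is not None and hit[0] > self._clock():
            return hit[1]                              # layer 1: in-process
        value = await self._shared_get(key)            # layer 2: shared cache
        if value is None:
            value = await loader(key)                  # layer 3: source of truth
            await self._shared_set(key, value, self._shared_ttl)
        self._local[key] = (self._clock() + self._local_ttl, value)
        return value
```

A production version would bound the local dict (an LRU, for instance) and add the stampede defenses from earlier.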

Failure modes — what happens when Redis dies

You must have a story. Three options:

1. Cache-bypass (fail open)

try:
    cached = await redis.get(key)
    if cached:
        return cached
except RedisError:
    redis_circuit_breaker.trip()
return await db.fetch_one(...)

DB takes the load. Latency degrades. Application keeps serving.

2. Stale serve (last-known-good)

A separate, slower fallback (e.g., Postgres cached_responses table) serves stale-but-acceptable data when Redis is down.

3. Fail-closed

Refuse traffic. Rare and brutal; only appropriate when stale data would be unsafe (financial, security).

I default to #1 with a circuit breaker that closes after a brief recovery window. Most apps survive a Redis incident with degraded latency rather than an outage.
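
The breaker itself can be very small. This is a sketch under my own naming (CircuitBreaker, allow, record_failure), not the redis_circuit_breaker object from the snippet above:

```python
import time


class CircuitBreaker:
    """Skips the cache after repeated failures, then probes again after a pause."""

    def __init__(self, failure_threshold=5, recovery_seconds=10.0, clock=time.monotonic):
        self._threshold = failure_threshold
        self._recovery = recovery_seconds
        self._clock = clock
        self._failures = 0
        self._opened_at = None           # None means closed (cache in use)

    def allow(self) -> bool:
        """True if the caller should try the cache."""
        if self._opened_at is None:
            return True
        # After the recovery window, let requests probe the cache again (half-open).
        return self._clock() - self._opened_at >= self._recovery

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None           # close the breaker

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self._threshold:
            self._opened_at = self._clock()   # open, or re-open after a failed probe
```

Check allow() before each cache call, call record_success() when the call works and record_failure() in the except branch. While the breaker is open, requests go straight to the DB without waiting on Redis timeouts.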

Coherence — the real hard part

Cache-aside makes the source of truth the DB. But what about cache invalidation on writes? Three strategies:

1. Invalidate on write

async def update_user(user: User):
    await db.execute("UPDATE users SET ... WHERE id = $1", user.id)
    await redis.delete(f"u:{user.id}")          # next read repopulates

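The race: a reader misses the cache and reads the old row from the DB; the writer then commits and deletes the key; finally the reader writes its old value into the cache, where it stays until the TTL expires. Real but rare; mitigations include double-delete (delete, sleep 100ms, delete again) or short TTLs that bound staleness.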
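
Double-delete as a sketch, assuming write_db and cache_delete are async callables you supply (both names are placeholders):

```python
import asyncio


async def update_with_double_delete(write_db, cache_delete, key: str,
                                    delay_seconds: float = 0.1) -> asyncio.Task:
    """Write the DB, delete the key, then delete it again after a short delay."""
    await write_db()
    await cache_delete(key)

    async def _second_delete():
        # Catches a stale value that a racing reader wrote after the first delete.
        await asyncio.sleep(delay_seconds)
        await cache_delete(key)

    # The caller should keep a reference to the task until it finishes.
    return asyncio.create_task(_second_delete())
```

The delay has to be longer than a slow read-then-populate takes. It narrows the window; it does not close it, which is why a TTL stays on as the backstop.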

2. Update on write (write-through)

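Already covered. It avoids the stale-repopulation race above, but two concurrent writers can still update the cache in a different order than the DB commits, so keep a TTL as a backstop.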

3. PubSub / change-data-capture invalidation

The DB or an outbox emits change events; consumers invalidate caches. Best for cross-region or service-wide caches. Needs infrastructure (Debezium, Postgres logical replication).
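
The consumer side is usually a small loop. In this sketch an asyncio.Queue stands in for the real event stream, and the event shape ({"table": ..., "id": ...}) is an assumption, not Debezium's actual format:

```python
import asyncio


def keys_for_event(event: dict) -> list:
    """Map a change event to the cache keys it invalidates (hypothetical event shape)."""
    if event.get("table") == "users":
        return [f"u:{event['id']}"]
    return []                            # tables we don't cache


async def invalidation_consumer(events: asyncio.Queue, cache_delete) -> None:
    """Delete cache keys for each change event until a None sentinel arrives."""
    while True:
        event = await events.get()
        if event is None:                # sentinel: stop consuming
            return
        for key in keys_for_event(event):
            await cache_delete(key)      # deletes are idempotent, so replays are safe
```

Because invalidation is a delete, at-least-once delivery is fine: a replayed event costs one extra cache miss.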

Things you shouldn’t cache

Ultra-short-lived data that changes every request.
Personalized content without the user in the cache key (or use server-side cache with a per-user namespace).
Sensitive data without thinking about TTLs and security boundaries.
The very thing you’re trying to scale, when scaling the underlying store would be easier.

The “just add a cache” instinct often hides the real problem.

A checklist for adding caching

Why are we caching this? (Latency? DB load? Cost?)
What’s the TTL, and what’s the worst stale-data outcome?
What happens when the cache is down?
How do we invalidate on writes?
Stampede defense?
Hot-key plan?
Metrics: hit rate, miss rate, p95 latency, evictions?
Memory budget set, eviction policy chosen?
How do we observe staleness in production?

If you can answer these, you’re ready to ship the cache.
