Distributed locks are how multiple workers / processes / services coordinate access to shared resources. Done wrong, you get split-brain, double-spend, or deadlock. This post is the working playbook.
What you actually need
Before reaching for a distributed lock, ask: can you avoid it? Patterns that don’t need locks:
- Idempotent operations with deduplication (Idempotency, Retries, and Exactly-Once).
- Database row locks (SELECT FOR UPDATE) for resource-bound coordination.
- Single-leader with leader election (one node owns it; others wait or read replicas).
- Eventually consistent operations that tolerate concurrent execution.
Distributed locks should be the last resort. They add complexity and failure modes.
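As an illustration of the first pattern, here is a minimal in-memory sketch of deduplication by idempotency key. The names `IdempotentProcessor`, `handle`, and `charge` are hypothetical, and it assumes a single process; a real system would keep the key-to-result map in a durable store with an atomic insert (for example a unique constraint) rather than a Python dict.

```python
from typing import Callable, Dict


class IdempotentProcessor:
    """Runs a handler at most once per idempotency key (in-memory sketch)."""

    def __init__(self, handler: Callable[[dict], str]) -> None:
        self._handler = handler
        self._results: Dict[str, str] = {}  # idempotency key -> stored result

    def handle(self, idempotency_key: str, payload: dict) -> str:
        # A repeated key returns the stored result instead of re-running the work.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = self._handler(payload)
        self._results[idempotency_key] = result
        return result


charges = []  # records every time the side effect actually runs


def charge(payload: dict) -> str:
    charges.append(payload["amount"])
    return f"charged {payload['amount']}"


processor = IdempotentProcessor(charge)
first = processor.handle("order-1", {"amount": 10})
retry = processor.handle("order-1", {"amount": 10})  # duplicate delivery
```

The retry gets the same answer as the first call and the side effect runs once, so no lock is needed to make a duplicate delivery harmless.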
Redis SET NX
# Acquire
got = await redis.set(f"lock:{key}", node_id, nx=True, ex=30)
if got:
    try:
        do_critical_work()
    finally:
        # Release ONLY if we still own it
        await redis.eval("""
            if redis.call("get", KEYS[1]) == ARGV[1] then
                return redis.call("del", KEYS[1])
            end
        """, 1, f"lock:{key}", node_id)
SET with NX and EX atomically creates the key only if it does not already exist and attaches a TTL. The Lua script for release ensures we delete the lock only if we still own it (and not a lock someone else acquired after our TTL expired).
Tradeoff: if your work takes longer than the TTL, another node can acquire while you’re still working. Always use fencing tokens (below).
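To make the TTL tradeoff concrete, the following is a small simulation rather than real Redis code. `FakeLockStore` is a hypothetical in-memory stand-in with a hand-advanced clock; it only mimics the SET NX EX and check-then-delete behaviour described above.

```python
from typing import Dict, Optional, Tuple


class FakeLockStore:
    """In-memory stand-in for Redis SET NX EX, driven by a manual clock."""

    def __init__(self) -> None:
        self.now = 0.0  # seconds; advanced by hand to simulate time passing
        self._locks: Dict[str, Tuple[str, float]] = {}  # key -> (owner, expires_at)

    def owner_of(self, key: str) -> Optional[str]:
        entry = self._locks.get(key)
        if entry is None or entry[1] <= self.now:
            self._locks.pop(key, None)  # expired entries vanish, like a TTL
            return None
        return entry[0]

    def set_nx_ex(self, key: str, owner: str, ttl: float) -> bool:
        if self.owner_of(key) is not None:
            return False
        self._locks[key] = (owner, self.now + ttl)
        return True

    def release_if_owner(self, key: str, owner: str) -> bool:
        # Same check the Lua script performs: delete only if we still own it.
        if self.owner_of(key) != owner:
            return False
        del self._locks[key]
        return True


store = FakeLockStore()
a_acquired = store.set_nx_ex("lock:report", "node-a", ttl=30)
store.now = 31.0  # node-a's work overran the 30 s TTL
b_acquired = store.set_nx_ex("lock:report", "node-b", ttl=30)
a_released = store.release_if_owner("lock:report", "node-a")
b_still_holds = store.owner_of("lock:report") == "node-b"
```

Both nodes end up believing they hold the lock after the TTL passes. The ownership check keeps node-a from deleting node-b’s lock, but it does nothing to stop node-a’s late writes.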
Redlock
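Salvatore Sanfilippo (the author of Redis) proposed Redlock: acquire the lock on N independent Redis instances and treat it as held only if a majority grant it before the lock’s validity time runs out. Designed for higher availability.
In practice: Redlock has documented issues. Martin Kleppmann argued that it depends on timing assumptions (bounded clock drift, process pauses, and network delay) and provides no fencing token, so it is not safe when correctness depends on the lock; when the lock is only an efficiency optimization, a single Redis instance is enough. Redlock adds complexity without proportional safety.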
For most apps in 2026: single Redis SET NX + fencing token. Reach for Redlock only if Redis HA matters more than locking complexity.
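For readers who want to see what the majority rule amounts to, here is a rough sketch of that one idea only. It is not the full Redlock algorithm: it leaves out the clock-drift allowance, per-instance timeouts, and randomized retries. `FakeRedis` and `acquire_majority` are hypothetical names, and the instances are in-memory stand-ins that do not model TTL expiry.

```python
import time
import uuid
from typing import Dict, List, Optional


class FakeRedis:
    """In-memory stand-in for one independent Redis instance (TTL not modelled)."""

    def __init__(self, up: bool = True) -> None:
        self.up = up
        self.data: Dict[str, str] = {}

    def set_nx(self, key: str, value: str) -> bool:
        if not self.up or key in self.data:
            return False
        self.data[key] = value
        return True

    def delete_if_equal(self, key: str, value: str) -> None:
        if self.up and self.data.get(key) == value:
            del self.data[key]


def acquire_majority(instances: List[FakeRedis], key: str, ttl: float) -> Optional[str]:
    """Return the lock value if a majority granted it within the TTL, else None."""
    value = uuid.uuid4().hex  # unique per attempt, used for safe release
    started = time.monotonic()
    granted = sum(1 for inst in instances if inst.set_nx(key, value))
    elapsed = time.monotonic() - started
    quorum = len(instances) // 2 + 1
    if granted >= quorum and elapsed < ttl:
        return value
    # Failed: undo any partial acquisitions so other clients are not blocked.
    for inst in instances:
        inst.delete_if_equal(key, value)
    return None


healthy = [FakeRedis(), FakeRedis(), FakeRedis(up=False)]
won = acquire_majority(healthy, "lock:report", ttl=30.0)  # 2 of 3 grant it
degraded = [FakeRedis(), FakeRedis(up=False), FakeRedis(up=False)]
lost = acquire_majority(degraded, "lock:report", ttl=30.0)  # only 1 of 3
```

One instance down out of three still yields a lock, which is the availability gain. Note that nothing here produces a fencing token.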
Fencing tokens
Along with the lock, obtain a monotonically increasing token and pass it to the resource on every write:
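class StaleToken(Exception):
    pass

async def acquire_with_token(key):
    # INCR is atomic, so every attempt gets a unique, increasing token
    token = await redis.incr(f"counter:{key}")
    got = await redis.set(f"lock:{key}", str(token), nx=True, ex=30)
    return token if got else None

async def write_with_fence(token, key, data):
    # Resource server keeps a `last_token` per key
    last = await redis.get(f"last_token:{key}")  # None until the first write
    if last is not None and token < int(last):
        raise StaleToken(f"token {token} is older than {int(last)}")
    # NOTE: check-then-set is not atomic here; a real resource must do both in one step
    await redis.set(f"last_token:{key}", token)
    do_write(data)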
Even if the original holder’s TTL expires and another node acquires the lock, the resource rejects the old holder’s writes once it has seen a newer token. This stops two concurrent writers from silently corrupting state.
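The rejection rule is easier to see without Redis in the way. The sketch below is an in-memory resource with hypothetical names (`FencedResource`, `StaleTokenError`), and the token values 33 and 34 are made up for the example. It assumes a single-threaded resource, so the compare and the update happen together.

```python
class StaleTokenError(Exception):
    """Raised when a write carries a token older than one already seen."""


class FencedResource:
    """In-memory resource that rejects writes carrying stale fencing tokens."""

    def __init__(self) -> None:
        self.last_token = 0
        self.value = None

    def write(self, token: int, data: str) -> None:
        if token < self.last_token:
            raise StaleTokenError(f"token {token} < last seen {self.last_token}")
        self.last_token = token
        self.value = data


resource = FencedResource()
# Node A acquired the lock with token 33, then stalled past its TTL.
# Node B acquired it afterwards with token 34 and wrote first.
resource.write(34, "from node B")
try:
    resource.write(33, "from node A (stale)")
    rejected = False
except StaleTokenError:
    rejected = True
```

The safety comes from the resource doing the check, not from the lock service, which is why fencing works even when the lock itself has already been handed to someone else.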
Postgres advisory locks
-- Try to acquire a session-level lock
SELECT pg_try_advisory_lock(42);
-- Released on session end or explicitly
SELECT pg_advisory_unlock(42);
-- Transaction-level: released on commit/rollback
SELECT pg_try_advisory_xact_lock(42);
Pros:
- Simple. No extra infrastructure if you already have Postgres.
- Tied to a connection — connection dies, lock releases automatically.
- Plays well with PostgreSQL MVCC.
Cons:
- Tied to one Postgres. If you have read replicas, the lock exists only on the primary.
- Keys are integers (one 64-bit bigint or a pair of 32-bit integers); map string names to keys by hashing.
Excellent for cron-leader-election, batch coordination, anything Postgres-resident.
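Since advisory lock functions take integer keys (a bigint in the single-argument form), string lock names need a stable mapping. One possible sketch, assuming SHA-256 truncated to 8 bytes is an acceptable hash for your namespace; `advisory_lock_key` and the lock name are hypothetical:

```python
import hashlib
import struct


def advisory_lock_key(name: str) -> int:
    """Derive a stable signed 64-bit key from a string lock name."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # ">q" unpacks the first 8 bytes as a big-endian signed 64-bit integer,
    # which fits PostgreSQL's bigint range.
    (key,) = struct.unpack(">q", digest[:8])
    return key


key = advisory_lock_key("cron:nightly-report")
# Pass `key` as the bound parameter with whatever Postgres driver you use.
query = "SELECT pg_try_advisory_lock(%s)"
```

Python’s built-in `hash()` is avoided on purpose: it is randomized per process for strings, so two workers would compute different keys for the same name.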
etcd / ZooKeeper
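For consensus-based locks (true safety): etcd’s Lease + Compare-And-Swap, ZooKeeper’s ephemeral nodes. Used by Kubernetes (etcd) and many old-school distributed systems.
Pros: lock state is replicated through a consensus protocol (Raft in etcd, Zab in ZooKeeper), so it stays consistent across node failures and partitions. Cons: ops overhead.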
For most apps: don’t run a dedicated ZK / etcd just for locks. Use Redis + fencing tokens or Postgres advisory locks.
Common patterns
Leader election
One process owns the role; others standby:
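import asyncio

async def acquire_leader():
    while True:
        got = await redis.set("leader", node_id, nx=True, ex=10)
        if got:
            # Keep the lease alive in the background while we lead
            asyncio.create_task(refresh_leader_lease())
            return True
        await asyncio.sleep(2)

async def refresh_leader_lease():
    while True:
        await asyncio.sleep(5)
        # Extend the lease only if we still own it
        still_leader = await redis.eval("""
            if redis.call("get", KEYS[1]) == ARGV[1] then
                return redis.call("expire", KEYS[1], ARGV[2])
            end
            return 0
        """, 1, "leader", node_id, "10")
        if not still_leader:
            return  # lost leadership: stop refreshing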
If the leader dies, its lease expires and another node acquires it. This is the standard pattern for cron jobs that should run on exactly one node.
Job deduplication
Two workers shouldn’t process the same job:
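got = await redis.set(f"processing:{job_id}", node_id, nx=True, ex=300)
if not got:
    return  # someone else has it
try:
    process(job_id)
finally:
    # Delete only if we still own the key (the TTL may have expired mid-job)
    await redis.eval("""
        if redis.call("get", KEYS[1]) == ARGV[1] then
            return redis.call("del", KEYS[1])
        end
    """, 1, f"processing:{job_id}", node_id)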
For task queue patterns, FOR UPDATE SKIP LOCKED is often simpler.
Common mistakes
1. No TTL
If the process dies, the lock is held forever. Always set a TTL.
2. TTL too short
The work takes longer than the TTL, another node acquires the lock, and both run at once. Lengthen the TTL or use fencing tokens.
3. Releasing without checking ownership
Your TTL expired and someone else now holds the lock; you delete it; they think they still have it. Release with a Lua script that checks ownership first.
4. No fencing token for critical writes
Two writers at the same time silently corrupt state. Fence critical writes.
5. Trusting clock-based locks for correctness
Clock skew exists. Locks based on wall time can fail. Logical / token-based is safer.
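Mistakes 2 and 5 meet on the client side: a holder often wants to know how much of its lease is left, and wall-clock time is a poor way to measure that. A hedged sketch of a local guard using a monotonic clock follows. `LeaseDeadline` and the 1-second margin are assumptions for illustration, and this check does not replace fencing, because a pause right after the check can still produce a late write.

```python
import time
from typing import Callable


class LeaseDeadline:
    """Tracks how much of a lock's TTL is left, using a monotonic clock."""

    def __init__(self, ttl: float, margin: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # In real code, create this just before sending the acquire request,
        # so network delay counts against the lease rather than extending it.
        self._expires_at = clock() + ttl
        self._margin = margin

    def remaining(self) -> float:
        return self._expires_at - self._clock()

    def is_safe(self) -> bool:
        # Treat the lease as lost slightly early to leave room for delays.
        return self.remaining() > self._margin


fake_now = [100.0]  # injected clock so the example is deterministic
lease = LeaseDeadline(ttl=30.0, margin=1.0, clock=lambda: fake_now[0])
safe_at_start = lease.is_safe()
fake_now[0] = 129.5  # 29.5 s later: only 0.5 s of lease left
safe_near_expiry = lease.is_safe()
```

`time.monotonic()` is used as the default because it is unaffected by system clock adjustments, unlike `time.time()`.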
Read this next
- Distributed Systems Fundamentals
- PostgreSQL MVCC, Isolation, Locking
- Idempotency, Retries, and Exactly-Once Illusions
- Design a Distributed Task Queue
If you want a Redis-lock library with fencing tokens + leader election, it’s at rajpoot.dev.