Most backend bugs that look like “race conditions” are actually misunderstandings about Postgres transactions. MVCC, isolation, and locking are the three concepts that explain what’s actually going on. This post is the working understanding.
MVCC — the foundation
Postgres uses MVCC (Multi-Version Concurrency Control). Two facts that explain almost everything:
- Writers never block readers. Readers see a consistent snapshot from the moment their statement (or transaction) started.
- Updates produce new row versions. The old version isn’t deleted immediately; it’s marked dead and cleaned up later by
VACUUM.
-- Tx A
BEGIN;
SELECT balance FROM accounts WHERE id = 1; -- reads version v1: $100
-- Tx B (concurrent)
BEGIN;
UPDATE accounts SET balance = 200 WHERE id = 1;
COMMIT; -- writes version v2: $200
-- Tx A (still in same transaction)
SELECT balance FROM accounts WHERE id = 1; -- depends on isolation level
What Tx A sees on the second SELECT depends on isolation. Default Postgres (READ COMMITTED): sees v2 ($200). Higher isolation (REPEATABLE READ): still sees v1 ($100), the snapshot from when Tx A started.
That’s MVCC. Each transaction has a snapshot; the snapshot determines visibility.
Isolation levels
Postgres supports four ANSI levels (technically — READ UNCOMMITTED is treated as READ COMMITTED):
| Level | Phantom reads | Non-repeatable reads | Dirty reads |
|---|---|---|---|
| READ UNCOMMITTED | Allowed | Allowed | Disallowed in PG |
| READ COMMITTED (default) | Allowed | Allowed | Disallowed |
| REPEATABLE READ | Postgres prevents | Disallowed | Disallowed |
| SERIALIZABLE | Disallowed | Disallowed | Disallowed |
What each prevents:
- Dirty read — seeing uncommitted data from another transaction. Postgres always prevents this.
- Non-repeatable read — re-reading a row gives a different value. Default-prevented at REPEATABLE READ.
- Phantom read — re-running a query returns different rows. Postgres’s REPEATABLE READ prevents this for snapshot-based phantoms, unlike standard SQL.
Use READ COMMITTED unless you have a reason. It’s the default for a reason — it lets concurrent work proceed without blocking on conflicts that don’t matter.
When to use REPEATABLE READ
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT * FROM users WHERE id = 1;
-- ... do other work ...
SELECT * FROM users WHERE id = 1; -- guaranteed same result as first read
COMMIT;
Use cases:
- Reports and aggregations that span multiple SELECTs. You want a consistent point-in-time view.
- Backups with
pg_dump(which uses REPEATABLE READ internally). - Calculations like “sum customers’ balances” where the result must reflect a single moment.
The tradeoff: long-running REPEATABLE READ transactions hold a snapshot that prevents VACUUM from cleaning old row versions in the whole database. Bloat ensues. Keep these transactions short.
When to use SERIALIZABLE
BEGIN ISOLATION LEVEL SERIALIZABLE;
-- ... transactional logic ...
COMMIT; -- might fail with serialization_failure
SERIALIZABLE prevents anomalies that REPEATABLE READ doesn’t — specifically write skew, where two transactions read overlapping data and write disjoint rows in a way that breaks an invariant.
The classic example:
-- Invariant: at most one doctor "on call" can be off-shift simultaneously.
-- Both transactions read "2 doctors on call" and decide it's safe to go off.
-- Both commit. Now zero doctors on call. 💥
SERIALIZABLE detects this and aborts one transaction with 40001 serialization_failure. Your application must retry on this error.
When to use:
- Financial systems where invariants must hold strictly.
- Inventory / booking systems with tight constraints.
- Anywhere a write skew bug would be catastrophic.
The cost: failed transactions that need retry, more aborts under contention. Most production code uses READ COMMITTED with explicit SELECT FOR UPDATE to lock specific rows where it matters, instead of going to SERIALIZABLE.
Row-level locks
-- Acquire exclusive lock on a row
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
-- Acquire shared lock (other readers OK; no writers)
SELECT * FROM accounts WHERE id = 1 FOR SHARE;
-- Try to acquire; fail immediately if can't
SELECT * FROM jobs WHERE status = 'pending' LIMIT 1 FOR UPDATE NOWAIT;
-- Try to acquire; skip locked rows
SELECT * FROM jobs WHERE status = 'pending' LIMIT 10 FOR UPDATE SKIP LOCKED;
FOR UPDATE — the workhorse
The pattern:
BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
-- ... compute new balance in app ...
UPDATE accounts SET balance = $new WHERE id = 1;
COMMIT;
The lock is held until commit. Other transactions trying to UPDATE this row wait. This serializes the read-modify-write cycle.
SKIP LOCKED — the queue pattern
The killer pattern for queue-shaped workloads:
-- Worker fetches a batch of jobs no one else is processing
SELECT id, payload FROM jobs
WHERE status = 'pending'
ORDER BY id
LIMIT 10
FOR UPDATE SKIP LOCKED;
-- Mark as in-progress
UPDATE jobs SET status = 'processing' WHERE id = ANY($ids);
COMMIT;
Multiple worker processes can pull from the queue concurrently without contention. Postgres becomes a respectable job queue. Pair with the outbox pattern for cross-system reliability.
NOWAIT — fail fast
SELECT * FROM resources WHERE id = $1 FOR UPDATE NOWAIT;
If any other transaction holds the lock, you get an error immediately. Good for “either I get the lock now or I’ll retry later” patterns where waiting is worse than failing.
Advisory locks — application-level coordination
Sometimes you need to coordinate work that isn’t a single row. Postgres has advisory locks — application-defined integer keys you can lock against:
-- Try to take a session-level lock on the integer key 42. Returns true if acquired.
SELECT pg_try_advisory_lock(42);
-- Released on session end or explicit release
SELECT pg_advisory_unlock(42);
-- Transaction-level: released on commit/rollback
SELECT pg_try_advisory_xact_lock(42);
Use cases:
- Cron / one-of-many leader election. Only one worker should run a periodic job; the others see
falseand skip. - Slow batch jobs where you want to coordinate across processes without rows to lock.
- Distributed lock on a logical resource that spans multiple tables.
async def with_advisory_lock(conn, key: int, fn):
got = await conn.fetchval("SELECT pg_try_advisory_lock($1)", key)
if not got:
return None # someone else holds it
try:
return await fn()
finally:
await conn.execute("SELECT pg_advisory_unlock($1)", key)
Don’t reach for advisory locks first — but when row locks don’t fit, they’re the right tool.
Deadlocks
A deadlock happens when two transactions hold locks the other needs:
Tx A: locks row 1, then waits for row 2
Tx B: locks row 2, then waits for row 1
Postgres detects this (deadlock_timeout = 1s by default) and aborts one transaction. Your app gets 40P01 deadlock_detected.
Avoiding deadlocks
The single most important rule: always acquire locks in the same order. If transaction A locks rows by ID ascending, transaction B should too. Sounds simple, gets violated all the time.
Other tactics:
- Keep transactions short. The shorter the lock holding period, the less chance of overlap.
- Use SKIP LOCKED for queues. Eliminates the queue contention class of deadlocks.
- Lower isolation when possible. Higher isolation acquires more / longer locks.
- Retry on
40P01. Treat it like a transient network error — back off, retry.
import asyncio
from psycopg.errors import DeadlockDetected, SerializationFailure
async def with_retry(coro_factory, attempts=3):
for i in range(attempts):
try:
return await coro_factory()
except (DeadlockDetected, SerializationFailure):
if i == attempts - 1:
raise
await asyncio.sleep(0.05 * (2 ** i))
Snapshot bloat — the silent enemy
Long-running transactions hold a snapshot. Postgres can’t VACUUM rows that some live transaction might still see. So a forgotten BEGIN in a Python REPL holds back vacuum across the entire database.
Symptoms:
- Tables growing far larger than their row count would suggest.
- Slow queries because indexes have many dead tuples.
pg_stat_activityshows a transaction inidle in transactionfor hours.
Mitigations:
idle_in_transaction_session_timeout = '5min'— Postgres kills idle-in-tx sessions automatically.- Audit your code: every
BEGINmust have aCOMMITorROLLBACKon the success / error paths. - For analytical / report transactions, run them on a logical replica so they don’t hold back vacuum on the primary.
Common mistakes
1. “I added a transaction so it’s atomic”
Wrapping in BEGIN/COMMIT doesn’t add isolation guarantees beyond your default level. If you read a value, compute, then write, two transactions can both read the same value, both compute, both write. You get the last-write-wins, not “atomic.” Use FOR UPDATE to serialize.
2. “I’ll just use SERIALIZABLE everywhere”
You’ll get many serialization_failure errors and slow throughput. SERIALIZABLE is right when correctness > throughput, not as a default.
3. Long transactions in pgbouncer transaction-pooling mode
pgbouncer in transaction pool mode reuses the same DB connection across transactions from different sessions. Things like LISTEN, SET LOCAL, and prepared statements break. Configure the pool mode that matches your traffic.
4. Locking more rows than necessary
SELECT * FROM accounts WHERE user_id = $1 FOR UPDATE; -- locks all matching rows
If you only need one row, fetch the IDs first, then lock them specifically. Or use LIMIT 1. Locks have a memory and contention cost.
5. Treating advisory locks as durable
Advisory locks live in shared memory, not on disk. A cluster restart or failover loses them. They’re ephemeral coordination, not state.
A debugging checklist
When transactions misbehave:
- What isolation level is each transaction using?
- Are any transactions held open longer than seconds?
- Is
idle_in_transaction_session_timeoutset? - Are deadlocks logging to
log_min_messages? - What does
pg_stat_activityshow during the problem? - What does
pg_locksshow? (Joined withpg_stat_activityis gold.) - Are connections pooled in transaction mode? Could that be breaking session state?
- Are you accidentally re-fetching after an UPDATE within the same transaction (and getting cached MVCC reads)?
Read this next
- PostgreSQL Indexing and EXPLAIN
- PostgreSQL 18 — What’s New
- Idempotency, Retries, and Exactly-Once Illusions — patterns that lean on Postgres for correctness.
- Distributed Systems Fundamentals — the broader concurrency context.
If you want my Postgres concurrency runbook with EXPLAIN snippets and a pg_locks debugger query set, it’s at rajpoot.dev
.
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .