Most backend bugs that look like “race conditions” are actually misunderstandings about Postgres transactions. MVCC, isolation, and locking are the three concepts that explain what’s actually going on. This post is the working understanding.

MVCC — the foundation

Postgres uses MVCC (Multi-Version Concurrency Control). Two facts that explain almost everything:

  1. Writers never block readers. Readers see a consistent snapshot from the moment their statement (or transaction) started.
  2. Updates produce new row versions. The old version isn’t deleted immediately; it’s marked dead and cleaned up later by VACUUM.
-- Tx A
BEGIN;
SELECT balance FROM accounts WHERE id = 1;     -- reads version v1: $100

  -- Tx B (concurrent)
  BEGIN;
  UPDATE accounts SET balance = 200 WHERE id = 1;
  COMMIT;                                       -- writes version v2: $200

-- Tx A (still in same transaction)
SELECT balance FROM accounts WHERE id = 1;     -- depends on isolation level

What Tx A sees on the second SELECT depends on isolation. Default Postgres (READ COMMITTED): sees v2 ($200). Higher isolation (REPEATABLE READ): still sees v1 ($100), the snapshot from when Tx A started.

That’s MVCC. Each transaction has a snapshot; the snapshot determines visibility.

Isolation levels

Postgres supports four ANSI levels (technically — READ UNCOMMITTED is treated as READ COMMITTED):

LevelPhantom readsNon-repeatable readsDirty reads
READ UNCOMMITTEDAllowedAllowedDisallowed in PG
READ COMMITTED (default)AllowedAllowedDisallowed
REPEATABLE READPostgres preventsDisallowedDisallowed
SERIALIZABLEDisallowedDisallowedDisallowed

What each prevents:

  • Dirty read — seeing uncommitted data from another transaction. Postgres always prevents this.
  • Non-repeatable read — re-reading a row gives a different value. Default-prevented at REPEATABLE READ.
  • Phantom read — re-running a query returns different rows. Postgres’s REPEATABLE READ prevents this for snapshot-based phantoms, unlike standard SQL.

Use READ COMMITTED unless you have a reason. It’s the default for a reason — it lets concurrent work proceed without blocking on conflicts that don’t matter.

When to use REPEATABLE READ

BEGIN ISOLATION LEVEL REPEATABLE READ;

SELECT * FROM users WHERE id = 1;
-- ... do other work ...
SELECT * FROM users WHERE id = 1;     -- guaranteed same result as first read

COMMIT;

Use cases:

  • Reports and aggregations that span multiple SELECTs. You want a consistent point-in-time view.
  • Backups with pg_dump (which uses REPEATABLE READ internally).
  • Calculations like “sum customers’ balances” where the result must reflect a single moment.

The tradeoff: long-running REPEATABLE READ transactions hold a snapshot that prevents VACUUM from cleaning old row versions in the whole database. Bloat ensues. Keep these transactions short.

When to use SERIALIZABLE

BEGIN ISOLATION LEVEL SERIALIZABLE;
-- ... transactional logic ...
COMMIT;          -- might fail with serialization_failure

SERIALIZABLE prevents anomalies that REPEATABLE READ doesn’t — specifically write skew, where two transactions read overlapping data and write disjoint rows in a way that breaks an invariant.

The classic example:

-- Invariant: at most one doctor "on call" can be off-shift simultaneously.
-- Both transactions read "2 doctors on call" and decide it's safe to go off.
-- Both commit. Now zero doctors on call. 💥

SERIALIZABLE detects this and aborts one transaction with 40001 serialization_failure. Your application must retry on this error.

When to use:

  • Financial systems where invariants must hold strictly.
  • Inventory / booking systems with tight constraints.
  • Anywhere a write skew bug would be catastrophic.

The cost: failed transactions that need retry, more aborts under contention. Most production code uses READ COMMITTED with explicit SELECT FOR UPDATE to lock specific rows where it matters, instead of going to SERIALIZABLE.

Row-level locks

-- Acquire exclusive lock on a row
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;

-- Acquire shared lock (other readers OK; no writers)
SELECT * FROM accounts WHERE id = 1 FOR SHARE;

-- Try to acquire; fail immediately if can't
SELECT * FROM jobs WHERE status = 'pending' LIMIT 1 FOR UPDATE NOWAIT;

-- Try to acquire; skip locked rows
SELECT * FROM jobs WHERE status = 'pending' LIMIT 10 FOR UPDATE SKIP LOCKED;

FOR UPDATE — the workhorse

The pattern:

BEGIN;
SELECT balance FROM accounts WHERE id = 1 FOR UPDATE;
-- ... compute new balance in app ...
UPDATE accounts SET balance = $new WHERE id = 1;
COMMIT;

The lock is held until commit. Other transactions trying to UPDATE this row wait. This serializes the read-modify-write cycle.

SKIP LOCKED — the queue pattern

The killer pattern for queue-shaped workloads:

-- Worker fetches a batch of jobs no one else is processing
SELECT id, payload FROM jobs
WHERE status = 'pending'
ORDER BY id
LIMIT 10
FOR UPDATE SKIP LOCKED;

-- Mark as in-progress
UPDATE jobs SET status = 'processing' WHERE id = ANY($ids);
COMMIT;

Multiple worker processes can pull from the queue concurrently without contention. Postgres becomes a respectable job queue. Pair with the outbox pattern for cross-system reliability.

NOWAIT — fail fast

SELECT * FROM resources WHERE id = $1 FOR UPDATE NOWAIT;

If any other transaction holds the lock, you get an error immediately. Good for “either I get the lock now or I’ll retry later” patterns where waiting is worse than failing.

Advisory locks — application-level coordination

Sometimes you need to coordinate work that isn’t a single row. Postgres has advisory locks — application-defined integer keys you can lock against:

-- Try to take a session-level lock on the integer key 42. Returns true if acquired.
SELECT pg_try_advisory_lock(42);

-- Released on session end or explicit release
SELECT pg_advisory_unlock(42);

-- Transaction-level: released on commit/rollback
SELECT pg_try_advisory_xact_lock(42);

Use cases:

  • Cron / one-of-many leader election. Only one worker should run a periodic job; the others see false and skip.
  • Slow batch jobs where you want to coordinate across processes without rows to lock.
  • Distributed lock on a logical resource that spans multiple tables.
async def with_advisory_lock(conn, key: int, fn):
    got = await conn.fetchval("SELECT pg_try_advisory_lock($1)", key)
    if not got:
        return None                  # someone else holds it
    try:
        return await fn()
    finally:
        await conn.execute("SELECT pg_advisory_unlock($1)", key)

Don’t reach for advisory locks first — but when row locks don’t fit, they’re the right tool.

Deadlocks

A deadlock happens when two transactions hold locks the other needs:

Tx A: locks row 1, then waits for row 2
Tx B: locks row 2, then waits for row 1

Postgres detects this (deadlock_timeout = 1s by default) and aborts one transaction. Your app gets 40P01 deadlock_detected.

Avoiding deadlocks

The single most important rule: always acquire locks in the same order. If transaction A locks rows by ID ascending, transaction B should too. Sounds simple, gets violated all the time.

Other tactics:

  • Keep transactions short. The shorter the lock holding period, the less chance of overlap.
  • Use SKIP LOCKED for queues. Eliminates the queue contention class of deadlocks.
  • Lower isolation when possible. Higher isolation acquires more / longer locks.
  • Retry on 40P01. Treat it like a transient network error — back off, retry.
import asyncio
from psycopg.errors import DeadlockDetected, SerializationFailure

async def with_retry(coro_factory, attempts=3):
    for i in range(attempts):
        try:
            return await coro_factory()
        except (DeadlockDetected, SerializationFailure):
            if i == attempts - 1:
                raise
            await asyncio.sleep(0.05 * (2 ** i))

Snapshot bloat — the silent enemy

Long-running transactions hold a snapshot. Postgres can’t VACUUM rows that some live transaction might still see. So a forgotten BEGIN in a Python REPL holds back vacuum across the entire database.

Symptoms:

  • Tables growing far larger than their row count would suggest.
  • Slow queries because indexes have many dead tuples.
  • pg_stat_activity shows a transaction in idle in transaction for hours.

Mitigations:

  • idle_in_transaction_session_timeout = '5min' — Postgres kills idle-in-tx sessions automatically.
  • Audit your code: every BEGIN must have a COMMIT or ROLLBACK on the success / error paths.
  • For analytical / report transactions, run them on a logical replica so they don’t hold back vacuum on the primary.

Common mistakes

1. “I added a transaction so it’s atomic”

Wrapping in BEGIN/COMMIT doesn’t add isolation guarantees beyond your default level. If you read a value, compute, then write, two transactions can both read the same value, both compute, both write. You get the last-write-wins, not “atomic.” Use FOR UPDATE to serialize.

2. “I’ll just use SERIALIZABLE everywhere”

You’ll get many serialization_failure errors and slow throughput. SERIALIZABLE is right when correctness > throughput, not as a default.

3. Long transactions in pgbouncer transaction-pooling mode

pgbouncer in transaction pool mode reuses the same DB connection across transactions from different sessions. Things like LISTEN, SET LOCAL, and prepared statements break. Configure the pool mode that matches your traffic.

4. Locking more rows than necessary

SELECT * FROM accounts WHERE user_id = $1 FOR UPDATE;     -- locks all matching rows

If you only need one row, fetch the IDs first, then lock them specifically. Or use LIMIT 1. Locks have a memory and contention cost.

5. Treating advisory locks as durable

Advisory locks live in shared memory, not on disk. A cluster restart or failover loses them. They’re ephemeral coordination, not state.

A debugging checklist

When transactions misbehave:

  • What isolation level is each transaction using?
  • Are any transactions held open longer than seconds?
  • Is idle_in_transaction_session_timeout set?
  • Are deadlocks logging to log_min_messages?
  • What does pg_stat_activity show during the problem?
  • What does pg_locks show? (Joined with pg_stat_activity is gold.)
  • Are connections pooled in transaction mode? Could that be breaking session state?
  • Are you accidentally re-fetching after an UPDATE within the same transaction (and getting cached MVCC reads)?

Read this next

If you want my Postgres concurrency runbook with EXPLAIN snippets and a pg_locks debugger query set, it’s at rajpoot.dev .


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .