Postgres VACUUM is the most operationally important feature most app developers don’t know they need. Without it, your database silently bloats, indexes degrade, and queries slow down. This post is the operator’s manual.

Why VACUUM exists

Postgres uses MVCC (PostgreSQL MVCC, Isolation, Locking ). Updates produce new row versions; old versions become “dead tuples” that take space until VACUUM cleans them.

UPDATE users SET name='B' WHERE id=42;
-- old (id=42, name='A') marked dead
-- new (id=42, name='B') inserted
-- VACUUM later reclaims the space

A table with 1M dead tuples behaves like a table with 2M live tuples for query planning. Slower scans. Bigger indexes. Worse perf.

Autovacuum

Postgres ships autovacuum on by default. It periodically vacuums tables that have accumulated dead tuples.

Default trigger: 20% of rows are dead. For high-write tables, this is way too late.

-- per-table override
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.05,    -- vacuum at 5%, not 20%
    autovacuum_analyze_scale_factor = 0.02
);

For frequently-updated hot tables, set to 1–5%. Vacuum runs more often, briefer each time.

What blocks vacuum

The killer: long-running transactions. Postgres can’t vacuum rows that some live transaction might still see. So:

  • A forgotten BEGIN in a Python REPL holds vacuum back across the cluster.
  • A long analytical query keeps a snapshot.
  • A logical replication slot that’s far behind retains WAL.

Symptoms: tables grow much larger than row count. Index sizes blow up. Queries slow.

Mitigations:

-- kill idle-in-transaction sessions
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';

-- find long transactions
SELECT pid, now() - xact_start AS duration, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY duration DESC LIMIT 5;

Bloat detection

SELECT
    schemaname || '.' || relname AS table,
    pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
    n_live_tup,
    n_dead_tup,
    round(n_dead_tup::numeric / NULLIF(n_live_tup, 0), 2) AS dead_ratio
FROM pg_stat_user_tables
ORDER BY dead_ratio DESC NULLS LAST
LIMIT 20;

A dead_ratio > 0.2 consistently is autovacuum lagging. Tune.

For more accurate bloat numbers, use the pgstattuple extension or the check_postgres bloat queries.

pg_repack — online compaction

When a table is heavily bloated and autovacuum can’t catch up:

pg_repack -t orders -j 4

Rebuilds the table online (no exclusive lock) into a compact form. Drops the bloated original; renames. Take a backup first; coordinate with peak hours.

VACUUM FULL does the same thing but takes an exclusive lock — never use in production.

Index bloat

Indexes bloat too. Symptoms: index size grows faster than table; queries slow.

REINDEX INDEX CONCURRENTLY orders_user_idx;

CONCURRENTLY doesn’t take an exclusive lock. Standard pattern for refreshing bloated indexes.

Partitioning as an alternative

For time-series data, partition by date (Postgres Partitioning ). Old partitions get dropped, not vacuumed:

DROP TABLE events_2024_q1;     -- instant; no bloat to clean

Eliminates the vacuum problem for cold data. Live partitions stay small enough for autovacuum to handle.

Wraparound — the doomsday

Postgres uses 32-bit transaction IDs. Without VACUUM, eventually IDs wrap and rows become invisible. Postgres aggressively VACUUM-prevents wraparound, sometimes hard-shutting down the database to protect data.

Symptom of imminent wraparound: warnings in logs about mxid_age or xid_age over a billion. Action: run aggressive VACUUM FREEZE or shrink long transactions.

This is rare on healthy systems but catastrophic when it hits. Monitor:

SELECT datname, age(datfrozenxid) AS xid_age FROM pg_database
ORDER BY xid_age DESC;

Should stay well below the autovacuum_freeze_max_age (default 200M).

Tuning checklist

For a write-heavy production cluster:

autovacuum_naptime = 30s
autovacuum_max_workers = 6
autovacuum_vacuum_scale_factor = 0.1     # default 0.2; tighter for hot tables
autovacuum_analyze_scale_factor = 0.05
maintenance_work_mem = 2GB                # speeds up VACUUM dramatically
idle_in_transaction_session_timeout = '5min'

Per-hot-table further tuning via ALTER TABLE ... SET.

Read this next

If you want my Postgres bloat-detection + alert templates, it’s at rajpoot.dev .


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .