Cheatsheet for Postgres replication and failover.
Streaming replication (primary → standby)
primary’s postgresql.conf:
wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB
hot_standby = on
primary’s pg_hba.conf:
host replication replicator 10.0.0.0/8 scram-sha-256
Standby setup:
pg_basebackup -h primary -U replicator -D /var/lib/postgresql/data -X stream
touch /var/lib/postgresql/data/standby.signal
# postgresql.conf:
primary_conninfo = 'host=primary user=replicator'
Logical replication (per-table)
primary:
ALTER SYSTEM SET wal_level = logical;
SELECT pg_reload_conf();
CREATE PUBLICATION my_pub FOR TABLE users, orders;
subscriber:
CREATE SUBSCRIPTION my_sub
CONNECTION 'host=primary user=replicator'
PUBLICATION my_pub;
Use logical replication for major version upgrades, partial replication, multi-master patterns.
Cascading replication
Primary → Replica1 → Replica2
Replica1’s standby.signal + primary_conninfo points to Primary. Replica2’s points to Replica1.
Reduces load on Primary.
Monitoring lag
-- On primary
SELECT
client_addr,
state,
pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) AS sent_lag,
pg_wal_lsn_diff(pg_current_wal_lsn(), write_lsn) AS write_lag,
pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn) AS flush_lag,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag
FROM pg_stat_replication;
Alert on replay_lag > threshold (e.g., 100MB).
Replication slots
SELECT pg_create_physical_replication_slot('replica1');
Slots ensure primary retains WAL needed by the replica. Risk: disconnected replica with active slot pins WAL forever → disk fills.
Monitor:
SELECT slot_name, active, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained
FROM pg_replication_slots;
Drop unused slots.
Patroni (HA orchestrator)
scope: my-pg-cluster
namespace: /db/
name: pg-node-1
restapi:
listen: 0.0.0.0:8008
connect_address: 10.0.0.1:8008
etcd:
hosts: ["etcd1:2379", "etcd2:2379", "etcd3:2379"]
bootstrap:
dcs:
ttl: 30
loop_wait: 10
retry_timeout: 10
maximum_lag_on_failover: 1048576
postgresql:
use_pg_rewind: true
parameters:
wal_level: replica
hot_standby: on
max_wal_senders: 10
Patroni elects a leader via etcd/consul/zookeeper. Auto-promotes a replica on primary failure.
App connects to a stable endpoint (HAProxy / pgcat). Patroni updates routing.
Multi-host failover URL
"postgresql+asyncpg://app@primary:5432,replica:5432/db?target_session_attrs=read-write"
asyncpg tries hosts in order; selects writable. On failover: reconnects to the new primary.
Read-replica routing
write_engine = create_async_engine(WRITER_URL)
read_engine = create_async_engine(
READER_URL,
connect_args={"server_settings": {"default_transaction_read_only": "on"}},
)
Per-handler choice via Depends.
Read-after-write handling
User writes; immediate read might hit replica (still lagged):
RECENT_WRITER_TTL = 5
# After write
await redis.set(f"recent_writer:{user_id}", "1", ex=RECENT_WRITER_TTL)
# In get_db
if await redis.exists(f"recent_writer:{user_id}"):
return WriteSession() # route to primary briefly
else:
return ReadSession()
Synchronous replication
synchronous_commit = on
synchronous_standby_names = 'replica1'
Primary waits for replica ACK before returning to client. Zero-data-loss; higher latency.
For most apps: async + tight lag monitoring is the sweet spot.
Major version upgrade via logical replication
Day 0: PG14 in prod.
Day 1: spin up PG17. Set up logical replication PG14 → PG17.
Day N: caught up; lag < 1s.
Day N+1: stop writes briefly. Swap app to PG17. Drop PG14.
Total downtime: seconds. Without this: hours of pg_dump/restore.
Failover testing
Quarterly:
- Announce window.
- Stop primary (gracefully or kill -9).
- Time the failover.
- Verify app reconnected.
- Document RTO actuals.
Without testing: failover is theory.
Cross-region DR
Region A (active): primary + 1 sync replica.
Region B (DR): async replica + WAL archive.
On region-A loss: promote region-B replica. Expect RPO = lag at moment of failure.
Common mistakes
- Single replica = no HA (replica down + primary down = no failover candidate).
- Replication slot leaks → disk full.
- Async without lag monitoring → silent slow failover.
- Failover never tested.
- Pool not configured with pool_pre_ping → stale connections after promotion.
Read this next
If you want my Patroni + HAProxy + read-replica routing template, it’s at rajpoot.dev .
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .