Postgres replication options multiplied; topology decisions scale with team. This post is the working set: when each fits, what breaks, what to monitor.
Streaming replication
Primary writes WAL → ships to replicas → they replay.
Primary ──WAL──▶ Replica-1 (hot standby)
▶ Replica-2 (hot standby)
Hot standbys: read-only queries possible. Use for:
- HA: failover when primary dies.
- Read scale-out (carefully — read-after-write becomes async).
- Backup source (pg_basebackup off a replica).
# postgresql.conf on primary
wal_level = replica
max_wal_senders = 10
# pg_hba.conf
host replication repuser 10.0.0.0/8 scram-sha-256
Replica:
primary_conninfo = 'host=primary user=repuser'
hot_standby = on
Tools: [pg_auto_failover, Patroni, repmgr] orchestrate failover.
Logical replication
Per-table; cross-version; selective.
-- On publisher
CREATE PUBLICATION mypub FOR TABLE accounts, orders;
-- On subscriber
CREATE SUBSCRIPTION mysub
CONNECTION 'host=publisher dbname=app user=repl'
PUBLICATION mypub;
Use for:
- Major version upgrades with minimal downtime: replicate from PG14 to PG17, switch reads, drop old.
- Splitting a database: replicate a subset of tables to a new cluster.
- Cross-region read replicas with selective tables.
- Bidirectional is possible (pglogical / built-in active-active in 17+) but tricky.
Limitations: no DDL replicated; sequences manual; large initial copies are slow.
Cascading
Primary → Replica-1 → Replica-A
→ Replica-2 → Replica-B
→ Replica-C
Replicas can replicate from other replicas. Reduces WAL traffic on primary. Failover gets trickier.
Sync vs async
synchronous_commit = on # primary waits for replica ACK
synchronous_standby_names = 'r1' # which replica
Sync = zero data loss on failover. Cost: write latency = primary + RTT to replica.
For many shops: async + tight monitoring on lag is the sweet spot.
Lag monitoring
-- on primary
SELECT
client_addr,
state,
pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn) AS sent_lag,
pg_wal_lsn_diff(pg_current_wal_lsn(), write_lsn) AS write_lag,
pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn) AS flush_lag,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag
FROM pg_stat_replication;
Alert if replay_lag exceeds (e.g., 100MB or 10 seconds). Lag spikes precede failover problems.
Read-after-write trap
Application writes to primary; reads from replica; sees stale data.
Mitigations:
- Read your writes from primary (recent-writes flag in session).
synchronous_commit = remote_apply— primary waits for replay (slow).- Causal reads (per-session token).
For most apps: route logged-in user’s writes-then-reads to primary; route public reads to replicas.
Connection pooling
PgBouncer in transaction-pooling mode in front of replicas:
[App] → PgBouncer → Primary
→ Replica-1 (read-only pool)
→ Replica-2 (read-only pool)
App uses two DSNs: writes go to primary pool, reads to replica pool. See Connection Pooling .
Failover
Patroni / pg_auto_failover handle:
- Detect primary down (consensus, not single node).
- Promote a replica.
- Reconfigure remaining replicas to follow new primary.
- Update DNS / load balancer / connection string.
DIY this and you’ll get split-brain at 3am. Use a battle-tested tool.
Major version upgrades via logical replication
Day 0: PG14 primary in prod.
Day 1: spin up PG17. Set up logical replication PG14 → PG17.
Day N: PG17 caught up; lag < 1s.
Day N+1: stop writes briefly, swap app to PG17, drop PG14.
Total downtime: seconds. Without this: hours of pg_dump/restore.
Cross-region
For DR:
Region A (primary) → Region A replica (HA)
→ Region B replica (DR)
Async. RTT 50–200ms. On region-A outage, promote region-B; expect data loss = lag at moment of failure.
Common mistakes
1. No replication slot
Without slots, primary can recycle WAL the replica still needs. Replica falls behind, can’t catch up. Use replication_slot for streaming.
2. Slot leaks
Disconnected replica with active slot keeps WAL pinned forever. Disk fills. Drop unused slots.
3. Single replica for HA
Replica down → primary down = no HA. Need ≥2 sync candidates.
4. App doesn’t know about replicas
App always uses primary; read scale-out wasted. Route reads explicitly.
5. Trusting pg_basebackup over WAN
Basebackup over flaky links fails halfway. Use pg_basebackup locally + ship base backup, then start streaming.
What I’d ship today
For a typical production Postgres:
- Streaming replication: 1 sync standby + 1 async standby + 1 cross-region async DR.
- Patroni managing failover.
- PgBouncer in transaction-pooling.
- Read replicas behind a separate DSN for read-heavy paths.
- Lag monitoring alerting at 100MB or 30s.
- Logical replication reserved for major-version upgrades.
Read this next
- Postgres Connection Pooling
- Postgres Performance Tuning 2026
- Postgres Backup and PITR 2026
- Postgres Operators on Kubernetes
If you want my Patroni + PgBouncer + monitoring runbook, it’s at rajpoot.dev .
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .