Postgres replication options multiplied; topology decisions scale with team. This post is the working set: when each fits, what breaks, what to monitor.

Streaming replication

Primary writes WAL → ships to replicas → they replay.

Primary ──WAL──▶ Replica-1 (hot standby)
              ▶ Replica-2 (hot standby)

Hot standbys: read-only queries possible. Use for:

  • HA: failover when primary dies.
  • Read scale-out (carefully — read-after-write becomes async).
  • Backup source (pg_basebackup off a replica).
# postgresql.conf on primary
wal_level = replica
max_wal_senders = 10

# pg_hba.conf
host replication repuser 10.0.0.0/8 scram-sha-256

Replica:

primary_conninfo = 'host=primary user=repuser'
hot_standby = on

Tools: [pg_auto_failover, Patroni, repmgr] orchestrate failover.

Logical replication

Per-table; cross-version; selective.

-- On publisher
CREATE PUBLICATION mypub FOR TABLE accounts, orders;

-- On subscriber
CREATE SUBSCRIPTION mysub
  CONNECTION 'host=publisher dbname=app user=repl'
  PUBLICATION mypub;

Use for:

  • Major version upgrades with minimal downtime: replicate from PG14 to PG17, switch reads, drop old.
  • Splitting a database: replicate a subset of tables to a new cluster.
  • Cross-region read replicas with selective tables.
  • Bidirectional is possible (pglogical / built-in active-active in 17+) but tricky.

Limitations: no DDL replicated; sequences manual; large initial copies are slow.

Cascading

Primary → Replica-1 → Replica-A
       → Replica-2 → Replica-B
                  → Replica-C

Replicas can replicate from other replicas. Reduces WAL traffic on primary. Failover gets trickier.

Sync vs async

synchronous_commit = on            # primary waits for replica ACK
synchronous_standby_names = 'r1'   # which replica

Sync = zero data loss on failover. Cost: write latency = primary + RTT to replica.

For many shops: async + tight monitoring on lag is the sweet spot.

Lag monitoring

-- on primary
SELECT
  client_addr,
  state,
  pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)  AS sent_lag,
  pg_wal_lsn_diff(pg_current_wal_lsn(), write_lsn) AS write_lag,
  pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn) AS flush_lag,
  pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag
FROM pg_stat_replication;

Alert if replay_lag exceeds (e.g., 100MB or 10 seconds). Lag spikes precede failover problems.

Read-after-write trap

Application writes to primary; reads from replica; sees stale data.

Mitigations:

  • Read your writes from primary (recent-writes flag in session).
  • synchronous_commit = remote_apply — primary waits for replay (slow).
  • Causal reads (per-session token).

For most apps: route logged-in user’s writes-then-reads to primary; route public reads to replicas.

Connection pooling

PgBouncer in transaction-pooling mode in front of replicas:

[App] → PgBouncer → Primary
                  → Replica-1 (read-only pool)
                  → Replica-2 (read-only pool)

App uses two DSNs: writes go to primary pool, reads to replica pool. See Connection Pooling .

Failover

Patroni / pg_auto_failover handle:

  1. Detect primary down (consensus, not single node).
  2. Promote a replica.
  3. Reconfigure remaining replicas to follow new primary.
  4. Update DNS / load balancer / connection string.

DIY this and you’ll get split-brain at 3am. Use a battle-tested tool.

Major version upgrades via logical replication

Day 0: PG14 primary in prod.
Day 1: spin up PG17. Set up logical replication PG14 → PG17.
Day N: PG17 caught up; lag < 1s.
Day N+1: stop writes briefly, swap app to PG17, drop PG14.

Total downtime: seconds. Without this: hours of pg_dump/restore.

Cross-region

For DR:

Region A (primary) → Region A replica (HA)
                   → Region B replica (DR)

Async. RTT 50–200ms. On region-A outage, promote region-B; expect data loss = lag at moment of failure.

Common mistakes

1. No replication slot

Without slots, primary can recycle WAL the replica still needs. Replica falls behind, can’t catch up. Use replication_slot for streaming.

2. Slot leaks

Disconnected replica with active slot keeps WAL pinned forever. Disk fills. Drop unused slots.

3. Single replica for HA

Replica down → primary down = no HA. Need ≥2 sync candidates.

4. App doesn’t know about replicas

App always uses primary; read scale-out wasted. Route reads explicitly.

5. Trusting pg_basebackup over WAN

Basebackup over flaky links fails halfway. Use pg_basebackup locally + ship base backup, then start streaming.

What I’d ship today

For a typical production Postgres:

  • Streaming replication: 1 sync standby + 1 async standby + 1 cross-region async DR.
  • Patroni managing failover.
  • PgBouncer in transaction-pooling.
  • Read replicas behind a separate DSN for read-heavy paths.
  • Lag monitoring alerting at 100MB or 30s.
  • Logical replication reserved for major-version upgrades.

Read this next

If you want my Patroni + PgBouncer + monitoring runbook, it’s at rajpoot.dev .


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .