Cheatsheet for Postgres replication and failover.

Streaming replication (primary → standby)

primary’s postgresql.conf:

wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB
hot_standby = on

primary’s pg_hba.conf:

host replication replicator 10.0.0.0/8 scram-sha-256

Standby setup:

pg_basebackup -h primary -U replicator -D /var/lib/postgresql/data -X stream
touch /var/lib/postgresql/data/standby.signal
# postgresql.conf:
primary_conninfo = 'host=primary user=replicator'

Logical replication (per-table)

primary:

ALTER SYSTEM SET wal_level = logical;
SELECT pg_reload_conf();

CREATE PUBLICATION my_pub FOR TABLE users, orders;

subscriber:

CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=primary user=replicator'
    PUBLICATION my_pub;

Use logical replication for major version upgrades, partial replication, multi-master patterns.

Cascading replication

Primary → Replica1 → Replica2

Replica1’s standby.signal + primary_conninfo points to Primary. Replica2’s points to Replica1.

Reduces load on Primary.

Monitoring lag

-- On primary
SELECT
    client_addr,
    state,
    pg_wal_lsn_diff(pg_current_wal_lsn(), sent_lsn)  AS sent_lag,
    pg_wal_lsn_diff(pg_current_wal_lsn(), write_lsn) AS write_lag,
    pg_wal_lsn_diff(pg_current_wal_lsn(), flush_lsn) AS flush_lag,
    pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag
FROM pg_stat_replication;

Alert on replay_lag > threshold (e.g., 100MB).

Replication slots

SELECT pg_create_physical_replication_slot('replica1');

Slots ensure primary retains WAL needed by the replica. Risk: disconnected replica with active slot pins WAL forever → disk fills.

Monitor:

SELECT slot_name, active, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained
FROM pg_replication_slots;

Drop unused slots.

Patroni (HA orchestrator)

scope: my-pg-cluster
namespace: /db/
name: pg-node-1

restapi:
  listen: 0.0.0.0:8008
  connect_address: 10.0.0.1:8008

etcd:
  hosts: ["etcd1:2379", "etcd2:2379", "etcd3:2379"]

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      parameters:
        wal_level: replica
        hot_standby: on
        max_wal_senders: 10

Patroni elects a leader via etcd/consul/zookeeper. Auto-promotes a replica on primary failure.

App connects to a stable endpoint (HAProxy / pgcat). Patroni updates routing.

Multi-host failover URL

"postgresql+asyncpg://app@primary:5432,replica:5432/db?target_session_attrs=read-write"

asyncpg tries hosts in order; selects writable. On failover: reconnects to the new primary.

Read-replica routing

write_engine = create_async_engine(WRITER_URL)
read_engine = create_async_engine(
    READER_URL,
    connect_args={"server_settings": {"default_transaction_read_only": "on"}},
)

Per-handler choice via Depends.

Read-after-write handling

User writes; immediate read might hit replica (still lagged):

RECENT_WRITER_TTL = 5

# After write
await redis.set(f"recent_writer:{user_id}", "1", ex=RECENT_WRITER_TTL)

# In get_db
if await redis.exists(f"recent_writer:{user_id}"):
    return WriteSession()    # route to primary briefly
else:
    return ReadSession()

Synchronous replication

synchronous_commit = on
synchronous_standby_names = 'replica1'

Primary waits for replica ACK before returning to client. Zero-data-loss; higher latency.

For most apps: async + tight lag monitoring is the sweet spot.

Major version upgrade via logical replication

Day 0: PG14 in prod.
Day 1: spin up PG17. Set up logical replication PG14 → PG17.
Day N: caught up; lag < 1s.
Day N+1: stop writes briefly. Swap app to PG17. Drop PG14.

Total downtime: seconds. Without this: hours of pg_dump/restore.

Failover testing

Quarterly:

  1. Announce window.
  2. Stop primary (gracefully or kill -9).
  3. Time the failover.
  4. Verify app reconnected.
  5. Document RTO actuals.

Without testing: failover is theory.

Cross-region DR

Region A (active): primary + 1 sync replica.
Region B (DR): async replica + WAL archive.

On region-A loss: promote region-B replica. Expect RPO = lag at moment of failure.

Common mistakes

  • Single replica = no HA (replica down + primary down = no failover candidate).
  • Replication slot leaks → disk full.
  • Async without lag monitoring → silent slow failover.
  • Failover never tested.
  • Pool not configured with pool_pre_ping → stale connections after promotion.

Read this next

If you want my Patroni + HAProxy + read-replica routing template, it’s at rajpoot.dev .


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .