A Postgres primary that goes down without automated failover is a 30-minute outage. With proper HA, it’s 30 seconds. This post is the working playbook for Postgres HA in 2026.

Replication shapes

Streaming (physical) replication

Block-level binary replay. Replicas are byte-identical to the primary.

  • Pros: Fast, low overhead, exact copy.
  • Cons: Primary and replicas must run the same Postgres major version. All-or-nothing: the whole cluster is replicated, not selected tables.
  • Best for: HA failover, read replicas.

Logical replication

Decodes WAL into row-level events; subscribers apply them.

  • Pros: Cross-version. Partial table sets. Feed non-Postgres consumers (Postgres CDC).
  • Cons: Higher overhead. Slot management is operational work.
  • Best for: Migrations, CDC pipelines.

Use both: streaming replication for HA, logical replication for CDC pipelines and migrations.

Auto-failover tools

Patroni

The mature standard. Uses etcd / ZooKeeper / Consul for consensus; manages primary election, replica promotion, fencing.

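# patroni.yml (fragment: authentication and bootstrap sections omitted)
scope: prod-pg
namespace: /patroni
name: pg-1
restapi:
  listen: 0.0.0.0:8008
  connect_address: pg-1:8008   # assumed hostname; the address other nodes use to reach this one
etcd3:                          # etcd v3 API; the plain "etcd" section speaks the legacy v2 API
  hosts:
    - etcd-1:2379
    - etcd-2:2379
    - etcd-3:2379
postgresql:
  listen: 0.0.0.0:5432
  connect_address: pg-1:5432   # assumed hostname
  data_dir: /var/lib/postgresql/18/main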

Battle-tested, and a common default for self-hosted production Postgres clusters in 2026.

pg_auto_failover

Citus’s tool. Lighter than Patroni, simpler config. Only does failover.

Stolon

Sorint.lab’s tool. Kubernetes-friendly. Less popular in 2026.

Cloud-managed

RDS Multi-AZ, Cloud SQL HA, Crunchy Bridge — handle failover automatically. The right choice for most teams.

RPO and RTO targets

  • RPO (Recovery Point Objective): how much data you can lose. Synchronous replication = 0; async streaming = seconds.
  • RTO (Recovery Time Objective): how long until service resumes. Auto-failover = 30s; manual = minutes.

Synchronous replication is expensive (every write waits for the replica to ack) but RPO=0 is required for some workloads. Pick consciously:

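-- cluster-wide, on the primary (Patroni sets this itself when synchronous_mode is on)
ALTER SYSTEM SET synchronous_standby_names = 'replica1';
SELECT pg_reload_conf();
-- per-transaction: choose how long COMMIT waits (SET LOCAL only works inside a transaction)
BEGIN;
SET LOCAL synchronous_commit = on;
-- ... writes ...
COMMIT;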
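
To make the RTO numbers concrete, here is a small arithmetic sketch comparing a 30-second automated failover with a 30-minute manual one against a monthly availability target. The 99.95% target and the 30-day month are illustrative assumptions, not figures from any particular SLA.

```python
# Illustrative arithmetic only: the availability target and 30-day month are assumptions.

def downtime_budget_seconds(availability: float, period_days: int = 30) -> float:
    """Seconds of downtime allowed in the period at the given availability."""
    return (1.0 - availability) * period_days * 86_400


def outages_within_budget(availability: float, rto_seconds: float,
                          period_days: int = 30) -> int:
    """How many outages of length rto_seconds fit inside the downtime budget."""
    return int(downtime_budget_seconds(availability, period_days) // rto_seconds)


if __name__ == "__main__":
    print(f"monthly budget at 99.95%: {downtime_budget_seconds(0.9995):.0f}s")
    print("30s failovers that fit:", outages_within_budget(0.9995, 30))
    print("30min outages that fit:", outages_within_budget(0.9995, 30 * 60))
```

Under these assumptions the budget is about 1296 seconds a month, so a single 30-minute outage already exceeds it, while dozens of 30-second failovers would not.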

Topology in 2026

Common production topology:

     Primary (write + sync replica ack)
       ├──── Sync replica (same AZ) ── for RPO=0
       ├──── Async replica (different AZ) ── for HA
       └──── Async replica (different region) ── for DR

Plus a logical replication slot to Kafka / search index.
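
A rough sketch of how this topology shapes the promotion decision: prefer a healthy synchronous replica (no acknowledged writes are lost), otherwise fall back to the healthy async replica with the least lag. The `Replica` type, replica names, and lag numbers are hypothetical, and tools like Patroni apply their own, more careful rules; this only illustrates the ordering.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Replica:
    """Hypothetical view of one replica, as a monitoring system might report it."""
    name: str
    zone: str
    synchronous: bool
    lag_bytes: int       # WAL bytes not yet replayed
    healthy: bool = True


def pick_failover_candidate(replicas: Sequence[Replica]) -> Optional[Replica]:
    """Prefer a healthy sync replica; otherwise the least-lagged healthy async one."""
    healthy = [r for r in replicas if r.healthy]
    if not healthy:
        return None
    # Sort key: sync replicas first (False sorts before True), then smallest lag.
    return min(healthy, key=lambda r: (not r.synchronous, r.lag_bytes))


topology = [
    Replica("sync-a", "az-1", synchronous=True, lag_bytes=0),
    Replica("async-b", "az-2", synchronous=False, lag_bytes=4_096),
    Replica("async-dr", "region-2", synchronous=False, lag_bytes=1_000_000),
]
```

With this list, `pick_failover_candidate(topology)` selects `sync-a`; if that replica is marked unhealthy, the choice falls to `async-b`.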

Backups (don’t skip)

HA is not backup. Both can fail. Backups:

  • pg_basebackup for full snapshots.
  • WAL archiving for point-in-time recovery (PITR).
  • Tools: pgBackRest, Barman, WAL-G — pick one.
  • Test restores monthly. An untested backup is a hope.
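
The way base backups and WAL archiving combine for PITR can be sketched as a simple check: a target time is recoverable only if a base backup finished before it and archived WAL reaches at least that far. The function name and inputs are hypothetical, and the sketch assumes the WAL archive has no gaps, which is exactly what a restore test verifies.

```python
from datetime import datetime
from typing import Optional, Sequence


def base_backup_for_target(
    target: datetime,
    backup_end_times: Sequence[datetime],
    wal_archived_through: datetime,
) -> Optional[datetime]:
    """Return the end time of the base backup to restore from, or None.

    PITR restores a base backup and replays archived WAL forward, so the
    target must fall after a completed base backup and no later than the
    last archived WAL. Assumes the archive is contiguous in between.
    """
    if target > wal_archived_through:
        return None  # WAL does not reach the target yet
    usable = [t for t in backup_end_times if t <= target]
    # The newest usable backup means the least WAL to replay.
    return max(usable) if usable else None
```

For example, with base backups finishing on the 1st and the 8th and WAL archived through the 10th, a target on the 9th restores from the backup of the 8th, while a target on the 11th is not recoverable.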

Connection pooling

A single Postgres handles ~hundreds of concurrent connections well; thousands poorly. Use a pooler:

  • PgBouncer — classic, transaction-pooling mode.
  • Pgpool — heavier, more features.
  • Supavisor / Hyperdrive — Postgres-flavored cloud poolers.

# pgbouncer.ini
[pgbouncer]
pool_mode = transaction
max_client_conn = 10000
default_pool_size = 25

Apps connect to PgBouncer; PgBouncer multiplexes to the small DB pool.

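Watch out: transaction mode breaks LISTEN/NOTIFY, session-level SET, session advisory locks, and prepared statements (unless you run PgBouncer 1.21+ with max_prepared_statements set). SET LOCAL inside a transaction still works. Plan around it.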
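
A quick sizing sketch for the pooler: PgBouncer keeps one pool per (database, user) pair, so the server-side connection count is roughly `default_pool_size` times the number of pairs, and that has to fit under Postgres’s `max_connections`. The defaults used here (`max_connections = 100`, 3 reserved superuser connections) are the stock Postgres values; the number of pairs is an assumption, and the sketch ignores `reserve_pool_size`, `max_db_connections`, and replication connections.

```python
def pgbouncer_server_connections(default_pool_size: int, db_user_pairs: int) -> int:
    """Upper bound on server connections: one pool per (database, user) pair."""
    return default_pool_size * db_user_pairs


def fits_postgres(default_pool_size: int, db_user_pairs: int,
                  max_connections: int = 100, reserved: int = 3) -> bool:
    """True if the pools fit under max_connections minus reserved superuser slots."""
    needed = pgbouncer_server_connections(default_pool_size, db_user_pairs)
    return needed <= max_connections - reserved


def multiplex_ratio(max_client_conn: int, server_connections: int) -> float:
    """Client connections per server connection when the pooler is full."""
    return max_client_conn / server_connections
```

With `default_pool_size = 25`, three (database, user) pairs need up to 75 server connections and fit; a fourth pair needs 100 and does not, so either the pool size or `max_connections` has to change.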

Read scaling

Async replicas serve reads:

  • Application reads from a connection pool that points to replicas.
  • Writes always to primary.
  • Reads tolerate small lag.

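For read-after-write consistency, route those reads to primary. A custom SQLAlchemy Session.get_bind() for routing and Postgres’s synchronous_commit = remote_apply (which makes commits wait until synchronous standbys have applied the change) can help.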
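
One common way to approximate read-after-write consistency in the application is to pin a session’s reads to the primary for a short window after it writes. The sketch below is a minimal, driver-agnostic version; the class name, the DSN strings, and the 2-second window are all hypothetical, and the window is a heuristic that only helps while replica lag stays below it.

```python
import time
from typing import Callable, Dict, Sequence


class ReadWriteRouter:
    """Send writes to the primary and reads to replicas, pinning a session's
    reads to the primary for pin_seconds after that session writes."""

    def __init__(self, primary: str, replicas: Sequence[str],
                 pin_seconds: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.primary = primary
        self.replicas = list(replicas)
        self.pin_seconds = pin_seconds
        self._clock = clock
        self._last_write: Dict[str, float] = {}
        self._next = 0  # round-robin cursor over replicas

    def for_write(self, session_id: str) -> str:
        """Record the write time and return the primary."""
        self._last_write[session_id] = self._clock()
        return self.primary

    def for_read(self, session_id: str) -> str:
        """Return the primary if the session wrote recently, else a replica."""
        last = self._last_write.get(session_id)
        if last is not None and self._clock() - last < self.pin_seconds:
            return self.primary
        if not self.replicas:
            return self.primary
        target = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return target
```

The injectable `clock` exists so the pinning window can be tested without sleeping. In a real application the returned strings would be connection pools or engines rather than names.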

Operational realities

  • Replication lag is a metric. Alert on it.
  • WAL accumulates on the primary if a replication slot is stuck. Monitor pg_replication_slots and cap retention with max_slot_wal_keep_size.
  • Failover testing — schedule it. A failover that’s never been tested doesn’t work.
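  • Vacuum and long queries interact across replicas. Vacuum cleanup replayed on a replica conflicts with long-running queries there: replay stalls up to max_standby_streaming_delay, then the query is cancelled. hot_standby_feedback = on avoids the cancellations but lets a long replica query hold back vacuum on the primary. See PostgreSQL MVCC, Isolation, Locking.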
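
As a starting point for the lag alert, the sketch below holds a query against `pg_stat_replication` (run on the primary) and reproduces the LSN arithmetic in Python so thresholds can be checked anywhere. An LSN such as `16/B374D848` is two hex numbers: the high and low 32 bits of a byte position in the WAL. The warning and critical thresholds are assumptions to tune per workload.

```python
# Run on the primary; these columns and functions exist in PostgreSQL 10 and later.
LAG_QUERY = """
SELECT application_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
"""


def lsn_to_int(lsn: str) -> int:
    """Convert an LSN like '16/B374D848' into an absolute WAL byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica still has to replay (same idea as pg_wal_lsn_diff)."""
    return lsn_to_int(primary_lsn) - lsn_to_int(replica_lsn)


def lag_severity(lag: int,
                 warn_bytes: int = 16 * 1024**2,     # assumed: one default-size WAL segment
                 crit_bytes: int = 256 * 1024**2) -> str:  # assumed threshold
    """Classify a lag value for alerting."""
    if lag >= crit_bytes:
        return "critical"
    if lag >= warn_bytes:
        return "warning"
    return "ok"
```

Byte lag is only half the picture: `pg_stat_replication` also exposes `replay_lag` as a time interval, which is often the easier number to alert on.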

Read this next

If you want a Patroni + PgBouncer + pgBackRest reference setup, it’s at rajpoot.dev.

