Streaming or logical replication?

Streaming for HA and read replicas (byte-identical, fast, low overhead). Logical for cross-version migrations, partial replication, or feeding non-Postgres targets like Kafka. Use both: streaming for HA, logical for CDC.

Patroni or managed Postgres?

Managed (RDS, Cloud SQL, Aiven, Crunchy) for most teams: they handle failover for you. Patroni (or Stolon on Kubernetes) if you self-host and want full control. Pick managed unless you have a specific reason.

Postgres Replication and HA in 2026 — Streaming, Logical, and Auto-Failover

Without automated failover, a dead Postgres primary can easily mean a 30-minute outage while someone gets paged and promotes a replica by hand. With proper HA, it’s closer to 30 seconds. This post is the working playbook for Postgres HA in 2026.

Replication shapes

Streaming (physical) replication

Replicas replay the primary’s WAL at the physical (disk-block) level, so they are byte-identical to the primary.

Pros: Fast, low overhead, exact copy.
Cons: Primary and replicas must run the same Postgres major version. All-or-nothing: you can’t replicate a subset of tables or databases.
Best for: HA failover, read replicas.

Logical replication

Decodes WAL into row-level events; subscribers apply them.

Pros: Works across major versions. Replicates a chosen set of tables. Can feed non-Postgres consumers (see Postgres CDC).
Cons: Higher overhead. Slot management is operational work.
Best for: Migrations, CDC pipelines.
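To make "row-level events" and "partial table sets" concrete, here is a toy Python model of a subscriber applying decoded changes: inserts and updates upsert a row by primary key, deletes remove it, and changes to tables outside the publication are skipped. The Change shape and the PUBLISHED_TABLES set are hypothetical. In real Postgres the publication filters on the publisher side and changes travel over the pgoutput protocol, so treat this only as a sketch of the apply logic.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Hypothetical publication: only these tables are replicated.
PUBLISHED_TABLES = {"orders", "customers"}


@dataclass
class Change:
    """One decoded row-level change (a simplified stand-in for a decoded WAL event)."""
    table: str
    op: str                               # "insert", "update", or "delete"
    key: Any                              # primary-key value of the row
    row: Optional[Dict[str, Any]] = None  # new row image; None for deletes


@dataclass
class Subscriber:
    # table name -> {primary key -> row}
    tables: Dict[str, Dict[Any, Dict[str, Any]]] = field(default_factory=dict)

    def apply(self, change: Change) -> None:
        # Publication filter (done on the publisher in real Postgres).
        if change.table not in PUBLISHED_TABLES:
            return
        rows = self.tables.setdefault(change.table, {})
        if change.op in ("insert", "update"):
            rows[change.key] = dict(change.row or {})
        elif change.op == "delete":
            rows.pop(change.key, None)
        else:
            raise ValueError(f"unknown op: {change.op}")


stream = [
    Change("orders", "insert", 1, {"id": 1, "total": 40}),
    Change("audit_log", "insert", 7, {"id": 7, "msg": "login"}),  # not published
    Change("orders", "update", 1, {"id": 1, "total": 45}),
    Change("customers", "insert", 3, {"id": 3, "name": "Ada"}),
    Change("customers", "delete", 3),
]

sub = Subscriber()
for change in stream:
    sub.apply(change)

print(sub.tables)
```

The audit_log insert never lands, and deleting customer 3 leaves an empty customers table on the subscriber.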

Use both. Streaming for HA, logical for CDC pipelines.

Auto-failover tools

Patroni

The mature standard. Uses etcd / ZooKeeper / Consul for consensus; manages primary election, replica promotion, fencing.

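# patroni.yml (per node: name/connect_address differ; passwords are placeholders)
scope: prod-pg
namespace: /patroni
name: pg-1
restapi:
  listen: 0.0.0.0:8008
  connect_address: pg-1:8008
etcd3:
  hosts: etcd-1:2379,etcd-2:2379,etcd-3:2379
postgresql:
  listen: 0.0.0.0:5432
  connect_address: pg-1:5432
  data_dir: /var/lib/postgresql/18/main
  bin_dir: /usr/lib/postgresql/18/bin
  authentication:
    replication:
      username: replicator
      password: change-me
    superuser:
      username: postgres
      password: change-me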

Battle-tested, and widely treated as the default for self-hosted Postgres HA in 2026.
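Patroni’s core idea is a leader lease: the primary holds a key in the DCS with a TTL and must keep renewing it; if the key expires, a healthy replica acquires it with an atomic compare-and-set and promotes itself. The Python below is a deliberately simplified, single-process model of that idea (an in-memory object stands in for etcd, and time is passed in explicitly). It is not Patroni’s code; real Patroni also weighs replica lag and timeline state before promoting.

```python
from typing import Optional


class ToyDCS:
    """In-memory stand-in for etcd: one leader key with a TTL."""

    def __init__(self, ttl: float) -> None:
        self.ttl = ttl
        self.leader: Optional[str] = None
        self.expires_at = 0.0

    def current_leader(self, now: float) -> Optional[str]:
        if self.leader is not None and now >= self.expires_at:
            self.leader = None  # lease expired: the key disappears
        return self.leader

    def try_acquire(self, node: str, now: float) -> bool:
        """Create the key if absent, or renew it if `node` holds it
        (a compare-and-set in a real DCS)."""
        holder = self.current_leader(now)
        if holder is None or holder == node:
            self.leader = node
            self.expires_at = now + self.ttl
            return True
        return False


dcs = ToyDCS(ttl=30)
assert dcs.try_acquire("pg-1", now=0)       # pg-1 becomes primary
assert not dcs.try_acquire("pg-2", now=10)  # lease still held by pg-1
assert dcs.try_acquire("pg-1", now=20)      # pg-1 renews: lease now ends at t=50

# pg-1 crashes after t=20 and stops renewing.
assert not dcs.try_acquire("pg-2", now=45)  # still inside pg-1's lease
assert dcs.try_acquire("pg-2", now=50)      # lease expired: pg-2 takes over
print(dcs.current_leader(now=51))           # pg-2
```

The important property is that leadership is a lease, not a fact: a primary that can no longer renew its key must stop acting as primary. Patroni demotes it in that case; this toy only models the key itself.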

pg_auto_failover

Originally from Citus Data. Lighter than Patroni, with simpler config: a single monitor node tracks health instead of an etcd/Consul quorum. Only does failover.

Stolon

From Sorint.lab. Kubernetes-friendly, but much less common in 2026, and development has slowed considerably.

Cloud-managed

RDS Multi-AZ, Cloud SQL HA, and Crunchy Bridge handle failover automatically. The right choice for most teams.

RPO and RTO targets

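RPO (Recovery Point Objective): how much data you can lose. Synchronous replication = 0, as long as you fail over to the synchronous standby; async streaming = whatever the replica hadn’t received yet, usually seconds or less.
RTO (Recovery Time Objective): how long until service resumes. Auto-failover = roughly 30s; manual = minutes.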
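Both targets can be turned into rough back-of-envelope estimates. With async streaming, the data at risk on failover is about commit rate times replication lag. With a lease-based failover manager, worst-case detection is about the leader-key TTL plus one check interval, followed by promotion. The sketch assumes those simplified models; ttl=30 and loop_wait=10 are Patroni’s defaults, while the workload numbers and the 5-second promotion time are made-up placeholders.

```python
def rpo_transactions_at_risk(commits_per_sec: float, lag_sec: float) -> float:
    """Async replication: commits acknowledged by the primary but not yet on the replica."""
    return commits_per_sec * lag_sec


def worst_case_rto_sec(ttl: float, loop_wait: float, promote_sec: float) -> float:
    """Lease-based failover: the leader key can live up to `ttl` after the last renewal,
    replicas may only notice on their next `loop_wait` cycle, then promotion runs."""
    return ttl + loop_wait + promote_sec


# Hypothetical workload: 2,000 commits/s with 1.5 s of replica lag.
print(rpo_transactions_at_risk(2_000, 1.5))                     # 3000.0
# Patroni defaults ttl=30, loop_wait=10, plus an assumed 5 s promotion.
print(worst_case_rto_sec(ttl=30, loop_wait=10, promote_sec=5))  # 45
```

Synchronous replication drives the first number to zero only if the failover target is the standby that acknowledged those commits.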

Synchronous replication is expensive (each commit waits for the standby to confirm the WAL is flushed on its side, adding a network round trip), but RPO=0 is required for some workloads. Pick consciously:

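-- server-wide: synchronous_standby_names cannot be SET per transaction
ALTER SYSTEM SET synchronous_standby_names = 'replica1';
ALTER SYSTEM SET synchronous_commit = local;  -- default: don't wait for replica1
SELECT pg_reload_conf();
-- per transaction: wait for replica1's ack on critical writes only
BEGIN;
SET LOCAL synchronous_commit = on;  -- then run the critical writes
COMMIT;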
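synchronous_standby_names also accepts multi-standby forms: FIRST k (a, b, c) waits for the k highest-priority connected standbys in list order, while ANY k (a, b, c) waits for any k of them (quorum commit). The function below is a simplified model of when a commit may return under each form; it ignores details such as standbys that are still catching up, and the standby names are hypothetical.

```python
from typing import Sequence, Set


def commit_can_return(method: str, k: int, names: Sequence[str],
                      connected: Set[str], acked: Set[str]) -> bool:
    """Simplified model of synchronous_standby_names = 'FIRST k (...)' or 'ANY k (...)'."""
    if method == "FIRST":
        # Priority: the first k *connected* standbys in list order form the sync set.
        sync_set = [n for n in names if n in connected][:k]
        # Fewer than k connected standbys means commits keep waiting.
        return len(sync_set) == k and all(n in acked for n in sync_set)
    if method == "ANY":
        # Quorum: acks from any k listed standbys are enough.
        return len([n for n in names if n in connected and n in acked]) >= k
    raise ValueError(f"unknown method: {method}")


names = ["replica1", "replica2", "replica3"]
up = {"replica1", "replica2", "replica3"}

# FIRST 1: only replica1's ack counts while it is connected.
print(commit_can_return("FIRST", 1, names, up, acked={"replica2"}))  # False
print(commit_can_return("FIRST", 1, names, up, acked={"replica1"}))  # True
# ANY 2: any two acks form a quorum.
print(commit_can_return("ANY", 2, names, up, acked={"replica2", "replica3"}))  # True
```

In the FIRST case, if replica1 disconnects, replica2 becomes the synchronous standby. With a single name such as 'replica1', losing that replica blocks every commit that waits for it.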

Topology in 2026

Common production topology:

     Primary (write + sync replica ack)
       │
       ├──── Sync replica (same AZ) ── for RPO=0
       │
       ├──── Async replica (different AZ) ── for HA
       │
       └──── Async replica (different region) ── for DR

Plus a logical replication slot feeding a CDC pipeline into Kafka or a search index.
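One consequence of this layout is worth checking explicitly: RPO=0 only holds if the synchronous replica survives the same failure that took the primary down. The sketch below labels the diagram’s nodes with hypothetical AZ and region names and picks the best promotion candidate after the primary fails, optionally along with a whole AZ or region. Whether a cross-region promotion happens automatically depends on your tooling; this only models which node is the best candidate and what it costs in data.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

PRIMARY_REGION = "us-east-1"


@dataclass(frozen=True)
class Node:
    name: str
    role: str    # "primary", "sync", or "async"
    az: str
    region: str


# Hypothetical labels for the topology in the diagram above.
TOPOLOGY = [
    Node("primary", "primary", az="us-east-1a", region="us-east-1"),
    Node("sync-1", "sync", az="us-east-1a", region="us-east-1"),
    Node("async-az", "async", az="us-east-1b", region="us-east-1"),
    Node("async-dr", "async", az="eu-west-1a", region="eu-west-1"),
]


def failover_target(failed_az: Optional[str] = None,
                    failed_region: Optional[str] = None) -> Tuple[str, str]:
    """Return (candidate, expected RPO) after the primary fails, plus an optional AZ/region."""
    survivors = [n for n in TOPOLOGY
                 if n.role != "primary"
                 and n.az != failed_az and n.region != failed_region]
    if not survivors:
        return "none", "restore from backup"
    # Prefer the sync replica (no acknowledged commits lost), then same-region replicas.
    best = min(survivors, key=lambda n: (n.role != "sync", n.region != PRIMARY_REGION))
    return best.name, ("0" if best.role == "sync" else "replication lag")


print(failover_target())                           # ('sync-1', '0')
print(failover_target(failed_az="us-east-1a"))     # ('async-az', 'replication lag')
print(failover_target(failed_region="us-east-1"))  # ('async-dr', 'replication lag')
```

With the sync replica in the primary’s AZ, losing that AZ falls back to an async replica, so RPO becomes that replica’s lag.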

Backups (don’t skip)

HA is not backup. Both can fail, and replication faithfully copies a bad DELETE or DROP TABLE to every replica. Backups:

pg_basebackup for full snapshots.
WAL archiving for point-in-time recovery (PITR).
Tools: pgBackRest, Barman, WAL-G — pick one.
Test restores monthly. An untested backup is a hope.
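Point-in-time recovery works by restoring the newest base backup that was complete before the recovery target, then replaying archived WAL up to the target (recovery_target_time in Postgres). The helper below sketches that selection logic, assuming a list of base-backup start/stop times and a gap-free WAL archive up to a known time. pgBackRest, Barman, and WAL-G do this for you; treat it as an illustration of what they check.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Sequence


@dataclass(frozen=True)
class BaseBackup:
    label: str
    start: datetime   # backup began; WAL replay starts from here
    stop: datetime    # backup finished; earliest consistent recovery point


def pick_backup_for_pitr(backups: Sequence[BaseBackup], target: datetime,
                         wal_archived_until: datetime) -> Optional[BaseBackup]:
    """Newest backup that is consistent before `target`.
    Assumes the WAL archive has no gaps from the oldest backup onward."""
    if target > wal_archived_until:
        return None  # the WAL needed to reach the target was never archived
    usable = [b for b in backups if b.stop <= target]
    return max(usable, key=lambda b: b.stop, default=None)


backups = [
    BaseBackup("sun", datetime(2026, 3, 1, 2, 0), datetime(2026, 3, 1, 2, 40)),
    BaseBackup("wed", datetime(2026, 3, 4, 2, 0), datetime(2026, 3, 4, 2, 45)),
]
# Hypothetical incident: a bad DELETE at 14:05 on Mar 5; recover to just before it.
target = datetime(2026, 3, 5, 14, 4)
chosen = pick_backup_for_pitr(backups, target, wal_archived_until=datetime(2026, 3, 5, 15, 0))
print(chosen.label if chosen else "no usable backup")  # wed
```

A monthly restore test is the only way to confirm the "gap-free archive" assumption actually holds.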

Connection pooling

A single Postgres server handles hundreds of concurrent connections well and thousands poorly, since each connection is a separate backend process. Use a pooler:

PgBouncer: the classic choice; run it in transaction-pooling mode.
Pgpool-II: heavier, with more features (load balancing, failover handling).
Supavisor / Hyperdrive: cloud poolers from Supabase and Cloudflare.

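# pgbouncer.ini
[databases]
app = host=pg-primary port=5432 dbname=app
[pgbouncer]
listen_addr = *
pool_mode = transaction
max_client_conn = 10000
default_pool_size = 25

Apps connect to PgBouncer (port 6432 by default); PgBouncer multiplexes up to 10,000 client connections onto at most 25 server connections per database/user pair. The app database and pg-primary host are placeholders, and authentication settings (auth_type, auth_file) are omitted.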
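Transaction pooling means a client holds a server connection only for the duration of one transaction, so many clients can share a small pool. The asyncio sketch below simulates that scheduling idea with a queue of 25 placeholder "server connections" (matching default_pool_size = 25 in the config) shared by 1,000 clients. It does not talk to PgBouncer or Postgres.

```python
import asyncio
from typing import Tuple

POOL_SIZE = 25    # mirrors default_pool_size in the config
CLIENTS = 1_000   # far more clients than server connections


async def main() -> Tuple[int, int]:
    pool: asyncio.Queue = asyncio.Queue()
    for conn_id in range(POOL_SIZE):
        pool.put_nowait(conn_id)  # placeholder "server connections"

    in_use = 0
    peak = 0
    done = 0

    async def client_transaction() -> None:
        nonlocal in_use, peak, done
        conn = await pool.get()          # wait for a free server connection
        in_use += 1
        peak = max(peak, in_use)
        try:
            await asyncio.sleep(0.001)   # stand-in for running one transaction
        finally:
            in_use -= 1
            pool.put_nowait(conn)        # released at transaction end, not at disconnect
            done += 1

    await asyncio.gather(*(client_transaction() for _ in range(CLIENTS)))
    return peak, done


peak, done = asyncio.run(main())
print(f"{done} transactions served, peak server connections in use: {peak}")
```

Because a server connection goes back to the pool at transaction end, the next transaction on it may belong to a different client. That is why session state does not survive transaction mode.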

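Watch out: transaction mode breaks features that rely on session state, such as LISTEN, session-level SET (SET LOCAL is fine), and session-level advisory locks. Prepared statements also break unless you run PgBouncer 1.21+ with max_prepared_statements set, which covers protocol-level prepared statements only. Plan around it.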

Read scaling

Async replicas serve reads:

Application reads from a connection pool that points to replicas.
Writes always to primary.
Reads tolerate small lag.

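For read-after-write consistency, route those reads to the primary. In SQLAlchemy you can do this by overriding Session.get_bind(). On the Postgres side, synchronous_commit = remote_apply makes a commit wait until the synchronous standbys have replayed it, so reads there see the write.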
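One way to get read-after-write consistency without sending every read to the primary is LSN-aware routing: after a write, remember the primary’s WAL position (pg_current_wal_lsn()), and send that session’s reads only to a replica whose replayed position (pg_last_wal_replay_lsn()) has caught up. The router below is a hypothetical sketch of that idea with LSNs passed in as strings; wiring it to real connections is left out.

```python
from typing import Dict, Optional


def parse_lsn(lsn: str) -> int:
    """Convert Postgres LSN text like '16/B374D848' to a comparable integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


class ReadRouter:
    """Hypothetical router: reads go to a replica that has replayed the session's last write."""

    def __init__(self, replica_replay_lsn: Dict[str, str]) -> None:
        # Replica name -> latest pg_last_wal_replay_lsn() reading (refresh periodically).
        self.replica_replay_lsn = replica_replay_lsn
        self.last_write_lsn: Optional[int] = None

    def record_write(self, primary_lsn_after_commit: str) -> None:
        self.last_write_lsn = parse_lsn(primary_lsn_after_commit)

    def route_read(self) -> str:
        for name, replay in self.replica_replay_lsn.items():
            if self.last_write_lsn is None or parse_lsn(replay) >= self.last_write_lsn:
                return name
        return "primary"  # no replica has caught up: read your own write from the primary


router = ReadRouter({"replica-a": "16/B3740000", "replica-b": "16/B374D848"})
before_write = router.route_read()       # nothing written yet: any replica works
router.record_write("16/B374D000")
after_small_write = router.route_read()  # replica-a is behind, replica-b has caught up
router.record_write("16/B3750000")
after_big_write = router.route_read()    # no replica has replayed this yet
print(before_write, after_small_write, after_big_write)  # replica-a replica-b primary
```

Compared with always pinning a session to the primary after a write, this keeps most reads on replicas at the cost of polling replay positions.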

Operational realities

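Replication lag is a metric. Alert on it (replay_lag in pg_stat_replication on the primary).
Stuck slots grow WAL without bound. An inactive replication slot pins WAL on the primary until the disk fills; monitor pg_replication_slots and consider capping retention with max_slot_wal_keep_size.
Failover testing: schedule it. A failover that’s never been tested doesn’t work.
Long transactions hold back vacuum. A long-running query on the primary, or on a replica with hot_standby_feedback = on, stops vacuum from removing dead rows; with feedback off, vacuum cleanup replayed from the primary can instead cancel long queries on the replica. See PostgreSQL MVCC, Isolation, Locking.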
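The slot check boils down to LSN arithmetic. In SQL you would compute retained WAL per slot with pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) over pg_replication_slots; the Python below does the same subtraction on LSN strings and flags slots over a threshold. The slot names, LSN values, and 10 GiB threshold are hypothetical.

```python
from typing import Dict, List


def parse_lsn(lsn: str) -> int:
    """Convert Postgres LSN text like '2A/7C000000' to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def slots_over_threshold(current_lsn: str, slot_restart_lsn: Dict[str, str],
                         max_retained_bytes: int) -> List[str]:
    """Slots pinning more WAL than allowed (same math as pg_wal_lsn_diff)."""
    current = parse_lsn(current_lsn)
    return [name for name, restart in slot_restart_lsn.items()
            if current - parse_lsn(restart) > max_retained_bytes]


slots = {
    "debezium_orders": "2A/7C000000",  # healthy consumer: 64 MiB behind
    "old_search_sync": "20/00000000",  # consumer gone: 42 GiB of WAL pinned
}
GiB = 1024 ** 3
print(slots_over_threshold("2A/80000000", slots, max_retained_bytes=10 * GiB))
# ['old_search_sync']
```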

Read this next

If you want a Patroni + PgBouncer + pgBackRest reference setup, it’s at rajpoot.dev.

Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work. See my projects and ways to reach me at rajpoot.dev.
