Should I use Debezium or the outbox pattern?

Outbox if you want strong control and don't already run Kafka Connect — write events to an outbox table in the same transaction, relay to broker via a worker. Debezium when you have many tables to replicate and run Kafka anyway. Both produce reliable change streams; outbox is simpler to operate at small scale.

Does CDC work without Kafka?

Yes. You can read Postgres logical replication directly from any compatible client — Debezium Server (no Kafka), Materialize, RisingWave, or hand-rolled consumers using libraries like pgoutput-rs or psycopg's replication API. Kafka is one consumer, not a requirement.

What's the performance overhead of logical replication?

Minimal — typically a few percent CPU on the primary. The main cost is WAL retention: Postgres can't recycle WAL until the slowest replication slot has consumed it. Monitor slot lag carefully; an offline consumer can fill the disk.

Postgres CDC and Logical Replication in 2026 — Debezium, pgoutput, and the Outbox Alternatives

By 2026 every serious Postgres-backed product has a change-data-capture (CDC) pipeline. Stream changes to a search index, a data warehouse, a downstream service, an AI pipeline. This post is the working knowledge — how Postgres CDC actually works, the tools, the patterns, and the gotchas.

Why CDC

The shape of a modern data pipeline:

Postgres (source of truth)
    │ logical replication
    ▼
CDC pipeline (Debezium / outbox / Materialize)
    │
    ├──▶ Kafka / NATS topics
    │    ├──▶ Search index (OpenSearch)
    │    ├──▶ Analytics (ClickHouse, BigQuery)
    │    ├──▶ Cache invalidation
    │    └──▶ AI feature pipeline
    │
    └──▶ Downstream service

Postgres stays the single source of truth. Changes fan out asynchronously and reliably. Application code doesn’t have to dual-write to Postgres + N other places.

Postgres logical replication

The foundation: Postgres ships every committed change in the WAL. Logical replication decodes the WAL into row-level events:

-- Set replication mode (postgresql.conf or per-instance)
wal_level = logical
max_replication_slots = 10
max_wal_senders = 10

-- Create a publication of the tables you want to stream
CREATE PUBLICATION my_publication FOR TABLE orders, users, payments;

-- Or all tables:
CREATE PUBLICATION all_tables FOR ALL TABLES;

-- Create a logical replication slot — this is what consumers read from
SELECT pg_create_logical_replication_slot('my_slot', 'pgoutput');

A consumer connects to the slot, reads events, processes, sends back acks. Postgres holds WAL only as long as the slowest slot needs it.

Debezium

The most popular CDC platform. Runs as a Kafka Connect connector:

{
  "name": "postgres-orders",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "...",
    "database.dbname": "myapp",
    "topic.prefix": "myapp",
    "table.include.list": "public.orders,public.payments",
    "plugin.name": "pgoutput",
    "publication.name": "my_publication",
    "slot.name": "my_slot",
    "snapshot.mode": "initial"
  }
}

What it gives you:

Initial snapshot of existing rows + ongoing change stream.
Schema-aware events — every change includes table schema metadata.
Heartbeat / health monitoring.
Stable topic per table (myapp.public.orders).
Fault tolerance — reconnects, resumes from last offset.

Debezium is the right choice when you already run Kafka and have many tables to replicate. Without Kafka, see Debezium Server.

Debezium Server (no Kafka)

If you want CDC but don’t want Kafka, Debezium Server ships changes directly to:

Apache Pulsar
AWS Kinesis
GCP Pub/Sub
HTTP webhook
Redis Streams
Pravega
NATS JetStream

A single binary; configured similarly to the connector form.

The outbox pattern (smaller scale)

For when you don’t want any of that infra:

CREATE TABLE outbox (
    id            BIGSERIAL PRIMARY KEY,
    aggregate     TEXT NOT NULL,
    event_type    TEXT NOT NULL,
    payload       JSONB NOT NULL,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
    published_at  TIMESTAMPTZ
);

CREATE INDEX outbox_unpublished ON outbox (id) WHERE published_at IS NULL;

In your application transactions:

async with db.transaction():
    await db.execute("UPDATE orders SET status = 'paid' WHERE id = $1", order_id)
    await db.execute(
        "INSERT INTO outbox (aggregate, event_type, payload) VALUES ($1, $2, $3)",
        "order", "order.paid", json.dumps({"id": order_id, "ts": now()}),
    )

A relay worker:

async def relay():
    while True:
        rows = await db.fetch(
            "SELECT id, aggregate, event_type, payload FROM outbox "
            "WHERE published_at IS NULL ORDER BY id LIMIT 100 FOR UPDATE SKIP LOCKED"
        )
        for r in rows:
            await broker.publish(f"{r['aggregate']}.{r['event_type']}", r["payload"])
            await db.execute("UPDATE outbox SET published_at = now() WHERE id = $1", r["id"])
        if not rows:
            await asyncio.sleep(0.5)

Same delivery guarantees as Debezium for application-emitted events. Simpler ops. Works on any broker.

For idempotency on the consumer side see Idempotency, Retries, and Exactly-Once Illusions .

Outbox vs Debezium

	Outbox	Debezium
What captures	Events your app writes	Every row change
Operational complexity	Low (one worker)	Higher (Kafka Connect)
Schema	What you put in payload	Postgres row + schema
When events fire	App-controlled	Every UPDATE/INSERT/DELETE
Best for	App-emitted business events	Mirror DB to search index, warehouse

A common pattern: outbox for events your app explicitly emits (order.paid, user.signed_up), Debezium for raw table mirroring (every change to orders → search index). They coexist.

Materialize and RisingWave — streaming SQL

A 2026 trend: streaming SQL engines that read Postgres logical replication and produce continuously-updated views:

-- In Materialize
CREATE SOURCE orders_source FROM POSTGRES
  CONNECTION pg_conn (PUBLICATION 'my_publication')
  FOR ALL TABLES;

-- A live materialized view that updates as changes flow in
CREATE MATERIALIZED VIEW revenue_by_day AS
SELECT date_trunc('day', created_at) AS day,
       SUM(amount_cents) AS revenue
FROM orders
WHERE status = 'paid'
GROUP BY 1;

-- Query the live view; no refresh needed
SELECT * FROM revenue_by_day;

Materialize and RisingWave both consume Postgres logical replication and maintain incremental views. Useful when:

You want sub-second analytics on live data.
Your team is SQL-fluent.
You’d rather not own Kafka + ClickHouse separately.

Replication slot pitfalls

The single biggest production pain:

1. Lag from a slow consumer

If a consumer is offline or slow, Postgres holds WAL. WAL fills disk. Postgres becomes unwritable. Outage.

Mitigations:

Monitor slot lag (SELECT slot_name, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) FROM pg_replication_slots;).
Drop dead slots for consumers that won’t return.
Set max_slot_wal_keep_size (Postgres 13+) to bound max WAL retained.

2. Schema changes mid-stream

A DROP COLUMN while CDC is running can confuse consumers. Plan schema migrations carefully:

Add columns as nullable.
Backfill if needed.
Update consumers to read the new shape.
Only then drop / rename old columns.

This is exactly the zero-downtime migration discipline you should already follow.

3. Snapshots blocking

Initial snapshots can be heavy on the primary. Configure snapshot.mode = "incremental" (or use a read replica for the snapshot).

Decoded formats

When you create a slot, you pick a decoder plugin:

pgoutput — Postgres-native, recommended for new setups, used by Debezium 2.x.
wal2json — older alternative; emits JSON directly.
test_decoding — for debugging only.

Stick with pgoutput unless you have a specific reason.

Use cases I see in 2026

1. Postgres → OpenSearch (search index)

Mirror the rows; the search index is always up to date. No nightly reindex.

2. Postgres → ClickHouse (analytics)

Stream changes; ClickHouse holds long-term data for analytical queries. Postgres stays small. See DuckDB in Production for the embedded variant.

3. Postgres → Redis (cache invalidation)

When a user row changes, push an invalidation event. Caches update. See Caching Strategies in 2026 .

4. Postgres → AI feature pipeline

Embed every new product / document into pgvector or a vector DB as soon as it’s written. See Build a RAG App with pgvector and FastAPI .

5. Postgres → another service’s DB

One service is the source of truth; another wants a read-only copy of some tables. CDC + denormalization.

Common mistakes

1. Forgetting that DELETE is special

A DELETE event has the old row’s primary key but no other column values unless you set REPLICA IDENTITY FULL. Plan accordingly:

ALTER TABLE orders REPLICA IDENTITY FULL;

2. Coupling consumers to schema details

A consumer that reads payload['old_status'] will break the day someone renames the column. Treat the CDC stream as a public API; version it; communicate changes.

3. No backpressure on slow consumers

Consumers that fall behind silently are how WAL fills disks. Alert on slot lag.

4. Reading raw CDC events from application code

Application code wants business events (“order paid”), not row changes (“orders.id=42 column status changed”). Either use the outbox pattern for business events, or transform CDC events into business events in a stream-processor stage.

5. Treating eventually-consistent CDC as transactional

CDC is asynchronous. The downstream may be 100ms behind. Don’t use it for read-after-write workflows.

Why CDC#

Postgres logical replication#

Debezium#

Debezium Server (no Kafka)#

The outbox pattern (smaller scale)#

Outbox vs Debezium#

Materialize and RisingWave — streaming SQL#

Replication slot pitfalls#

1. Lag from a slow consumer#

2. Schema changes mid-stream#

3. Snapshots blocking#

Decoded formats#

Use cases I see in 2026#

1. Postgres → OpenSearch (search index)#

2. Postgres → ClickHouse (analytics)#

3. Postgres → Redis (cache invalidation)#

4. Postgres → AI feature pipeline#

5. Postgres → another service’s DB#

Common mistakes#

1. Forgetting that DELETE is special#

2. Coupling consumers to schema details#

3. No backpressure on slow consumers#

4. Reading raw CDC events from application code#

5. Treating eventually-consistent CDC as transactional#

Read this next#