Eventually every team asks: “How do I make a multi-service operation atomic?” The honest answer: you don’t. You compensate. This post is the working set for cross-service consistency.

Why 2PC mostly doesn’t fit

Two-phase commit:

Coordinator: "Prepare?" → all services
Each service:           ← "Prepared" / "Abort"
Coordinator: "Commit" (or "Abort") → all services
Each service:           ← "Committed"

Issues:

  • Every participant must support 2PC. Most modern services (Stripe, SendGrid, your own HTTP services) don’t.
  • Blocking: if the coordinator dies between prepare and commit, participants are stuck holding locks.
  • Tight coupling: every participant must be reachable and vote yes before any of them can commit, so the transaction is only as available as the weakest service.
  • Performance: extra round trips per transaction.

For inside-the-DB transactions across shards: 2PC works. For cross-service distributed transactions: rarely.
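
To make the blocking problem concrete, here is a minimal in-memory sketch of the protocol above. `Participant` and `two_phase_commit` are hypothetical names for illustration, not any real library's API, and nothing here is durable or networked.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Hypothetical in-memory participant, for illustration only."""
    name: str
    can_commit: bool = True
    state: str = "idle"  # idle -> prepared -> committed | aborted

    def prepare(self) -> bool:
        # Phase 1: vote. A "yes" vote means holding locks until phase 2.
        if self.can_commit:
            self.state = "prepared"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: list[Participant]) -> bool:
    # Phase 1: ask everyone to prepare; all() stops at the first "no" vote.
    all_prepared = all(p.prepare() for p in participants)
    # If the coordinator crashed right here, every prepared participant
    # would stay "prepared", holding its locks, until the coordinator recovered.
    # Phase 2: commit only if every vote was "yes"; otherwise abort everyone.
    for p in participants:
        if all_prepared:
            p.commit()
        else:
            p.abort()
    return all_prepared
```

The gap between the two phases is the whole problem: a participant that voted yes cannot decide on its own whether to commit or abort.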

Sagas: the alternative

A saga is a series of local transactions, each with a compensating action that reverses it.

Order saga:
1. Reserve inventory  → compensate: release inventory
2. Charge payment     → compensate: refund
3. Ship order         → compensate: cancel shipment
4. Send confirmation  → compensate: send cancellation

If step 3 fails: run compensations for 2 and 1.
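
The rule above (undo completed steps, newest first) can be sketched as a small generic runner. This is a minimal sketch, assuming each step is a `(name, action, compensation)` tuple of async callables; `run_saga` and `Step` are illustrative names, and nothing is persisted here.

```python
import asyncio
from typing import Awaitable, Callable

# Each step is (name, action, compensation). All names are illustrative.
Step = tuple[str, Callable[[], Awaitable[None]], Callable[[], Awaitable[None]]]


async def run_saga(steps: list[Step]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse."""
    completed: list[Step] = []
    for step in steps:
        _, action, _ = step
        try:
            await action()
        except Exception:
            # The failed step is assumed to have made no durable change,
            # so only the steps that completed before it are compensated.
            for _, _, compensate in reversed(completed):
                await compensate()
            return False
        completed.append(step)
    return True

# Usage: ok = asyncio.run(run_saga(steps))
```

With the order saga above, a failure in "Ship order" would run "refund" and then "release inventory", and the confirmation step would never start.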

Orchestration

A central coordinator drives the saga:

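async def order_saga(order_id):
    order = await load_order(order_id)  # hypothetical lookup helper
    state = SagaState(order_id)         # records what has completed so far
    try:
        state.inventory = await reserve_inventory(order)
        state.payment   = await charge_payment(order)
        state.shipment  = await ship_order(order)
        await send_confirmation(order)
    except Exception:
        # Undo whatever is recorded on state, newest step first, then re-raise.
        await compensate(state)
        raise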

Pros: explicit; debuggable; one place to read the flow.

Cons: the coordinator sits on the critical path of every saga and must persist its state.

Choreography

Services react to events:

order.created → InventoryService listens → reserves
                                       → emits inventory.reserved
inventory.reserved → PaymentService listens → charges
                                       → emits payment.captured
...

Pros: loosely coupled; each service self-contained.

Cons: hard to debug; flow is implicit; “who reacts to what” scattered.

For most teams: orchestration is the default. Choreography for high-scale or when coupling truly hurts.
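
For comparison, here is a rough sketch of the choreographed flow from the diagram above. The in-memory `handlers` registry, `on`, and `emit` are hypothetical stand-ins for a real event bus; the two handlers only emit the next event and do no real work.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical in-memory bus standing in for a real message broker.
handlers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)
trace: list[str] = []  # every event emitted, in order


def on(event_type: str):
    """Decorator that subscribes a handler to an event type."""
    def register(fn):
        handlers[event_type].append(fn)
        return fn
    return register


def emit(event_type: str, payload: dict) -> None:
    trace.append(event_type)
    for fn in handlers[event_type]:
        fn(payload)


@on("order.created")
def reserve_inventory(event: dict) -> None:
    # InventoryService: reserve stock, then announce it.
    emit("inventory.reserved", event)


@on("inventory.reserved")
def charge_payment(event: dict) -> None:
    # PaymentService: charge the customer, then announce it.
    emit("payment.captured", event)


emit("order.created", {"order_id": "o-1"})
```

Notice that no single function describes the order flow; it only exists as the sum of the subscriptions, which is the debugging cost mentioned above.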

Persistent saga state

CREATE TABLE sagas (
    id           uuid PRIMARY KEY,
    type         text NOT NULL,
    state        text NOT NULL,    -- 'pending', 'compensating', 'completed', 'failed'
    payload      jsonb NOT NULL,
    started_at   timestamptz DEFAULT now(),
    updated_at   timestamptz DEFAULT now()
);

CREATE TABLE saga_steps (
    saga_id      uuid REFERENCES sagas(id),
    step         int  NOT NULL,
    name         text NOT NULL,
    status       text NOT NULL,    -- 'pending', 'completed', 'failed', 'compensated'
    result       jsonb,
    PRIMARY KEY (saga_id, step)
);

After every step, persist. On crash: rehydrate, continue from where you stopped.
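
A minimal sketch of "persist, then resume", assuming steps are `(name, async_callable)` pairs. The `saga_steps` dict is an in-memory stand-in for the table above so the example runs on its own; `run_or_resume` is an illustrative name.

```python
import asyncio

# Stand-in for the saga_steps table: {(saga_id, step): status}.
# A real implementation would read and write the Postgres tables above.
saga_steps: dict[tuple[str, int], str] = {}


async def run_or_resume(saga_id: str, steps: list) -> None:
    """Run (name, coroutine_fn) steps, skipping those already completed."""
    for index, (name, action) in enumerate(steps, start=1):
        if saga_steps.get((saga_id, index)) == "completed":
            continue  # finished before the crash; do not repeat it
        saga_steps[(saga_id, index)] = "pending"
        await action()
        # Persist before moving on, so a crash after this line
        # resumes at the next step rather than this one.
        saga_steps[(saga_id, index)] = "completed"

# Usage: asyncio.run(run_or_resume("saga-1", steps))
```

A crash between `await action()` and the status write leaves the step "pending", so it runs again on resume. That is why the steps themselves need to be idempotent, not just the compensations.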

Compensations need idempotency

Compensation might run twice (retry, partial failure). Make sure refunding twice doesn’t double-refund:

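import asyncio
import stripe

async def refund_payment(payment_id, idempotency_key):
    # Local guard: skip if this payment was already refunded.
    if await already_refunded(payment_id):
        return
    # stripe.Refund.create is a blocking call, so run it in a worker thread.
    # The idempotency key makes a retried request return the original refund.
    await asyncio.to_thread(
        stripe.Refund.create,
        payment_intent=payment_id,
        idempotency_key=idempotency_key,
    )
    await mark_refunded(payment_id)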

See Idempotency.
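
The idempotency key only helps if every retry sends the same one. One way to get that, sketched here under the assumption that each compensation is identified by saga ID, step number, and action name, is to derive the key deterministically. `compensation_key` and `SAGA_NAMESPACE` are hypothetical names.

```python
import uuid

# Arbitrary fixed namespace for this application (an assumption; any
# constant UUID works as long as it never changes).
SAGA_NAMESPACE = uuid.UUID("6f1c2a9e-0d3b-4c55-9a7e-2b8d4e1f0a11")


def compensation_key(saga_id: str, step: int, action: str) -> str:
    """Derive the same key on every retry of the same compensation."""
    return str(uuid.uuid5(SAGA_NAMESPACE, f"{saga_id}:{step}:{action}"))
```

A random key generated per attempt would defeat the purpose, because the provider would see each retry as a new request.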

Compensations aren’t always possible

What if “ship order” already happened and the truck has left?

Options:

  • Mark for return (asynchronous compensation).
  • Refund and apologize (semantic compensation).
  • Prevent the failure modes that would require non-recoverable compensation (validate harder upfront).

Compensations are best-effort. Some operations can’t be undone. Plan for it.

Outbox pattern

Combine local transaction + event publishing atomically:

BEGIN;
INSERT INTO orders ...;
INSERT INTO outbox (event_type, payload) VALUES ('order.created', ...);
COMMIT;

A worker reads from outbox and publishes to your event bus:

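import asyncio

async def outbox_worker():
    while True:
        # Assumes an asyncpg-style db; rows are indexed by column name.
        events = await db.fetch("SELECT * FROM outbox WHERE published_at IS NULL LIMIT 100")
        for e in events:
            # Publish first, mark second: a crash in between re-sends the
            # event on restart, so delivery is at-least-once.
            await bus.publish(e["event_type"], e["payload"])
            await db.execute("UPDATE outbox SET published_at = now() WHERE id = $1", e["id"])
        await asyncio.sleep(0.1)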

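Either both the order and the event are written or neither is (the local transaction guarantees it). Delivery is asynchronous and at-least-once: if the worker crashes after publishing but before setting published_at, the event goes out again, so consumers must tolerate duplicates.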
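
To see the "both or neither" guarantee in action, here is a small runnable sketch. SQLite stands in for Postgres so it runs anywhere; the column types and the `create_order` and `count` helpers are assumptions made for the example.

```python
import json
import sqlite3

# SQLite stands in for Postgres here so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " event_type TEXT NOT NULL, payload TEXT NOT NULL, published_at TEXT)"
)


def create_order(order_id: str, fail: bool = False) -> None:
    """Insert the order and its event in one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO orders (id) VALUES (?)", (order_id,))
        if fail:
            raise RuntimeError("simulated crash before the event is written")
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (?, ?)",
            ("order.created", json.dumps({"order_id": order_id})),
        )


def count(table: str) -> int:
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
```

Calling `create_order("o-2", fail=True)` raises, and afterwards neither table has a row for that order: the order insert is rolled back along with the missing event.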

Inbox pattern

Mirror on the consumer side:

CREATE TABLE inbox (
    id          uuid PRIMARY KEY,
    received_at timestamptz DEFAULT now(),
    processed_at timestamptz
);

INSERT INTO inbox (id) VALUES ($1) ON CONFLICT DO NOTHING RETURNING id;
-- If a row is returned, the event is new: process it. If not, it was already seen.

Idempotent consume — same event delivered twice, processed once.
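
A sketch of a consumer built on that table, with SQLite standing in for Postgres (`INSERT OR IGNORE` plays the role of `ON CONFLICT DO NOTHING`). The `balances` table and `handle_payment_captured` handler are hypothetical, chosen only to give the event a visible side effect.

```python
import sqlite3

# SQLite stands in for Postgres; INSERT OR IGNORE plays the role of
# ON CONFLICT DO NOTHING.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inbox (id TEXT PRIMARY KEY)")
db.execute("CREATE TABLE balances (account TEXT PRIMARY KEY, cents INTEGER)")
db.execute("INSERT INTO balances VALUES ('acct-1', 0)")
db.commit()


def handle_payment_captured(event_id: str, account: str, cents: int) -> bool:
    """Apply the event once; return False if it was already seen."""
    with db:  # dedupe insert and side effect commit (or roll back) together
        cur = db.execute("INSERT OR IGNORE INTO inbox (id) VALUES (?)", (event_id,))
        if cur.rowcount == 0:
            return False  # duplicate delivery: skip the side effect
        db.execute(
            "UPDATE balances SET cents = cents + ? WHERE account = ?",
            (cents, account),
        )
        return True
```

The important design choice is that the inbox insert and the side effect share one transaction. If they were separate, a crash between them could record the event as seen without ever applying it.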

Workflow engines

For complex multi-step operations:

  • Temporal: durable workflows; built-in retries, timers, signals.
  • Cadence: predecessor of Temporal.
  • Camunda: BPMN-based.
  • AWS Step Functions: managed.

These do the saga state management for you. See Temporal Workflow Engine.

Event sourcing fit

Sagas pair well with event sourcing: saga state IS a stream of events. See Event Sourcing.
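
As a rough illustration of that idea, saga state can be rebuilt by folding over its events. The event names in `TRANSITIONS` are hypothetical; the resulting statuses match the ones used in the sagas table earlier.

```python
from functools import reduce

# Hypothetical event names; a real saga would define its own.
TRANSITIONS = {
    "saga.started": "pending",
    "step.failed": "compensating",
    "saga.compensated": "failed",
    "saga.completed": "completed",
}


def apply(state: dict, event: dict) -> dict:
    """Fold one event into the saga state without mutating the input."""
    new_state = dict(state)
    if event["type"] == "step.completed":
        new_state["done"] = state.get("done", []) + [event["step"]]
    elif event["type"] in TRANSITIONS:
        new_state["status"] = TRANSITIONS[event["type"]]
    return new_state


def rehydrate(events: list[dict]) -> dict:
    """Rebuild the current saga state from its full event history."""
    return reduce(apply, events, {"status": "new", "done": []})
```

Rehydrating after a crash is then just replaying the stream, and the `done` list tells the coordinator which compensations are owed.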

Common mistakes

1. 2PC across services

You make all services pretend to support 2PC; coordinator failures lock everything; reliability tanks. Use sagas.

2. No compensation for some steps

“This step always succeeds.” Until it doesn’t. Plan compensation for every step.

3. Compensations not idempotent

Retry refunds → double refund. Always idempotency-key.

4. No persistent saga state

Process crashes mid-saga; state lost; orphaned partial transactions. Persist after every step.

5. Optimism

“Network never fails between us.” It will. Build for retries from day one.
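
For the retry point in particular, a minimal sketch of a retry helper with exponential backoff and jitter might look like this. `with_retries` is an illustrative name, and the choice to retry only on `ConnectionError` and `TimeoutError` is an assumption; a real client library will raise its own exception types.

```python
import asyncio
import random


async def with_retries(call, attempts: int = 5, base_delay: float = 0.2):
    """Retry an async callable with exponential backoff and jitter."""
    for attempt in range(1, attempts + 1):
        try:
            return await call()
        except (ConnectionError, TimeoutError):
            if attempt == attempts:
                raise  # out of attempts: let the saga start compensating
            # Full jitter: sleep a random time up to base_delay * 2^(attempt-1).
            await asyncio.sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))
```

Retrying is only safe when the wrapped call is idempotent, which ties this back to mistake 3.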

What I’d ship today

For multi-service operations:

  • Saga (orchestration) for clarity.
  • Persistent saga state in your DB.
  • Idempotent compensations.
  • Outbox pattern for atomic local-write + event-emit.
  • Temporal if the workflows get complex.
  • Distributed tracing through the whole saga.

Read this next

If you want my saga + outbox starter (Postgres + Python), it’s at rajpoot.dev.

