Webhooks are the boring infrastructure that powers every integration. Get them wrong and every customer integration is a support ticket. Get them right and they disappear into the background. This post is the working guide.

What “right” looks like

A webhook system that doesn’t ruin anyone’s day:

  • Signed so consumers verify origin.
  • Retried with backoff over days.
  • Idempotent so retries don’t double-process.
  • Ordered within a stream (or explicitly unordered).
  • Observable (consumers can see delivery history; producers can debug failures).
  • Replayable from the API for backfills.

Stripe is the gold standard. Mimic them.

Producer side

Sign every payload

import hmac, hashlib, time

def sign(secret: str, payload: bytes) -> str:
    ts = int(time.time())
    msg = f"{ts}.".encode() + payload
    sig = hmac.new(secret.encode(), msg, hashlib.sha256).hexdigest()
    return f"t={ts},v1={sig}"

# Attach the signature to the outgoing delivery request.
request_headers["X-Webhook-Signature"] = sign(customer_secret, body)

The consumer verifies the signature and checks that the timestamp is within about 5 minutes of now (replay protection).

Retry with backoff

If the consumer returns a non-2xx status or times out:

  • Retry at 1m, 5m, 30m, 1h, 6h, 24h, 48h, 72h.
  • Stop at 72h or N attempts.
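  • Mark as failed; surface in customer dashboard.

import json

import dramatiq
import httpx

# `db`, `sign`, and `should_retry` are defined elsewhere in the producer codebase.
# dramatiq calls retry_when(retries_so_far, exception) to decide whether to retry.
@dramatiq.actor(max_retries=8, retry_when=should_retry)
def deliver_webhook(endpoint_id, event_id):
    endpoint = db.get_endpoint(endpoint_id)
    event = db.get_event(event_id)
    body = json.dumps(event).encode()
    signature = sign(endpoint.secret, body)   # sign the exact bytes that are sent
    resp = httpx.post(endpoint.url,
                       headers={"X-Webhook-Signature": signature},
                       content=body, timeout=10)
    resp.raise_for_status()                   # an error status raises, which triggers a retry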
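
The retry schedule above can also be written down as data. This is a minimal sketch, assuming the listed times are offsets from the first delivery attempt (the list does not say whether they are offsets or gaps between attempts). `RETRY_OFFSETS` and `next_attempt_at` are hypothetical names, not part of dramatiq.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: each entry is measured from the first delivery attempt.
RETRY_OFFSETS = [
    timedelta(minutes=1),
    timedelta(minutes=5),
    timedelta(minutes=30),
    timedelta(hours=1),
    timedelta(hours=6),
    timedelta(hours=24),
    timedelta(hours=48),
    timedelta(hours=72),
]


def next_attempt_at(first_attempt: datetime, retries_so_far: int) -> Optional[datetime]:
    """Return when the next retry is due, or None once the schedule is exhausted.

    retries_so_far counts retries already made (0 right after the first failure).
    """
    if retries_so_far < 0:
        raise ValueError("retries_so_far must be >= 0")
    if retries_so_far >= len(RETRY_OFFSETS):
        return None  # give up: the delivery should be marked as failed
    return first_attempt + RETRY_OFFSETS[retries_so_far]
```

A scheduler could store the returned time next to the delivery record and treat `None` as the signal to mark the delivery failed and surface it in the dashboard.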

Idempotency-friendly events

Each event has a stable id. Consumers dedup on id because retries WILL deliver the same event multiple times. See Idempotency, Retries, and Exactly-Once Illusions.
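
A stable id means the id is assigned once, when the event is created, and every retry re-sends the same stored event. Here is a rough sketch of such an envelope. The `Event` class, its field names, and the `evt_` prefix are assumptions for illustration, not a format the post prescribes.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Event:
    type: str
    data: dict
    # Generated once at creation time and never regenerated on retry.
    id: str = field(default_factory=lambda: f"evt_{uuid.uuid4().hex}")
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def serialize(event: Event) -> bytes:
    """Deterministic JSON so every delivery attempt sends identical bytes."""
    return json.dumps(asdict(event), sort_keys=True, separators=(",", ":")).encode()
```

Persist the event before the first delivery attempt, so that retries load it from storage instead of rebuilding it with a fresh id.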

Replay endpoint

GET /api/events?since=event_id_123&limit=100

If a consumer was offline for 4 days, longer than the 72h retry window, it re-fetches the missed events from the API. Webhooks are best-effort delivery; the API is the source of truth.
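
On the consumer side, a backfill is a loop that pages through that endpoint from the last event id it processed. The sketch below keeps the HTTP call behind a hypothetical `fetch_page(since, limit)` callable, and it assumes the API returns a list of events in order, each with an `id`. The real response shape may differ.

```python
from typing import Callable, List, Optional

Event = dict
FetchPage = Callable[[Optional[str], int], List[Event]]


def backfill(fetch_page: FetchPage, handle: Callable[[Event], None],
             since: Optional[str], limit: int = 100) -> Optional[str]:
    """Replay every event after `since`; return the id of the last event handled."""
    cursor = since
    while True:
        page = fetch_page(cursor, limit)
        for event in page:
            handle(event)          # should be idempotent: a replay can overlap with webhooks
            cursor = event["id"]   # advance only after the event is handled
        if len(page) < limit:
            return cursor          # a short (or empty) page means we have caught up
```

Store the returned cursor durably so the next backfill resumes from it.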

Per-endpoint config

Each customer endpoint configures:

  • URL.
  • Secret.
  • Event types they want.
  • Active/disabled.

Give them a UI to send test events, view delivery history, and rotate secrets.
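
The four settings map onto a small record plus a delivery filter. This sketch uses hypothetical names (`EndpointConfig`, `should_deliver`) and assumes that an empty `event_types` set means "no events" and not "all events".

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class EndpointConfig:
    url: str
    secret: str
    event_types: FrozenSet[str] = frozenset()  # event types the customer subscribed to
    active: bool = True


def should_deliver(endpoint: EndpointConfig, event_type: str) -> bool:
    """Deliver only to active endpoints that subscribed to this event type."""
    return endpoint.active and event_type in endpoint.event_types
```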

Consumer side

Verify signature

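import hashlib
import hmac
import json
import time

# Assumes FastAPI; adapt the imports for your framework.
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
# SECRET is this endpoint's signing secret, loaded from your config.


def verify(secret: str, signature_header: str, body: bytes) -> bool:
    try:
        parts = dict(p.split("=", 1) for p in signature_header.split(","))
        ts, sig = int(parts["t"]), parts["v1"]
    except (KeyError, ValueError):
        return False                                    # missing or malformed header
    if abs(time.time() - ts) > 300:
        return False                                    # outside the 5-minute window; possible replay
    expected = hmac.new(secret.encode(),
                        f"{ts}.".encode() + body,
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig.encode(), expected.encode())


@app.post("/webhook")
async def handle(request: Request):
    body = await request.body()                         # raw bytes, exactly as signed
    if not verify(SECRET, request.headers.get("X-Webhook-Signature", ""), body):
        return JSONResponse({"error": "invalid"}, status_code=401)
    event = json.loads(body)
    await process_event(event)
    return {"ok": True}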

Use hmac.compare_digest, not ==. It compares in constant time, so response timing does not leak how much of the signature matched.

Idempotency

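async def process_event(event):
    async with db.transaction():
        # Claim the id first. Requires a primary key or unique constraint on
        # processed_events.id, so two concurrent deliveries cannot both pass.
        claimed = await db.fetchval(
            "INSERT INTO processed_events (id, ts) VALUES ($1, now()) "
            "ON CONFLICT (id) DO NOTHING RETURNING 1",
            event["id"],
        )
        if not claimed:
            return                                      # already done
        # If this raises, the transaction rolls back and the claim is released.
        await actually_process(event)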

The processed-events table is your dedup. Never skip this for production webhooks.
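
To see the dedup table at work without a Postgres instance, here is a self-contained sketch that swaps in `sqlite3` from the standard library. The table layout and the `process_once` helper are assumptions for illustration. The idea it demonstrates is claiming the event id and doing the work in one transaction, so a duplicate delivery is skipped and a failed attempt releases its claim.

```python
import sqlite3
from typing import Callable


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE processed_events ("
        "id TEXT PRIMARY KEY, ts TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def process_once(conn: sqlite3.Connection, event: dict,
                 apply: Callable[[dict], None]) -> bool:
    """Run apply(event) at most once per event id; return True if it ran."""
    with conn:  # commits on success, rolls back if apply() raises
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (id) VALUES (?)",
            (event["id"],),
        )
        if cur.rowcount == 0:
            return False  # id already recorded: duplicate delivery
        apply(event)
        return True
```

Because the insert and the work share a transaction, an exception in `apply` rolls the claim back and the producer's next retry gets a clean attempt.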

Return 2xx fast

A webhook handler should return 2xx in under 5 seconds. If processing takes longer, queue the work and ack:

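@app.post("/webhook")
async def handle(request: Request):
    body = await request.body()
    if not verify(SECRET, request.headers.get("X-Webhook-Signature", ""), body):
        return JSONResponse({"error": "invalid"}, status_code=401)
    event = json.loads(body)
    # `queue` and `process` come from your background job system.
    await queue.enqueue(process, event)
    return {"ok": True}                                 # ack immediately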

Long processing in the handler = timeouts = retries = mess.

For the queue side see Background Jobs in Python.

Be tolerant of duplicates and order

Webhooks will arrive out of order. Plan for it:

  • Each event has a version or created_at. Don’t apply older versions on top of newer.
  • Eventual consistency is OK; race conditions are not.
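
The version rule can be sketched as a guard in front of the write. This example uses an in-memory dict as the store and assumes each event carries `object_id`, `version`, and `data` fields. Those field names and `apply_if_newer` are hypothetical.

```python
from typing import Dict


def apply_if_newer(store: Dict[str, dict], event: dict) -> bool:
    """Apply the event unless the store already holds the same or a newer version."""
    object_id = event["object_id"]
    current = store.get(object_id)
    if current is not None and current["version"] >= event["version"]:
        return False  # stale or duplicate event: ignore it
    store[object_id] = {"version": event["version"], "state": event["data"]}
    return True
```

In a real database, do the comparison and the write in one conditional update, so two concurrent handlers cannot both pass the check.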

Producer dashboards

Customers need to debug. Show:

  • Event history with timestamps.
  • Per-event delivery attempts + status codes.
  • Replay button.
  • Endpoint test (send a test event).
  • Secret rotation with an overlap period, so active integrations keep working while both secrets are valid.

Without these, every integration question becomes a support ticket.

Tools that handle this for you

  • Svix — webhooks-as-a-service. Producer side. Signing, retries, dashboards. Pay per webhook.
  • Hookdeck — sit between producers and consumers; smooth retries.
  • Inngest — workflow + webhook-shaped triggers in one platform.

For a small SaaS, building it is fine. For scale, Svix saves months.

Read this next

If you want a Postgres-backed webhook delivery service template, it’s at rajpoot.dev.

