Postgres makes a great job queue up to surprising scale. No Redis, no SQS, no Kafka — just a table and SKIP LOCKED. This post is the working playbook.

Schema

CREATE TABLE jobs (
    id          bigserial PRIMARY KEY,
    queue       text NOT NULL DEFAULT 'default',
    payload     jsonb NOT NULL,
    priority    int NOT NULL DEFAULT 0,
    run_at      timestamptz NOT NULL DEFAULT now(),
    attempts    int NOT NULL DEFAULT 0,
    locked_at   timestamptz,
    locked_by   text,
    created_at  timestamptz NOT NULL DEFAULT now()
);

CREATE INDEX jobs_pickup ON jobs (queue, run_at, priority DESC)
    WHERE locked_at IS NULL;

Partial index = small + fast for the “available jobs” query.

Enqueue

async def enqueue(queue, payload, run_at=None, priority=0):
    await db.execute(
        "INSERT INTO jobs (queue, payload, run_at, priority) VALUES ($1, $2, $3, $4)",
        queue, payload, run_at or "now()", priority
    )

Atomic with business writes:

async with db.transaction():
    await db.execute("INSERT INTO orders ...")
    await enqueue("send_confirmation", {"order_id": order_id})

Either both happen or neither. The outbox pattern, free.

Dequeue (SKIP LOCKED)

WITH next_job AS (
    SELECT id FROM jobs
    WHERE queue = $1 AND run_at <= now() AND locked_at IS NULL
    ORDER BY priority DESC, run_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
UPDATE jobs SET locked_at = now(), locked_by = $2, attempts = attempts + 1
WHERE id = (SELECT id FROM next_job)
RETURNING id, payload;

Multi-worker safe: each worker grabs a different row. Other workers skip locked rows.

async def claim_job(queue, worker_id):
    return await db.fetchrow(SQL, queue, worker_id)

Process and complete

async def worker_loop(worker_id):
    while True:
        job = await claim_job("default", worker_id)
        if not job:
            await asyncio.sleep(1)
            continue
        try:
            await process(job["payload"])
            await db.execute("DELETE FROM jobs WHERE id = $1", job["id"])
        except Exception as e:
            await handle_failure(job, e)

Delete on success. On failure: requeue with backoff or move to dead-letter.

Retries with backoff

async def handle_failure(job, error):
    if job["attempts"] < 5:
        delay = 2 ** job["attempts"] + random.uniform(0, 1)
        await db.execute(
            "UPDATE jobs SET locked_at = NULL, run_at = now() + interval '1 second' * $1 WHERE id = $2",
            delay, job["id"]
        )
    else:
        await db.execute(
            "INSERT INTO dead_jobs SELECT * FROM jobs WHERE id = $1", job["id"]
        )
        await db.execute("DELETE FROM jobs WHERE id = $1", job["id"])
        log.error(f"job {job['id']} dead-lettered: {error}")

Exponential + jitter. Cap retries; dead-letter the rest. See Idempotency .

Visibility timeout

A worker dies mid-job; lock stuck forever. Reaper:

-- Periodic cleanup
UPDATE jobs SET locked_at = NULL, locked_by = NULL
WHERE locked_at < now() - interval '5 minutes';

If a job takes longer than the timeout, the worker should heartbeat:

UPDATE jobs SET locked_at = now() WHERE id = $1 AND locked_by = $2;

LISTEN / NOTIFY for low latency

Polling has 1s+ latency. For instant wake-up:

-- On enqueue
NOTIFY jobs_default, '';
# Worker
async with db.acquire() as conn:
    await conn.execute("LISTEN jobs_default")
    while True:
        try:
            job = await claim_job(...)
            if job:
                await process(...)
            else:
                await conn.notifies.get(timeout=5)  # wakes on NOTIFY or after 5s
        except asyncio.TimeoutError:
            pass

Sub-millisecond pickup latency. Pair with periodic polling as a safety net.

Concurrency

Postgres handles SKIP LOCKED well. With ~100k jobs/sec sustained you’ll hit:

  • WAL bandwidth (writes).
  • Index bloat on the partial index.
  • Connection limits.

Mitigations:

  • PgBouncer transaction pooling.
  • Partition by queue or date.
  • Vacuum aggressively — high-churn tables need it.

Multi-queue

Use the queue column for logical separation:

async def claim_any(worker_id):
    return await db.fetchrow(SQL_WITH_ANY_QUEUE, worker_id)

Workers can specialize: high-priority workers on urgent queue, generalists on default.

Scheduled jobs

await enqueue("cleanup", {}, run_at=tomorrow_3am)

run_at <= now() filter handles scheduling. For complex schedules: pg_cron extension or external scheduler that calls enqueue.

At-least-once semantics

Jobs may run twice (worker dies after work but before delete). Make handlers idempotent:

async def send_email(payload):
    if await already_sent(payload["email_id"]):
        return
    await sendgrid.send(...)
    await mark_sent(payload["email_id"])

Comparison

Postgres queueRedis queueSQSKafka
SetupNone (already there)Add RedisAdd SQSAdd cluster
Throughput~10k/sec~100k/secHighVery high
DurabilityHigh (DB)ConfigurableHighHigh
Atomic with biz dataYesNoNoNo
VisibilitySQLRedisInsightConsoleLess
Streaming / replayNoLimitedNoYes

For most apps: Postgres queue handles the load and saves you a service.

Tools

  • River (Go) — Postgres queue with great DX.
  • Procrastinate (Python) — uses LISTEN/NOTIFY.
  • GoodJob (Ruby/Rails) — same pattern.
  • graphile-worker (TypeScript) — robust.
  • Hand-rolled — what this post showed.

For Python: Procrastinate is excellent.

Common mistakes

1. No partial index

Sequential scan of “available jobs” on million-row table. Add the partial index.

2. No vacuum

Bloat accumulates; queries slow. autovacuum on, or vacuum manually.

3. Single worker

One worker = bottleneck. Multiple workers + SKIP LOCKED scale near linearly.

4. Forever-locked stuck jobs

Worker dies; lock stays. Add reaper.

5. Non-idempotent handlers

At-least-once delivery duplicates work. Make handlers idempotent.

What I’d ship today

For ≤10k jobs/sec:

  • Postgres-backed queue with the schema above.
  • SKIP LOCKED for concurrency.
  • LISTEN/NOTIFY for low latency.
  • Exponential backoff retries.
  • Dead-letter table for poison messages.
  • Reaper for stuck locks.
  • Procrastinate / River / graphile-worker if you want it off-the-shelf.

When you outgrow it: graduate to Kafka or NATS, but most apps never do.

Read this next

If you want my Postgres-as-queue Python implementation, it’s at rajpoot.dev .


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .