Should I build a task queue or use an existing one?

Almost always use an existing one. Celery, BullMQ, SQS, NATS JetStream, and Postgres-based queues (River for Go, Oban for Elixir) cover virtually every shape. Build only if you have requirements (very specific fairness, custom routing) that no existing system gives you.

Why is exactly-once delivery so hard?

Network failures break the symmetric handshake between producer and broker, broker and consumer. You can have at-least-once with idempotent consumers (effectively-once outcomes), or at-most-once with possible loss. True exactly-once across services doesn't exist — but it can look like it with the right patterns.

Can I just use Postgres as a queue?

Yes, up to a real scale. SELECT FOR UPDATE SKIP LOCKED gives you sane concurrent dequeueing. River, Oban, and arq+Postgres prove this works. Move to specialized brokers when you need Kafka-style replay, NATS-style sub-millisecond latency, or millions of messages per second.

Design a Distributed Task Queue — Production Patterns That Scale

A task queue is the boring infrastructure that quietly does most of the work in any non-trivial backend. Designing one teaches you about durability, retries, ordering, and the patterns that make distributed systems actually work. Here’s how I’d design one — even if you’ll end up using an existing implementation.

Requirements

Functional

Producers enqueue tasks (a payload + metadata).
Workers dequeue and process tasks.
Tasks retry on failure with backoff.
Failed tasks move to dead-letter for review.
Optional: scheduling, priorities, deduplication, fairness across tenants.

Non-functional

Durable. Tasks survive broker / worker / power failures.
At-least-once delivery. Combined with idempotent workers for effectively-once outcomes.
Throughput appropriate to workload (1k/sec to 1M/sec).
Low latency between enqueue and pickup (typically <1s).

API

POST /enqueue
  body: { type: "send_email", payload: {...}, run_at: "now" }
  → 201 { task_id: "..." }

# Worker side (pseudo)
task = queue.dequeue(timeout=30s)
process(task)
queue.ack(task)              # or queue.nack(task) on failure

That’s the contract. Everything else is how you implement it durably and at scale.

Storage choices

	Best for
Postgres	Up to ~100k tasks/sec; transactional; strong consistency
Redis (or Valkey)	Simple, fast, ephemeral or short-retention
NATS JetStream	Sub-ms latency; subject-based routing
Kafka / Redpanda	Massive throughput; long retention; replay
SQS / Pub/Sub	Managed; fine for ~10k/sec

Postgres is criminally underrated as a task queue. With SELECT FOR UPDATE SKIP LOCKED, multiple workers dequeue concurrently:

WITH next_task AS (
  SELECT id FROM tasks
  WHERE status = 'pending' AND run_at <= now()
  ORDER BY priority DESC, run_at
  LIMIT 1
  FOR UPDATE SKIP LOCKED
)
UPDATE tasks SET status = 'processing', started_at = now()
WHERE id = (SELECT id FROM next_task)
RETURNING id, type, payload;

SKIP LOCKED lets each worker grab a row no one else is processing. No coordination. Linear scale up to where Postgres itself is the bottleneck (~tens of thousands of tasks/sec).

For deeper Postgres concurrency see PostgreSQL MVCC, Isolation, Locking .

Schema

CREATE TABLE tasks (
    id              BIGSERIAL PRIMARY KEY,
    type            TEXT NOT NULL,
    payload         JSONB NOT NULL,
    priority        INT NOT NULL DEFAULT 0,
    status          TEXT NOT NULL DEFAULT 'pending',     -- pending|processing|completed|failed|dead
    attempts        INT NOT NULL DEFAULT 0,
    max_attempts    INT NOT NULL DEFAULT 5,
    run_at          TIMESTAMPTZ NOT NULL DEFAULT now(),
    started_at      TIMESTAMPTZ,
    completed_at    TIMESTAMPTZ,
    last_error      TEXT,
    idempotency_key TEXT UNIQUE,                        -- dedup
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Index for the dequeue query
CREATE INDEX tasks_pending ON tasks (priority DESC, run_at)
  WHERE status = 'pending';

Partial index on status='pending' keeps the index small even as the table grows. Older completed rows can be archived to a cold table.

Worker loop

async def worker(queue):
    while True:
        task = await queue.fetch_one()
        if task is None:
            await asyncio.sleep(1)               # idle backoff
            continue
        try:
            await asyncio.wait_for(handle(task), timeout=task.timeout)
            await queue.ack(task)
        except Exception as e:
            await queue.nack(task, error=str(e))

fetch_one:

Atomic dequeue (the SQL above).
Sets status='processing', increments attempts.

ack:

Sets status='completed', completed_at=now().

nack:

If attempts < max_attempts: set status='pending', run_at=now() + backoff(attempts).
Else: move to dead-letter (status='dead').

For the underlying retry / idempotency patterns see Idempotency, Retries, and Exactly-Once Illusions .

Visibility timeout

A worker that crashes mid-task must release the task back to the queue. Two patterns:

1. Heartbeat

Worker pings the queue every 5s (“still processing this”). If no heartbeat for 30s, the broker reclaims the task.

2. Visibility timeout

When fetching, set visible_again_at = now() + timeout. A separate sweeper resets stuck processing rows to pending after the timeout.

-- Sweeper, runs every 30s
UPDATE tasks
SET status = 'pending'
WHERE status = 'processing' AND started_at < now() - interval '5 minutes';

Visibility timeout is simpler. Heartbeat is more responsive.

Retries with exponential backoff

def next_run(attempts: int) -> datetime:
    base_seconds = 2 ** attempts                    # 2, 4, 8, 16, 32, 64...
    jittered = base_seconds * (0.5 + random.random())
    return datetime.now() + timedelta(seconds=jittered)

Add jitter so a fleet of workers doesn’t retry in lockstep. Cap at sensible max (e.g., 1 hour). After max_attempts, dead-letter.

Dead letter queue

INSERT INTO dead_letter (task_id, payload, last_error, created_at)
SELECT id, payload, last_error, now() FROM tasks WHERE id = $1;
UPDATE tasks SET status = 'dead' WHERE id = $1;

The dead-letter queue is your bug tracker. Review weekly. Either fix the bug + re-enqueue, or accept the loss and document why.

Scheduling

Three flavors:

1. Delayed enqueue

run_at = now() + delay. The dequeue query already filters by run_at <= now(). Done.

2. Cron

A scheduler service emits tasks at fixed times. Run one scheduler per cluster (leader-elected) to avoid duplicates.

3. Recurring (interval)

Every N minutes, re-enqueue. Implement on top of cron + idempotency key.

Deduplication

Producers send the same logical event multiple times (retries, network blips). Dedup at the queue:

INSERT INTO tasks (idempotency_key, type, payload)
VALUES ($1, $2, $3)
ON CONFLICT (idempotency_key) DO NOTHING
RETURNING id;

If the row already exists, no new task. If it didn’t, you got the new row’s ID.

The producer is responsible for the idempotency key. Common shapes: (user_id, action, ts_minute) or a UUID generated at the original event.

Fairness across tenants

Naive: FIFO. Problem: a noisy tenant’s 10,000-task burst blocks every other tenant.

Solutions:

Per-tenant queues

SELECT id FROM tasks
WHERE tenant_id = $picked_tenant AND status = 'pending'
ORDER BY priority DESC, run_at
LIMIT 1
FOR UPDATE SKIP LOCKED;

Pick picked_tenant round-robin. Each tenant gets equal share regardless of backlog.

Stochastic Fair Queueing

Pick a random tenant weighted by their pending-task count. Tenants with bigger backlogs get processed faster but no tenant is starved.

Token bucket per tenant

Each tenant has a budget refilled per second. Workers respect budgets when dequeuing. See Design a Rate Limiter for the bucket implementation.

Priorities

ORDER BY priority DESC, run_at

Higher priority drains first. Beware starvation — low-priority tasks may never run. Mitigations:

Priority decay (low-priority tasks gain priority over time).
Reserved worker slots for low-priority.
Quota: “10% of throughput to low-priority always.”

Concurrency caps

Some tasks shouldn’t run more than N at a time globally (e.g., calls to a rate-limited API). Pattern:

async def with_concurrency_cap(queue_name: str, cap: int, fn):
    granted = await redis.incr(f"running:{queue_name}")
    if granted > cap:
        await redis.decr(f"running:{queue_name}")
        return None                          # release this task back
    try:
        return await fn()
    finally:
        await redis.decr(f"running:{queue_name}")

A semaphore in Redis. Many systems (Sidekiq Pro, Celery via custom routing) build this in.

Observability

Per-task logged events:

Enqueued (when, by whom, with what idempotency key).
Picked up (worker ID, attempt #).
Completed / Failed / Dead.
Latency histograms (enqueue → pickup, pickup → completion).

Dashboards:

Per-type backlog (pending tasks by type).
Per-type p95 latency.
Failed and dead counts.
Worker utilization.

When a problem hits, the first question is “are we backed up?” — that needs to be on a graph someone watches. See SLOs and Error Budgets for App Developers .

Performance ceiling

Postgres-based queue with the schema above:

Single Postgres node: 5k–20k tasks/sec for short tasks.
With partitioning + sharding: 100k+ tasks/sec.

Past that, switch broker:

Redis Streams: 50k+/sec, simpler.
NATS JetStream: 500k+/sec, sub-ms latency.
Kafka / Redpanda: 1M+/sec, replayable history.

See Kafka vs NATS vs RabbitMQ in 2026 .

When to use each existing system

	When
Celery / Dramatiq	Python apps; hide the queue mechanics. See Background Jobs in Python .
BullMQ	Node.js apps with Redis.
River (Go), Oban (Elixir)	Native, fast, Postgres-based.
Sidekiq	Ruby apps.
NATS JetStream	Multi-language, low-latency, simple ops.
Kafka	Massive throughput, replayable.
SQS / Pub-Sub	Managed, AWS / GCP-native.
Temporal	Multi-step workflows, durable, complex orchestration. See Temporal Durable Execution .

What I’d actually build today

For a small product:

Postgres + River (Go) or Postgres + arq (Python) or Postgres + Oban (Elixir).
Simple. Same DB as your application. Fewer moving parts.

For mid-scale:

NATS JetStream for sub-second jobs; Postgres for orchestrated work.

For high-throughput pipelines:

Kafka for events + a worker pool consuming them.

Common mistakes

1. No idempotency

Workers must be safe to call twice. Otherwise duplicate processing on retry breaks invariants.

2. No dead letter

Tasks that always fail loop forever. Cap retries, dead-letter.

3. Visibility timeout shorter than max task duration

Worker takes 6 minutes; visibility is 5 minutes. The task is reclaimed; another worker starts it. Race condition. Configure visibility > worst-case task time.

4. Single-tenant fairness assumed in multi-tenant systems

A burst from one tenant locks out everyone. Add per-tenant fairness early.

5. Logging payload to dead letter without redaction

Payloads contain PII. Redact before storing, especially in error logs.

Read this next

If you want a working Postgres-based task queue with retries, dead-letter, fairness, and OTel tracing, it’s at rajpoot.dev .

Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .

Requirements#

Functional#

Non-functional#

API#

Storage choices#

Schema#

Worker loop#

Visibility timeout#

1. Heartbeat#

2. Visibility timeout#

Retries with exponential backoff#

Dead letter queue#

Scheduling#

1. Delayed enqueue#

2. Cron#

3. Recurring (interval)#

Deduplication#

Fairness across tenants#

Per-tenant queues#

Stochastic Fair Queueing#

Token bucket per tenant#

Priorities#

Concurrency caps#

Observability#

Performance ceiling#

When to use each existing system#

What I’d actually build today#

Common mistakes#

1. No idempotency#

2. No dead letter#

3. Visibility timeout shorter than max task duration#

4. Single-tenant fairness assumed in multi-tenant systems#

5. Logging payload to dead letter without redaction#

Read this next#