A task queue is the boring infrastructure that quietly does most of the work in any non-trivial backend. Designing one teaches you about durability, retries, ordering, and the patterns that make distributed systems actually work. Here’s how I’d design one — even if you’ll end up using an existing implementation.
Requirements
Functional
- Producers enqueue tasks (a payload + metadata).
- Workers dequeue and process tasks.
- Tasks retry on failure with backoff.
- Failed tasks move to dead-letter for review.
- Optional: scheduling, priorities, deduplication, fairness across tenants.
Non-functional
- Durable. Tasks survive broker / worker / power failures.
- At-least-once delivery. Combined with idempotent workers for effectively-once outcomes.
- Throughput appropriate to workload (1k/sec to 1M/sec).
- Low latency between enqueue and pickup (typically <1s).
API
POST /enqueue
body: { type: "send_email", payload: {...}, run_at: "now" }
→ 201 { task_id: "..." }
# Worker side (pseudo)
task = queue.dequeue(timeout=30s)
process(task)
queue.ack(task) # or queue.nack(task) on failure
That’s the contract. Everything else is how you implement it durably and at scale.
Storage choices
| Best for | |
|---|---|
| Postgres | Up to ~100k tasks/sec; transactional; strong consistency |
| Redis (or Valkey) | Simple, fast, ephemeral or short-retention |
| NATS JetStream | Sub-ms latency; subject-based routing |
| Kafka / Redpanda | Massive throughput; long retention; replay |
| SQS / Pub/Sub | Managed; fine for ~10k/sec |
Postgres is criminally underrated as a task queue. With SELECT FOR UPDATE SKIP LOCKED, multiple workers dequeue concurrently:
WITH next_task AS (
SELECT id FROM tasks
WHERE status = 'pending' AND run_at <= now()
ORDER BY priority DESC, run_at
LIMIT 1
FOR UPDATE SKIP LOCKED
)
UPDATE tasks SET status = 'processing', started_at = now()
WHERE id = (SELECT id FROM next_task)
RETURNING id, type, payload;
SKIP LOCKED lets each worker grab a row no one else is processing. No coordination. Linear scale up to where Postgres itself is the bottleneck (~tens of thousands of tasks/sec).
For deeper Postgres concurrency see PostgreSQL MVCC, Isolation, Locking .
Schema
CREATE TABLE tasks (
id BIGSERIAL PRIMARY KEY,
type TEXT NOT NULL,
payload JSONB NOT NULL,
priority INT NOT NULL DEFAULT 0,
status TEXT NOT NULL DEFAULT 'pending', -- pending|processing|completed|failed|dead
attempts INT NOT NULL DEFAULT 0,
max_attempts INT NOT NULL DEFAULT 5,
run_at TIMESTAMPTZ NOT NULL DEFAULT now(),
started_at TIMESTAMPTZ,
completed_at TIMESTAMPTZ,
last_error TEXT,
idempotency_key TEXT UNIQUE, -- dedup
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Index for the dequeue query
CREATE INDEX tasks_pending ON tasks (priority DESC, run_at)
WHERE status = 'pending';
Partial index on status='pending' keeps the index small even as the table grows. Older completed rows can be archived to a cold table.
Worker loop
async def worker(queue):
while True:
task = await queue.fetch_one()
if task is None:
await asyncio.sleep(1) # idle backoff
continue
try:
await asyncio.wait_for(handle(task), timeout=task.timeout)
await queue.ack(task)
except Exception as e:
await queue.nack(task, error=str(e))
fetch_one:
- Atomic dequeue (the SQL above).
- Sets
status='processing', increments attempts.
ack:
- Sets
status='completed',completed_at=now().
nack:
- If
attempts < max_attempts: setstatus='pending',run_at=now() + backoff(attempts). - Else: move to dead-letter (
status='dead').
For the underlying retry / idempotency patterns see Idempotency, Retries, and Exactly-Once Illusions .
Visibility timeout
A worker that crashes mid-task must release the task back to the queue. Two patterns:
1. Heartbeat
Worker pings the queue every 5s (“still processing this”). If no heartbeat for 30s, the broker reclaims the task.
2. Visibility timeout
When fetching, set visible_again_at = now() + timeout. A separate sweeper resets stuck processing rows to pending after the timeout.
-- Sweeper, runs every 30s
UPDATE tasks
SET status = 'pending'
WHERE status = 'processing' AND started_at < now() - interval '5 minutes';
Visibility timeout is simpler. Heartbeat is more responsive.
Retries with exponential backoff
def next_run(attempts: int) -> datetime:
base_seconds = 2 ** attempts # 2, 4, 8, 16, 32, 64...
jittered = base_seconds * (0.5 + random.random())
return datetime.now() + timedelta(seconds=jittered)
Add jitter so a fleet of workers doesn’t retry in lockstep. Cap at sensible max (e.g., 1 hour). After max_attempts, dead-letter.
Dead letter queue
INSERT INTO dead_letter (task_id, payload, last_error, created_at)
SELECT id, payload, last_error, now() FROM tasks WHERE id = $1;
UPDATE tasks SET status = 'dead' WHERE id = $1;
The dead-letter queue is your bug tracker. Review weekly. Either fix the bug + re-enqueue, or accept the loss and document why.
Scheduling
Three flavors:
1. Delayed enqueue
run_at = now() + delay. The dequeue query already filters by run_at <= now(). Done.
2. Cron
A scheduler service emits tasks at fixed times. Run one scheduler per cluster (leader-elected) to avoid duplicates.
3. Recurring (interval)
Every N minutes, re-enqueue. Implement on top of cron + idempotency key.
Deduplication
Producers send the same logical event multiple times (retries, network blips). Dedup at the queue:
INSERT INTO tasks (idempotency_key, type, payload)
VALUES ($1, $2, $3)
ON CONFLICT (idempotency_key) DO NOTHING
RETURNING id;
If the row already exists, no new task. If it didn’t, you got the new row’s ID.
The producer is responsible for the idempotency key. Common shapes: (user_id, action, ts_minute) or a UUID generated at the original event.
Fairness across tenants
Naive: FIFO. Problem: a noisy tenant’s 10,000-task burst blocks every other tenant.
Solutions:
Per-tenant queues
SELECT id FROM tasks
WHERE tenant_id = $picked_tenant AND status = 'pending'
ORDER BY priority DESC, run_at
LIMIT 1
FOR UPDATE SKIP LOCKED;
Pick picked_tenant round-robin. Each tenant gets equal share regardless of backlog.
Stochastic Fair Queueing
Pick a random tenant weighted by their pending-task count. Tenants with bigger backlogs get processed faster but no tenant is starved.
Token bucket per tenant
Each tenant has a budget refilled per second. Workers respect budgets when dequeuing. See Design a Rate Limiter for the bucket implementation.
Priorities
ORDER BY priority DESC, run_at
Higher priority drains first. Beware starvation — low-priority tasks may never run. Mitigations:
- Priority decay (low-priority tasks gain priority over time).
- Reserved worker slots for low-priority.
- Quota: “10% of throughput to low-priority always.”
Concurrency caps
Some tasks shouldn’t run more than N at a time globally (e.g., calls to a rate-limited API). Pattern:
async def with_concurrency_cap(queue_name: str, cap: int, fn):
granted = await redis.incr(f"running:{queue_name}")
if granted > cap:
await redis.decr(f"running:{queue_name}")
return None # release this task back
try:
return await fn()
finally:
await redis.decr(f"running:{queue_name}")
A semaphore in Redis. Many systems (Sidekiq Pro, Celery via custom routing) build this in.
Observability
Per-task logged events:
- Enqueued (when, by whom, with what idempotency key).
- Picked up (worker ID, attempt #).
- Completed / Failed / Dead.
- Latency histograms (enqueue → pickup, pickup → completion).
Dashboards:
- Per-type backlog (pending tasks by type).
- Per-type p95 latency.
- Failed and dead counts.
- Worker utilization.
When a problem hits, the first question is “are we backed up?” — that needs to be on a graph someone watches. See SLOs and Error Budgets for App Developers .
Performance ceiling
Postgres-based queue with the schema above:
- Single Postgres node: 5k–20k tasks/sec for short tasks.
- With partitioning + sharding: 100k+ tasks/sec.
Past that, switch broker:
- Redis Streams: 50k+/sec, simpler.
- NATS JetStream: 500k+/sec, sub-ms latency.
- Kafka / Redpanda: 1M+/sec, replayable history.
See Kafka vs NATS vs RabbitMQ in 2026 .
When to use each existing system
| When | |
|---|---|
| Celery / Dramatiq | Python apps; hide the queue mechanics. See Background Jobs in Python . |
| BullMQ | Node.js apps with Redis. |
| River (Go), Oban (Elixir) | Native, fast, Postgres-based. |
| Sidekiq | Ruby apps. |
| NATS JetStream | Multi-language, low-latency, simple ops. |
| Kafka | Massive throughput, replayable. |
| SQS / Pub-Sub | Managed, AWS / GCP-native. |
| Temporal | Multi-step workflows, durable, complex orchestration. See Temporal Durable Execution . |
What I’d actually build today
For a small product:
- Postgres + River (Go) or Postgres + arq (Python) or Postgres + Oban (Elixir).
- Simple. Same DB as your application. Fewer moving parts.
For mid-scale:
- NATS JetStream for sub-second jobs; Postgres for orchestrated work.
For high-throughput pipelines:
- Kafka for events + a worker pool consuming them.
Common mistakes
1. No idempotency
Workers must be safe to call twice. Otherwise duplicate processing on retry breaks invariants.
2. No dead letter
Tasks that always fail loop forever. Cap retries, dead-letter.
3. Visibility timeout shorter than max task duration
Worker takes 6 minutes; visibility is 5 minutes. The task is reclaimed; another worker starts it. Race condition. Configure visibility > worst-case task time.
4. Single-tenant fairness assumed in multi-tenant systems
A burst from one tenant locks out everyone. Add per-tenant fairness early.
5. Logging payload to dead letter without redaction
Payloads contain PII. Redact before storing, especially in error logs.
Read this next
- Background Jobs in Python — arq, Dramatiq, Taskiq
- Idempotency, Retries, and Exactly-Once Illusions
- Kafka vs NATS vs RabbitMQ in 2026
- Temporal Durable Execution
- PostgreSQL MVCC, Isolation, and Locking
If you want a working Postgres-based task queue with retries, dead-letter, fairness, and OTel tracing, it’s at rajpoot.dev .
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .