Background jobs decouple slow work from request latency. Done right: emails send, reports generate, webhooks deliver, all without touching your p95. Done wrong: silent failures, duplicate work, lost tasks. This post is the working playbook.

Setup

uv add celery redis django-celery-beat django-celery-results

# celery.py (inside the myapp package; import it from myapp/__init__.py as
# `from .celery import app as celery_app` so @shared_task binds to this app)
import os
from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myapp.settings")
app = Celery("myapp")
app.config_from_object("django.conf:settings", namespace="CELERY")
app.autodiscover_tasks()

# settings.py
# Add "django_celery_beat" and "django_celery_results" to INSTALLED_APPS, then run migrate.
CELERY_BROKER_URL = "redis://redis:6379/0"
CELERY_RESULT_BACKEND = "django-db"
CELERY_TASK_ACKS_LATE = True
CELERY_TASK_REJECT_ON_WORKER_LOST = True
CELERY_TASK_TRACK_STARTED = True
CELERY_BEAT_SCHEDULER = "django_celery_beat.schedulers:DatabaseScheduler"

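acks_late + reject_on_worker_lost: a task is acknowledged only after it finishes, and it is requeued if the worker process dies mid-run. Without these, a crash silently loses the in-flight job. The trade-off: a task can run more than once, so every task must be idempotent.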
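To make the ack timing concrete, here is a toy in-memory model of a broker. It is not how Celery or Redis work internally; `ToyBroker`, `WorkerCrashed`, and `crashing_handler` are made-up names for illustration. It only shows the bookkeeping difference: with early acks the message is gone before the work runs, with late acks a crash puts it back.

```python
from collections import deque


class WorkerCrashed(Exception):
    """Stands in for a worker process dying mid-task."""


class ToyBroker:
    """Hypothetical in-memory queue; models only ack/requeue bookkeeping."""

    def __init__(self, messages):
        self.queue = deque(messages)

    def run_one(self, handler, acks_late):
        msg = self.queue.popleft()
        if not acks_late:
            # Early ack: the broker forgets the message before the work runs.
            try:
                handler(msg)
            except WorkerCrashed:
                pass  # nothing left to redeliver, the message is lost
            return
        # Late ack: the message is dropped only once the handler returns.
        try:
            handler(msg)
        except WorkerCrashed:
            self.queue.appendleft(msg)  # requeued for another worker


def crashing_handler(msg):
    raise WorkerCrashed(msg)


early = ToyBroker(["send_welcome:42"])
early.run_one(crashing_handler, acks_late=False)

late = ToyBroker(["send_welcome:42"])
late.run_one(crashing_handler, acks_late=True)

print(len(early.queue), len(late.queue))
```

The requeued message will run again from the top, which is why the rest of this post leans so hard on idempotent tasks.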

Task design

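from celery import shared_task
from django.contrib.auth import get_user_model
from django.core.mail import send_mail
from django.db import transaction
from django.utils import timezone

User = get_user_model()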

@shared_task(
    bind=True,
    autoretry_for=(ConnectionError, TimeoutError),
    retry_kwargs={"max_retries": 5},
    retry_backoff=True,
    retry_jitter=True,
)
def send_welcome(self, user_id: int):
    user = User.objects.get(pk=user_id)
    if user.welcomed_at:
        return  # idempotent: safe to re-run
    
    with transaction.atomic():
        send_mail(...)
        user.welcomed_at = timezone.now()
        user.save(update_fields=["welcomed_at"])

Pass IDs, not objects. Always derive idempotency keys. Always update state to mark “done.”

Idempotency

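import stripe
from celery import shared_task
from django.db import transaction

@shared_task
def charge_customer(invoice_id: int):
    # select_for_update() must run inside a transaction; the row lock is held until commit
    with transaction.atomic():
        inv = Invoice.objects.select_for_update().get(pk=invoice_id)
        if inv.charged:
            return
        stripe.Charge.create(idempotency_key=f"inv-{invoice_id}", amount=inv.amount, ...)
        inv.charged = True
        inv.save(update_fields=["charged"])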

Stripe’s idempotency_key makes the API call safe to repeat. The locked DB row and its charged flag make our side safe. Two layers, so a retry at either point cannot double-charge.
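The two layers can be exercised without Stripe or a database. The sketch below is an illustration under stated assumptions: `FakeGateway` is a hypothetical payment API that deduplicates on an idempotency key, and the `invoices` dict stands in for the Invoice table. It simulates the worst case, a crash after the charge but before the flag is saved.

```python
class FakeGateway:
    """Hypothetical payment API that dedups on an idempotency key."""

    def __init__(self):
        self.charges = {}  # idempotency_key -> amount

    def charge(self, idempotency_key, amount):
        # A repeated key keeps the original charge instead of billing again.
        self.charges.setdefault(idempotency_key, amount)
        return idempotency_key


invoices = {7: {"amount": 1999, "charged": False}}  # stand-in for the Invoice table
gateway = FakeGateway()


def charge_customer(invoice_id, crash_before_save=False):
    inv = invoices[invoice_id]
    if inv["charged"]:  # layer 1: our own "done" flag
        return "skipped"
    gateway.charge(f"inv-{invoice_id}", inv["amount"])  # layer 2: gateway dedup
    if crash_before_save:
        raise RuntimeError("worker died after charging, before saving")
    inv["charged"] = True
    return "charged"


# First attempt charges, then crashes before the flag is saved.
try:
    charge_customer(7, crash_before_save=True)
except RuntimeError:
    pass

# The retry reaches the gateway again, but the key prevents a second charge.
first_retry = charge_customer(7)
# Any later duplicate delivery stops at the flag and never calls the gateway.
second_retry = charge_customer(7)

print(first_retry, second_retry, len(gateway.charges))
```

Neither layer is enough alone: the flag cannot cover the crash window, and the gateway key cannot stop needless API calls.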

For more: Idempotency Patterns.

Retries

import requests
from celery import shared_task

@shared_task(
    bind=True,
    autoretry_for=(Exception,),  # broad; fine for some tasks
    retry_backoff=True,           # 1s, 2s, 4s, 8s, 16s...
    retry_backoff_max=600,        # cap at 10 min
    retry_jitter=True,            # actual delay is random between 0 and the backoff value
    max_retries=10,
)
def webhook_send(self, url, payload):
    resp = requests.post(url, json=payload, timeout=10)
    resp.raise_for_status()

Exponential backoff + jitter = the standard. Without backoff, every failed task retries immediately and hammers a service that is already down. Without jitter, tasks that failed together retry together.
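The delays those options produce are easy to compute by hand. The function below is a sketch of the arithmetic as I understand the documented behavior of retry_backoff, retry_backoff_max, and retry_jitter; it is not Celery's source, and `retry_delay` is a name made up here.

```python
import random


def retry_delay(retries, factor=1, backoff_max=600, jitter=True, rng=random):
    """Seconds to wait before retry number `retries` (0-based)."""
    ceiling = min(backoff_max, factor * (2 ** retries))
    if jitter:
        # Full jitter: pick anywhere between 0 and the exponential ceiling.
        return rng.randrange(ceiling + 1)
    return ceiling


# Without jitter the schedule is deterministic and hits the cap at retry 10.
schedule = [retry_delay(n, jitter=False) for n in range(12)]
print(schedule)

# With jitter, five tasks on their 6th attempt spread out below the 32s ceiling.
jittered = [retry_delay(5, rng=random.Random(seed)) for seed in range(5)]
print(jittered)
```

The jittered values are random by design, so the only guarantee is the range: each delay falls between 0 and the ceiling for that retry.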

Dead-letter queue

When max_retries is hit, log to a dead-letter table:

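import requests
from celery import shared_task
from celery.exceptions import MaxRetriesExceededError

@shared_task(bind=True)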
def webhook_send(self, url, payload):
    try:
        requests.post(url, json=payload, timeout=10).raise_for_status()
    except Exception as e:
        try:
            self.retry(countdown=2 ** self.request.retries, max_retries=10)
        except MaxRetriesExceededError:
            DeadLetter.objects.create(
                task="webhook_send",
                args={"url": url, "payload": payload},
                error=str(e),
            )

An operator can inspect the row or re-trigger the task. Failures aren’t silent.
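What "re-trigger" might look like, as a minimal sketch: the `dead_letters` list stands in for the DeadLetter table, and `run_with_retries` and `replay` are hypothetical helpers, not Celery APIs. Delays between attempts are left out to keep the example short.

```python
dead_letters = []  # stand-in for the DeadLetter table


def run_with_retries(task_name, func, kwargs, max_retries=3):
    """Call func(**kwargs); once retries are exhausted, record a dead letter."""
    last_error = ""
    for _attempt in range(max_retries + 1):
        try:
            return func(**kwargs)
        except Exception as exc:
            last_error = str(exc)
    dead_letters.append({"task": task_name, "args": kwargs, "error": last_error})
    return None


def replay(registry):
    """Re-run every dead letter; rows that fail again are parked again."""
    pending = list(dead_letters)
    dead_letters.clear()
    for row in pending:
        run_with_retries(row["task"], registry[row["task"]], row["args"])


endpoint = {"up": False}  # simulated health of the remote endpoint
delivered = []


def webhook_send(url, payload):
    if not endpoint["up"]:
        raise ConnectionError(f"{url} unreachable")
    delivered.append((url, payload))


run_with_retries(
    "webhook_send",
    webhook_send,
    {"url": "https://example.test/hook", "payload": {"id": 1}},
)
parked = len(dead_letters)

endpoint["up"] = True  # the outage ends; an operator replays the table
replay({"webhook_send": webhook_send})
print(parked, len(dead_letters), delivered)
```

Storing the task name and its arguments, as the DeadLetter row above does, is what makes the replay possible.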

Beat (scheduling)

# settings.py
from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    "cleanup-expired-tokens": {
        "task": "auth.tasks.cleanup_tokens",
        "schedule": crontab(minute=0, hour="*/2"),
    },
    "daily-report": {
        "task": "reports.tasks.daily",
        "schedule": crontab(minute=0, hour=2),
    },
}

Or use the DB-backed scheduler (django-celery-beat) to edit schedules at runtime in the Django admin.

Run exactly one beat instance. Two beat instances = duplicate scheduled tasks.

Monitoring

celery -A myapp flower --basic-auth=admin:secret

Flower is a real-time dashboard: active tasks, success rate, queue depth. Free. It is a separate package, not part of the install line above (uv add flower).

Beyond Flower, emit metrics for each task:

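import time

@shared_task(bind=True)
def my_task(self, *args, **kwargs):
    start = time.monotonic()  # monotonic clock: unaffected by wall-clock adjustments
    try:
        do_work()
        statsd.incr("tasks.my_task.success")
    except Exception:
        statsd.incr("tasks.my_task.failure")
        raise
    finally:
        # StatsD timers are reported in milliseconds
        statsd.timing("tasks.my_task.duration", (time.monotonic() - start) * 1000)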

Also alert on queue depth (measured at the broker) and on dead-letter rate. See Observability.
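Copying the try/except/finally block into every task gets old fast. One option is a decorator, sketched below. `FakeStatsd` is an in-memory stand-in so the example runs on its own, and `instrumented` and `resize_image` are hypothetical names; swap in your real StatsD client.

```python
import functools
import time
from collections import Counter


class FakeStatsd:
    """In-memory stand-in for a StatsD client (for illustration only)."""

    def __init__(self):
        self.counts = Counter()
        self.timings = {}

    def incr(self, name):
        self.counts[name] += 1

    def timing(self, name, ms):
        self.timings.setdefault(name, []).append(ms)


statsd = FakeStatsd()


def instrumented(func):
    """Emit success/failure counters and a duration timer for each call."""
    prefix = f"tasks.{func.__name__}"

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            result = func(*args, **kwargs)
        except Exception:
            statsd.incr(f"{prefix}.failure")
            raise
        else:
            statsd.incr(f"{prefix}.success")
            return result
        finally:
            # Runs on both paths; duration is reported in milliseconds.
            statsd.timing(f"{prefix}.duration", (time.monotonic() - start) * 1000)

    return wrapper


@instrumented
def resize_image(width):
    if width <= 0:
        raise ValueError("width must be positive")
    return width // 2


resize_image(800)
try:
    resize_image(0)
except ValueError:
    pass

print(dict(statsd.counts), len(statsd.timings["tasks.resize_image.duration"]))
```

With Celery, the decorator would sit under @shared_task so that the task wraps the instrumented function; functools.wraps keeps the original function name, which matters for metric names.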

Chains and groups

from celery import chain, chord, group

# Sequential
chain(a.s(1), b.s(), c.s())()

# Parallel + collect
group(process_chunk.s(c) for c in chunks)()

# Parallel then aggregate
chord(group(process.s(c) for c in chunks))(aggregate.s())

Use these for simple workflows. Beyond simple cases, use Temporal: Celery’s workflow primitives are coarse compared to a real workflow engine.

Queues / priorities

# settings.py
CELERY_TASK_ROUTES = {
    "emails.tasks.*": {"queue": "emails"},
    "reports.tasks.*": {"queue": "reports"},
}

# shell: one worker pool per queue
celery -A myapp worker -Q emails -c 16
celery -A myapp worker -Q reports -c 4

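Run separate worker pools for separate concerns, so an email backlog doesn’t block report jobs. Tasks that match no route go to the default queue (named celery), so keep a worker consuming that queue too.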
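To check where a given task will land, the glob matching can be approximated with the standard library. This is a sketch of the idea, not Celery's router; `queue_for` is a made-up helper, and the fallback assumes the default queue name is left at "celery".

```python
from fnmatch import fnmatchcase

TASK_ROUTES = {
    "emails.tasks.*": {"queue": "emails"},
    "reports.tasks.*": {"queue": "reports"},
}
DEFAULT_QUEUE = "celery"  # assumption: default queue name not overridden


def queue_for(task_name):
    """Return the queue of the first matching glob pattern, else the default."""
    for pattern, route in TASK_ROUTES.items():
        if fnmatchcase(task_name, pattern):
            return route["queue"]
    return DEFAULT_QUEUE


for name in (
    "emails.tasks.send_welcome",
    "reports.tasks.daily",
    "billing.tasks.charge_customer",
):
    print(name, "->", queue_for(name))
```

The third task matches no pattern. If only the two workers above are running, nothing consumes it.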

Hot-path: don’t queue things that aren’t actually background

# BAD
@shared_task
def add_user(email):
    User.objects.create(email=email)

# Then in view:
add_user.delay(email)  # 100ms+ for queue overhead vs 5ms direct

Celery has overhead. For sub-100ms work, just do it inline. Use Celery for: emails (latency tolerant), reports (long-running), webhooks (retryable), etc.

Common mistakes

1. Pickling Django objects

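task.delay(user) fails outright with Celery’s default JSON serializer. With pickle it serializes the whole model, and by the time the worker runs, that copy may be stale. Always pass primary keys and re-fetch inside the task.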
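The JSON failure is easy to reproduce without Celery. In this sketch the `User` dataclass is a stand-in for a Django model instance, and the dict mimics a task message only loosely.

```python
import json
from dataclasses import dataclass


@dataclass
class User:
    """Stand-in for a Django model instance."""

    pk: int
    email: str


user = User(pk=42, email="a@example.test")

# Passing the object: JSON has no idea how to encode it.
try:
    json.dumps({"args": [user]})
    object_ok = True
except TypeError:
    object_ok = False

# Passing the primary key: a small message the worker can re-fetch from.
id_message = json.dumps({"args": [user.pk]})
print(object_ok, id_message)
```

The worker then loads the row by ID, so it always sees current state rather than a snapshot taken at enqueue time.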

2. Database transactions in tasks

Celery doesn’t wrap tasks in a database transaction. When a task makes several writes that must succeed or fail together, wrap them explicitly with transaction.atomic.

3. acks_late=False

By default Celery acks early: the task is acknowledged before it runs, so a worker crash mid-task loses it. Set acks_late=True and reject_on_worker_lost=True.

4. No timeout

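A hung task holds its worker slot indefinitely. Set soft_time_limit (raises SoftTimeLimitExceeded inside the task so it can clean up) and time_limit (kills the process running the task):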

@shared_task(soft_time_limit=60, time_limit=120)
def my_task():
    ...

5. One queue for everything

Fast tasks blocked behind slow ones. Split queues; size workers per queue.

Alternatives

Tool           Strengths
Celery         Mature; ecosystem; complex
RQ             Simple; Redis-only
Django-Q2      DB-backed; minimal infra
Temporal       Workflow engine; durable
Procrastinate  Postgres-backed; uses LISTEN/NOTIFY

For 90% of Django shops: Celery. For tiny apps: Django-Q2. For complex workflows: Temporal.

Read this next

If you want my Django + Celery production starter, it’s at rajpoot.dev.

