Every backend eventually needs to do work outside the request/response cycle. Send an email. Resize an image. Reconcile with a third party. Run a nightly report. Doing this synchronously inside the request makes pages slow and fragile. The solution is a background worker — and in Python, Celery has been the default for over a decade.

This post is the practical Celery guide: how to wire it up, the patterns that actually work in production, and the foot-guns that ruin teams’ weekends.

Why background jobs at all?

Three big wins from moving slow work off the request thread:

  1. Faster responses. A 10s email send becomes a 10ms enqueue.
  2. Resilience. Retry on failure without the user pressing F5.
  3. Decoupling. The web tier doesn’t have to know how the worker tier works, only that it exists.

Trade-off: you now have a distributed system. Eventual consistency, dead-letter handling, idempotency, monitoring — these are now your problems, not the framework’s.

The mental model

Celery has three actors:

  • Producer — your web app. Calls task.delay(args) to enqueue work.
  • Broker — the message queue. Redis or RabbitMQ in 2026 (don’t use SQS or “celery + DB” for serious workloads).
  • Worker — long-running process(es) that pop jobs off the queue and execute them.

Optionally, a result backend stores task results so producers can wait for them. Often you don’t need this.

[ web app ] --enqueue--> [ Redis ] --consume--> [ worker ]
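
To make the three roles concrete, here is a toy in-process sketch using only the standard library. It is not how Celery is implemented: queue.Queue stands in for the broker, a thread stands in for the worker, and the names TASKS, delay, and worker_loop are invented for illustration.

```python
import json
import queue
import threading

broker = queue.Queue()                 # stand-in for Redis/RabbitMQ
results = {}                           # stand-in for a result backend
TASKS = {"add": lambda x, y: x + y}    # hypothetical task registry


def delay(task_id, task_name, *args):
    """Producer side: serialize the call, enqueue it, return immediately."""
    broker.put(json.dumps({"id": task_id, "task": task_name, "args": args}))


def worker_loop():
    """Worker side: pop a message, look the task up by name, run it."""
    while True:
        message = json.loads(broker.get())
        func = TASKS[message["task"]]
        results[message["id"]] = func(*message["args"])
        broker.task_done()             # roughly the role an ack plays


threading.Thread(target=worker_loop, daemon=True).start()

delay("job-1", "add", 2, 3)
broker.join()                          # wait until the worker has drained the queue
print(results["job-1"])                # prints 5
```

Note that the producer only ever sends a task name and JSON arguments. That is why the web tier and the worker tier can be deployed separately.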

Install

pip install "celery[redis]>=5.4"
# Or with RabbitMQ: the default AMQP client (py-amqp) is already a Celery dependency
pip install "celery>=5.4"

For local Redis on macOS:

brew install redis
brew services start redis

Your first task

# app/tasks.py
from celery import Celery


celery_app = Celery(
    "myapp",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/1",   # optional; only if you need results
)


@celery_app.task
def add(x: int, y: int) -> int:
    return x + y

Run a worker in a separate terminal:

celery -A app.tasks worker --loglevel=info

Enqueue a task from a Python REPL or your web code:

from app.tasks import add

result = add.delay(2, 3)        # returns immediately
print(result.get(timeout=5))    # blocks until done; only if you have a result backend

That’s the whole core. Everything else is configuration.

Production-shaped configuration

Don’t pass options into the Celery() constructor — use a config object:

# app/celery_config.py
from kombu import Queue


broker_url = "redis://redis:6379/0"
result_backend = "redis://redis:6379/1"

# Serialization
task_serializer = "json"
accept_content = ["json"]
result_serializer = "json"
timezone = "UTC"
enable_utc = True

# Reliability
task_acks_late = True             # ack after task completes (not on receipt)
task_reject_on_worker_lost = True # requeue if worker is killed mid-task
worker_prefetch_multiplier = 1    # one task per worker at a time (fairness)

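# Visibility timeout (Redis transport): seconds an unacknowledged task can stay in flight before it is redelivered.
# Keep it longer than your longest task and your longest countdown/ETA, or tasks will run twice.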
broker_transport_options = {"visibility_timeout": 3600}

# Routing
task_default_queue = "default"
task_queues = (
    Queue("default", routing_key="default"),
    Queue("emails",  routing_key="emails"),
    Queue("reports", routing_key="reports"),
)
task_routes = {
    "app.tasks.send_email":      {"queue": "emails"},
    "app.tasks.generate_report": {"queue": "reports"},
}

# app/tasks.py
from celery import Celery
from app import celery_config

celery_app = Celery("myapp")
celery_app.config_from_object(celery_config)

A few of these settings matter a lot:

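  • task_acks_late = True — without it, the message is acknowledged on receipt, so a task that dies with its worker is lost. With it, the broker redelivers when the worker disappears. task_reject_on_worker_lost = True extends that to a child process killed mid-task (an OOM kill, for example).
  • worker_prefetch_multiplier = 1 — by default each worker process prefetches 4 messages for throughput. For long-running tasks (>1s), set it to 1 so a busy worker does not sit on tasks that an idle worker could run.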
  • task_serializer = "json" — never use pickle for serialization. It’s a security hole.
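
To see why the prefetch setting matters for long tasks, here is a toy simulation. It is a sketch under simplifying assumptions, not Celery's actual dispatch logic: each worker runs one task at a time, reserves up to `prefetch` tasks including the one it is running, and tops up its reservation whenever it finishes a task. The function name makespan is made up for this example.

```python
import heapq
from collections import deque


def makespan(durations, workers, prefetch):
    """Return when the last task finishes in a toy model of prefetching workers."""
    pending = deque(durations)                      # the shared broker queue
    reserved = [deque() for _ in range(workers)]    # per-worker reserved tasks

    def refill(w):
        # A worker tops up its reservation (running task included) to `prefetch`.
        while len(reserved[w]) < prefetch and pending:
            reserved[w].append(pending.popleft())

    for w in range(workers):
        refill(w)

    # Heap of (finish_time, worker) for the task each worker is running.
    running = [(reserved[w][0], w) for w in range(workers) if reserved[w]]
    heapq.heapify(running)

    now = 0
    while running:
        now, w = heapq.heappop(running)
        reserved[w].popleft()                       # task done, slot freed
        refill(w)
        if reserved[w]:
            heapq.heappush(running, (now + reserved[w][0], w))
    return now


tasks = [60, 60, 60, 60, 1, 1, 1, 1]   # seconds; four long tasks queued first
print(makespan(tasks, workers=2, prefetch=4))   # 240: one worker hoards all the long tasks
print(makespan(tasks, workers=2, prefetch=1))   # 122: long tasks are spread across both workers
```

With a prefetch of 4, the second worker finishes its short tasks in four seconds and then idles while the first worker grinds through four minutes of reserved work.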

Idempotent task design (the most important thing)

A task can run more than once. Network glitches, worker crashes, message redelivery — you have to assume “at-least-once” delivery. So design tasks to be safe to run twice.

Bad: side effects without checks

@celery_app.task
def charge_user(user_id: int, amount: int):
    payment = stripe.charge(user_id, amount)  # double-charges if retried!
    save(payment)

Good: idempotency key + check

@celery_app.task
def charge_user(user_id: int, amount: int, idempotency_key: str):
    if Payment.objects.filter(idempotency_key=idempotency_key).exists():
        return  # already charged

    payment = stripe.charge(user_id, amount, idempotency_key=idempotency_key)
    Payment.objects.create(idempotency_key=idempotency_key, ...)

The idempotency key is your contract: the same key always represents the same logical operation, regardless of how many times the task fires. Stripe and most payment providers accept idempotency keys directly — use them.
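
One caveat with a check-then-create pattern: two copies of the task running at the same moment can both pass the check. A common remedy is to let a unique constraint in the database arbitrate. The sketch below uses an in-memory SQLite table so it runs on its own. fake_provider_charge is a hypothetical stand-in for a payment API that deduplicates on the idempotency key, and the table layout is an assumption, not a schema from this post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments ("
    " idempotency_key TEXT PRIMARY KEY,"   # uniqueness is enforced by the database
    " user_id INTEGER, amount INTEGER, status TEXT)"
)

provider_charges = {}   # what the (fake) payment provider has recorded


def fake_provider_charge(user_id, amount, idempotency_key):
    """Hypothetical provider call: the same key never creates a second charge."""
    provider_charges.setdefault(idempotency_key, {"user_id": user_id, "amount": amount})
    return provider_charges[idempotency_key]


def charge_user(user_id, amount, idempotency_key):
    # Claim the key. A second run hits the primary key and inserts nothing.
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO payments VALUES (?, ?, ?, 'pending')",
            (idempotency_key, user_id, amount),
        )
    status = conn.execute(
        "SELECT status FROM payments WHERE idempotency_key = ?", (idempotency_key,)
    ).fetchone()[0]
    if status == "charged":
        return "skipped"            # an earlier run already finished the job

    # Still pending: calling the provider again is safe because it dedupes on the key.
    fake_provider_charge(user_id, amount, idempotency_key)
    with conn:
        conn.execute(
            "UPDATE payments SET status = 'charged' WHERE idempotency_key = ?",
            (idempotency_key,),
        )
    return "charged"


print(charge_user(1, 500, "order-42"))   # charged
print(charge_user(1, 500, "order-42"))   # skipped
```

A run that crashes between the claim and the provider call leaves a 'pending' row, and the retry simply completes it. That only works because the provider call itself is idempotent.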

Retries with exponential backoff

Failures happen. Build them in:

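import requests


@celery_app.task(
    bind=True,
    # requests raises its own exception classes, not the builtin ConnectionError/TimeoutError
    autoretry_for=(requests.ConnectionError, requests.Timeout),
    retry_backoff=True,        # exponential: 1s, 2s, 4s, 8s, 16s, ...
    retry_backoff_max=600,     # cap at 10 minutes
    retry_jitter=True,         # pick a random delay up to the backoff value so retries don't stampede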
    max_retries=10,
)
def fetch_remote(self, url: str):
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()

bind=True lets the task access self.request (retry count, task ID, etc.). retry_jitter is critical at scale — without it, every task in a batch retries at the same moment and you DDoS the upstream.
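
To get a feel for the schedule these options produce, here is a standalone sketch of capped exponential backoff with full jitter. It mirrors my understanding of how Celery computes the delay (double per retry, cap, then pick a random value up to that cap), but treat it as an illustration, not Celery's code. The function name backoff_delay is made up.

```python
import random


def backoff_delay(retries, factor=1, maximum=600, jitter=True, rng=random):
    """Seconds to wait before retry number `retries` (0-based)."""
    delay = min(maximum, factor * 2 ** retries)   # 1, 2, 4, ... capped at `maximum`
    if jitter:
        # Full jitter: anywhere from 0 up to the computed delay, inclusive.
        delay = rng.randrange(delay + 1)
    return delay


# Without jitter the schedule is deterministic and hits the cap on the 11th attempt.
print([backoff_delay(n, jitter=False) for n in range(11)])
# [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 600]

# With jitter each task lands somewhere below that ceiling, which spreads a batch out.
print([backoff_delay(5) for _ in range(5)])
```

With jitter on, the values in the code comment above are upper bounds, not exact delays.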

For finer control, raise self.retry():

@celery_app.task(bind=True, max_retries=5)
def call_third_party(self, payload):
    try:
        return external_api(payload)
    except RateLimited as e:
        # Passing exc means the original error is re-raised once retries run out
        raise self.retry(exc=e, countdown=e.retry_after_seconds)

Scheduled / periodic tasks: Celery Beat

For cron-like jobs (nightly reports, hourly cleanups), use Celery Beat:

# app/celery_config.py (additions)
from celery.schedules import crontab

beat_schedule = {
    "nightly-report": {
        "task": "app.tasks.generate_report",
        "schedule": crontab(hour=2, minute=0),       # 02:00 UTC every day
        "args": (),
    },
    "every-15-minutes-cleanup": {
        "task": "app.tasks.cleanup_temp",
        "schedule": 900.0,                           # every 15 min
    },
}

Run Beat as a separate process:

celery -A app.tasks beat --loglevel=info
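
If you want to sanity-check what a daily schedule means before deploying it, a few lines of standard-library code are enough. This is only a sketch of the idea "next 02:00 UTC strictly after now", not Beat's scheduling logic, and next_daily_run is a made-up helper.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now, hour=2, minute=0):
    """Next occurrence of hour:minute strictly after `now` (same timezone as `now`)."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)   # today's slot has passed, use tomorrow's
    return candidate


now = datetime(2026, 1, 15, 1, 30, tzinfo=timezone.utc)
print(next_daily_run(now))               # 2026-01-15 02:00:00+00:00

# The 900-second interval schedule fires this many times per day:
print(int(timedelta(days=1).total_seconds() // 900))   # 96
```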

Calling tasks from your web app

# Django view, FastAPI route, Flask handler — same idea everywhere
from app.tasks import send_welcome_email

def signup(request):
    user = User.objects.create(...)
    send_welcome_email.delay(user.id)   # fire and forget
    return JsonResponse({"id": user.id})

task.delay(*args, **kwargs) is shorthand for task.apply_async(args=args, kwargs=kwargs). Use apply_async when you need extra options (custom queue, ETA, expires):

send_welcome_email.apply_async(
    args=[user.id],
    queue="emails",
    countdown=30,                       # delay 30s
    expires=300,                        # discard if not run within 5 min
)

Important: pass primitive types (user.id, not user). Everything you pass is serialized to JSON, so pass IDs and fetch the row from the DB inside the task. This also makes the task more reliable: a re-run reads current data instead of a stale copy of the user frozen into the message.
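
The JSON constraint is easy to demonstrate without Celery at all. In this sketch the User dataclass is a hypothetical stand-in for an ORM model, and the dict is only a rough picture of a task message.

```python
import json
from dataclasses import dataclass


@dataclass
class User:          # hypothetical stand-in for an ORM model
    id: int
    email: str


user = User(id=42, email="ada@example.com")

# Passing the object: the JSON serializer has no idea what a User is.
try:
    json.dumps({"args": [user]})
    object_serialized = True
except TypeError:
    object_serialized = False

# Passing the ID: small, serializable, and the task re-reads fresh data.
payload = json.dumps({"args": [user.id]})

print(object_serialized)   # False
print(payload)             # {"args": [42]}
```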

Monitoring: don’t skip this

A queue that silently fails is the worst kind of failure. Monitor at minimum:

  • Worker liveness — alert if no worker has consumed a task in N minutes.
  • Queue depth — alert if the queue grows unboundedly.
  • Failed tasks — pipe task_failure signals into Sentry or your error tracker.
  • Task latency — time from enqueue to start; surfaces backpressure.
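
As a rough illustration of the last bullet, latency can be measured by stamping each message at enqueue time and comparing against the clock when the task starts. The helpers below (stamp, queue_latency, alerts) and the thresholds are invented for this sketch. In a real deployment you would feed the same numbers from your broker and workers into your monitoring tool.

```python
import time


def stamp(message, clock=time.time):
    """Producer side: record when the message was enqueued."""
    return {**message, "enqueued_at": clock()}


def queue_latency(message, clock=time.time):
    """Worker side: seconds the message waited before the task started."""
    return clock() - message["enqueued_at"]


def alerts(queue_depth, latency_s, idle_s,
           max_depth=1000, max_latency_s=30.0, max_idle_s=300.0):
    """Return the names of the checks that are currently failing."""
    problems = []
    if queue_depth > max_depth:
        problems.append("queue depth")
    if latency_s > max_latency_s:
        problems.append("task latency")
    if idle_s > max_idle_s:          # nothing consumed for too long
        problems.append("worker liveness")
    return problems


msg = stamp({"task": "send_email", "args": [42]}, clock=lambda: 100.0)
print(queue_latency(msg, clock=lambda: 100.25))            # 0.25
print(alerts(queue_depth=5000, latency_s=45.0, idle_s=10.0))
# ['queue depth', 'task latency']
```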

The Celery equivalent of ps aux is Flower:

pip install flower
celery -A app.tasks flower --port=5555

Web UI at localhost:5555 shows tasks in flight, history, worker status. Useful for development; for production prefer purpose-built tools (Datadog, New Relic, Prometheus exporters).

Concurrency models

Celery workers can run with different concurrency models:

  • prefork (default) — multi-process. Best for CPU-bound or sync-blocking tasks.
  • gevent / eventlet — greenlet-based. Better for I/O-heavy tasks (lots of HTTP calls per task).
  • solo — single-thread, single-task. Useful for debugging.

celery -A app.tasks worker --pool=gevent --concurrency=200

For tasks that mostly wait on I/O (calling third-party APIs), gevent lets one worker handle hundreds of concurrent tasks. For CPU-bound work, prefork with concurrency = CPU cores is right.
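
The effect is easy to reproduce with the standard library. In this sketch threads stand in for greenlets and time.sleep stands in for a slow HTTP call, so it shows the principle and not gevent itself. fake_api_call and run_batch are made-up names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_api_call(_):
    time.sleep(0.05)     # pretend we are waiting on a third-party API
    return "ok"


def run_batch(n_tasks, concurrency):
    """Run n_tasks I/O-bound calls with the given concurrency; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(fake_api_call, range(n_tasks)))
    return time.perf_counter() - start, results


serial_s, _ = run_batch(20, concurrency=1)
concurrent_s, _ = run_batch(20, concurrency=20)
print(f"one at a time: {serial_s:.2f}s, 20 at once: {concurrent_s:.2f}s")
```

The same trick does nothing for CPU-bound work, where the tasks compete for cores instead of waiting on the network.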

Alternatives in 2026

Celery isn’t the only game in town:

  • Dramatiq — newer, simpler, fewer foot-guns. Worth a look for greenfield.
  • RQ (Redis Queue) — minimal, Redis-only. Great for simple use cases; lacks Celery’s features.
  • arq — async-native, Redis-based. If your app is FastAPI + asyncio, this fits naturally.
  • Hatchet, Inngest, Temporal — workflow orchestration as a service. More than just task queues.

Celery is still the safest, most mature choice for most teams. The ecosystem and docs are unmatched. But if you’re starting fresh and your code is async-heavy, arq or Dramatiq deserve a look.

Common pitfalls

  • Passing big objects as args. Don’t. Pass IDs, fetch the object inside the task.
  • Calling tasks synchronously by accident. task(args) runs in-process. task.delay(args) enqueues. Big difference.
  • Forgetting task_acks_late. By default the message is acknowledged on receipt, so if the worker dies mid-task the work is lost.
  • Not setting worker_prefetch_multiplier. Each worker process prefetches 4 messages by default, which starves idle workers and breaks fairness for long tasks.
  • Result backend bloat. If you set a result backend, configure result_expires (default 1 day) so old results don’t fill Redis.
  • Running multiple Beat processes. One Beat. Always.
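  • Using send_task with the wrong name. A typo still enqueues without complaint. The worker then logs an unregistered-task error and discards the message, and the producer never hears about it.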

Conclusion

Background tasks are foundational for any non-trivial backend. Celery is the default for good reason — it’s mature, flexible, and well-documented. The complexity isn’t in the framework; it’s in distributed systems: idempotency, retries, monitoring, fairness. Get those right and your queue will run for years without drama.

If your stack is FastAPI-based, also see Testing FastAPI Apps — testing tasks involves the same patterns.

Happy queueing!

