FastAPI Textbook Ch. 12 — Deployment, Scaling, and Production

Chapter 12, the final chapter of the FastAPI textbook: shipping it. ASGI server selection, Docker, Kubernetes, autoscaling, graceful shutdown, secrets, the production checklist.

ASGI server choice

Server	When to use
Uvicorn	Default; uvloop + httptools; fast
Hypercorn	HTTP/2, HTTP/3 (QUIC), trio support
Granian	Rust-based; very fast; newer
Daphne	Original ASGI; mostly historical

For most production deployments: Uvicorn. Hypercorn if the app server itself must speak HTTP/2 or HTTP/3. Granian for max raw perf.

Run command

uvicorn src.myapp.main:app --host 0.0.0.0 --port 8000 --workers 4 --proxy-headers --forwarded-allow-ips='*'

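--workers 4: run four worker processes. Alternatively, run Gunicorn as the process manager: gunicorn -k uvicorn.workers.UvicornWorker -w 4.
--proxy-headers: trust X-Forwarded-* headers set by the load balancer.
--forwarded-allow-ips: which proxy IPs to trust. '*' trusts every peer, so use it only when the app is reachable solely through the proxy.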

For K8s: usually one worker per container; let K8s do horizontal scaling.

Dockerfile

# syntax=docker/dockerfile:1.7
FROM python:3.13-slim AS builder
WORKDIR /app
RUN pip install uv
COPY pyproject.toml uv.lock ./
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev

FROM python:3.13-slim
WORKDIR /app
COPY --from=builder /app/.venv ./.venv
COPY src ./src
ENV PATH=/app/.venv/bin:$PATH
USER 1000:1000
EXPOSE 8000
CMD ["uvicorn", "src.myapp.main:app", "--host", "0.0.0.0", "--port", "8000"]

Multi-stage; non-root; minimal runtime image. See Docker Best Practices.

Kubernetes deployment

apiVersion: apps/v1
kind: Deployment
metadata: { name: myapi }
spec:
  replicas: 3
  selector: { matchLabels: { app: myapi } }
  template:
    metadata: { labels: { app: myapi } }
    spec:
      containers:
        - name: api
          image: ghcr.io/me/myapi:1.2.3
          ports: [{ containerPort: 8000 }]
          resources:
            requests: { cpu: "200m", memory: "256Mi" }
            limits: { memory: "512Mi" }
          readinessProbe:
            httpGet: { path: /ready, port: 8000 }
            periodSeconds: 5
          livenessProbe:
            httpGet: { path: /healthz, port: 8000 }
            periodSeconds: 10
          startupProbe:
            httpGet: { path: /healthz, port: 8000 }
            periodSeconds: 5
            failureThreshold: 30
          envFrom:
            - secretRef: { name: myapi-secrets }
            - configMapRef: { name: myapi-config }
---
apiVersion: v1
kind: Service
metadata: { name: myapi }
spec:
  selector: { app: myapi }
  ports: [{ port: 80, targetPort: 8000 }]

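startupProbe holds off the other two probes while a slow-starting app initializes (here up to 30 × 5s = 150s); readinessProbe decides whether the pod receives traffic (“ready to serve”); livenessProbe restarts a container that has stopped responding (“alive”).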
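A minimal sketch of the two probe paths from the manifest, written as a bare ASGI callable so it runs without any dependencies. In a FastAPI app these would be two ordinary route handlers. The state flag and the probe() helper are hypothetical names for illustration; a real readiness check would usually also ping the database.

```python
import asyncio
import json

# Hypothetical flag, flipped once startup work (DB pool, caches) is complete.
state = {"ready": False}


async def app(scope, receive, send):
    """Bare ASGI app answering /healthz and /ready."""
    if scope["type"] != "http":
        return
    if scope["path"] == "/healthz":
        # Liveness: the process is up and the event loop is responsive.
        status, body = 200, {"status": "alive"}
    elif scope["path"] == "/ready":
        # Readiness: a 503 keeps the pod out of the Service endpoints.
        ok = state["ready"]
        status, body = (200 if ok else 503), {"ready": ok}
    else:
        status, body = 404, {"detail": "not found"}
    await send({
        "type": "http.response.start",
        "status": status,
        "headers": [(b"content-type", b"application/json")],
    })
    await send({"type": "http.response.body", "body": json.dumps(body).encode()})


def probe(path):
    """Call the app in-process and return the HTTP status it would send."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    asyncio.run(app({"type": "http", "method": "GET", "path": path}, receive, send))
    return sent[0]["status"]
```

Keeping /healthz free of dependency checks is a deliberate choice here: if liveness depended on the database, a database outage would restart every pod instead of just taking them out of rotation.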

See K8s Resource Limits.

Graceful shutdown

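from contextlib import asynccontextmanager

from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # startup: create_pool() is the app's own pool factory
    app.state.db = await create_pool()
    yield
    # shutdown: runs after in-flight requests have drained
    await app.state.db.close()

app = FastAPI(lifespan=lifespan)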

Uvicorn handles SIGTERM:

Stops accepting new connections.
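Waits up to --timeout-graceful-shutdown seconds for in-flight requests to finish. The option is unset by default, so Uvicorn waits indefinitely; set it explicitly (for example 30).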
Cancels stragglers.
Runs lifespan shutdown.
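The SIGTERM sequence above, approximated in plain asyncio. This is a sketch of the idea, not Uvicorn's code: drain() and simulate_shutdown() are hypothetical names, and the sleeps stand in for request handlers.

```python
import asyncio


async def drain(tasks, timeout):
    """Wait up to `timeout` seconds for in-flight tasks, then cancel the rest.

    Returns a (finished, cancelled) pair of counts.
    """
    if not tasks:
        return 0, 0
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        task.cancel()  # stragglers receive CancelledError at their next await
    # Let the cancellations propagate before moving on to cleanup.
    await asyncio.gather(*pending, return_exceptions=True)
    return len(done), len(pending)


async def simulate_shutdown():
    async def request(seconds):
        await asyncio.sleep(seconds)

    in_flight = [
        asyncio.create_task(request(0.01)),  # finishes inside the grace window
        asyncio.create_task(request(5.0)),   # straggler
    ]
    finished, cancelled = await drain(in_flight, timeout=0.2)
    # Only at this point would the lifespan shutdown code (closing the pool) run.
    return finished, cancelled


result = asyncio.run(simulate_shutdown())
```

The ordering matters: resources such as the DB pool are closed only after requests have finished or been cancelled, so no handler sees a closed pool.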

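For K8s: set terminationGracePeriodSeconds (default 30) high enough to cover any preStop delay plus the graceful-shutdown timeout, for example 60. When the grace period expires, the kubelet sends SIGKILL.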

Pre-stop hook

lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 10"]

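Wait 10s before the container receives SIGTERM so that kube-proxy (iptables) and the ingress stop sending traffic before the app shuts down. Avoids “connection refused” during rollout. The sleep counts against terminationGracePeriodSeconds.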

Autoscaling (HPA)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: myapi }
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: myapi }
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource: { name: cpu, target: { type: Utilization, averageUtilization: 70 } }

Or scale on RPS via custom metrics (Prometheus + KEDA).

For LLM-heavy or other I/O-bound workloads: scale on in-flight requests. CPU utilization is misleading when pods spend most of their time waiting on I/O.
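To see how a target turns into a replica count, here is the scaling rule from the Kubernetes HPA documentation, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), as a small function. The function name is hypothetical; the min/max defaults mirror the manifest above, and the 10% tolerance is the documented cluster default. The real controller also applies stabilization windows and readiness handling that this sketch ignores.

```python
import math


def desired_replicas(current, metric, target,
                     min_replicas=3, max_replicas=30, tolerance=0.1):
    """Replica count the HPA would aim for, given the average metric per pod."""
    ratio = metric / target
    # Within the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current
    wanted = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, wanted))


# 3 pods at 90% average CPU against a 70% target: scale up.
scale_up = desired_replicas(3, 90, 70)
# 20 pods at 140% of requested CPU: capped by maxReplicas.
capped = desired_replicas(20, 140, 70)
```

The same formula applies to an in-flight-requests metric: with a target of 50 in-flight requests per pod and a current average of 80, ten pods become sixteen.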

Secrets

apiVersion: v1
kind: Secret
metadata: { name: myapi-secrets }
stringData:
  MYAPI_DATABASE_URL: postgresql://...
  MYAPI_SECRET_KEY: ...

Or via External Secrets Operator from Vault / AWS Secrets Manager. See Secrets Management.

Never bake into image. Never commit to git.

TLS / HTTPS

Terminate TLS at the LB or ingress (ALB / nginx / Cloudflare). FastAPI / Uvicorn then don’t need TLS; plain HTTP inside the cluster is fine.

If you must terminate at the app:

uvicorn ... --ssl-keyfile=key.pem --ssl-certfile=cert.pem

Usually managed by cert-manager + Let’s Encrypt at ingress.

Behind a reverse proxy

uvicorn ... --proxy-headers --forwarded-allow-ips='*'

Tells Uvicorn to read X-Forwarded-For and X-Forwarded-Proto from trusted proxies and rewrite the request scope, so request.client.host is the actual client IP, not the LB, and request.url.scheme reflects the original https.
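Why the trusted-proxy list matters can be shown with a small resolver. This is an illustrative sketch, not Uvicorn's implementation: resolve_client_ip and its parameters are hypothetical, and it assumes each proxy appends the address it saw to X-Forwarded-For.

```python
import ipaddress


def resolve_client_ip(peer_ip, x_forwarded_for, trusted_proxies):
    """Pick the client IP, honoring X-Forwarded-For only from trusted proxies.

    peer_ip: address of the TCP peer (the load balancer, when behind one).
    x_forwarded_for: raw header value, or None if absent.
    trusted_proxies: CIDR strings for proxies allowed to set the header.
    """
    networks = [ipaddress.ip_network(cidr) for cidr in trusted_proxies]

    def is_trusted(ip):
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            return False
        return any(addr in net for net in networks)

    # Direct connection from an untrusted peer: ignore what the header claims.
    if not x_forwarded_for or not is_trusted(peer_ip):
        return peer_ip

    # Walk right to left and stop at the first hop that is not our own proxy;
    # anything further left was supplied by the client and may be forged.
    hops = [hop.strip() for hop in x_forwarded_for.split(",") if hop.strip()]
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    return hops[0] if hops else peer_ip
```

With trusted_proxies set to everything (the equivalent of '*'), any caller that can reach the app directly can claim an arbitrary client IP, which is why the wildcard is only safe behind a proxy that is the sole entry point.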

Connection limits

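By default Uvicorn does not cap concurrent connections (--limit-concurrency is unset) and uses a listen backlog of 2048. To bound the load per worker:

uvicorn ... --limit-concurrency 500 --backlog 1024

limit-concurrency: max concurrent connections or tasks per worker (Uvicorn returns 503 beyond it). backlog: max connections waiting in the TCP accept queue.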
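The 503-beyond-the-limit behavior can be sketched as an ASGI wrapper. Uvicorn enforces its limit at the protocol level, so this is only an illustration of the idea; ConcurrencyLimit and demo() are hypothetical names.

```python
import asyncio


class ConcurrencyLimit:
    """Wrap an ASGI app and answer 503 once `limit` requests are in flight."""

    def __init__(self, app, limit):
        self.app = app
        self.limit = limit
        self.in_flight = 0

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        if self.in_flight >= self.limit:
            # Shed load immediately instead of queueing behind slow requests.
            await send({"type": "http.response.start", "status": 503,
                        "headers": [(b"content-type", b"text/plain")]})
            await send({"type": "http.response.body", "body": b"Service Unavailable"})
            return
        self.in_flight += 1
        try:
            await self.app(scope, receive, send)
        finally:
            self.in_flight -= 1


async def demo():
    async def slow_app(scope, receive, send):
        await asyncio.sleep(0.05)
        await send({"type": "http.response.start", "status": 200, "headers": []})
        await send({"type": "http.response.body", "body": b"ok"})

    limited = ConcurrencyLimit(slow_app, limit=2)

    async def call():
        sent = []

        async def receive():
            return {"type": "http.request", "body": b"", "more_body": False}

        async def send(message):
            sent.append(message)

        await limited({"type": "http", "path": "/"}, receive, send)
        return sent[0]["status"]

    # Three simultaneous requests against a limit of two.
    return await asyncio.gather(call(), call(), call())


statuses = asyncio.run(demo())
```

Rejecting early keeps latency bounded for the requests that are accepted, and a 503 gives the load balancer a clear signal to retry on another pod.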

Database pool sizing

engine = create_async_engine(DATABASE_URL, pool_size=20, max_overflow=10, pool_pre_ping=True)

total = workers × replicas × (pool_size + max_overflow)
       ≤ db_max_connections

PgBouncer in transaction-pooling mode lets many client connections share a small number of server connections, which multiplies your effective capacity.
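A worked version of the formula, using the pool settings above and the HPA bounds from earlier in the chapter. The function names and the reserved headroom of 10 connections are assumptions for illustration; 100 is PostgreSQL's default max_connections.

```python
POSTGRES_DEFAULT_MAX_CONNECTIONS = 100


def max_db_connections(workers, replicas, pool_size, max_overflow):
    """Worst-case connections the whole deployment can open."""
    return workers * replicas * (pool_size + max_overflow)


def per_process_budget(db_max_connections, workers, replicas, reserved=10):
    """Largest pool_size + max_overflow each process may use.

    `reserved` is headroom for migrations, admin sessions, and monitoring.
    """
    return (db_max_connections - reserved) // (workers * replicas)


# One worker per pod, pool_size=20, max_overflow=10.
at_min_replicas = max_db_connections(1, 3, 20, 10)
at_max_replicas = max_db_connections(1, 30, 20, 10)
budget_at_max = per_process_budget(POSTGRES_DEFAULT_MAX_CONNECTIONS, 1, 30)
```

Size the pool against maxReplicas, not the current replica count: at 3 replicas the example fits under 100 connections, at 30 it would need 900, which leaves about 3 connections per process without a pooler in front of the database.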

CDN / caching

For static assets: CDN (Cloudflare / Vercel / S3 + CloudFront).

For dynamic content with cache headers:

from fastapi import Response

@app.get("/posts")
async def list_posts(response: Response):
    response.headers["Cache-Control"] = "public, max-age=60, s-maxage=120"
    return await db.list_posts()  # db: the app's own data-access object

Browsers honor max-age (60s); CDNs and other shared caches honor s-maxage (120s) instead.

Logging in production

JSON to stdout; cluster log collector (Fluent Bit, Promtail) ships to backend (Loki, ELK, Datadog).

# In structlog setup
processors = [..., structlog.processors.JSONRenderer()]

Don’t log to disk; the container filesystem is ephemeral and is lost when the pod is replaced.
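If structlog is not in use, the same one-JSON-object-per-line output can be approximated with the standard library. A minimal sketch: JsonFormatter and configure_logging are hypothetical names, and the field set is an assumption to adapt to what the log backend expects.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object on one line."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logging(level="INFO"):
    """Send all logs to stdout as JSON, replacing any existing handlers."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(level)
```

Call configure_logging() once at startup, before the first log line; the collector then parses each stdout line as one event.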

Environment-based config

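from typing import Literal

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    env: Literal["dev", "staging", "prod"] = "dev"
    database_url: str
    secret_key: str
    log_level: str = "INFO"

    # Fields are read from MYAPI_ENV, MYAPI_DATABASE_URL, MYAPI_SECRET_KEY, ...
    model_config = SettingsConfigDict(env_prefix="MYAPI_")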

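Set per environment via K8s ConfigMap / Secret. Because of env_prefix, the keys must carry the prefix: MYAPI_DATABASE_URL, MYAPI_SECRET_KEY, and so on.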
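The prefix mapping is easy to get wrong, so here is what it amounts to, sketched with the standard library only. This is not how pydantic-settings is implemented; load_settings and the dataclass are hypothetical stand-ins that mirror the fields above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    secret_key: str
    env: str = "dev"
    log_level: str = "INFO"


def load_settings(environ=os.environ, prefix="MYAPI_"):
    """Build Settings from prefixed variables; missing required ones raise KeyError."""

    def get(name, default=None):
        key = prefix + name.upper()
        if default is None:
            return environ[key]  # fail fast at startup, not on first use
        return environ.get(key, default)

    env = get("env", "dev")
    if env not in ("dev", "staging", "prod"):
        raise ValueError(f"{prefix}ENV must be dev, staging, or prod, got {env!r}")
    return Settings(
        database_url=get("database_url"),
        secret_key=get("secret_key"),
        env=env,
        log_level=get("log_level", "INFO"),
    )
```

An unprefixed DATABASE_URL is ignored, so a Secret with the wrong key name shows up as a startup failure rather than a silent fallback.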

Migrations on deploy

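# K8s Job (Helm template); the hook annotation makes Helm run it before the Deployment is updated
apiVersion: batch/v1
kind: Job
metadata:
  name: migrate-{{ .Values.image.tag }}
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade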
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: ghcr.io/me/myapi:{{ .Values.image.tag }}
          command: ["alembic", "upgrade", "head"]
      restartPolicy: OnFailure

Or in-process at startup (simpler, but every replica runs it and startup is slower):

@asynccontextmanager
async def lifespan(app):
    await run_migrations()
    yield

For multi-replica deploys: only one process should run migrations. The K8s Job handles this; the in-process variant needs a lock (for example a Postgres advisory lock) to be safe.

See Database Migrations and Alembic textbook.

Blue/green vs rolling

For most: rolling. For risky / stateful: blue/green or canary. See Blue/Green vs Canary.

Production checklist

Before going live:

Multi-stage Dockerfile, non-root user, distroless / slim.
Pinned base image digest.
Health and readiness probes.
Resource requests + memory limits.
HPA configured.
Secrets via secret manager.
TLS at ingress.
Structured logging to stdout.
OTEL tracing exported.
Prometheus metrics exposed.
Sentry / error tracking integrated.
Graceful shutdown verified.
Pre-stop hook for rolling deploys.
Migration strategy.
Backups configured (DB, files).
DR plan.
Runbook for top alerts.
On-call setup.
Load test result documented.

Common mistakes

1. Single replica

One pod restart = downtime. Always >= 2.

2. No readiness probe

K8s sends traffic to a not-ready pod; users get errors.

3. Long startup, no startupProbe

Liveness fails during init; pod restart loop.

4. Hard-coded URLs

http://api-internal:8000 works in one cluster only. Use service discovery / config.

5. No graceful shutdown handling

In-flight requests dropped on rollout; users see errors.

What’s next

You’ve finished the FastAPI textbook. Next:

The SQLAlchemy 2.0 Textbook — DB-agnostic.
The Postgres-Focused SQLAlchemy Textbook.
The Pydantic v2 Textbook.
The Alembic Textbook.

If you want my full FastAPI production starter (this entire stack wired up), it’s at rajpoot.dev.

