Chapter 12, the final chapter of the FastAPI textbook: shipping it. ASGI server selection, Docker, Kubernetes, autoscaling, graceful shutdown, secrets, the production checklist.
ASGI server choice
| Server | When to use |
|---|---|
| Uvicorn | Default; uvloop + httptools; fast |
| Hypercorn | HTTP/2, HTTP/3 (QUIC), trio support |
| Granian | Rust-based; very fast; newer |
| Daphne | Original ASGI; mostly historical |
For most production: Uvicorn. Hypercorn if you need HTTP/2. Granian for max raw perf.
Run command
uvicorn src.myapp.main:app --host 0.0.0.0 --port 8000 --workers 4 --proxy-headers --forwarded-allow-ips='*'
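- --workers 4: multiple processes. Or use gunicorn -k uvicorn.workers.UvicornWorker -w 4.
- --proxy-headers: trust X-Forwarded-* headers (set by the LB).
- --forwarded-allow-ips: which proxy IPs to trust. '*' trusts every source, so use it only when the app is reachable solely through the proxy.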
For K8s: usually one worker per container; let K8s do horizontal scaling.
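As a rough sketch of the two setups above (several workers on a single machine, one worker per container on K8s), the helper below assembles the Uvicorn argument list. `uvicorn_command` and its parameters are hypothetical names for illustration; it only uses flags already shown in this chapter.

```python
import shlex


def uvicorn_command(app: str, workers: int = 1, behind_proxy: bool = True) -> list:
    """Build the argv for launching Uvicorn (hypothetical helper)."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    cmd = ["uvicorn", app, "--host", "0.0.0.0", "--port", "8000"]
    if workers > 1:
        # Several processes on one machine; on K8s keep this at 1
        # and scale with replicas instead.
        cmd += ["--workers", str(workers)]
    if behind_proxy:
        # '*' trusts every proxy address; narrow it where possible.
        cmd += ["--proxy-headers", "--forwarded-allow-ips", "*"]
    return cmd


vm_cmd = uvicorn_command("src.myapp.main:app", workers=4)
k8s_cmd = uvicorn_command("src.myapp.main:app", workers=1)
print(shlex.join(vm_cmd))
print(shlex.join(k8s_cmd))
```

Keeping the worker count a parameter means the same image can run in both setups, with only the start command changing.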
Dockerfile
# syntax=docker/dockerfile:1.7
FROM python:3.13-slim AS builder
WORKDIR /app
RUN pip install uv
COPY pyproject.toml uv.lock ./
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev
FROM python:3.13-slim
WORKDIR /app
COPY --from=builder /app/.venv ./.venv
COPY src ./src
ENV PATH=/app/.venv/bin:$PATH
USER 1000:1000
EXPOSE 8000
CMD ["uvicorn", "src.myapp.main:app", "--host", "0.0.0.0", "--port", "8000"]
Multi-stage; non-root; minimal runtime image. See Docker Best Practices.
Kubernetes deployment
apiVersion: apps/v1
kind: Deployment
metadata: { name: myapi }
spec:
  replicas: 3
  selector: { matchLabels: { app: myapi } }
  template:
    metadata: { labels: { app: myapi } }
    spec:
      containers:
        - name: api
          image: ghcr.io/me/myapi:1.2.3
          ports: [{ containerPort: 8000 }]
          resources:
            requests: { cpu: "200m", memory: "256Mi" }
            limits: { memory: "512Mi" }
          readinessProbe:
            httpGet: { path: /ready, port: 8000 }
            periodSeconds: 5
          livenessProbe:
            httpGet: { path: /healthz, port: 8000 }
            periodSeconds: 10
          startupProbe:
            httpGet: { path: /healthz, port: 8000 }
            periodSeconds: 5
            failureThreshold: 30
          envFrom:
            - secretRef: { name: myapi-secrets }
            - configMapRef: { name: myapi-config }
---
apiVersion: v1
kind: Service
metadata: { name: myapi }
spec:
  selector: { app: myapi }
  ports: [{ port: 80, targetPort: 8000 }]
startupProbe for slow-starting apps; readinessProbe for “ready to serve”; livenessProbe for “alive.”
See K8s Resource Limits.
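To make the probe numbers concrete, here is a small arithmetic sketch of how long a pod gets before K8s gives up on it. It assumes the simple model "period × failure threshold" and ignores initialDelaySeconds and timeoutSeconds. The liveness probe above does not set failureThreshold, so the K8s default of 3 is assumed. `seconds_until_failure` is a hypothetical helper name.

```python
def seconds_until_failure(period_seconds: int, failure_threshold: int) -> int:
    """Approximate time a probe may keep failing before K8s acts on it."""
    return period_seconds * failure_threshold


# startupProbe from the manifest: periodSeconds 5, failureThreshold 30.
startup_window = seconds_until_failure(5, 30)

# livenessProbe from the manifest: periodSeconds 10, default failureThreshold 3.
liveness_window = seconds_until_failure(10, 3)

print(f"startup budget: {startup_window}s")
print(f"liveness restart after: {liveness_window}s")
```

With these values the app has about 150 seconds to start. Once it is running, a hung process is restarted after roughly 30 seconds of failed liveness checks.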
Graceful shutdown
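from contextlib import asynccontextmanager

@asynccontextmanager
async def lifespan(app):
    # startup: open the connection pool once per process
    app.state.db = await create_pool()
    yield
    # shutdown: runs once the server stops serving
    await app.state.db.close()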
Uvicorn handles SIGTERM:
- Stops accepting new connections.
- Waits for in-flight requests to finish, up to --timeout-graceful-shutdown seconds if that flag is set (Uvicorn applies no timeout by default).
- Cancels stragglers once the timeout expires.
- Runs lifespan shutdown.
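For K8s: set terminationGracePeriodSeconds (for example 60) so that it covers any preStop delay plus the graceful-shutdown timeout; otherwise the kubelet sends SIGKILL before the app has finished.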
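The ordering of startup and shutdown code in a lifespan function can be checked without a server. The sketch below drives the same pattern with the standard library only. `FakePool`, `create_pool` and the `events` list are stand-ins invented for the demo, and the `try/finally` is an added safeguard so the pool is closed even if the serving phase raises.

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace

events = []


class FakePool:
    """Stand-in for a real database pool (hypothetical)."""

    async def close(self) -> None:
        events.append("pool closed")


async def create_pool() -> FakePool:
    events.append("pool created")
    return FakePool()


@asynccontextmanager
async def lifespan(app):
    app.state.db = await create_pool()   # startup
    try:
        yield                            # the app serves requests here
    finally:
        await app.state.db.close()       # shutdown, even after an error


async def main() -> None:
    # SimpleNamespace mimics the app.state attribute of a FastAPI app.
    app = SimpleNamespace(state=SimpleNamespace())
    async with lifespan(app):
        events.append("serving requests")


asyncio.run(main())
print(events)
```

In a real app the same function is passed as `FastAPI(lifespan=lifespan)`, and the server drives the context manager instead of `main()`.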
Pre-stop hook
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 10"]
Wait 10s before SIGTERM so iptables / kube-proxy stop sending traffic before app shuts down. Avoids “connection refused” during rollout.
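The preStop sleep counts against terminationGracePeriodSeconds, so the grace period has to cover the sleep plus however long the server may wait for in-flight requests. A minimal budget check follows, assuming the server is started with an explicit `--timeout-graceful-shutdown 30`. `shutdown_budget_ok` and `cleanup_margin` are hypothetical names.

```python
def shutdown_budget_ok(
    grace_period: int,
    pre_stop_sleep: int,
    graceful_timeout: int,
    cleanup_margin: int = 5,
) -> bool:
    """True if the pod can finish shutting down before SIGKILL.

    cleanup_margin is an assumed allowance for lifespan shutdown code
    (closing pools, flushing telemetry).
    """
    needed = pre_stop_sleep + graceful_timeout + cleanup_margin
    return needed <= grace_period


# Values from this chapter: 60s grace period, 10s preStop sleep.
print(shutdown_budget_ok(grace_period=60, pre_stop_sleep=10, graceful_timeout=30))

# The K8s default grace period of 30s would be too short for the same setup.
print(shutdown_budget_ok(grace_period=30, pre_stop_sleep=10, graceful_timeout=30))
```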
Autoscaling (HPA)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: myapi }
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: myapi }
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource: { name: cpu, target: { type: Utilization, averageUtilization: 70 } }
Or scale on RPS via custom metrics (Prometheus + KEDA).
For LLM-heavy or otherwise IO-bound workloads: scale on in-flight requests. CPU utilization stays low while requests wait on IO, so it is a misleading signal.
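The HPA's core calculation is documented as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the min/max bounds. The sketch below reproduces only that formula. It leaves out the controller's tolerance band, stabilization windows and scaling policies, so treat it as an approximation. `desired_replicas` is a hypothetical helper name.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_value: float,
    target_value: float,
    min_replicas: int = 3,
    max_replicas: int = 30,
) -> int:
    """Core HPA formula, clamped to the configured replica bounds."""
    raw = math.ceil(current_replicas * current_value / target_value)
    return max(min_replicas, min(max_replicas, raw))


# 3 pods averaging 90% CPU against a 70% target: scale up.
print(desired_replicas(3, 90, 70))

# 3 pods at 35%: the formula says 2, but minReplicas holds it at 3.
print(desired_replicas(3, 35, 70))

# 10 pods at 350%: the formula says 50, capped by maxReplicas.
print(desired_replicas(10, 350, 70))
```

The same function works for a custom metric such as in-flight requests per pod: pass the observed average and the per-pod target instead of CPU percentages.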
Secrets
apiVersion: v1
kind: Secret
metadata: { name: myapi-secrets }
stringData:
  DATABASE_URL: postgresql://...
  SECRET_KEY: ...
Or via External Secrets Operator from Vault / AWS Secrets Manager. See Secrets Management.
Never bake secrets into the image. Never commit them to git.
TLS / HTTPS
Terminate at the LB (ALB / nginx / Cloudflare). FastAPI / Uvicorn don’t need TLS (cluster-internal HTTP fine).
If you must terminate at the app:
uvicorn ... --ssl-keyfile=key.pem --ssl-certfile=cert.pem
Usually managed by cert-manager + Let’s Encrypt at ingress.
Behind a reverse proxy
uvicorn ... --proxy-headers --forwarded-allow-ips='*'
Tells Uvicorn to read X-Forwarded-For and X-Forwarded-Proto from trusted proxies and to rewrite the connection info accordingly. So request.client.host is the actual client IP, not the LB.
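To show why the list of trusted proxies matters, here is a simplified model of resolving the client IP from X-Forwarded-For. It is an illustration of the idea, not Uvicorn's actual implementation. The function name, the exact-match trust check and the example addresses are all assumptions.

```python
from typing import Optional, Set


def client_ip(peer_ip: str, x_forwarded_for: Optional[str], trusted: Set[str]) -> str:
    """Pick the client address, believing the header only from trusted peers."""
    if x_forwarded_for is None or peer_ip not in trusted:
        # The header could be forged by anyone, so fall back to the socket peer.
        return peer_ip
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    # Walk from the nearest hop outwards and stop at the first untrusted one.
    for hop in reversed(hops):
        if hop not in trusted:
            return hop
    return hops[0] if hops else peer_ip


lb = {"10.0.0.5"}  # hypothetical load balancer address

# Request came through the LB: the header is believed.
print(client_ip("10.0.0.5", "203.0.113.7, 10.0.0.5", lb))

# Request came straight from the internet with a forged header: it is ignored.
print(client_ip("198.51.100.9", "1.2.3.4", lb))
```

In this model, trusting every peer means the second request would be attributed to whatever address the caller chose to send. That is why a wildcard is only safe when the proxy is the sole path to the app.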
Connection limits
By default Uvicorn does not cap concurrent connections per worker (--limit-concurrency is unset). To set an explicit cap:
uvicorn ... --limit-concurrency 500 --backlog 1024
--limit-concurrency: maximum number of concurrent connections or tasks; beyond it Uvicorn responds with 503.
--backlog: size of the TCP accept queue (default 2048).
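The load-shedding behaviour behind a concurrency limit can be sketched with a counter: admit work while below the limit, answer 503 immediately otherwise. This is a toy model of the idea, not Uvicorn's code. `ConcurrencyLimiter` and `slow_request` are hypothetical names.

```python
import asyncio


class ConcurrencyLimiter:
    """Reject work with 503 once `limit` requests are already in flight."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.in_flight = 0

    async def handle(self, work) -> int:
        if self.in_flight >= self.limit:
            return 503  # shed load instead of queueing without bound
        self.in_flight += 1
        try:
            await work()
            return 200
        finally:
            self.in_flight -= 1


async def main() -> list:
    limiter = ConcurrencyLimiter(limit=2)

    async def slow_request() -> None:
        await asyncio.sleep(0.01)

    # Three requests arrive at once; only two fit under the limit.
    return await asyncio.gather(*(limiter.handle(slow_request) for _ in range(3)))


statuses = asyncio.run(main())
print(statuses)
```

Rejecting early keeps latency bounded for the requests that are admitted, and a client or load balancer can retry the 503 against another replica.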
Database pool sizing
engine = create_async_engine(DATABASE_URL, pool_size=20, max_overflow=10, pool_pre_ping=True)
total = workers × replicas × (pool_size + max_overflow)
≤ db_max_connections
PgBouncer in transaction-pooling mode lets many client connections share a smaller set of server connections, which raises your effective capacity.
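Plugging this chapter's numbers into the formula shows how quickly the total grows. The sketch assumes one worker per pod and a database limit of 100 connections (the Postgres default for max_connections). `reserved` is a hypothetical allowance for migrations and admin sessions.

```python
def total_db_connections(workers: int, replicas: int, pool_size: int, max_overflow: int) -> int:
    """Worst-case connections the whole deployment can open."""
    return workers * replicas * (pool_size + max_overflow)


def max_pool_per_worker(db_max_connections: int, workers: int, replicas: int, reserved: int = 0) -> int:
    """Largest pool_size + max_overflow per worker that stays under the DB limit."""
    return (db_max_connections - reserved) // (workers * replicas)


# pool_size=20, max_overflow=10, one worker per pod.
at_min = total_db_connections(workers=1, replicas=3, pool_size=20, max_overflow=10)
at_max = total_db_connections(workers=1, replicas=30, pool_size=20, max_overflow=10)
print(at_min, at_max)

# Budget per worker if the HPA may scale to 30 replicas against a 100-connection DB.
print(max_pool_per_worker(db_max_connections=100, workers=1, replicas=30, reserved=10))
```

Size the pool against maxReplicas, not the current replica count. Here the 3-replica baseline fits under the limit, but the scaled-out deployment would not.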
CDN / caching
For static assets: CDN (Cloudflare / Vercel / S3 + CloudFront).
For dynamic content with cache headers:
from fastapi import Response

@app.get("/posts")
async def list_posts(response: Response):
    # Browsers may cache for 60s; shared caches (CDNs) for 120s.
    response.headers["Cache-Control"] = "public, max-age=60, s-maxage=120"
    return await db.list_posts()
CDNs and other shared caches honor s-maxage; browsers fall back to max-age.
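The difference between max-age and s-maxage is easiest to see by asking how long each kind of cache may keep the response. The parser below is a deliberately small sketch: it handles simple `name=value` directives only, not quoted values. Both function names are hypothetical.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header into {directive: argument-or-None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip() if sep else None
    return directives


def freshness_seconds(value: str, shared_cache: bool) -> Optional[int]:
    """Freshness lifetime as seen by a shared cache (CDN) or a private one (browser)."""
    directives = parse_cache_control(value)
    if shared_cache and directives.get("s-maxage") is not None:
        # For shared caches, s-maxage takes precedence over max-age.
        return int(directives["s-maxage"])
    max_age = directives.get("max-age")
    return int(max_age) if max_age is not None else None


header = "public, max-age=60, s-maxage=120"
print(freshness_seconds(header, shared_cache=False))
print(freshness_seconds(header, shared_cache=True))
```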
Logging in production
JSON to stdout; cluster log collector (Fluent Bit, Promtail) ships to backend (Loki, ELK, Datadog).
# In structlog setup
processors = [..., structlog.processors.JSONRenderer()]
Don’t log to disk; the container filesystem is ephemeral.
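If structlog is not in the picture yet, the same "one JSON object per line on stdout" shape can be had from the standard library. This is a minimal sketch rather than a replacement for the structlog setup above. `JsonFormatter`, the logger name `myapi` and the chosen fields are assumptions.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


handler = logging.StreamHandler(sys.stdout)  # stdout, so the log collector picks it up
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("myapi")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # avoid duplicate lines via the root logger

logger.info("request finished path=%s status=%d", "/posts", 200)
```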
Environment-based config
from typing import Literal
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    env: Literal["dev", "staging", "prod"] = "dev"
    database_url: str
    secret_key: str
    log_level: str = "INFO"
    model_config = {"env_prefix": "MYAPI_"}
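Set per environment via K8s ConfigMap / Secret. With env_prefix set, the variables must carry the prefix: MYAPI_DATABASE_URL, MYAPI_SECRET_KEY, and so on.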
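The mapping from prefixed environment variables to settings fields can be shown with a standard-library stand-in. This is a sketch of the behaviour one would expect from the Settings class above, not pydantic-settings itself. `PlainSettings`, `load_settings` and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping

ENV_PREFIX = "MYAPI_"
ALLOWED_ENVS = ("dev", "staging", "prod")


@dataclass(frozen=True)
class PlainSettings:
    database_url: str
    secret_key: str
    env: str = "dev"
    log_level: str = "INFO"


def load_settings(environ: Mapping[str, str]) -> PlainSettings:
    def var(field: str) -> str:
        # database_url -> MYAPI_DATABASE_URL
        return ENV_PREFIX + field.upper()

    # Fail fast at startup instead of on the first request that needs the value.
    missing = [var(f) for f in ("database_url", "secret_key") if var(f) not in environ]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))

    env = environ.get(var("env"), "dev")
    if env not in ALLOWED_ENVS:
        raise RuntimeError(f"{var('env')} must be one of {ALLOWED_ENVS}, got {env!r}")

    return PlainSettings(
        database_url=environ[var("database_url")],
        secret_key=environ[var("secret_key")],
        env=env,
        log_level=environ.get(var("log_level"), "INFO"),
    )


settings = load_settings({
    "MYAPI_DATABASE_URL": "postgresql://user:pass@db/app",  # placeholder value
    "MYAPI_SECRET_KEY": "change-me",                        # placeholder value
    "MYAPI_ENV": "prod",
})
print(settings.env, settings.log_level)
```

In production you would pass `os.environ` instead of a literal dict. Taking the mapping as a parameter keeps the loader easy to test.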
Migrations on deploy
# K8s Job runs before rolling deploy
apiVersion: batch/v1
kind: Job
metadata: { name: migrate-{{ .Values.image.tag }} }
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: ghcr.io/me/myapi:{{ .Values.image.tag }}
          command: ["alembic", "upgrade", "head"]
      restartPolicy: OnFailure
Or in-process at startup (simpler; slightly slower):
@asynccontextmanager
async def lifespan(app):
    await run_migrations()
    yield
For multi-replica deploys: only one process should run migrations. The in-process approach runs them in every replica at startup; a K8s Job runs them once.
See Database Migrations and Alembic textbook.
Blue/green vs rolling
For most: rolling. For risky / stateful: blue/green or canary. See Blue/Green vs Canary.
Production checklist
Before going live:
- Multi-stage Dockerfile, non-root user, distroless / slim.
- Pinned base image digest.
- Health and readiness probes.
- Resource requests + memory limits.
- HPA configured.
- Secrets via secret manager.
- TLS at ingress.
- Structured logging to stdout.
- OTEL tracing exported.
- Prometheus metrics exposed.
- Sentry / error tracking integrated.
- Graceful shutdown verified.
- Pre-stop hook for rolling deploys.
- Migration strategy.
- Backups configured (DB, files).
- DR plan.
- Runbook for top alerts.
- On-call setup.
- Load test result documented.
Common mistakes
1. Single replica
One pod restart = downtime. Always >= 2.
2. No readiness probe
K8s sends traffic to a not-ready pod; users get errors.
3. Long startup, no startupProbe
Liveness fails during init; pod restart loop.
4. Hard-coded URLs
http://api-internal:8000 works in one cluster only. Use service discovery / config.
5. No graceful shutdown handling
In-flight requests dropped on rollout; users see errors.
What’s next
You’ve finished the FastAPI textbook. Next:
- The SQLAlchemy 2.0 Textbook — DB-agnostic.
- The Postgres-Focused SQLAlchemy Textbook.
- The Pydantic v2 Textbook.
- The Alembic Textbook.
If you want my full FastAPI production starter (this entire stack wired up), it’s at rajpoot.dev .