Cheatsheet for shipping it. Long-form: Textbook Ch 12.

Dockerfile (uv + multi-stage + slim runtime)

# syntax=docker/dockerfile:1.7
FROM python:3.13-slim AS builder
WORKDIR /app
RUN pip install --no-cache-dir uv
COPY pyproject.toml uv.lock ./
# Install dependencies only; the project source is copied in the final stage.
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev --no-install-project

FROM python:3.13-slim
WORKDIR /app
COPY --from=builder /app/.venv ./.venv
COPY src ./src
ENV PATH="/app/.venv/bin:$PATH" \
    PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1
USER 1000:1000
EXPOSE 8000
CMD ["uvicorn", "src.myapp.main:app", "--host", "0.0.0.0", "--port", "8000", "--proxy-headers", "--forwarded-allow-ips=*"]

.dockerignore

.git
.venv
__pycache__
*.pyc
.env
.env.local
.pytest_cache
htmlcov
dist
.mypy_cache
.ruff_cache
node_modules

Uvicorn flags

uvicorn src.myapp.main:app \
  --host 0.0.0.0 --port 8000 \
  --workers 4 \
  --proxy-headers --forwarded-allow-ips='*' \
  --timeout-keep-alive 5 \
  --timeout-graceful-shutdown 30 \
  --limit-concurrency 500 \
  --backlog 1024 \
  --access-log

For K8s: usually 1 worker per container (drop --workers); let K8s scale pods horizontally.

Gunicorn + Uvicorn workers

gunicorn src.myapp.main:app \
  -k uvicorn.workers.UvicornWorker \
  -w 4 \
  -b 0.0.0.0:8000 \
  --timeout 60 \
  --graceful-timeout 30 \
  --keep-alive 5 \
  --access-logfile -

K8s deployment

apiVersion: apps/v1
kind: Deployment
metadata: { name: myapi }
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate: { maxSurge: 1, maxUnavailable: 0 }
  selector: { matchLabels: { app: myapi } }
  template:
    metadata: { labels: { app: myapi } }
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: api
          image: ghcr.io/me/myapi:1.2.3
          ports: [{ containerPort: 8000 }]
          resources:
            requests: { cpu: "200m", memory: "256Mi" }
            limits: { memory: "512Mi" }
          startupProbe:
            httpGet: { path: /healthz, port: 8000 }
            periodSeconds: 5
            failureThreshold: 30
          readinessProbe:
            httpGet: { path: /ready, port: 8000 }
            periodSeconds: 5
          livenessProbe:
            httpGet: { path: /healthz, port: 8000 }
            periodSeconds: 10
          envFrom:
            - secretRef: { name: myapi-secrets }
            - configMapRef: { name: myapi-config }
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 10"]
---
apiVersion: v1
kind: Service
metadata: { name: myapi }
spec:
  selector: { app: myapi }
  ports: [{ port: 80, targetPort: 8000 }]
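
The shutdown numbers in this sheet have to fit inside each other: the grace period starts counting when the pod is marked for termination, so the preStop sleep and the server's graceful-shutdown timeout both come out of terminationGracePeriodSeconds. A minimal sketch of that arithmetic, using the values from the manifests and flags above (the class and field names are made up for illustration):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShutdownBudget:
    pre_stop_sleep_s: int     # preStop `sleep 10`
    graceful_shutdown_s: int  # uvicorn --timeout-graceful-shutdown / gunicorn --graceful-timeout
    termination_grace_s: int  # terminationGracePeriodSeconds

    def headroom_s(self) -> int:
        """Seconds left over before the kubelet would send SIGKILL."""
        return self.termination_grace_s - (self.pre_stop_sleep_s + self.graceful_shutdown_s)

    def is_safe(self) -> bool:
        """True when the preStop sleep plus graceful shutdown fit in the grace period."""
        return self.headroom_s() >= 0


budget = ShutdownBudget(pre_stop_sleep_s=10, graceful_shutdown_s=30, termination_grace_s=60)
print(budget.headroom_s(), budget.is_safe())  # 20 True
```

If the headroom goes negative, in-flight requests can be cut off mid-response during a rollout, so raise the grace period or lower the timeouts.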

HPA (CPU)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: myapi }
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: myapi }
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target: { type: Utilization, averageUtilization: 70 }
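
To get a feel for how this HPA reacts, here is a rough sketch of the core scaling rule from the Kubernetes docs: desired = ceil(current replicas × current metric / target metric), clamped to the min/max above. Utilization is measured relative to the CPU request (200m here). The sketch deliberately leaves out the controller's tolerance band and stabilization windows, and the function name is my own.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,  # average CPU as % of the request
    target_utilization: float = 70,
    min_replicas: int = 3,
    max_replicas: int = 30,
) -> int:
    """Approximate the HPA's replica calculation (no tolerance, no stabilization)."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))


print(desired_replicas(3, 140))   # 6  (twice the target -> double the pods)
print(desired_replicas(3, 35))    # 3  (would be 2, clamped to minReplicas)
print(desired_replicas(20, 140))  # 30 (would be 40, clamped to maxReplicas)
```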

Migration as Job (Helm)

apiVersion: batch/v1
kind: Job
metadata:
  name: migrate-{{ .Values.image.tag }}
  annotations:
    helm.sh/hook: pre-install,pre-upgrade
    helm.sh/hook-weight: "0"
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: ghcr.io/me/myapi:{{ .Values.image.tag }}
          command: ["alembic", "upgrade", "head"]
          envFrom: [{ secretRef: { name: myapi-secrets } }]
      restartPolicy: OnFailure
  backoffLimit: 3

Lifespan (graceful shutdown of pools)

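import os
from contextlib import asynccontextmanager

import httpx
from fastapi import FastAPI
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine

URL = os.environ["DATABASE_URL"]  # assumed variable name; adjust to your config

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup: one engine, session factory, and HTTP client per process.
    app.state.engine = create_async_engine(URL, pool_size=10, max_overflow=10, pool_pre_ping=True, pool_recycle=300)
    app.state.sm = async_sessionmaker(app.state.engine, expire_on_commit=False)
    app.state.http = httpx.AsyncClient(timeout=10)
    yield
    # Shutdown: close the HTTP client, then release pooled DB connections.
    await app.state.http.aclose()
    await app.state.engine.dispose()

app = FastAPI(lifespan=lifespan)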
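
The /ready probe in the Deployment can reuse the resources opened in the lifespan. The sketch below is one possible shape for that, kept to the standard library so it runs on its own: it runs each dependency check concurrently with a timeout and maps the outcome to 200 or 503. The helper name, the check names, and the 2-second timeout are all assumptions, not something from the chart.

```python
import asyncio
from collections.abc import Awaitable, Callable

Check = Callable[[], Awaitable[None]]  # a check passes by returning, fails by raising


async def run_readiness_checks(
    checks: dict[str, Check], timeout_s: float = 2.0
) -> tuple[int, dict[str, str]]:
    """Run every check concurrently; return (HTTP status, per-check result)."""

    async def run_one(check: Check) -> str:
        try:
            await asyncio.wait_for(check(), timeout=timeout_s)
            return "ok"
        except asyncio.TimeoutError:
            return "timeout"
        except Exception as exc:  # a failing dependency must not crash the probe
            return f"error: {type(exc).__name__}"

    names = list(checks)
    outcomes = await asyncio.gather(*(run_one(checks[name]) for name in names))
    results = dict(zip(names, outcomes))
    status = 200 if all(value == "ok" for value in results.values()) else 503
    return status, results


# Stand-in checks; real ones would hit the DB and upstream services.
async def db_ok() -> None:
    await asyncio.sleep(0)


async def upstream_down() -> None:
    raise ConnectionError("refused")


status, results = asyncio.run(run_readiness_checks({"db": db_ok, "upstream": upstream_down}))
print(status, results)  # 503 {'db': 'ok', 'upstream': 'error: ConnectionError'}
```

In the app, a /ready route would call this with checks built on app.state (for example a SELECT 1 through the engine) and return the dict with the computed status code. Keep /healthz trivial so a slow database never triggers liveness restarts.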

Behind a proxy (X-Forwarded-*)

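uvicorn ... --proxy-headers --forwarded-allow-ips='*'

'*' trusts forwarded headers from any client. Use it only when the container is reachable solely through the proxy; otherwise list the proxy addresses.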

TLS

Terminate TLS at the LB / ingress. On K8s: cert-manager + Let’s Encrypt.

Connection limits

uvicorn ... --limit-concurrency 500 --backlog 1024
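
Limits also multiply across the fleet. As a back-of-the-envelope sketch using numbers from this sheet: each process can open up to pool_size + max_overflow database connections, and, as far as I know, --limit-concurrency applies per worker process. The function names are hypothetical, and the calculation ignores the extra pod that maxSurge adds during a rollout.

```python
def max_db_connections(pool_size: int, max_overflow: int, workers_per_pod: int, replicas: int) -> int:
    """Worst-case DB connections opened by the whole Deployment."""
    return (pool_size + max_overflow) * workers_per_pod * replicas


def max_in_flight(limit_concurrency: int, workers_per_pod: int, replicas: int) -> int:
    """Worst-case concurrent connections/tasks accepted before 503s start."""
    return limit_concurrency * workers_per_pod * replicas


# pool_size=10, max_overflow=10, 1 worker per pod, HPA range 3..30
print(max_db_connections(10, 10, 1, 3))   # 60
print(max_db_connections(10, 10, 1, 30))  # 600
print(max_in_flight(500, 1, 30))          # 15000
```

Compare the worst-case connection count with the database's own connection limit; if it does not fit, shrink the pool, cap maxReplicas, or put a pooler in front of the database.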

Production checklist

  • Non-root USER in Dockerfile.
  • Pinned image digest.
  • Health/readiness/startup probes.
  • Memory limit set (CPU usually no limit).
  • Resource requests sized via Goldilocks/VPA data.
  • HPA configured.
  • Secrets from secret manager (ESO + Vault) or platform secret.
  • TLS at ingress.
  • Structured JSON logs to stdout.
  • OTEL traces exported.
  • Prometheus metrics exposed.
  • Sentry configured.
  • Pre-stop hook for clean rollout.
  • Migration Job in pre-install/upgrade.
  • Backups (DB).
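
For the "structured JSON logs to stdout" item, a minimal standard-library sketch is below. The field names (ts, level, logger, message) are my own choice rather than a standard, and a dedicated logging library would work just as well.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging(level: int = logging.INFO) -> None:
    """Send all logs to stdout as JSON, replacing any existing root handlers."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(level)


if __name__ == "__main__":
    configure_logging()
    logging.getLogger("myapi").info("started on port %s", 8000)
```

Call configure_logging() once at startup, before the first log line is emitted.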

Read this next

If you want my full FastAPI Helm chart, it’s at rajpoot.dev.

