Cheatsheet for deploying the stack.
Dockerfile
# syntax=docker/dockerfile:1.7
FROM python:3.13-slim AS builder
WORKDIR /app
RUN pip install uv
COPY pyproject.toml uv.lock ./
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-dev
FROM python:3.13-slim
WORKDIR /app
COPY --from=builder /app/.venv ./.venv
COPY src ./src
COPY alembic.ini ./
COPY migrations ./migrations
ENV PATH=/app/.venv/bin:$PATH
USER 1000:1000
EXPOSE 8000
CMD ["uvicorn", "src.myapp.main:app", "--host", "0.0.0.0", "--port", "8000"]
Includes migrations so the same image can run them.
Migration as K8s Job
apiVersion: batch/v1
kind: Job
metadata:
  name: migrate-{{ .Values.image.tag }}
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "0"
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  template:
    spec:
      containers:
        - name: migrate
          image: ghcr.io/me/myapp:{{ .Values.image.tag }}
          command: ["alembic", "upgrade", "head"]
          envFrom:
            - secretRef: { name: myapp-secrets }
            - configMapRef: { name: myapp-config }
      restartPolicy: OnFailure
  backoffLimit: 3
  ttlSecondsAfterFinished: 300
Runs before the app pods roll out.
Deployment
apiVersion: apps/v1
kind: Deployment
metadata: { name: myapp }
spec:
  replicas: 3
  selector: { matchLabels: { app: myapp } }
  template:
    metadata: { labels: { app: myapp } }
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: api
          image: ghcr.io/me/myapp:{{ .Values.image.tag }}
          ports: [{ containerPort: 8000 }]
          envFrom:
            - secretRef: { name: myapp-secrets }
            - configMapRef: { name: myapp-config }
          resources:
            requests: { cpu: "200m", memory: "256Mi" }
            limits: { memory: "512Mi" }
          startupProbe:
            httpGet: { path: /healthz, port: 8000 }
            periodSeconds: 5
            failureThreshold: 30
          readinessProbe:
            httpGet: { path: /ready, port: 8000 }
            periodSeconds: 5
          livenessProbe:
            httpGet: { path: /healthz, port: 8000 }
            periodSeconds: 10
          lifecycle:
            preStop:
              exec: { command: ["sh", "-c", "sleep 10"] }
Service
apiVersion: v1
kind: Service
metadata: { name: myapp }
spec:
  selector: { app: myapp }
  ports: [{ port: 80, targetPort: 8000 }]
Secrets
apiVersion: v1
kind: Secret
metadata: { name: myapp-secrets }
stringData:
  MYAPP_DATABASE_URL: postgresql+asyncpg://...
  MYAPP_SECRET_KEY: ...
Or use External Secrets Operator + Vault / AWS Secrets Manager.
ConfigMap
apiVersion: v1
kind: ConfigMap
metadata: { name: myapp-config }
data:
  MYAPP_ENV: prod
  MYAPP_LOG_LEVEL: INFO
HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata: { name: myapp }
spec:
  scaleTargetRef: { apiVersion: apps/v1, kind: Deployment, name: myapp }
  minReplicas: 3
  maxReplicas: 30
  metrics:
    - type: Resource
      resource: { name: cpu, target: { type: Utilization, averageUtilization: 70 } }
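To get a feel for how this HPA reacts, here is a rough sketch of the documented scaling rule, desired = ceil(current replicas × current utilization / target utilization), clamped to minReplicas and maxReplicas. Utilization is measured against the CPU request (200m in the Deployment). The function name and defaults are illustrative, and the real controller also applies a tolerance band and stabilization windows that this sketch ignores.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float = 70,
    min_replicas: int = 3,
    max_replicas: int = 30,
) -> int:
    """Approximate the HPA replica calculation for a single CPU metric."""
    raw = math.ceil(current_replicas * current_utilization / target_utilization)
    # Clamp to the bounds declared in the HPA spec.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 3 pods averaging 140% of their CPU request, target 70%.
    print(desired_replicas(3, 140))
```

Low utilization never drops below minReplicas, and a large spike is capped at maxReplicas, which is the number to use when sizing the DB connection budget.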
Lifespan
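from contextlib import asynccontextmanager

from fastapi import FastAPI
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine

@asynccontextmanager
async def lifespan(app: FastAPI):
    s = get_settings()  # settings loader, defined elsewhere in the app
    app.state.engine = create_async_engine(s.database_url, ...)
    app.state.sm = async_sessionmaker(app.state.engine, expire_on_commit=False)
    yield
    # Shutdown: close pooled DB connections before the process exits
    await app.state.engine.dispose()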
SIGTERM → ASGI server stops accepting → existing requests finish → lifespan shutdown → exit.
Graceful shutdown gotchas
- terminationGracePeriodSeconds must exceed the lifecycle preStop sleep + longest request.
- preStop sleep 10 allows kube-proxy to stop routing.
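The grace period clock starts when the pod is marked terminating, so the preStop sleep eats into it before SIGTERM is even sent. A small sketch of that arithmetic, using the values from the Deployment above; the function name and the `exit_margin` for lifespan shutdown are assumptions for illustration, not Kubernetes settings.

```python
def request_budget_seconds(
    grace_period: int,
    pre_stop_sleep: int,
    exit_margin: int = 5,
) -> int:
    """Seconds left for in-flight requests once the preStop sleep has run.

    exit_margin is a hypothetical allowance for lifespan shutdown
    (for example, disposing the DB engine) before SIGKILL arrives.
    """
    budget = grace_period - pre_stop_sleep - exit_margin
    if budget <= 0:
        raise ValueError(
            "terminationGracePeriodSeconds is too short for the preStop sleep"
        )
    return budget


if __name__ == "__main__":
    # terminationGracePeriodSeconds: 60, preStop sleep 10
    print(request_budget_seconds(60, 10))
```

If the longest request the app serves can run longer than this budget, raise terminationGracePeriodSeconds rather than shortening the preStop sleep.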
Health endpoints
from fastapi import Depends
from fastapi.responses import JSONResponse
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

@app.get("/healthz", include_in_schema=False)
async def healthz():
    return {"status": "ok"}

@app.get("/ready", include_in_schema=False)
async def ready(db: AsyncSession = Depends(get_db)):
    try:
        await db.execute(text("SELECT 1"))
    except Exception:
        return JSONResponse({"status": "not ready"}, status_code=503)
    return {"status": "ready"}
Liveness: process up. Readiness: DB reachable.
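The probes above don't set timeoutSeconds, so the kubelet gives each one the default 1 second. A DB ping that hangs would fail the probe anyway, but the handler keeps waiting on its connection. One option is to bound the check inside the app; this is a sketch only, with `check_with_timeout` and the stand-in ping coroutines as hypothetical names, and a real `/ready` would pass something like `lambda: db.execute(text("SELECT 1"))`.

```python
import asyncio
from typing import Awaitable, Callable


async def check_with_timeout(
    ping: Callable[[], Awaitable[object]],
    timeout_s: float = 0.5,
) -> bool:
    """Return True if ping() completes within timeout_s, False otherwise."""
    try:
        await asyncio.wait_for(ping(), timeout=timeout_s)
    except Exception:
        # Covers both timeouts and connection errors raised by the ping.
        return False
    return True


# Stand-ins for a real database ping, for demonstration only.
async def fast_ping() -> None:
    return None


async def slow_ping() -> None:
    await asyncio.sleep(1)


async def broken_ping() -> None:
    raise ConnectionError("db down")


if __name__ == "__main__":
    print(asyncio.run(check_with_timeout(fast_ping)))
    print(asyncio.run(check_with_timeout(slow_ping, timeout_s=0.05)))
    print(asyncio.run(check_with_timeout(broken_ping)))
```

The readiness handler would return 503 whenever this helper returns False, the same way the except branch does now.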
Single uvicorn worker per container
CMD ["uvicorn", "src.myapp.main:app", "--host", "0.0.0.0", "--port", "8000"]
K8s scales horizontally (replicas). Don’t run uvicorn with --workers 4 inside a container; let K8s manage scale.
TLS
Terminate at ingress (ALB / nginx / Cloudflare). App speaks HTTP internally.
Behind a proxy
# main.py
from fastapi.middleware.trustedhost import TrustedHostMiddleware
app.add_middleware(TrustedHostMiddleware, allowed_hosts=["myapp.com", "*.myapp.com"])
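TrustedHostMiddleware rejects requests whose Host header is not in the list. To have the app read X-Forwarded-* from the ingress, run uvicorn with --proxy-headers --forwarded-allow-ips='*'. The '*' trusts every peer, so use it only when nothing but the proxy can reach the pod.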
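Why the allow-list matters: X-Forwarded-For is just a header, so any client can send one. The sketch below illustrates the general idea of honoring it only when the direct peer is a trusted proxy. It is a simplified illustration, not uvicorn's actual implementation, and `client_ip` and the sample networks are hypothetical.

```python
import ipaddress
from typing import List, Optional


def client_ip(
    peer_ip: str,
    x_forwarded_for: Optional[str],
    trusted_networks: List[str],
) -> str:
    """Pick the client address, trusting X-Forwarded-For only from known proxies."""
    nets = [ipaddress.ip_network(n) for n in trusted_networks]

    def trusted(ip: str) -> bool:
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            return False  # malformed entries are never trusted
        return any(addr in net for net in nets)

    # Ignore the header entirely if the direct peer is not a trusted proxy.
    if not x_forwarded_for or not trusted(peer_ip):
        return peer_ip

    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    # Walk from the nearest hop outward, skipping our own proxies.
    for hop in reversed(hops):
        if not trusted(hop):
            return hop
    return hops[0] if hops else peer_ip


if __name__ == "__main__":
    print(client_ip("10.0.0.5", "203.0.113.7, 10.0.0.9", ["10.0.0.0/8"]))
```

With the wildcard setting every peer counts as trusted, which is why it only makes sense when the pod is reachable solely through the ingress.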
DB pool sizing
engine = create_async_engine(
    url, pool_size=20, max_overflow=10,
    pool_pre_ping=True, pool_recycle=3600,
)
total_connections = replicas × (pool_size + max_overflow)
≤ db_max_connections (with margin for admin + replication)
Use PgBouncer in transaction pooling mode for higher multiplexing.
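The formula is worth running against the HPA's maxReplicas, not just the current replica count. A minimal sketch of the arithmetic; the function names and the `reserved` margin for admin and replication connections are illustrative assumptions.

```python
def total_connections(replicas: int, pool_size: int, max_overflow: int) -> int:
    """Worst-case connections opened by all replicas together."""
    return replicas * (pool_size + max_overflow)


def max_pool_per_replica(
    db_max_connections: int,
    replicas: int,
    reserved: int = 10,
) -> int:
    """Largest pool_size + max_overflow each replica can use safely.

    reserved is a hypothetical margin kept back for admin sessions
    and replication.
    """
    usable = db_max_connections - reserved
    if usable < replicas:
        raise ValueError("not enough connections for one per replica")
    return usable // replicas


if __name__ == "__main__":
    print(total_connections(3, 20, 10))    # at minReplicas
    print(total_connections(30, 20, 10))   # at maxReplicas
    print(max_pool_per_replica(200, 30, reserved=20))
```

With pool_size=20 and max_overflow=10, three replicas can open up to 90 connections and thirty replicas up to 900, so a database limited to a few hundred connections needs either smaller pools or a pooler in front. Rolling updates also run extra pods for a short time, which is another reason to keep some margin.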
Common mistakes
- Migration Job not in a pre-install,pre-upgrade hook: runs after app pods (which fail).
- terminationGracePeriodSeconds too short: kills in-flight requests.
- Missing pool_pre_ping: stale conns after DB restart.
- No PodDisruptionBudget: voluntary disruptions can take down all replicas at once.
- Plaintext secrets in YAML: use sealed-secrets / ESO.
Read this next
If you want my Helm chart for the whole stack, it’s at rajpoot.dev.