FastAPI Cheatsheet 18 — Streaming and LLM Integration
Cheatsheet: streaming Claude / GPT / vLLM tokens via SSE, tool-call loops, cancellation, prompt caching.
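As a rough illustration of the SSE token streaming mentioned above, here is a minimal sketch using only the standard library. The names `sse_frame`, `fake_llm_tokens`, `sse_token_stream`, and `collect` are hypothetical, as are the `token` and `done` event names and the `[DONE]` sentinel. The fake generator stands in for whatever provider client actually yields tokens.

```python
import asyncio
import json
from typing import AsyncIterator, List, Optional


def sse_frame(data: str, event: Optional[str] = None) -> str:
    """Format one Server-Sent Events frame.

    Multi-line payloads get one `data:` line per line, and the frame
    ends with a blank line, as the SSE wire format requires.
    """
    lines = []
    if event is not None:
        lines.append(f"event: {event}")
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"


async def fake_llm_tokens() -> AsyncIterator[str]:
    # Hypothetical stand-in for a provider's streaming client.
    for tok in ["Hello", ",", " world"]:
        await asyncio.sleep(0)  # yield control, as a real network read would
        yield tok


async def sse_token_stream(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    # Wrap each token in a JSON payload so whitespace and newlines survive.
    async for tok in tokens:
        yield sse_frame(json.dumps({"token": tok}), event="token")
    # Assumed convention: a final frame tells the client the stream is complete.
    yield sse_frame("[DONE]", event="done")


async def collect() -> List[str]:
    # Drain the stream into a list, which is handy for tests.
    return [frame async for frame in sse_token_stream(fake_llm_tokens())]
```

In a FastAPI route, a generator like `sse_token_stream(...)` would typically be passed to `StreamingResponse` with `media_type="text/event-stream"`. Cancellation, tool-call loops, and prompt caching are not covered by this sketch.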
Cheatsheet: liveness vs readiness, deep health checks, lifespan resource management, startup probes.
Pre-launch checklist: deps, config, observability, security, data, deployment, on-call.
Practical FastAPI DI: scoped dependencies, async DB sessions, auth chains, request-scoped state, lifespan resources, and where DI starts to bite.
Choosing background processing for FastAPI: BackgroundTasks for fire-and-forget, ARQ for Redis-backed simplicity, Celery for ecosystem, Dramatiq for ergonomics.
Practical FastAPI streaming: SSE for one-way real-time, NDJSON for streamed JSON, LLM token streaming, backpressure handling, and reconnect patterns.
FastAPI BackgroundTasks looks convenient but has real limits. When it’s enough, when you need a real queue, and how to migrate without disrupting users.
Production FastAPI dependency injection: typed dep aliases, request-scoped vs app-scoped resources, async deps, lifespan management, and the patterns that keep a 50-endpoint codebase clean.
Production WebSocket patterns in FastAPI: connection management, auth, scaling beyond one process, broadcast via Redis Pub/Sub or NATS, and the gotchas that bite.
How to test FastAPI apps that survive: unit, integration with real Postgres, contract via OpenAPI, end-to-end, async-aware fixtures, and the patterns from production codebases.