Distributed Tracing in 2026 — OpenTelemetry, Trace Context, and What Actually Helps Debugging
Practical distributed tracing: OTEL setup, span design, context propagation across services, head/tail sampling, and the operational realities.
Practical circuit breakers: the closed/open/half-open state machine, threshold tuning, fallback strategies, libraries (resilience4j, pybreaker), and where breakers go wrong.
Practical event sourcing: append-only streams, projections, snapshots, replay, and an honest take on when ES pays off vs when CRUD wins.
Practical API protocol selection: REST for simplicity and HTTP cache, gRPC for service-to-service perf, GraphQL for client-driven queries — and the gotchas of each.
Production Django + Celery: task signatures, idempotency, retry strategies, beat scheduling, monitoring, and how to avoid the lost-job class of bugs.
Practical Axum middleware: auth, request ID, tracing, rate limiting, error mapping, and how Tower’s layer composition actually works.
Picking an API gateway. Kong, Envoy, Cloudflare Workers as gateway, Tyk, and the patterns from real platforms — auth, rate limiting, transformations, observability.
Designing customer-facing API rate limits. Tier structure, quota types (per-second / per-minute / per-day), Stripe / GitHub-style response headers, 429 with Retry-After, and the patterns customers actually integrate with.
Comparison of binary serialization formats. Protobuf for typed RPC. MessagePack / CBOR as JSON drop-ins. FlatBuffers when zero-copy matters. JSON when humans must read.
The full expand-and-contract migration playbook. Add a column, rename a column, drop a column, change a type — each as a multi-step deploy that never blocks.