Integration Cheatsheet 10 — Observability Stack
Cheatsheet: structlog with trace IDs, OTEL auto-instrumentation, Prometheus, slow-query log.
Cheatsheet: structlog with contextvars, OTEL auto-instrumentation, custom spans, Prometheus middleware, /metrics.
Production observability: structlog with contextvars, OTEL tracing, Prometheus metrics, log/trace correlation, and the patterns that pay back.
Cheatsheet: kube-prometheus-stack, Loki, traces, alerts.
Practical observability cost cuts: cardinality discipline, log sampling, trace tail-sampling, retention tiers, and self-hosting tradeoffs.
Practical distributed tracing: OTEL setup, span design, context propagation across services, head/tail sampling, and the operational realities.
Practical LLM observability: tracing every call, eval harnesses, regression detection, prompt versioning, and how to debug the model in production.
Picking a log aggregator in 2026: Loki for cheap storage, ClickHouse for query power, OpenSearch for full-text, Datadog when you can pay. Decision matrix and patterns.
What changed in observability since 2020. Wide events vs three pillars, SLOs as the unit of conversation, OTel’s role, and how to actually find problems in production.
What to track in LLM apps, the tooling landscape (LangSmith, Langfuse, Helicone, Phoenix), the OTel GenAI conventions, and the metrics-and-traces playbook for production AI.