The first time a backend goes down in production, you realize how much you don’t know about your own system. What changed? Which user is affected? Is the database the bottleneck? Which version of the code is running? “Logs and metrics” stops being an abstract good idea and becomes the only way to find out.
This post is the practical observability guide for backend developers. We’ll cover the three pillars (logs, metrics, traces), when each one is the right tool, and the tooling that makes them work together. By the end you’ll know what to instrument and how — without drowning in vendor pages.
The three pillars (and what they’re each good at)
Logs
Discrete events with context. The detailed story.
- Best for: “what happened on this specific request?”
- Cost model: roughly proportional to volume. Each line is stored.
- When to reach for them: debugging a specific incident.
Metrics
Numbers, aggregated over time. The dashboard story.
- Best for: “is the system healthy right now, and how has that changed over time?”
- Cost model: proportional to cardinality (number of unique label combinations), not raw data volume.
- When to reach for them: dashboards, alerts, capacity planning.
Traces
End-to-end view of a single request as it crosses services.
- Best for: “where in the call graph did time go?”
- Cost model: sampling-friendly; usually cheap if you sample.
- When to reach for them: distributed systems debugging, latency investigations.
You need all three. They’re not redundant; they answer different questions.
Logs: structured or it didn’t happen
The difference between logs that help and logs that don’t is structure. Plain text:
2026-04-28 10:31:22 ERROR Could not charge user 42 amount 100
Structured (JSON):
{"ts":"2026-04-28T10:31:22Z","level":"error","msg":"charge failed","user_id":42,"amount":100,"reason":"insufficient_funds","request_id":"abc-123"}
Why it matters: you can grep both. But you can only query the second one.
# Loki (LogQL) shown here; Elasticsearch, CloudWatch, and others have their own equivalent syntax:
{level="error"} | json | user_id="42"
In Python:
import structlog
log = structlog.get_logger()
log.info("charge_attempted", user_id=42, amount=100)
log.error("charge_failed", user_id=42, amount=100, reason="insufficient_funds")
In Go:
import (
    "log/slog"
    "os"
)

// JSON handler writing to stdout; nil options means default settings.
logger := slog.New(slog.NewJSONHandler(os.Stdout, nil))
logger.Info("charge_attempted", "user_id", 42, "amount", 100)
logger.Error("charge_failed", "user_id", 42, "amount", 100, "reason", "insufficient_funds")
Use a structured logger from day one. Plain print()/fmt.Println belongs in scripts, not services.
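To make the "grep versus query" point concrete, here is a minimal sketch that filters JSON log lines by field using only the standard library. The function name `query_logs` and the sample lines are illustrative assumptions, not part of any logging tool; a real log backend does the same thing with an index behind it.

```python
import json


def query_logs(lines, **filters):
    """Yield parsed JSON log records whose fields equal every given filter."""
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # plain-text lines cannot be queried by field, so skip them
        if not isinstance(record, dict):
            continue
        if all(record.get(key) == value for key, value in filters.items()):
            yield record


# Hypothetical sample input: one plain-text line and three structured lines.
sample_lines = [
    "2026-04-28 10:31:22 ERROR Could not charge user 42 amount 100",
    '{"level": "error", "msg": "charge failed", "user_id": 42, "amount": 100}',
    '{"level": "info", "msg": "charge attempted", "user_id": 42, "amount": 100}',
    '{"level": "error", "msg": "charge failed", "user_id": 7, "amount": 25}',
]

matches = list(query_logs(sample_lines, level="error", user_id=42))
```

The plain-text line is silently dropped because there is no field to match against, which is exactly the limitation structured logging removes.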
Add context
Every log entry should answer “for which request?”. Inject a request ID early in your middleware and attach it to every log line:
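# FastAPI example
import uuid

import structlog
from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def add_request_id(request: Request, call_next):
    # Reuse the caller's ID if one was sent, so the request can be followed across services.
    request_id = request.headers.get("x-request-id") or str(uuid.uuid4())
    structlog.contextvars.bind_contextvars(request_id=request_id)
    try:
        response = await call_next(request)
        response.headers["x-request-id"] = request_id
        return response
    finally:
        # Always clear, even if the handler raises, so the ID cannot leak into another request.
        structlog.contextvars.clear_contextvars()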
Now every log.info(...) inside that request automatically includes the request ID. Trace through the logs of a failed request in seconds.
What to log
- Errors and warnings with full context (user/tenant ID, parameters, error type).
- Slow requests (>1s) — with timing breakdown.
- Auth events — logins, failures, role changes. (At a level you can audit later.)
- Data mutations at boundaries — payments, deletes, exports.
What NOT to log:
- Secrets. Never log tokens, passwords, API keys, or JWT contents. Check log statements for them during code review.
- Every request. Your access log already does this; structured logs are for interesting events.
- PII without thought — at minimum, scrub it for cross-team views.
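One way to enforce the "no secrets" rule in code rather than by convention is a redaction step that runs before a log event is rendered. The sketch below follows the structlog processor signature `(logger, method_name, event_dict)`, so it could be placed in the `processors` list passed to `structlog.configure`. The key list in `SENSITIVE_KEYS` is an assumption for illustration; adapt it to the field names your code actually uses.

```python
# Hypothetical set of field names to treat as secret; extend for your codebase.
SENSITIVE_KEYS = {"password", "token", "api_key", "authorization", "secret"}


def redact_secrets(logger, method_name, event_dict):
    """Replace the values of sensitive-looking keys before the event is rendered."""
    for key in list(event_dict):
        if key.lower() in SENSITIVE_KEYS:
            event_dict[key] = "[REDACTED]"
    return event_dict


# Called directly here to show the effect; a logging pipeline would call it for you.
event = redact_secrets(
    None,
    "info",
    {"event": "login_attempted", "user_id": 42, "password": "hunter2"},
)
```

This only catches secrets passed as named fields. A token embedded inside a free-text message would still get through, which is one more reason to keep messages short and put data in fields.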
Where logs go
- Tiny scale: stdout/stderr → systemd journal → grep.
- Small/medium scale: ship to a hosted service (Better Stack, Papertrail, Logtail).
- Production: centralized aggregation — Loki (lightweight, integrates with Prometheus/Grafana), Elasticsearch/OpenSearch (powerful but heavy), CloudWatch / Cloud Logging (managed, cloud-locked).
Don’t try to grep tens of GBs of logs over SSH at 3 AM. You will lose.
Metrics: dashboards and alerts live here
Metrics are time series — for each name, a series of (timestamp, value) pairs, optionally with labels.
http_requests_total{method="GET", path="/users", status="200"} = 12483
http_request_duration_seconds_bucket{path="/users", le="0.1"} = 11200
db_connections_in_use{pool="primary"} = 7
In 2026, the de facto standard is Prometheus (or a Prometheus-compatible system: VictoriaMetrics, Mimir, Cortex, Amazon Managed Service for Prometheus).
The four metric types
- Counter: only goes up. Example: http_requests_total. Use rate() to get a per-second rate.
- Gauge: can go up or down. Examples: queue_depth, db_connections_in_use, memory_bytes.
- Histogram: buckets of observations. Example: http_request_duration_seconds. Lets you compute percentiles (p50, p95, p99).
- Summary: percentiles pre-computed at the source. Less flexible than histograms; usually prefer histograms.
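A histogram stores only cumulative bucket counts, so percentiles are estimates rather than exact values. The sketch below shows the general idea behind that estimate: find the bucket containing the target rank and interpolate linearly inside it. It is a simplified illustration, not Prometheus's actual implementation, and the function name and bucket values are assumptions.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with (math.inf, total).
    """
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for upper, count in buckets:
        if count >= rank:
            in_bucket = count - prev_count
            if math.isinf(upper) or in_bucket == 0:
                # Cannot interpolate into an open-ended or empty bucket.
                return prev_bound
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return prev_bound


# Hypothetical latency histogram: 1000 requests, bounds in seconds.
latency_buckets = [(0.1, 800), (0.5, 950), (1.0, 990), (math.inf, 1000)]
p50 = estimate_quantile(0.50, latency_buckets)
p95 = estimate_quantile(0.95, latency_buckets)
```

The accuracy depends entirely on where the bucket boundaries sit, so it helps to place boundaries near the latencies you alert on.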
What to instrument: RED and USE
Two acronyms that cover what you actually need:
- RED for services: Rate, Errors, Duration. For every endpoint.
- USE for resources: Utilization, Saturation, Errors. For CPU, memory, disk, network, DB connections.
If you have RED on every service and USE on every resource, you can debug almost any production issue.
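To show what the three RED numbers mean, here is a minimal sketch that derives them from a window of raw request records. In practice a metrics system computes these from counters and histograms; the function name `red_summary` and the `(status_code, duration_seconds)` record shape are assumptions for illustration.

```python
def red_summary(requests, window_seconds):
    """Compute Rate, Errors, and Duration for one endpoint over a time window.

    `requests` is a list of (status_code, duration_seconds) tuples.
    """
    total = len(requests)
    if total == 0:
        return {"rate": 0.0, "error_ratio": 0.0, "p95_seconds": None}
    errors = sum(1 for status, _ in requests if status >= 500)
    durations = sorted(duration for _, duration in requests)
    # Nearest-rank p95 using integer arithmetic to avoid float rounding.
    p95_index = (95 * total + 99) // 100 - 1
    return {
        "rate": total / window_seconds,       # requests per second
        "error_ratio": errors / total,        # share of 5xx responses
        "p95_seconds": durations[p95_index],  # duration most requests stay under
    }


# Hypothetical window: nine fast successes and one slow server error in 10 seconds.
window = [(200, 0.1)] * 9 + [(500, 2.0)]
summary = red_summary(window, window_seconds=10)
```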
Avoid the cardinality trap
Every unique label-value combination is a separate time series.
# BAD — user_id can be millions of values
http_requests_total{path="/users", user_id="42"}
# GOOD — bucketed status
http_requests_total{path="/users", status="200"}
A few hundred unique values per label is fine. A few million will OOM your metrics store. Never use unbounded values (user IDs, request IDs, error messages) as labels.
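The arithmetic behind the cardinality trap is simple multiplication, and the usual fix for path labels is to collapse IDs into a placeholder. The sketch below illustrates both. The helper names, the label counts, and the regular expression (which handles only numeric IDs and UUID-shaped segments) are assumptions, not a complete solution.

```python
import math
import re


def series_count(distinct_values_per_label):
    """Worst-case number of time series: the product of distinct values per label."""
    return math.prod(distinct_values_per_label.values())


# Matches a path segment that is all digits or shaped like a UUID.
_ID_SEGMENT = re.compile(r"/(\d+|[0-9a-fA-F]{8}-[0-9a-fA-F-]{27})(?=/|$)")


def normalize_path(path):
    """Replace ID-like path segments so the label has a bounded set of values."""
    return _ID_SEGMENT.sub("/{id}", path)


# Hypothetical label sets for one metric.
bounded = series_count({"method": 3, "path": 20, "status": 5})
unbounded = series_count({"method": 3, "path": 20, "status": 5, "user_id": 1_000_000})

label = normalize_path("/users/42/orders/1001")
```

If your framework exposes the matched route template, labeling with that is more reliable than pattern matching on the raw path.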
Instrumenting a Python app
from fastapi import FastAPI, Request, Response
from prometheus_client import Counter, Histogram, generate_latest, CONTENT_TYPE_LATEST

app = FastAPI()

REQUEST_COUNT = Counter(
    "http_requests_total", "Total HTTP requests",
    ["method", "path", "status"],
)
REQUEST_LATENCY = Histogram(
    "http_request_duration_seconds", "HTTP request latency",
    ["method", "path"],
)

@app.middleware("http")
async def metrics_middleware(request: Request, call_next):
    # Caution: the raw URL path is unbounded if it contains IDs (/users/42).
    # Label with the route template instead when paths are parameterized.
    path = request.url.path
    with REQUEST_LATENCY.labels(request.method, path).time():
        response = await call_next(request)
    REQUEST_COUNT.labels(request.method, path, str(response.status_code)).inc()
    return response

@app.get("/metrics")
def metrics():
    # Render all registered metrics in the Prometheus text format.
    return Response(generate_latest(), media_type=CONTENT_TYPE_LATEST)
Prometheus scrapes /metrics on a configured interval (15 to 60 seconds is typical). Grafana queries Prometheus to render dashboards.
Alerts
Metrics only matter if someone gets paged when they go bad. A few alerts every backend should have:
- Error rate > X% for Y minutes: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.01
- p99 latency > X ms for Y minutes: histogram_quantile(0.99, ...) > 1.0
- Database connections saturated: db_connections_in_use / db_pool_size > 0.9
- Disk space < 10%
- Process restarts: your service is crash-looping
- Queue depth growing unbounded (for Celery/RQ/etc.)
Tune until you trust them. False pages are the fastest way to make a team ignore real ones.
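The "for Y minutes" part of an alert is what separates a brief spike from a sustained problem, and it is the main lever for cutting false pages. Here is a minimal sketch of that logic, similar in spirit to the `for:` clause in Prometheus alerting rules. The function name and the sample data are assumptions.

```python
def should_fire(samples, threshold, for_seconds):
    """Return True if the value has stayed above `threshold` continuously
    for at least `for_seconds`, up to and including the latest sample.

    `samples` is a time-ordered list of (timestamp_seconds, value) pairs.
    """
    breach_started = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_started is None:
                breach_started = timestamp  # first sample of the current breach
        else:
            breach_started = None  # any healthy sample resets the clock
    if breach_started is None:
        return False
    return samples[-1][0] - breach_started >= for_seconds


# Hypothetical error ratios sampled once a minute.
brief_spike = [(0, 0.0), (60, 0.05), (120, 0.0), (180, 0.02)]
sustained = [(0, 0.02), (60, 0.03), (120, 0.02), (180, 0.04), (240, 0.02), (300, 0.03)]

spike_fires = should_fire(brief_spike, threshold=0.01, for_seconds=300)
sustained_fires = should_fire(sustained, threshold=0.01, for_seconds=300)
```

A longer `for_seconds` means fewer false pages but slower detection, so the right value depends on how quickly someone can act on the alert anyway.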
Traces: where did the time go?
In a single-service app, the slow part is usually obvious. In a microservice setup, “slow API” could be any of 12 services, 3 caches, 2 databases, or the network in between.
Distributed tracing assigns each request a trace ID that’s propagated across all services it touches. Each span (a unit of work) records start time, duration, and parent span. The trace UI shows you a flame graph of the whole request.
[ ────────────── /api/orders (180ms) ──────────────────────── ]
[ auth (12ms) ]
[ ── db: load order (90ms) ── ]
[ stripe (60ms) ]
[ db: write log (5ms) ]
Now “this endpoint is slow” becomes “loading the order from the database takes half the request time.”
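What a trace UI does with those spans can be sketched in a few lines: subtract each span's children from its own duration to get its self time, then look for the largest. The `Span` class and `self_times` function are illustrative assumptions, not the OpenTelemetry data model, and the sketch assumes child spans run one after another without overlapping.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    duration_ms: float
    children: list["Span"] = field(default_factory=list)


def self_times(span):
    """Return {span name: time spent in the span itself, excluding its children}.

    Assumes children run sequentially; overlapping children would need
    interval arithmetic instead of a simple sum.
    """
    result = {span.name: span.duration_ms - sum(c.duration_ms for c in span.children)}
    for child in span.children:
        result.update(self_times(child))
    return result


# The request from the diagram above, expressed as a span tree.
trace_root = Span("/api/orders", 180, [
    Span("auth", 12),
    Span("db: load order", 90),
    Span("stripe", 60),
    Span("db: write log", 5),
])

times = self_times(trace_root)
slowest = max(times, key=times.get)
```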
OpenTelemetry: the standard
OpenTelemetry (OTel) is the vendor-neutral standard for traces (and metrics, and logs). Most major languages, frameworks, and observability backends support it.
In Python:
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install
Run with auto-instrumentation:
opentelemetry-instrument \
    --traces_exporter otlp \
    --metrics_exporter otlp \
    --service_name my-api \
    uvicorn app.main:app
It auto-instruments common libraries (FastAPI, requests, httpx, SQLAlchemy, psycopg, redis, etc.). Send to Jaeger, Tempo, Honeycomb, Datadog APM — the OTLP wire protocol works with all of them.
Manual instrumentation when you need it:
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def expensive(data):
    # do_work stands in for your own function.
    with tracer.start_as_current_span("compute_thing") as span:
        span.set_attribute("input.size", len(data))
        result = do_work(data)
        span.set_attribute("output.size", len(result))
        return result
Sampling
Tracing every request at scale is expensive and noisy. Sample:
- Head-based sampling — decide at the request entry point (e.g. 1% of all requests).
- Tail-based sampling — collect all traces, decide whether to keep them after they finish (keep all errors, slow requests, sample fast ones).
Tail-based is better quality but harder operationally. Head-based at 1-10% is fine for most teams.
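The two strategies differ mainly in what information is available when the decision is made. The sketch below illustrates both: a head-based decision derived only from the trace ID (so every service reaches the same answer), and a tail-based decision that can also look at the outcome. The function names, the trace dictionary shape, and the thresholds are assumptions; real deployments usually configure this in the SDK or collector rather than hand-rolling it.

```python
import hashlib


def head_sample(trace_id: str, rate: float) -> bool:
    """Decide at request entry, using only the trace ID.

    Hashing makes the decision deterministic, so every service that sees
    the same trace ID keeps or drops the trace consistently.
    """
    digest = hashlib.sha256(trace_id.encode()).digest()
    position = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return position < rate


def tail_keep(trace: dict, slow_ms: float = 1000.0, baseline_rate: float = 0.01) -> bool:
    """Decide after the trace has finished, when its outcome is known."""
    if trace["error"] or trace["duration_ms"] >= slow_ms:
        return True  # always keep the interesting traces
    return head_sample(trace["trace_id"], baseline_rate)  # sample the rest


# Hypothetical finished traces.
failed = {"trace_id": "a1", "error": True, "duration_ms": 40}
slow = {"trace_id": "b2", "error": False, "duration_ms": 2300}
decisions = [tail_keep(failed), tail_keep(slow)]
```

The operational cost of tail-based sampling comes from having to buffer every span of a trace somewhere until the trace completes, which the sketch sidesteps by assuming the finished trace is already in hand.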
The tooling stack you probably want
For a typical small-to-medium backend in 2026:
- Logs: Loki + Grafana, or a hosted alternative.
- Metrics: Prometheus + Grafana, or a hosted Prometheus.
- Traces: Tempo + Grafana, or Jaeger.
- Errors: Sentry. (Yes, even with logs and traces — Sentry’s automatic grouping is unmatched.)
- Uptime: Better Stack, UptimeRobot, Pingdom — external pings.
The “Grafana stack” (Loki, Prometheus, Tempo, Mimir, all visualized in Grafana) is the strongest open-source story. Hosted options (Grafana Cloud, Datadog, New Relic, Honeycomb) trade money for less ops.
For a small team, start with Sentry + a hosted log service + a hosted metrics service. The unit cost is low; the time saved is enormous. Self-host when scale demands it.
Health checks: the entry point to all of this
Your service should expose:
- /healthz: fast liveness. Returns 200 if the process is up.
- /readyz: readiness. Returns 200 only if the service can actually serve traffic (DB reachable, caches warm).
- /metrics: Prometheus scrape endpoint.
Load balancers and orchestrators (Kubernetes, Nomad) use the first two; Prometheus uses the third.
A “minimum viable observability” checklist
For any service going to production:
- Structured logs (JSON) at INFO level by default.
- Request ID middleware that attaches an ID to every log line in a request.
- /healthz and /readyz endpoints.
- /metrics endpoint with at least RED metrics for HTTP and queue depth for any background workers.
- Alerts for: 5xx rate, p99 latency, DB connection saturation, disk space.
- Sentry (or equivalent) for unhandled exceptions, with release tagging tied to your CI version.
- Uptime monitor pinging /healthz from outside your network.
That’s the minimum. Most teams stop here for a long time and that’s okay.
Conclusion
Observability is the difference between debugging in production with confidence and guessing in a panic. Three pillars, complementary not redundant: logs for the story, metrics for the dashboard, traces for the call graph. Get the basics in place from day one — it’s far easier than retrofitting after an incident.
For more on the platform layer, see Kubernetes for App Developers. For the deploy pipeline, see GitHub Actions CI/CD for Python Apps.
Happy observing!
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work; see my projects and ways to reach me at rajpoot.dev.