Log volume in 2026 is brutal. A small SaaS generates hundreds of GB/day; a mid-size one, terabytes. The choice of log aggregator decides whether logs are useful or unaffordable. This post compares the main options.
The contenders
| Tool | Type | Strengths | Cost |
|---|---|---|---|
| Grafana Loki | Index labels, store raw | Cheap; Grafana native | $-$$ |
| ClickHouse | OLAP DB for logs | Fastest queries; SQL | $-$$ |
| OpenSearch / Elasticsearch | Full-text index | Search-heavy | $$-$$$ |
| Datadog Logs | Managed | Polished; expensive | $$$-$$$$ |
| Splunk | Enterprise | Mature; very expensive | $$$$ |
| Honeycomb | Wide events | Event-shape native | $$ |
Loki
Designed by Grafana Labs. Indexes only labels (service, level, env); stores raw log lines compressed.
Pros:
- Storage cost comparable to S3 (uses object storage).
- Grafana-native — same UI as metrics + traces.
- LogQL is decent for label-based filtering.
Cons:
- Slow at full-text queries vs Elastic. Optimized for “filter by label, scan small subset.”
- Cardinality limits: too many distinct label values explode the number of streams, which bloats the index and slows queries.
Best for: medium-volume teams that want cheap logs with Grafana.
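Why cardinality bites: Loki creates one stream per unique combination of label values, so the worst case is the product of the distinct values per label. A back-of-the-envelope sketch, with hypothetical label counts and a hypothetical helper name:

```python
from math import prod

def estimated_streams(label_values: dict[str, int]) -> int:
    """Upper bound on streams: one per unique combination of label values."""
    return prod(label_values.values())

# Hypothetical label sets; the counts are illustrative, not measured.
bounded = {"service": 20, "level": 4, "env": 3}
with_user_id = {**bounded, "user_id": 10_000}

print(estimated_streams(bounded))       # 240
print(estimated_streams(with_user_id))  # 2400000
```

One unbounded label turns a few hundred streams into millions. Values like that belong in the log line, not in a label.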
ClickHouse for logs
A growing pattern. ClickHouse is OLAP; logs are event tables; SQL queries them.
SELECT tenant_id, count() AS errors
FROM logs
WHERE service = 'api'
  AND level = 'error'
  AND ts > now() - INTERVAL 1 HOUR
GROUP BY tenant_id
ORDER BY errors DESC
LIMIT 20;
Faster than the other options here for “tell me X across millions of rows.” The OTel collector can export directly to ClickHouse (via the ClickHouse exporter in the contrib distribution). See Observability 2.0.
Best for: teams comfortable with SQL who want maximum query power.
OpenSearch / Elasticsearch
Full-text search on logs. Powerful, expensive at scale.
Best for: log analytics where free-text search dominates queries.
Datadog
Managed. Polished UI. Massive cost: some teams end up spending as much as half of their cloud bill on Datadog logs.
Best for: teams with budget who don’t want to operate their own logging.
Decision matrix
| Scenario | Pick |
|---|---|
| Greenfield, on Grafana | Loki |
| Want SQL on logs | ClickHouse |
| Heavy free-text search | OpenSearch |
| Have budget; want managed | Datadog |
| Wide-events shape | Honeycomb / ClickHouse |
| < 1 GB / day | any (or stdout + grep) |
Cost reality
For 1 TB/day:
- Datadog: $5k–10k/month.
- Self-hosted Loki + S3: $200–500/month.
- Self-hosted ClickHouse: $300–800/month.
The gap of roughly 10× or more explains why teams move off Datadog at scale.
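To see where that multiple comes from, here is the arithmetic on the ranges above. The numbers are copied from the list; the helper name is hypothetical.

```python
# Monthly cost ranges (USD) for 1 TB/day, copied from the list above.
costs = {
    "Datadog": (5_000, 10_000),
    "Self-hosted Loki + S3": (200, 500),
    "Self-hosted ClickHouse": (300, 800),
}

def ratio_range(expensive, cheap):
    """Smallest and largest possible cost ratio between two (low, high) ranges."""
    return expensive[0] / cheap[1], expensive[1] / cheap[0]

for name in ("Self-hosted Loki + S3", "Self-hosted ClickHouse"):
    low, high = ratio_range(costs["Datadog"], costs[name])
    print(f"Datadog vs {name}: {low:.1f}x to {high:.1f}x")
```

Against Loki the range works out to 10x to 50x. Against ClickHouse the low end is closer to 6x, so treat 10× as a rule of thumb rather than a floor.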
Patterns that work
Sample low-value logs
Not every line is valuable. Sample debug logs aggressively:
import random

if random.random() < 0.01:  # keep roughly 1% of debug lines
    log.debug(...)
Errors and warnings: 100%. Info: 10%. Debug: 1%.
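Those per-level rates can live in one place instead of being scattered through call sites. A minimal sketch using the standard library's `logging.Filter`; the class name `LevelSampler` and the logger name are hypothetical, and this samples in the application rather than at the collector:

```python
import logging
import random

# Keep rates per level, from the text: errors/warnings 100%, info 10%, debug 1%.
SAMPLE_RATES = {
    logging.DEBUG: 0.01,
    logging.INFO: 0.10,
}

class LevelSampler(logging.Filter):
    """Drop a random share of low-severity records; keep everything else."""

    def __init__(self, rates=SAMPLE_RATES, rng=random.random):
        super().__init__()
        self.rates = rates
        self.rng = rng  # injectable so behaviour can be tested deterministically

    def filter(self, record: logging.LogRecord) -> bool:
        # Levels not listed in the table (warning, error, critical) are always kept.
        rate = self.rates.get(record.levelno, 1.0)
        return self.rng() < rate

logger = logging.getLogger("sampled")
logger.addFilter(LevelSampler())
```

Random sampling keeps volumes predictable but can drop the one debug line you wanted. If that matters, sample per request or trace instead so related lines survive together.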
Drop high-cardinality labels at the collector
The OTel collector's attributes processor can strip labels that explode cardinality:
processors:
  attributes/drop:
    actions:
      - key: request_id
        action: delete
Keep labels you’ll group by; drop ones that are noise.
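To decide which keys are noise, measure cardinality on a sample before writing the collector config. A rough sketch, assuming log attributes are available as plain dicts with hashable values; the function name, sample data, and threshold are all hypothetical:

```python
from collections import defaultdict

def high_cardinality_keys(records, threshold):
    """Return attribute keys whose number of distinct values exceeds `threshold`."""
    distinct = defaultdict(set)
    for record in records:
        for key, value in record.items():
            distinct[key].add(value)
    return sorted(k for k, values in distinct.items() if len(values) > threshold)

# Hypothetical sample of log attributes.
sample = [
    {"service": "api", "level": "error", "request_id": f"req-{i}"}
    for i in range(1000)
]
print(high_cardinality_keys(sample, threshold=100))  # ['request_id']
```

Anything this flags is a candidate for a `delete` action, unless it is a key you actually group by.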
Structured from day one
log.info("user_action", user_id=42, action="login", duration_ms=120)
Not strings to parse. JSON / structured. SQL-able. Searchable.
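The keyword-argument call above is the style of libraries such as structlog. If you are on the standard library only, a similar result is possible with a custom formatter. A minimal sketch; the `JsonFormatter` class and the `fields` key passed through `extra=` are hypothetical names, not a standard convention:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # `fields` is a hypothetical attribute name supplied via `extra=`.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("app")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("user_action", extra={"fields": {"user_id": 42, "action": "login", "duration_ms": 120}})
```

Each line comes out as a flat JSON object, so `user_id` and `duration_ms` become columns or fields downstream instead of substrings to regex out.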
Retention tiers
- Hot (last 14 days): in the queryable store.
- Warm (14–90 days): cheaper tier or aggregated.
- Cold: S3 archive for compliance.
Most queries hit the last 7 days. Pay for hot only what you need; archive the rest cheaply.
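The tiering rule is simple enough to encode directly, for example in a nightly job that decides which daily partitions to move. A sketch under assumptions: the function name is hypothetical, and treating day 14 as hot and day 90 as warm is my reading of the boundaries, not something the tiers above pin down.

```python
from datetime import date

HOT_DAYS = 14   # hot window from the list above
WARM_DAYS = 90  # warm window from the list above

def retention_tier(partition_day: date, today: date) -> str:
    """Classify a daily partition as hot, warm, or cold by its age in days."""
    age = (today - partition_day).days
    if age < 0:
        raise ValueError("partition is dated in the future")
    if age <= HOT_DAYS:
        return "hot"
    if age <= WARM_DAYS:
        return "warm"
    return "cold"

today = date(2026, 3, 1)
print(retention_tier(date(2026, 2, 25), today))  # hot
print(retention_tier(date(2026, 1, 10), today))  # warm
print(retention_tier(date(2025, 10, 1), today))  # cold
```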
What I’d ship today
For a 2026 backend service generating ~500 GB/day:
- OTel collector at the edge.
- ClickHouse for storage + SQL queries.
- Grafana for dashboards + log exploration.
- S3 archive of older partitions.
- Sampling at the collector.
End-to-end self-hosted on a couple of beefy nodes. ~$500/month vs ~$5k/month managed.
Read this next
- OpenTelemetry End-to-End in 2026
- Observability 2.0
- Kubernetes Cost Engineering
- Cloud Cost Optimization in 2026
If you want my OTel + ClickHouse + Grafana log stack, it’s at rajpoot.dev.