Log volume in 2026 is brutal. A small SaaS generates hundreds of GB/day; a mid-size one, terabytes. The choice of log aggregator decides whether logs are useful or unaffordable. This post compares the main options.

The contenders

Tool | Type | Strengths | Cost
Grafana Loki | Index labels, store raw | Cheap; Grafana native | $-$$
ClickHouse | OLAP DB for logs | Fastest queries; SQL | $-$$
OpenSearch / Elasticsearch | Full-text index | Search-heavy | $$-$$$
Datadog Logs | Managed | Polished; expensive | $$$-$$$$
Splunk | Enterprise | Mature; very expensive | $$$$
Honeycomb | Wide events | Event-shape native | $$

Loki

Designed by Grafana Labs. Indexes only labels (service, level, env); stores raw log lines compressed.

Pros:

  • Storage cost comparable to S3 (uses object storage).
  • Grafana-native — same UI as metrics + traces.
  • LogQL is decent for label-based filtering.

Cons:

  • Slow at full-text queries vs Elastic. Optimized for “filter by label, scan small subset.”
  • Cardinality limits: every distinct combination of label values becomes its own stream, so too many distinct values bloat the index and slow queries.
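
To make the cardinality point concrete, here is a rough back-of-the-envelope sketch. It assumes the worst case, where every combination of label values actually occurs, so the result is an upper bound and not a measurement. The label names and counts are hypothetical.

```python
from math import prod

def estimate_streams(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on distinct streams: product of distinct values per label."""
    return prod(label_cardinalities.values())

# Hypothetical label sets, for illustration only.
bounded = {"service": 20, "level": 4, "env": 3}
with_request_id = {**bounded, "request_id": 1_000_000}

print(estimate_streams(bounded))          # 240
print(estimate_streams(with_request_id))  # 240000000
```

One unbounded label is enough to turn a few hundred streams into hundreds of millions, which is why values like request IDs belong in the log line and not in a label.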

Best for: medium-volume teams that want cheap logs with Grafana.

ClickHouse for logs

A growing pattern. ClickHouse is OLAP; logs are event tables; SQL queries them.

-- Top 20 tenants by error count in the last hour
SELECT tenant_id, count() AS errors
FROM logs
WHERE service = 'api'
  AND level = 'error'
  AND ts > now() - INTERVAL 1 HOUR
GROUP BY tenant_id
ORDER BY errors DESC
LIMIT 20;

Faster than anything else for “tell me X across millions of rows.” The OTel Collector can export logs directly to ClickHouse. See Observability 2.0.

Best for: teams comfortable with SQL who want maximum query power.

OpenSearch / Elasticsearch

Full-text search on logs. Powerful, expensive at scale.

Best for: log analytics where free-text search dominates queries.

Datadog

Managed. Polished UI. Massive cost. Many teams spend as much as 50% of their cloud bill on Datadog logs.

Best for: teams with budget who don’t want to operate their own logging.

Decision matrix

Scenario | Pick
Greenfield, on Grafana | Loki
Want SQL on logs | ClickHouse
Heavy free-text search | OpenSearch
Have budget; want managed | Datadog
Wide-events shape | Honeycomb / ClickHouse
< 1 GB/day | any (or stdout + grep)

Cost reality

For 1 TB/day:

  • Datadog: $5k–10k/month.
  • Self-hosted Loki + S3: $200–500/month.
  • Self-hosted ClickHouse: $300–800/month.

That roughly 10× gap is why teams move off Datadog at scale.
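
As a quick sanity check on that multiplier, this sketch compares the midpoints of the ranges listed above. The figures are the rough estimates from this post, not vendor quotes, and using the midpoint is my own simplifying assumption.

```python
# Monthly cost ranges in USD for 1 TB/day, copied from the list above.
COSTS = {
    "Datadog": (5_000, 10_000),
    "Self-hosted Loki + S3": (200, 500),
    "Self-hosted ClickHouse": (300, 800),
}

def midpoint(cost_range):
    low, high = cost_range
    return (low + high) / 2

def ratio_vs(baseline: str, other: str) -> float:
    """How many times more expensive `baseline` is than `other`, by midpoint."""
    return midpoint(COSTS[baseline]) / midpoint(COSTS[other])

for name in ("Self-hosted Loki + S3", "Self-hosted ClickHouse"):
    print(f"Datadog vs {name}: {ratio_vs('Datadog', name):.1f}x")
```

By midpoint, the gap comes out at about 21x for Loki and about 14x for ClickHouse. Comparing the cheapest Datadog figure with the priciest self-hosted one narrows it to 6x to 10x, and none of this counts the engineering time needed to run the self-hosted stack.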

Patterns that work

Sample low-value logs

Not every line is valuable. Sample debug logs aggressively:

import random

if random.random() < 0.01:    # 1% sampling
    log.debug(...)            # `log` is your application's logger

Errors and warnings: 100%. Info: 10%. Debug: 1%.
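
Those per-level rates can live in one place instead of being scattered through the code. A minimal sketch using the standard library's `logging.Filter`, assuming the rates above; the class name `LevelSampler` and the logger name are made up for this example.

```python
import logging
import random

# Keep-rates from the text: warnings and errors 100%, info 10%, debug 1%.
SAMPLE_RATES = {logging.DEBUG: 0.01, logging.INFO: 0.10}

class LevelSampler(logging.Filter):
    """Drop a random share of low-severity records; keep WARNING and above."""

    def __init__(self, rates=SAMPLE_RATES, rng=None):
        super().__init__()
        self.rates = rates
        self.rng = rng or random.Random()

    def filter(self, record: logging.LogRecord) -> bool:
        rate = self.rates.get(record.levelno, 1.0)  # unlisted levels are kept
        return self.rng.random() < rate

logger = logging.getLogger("sampled_app")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
handler = logging.StreamHandler()
handler.addFilter(LevelSampler())
logger.addHandler(handler)

logger.error("payment failed")  # always emitted
logger.debug("cache probe")     # emitted about 1% of the time
```

Attaching the filter to the handler keeps call sites unchanged. The trade-off is that sampled-out records are still constructed before being dropped, so for very hot paths the inline check may still be cheaper.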

Drop high-cardinality labels at the collector

The OTel Collector's attributes processor can strip attributes that explode cardinality:

processors:
  attributes/drop:
    actions:
      - key: request_id
        action: delete

Keep labels you’ll group by; drop ones that are noise.
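
Deciding which keys are noise is easier with numbers. Here is a small sketch that scans a sample of log records and flags keys with too many distinct values. The threshold of 100 and the record shape (flat dicts with hashable values) are assumptions for illustration, and the function is not part of any collector API.

```python
from collections import defaultdict

def high_cardinality_keys(records, max_distinct=100):
    """Return attribute keys whose distinct-value count exceeds max_distinct."""
    seen = defaultdict(set)
    for record in records:
        for key, value in record.items():
            seen[key].add(value)
    return sorted(k for k, values in seen.items() if len(values) > max_distinct)

# Synthetic sample: 1,000 records with a unique request_id each.
sample = [
    {"service": "api", "level": "error", "request_id": f"req-{i}"}
    for i in range(1000)
]
print(high_cardinality_keys(sample))  # ['request_id']
```

Run something like this against an hour of real logs before writing the collector config, so the drop list is based on data and not on guesses.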

Structured from day one

log.info("user_action", user_id=42, action="login", duration_ms=120)

Not strings to parse later. JSON or another structured format: SQL-able and searchable.
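
The call above uses keyword arguments in the style of a structured-logging library. If you only have Python's standard `logging`, a rough equivalent is a JSON formatter plus the `extra` parameter. This is a sketch: the `fields` key, the `JsonFormatter` class, and the output field names (`ts`, `level`, `event`) are my own conventions, not a standard.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname.lower(),
            "event": record.getMessage(),
        }
        # Fields passed via `extra={"fields": {...}}` land on the record.
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)

logger = logging.getLogger("structured_app")  # hypothetical logger name
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

logger.info(
    "user_action",
    extra={"fields": {"user_id": 42, "action": "login", "duration_ms": 120}},
)
```

Each line then arrives at the store as a flat JSON object, so `user_id` and `duration_ms` can become columns instead of substrings to regex out.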

Retention tiers

  • Hot (last 14 days): in the queryable store.
  • Warm (14–90 days): cheaper tier or aggregated.
  • Cold: S3 archive for compliance.

Most queries hit the last 7 days. Pay for hot only what you need; archive the rest cheaply.
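
The tier boundaries reduce to a simple age check. A minimal sketch, assuming the 14-day and 90-day cutoffs above and treating a log that is exactly on a boundary as belonging to the warmer tier; how you actually move data between tiers depends on your store.

```python
from datetime import datetime, timedelta, timezone

HOT_DAYS = 14   # from the tiers above
WARM_DAYS = 90

def retention_tier(log_time: datetime, now: datetime) -> str:
    """Classify a log timestamp as 'hot', 'warm', or 'cold' by age."""
    age = now - log_time
    if age <= timedelta(days=HOT_DAYS):
        return "hot"
    if age <= timedelta(days=WARM_DAYS):
        return "warm"
    return "cold"

now = datetime(2026, 1, 31, tzinfo=timezone.utc)  # fixed date for the example
for days_old in (3, 30, 200):
    print(days_old, retention_tier(now - timedelta(days=days_old), now))
```

This prints `3 hot`, `30 warm`, and `200 cold`. A scheduled job could use the same function to decide which partitions to downsample or ship to S3.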

What I’d ship today

For a 2026 backend service generating ~500 GB/day:

  • OTel collector at the edge.
  • ClickHouse for storage + SQL queries.
  • Grafana for dashboards + log exploration.
  • S3 archive of older partitions.
  • Sampling at the collector.

End-to-end self-hosted on a couple of beefy nodes. ~$500/month vs ~$5k/month managed.

Read this next

If you want my OTel + ClickHouse + Grafana log stack, it’s at rajpoot.dev.

