ClickHouse pitfalls.

Per-row INSERT

for row in rows:
    client.insert("events", [row])    # SLOW

Fix: batch.

Treating as OLTP

CH does not support UPDATE/DELETE well. Mutations are async + heavy.

Too many partitions

PARTITION BY toDate(ts) on table with years → 10000+ partitions → slow.

Use month: toYYYYMM(ts).

Wrong ORDER BY

ORDER BY (event, user_id, ts)

Queries usually filter by user_id first → bad. Should be:

ORDER BY (user_id, ts, event)

Joining big tables

CH joins less optimized. Use:

  • Dictionaries.
  • Denormalization.
  • Materialized views.

SELECT *

Loads all columns from disk. Be specific.

Nullable when not needed

Adds overhead. Use default values.

High-cardinality LowCardinality

url LowCardinality(String)   -- BAD if millions of unique URLs

OPTIMIZE FINAL frequently

Locks parts; long operation. Only when really needed.

ALTER UPDATE / DELETE

Async, expensive. Avoid. Re-design schema if needed.

Distributed insert at high QPS

Distributed table forwards to shards. Overhead. Insert directly to local shard.

Forgetting replication

ReplicatedMergeTree only with Keeper / ZK. Without: single point of failure.

Memory limits

max_memory_usage = 10GB

Default may be too low for big queries.

Inappropriate compression

ZSTD on hot path → CPU bound.

LZ4 on cold archive → wasted space.

Missing skip index

Filtering frequently on non-key column → COLLSCAN. Add skip index.

Not testing on prod-scale data

Performance characteristics change with data size. Test on representative volumes.

Storage policy confusion

Hot/cold tiering misconfigured → data lands wrong tier.

Materialized view backfill missing

MV only sees future. Manual INSERT SELECT for historical.

Mixing sync vs async insert

SET async_insert = 1

Per-row works but inconsistent perf.

Common mistakes

  • Treating CH like Postgres.
  • Per-row workflows.
  • Joining everything.
  • Random ORDER BY.
  • No monitoring of merges / mutations.

Read this next

If you want my CH troubleshooting checklist, it’s at rajpoot.dev .


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .