ClickHouse query optimization.
Primary key (ORDER BY)
ORDER BY (date, user_id, event)
Queries filtering on prefix use primary index → fast.
WHERE date = today() AND user_id = 1 -- fast
WHERE event = 'x' -- slow (last in key)
Skip index
ALTER TABLE events ADD INDEX idx_event event TYPE bloom_filter(0.01) GRANULARITY 4;
For non-key filter columns.
Projections
ALTER TABLE events ADD PROJECTION top_users (
SELECT user_id, count()
GROUP BY user_id
);
CH auto-uses for matching queries.
Sampling
SELECT count() FROM events SAMPLE 0.1 -- 10% sample
Requires SAMPLE BY in table.
LIMIT BY
SELECT user_id, event, ts FROM events ORDER BY ts DESC LIMIT 5 BY user_id
Top 5 per user.
arrayJoin / explode
SELECT arrayJoin([1,2,3]) AS x
uniqCombined / uniqExact
uniqCombined(user_id) -- HyperLogLog, fast, approximate
uniqExact(user_id) -- exact, more memory
quantile
quantile(0.99)(latency)
quantileExact / quantileTDigest
any / argMax
SELECT user_id, argMax(name, version) AS latest_name FROM users GROUP BY user_id
anyLast / first_value
For pre-aggregated picks.
arrayMap / filter
SELECT arrayMap(x -> x * 2, [1,2,3])
SELECT arrayFilter(x -> x > 5, [1,5,10])
JOIN strategies
SELECT * FROM events ANY LEFT JOIN users USING user_id -- get first match
SELECT * FROM events INNER JOIN users ON ...
ClickHouse JOINs less optimized than Postgres. Prefer denormalized or dictGet.
Dictionaries (precomputed joins)
CREATE DICTIONARY users_dict (
user_id UInt32, name String, country String
) PRIMARY KEY user_id
SOURCE(CLICKHOUSE(table 'users'))
LAYOUT(HASHED())
LIFETIME(3600);
-- Use
SELECT dictGet('users_dict', 'name', user_id) FROM events;
In-memory lookup. Replaces small-table joins.
Materialized columns
ALTER TABLE events ADD COLUMN day Date MATERIALIZED toDate(ts);
Computed on insert. Indexable.
EXPLAIN
EXPLAIN syntax SELECT ...;
EXPLAIN AST SELECT ...;
EXPLAIN PIPELINE SELECT ...;
EXPLAIN ESTIMATE SELECT ...;
system.parts
SELECT name, rows, bytes_on_disk FROM system.parts WHERE table = 'events';
system.query_log
SELECT query, query_duration_ms FROM system.query_log
WHERE event_time > now() - INTERVAL 1 HOUR
ORDER BY query_duration_ms DESC LIMIT 20;
Settings per query
SET max_threads = 8;
SET max_memory_usage = 10000000000;
Or SETTINGS max_threads=8 at query end.
Sample optimizations
-- Filter early
WHERE date = today()
AND user_id IN (SELECT id FROM ...)
-- Use PREWHERE for big filters
SELECT * FROM events PREWHERE event = 'x' WHERE date = today()
-- Limit columns
SELECT user_id, event FROM events -- not SELECT *
Common mistakes
- SELECT * on huge table.
- JOIN on big tables (use dict or denorm).
- Wrong ORDER BY for query patterns.
- No partition pruning.
- Inserting one row at a time.
Read this next
If you want my CH query patterns, they’re at rajpoot.dev .
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .