ClickHouse joins.
Types
INNER JOIN
LEFT JOIN
RIGHT JOIN
FULL JOIN
CROSS JOIN
LEFT SEMI JOIN
LEFT ANTI JOIN
ANY LEFT JOIN -- first match
Basic
SELECT e.user_id, u.name, count()
FROM events e
INNER JOIN users u ON e.user_id = u.id
GROUP BY e.user_id, u.name;
CH joins less optimized than Postgres. Right table loaded into memory.
Dict alternative
SELECT dictGet('users_dict', 'name', user_id) AS name, count()
FROM events GROUP BY name;
Faster for small lookup tables.
IN
SELECT * FROM events WHERE user_id IN (SELECT id FROM users WHERE active);
For “filter by other table.” Often faster than JOIN.
GLOBAL IN / JOIN
For Distributed tables:
SELECT * FROM events_dist WHERE user_id GLOBAL IN (SELECT id FROM users_dist WHERE active);
Broadcasts inner result to all shards.
ASOF JOIN
SELECT e.*, p.price
FROM events e
ASOF LEFT JOIN prices p
ON e.symbol = p.symbol AND e.ts >= p.ts;
For time-series joins.
Subquery
SELECT * FROM (
SELECT user_id, count() AS c FROM events GROUP BY user_id
) WHERE c > 100;
CTEs (WITH) too.
Memory limits
SETTINGS join_algorithm = 'hash', max_bytes_in_join = 1000000000
For huge joins: partial_merge, direct.
Common mistakes
- INNER JOIN big × big in CH (slow). Pre-aggregate or denormalize.
- Forgetting GLOBAL on distributed.
- LEFT JOIN where dict would work.
- Subqueries without proper filters.
Read this next
If you want my CH join patterns, they’re at rajpoot.dev .
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .