ClickHouse joins.

Types

INNER JOIN
LEFT JOIN
RIGHT JOIN
FULL JOIN
CROSS JOIN
LEFT SEMI JOIN
LEFT ANTI JOIN
ANY LEFT JOIN              -- first match

Basic

SELECT e.user_id, u.name, count()
FROM events e
INNER JOIN users u ON e.user_id = u.id
GROUP BY e.user_id, u.name;

CH joins less optimized than Postgres. Right table loaded into memory.

Dict alternative

SELECT dictGet('users_dict', 'name', user_id) AS name, count()
FROM events GROUP BY name;

Faster for small lookup tables.

IN

SELECT * FROM events WHERE user_id IN (SELECT id FROM users WHERE active);

For “filter by other table.” Often faster than JOIN.

GLOBAL IN / JOIN

For Distributed tables:

SELECT * FROM events_dist WHERE user_id GLOBAL IN (SELECT id FROM users_dist WHERE active);

Broadcasts inner result to all shards.

ASOF JOIN

SELECT e.*, p.price
FROM events e
ASOF LEFT JOIN prices p
ON e.symbol = p.symbol AND e.ts >= p.ts;

For time-series joins.

Subquery

SELECT * FROM (
    SELECT user_id, count() AS c FROM events GROUP BY user_id
) WHERE c > 100;

CTEs (WITH) too.

Memory limits

SETTINGS join_algorithm = 'hash', max_bytes_in_join = 1000000000

For huge joins: partial_merge, direct.

Common mistakes

  • INNER JOIN big × big in CH (slow). Pre-aggregate or denormalize.
  • Forgetting GLOBAL on distributed.
  • LEFT JOIN where dict would work.
  • Subqueries without proper filters.

Read this next

If you want my CH join patterns, they’re at rajpoot.dev .


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .