ClickHouse analytics.
Window functions
SELECT
user_id, ts, value,
sum(value) OVER (PARTITION BY user_id ORDER BY ts) AS cum,
row_number() OVER (PARTITION BY user_id ORDER BY ts DESC) AS rn
FROM events;
Retention
SELECT
retention(
toDate(ts) = today() - 7,
toDate(ts) BETWEEN today() - 6 AND today()
) AS r
FROM events
GROUP BY user_id;
Funnel
SELECT
sumIf(level = 1, 1) AS s1,
sumIf(level = 2, 1) AS s2,
sumIf(level = 3, 1) AS s3
FROM (
SELECT user_id, max(windowFunnel(3600)(ts,
event = 'view',
event = 'click',
event = 'buy'
)) AS level
FROM events GROUP BY user_id
);
Sequences
sequenceCount('(?1)(?t<60)(?2)')(ts, event = 'a', event = 'b')
Counts a→b within 60s.
Top by group
SELECT user_id, event, count() AS c
FROM events
GROUP BY user_id, event
ORDER BY user_id, c DESC
LIMIT 5 BY user_id;
Top 5 events per user.
Sampling for fast estimates
SELECT count() * 10 FROM events SAMPLE 0.1;
10x scaled count from 10% sample.
Cohort
SELECT
toStartOfMonth(first_seen) AS cohort,
toStartOfMonth(ts) AS month,
count(DISTINCT user_id)
FROM events_with_cohort
GROUP BY cohort, month
ORDER BY cohort, month;
Percentiles
quantile(0.5)(latency)
quantiles(0.5, 0.9, 0.99)(latency)
quantileTDigest(0.99)(latency)
groupArray
SELECT user_id, groupArray(event) FROM events GROUP BY user_id;
Common mistakes
- Window without ORDER BY → undefined.
- groupArray with no LIMIT → memory bomb.
- Heavy window functions on raw data → use rollups.
Read this next
If you want my analytics recipes, they’re at rajpoot.dev .
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .