ClickHouse analytics.

Window functions

SELECT
    user_id, ts, value,
    sum(value) OVER (PARTITION BY user_id ORDER BY ts) AS cum,
    row_number() OVER (PARTITION BY user_id ORDER BY ts DESC) AS rn
FROM events;

Retention

SELECT
    retention(
        toDate(ts) = today() - 7,
        toDate(ts) BETWEEN today() - 6 AND today()
    ) AS r
FROM events
GROUP BY user_id;

Funnel

SELECT
    sumIf(level = 1, 1) AS s1,
    sumIf(level = 2, 1) AS s2,
    sumIf(level = 3, 1) AS s3
FROM (
    SELECT user_id, max(windowFunnel(3600)(ts,
        event = 'view',
        event = 'click',
        event = 'buy'
    )) AS level
    FROM events GROUP BY user_id
);

Sequences

sequenceCount('(?1)(?t<60)(?2)')(ts, event = 'a', event = 'b')

Counts a→b within 60s.

Top by group

SELECT user_id, event, count() AS c
FROM events
GROUP BY user_id, event
ORDER BY user_id, c DESC
LIMIT 5 BY user_id;

Top 5 events per user.

Sampling for fast estimates

SELECT count() * 10 FROM events SAMPLE 0.1;

10x scaled count from 10% sample.

Cohort

SELECT
    toStartOfMonth(first_seen) AS cohort,
    toStartOfMonth(ts) AS month,
    count(DISTINCT user_id)
FROM events_with_cohort
GROUP BY cohort, month
ORDER BY cohort, month;

Percentiles

quantile(0.5)(latency)
quantiles(0.5, 0.9, 0.99)(latency)
quantileTDigest(0.99)(latency)

groupArray

SELECT user_id, groupArray(event) FROM events GROUP BY user_id;

Common mistakes

  • Window without ORDER BY → undefined.
  • groupArray with no LIMIT → memory bomb.
  • Heavy window functions on raw data → use rollups.

Read this next

If you want my analytics recipes, they’re at rajpoot.dev .


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .