ClickHouse time-series.
Date functions
toYYYYMM(ts)
toDate(ts)
toStartOfMinute(ts)
toStartOfHour(ts)
toStartOfDay(ts)
toStartOfMonth(ts)
toStartOfInterval(ts, INTERVAL 5 MINUTE)
toRelativeMinuteNum(ts)
dateDiff('hour', a, b)
Bucketing
SELECT
toStartOfInterval(ts, INTERVAL 1 MINUTE) AS m,
count(),
avg(value)
FROM events
WHERE ts >= now() - INTERVAL 1 HOUR
GROUP BY m
ORDER BY m;
TTL
CREATE TABLE events (...)
ENGINE = MergeTree()
ORDER BY (ts, user_id)
TTL ts + INTERVAL 30 DAY;
Auto-deletes old data.
TTL with rollup
TTL ts + INTERVAL 30 DAY,
ts + INTERVAL 7 DAY GROUP BY toStartOfDay(ts) SET value = sum(value)
Aggregate to daily after 7 days.
Move parts to cold storage
TTL ts + INTERVAL 7 DAY TO VOLUME 'cold'
Use storage policy with hot/cold disks.
Rollups via MV
CREATE MATERIALIZED VIEW events_hourly TO events_hourly_target AS
SELECT
toStartOfHour(ts) AS hour,
user_id,
countState() AS count_state,
sumState(value) AS sum_state
FROM events
GROUP BY hour, user_id;
Cheap queries on aggregated data.
Time-decayed
SELECT user_id, sum(score * exp(-(now() - ts) / 86400)) AS decayed
FROM events GROUP BY user_id ORDER BY decayed DESC LIMIT 10;
Window functions
SELECT
user_id, ts, value,
avg(value) OVER (PARTITION BY user_id ORDER BY ts ROWS BETWEEN 10 PRECEDING AND CURRENT ROW)
FROM events;
Rolling average.
quantiles over time
SELECT
toStartOfMinute(ts) AS m,
quantile(0.99)(latency)
FROM events GROUP BY m;
Anomaly detection (basic)
WITH (SELECT avg(value), stddevSamp(value) FROM events) AS (mean, sd)
SELECT * FROM events WHERE abs(value - mean) > 3 * sd;
Common mistakes
- Querying raw data when rollup exists.
- Wrong PARTITION BY granularity.
- No TTL.
- toDate(ts) repeatedly (cache via materialized column).
Read this next
If you want my time-series patterns, they’re at rajpoot.dev .
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .