ClickHouse compression.
Default
LZ4 (fast) is the default. The codec is configurable per column:
city String CODEC(ZSTD(3))
ts DateTime CODEC(DoubleDelta, ZSTD)
amount Float64 CODEC(Gorilla, ZSTD)
counter UInt32 CODEC(Delta, ZSTD)
Codecs
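- LZ4 / LZ4HC: LZ4 is the default and fast; LZ4HC compresses more slowly for a better ratio.
- ZSTD(N): better compression, somewhat slower. Levels run from 1 to 22; 3-9 is a practical range.
- Delta: stores the differences between consecutive integer values.
- DoubleDelta: stores the delta of deltas; well suited to timestamps at regular intervals.
- Gorilla: XOR-based encoding for floats that change slowly.
- T64: crops the unused high bits of integer values.
- NONE: no compression.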
Codecs can be chained: CODEC(DoubleDelta, ZSTD) applies DoubleDelta first, then compresses the result with ZSTD.
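To see why a delta-style codec in front of a general-purpose compressor helps, here is a minimal Python sketch. It is not ClickHouse's implementation: `zlib` stands in for ZSTD, the timestamps are made-up data at a fixed 10-second interval, and the helper names `delta` and `pack` are hypothetical.

```python
import struct
import zlib


def delta(values):
    """Keep the first value, then store each value minus its predecessor."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def pack(values):
    """Serialize as little-endian signed 64-bit integers."""
    return struct.pack(f"<{len(values)}q", *values)


# Hypothetical data: one reading every 10 seconds.
timestamps = [1_700_000_000 + 10 * i for i in range(10_000)]

deltas = delta(timestamps)      # first value, then 10, 10, 10, ...
double_deltas = delta(deltas)   # first value, one large jump, then 0, 0, 0, ...

# zlib is only a stand-in for the general-purpose stage (ZSTD in the text).
for name, seq in [("raw", timestamps), ("delta", deltas), ("double delta", double_deltas)]:
    print(name, len(zlib.compress(pack(seq))), "bytes compressed")
```

After the transform the sequence is almost entirely repeated values, which is exactly what the second codec in the chain compresses well.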
LowCardinality
country LowCardinality(String)
status LowCardinality(String)
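LowCardinality dictionary-encodes the column: each distinct string is stored once and rows hold small integer keys. It pays off for low-cardinality strings; the ClickHouse documentation puts the sweet spot below roughly 10,000 distinct values and warns of worse performance above roughly 100,000. On repetitive data the column can be 10-100x smaller, and queries are often faster.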
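The idea behind dictionary encoding can be sketched in a few lines of Python. This illustrates the concept only, not ClickHouse's on-disk format; the function names and sample values are hypothetical.

```python
def dictionary_encode(values):
    """Return (dictionary, keys): each distinct value is stored once,
    and every row holds the position of its value in the dictionary."""
    dictionary = []
    index = {}
    keys = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        keys.append(index[v])
    return dictionary, keys


def dictionary_decode(dictionary, keys):
    """Rebuild the original column from the dictionary and the keys."""
    return [dictionary[k] for k in keys]


countries = ["US", "DE", "US", "IN", "DE", "US"]
dictionary, keys = dictionary_encode(countries)
print(dictionary)  # distinct values in first-seen order
print(keys)        # one small integer per row
```

With few distinct values the keys are tiny and the dictionary stays small; with many distinct values the dictionary approaches the size of the column itself, which is why high-cardinality columns do not benefit.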
When ZSTD vs LZ4
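- LZ4: fastest decompression, so the fastest queries.
- ZSTD: better compression (roughly 2-3x on favorable data), with decompression around 30% slower. Both figures depend heavily on the data.
ZSTD is often worth it for cold or warm data that is read less often.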
Inspect
SELECT
name,
formatReadableSize(data_compressed_bytes) AS comp,
formatReadableSize(data_uncompressed_bytes) AS uncomp,
round(data_uncompressed_bytes / data_compressed_bytes, 2) AS ratio
FROM system.columns
WHERE table = 'events' AND database = 'myapp'
ORDER BY data_compressed_bytes DESC;
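For reading the output, the ratio and the space saved are two views of the same numbers. A small Python sketch of the arithmetic, using made-up byte counts and a hypothetical helper name:

```python
def compression_stats(uncompressed_bytes, compressed_bytes):
    """Return (ratio, fraction_saved) for one column."""
    ratio = uncompressed_bytes / compressed_bytes
    saved = 1 - compressed_bytes / uncompressed_bytes
    return ratio, saved


GIB = 1024 ** 3
# Hypothetical column: 12 GiB uncompressed, 3 GiB on disk.
ratio, saved = compression_stats(12 * GIB, 3 * GIB)
print(f"ratio {ratio:.2f}, saved {saved:.0%}")
```

A ratio of 4 means the column takes a quarter of its raw size, so three quarters of the space is saved.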
Test codecs
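CREATE TABLE test_codec (
a UInt32 CODEC(LZ4),
b UInt32 CODEC(Delta, ZSTD)
)
ENGINE = MergeTree
ORDER BY tuple();
-- Load the same values into both columns so only the codec differs.
INSERT INTO test_codec SELECT number, number FROM numbers(1000000);
Then compare the two columns' ratios with the query from the Inspect section.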
Granularity
SETTINGS index_granularity = 8192
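The default is 8192 rows per granule. A lower value means more index entries and finer-grained scans, at the cost of a larger index. A higher value means less index overhead but more rows read for each matching granule.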
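A rough way to reason about the trade-off is to count granules, since the primary index holds about one entry per granule. The Python sketch below assumes fixed-size granules and ignores adaptive granularity (the byte-based limit), so treat the numbers as estimates; the helper name and row count are hypothetical.

```python
def granule_count(rows, index_granularity=8192):
    """Number of granules needed to hold `rows` rows (ceiling division)."""
    return -(-rows // index_granularity)


rows = 100_000_000  # hypothetical table size
for granularity in (1024, 8192, 65536):
    print(granularity, granule_count(rows, granularity))
```

Dropping the granularity from 8192 to 1024 multiplies the number of index entries by about eight, while each matching granule forces a read of eight times fewer rows.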
Compression and queries
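Decompression is fast but not free: each query spends CPU decompressing the blocks it reads. ZSTD decompresses more slowly than LZ4, so a heavily compressed hot column (for example ZSTD level 9) can hurt query latency, and high levels also slow inserts and merges.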
Common mistakes
- Using LowCardinality on a high-cardinality column → bloat.
- Using ZSTD(22) on a hot path.
- Forgetting a codec on time columns (DoubleDelta is a huge win there).
- Storing numeric IDs as strings (use UInt64).
Read this next
If you want my CH compression recipes, they’re at rajpoot.dev.