ClickHouse compression.

Default

The default codec is LZ4, which favors speed over compression ratio. The codec can be overridden per column:

CREATE TABLE events (
    city String CODEC(ZSTD(3)),
    ts DateTime CODEC(DoubleDelta, ZSTD),
    amount Float64 CODEC(Gorilla, ZSTD),
    counter UInt32 CODEC(Delta, ZSTD)
) ENGINE = MergeTree ORDER BY ts;
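
A codec can also be changed on a table that already exists. The following is a minimal sketch, assuming a MergeTree table named events with a String column city (names chosen for illustration). The new codec applies to newly written parts; older parts pick it up as they are merged.

```sql
-- Assumes an existing MergeTree table `events` with a String column `city`.
ALTER TABLE events MODIFY COLUMN city String CODEC(ZSTD(3));

-- Optional and potentially expensive on large tables:
-- force a merge so existing parts are rewritten with the new codec.
OPTIMIZE TABLE events FINAL;
```

On big tables it is usually gentler to let background merges do the rewrite rather than running OPTIMIZE ... FINAL.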

Codecs

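  • LZ4: the default; fast compression and decompression.
  • LZ4HC: LZ4 at a higher compression level; slower to compress, still fast to decompress.
  • ZSTD(N): better compression ratio at more CPU cost. Levels run from 1 to 22 (default 1); 3 to 9 is a common working range.
  • Delta: stores the difference between consecutive values; suits slowly changing integer sequences.
  • DoubleDelta: stores the delta of deltas; a strong fit for regularly spaced timestamps.
  • Gorilla: XOR-based encoding for floating-point values that change slowly.
  • T64: crops unused high bits from integer values.
  • NONE: no compression.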

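Codecs can be chained and are applied left to right: CODEC(DoubleDelta, ZSTD) first delta-encodes the values, then compresses the result with ZSTD.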
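
To see why a delta stage helps the general-purpose compressor that follows it, the idea can be imitated with arrayDifference. This is an illustration only: the array holds made-up, evenly spaced timestamps, and the output is not the codecs' actual on-disk format.

```sql
-- Illustration of the idea behind Delta and DoubleDelta, not their storage format.
WITH [1000, 1010, 1020, 1030] AS ts  -- hypothetical, evenly spaced timestamps
SELECT
    arrayDifference(ts) AS delta,                            -- [0, 10, 10, 10]
    arrayDifference(arrayDifference(ts)) AS delta_of_delta;  -- [0, 10, 0, 0]
```

After the second pass the sequence is mostly zeros, which is the kind of input ZSTD or LZ4 compresses very well.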

LowCardinality

country LowCardinality(String)
status LowCardinality(String)

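LowCardinality dictionary-encodes the column: each distinct string is stored once and rows hold small integer keys. It pays off when the number of distinct values is small. The ClickHouse documentation gives under roughly 10,000 distinct values as the range where it mostly helps, and warns that above about 100,000 it can perform worse than a plain String. On suitable columns, storage can shrink by 10-100x and filtering and grouping get faster.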
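
Before converting a column, it helps to check how many distinct values it really holds. A minimal sketch, assuming the events table and country column used in this post:

```sql
-- uniq() returns an approximate distinct count, which is enough for this decision.
SELECT uniq(country) AS distinct_countries FROM events;

-- If the count is small, convert the column in place.
-- This rewrites the column data, so expect some load on large tables.
ALTER TABLE events MODIFY COLUMN country LowCardinality(String);
```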

When ZSTD vs LZ4

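  • LZ4: fastest decompression; the usual choice for hot data where query latency matters most.
  • ZSTD: often a clearly better ratio (up to 2-3x on favorable data), with decompression on the order of 30% slower. Actual numbers depend heavily on the data.

ZSTD is often worth it for cold or medium-temperature data.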
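
One way to apply this split automatically is a recompression TTL, which rewrites parts with a heavier codec once their rows are older than a threshold. A sketch, assuming events has a DateTime column ts; the 30-day threshold and ZSTD level are illustrative, not recommendations.

```sql
-- Assumes `events` has a DateTime column `ts`.
-- Parts whose rows are all older than 30 days get recompressed with ZSTD(9).
ALTER TABLE events
    MODIFY TTL ts + INTERVAL 30 DAY RECOMPRESS CODEC(ZSTD(9));
```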

Inspect

SELECT
    name,
    formatReadableSize(data_compressed_bytes) AS comp,
    formatReadableSize(data_uncompressed_bytes) AS uncomp,
    round(data_uncompressed_bytes / data_compressed_bytes, 2) AS ratio
FROM system.columns
WHERE table = 'events' AND database = 'myapp'
ORDER BY data_compressed_bytes DESC;
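
For a per-table overview rather than per-column detail, the same byte counters can be summed from system.parts. A sketch, assuming the same myapp database:

```sql
-- Per-table totals across active parts only (inactive parts are leftovers from merges).
SELECT
    table,
    formatReadableSize(sum(data_compressed_bytes)) AS comp,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncomp,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS ratio
FROM system.parts
WHERE active AND database = 'myapp'
GROUP BY table
ORDER BY sum(data_compressed_bytes) DESC;
```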

Test codecs

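CREATE TABLE test_codec (
    a UInt32 CODEC(LZ4),
    b UInt32 CODEC(Delta, ZSTD)
) ENGINE = MergeTree ORDER BY tuple();

Insert the same values into both columns, then compare their compressed sizes in system.columns.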
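
A possible way to run the comparison, assuming test_codec exists as a MergeTree table in the current database. The monotonically increasing input from numbers() is a best case for Delta, so real data should be tested as well.

```sql
-- Load identical values into both columns; 10 million rows gives sizes large enough to compare.
INSERT INTO test_codec SELECT number, number FROM numbers(10000000);

-- Compare what each codec achieved on the same data.
SELECT
    name,
    compression_codec,
    formatReadableSize(data_compressed_bytes) AS comp,
    round(data_uncompressed_bytes / data_compressed_bytes, 2) AS ratio
FROM system.columns
WHERE database = currentDatabase() AND table = 'test_codec';
```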

Granularity

SETTINGS index_granularity = 8192

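The default is 8192 rows per granule. A lower value creates more index entries (marks) and lets ClickHouse skip data at a finer grain. A higher value reduces index overhead but reads more rows for each matching granule.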
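
The setting belongs at the end of the table definition. A sketch with a hypothetical table name and an illustrative value of 4096:

```sql
-- Hypothetical table; index_granularity is a MergeTree table-level setting.
CREATE TABLE events_fine
(
    ts DateTime CODEC(DoubleDelta, ZSTD),
    counter UInt32 CODEC(Delta, ZSTD)
)
ENGINE = MergeTree
ORDER BY ts
SETTINGS index_granularity = 4096;
```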

Compression and queries

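Decompression is fast but not free; it costs CPU on every read. A high level such as ZSTD(9) mainly slows inserts and merges. Decompression speed changes little with the level, so query latency is affected more by choosing ZSTD over LZ4 than by the level itself.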

Common mistakes

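  • LowCardinality on a high-cardinality column: the dictionary bloats and queries can get slower.
  • ZSTD(22) on a hot path: very high CPU cost for a small gain in ratio.
  • No codec on time columns: DoubleDelta is usually a large win there.
  • Storing numeric IDs as strings: use UInt64 instead.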

Read this next

If you want my ClickHouse compression recipes, they’re at rajpoot.dev.

