ClickHouse basics.

Why ClickHouse

Columnar OLAP DB. Aggregations on billions of rows in milliseconds. Great for analytics, logs, metrics.

Install / connect

docker run -d --name ch -p 8123:8123 -p 9000:9000 clickhouse/clickhouse-server

clickhouse-client                      # native TCP
clickhouse-client --query "SELECT 1"
curl 'http://localhost:8123/?query=SELECT+1'    # HTTP

CLI

clickhouse-client --host=localhost --user=default
clickhouse-client --multiquery --query "..."

In client: \q quit, multiline queries end with ;.

CREATE TABLE

CREATE TABLE events (
    ts DateTime,
    user_id UInt32,
    event String,
    properties String
) ENGINE = MergeTree()
ORDER BY (user_id, ts)
PARTITION BY toYYYYMM(ts);

MergeTree = main engine. ORDER BY = primary key (sort).

Insert

INSERT INTO events VALUES (now(), 1, 'click', '{}');
INSERT INTO events SELECT ... FROM src;
INSERT INTO events FORMAT CSV ...

Bulk inserts much faster than per-row.

Query

SELECT * FROM events LIMIT 10;
SELECT count() FROM events;
SELECT event, count() FROM events GROUP BY event ORDER BY 2 DESC;
SELECT toDate(ts) AS day, count() FROM events GROUP BY day ORDER BY day;

Common types

UInt8/16/32/64 Int8/16/32/64
Float32/64 Decimal(P,S)
String FixedString(N)
Date Date32 DateTime DateTime64(3) IPv4 IPv6 UUID
LowCardinality(String)     -- for low-cardinality strings (fast)
Array(T)
Nullable(T)
Tuple(T1, T2)
Map(K, V)
Enum8('a'=1, 'b'=2)

LowCardinality

country LowCardinality(String)

Dictionary-encoded. Use for columns with < millions of unique values.

DESCRIBE / SHOW

SHOW DATABASES;
SHOW TABLES;
DESCRIBE TABLE events;
SHOW CREATE TABLE events;

DROP / ALTER

DROP TABLE events;
ALTER TABLE events ADD COLUMN region String DEFAULT '';
ALTER TABLE events MODIFY COLUMN x UInt64;
ALTER TABLE events RENAME COLUMN old TO new;

INSERT performance

  • Batch inserts (100k+ rows per insert).
  • Don’t insert one row at a time.
  • Use Buffer engine or insert via Kafka/etc to batch.

Sample queries

-- Top events
SELECT event, count() AS c
FROM events
WHERE ts >= now() - INTERVAL 1 HOUR
GROUP BY event ORDER BY c DESC LIMIT 10;

-- Time-series
SELECT toStartOfMinute(ts) AS m, count()
FROM events GROUP BY m ORDER BY m;

Common mistakes

  • Per-row INSERT → slow.
  • Wrong ORDER BY → poor compression.
  • Using String for low-cardinality (use LowCardinality).
  • Nullable when not needed (overhead).
  • Tiny PARTITION BY (one per day on big table) → too many parts.

Read this next

If you want my CH starter, it’s at rajpoot.dev .


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .