ClickHouse ingest.
Batch size
Recommended: 1k-100k rows per INSERT. Per-row inserts are catastrophic.
INSERT formats
clickhouse-client --query "INSERT INTO events FORMAT JSONEachRow" < data.json
clickhouse-client --query "INSERT INTO events FORMAT CSV" < data.csv
clickhouse-client --query "INSERT INTO events FORMAT Parquet" < data.parquet
Buffer table
CREATE TABLE events_buffer AS events
ENGINE = Buffer(db, events, 16, 10, 100, 10000, 1000000, 10000000, 100000000);
Flushes to events when thresholds hit. Use for high-frequency small inserts.
Async inserts
INSERT INTO events SETTINGS async_insert = 1, wait_for_async_insert = 0 VALUES (...);
CH batches at server. Per-row clients OK.
Kafka engine
CREATE TABLE kafka_events (
ts DateTime, user_id UInt32, event String
)
ENGINE = Kafka()
SETTINGS kafka_broker_list = 'kafka:9092',
kafka_topic_list = 'events',
kafka_group_name = 'ch',
kafka_format = 'JSONEachRow';
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT * FROM kafka_events;
Streaming from S3
INSERT INTO events
SELECT * FROM s3('https://bucket.../*.parquet', 'Parquet');
clickhouse-bulk
Sidecar that buffers inserts:
docker run -d clickhouse-bulk -addr clickhouse:8123 ...
App POSTs JSON; bulk batches.
clickhouse-keeper for replication
For ReplicatedMergeTree, requires ZK or Keeper.
ON DUPLICATE / dedup
ENGINE = ReplacingMergeTree(version) ORDER BY id
Insert dupes; merge dedupes by id keeping max version.
ALTER TABLE INSERT FROM
INSERT INTO target SELECT * FROM source WHERE ...;
Schema migrations
ALTER TABLE events ADD COLUMN region String DEFAULT '';
ALTER TABLE events DROP COLUMN x;
ALTER TABLE events MODIFY COLUMN x UInt64;
ON CLUSTER to propagate.
Mutations
ALTER TABLE events UPDATE name = 'X' WHERE user_id = 1;
ALTER TABLE events DELETE WHERE date < '2025-01-01';
Async, slow. Avoid frequent. Prefer ReplacingMergeTree upserts.
Common mistakes
- Per-row INSERT.
- Inserting via Distributed at high rate (use buffer or direct local).
- Kafka MV without backfill.
- Frequent ALTER UPDATE.
Read this next
If you want my ingest pipeline, it’s at rajpoot.dev .
Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .