ClickHouse ops.

system tables

SELECT * FROM system.tables LIMIT 5;
SELECT * FROM system.parts WHERE table = 'events';
SELECT * FROM system.metrics;
SELECT * FROM system.events;
SELECT * FROM system.processes;
SELECT * FROM system.query_log WHERE event_time > now() - INTERVAL 1 HOUR ORDER BY query_duration_ms DESC LIMIT 10;
SELECT * FROM system.merges;
SELECT * FROM system.mutations;

Top slow queries

SELECT
    query, query_duration_ms, read_rows, memory_usage
FROM system.query_log
WHERE event_time > now() - INTERVAL 1 DAY
ORDER BY query_duration_ms DESC LIMIT 10;

Active queries

SELECT query_id, query, elapsed, memory_usage
FROM system.processes
ORDER BY elapsed DESC;

KILL QUERY WHERE query_id = '...';

Disk usage

SELECT
    database, table,
    formatReadableSize(sum(bytes_on_disk)) AS size,
    sum(rows) AS rows
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes_on_disk) DESC;

Replication health

SELECT table, queue_size, absolute_delay
FROM system.replicas;

Prometheus exporter

<prometheus>
  <endpoint>/metrics</endpoint>
  <port>9363</port>
</prometheus>

Metrics at :9363/metrics.

Grafana dashboards

ClickHouse official: ID 14192, 11906.

Alerts

  • replica delay > 60s
  • disk > 80%
  • merge queue > 100
  • query failure rate

Optimize

OPTIMIZE TABLE events FINAL;

Force merge. Heavy.

Detach / attach

ALTER TABLE events DETACH PARTITION 202601;
ALTER TABLE events ATTACH PARTITION 202601;

Common mistakes

  • No query_log → can’t analyze post-mortem.
  • OPTIMIZE FINAL on huge table (long lock).
  • Mutations queue growing → slow.
  • Disk full → CH stops accepting writes.

Read this next

If you want my CH ops playbook, it’s at rajpoot.dev .


Building something AI-, backend-, or data-heavy and want a second pair of eyes? I do consulting and freelance work — see my projects and ways to reach me at rajpoot.dev .