ClickHouse clusters slow down over time as data accumulates, queries become more complex, and merge operations consume increasing amounts of CPU and disk I/O. What starts as sub-second query performance can gradually degrade to multi-second response times. But proper optimization can deliver 10× to 20× faster query performance, as seen in production deployments.
This guide walks through nine proven optimization patterns, from benchmarking current performance to automating maintenance tasks, that address the most common bottlenecks in production ClickHouse clusters. Each step includes practical examples and configuration changes you can apply to improve query latency, reduce resource usage, and maintain consistent performance as your data grows.
Step 1: Benchmark your current cluster performance
Before changing anything in your ClickHouse cluster, you want to measure what's actually happening right now. This means looking at schema design, data management, query execution, and infrastructure to understand where performance problems exist.
Use system.query_log to find slow queries
The system.query_log table records every query that runs on your cluster. It captures execution time, memory usage, and how many rows each query processes.
SELECT
query,
query_duration_ms,
read_rows,
memory_usage
FROM system.query_log
WHERE type = 'QueryFinish'
AND query_duration_ms > 1000
ORDER BY query_duration_ms DESC
LIMIT 20;
This query shows your 20 slowest queries that took longer than one second. Every slow query consumes compute resources on your cluster. Start with the slowest ones first, since fixing them typically gives you the biggest performance gains.
Capture baseline latency with clickhouse-client --time
The --time flag prints exactly how long each query takes. Run your most common queries and record baseline measurements, either with --time directly or with a small script like this:
echo "run_number,query_time_seconds,count" > query_results.csv
for i in {1..25}; do
# Run query and capture timing info
start_time=$(date +%s.%N)
result=$(clickhouse-client --query "SELECT count() FROM events WHERE event_date = today();")
end_time=$(date +%s.%N)
query_time=$(echo "$end_time - $start_time" | bc)
# Append results to CSV
echo "$i,$query_time,$result" >> query_results.csv
done
Always document baselines before you make changes. Later, you'll compare new measurements against them to verify that your optimizations actually worked.
Visualize metrics in Grafana or Tinybird observability
Dashboards help you spot performance problems before they affect your users. Grafana connects to ClickHouse through the official data source plugin, letting you visualize query latency, memory usage, and disk I/O over time.
Tinybird includes built-in observability for managed ClickHouse deployments, so you don't necessarily have to set up and maintain your own Grafana instance (though you can still connect Tinybird to Grafana for more thorough observability). The platform tracks query performance, ingestion rates, and resource utilization automatically.
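If you build your own dashboards, the panels typically read from system.query_log. As a minimal sketch, a latency panel might track p95 query duration per minute like this:

SELECT
    toStartOfMinute(event_time) AS minute,
    quantile(0.95)(query_duration_ms) AS p95_ms,
    count() AS queries
FROM system.query_log
WHERE type = 'QueryFinish'
    AND event_time > now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute;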
Step 2: Optimize data ingestion and merges
Merge overhead causes more performance problems in ClickHouse than anything else. ClickHouse stores data in immutable parts that get periodically merged into larger parts, and too much merging eats up CPU and disk I/O that could be used for queries.
1. Batch inserts to 100k+ rows
Small, frequent inserts create many small parts that require constant merging. Larger batches reduce the number of parts and decrease how often ClickHouse has to merge them.
-- Instead of this
INSERT INTO events VALUES (1, 'click', now());
-- Do this
INSERT INTO events VALUES
(1, 'click', now()),
(2, 'view', now()),
-- ... 99,998 more rows
(100000, 'purchase', now());
Somewhere between 100,000 and 1,000,000 rows per batch works well for most workloads. You can check the system.parts table to verify that your batch size keeps the part count reasonable.
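For example, a quick per-partition check (assuming the events table from the examples above) looks like this:

SELECT
    partition,
    count() AS active_parts
FROM system.parts
WHERE table = 'events'
    AND active
GROUP BY partition
ORDER BY active_parts DESC;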
2. Keep parts wide to reduce merge overhead
ClickHouse stores parts in either "compact" format (all columns in a single data file) or "wide" format (each column in its own file). Wide parts let queries read only the column files they need and let merges scale better for large parts, which pays off for tables with many columns.
Two settings control when ClickHouse creates wide parts: min_bytes_for_wide_part and min_rows_for_wide_part. Set these low enough that most parts end up wide:
CREATE TABLE events
(
event_id UInt64,
event_type String,
timestamp DateTime
)
ENGINE = MergeTree()
ORDER BY (event_type, timestamp)
SETTINGS min_bytes_for_wide_part = 10485760, min_rows_for_wide_part = 100000;
Parts larger than 10 MB or 100,000 rows become wide with this configuration. Adjust based on your typical insert batch size.
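You can confirm which format your parts actually use from the part_type column in system.parts; a quick check:

SELECT
    part_type,
    count() AS parts,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE table = 'events'
    AND active
GROUP BY part_type;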
3. Enable async inserts for high-throughput streams
Async inserts buffer incoming data in memory before writing it to disk. This reduces the number of parts created when you have high-frequency inserts; with well-tuned buffer flush intervals it can produce 6× fewer parts.
SET async_insert = 1;                      -- buffer inserts on the server before writing parts
SET wait_for_async_insert = 0;             -- acknowledge immediately instead of waiting for the flush
SET async_insert_max_data_size = 10485760; -- flush the buffer once it reaches 10 MB
INSERT INTO events VALUES (1, 'click', now());
The async_insert_max_data_size setting controls how much data ClickHouse buffers before flushing. 10 MB works well for most streaming workloads, though you might go higher for very high-throughput scenarios.
Step 3: Design primary keys and order by clauses
Primary keys in ClickHouse determine how data gets sorted on disk and which indexes get created. A well-designed primary key speeds up queries by letting ClickHouse skip reading data that doesn't match your filters.
Align order by with common filters
The ORDER BY clause in your table definition controls the physical sort order of data on disk. Match this order to the columns you filter on most often.
-- If you frequently filter by user_id and then timestamp
CREATE TABLE user_events
(
user_id UInt64,
timestamp DateTime,
event_type String
)
ENGINE = MergeTree()
ORDER BY (user_id, timestamp);
When you query WHERE user_id = 123, ClickHouse can skip reading data for all other users. The sort order lets the database use binary search to find relevant data blocks quickly.
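You can verify how much data the primary key prunes with EXPLAIN; in recent ClickHouse versions, the indexes = 1 option reports which parts and granules survive the filter:

EXPLAIN indexes = 1
SELECT count()
FROM user_events
WHERE user_id = 123;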
Choose low-cardinality keys for better index selectivity
Low-cardinality columns (columns with relatively few distinct values) work well as the first columns in your primary key. They let ClickHouse eliminate large portions of the dataset early in query execution.
- Good primary key columns: country_code (200 values), event_type (10–20 values), status (3–5 values)
- Poor primary key columns: user_id (millions of values), timestamp (unique per event), random_id (unique per row)
As much as possible, put low-cardinality columns first, then high-cardinality columns: ORDER BY (country_code, event_type, user_id, timestamp).
That said, you should still prioritize primary key selection based on your most common filter patterns. If you're most frequently filtering by timestamp or user_id, you should still set those as your primary key.
Add skip or bloom indexes for sparse columns
Secondary indexes help with columns that aren't in the primary key but get filtered frequently. ClickHouse supports data-skipping indexes such as minmax indexes for numeric ranges and bloom filters for set membership tests.
CREATE TABLE events
(
event_id UInt64,
user_id UInt64,
event_type String,
url String
)
ENGINE = MergeTree()
ORDER BY (event_type, event_id);
ALTER TABLE events ADD INDEX url_bloom url TYPE bloom_filter GRANULARITY 4;
Bloom filter indexes work well for WHERE url = 'https://example.com' queries when url isn't in the primary key. They use minimal disk space but can significantly reduce the data ClickHouse reads.
Step 4: Eliminate common query anti-patterns
How you write SQL affects performance as much as schema design. Small changes to query structure can dramatically reduce execution time and resource usage.
Avoid SELECT * and over-fetching
Reading unnecessary columns wastes memory and network bandwidth. ClickHouse's columnar storage means each column you select requires separate disk reads.
-- Bad: reads all columns
SELECT * FROM events WHERE event_date = today();
-- Good: only reads needed columns
SELECT event_type, count() FROM events WHERE event_date = today() GROUP BY event_type;
The performance difference gets significant with wide tables. A table with 50 columns where you only need 3 will read roughly 16× more data with SELECT *.
Project only needed columns early
Project only the columns you need as early as possible in the query. This reduces the amount of data flowing through later operations like joins and aggregations.
-- Less efficient
SELECT e.event_type, u.country
FROM events e
JOIN users u ON e.user_id = u.user_id
WHERE e.event_date = today();
-- More efficient
SELECT e.event_type, u.country
FROM (SELECT event_type, user_id FROM events WHERE event_date = today()) e
JOIN (SELECT user_id, country FROM users) u ON e.user_id = u.user_id;
The second query processes less data through the join operation, which reduces memory usage and execution time.
Push aggregations to materialized views
Pre-computing common aggregations eliminates repeated calculation overhead for frequently run queries. Materialized views in ClickHouse automatically maintain aggregated results as new data arrives.
CREATE MATERIALIZED VIEW events_hourly
ENGINE = SummingMergeTree()
ORDER BY (event_type, hour)
AS SELECT
event_type,
toStartOfHour(timestamp) AS hour,
count() AS event_count
FROM events
GROUP BY event_type, hour;
Now you can query the materialized view instead of the raw table, which is much faster for dashboards and monitoring systems that repeatedly query the same aggregations.
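Because SummingMergeTree collapses rows only during background merges, aggregate again at query time when reading from the view. A typical dashboard query might look like this:

SELECT
    event_type,
    hour,
    sum(event_count) AS event_count
FROM events_hourly
WHERE hour >= now() - INTERVAL 24 HOUR
GROUP BY event_type, hour
ORDER BY hour;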
Materialized views also provide the added benefit of allowing you to define different primary keys for different query patterns, so you can speed up more queries.
Step 5: Accelerate joins and subqueries
Join operations have improved in distributed ClickHouse clusters, but they still require careful optimization because they can trigger large data transfers between nodes. The right JOIN strategy depends on your table sizes and query patterns.
Use distributed tables for large joins
Distributed tables spread JOIN operations across cluster nodes, allowing parallel processing. First, create local tables on each shard, then create a distributed table that references them.
-- On each shard
CREATE TABLE events_local ON CLUSTER my_cluster
(
event_id UInt64,
user_id UInt64,
timestamp DateTime
)
ENGINE = MergeTree()
ORDER BY (user_id, timestamp);
-- Create distributed table
CREATE TABLE events ON CLUSTER my_cluster
AS events_local
ENGINE = Distributed(my_cluster, default, events_local, rand());
Queries against the distributed table automatically distribute work across all cluster nodes. ClickHouse handles data shuffling and result aggregation for you.
Filter data before the join
Reducing dataset size before joining improves performance more than any other join optimization. Apply WHERE clauses to filter rows before the JOIN processes them.
-- Less efficient: joins full tables then filters
SELECT e.event_type, u.country, count()
FROM events e
JOIN users u ON e.user_id = u.user_id
WHERE e.event_date = today()
GROUP BY e.event_type, u.country;
-- More efficient: filters before joining
SELECT e.event_type, u.country, count()
FROM (SELECT event_type, user_id FROM events WHERE event_date = today()) e
JOIN users u ON e.user_id = u.user_id
GROUP BY e.event_type, u.country;
The second query processes significantly less data through the JOIN, especially when the date filter eliminates most events.
Pick the right JOIN algorithm and join order
ClickHouse supports multiple JOIN algorithms optimized for different scenarios. The join_algorithm setting controls which one ClickHouse uses.
| JOIN Algorithm | Best For | Memory Usage |
|---|---|---|
| hash | Small right table (fits in memory) | High |
| partial_merge | Large sorted tables | Medium |
| direct | Dictionary-backed tables | Low |
| parallel_hash | Large tables with many threads | Very High |
Set the algorithm explicitly for predictable performance:
SET join_algorithm = 'hash';
Always put the smaller table on the right side when using hash joins, since ClickHouse builds the hash table from the right table.
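As a sketch, reusing the events and users tables from the earlier examples, that looks like this:

SET join_algorithm = 'hash';

SELECT e.event_type, u.country, count()
FROM events AS e      -- large fact table on the left
INNER JOIN users AS u -- smaller dimension table on the right; the hash table is built from it
    ON e.user_id = u.user_id
GROUP BY e.event_type, u.country;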
Step 6: Tune system settings for throughput
ClickHouse exposes hundreds of configuration settings that control query execution, memory allocation, and thread scheduling. A few key settings have the largest impact on cluster throughput.
Adjust max_threads and max_memory_usage
The max_threads setting controls how many CPU threads ClickHouse uses for a single query. Higher values increase parallelism but can cause resource contention when multiple queries run at once.
SET max_threads = 16;
SET max_memory_usage = 10000000000; -- roughly 10 GB per query
Start with max_threads set to half your CPU core count. If you typically run 4 queries simultaneously on a 32-core machine, set max_threads = 8 to avoid oversubscription.
Size background pools to CPU cores
Background merge and fetch operations compete with query execution for CPU resources. The background_pool_size and background_fetches_pool_size settings control how many background threads ClickHouse uses.
<background_pool_size>8</background_pool_size>
<background_fetches_pool_size>4</background_fetches_pool_size>
Set background_pool_size to 20–30% of your CPU core count. For a 32-core machine, 8 background threads handle merges without starving query threads.
Increase parts_to_throw_insert to prevent backpressure
ClickHouse rejects inserts when a table has too many parts, preventing merge queues from growing unbounded. The parts_to_throw_insert setting controls this threshold.
ALTER TABLE events MODIFY SETTING parts_to_throw_insert = 600;
The default value of 300 works well for moderate ingestion rates. Increase to 500–600 for high-throughput streaming workloads, but monitor merge queue depth to verify that background merges can keep up.
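To check whether background merges are keeping up, the system.merges table lists merges currently in flight:

SELECT
    database,
    table,
    round(elapsed, 1) AS elapsed_seconds,
    round(progress * 100, 1) AS percent_complete
FROM system.merges
ORDER BY elapsed DESC;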
Step 7: Right-size hardware and network
Infrastructure choices directly affect cluster performance, with modern ARM instances providing 25% performance improvement in production workloads. ClickHouse benefits from fast storage, high network bandwidth, and sufficient memory for your working dataset.
NVMe SSDs or tiered storage for I/O
ClickHouse query performance depends heavily on disk I/O throughput. NVMe SSDs provide 3–5× better random read performance than SATA SSDs, which translates directly to faster query execution.
- NVMe SSDs: Best for frequently queried hot data, providing 500k+ IOPS
- SATA SSDs: Acceptable for warm data with moderate query frequency
- HDD storage: Only suitable for cold archive data with TTL policies
Tiered storage lets you keep hot data on fast NVMe while moving older data to cheaper SATA SSDs or object storage. Configure this with TTL rules that automatically migrate data based on age.
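As a sketch, assuming a storage policy named hot_cold is defined in the server configuration with an NVMe-backed hot volume and a cheaper cold volume, a table can migrate data after 30 days like this:

CREATE TABLE events
(
    event_id UInt64,
    timestamp DateTime,
    event_type String
)
ENGINE = MergeTree()
ORDER BY timestamp
TTL timestamp + INTERVAL 30 DAY TO VOLUME 'cold'
SETTINGS storage_policy = 'hot_cold';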
Many managed ClickHouse services like Tinybird support tiered storage out of the box, but you'll have to configure it yourself if you self-host.
10 GbE+ links for replication and queries
Network bandwidth becomes a bottleneck in distributed queries that shuffle data between nodes. Replication also requires sufficient bandwidth to keep replicas synchronized.
A 1 GbE network link limits data transfer to about 100 MB/s after protocol overhead. For a cluster processing 1 TB queries, network transfers alone take nearly 3 hours. Upgrading to 10 GbE or 25 GbE reduces this to minutes.
Reserve RAM equal to hot working set
ClickHouse uses available RAM for caching frequently accessed data and intermediate query results. The operating system's page cache automatically caches disk reads, so more RAM means fewer disk operations.
Calculate your working set size by summing the size of data you query regularly:
SELECT
    table,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active
    -- parts written or merged in the last 7 days approximate the hot set
    AND modification_time > now() - INTERVAL 7 DAY
GROUP BY table;
If your hot working set is 200 GB, provision at least 256 GB of RAM to accommodate the data plus query execution overhead.
Step 8: Automate maintenance with OPTIMIZE and TTL
Regular maintenance prevents performance degradation over time. The OPTIMIZE command triggers manual merges, while TTL policies automatically manage the data lifecycle.
Schedule OPTIMIZE during low-traffic windows
The OPTIMIZE command forces ClickHouse to merge all parts in a table or partition immediately. This operation is resource-intensive but can improve query performance by reducing the number of parts.
OPTIMIZE TABLE events PARTITION '2024-01-15' FINAL;
Run OPTIMIZE during low-traffic periods to avoid competing with user queries for resources. The FINAL keyword ensures all parts get merged completely, though this makes the operation take longer.
Use TTL rules to purge or migrate cold data
TTL policies automatically delete or move data based on age, keeping your active dataset manageable. Define TTL rules at table creation or add them later with ALTER TABLE.
CREATE TABLE events
(
event_id UInt64,
timestamp DateTime,
event_type String
)
ENGINE = MergeTree()
ORDER BY timestamp
TTL timestamp + INTERVAL 90 DAY;
TTL rules run during background merges, so they don't require manual intervention. Monitor the system.parts table to verify that old data gets removed as expected.
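If the table already exists, the same rule can be attached afterwards:

ALTER TABLE events MODIFY TTL timestamp + INTERVAL 90 DAY;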
Monitor merge queues and part counts
Too many parts indicate that merges aren't keeping up with inserts. Query system.parts to track part counts per table:
SELECT
table,
count() AS part_count
FROM system.parts
WHERE active
GROUP BY table
ORDER BY part_count DESC;
Healthy tables typically have fewer than 100 active parts. If you see thousands of parts, increase background_pool_size or batch more rows per insert, since ClickHouse rejects inserts once a partition exceeds the parts_to_throw_insert threshold (300 active parts by default).
Step 9: Monitor, profile, and iterate
Optimization is an ongoing process, not a one-time task. Establish monitoring and profiling workflows to catch performance regressions early and validate that optimizations actually improve performance.
Capture query traces with trace_log
The system.trace_log table records detailed execution traces for queries, showing exactly where time gets spent during execution. Enable trace logging in your server configuration:
<trace_log>
<database>system</database>
<table>trace_log</table>
</trace_log>
Query the trace log to identify performance bottlenecks and see which functions consume the most CPU time during query execution.
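One common approach symbolizes the sampled stack addresses to see which functions dominate CPU time; this requires ClickHouse's introspection functions to be enabled. A minimal sketch:

SELECT
    count() AS samples,
    demangle(addressToSymbol(arrayJoin(trace))) AS symbol
FROM system.trace_log
WHERE event_time > now() - INTERVAL 1 HOUR
GROUP BY symbol
ORDER BY samples DESC
LIMIT 10
SETTINGS allow_introspection_functions = 1;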
Compare new metrics against baseline
After implementing optimizations, re-run your baseline queries from Step 1 to measure improvement. Document the performance delta to verify that changes actually helped.
clickhouse-client --time --query "SELECT count() FROM events WHERE event_date = today();"
Keep a log of optimization changes and their measured impact. This historical data helps you understand which optimizations provide the most value for your workload.
Add regression checks to CI/CD
Integrate performance testing into your deployment pipeline to catch regressions before they reach production. Run representative queries against test data and fail the build if performance degrades.
# Baseline in seconds; adjust for your workload
BASELINE_TIME=1.0
# --time prints elapsed seconds to stderr; discard the query result on stdout
ACTUAL_TIME=$(clickhouse-client --time --query "SELECT count() FROM test_events;" 2>&1 >/dev/null)
# Compare floating-point seconds with bc
if [ "$(echo "$ACTUAL_TIME > $BASELINE_TIME" | bc -l)" -eq 1 ]; then
  echo "Performance regression detected: ${ACTUAL_TIME}s > ${BASELINE_TIME}s"
  exit 1
fi
This automated testing prevents schema changes or query modifications from accidentally degrading performance.
Tinybird handles cluster optimization for you
Managing ClickHouse cluster optimization is no small task. It requires specialized expertise in storage engines, query planning, and distributed systems. These nine strategies will certainly help, but they're not exhaustive, and they take time to implement properly and to maintain.
Tinybird eliminates this complexity by providing a fully managed ClickHouse service that automatically handles performance tuning, scaling, and maintenance.
The platform continuously monitors query performance and automatically adjusts cluster configuration based on your workload patterns. Background merge scheduling, memory allocation, and thread pool sizing happen transparently, without requiring manual intervention or deep ClickHouse knowledge.
Tinybird's architecture separates compute from storage and uses zero-copy replication, allowing independent scaling of query processing and data storage. When query load increases, the platform automatically provisions additional compute resources without data migration or downtime.
The service includes built-in observability that tracks query latency, resource utilization, and data pipeline health. Developers can identify slow queries and optimization opportunities without setting up external monitoring infrastructure.
Tinybird automatically handles cluster upgrades, checking for regressions across cluster query patterns to ensure customers aren't negatively impacted by upgrades.
For teams building real-time analytics features, Tinybird's managed approach means you can focus on application logic rather than database operations. Sign up for a free Tinybird account to get started with an optimized ClickHouse in minutes instead of months.
FAQs about optimizing a ClickHouse cluster
How often should I run the optimize command on my ClickHouse cluster?
Run OPTIMIZE during low-traffic periods, typically daily or weekly depending on data volume and insert patterns. Tables with frequent inserts benefit from more frequent optimization, while append-only tables with large batch inserts can run OPTIMIZE less often.
What insert batch size works best for real-time workloads in ClickHouse?
Batch sizes between 100,000 and 1,000,000 rows balance ingestion latency with merge efficiency for most real-time applications. Smaller batches (10,000–50,000 rows) work for very low-latency requirements but increase merge overhead.
Which ClickHouse cluster settings are safe to change in production environments?
Memory limits (max_memory_usage), thread pools (max_threads, background_pool_size), and async insert settings can be adjusted gradually in production. Start with conservative values and increase incrementally while monitoring query performance.
How can I tell if my ClickHouse joins are spilling to disk?
Monitor system.query_log for queries with high memory_usage values that approach your max_memory_usage limit. Check temporary disk space utilization during join operations using system.disks and system.processes.
When should I switch from self-hosting to a managed ClickHouse service?
Consider managed services when cluster maintenance overhead exceeds development time or when scaling requirements become unpredictable. If you're spending more than a few hours per week on ClickHouse operations, or if you need to scale quickly for product launches or seasonal traffic, managed services eliminate operational burden.