Streaming analytics systems process millions of events per second, but most databases struggle to ingest data fast enough while still returning query results in milliseconds. ClickHouse is built differently: its columnar storage, vectorized execution, and merge tree architecture can handle both high-velocity writes and sub-second analytical queries on billions of rows.
This guide covers how to optimize ClickHouse for high-throughput streaming workloads, from schema design and ingestion tuning to query optimization and infrastructure decisions. You'll learn the specific settings, patterns, and tradeoffs that determine whether your system handles thousands or millions of events per second.
Why ClickHouse excels at high-throughput streaming analytics
ClickHouse is a columnar database designed for analytical processing, which makes it particularly well-suited for streaming analytics workloads. The database stores data by column rather than by row, which means queries that aggregate or filter specific fields can skip reading unnecessary data entirely.
This columnar approach delivers compression ratios of 10x or more compared to row-oriented databases. When you're ingesting millions of events per second, that compression directly reduces both storage costs and the amount of data that queries need to scan.
ClickHouse uses vectorized query execution, processing data in batches rather than one row at a time. Modern CPUs can apply the same operation to multiple values simultaneously through SIMD instructions, which is why ClickHouse queries can return results in under 100 milliseconds even when scanning billions of rows.
The database scales both vertically and horizontally. You can add more CPU cores and memory to a single server, or you can distribute data across multiple shards with replication for fault tolerance.
Core architecture for real-time ingestion and queries
A typical streaming analytics setup with ClickHouse involves four main components that work together. Each component handles a specific part of the data flow, from ingestion to serving queries.
1. Streaming sources to buffer tables
Data arrives from sources like Apache Kafka, webhooks, or HTTP APIs. ClickHouse can consume directly from Kafka using the Kafka table engine, which creates a consumer that reads messages and writes them to a target table.
Buffer tables accumulate incoming data in memory before flushing to disk. This reduces the number of small data parts created during high-velocity ingestion, which otherwise slows down background merges and queries.
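Here's a minimal sketch of a Buffer table sitting in front of a MergeTree target; the table names, schema, and flush thresholds are illustrative, not recommendations:

```sql
-- Durable target table (illustrative schema).
CREATE TABLE default.events
(
    ts      DateTime,
    user_id UInt64,
    action  String
)
ENGINE = MergeTree
ORDER BY (user_id, ts);

-- In-memory buffer with the same structure, flushing to default.events.
-- Buffer(database, table, num_layers, min_time, max_time,
--        min_rows, max_rows, min_bytes, max_bytes)
CREATE TABLE default.events_buffer AS default.events
ENGINE = Buffer(default, events, 16, 10, 100, 10000, 1000000, 10000000, 100000000);

-- Writers insert into the buffer; rows reach default.events on flush.
INSERT INTO default.events_buffer VALUES (now(), 42, 'page_view');
```

Selecting from the buffer table also reads rows that haven't been flushed yet, which helps when queries need the very latest data.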
You can also batch inserts on the client side, which gives you more control over batch sizes and error handling. The tradeoff is that you're responsible for managing consumer logic and handling failures.
2. Materialized views for roll-ups
Materialized views in ClickHouse automatically transform or aggregate data as it's written to a source table. This pre-computation means queries can read from smaller, pre-aggregated tables instead of scanning raw events. Learn how to create materialized views in ClickHouse with complete syntax examples.
Common patterns include hourly summaries, top-N calculations for most active users, and approximate cardinality using HyperLogLog. Each materialized view writes to its own MergeTree table that you can query independently.
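As a sketch, an hourly roll-up might look like the following; the source table events and its columns are assumptions for illustration:

```sql
-- Pre-aggregated target: one row per (hour, action), summed during merges.
CREATE TABLE events_hourly
(
    hour   DateTime,
    action String,
    count  UInt64
)
ENGINE = SummingMergeTree
ORDER BY (hour, action);

-- The materialized view populates events_hourly as rows land in events.
CREATE MATERIALIZED VIEW events_hourly_mv TO events_hourly AS
SELECT
    toStartOfHour(ts) AS hour,
    action,
    count() AS count
FROM events
GROUP BY hour, action;
```

Because SummingMergeTree only collapses rows during background merges, dashboard queries should still use sum(count) with a GROUP BY to get exact totals.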
The tradeoff is that backfilling becomes more complex. If you need to recalculate historical aggregates, you'll need to rebuild the materialized view or manually insert corrected data.
3. Denormalised analytics tables
ClickHouse performs best when related data lives in the same table rather than split across multiple tables with joins. This means duplicating user attributes into each event row instead of storing them separately.
For example, instead of joining an events table with a users table, you might embed user_name, user_tier, and user_region directly into each event. This increases storage slightly but eliminates expensive join operations.
The tradeoff is write amplification. When a user attribute changes, you may need to update many rows. For streaming analytics where reads vastly outnumber writes, this tradeoff usually makes sense.
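For illustration, a denormalised events table might carry the user attributes on every row; the schema below is a sketch, not a prescribed layout:

```sql
CREATE TABLE events_denorm
(
    ts          DateTime,
    event_type  LowCardinality(String),
    user_id     UInt64,
    -- Attributes copied from the user record at ingest time,
    -- so dashboards never need to join a users table.
    user_name   String,
    user_tier   LowCardinality(String),
    user_region LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (user_id, ts);
```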
4. Parameterised API endpoints
Exposing ClickHouse queries as REST APIs lets applications fetch analytics without writing SQL directly. Parameterized queries accept filters, limits, and other options through URL parameters while preventing SQL injection.
For multi-tenant applications, row-level security filters data based on the authenticated user or API token. This ensures each tenant sees only their own data when querying shared tables.
Caching API responses reduces query load for frequently accessed data. Rate limiting protects your cluster from runaway queries.
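As a sketch of the pattern, ClickHouse's native query parameters keep caller input out of the SQL text; the table, columns, and parameter names here are illustrative:

```sql
-- {tenant:String} and {since:DateTime} are bound server-side,
-- so callers can never splice raw SQL into the query.
SELECT
    toStartOfMinute(ts) AS minute,
    count() AS events
FROM events
WHERE tenant_id = {tenant:String}
  AND ts >= {since:DateTime}
GROUP BY minute
ORDER BY minute;
```

Over the HTTP interface the values are passed as param_tenant and param_since rather than interpolated into the statement.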
Designing tables for million-row-per-second inserts
Schema design directly affects how much data ClickHouse can ingest per second and how fast queries run. The right table engine, sorting key, and partitioning strategy can be the difference between handling millions of rows per second versus struggling with thousands.
1. Choose the right engine and order-by key
The MergeTree family of engines is designed for high-throughput inserts and fast analytical queries. ReplicatedMergeTree adds replication across servers for fault tolerance. Specialized variants like SummingMergeTree and AggregatingMergeTree pre-aggregate data during background merges.
The ORDER BY clause determines how data is sorted on disk. For streaming workloads, common patterns include:
- Timestamp first: Orders data chronologically, which works well for time-range queries
- Device or user ID first: Groups related events together, useful for session analysis
- Tuple of hot filters: Combines frequently filtered columns to maximize index effectiveness
The primary key in ClickHouse is a prefix of the ORDER BY key and is used for sparse indexing. Unlike traditional databases, ClickHouse doesn't enforce uniqueness. Instead, it divides data into granules (blocks of rows) and keeps one sparse index entry per granule, letting queries skip entire granules that can't match their filters.
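Putting this together, a streaming events table might be declared like the sketch below, following the "tuple of hot filters" pattern; the schema and key choices are illustrative:

```sql
CREATE TABLE events
(
    ts        DateTime,
    tenant_id UInt32,
    user_id   UInt64,
    action    LowCardinality(String),
    value     Float64
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(ts)         -- daily partitions (covered next)
ORDER BY (tenant_id, ts, user_id);  -- the sparse primary index follows this key
```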
2. Partition by time and cardinality
Partitioning splits a table into separate directories based on a partition key, typically a time-based column like date or hour. This makes dropping old data fast and can speed up queries that filter by the partition key.
Too many partitions hurt performance though. Each partition creates separate files on disk, and ClickHouse tracks metadata for all of them. A good guideline is keeping partitions under a few thousand. For most streaming workloads, daily or hourly partitions work well.
Late-arriving data can still be inserted after a partition is finalized, but it may trigger additional merges. If late arrivals are common, consider using longer partition intervals.
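Because each partition maps to its own directories on disk, retiring old data is close to a metadata operation; here's a sketch against a daily-partitioned table, with the table name and date as placeholders:

```sql
-- Check partition sizes before retiring anything.
SELECT
    partition,
    sum(rows) AS rows,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE table = 'events' AND active
GROUP BY partition
ORDER BY partition;

-- Drop one day of data in a single operation instead of a row-level delete.
ALTER TABLE events DROP PARTITION 20240101;
```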
3. Apply codecs and compression wisely
ClickHouse supports several compression codecs that trade CPU time for storage savings. LZ4 is the default and balances speed with compression. ZSTD provides better compression ratios but uses more CPU during both compression and decompression.
Per-column codecs can optimize specific data types:
- Delta and DoubleDelta: Work well for timestamps and monotonically increasing values
- Gorilla: Designed for floating-point time series data
- LZ4 vs ZSTD: LZ4 for hot data queried frequently, ZSTD for cold data rarely accessed
Higher compression uses more CPU during writes and reads. For data queried often, LZ4 minimizes query latency. For data rarely accessed, ZSTD significantly reduces storage costs.
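A sketch of per-column codecs on a time-series table; the column names and codec pairings are illustrative, so measure compression on your own data before standardizing:

```sql
CREATE TABLE metrics
(
    ts     DateTime64(3) CODEC(DoubleDelta, LZ4),   -- monotonic timestamps
    device UInt64        CODEC(ZSTD(3)),            -- rarely queried dimension
    value  Float64       CODEC(Gorilla, ZSTD(1))    -- floating-point series
)
ENGINE = MergeTree
ORDER BY (device, ts);
```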
4. Avoid costly mutations and TTL clashes
ALTER TABLE operations that modify or delete existing data (mutations) are expensive because they rewrite entire data parts. Frequent mutations slow ingestion by competing with background merges for disk I/O and CPU.
For workloads that need updates or deletes, ReplacingMergeTree with a version column offers an alternative. This engine automatically deduplicates rows with the same ORDER BY key during merges, keeping only the row with the highest version number.
TTL (Time To Live) rules automatically delete old data based on a time column. However, TTL operations can trigger large numbers of merges if not configured carefully. Align TTL intervals with partition boundaries to minimize merge overhead.
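A sketch that combines a version column for deduplication with a TTL aligned to daily partitions; the table and columns are illustrative:

```sql
CREATE TABLE events_dedup
(
    event_id UInt64,
    ts       DateTime,
    version  UInt32,   -- highest version wins when duplicates merge
    payload  String
)
ENGINE = ReplacingMergeTree(version)
PARTITION BY toYYYYMMDD(ts)
ORDER BY event_id
TTL ts + INTERVAL 30 DAY;  -- aligned with daily partitions, so expiry
                           -- mostly drops whole parts instead of rewriting them
```

Keep in mind that merges never cross partitions, so duplicates are only collapsed within the same partition.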
Optimizing merge tree settings for streaming workloads
ClickHouse continuously merges small data parts into larger ones in the background. Tuning these merge settings significantly impacts both ingestion throughput and query performance.
Adjusting max_partitions_to_read
The max_partitions_to_read setting limits how many partitions a single query can scan. This prevents expensive queries from accidentally reading months of data when they only need a few days.
You can set this limit globally in users.xml or per-query using the SETTINGS clause. For streaming analytics where queries typically focus on recent data, a limit of 100-1000 partitions is often reasonable.
When a query exceeds this limit, ClickHouse returns an error rather than executing. This acts as a guardrail against runaway queries.
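A sketch of the per-query form; the table and the limit value are illustrative:

```sql
-- Errors out instead of scanning the whole table if the filter is too broad.
SELECT count()
FROM events
WHERE ts >= now() - INTERVAL 3 DAY
SETTINGS max_partitions_to_read = 7;
```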
Controlling background merges
ClickHouse uses background threads to merge small data parts into larger ones. The background_pool_size setting controls how many merge threads run concurrently. More threads speed up merges but increase CPU and disk I/O usage.
The max_bytes_to_merge_at_max_space_in_pool setting limits the maximum size of parts that can be merged together. Larger values reduce the number of parts on disk, improving query performance, but increase the time and resources needed for each merge.
For high-throughput ingestion, you may need to increase background threads to keep up with incoming data. However, too many threads can starve queries of resources.
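To judge whether merges are keeping up with ingestion before touching these knobs, the system tables offer a quick health check; a sketch, with the table name assumed:

```sql
-- A growing number of active parts per partition means merges are lagging.
SELECT partition, count() AS active_parts
FROM system.parts
WHERE active AND table = 'events'
GROUP BY partition
ORDER BY active_parts DESC
LIMIT 10;

-- Merges currently in flight and their progress.
SELECT table, elapsed, progress, num_parts
FROM system.merges;
```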
Fine-tuning insert_quorum and replication
For replicated tables, insert_quorum controls how many replicas must acknowledge a write before it's considered successful. By default, quorum writes are disabled (insert_quorum = 0), which maximizes throughput: a write is acknowledged as soon as one replica receives the data.
Higher quorum values increase durability at the cost of write latency. For streaming workloads where some data loss is acceptable, the default is often sufficient.
Async inserts (async_insert=1) buffer multiple small inserts into larger batches before writing to disk. This dramatically improves throughput for high-frequency, small-batch workloads, but adds a small delay (controlled by async_insert_busy_timeout_ms) before data becomes visible to queries.
Tuning ingestion from Kafka and other streams
The Kafka table engine provides native integration with Kafka topics, but it requires careful configuration to achieve high throughput and low latency.
1. Kafka table settings
The kafka_num_consumers setting controls how many consumer threads ClickHouse creates to read from Kafka. More consumers increase throughput but also increase memory usage and concurrent connections to Kafka. With proper configuration, Kafka integration can achieve 2 million rows per second on a single server.
The kafka_max_block_size setting determines how many rows ClickHouse accumulates before flushing to the target MergeTree table. Larger blocks reduce the number of small parts created, improving merge performance, but increase memory usage and the delay before data becomes queryable.
ClickHouse supports several message formats:
- JSONEachRow: Human-readable but slower to parse
- Protobuf and Avro: Binary formats that are faster and more compact, but require schema definitions
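A minimal sketch of the full pipeline: a Kafka consumer table, a durable MergeTree table, and a materialized view moving rows between them. Broker addresses, topic, consumer group, and schema are all placeholders:

```sql
-- 1. Kafka consumer table: rows are read here but not stored.
CREATE TABLE events_kafka
(
    ts      DateTime,
    user_id UInt64,
    action  String
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka-1:9092,kafka-2:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse-events',
         kafka_format = 'JSONEachRow',
         kafka_num_consumers = 4,
         kafka_max_block_size = 1048576;

-- 2. Durable storage.
CREATE TABLE events
(
    ts      DateTime,
    user_id UInt64,
    action  String
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(ts)
ORDER BY (user_id, ts);

-- 3. Materialized view moves rows from the consumer to storage.
CREATE MATERIALIZED VIEW events_kafka_mv TO events AS
SELECT ts, user_id, action FROM events_kafka;
```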
2. Batched synchronous vs async inserts
For direct inserts from application code, batching multiple rows into a single INSERT statement dramatically improves throughput. The max_insert_block_size setting controls the default batch size, though you can override it per query.
Synchronous inserts wait for data to be written to disk before returning. This provides strong consistency but limits throughput. Async inserts (async_insert=1) buffer data in memory and return immediately, then flush to disk in the background.
The tradeoff with async inserts is that buffered data may be lost if the server crashes before it's flushed, particularly with wait_for_async_insert=0, where the client doesn't wait for the flush to be acknowledged. For many streaming analytics use cases where some data loss is acceptable, async inserts provide a significant throughput boost.
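Both approaches expressed as plain SQL; the table and values are illustrative:

```sql
-- Batched synchronous insert: one statement, many rows.
INSERT INTO events (ts, user_id, action) VALUES
    (now(), 1, 'page_view'),
    (now(), 2, 'click'),
    (now(), 3, 'page_view');

-- Async insert: the server buffers small inserts and flushes them in batches.
-- wait_for_async_insert = 1 holds the ack until the buffer is flushed;
-- set it to 0 for maximum throughput at the cost of possible loss on crash.
INSERT INTO events (ts, user_id, action)
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES (now(), 4, 'click');
```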
3. Handling schema evolution without downtime
Adding new columns to a ClickHouse table is fast and doesn't require rewriting existing data. Use ALTER TABLE ADD COLUMN with a default value, and new rows will include the column while old rows use the default.
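A sketch of the pattern; the column name and default are illustrative:

```sql
-- Metadata-only change: existing parts are not rewritten; old rows
-- simply report the DEFAULT value when the new column is read.
ALTER TABLE events
    ADD COLUMN IF NOT EXISTS plan LowCardinality(String) DEFAULT 'free';
```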
For more complex schema changes, create a new materialized view that reads from the source table and writes to a new target table with the updated schema. This allows gradual migration without stopping ingestion.
ReplacingMergeTree can also help with schema migrations by allowing you to insert updated versions of rows with the same key. During merges, ClickHouse keeps only the latest version.
Minimising query latency on hot data
Even with optimized ingestion, queries can be slow if they scan too much data or don't use ClickHouse's indexing features. For visualizing performance metrics, you can connect Grafana to ClickHouse to monitor query latency and throughput in real-time.
Bloom filters and skip indexes
Skip indexes (also called data skipping indexes) let ClickHouse skip entire granules of data that don't match a query's WHERE clause. Bloom filter indexes (bloom_filter for exact matches, tokenbf_v1 for tokenized text) are particularly useful for high-cardinality string columns like user IDs or session IDs.
To add a Bloom filter index, use ALTER TABLE ADD INDEX with the bloom_filter or tokenbf_v1 type. New parts are indexed as they're written and merged; run ALTER TABLE MATERIALIZE INDEX to backfill the index for existing data.
Choose indexes based on your most common query patterns. If you frequently filter by user_id, add a Bloom filter on that column. If you filter by timestamp ranges, the primary index (based on ORDER BY) is usually sufficient.
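A sketch of adding and backfilling a Bloom filter index; the column, false-positive rate, and granularity are illustrative:

```sql
-- bloom_filter(0.01): roughly 1% false-positive rate on exact-match lookups.
ALTER TABLE events
    ADD INDEX idx_user_id user_id TYPE bloom_filter(0.01) GRANULARITY 4;

-- New parts get the index automatically; backfill existing parts explicitly.
ALTER TABLE events MATERIALIZE INDEX idx_user_id;
```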
Using denormalised aggregates
Pre-computed aggregate tables can dramatically reduce query latency for common dashboards and reports. Instead of calculating sums or averages on the fly, you query a table that already contains the results.
Materialized views can maintain these aggregates incrementally as new data arrives. For example, a materialized view might calculate hourly active users and write results to a separate table that's much faster to query than the raw event stream.
The tradeoff is freshness. If you need up-to-the-second accuracy, pre-computed aggregates may not be sufficient. However, for many use cases, a delay of a few seconds or minutes is acceptable in exchange for much faster queries.
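A sketch of the hourly-active-users pattern using aggregate function states, so the roll-up stays incremental; the source table and column names are assumptions:

```sql
-- Stores HyperLogLog-style states rather than finished numbers.
CREATE TABLE active_users_hourly
(
    hour  DateTime,
    users AggregateFunction(uniq, UInt64)
)
ENGINE = AggregatingMergeTree
ORDER BY hour;

CREATE MATERIALIZED VIEW active_users_hourly_mv TO active_users_hourly AS
SELECT toStartOfHour(ts) AS hour, uniqState(user_id) AS users
FROM events
GROUP BY hour;

-- Dashboards finalize the states at read time.
SELECT hour, uniqMerge(users) AS active_users
FROM active_users_hourly
GROUP BY hour
ORDER BY hour;
```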
Balancing max_threads and concurrency
The max_threads setting controls how many CPU cores ClickHouse uses for a single query. More threads can speed up large queries but increase CPU usage and can starve other queries of resources.
For streaming analytics workloads with many concurrent queries, you may want to limit max_threads to prevent any single query from monopolizing the CPU. A value of 4-8 threads per query is often a good starting point.
The max_concurrent_queries setting limits the total number of queries that can run simultaneously. This prevents overload during traffic spikes and ensures each query gets a reasonable share of CPU and memory.
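The per-query knob can be set inline, as in the sketch below (max_concurrent_queries lives in the server configuration, so it isn't shown); the table and values are illustrative:

```sql
-- Cap this query at 4 cores so it can't starve concurrent dashboard queries.
SELECT action, count() AS events
FROM events
WHERE ts >= now() - INTERVAL 1 HOUR
GROUP BY action
SETTINGS max_threads = 4;
```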
Next steps to ship streaming analytics features with Tinybird
Tinybird is a managed ClickHouse platform designed for developers who want to integrate real-time analytics into applications without managing infrastructure. The platform handles ingestion, storage, query optimization, and API creation.
You can get started with Tinybird by creating a free account at https://cloud.tinybird.co/signup. The free tier includes generous limits for development and testing, with no credit card required.
Tinybird provides rapid ingestion from Kafka, webhooks, and other streaming sources, along with real-time SQL pipes that transform and aggregate data as it arrives. You can deploy parameterized API endpoints with built-in authentication, rate limiting, and row-level security.
The platform includes a CLI for local development and CI/CD integration, allowing you to define data pipelines as code, test them locally, and deploy to production with a single command.
FAQs about streaming analytics in ClickHouse
How do I secure per-tenant analytics APIs?
Row-level security (RLS) filters data based on the authenticated user or API token, ensuring each tenant only sees their own data. You can implement RLS using query parameters, user attributes stored in ClickHouse, or token scopes that encode tenant IDs.
API token scopes limit which endpoints a token can access and which data it can read. Query parameter validation prevents SQL injection and ensures only safe values are passed to queries.
Can I colocate ClickHouse servers with Kafka brokers?
Colocating ClickHouse and Kafka in the same availability zone or rack can minimize network latency, which is important for high-throughput streaming workloads. However, ensure each service has sufficient resources and they don't compete for CPU, memory, or disk I/O.
For cross-zone or cross-region replication, ensure you have sufficient network bandwidth and low latency. Plan for the additional cost of data transfer, which can be significant for high-volume streaming workloads.
What is the safest way to back up streaming tables?
Incremental backups with BACKUP TABLE to external object storage like S3 or Google Cloud Storage provide a good balance of safety and efficiency. These backups capture only data that has changed since the last backup, reducing storage costs and backup time.
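A sketch of an incremental backup to S3; the bucket URL, credentials, and paths are placeholders:

```sql
-- Full backup first.
BACKUP TABLE events
    TO S3('https://my-bucket.s3.amazonaws.com/backups/events/base', '<key>', '<secret>');

-- Later backups only ship parts created since the base backup.
BACKUP TABLE events
    TO S3('https://my-bucket.s3.amazonaws.com/backups/events/2024-06-01', '<key>', '<secret>')
    SETTINGS base_backup = S3('https://my-bucket.s3.amazonaws.com/backups/events/base', '<key>', '<secret>');
```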
Snapshot replicas during low-traffic windows to minimize impact on production queries. Verify restore procedures regularly to ensure backups are valid and you can recover quickly in case of data loss.