Lessons learned from 5 years operating huge ClickHouse® clusters: Part II

This is the second part of the series. Here's more of what I've learned from operating petabyte-scale ClickHouse clusters for the last 5+ years.
Javi Santana, Co-founder

This is the second part of this series. You can read the first one here. 

Handling load

This section is mostly about reads. I talked about ingestion in the previous post, and while reads and writes could use the same resources, I'm going to focus on reads in isolation, as if you only had reads.

Well, I lied, because I'm going to start by telling you: you can't decouple reads and writes. If you see a benchmark that only reports read performance, it may look nice, but it won't hold up in real life.

Reads depend a lot on how many parts a table has, and the number of parts depends on ingestion. If you insert data often, you'll be penalized on reads no matter what schema your table has (more about this when we get to performance). You can reduce parts by running merges more often, but that costs more CPU.
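A quick way to see where you stand is to count active parts per table; a minimal sketch against system.parts (what counts as "too many" depends on your workload):

```sql
SELECT database, table, count() AS active_parts
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY active_parts DESC
LIMIT 20;
```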

When it comes to reads, there are many topics I could cover. I'll start here: 

  1. Handling different kinds of traffic 
  2. Hardware resources
  3. Query design

About handling different kinds of traffic and hardware

In a cluster, you usually have:

  • Real-time traffic, aka queries that need to answer in less than X seconds, usually serving a dashboard or some real-time workload. 
  • Long-running queries, which could be from an ad hoc analysis (like someone who decides they need to know the average of something over 7 years of data)
  • Backfills (this requires its own section).
  • Anything else that's not real time, that is, queries where it doesn't matter how long they take to run.

Solving this is a never-ending design topic: how do I keep long-running queries from hurting the latency of real-time queries, especially at p95 and above? There are many ways to approach it, for example, having different replicas for different workloads. But that's expensive, because you need to allocate hardware for both, and while real-time traffic is usually predictable, one-off queries just "happen", and you can't plan ahead. You could be smarter and handle it in the application layer (before sending the query), but you probably have 3-4 different sources of queries: the app, people running clickhouse-client, people running a BI tool, and people running a BI tool. Yes, I counted it twice because BI queries are usually really bad.

You could create spot replicas to run those long-running queries, but that needs extra handling and adds complexity. At Tinybird, we handle it with a combination of app logic and a load balancer: we know the status of each replica and, based on that, pick the right one to send the query to. Sometimes we reject the query outright to avoid crashing the server.

About query design

Queries have near-infinite configuration options, but one of the most important is max_threads. It controls how many threads a query can use to read data. In general, real-time queries should use just 1; if you need more than 1-2 and you serve many QPS, you'll need a lot of hardware. Understand that, most of the time, your bottleneck here is scan speed: you can keep adding CPUs, but you are still limited by how fast you can scan. A rule of thumb: you can scan about 500 MB/s on a single machine on average. This is not true for every kind of machine or workload, of course, but it's a good rule of thumb. The other important settings are those that control memory:

  • max_memory_usage
  • max_bytes_before_external_group_by 

Both control how memory is used, and you'll need to control how much memory you give to each query. If you want to run massive GROUP BYs, JOINs, window functions, or anything like that, you want max_memory_usage to be high. If you want those queries to be fast, max_bytes_before_external_group_by should be high enough that ClickHouse doesn't dump partial results to disk and kill performance.
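Here's what that looks like at the query level (a sketch; table and values are hypothetical, so tune them to your hardware):

```sql
-- A heavy analytical query: give it memory, and let it spill to disk
-- before it hits the hard cap instead of failing.
SELECT country, uniq(user_id) AS users
FROM events
WHERE day >= today() - 365
GROUP BY country
SETTINGS
    max_threads = 8,
    max_memory_usage = 20000000000,                    -- ~20 GB hard cap
    max_bytes_before_external_group_by = 15000000000;  -- spill past ~15 GB

-- A real-time endpoint query: one thread, small memory budget.
SELECT count() FROM events WHERE site_id = 42 AND day = today()
SETTINGS max_threads = 1, max_memory_usage = 500000000;
```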

ClickHouse does not have an optimizer; you are the optimizer. And your users don't care about optimizing (that feature or report needs to ship ASAP). Query optimization is an art, but there are some simple rules you have to follow (a query sketch follows the list):

  1. Filter on columns in the sorting key first. Sorting key design is super important as well, so pay attention to it. Believe me, you are losing 10-100x on your queries with a bad sorting key design. Spend 2-3 hours understanding the data layout.
  2. Run other filters second, and try to use PREWHERE if you have large columns: move high-selectivity filters on small columns (not Strings, not Arrays) to PREWHERE.
  3. Then run everything else that filters data out, mostly IN operations.
  4. Save JOINs, GROUP BY, and other complex operations for last.
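Here's a sketch of those rules applied to a hypothetical events table whose sorting key is (site_id, toDate(timestamp)):

```sql
SELECT device, count() AS hits
FROM events
PREWHERE status_code = 404                  -- rule 2: selective filter on a small column
WHERE site_id = 42                          -- rule 1: sorting key prefix first
  AND toDate(timestamp) >= today() - 30
  AND path IN ('/pricing', '/docs')         -- rule 3: IN-style filters
GROUP BY device                             -- rule 4: aggregation last
ORDER BY hits DESC;
```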

If you master these rules, I can guarantee you will be in the top 5% of ClickHouse users. Actually, give any LLM those rules and your table schema, and it will do a great job. By the way, we wrote more about these things here.

It's not that simple, of course; otherwise query planners would be a few lines of code (and they are not). Depending on cardinality and other factors, you may need to bend the rules.

So that's Query Design 101. Yes, it is really more about the user than the operator, but you still need to know these things, because you are going to be fighting users running bad queries all the time. ClickHouse has a max total memory config, but it can still OOM, and OOM is bad: depending on the size of your database, the server could take minutes to come back up (so you also need to design your HA setup properly).

The other setting you want to set is max_concurrent_queries. It can save you when the server is overloaded, so do not remove it. You may need to tune it for your workload, but keep it; it's a lifesaver.

Backfills

I'm adding this section because backfills are so painful that if I manage to save even one person one hour of pain, this article will have been worth it.

Let me explain the scenario: you are inserting into a table, that table has a materialized view, you add another materialized view, and you want to backfill it. This sounds simple, but it's not.

You might be tempted to use POPULATE... Don't. It's broken: you can lose or duplicate data. From the official documentation:

We do not recommend using POPULATE, since data inserted in the table during the view creation will not be inserted in it.

If you instead run an INSERT INTO ... SELECT * FROM table, you risk losing or duplicating data, and you'll need a lot of memory to do it.

So, when backfilling materialized views, here's what to do: use a Null table in front of the actual table, and increase the rows you push per block to avoid generating thousands of parts.
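A minimal sketch of the pattern (all names are hypothetical, and exact block-size settings depend on your version and workload):

```sql
-- A Null engine table: stores nothing, but inserts into it still trigger
-- any attached materialized views.
CREATE TABLE events_null AS events ENGINE = Null;

-- The same transformation as the new MV, reading from the Null table and
-- writing into the same target.
CREATE MATERIALIZED VIEW backfill_mv TO events_daily AS
SELECT toDate(timestamp) AS day, country, count() AS hits
FROM events_null
GROUP BY day, country;

-- Push the historical data through in big blocks to avoid creating
-- thousands of tiny parts.
INSERT INTO events_null
SETTINGS min_insert_block_size_rows = 10000000
SELECT * FROM events WHERE timestamp < '2025-01-01';

-- Clean up once the backfill is done.
DROP TABLE backfill_mv;
DROP TABLE events_null;
```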

I can't miss the opportunity to explain how to do this in Tinybird: just change the MV SQL, run tb deploy, and we'll take care of all of this.

Operating the cluster and monitoring

Aside from the regular stuff you monitor (CPU, memory, IO), you also want to track things like the following (a few system-table queries follow the list):

  • Number of queries running
  • ZooKeeper latencies
  • S3 errors (if you use it)
  • Replication lag
  • DDL queue length (this is the queue that stores what all the replicas need to execute)
  • Merge queue
  • Merges memory
  • Mutations
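A few starting points for these, straight from the system tables (a sketch; verify the column names against your version):

```sql
SELECT count() FROM system.processes;                -- queries running right now

SELECT database, table, absolute_delay               -- replication lag per table
FROM system.replicas
WHERE absolute_delay > 0;

SELECT count() FROM system.distributed_ddl_queue     -- pending DDL queue entries
WHERE status != 'Finished';

SELECT count() AS merges, sum(memory_usage) AS mem   -- merge queue and its memory
FROM system.merges;

SELECT count() FROM system.mutations                 -- mutations still running
WHERE NOT is_done;
```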

Things get stuck sometimes, and killing the queries does not always work, so keep an eye on everything.

You should track errors as well. These are the important ones:

  • Read-only tables (when you hit this one, you are very likely fucked)
  • ZooKeeper errors
  • Max simultaneous queries.

You should learn to work with the ClickHouse system tables. You can layer a lot of metrics and dashboards on top, but when things go south, the system tables are what will save you.

  • system.query_log is your bible. It tells you everything about queries. The ProfileEvents column is where you go to understand what a query did. Learn how to use it, master it, and learn the column names (a sample query follows this list).
  • system.processes tells you what's currently running. Useful in fires.
  • system.part_log tells you how parts move, what was merged, and so on.
  • You need to learn about system.tables and system.columns to understand the table shapes.
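For instance, a typical first pass during an incident (a sketch; it assumes a reasonably recent version where ProfileEvents is a Map column):

```sql
-- Slowest finished queries in the last hour, with a ProfileEvents
-- counter pulled out of the map.
SELECT
    query_id,
    query_duration_ms,
    read_rows,
    formatReadableSize(read_bytes) AS scanned,
    formatReadableSize(memory_usage) AS mem,
    ProfileEvents['SelectedParts'] AS parts_read
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 HOUR
ORDER BY query_duration_ms DESC
LIMIT 10;
```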

You may want to enable some others, but those are the critical ones.

Track segfaults as well. They can happen, and you'll need to identify which query caused them. It's rare, and if your workload is stable, I don't think you'll ever see one. But if they do happen and you don't identify the query causing them, your cluster will go down all the time. And you don't want to be on call all the time.

I'm not going to talk too much about tooling (some people use K8s with the Altinity operator, others handle it with scripts), but everybody needs to:

  • Know how to add and remove replica nodes from the cluster
  • Be careful dropping table replicas
  • Keep an eye on replication lag when adding a new replica
  • Remain aware of long-running queries when turning off a replica
  • Watch inserts

And so on. At Tinybird, we use a multi-step process to shut a replica down that takes care of all of this (first remove traffic, then wait for in-flight inserts and queries, killing them if needed).

Be careful handling ON CLUSTER operations. As you may know, changing a table across the cluster requires coordination; ClickHouse uses ZooKeeper or ClickHouse Keeper to coordinate the replicas. If a replica is down, ON CLUSTER operations will take a long time (depending on the config) and may generate issues in your app if timeouts are not properly set (and trust me, they're not).
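For example (hypothetical table and cluster names; the default timeout is 180 seconds):

```sql
-- Bound how long the client blocks waiting for every replica to
-- acknowledge the DDL.
SET distributed_ddl_task_timeout = 60;

-- A coordinated schema change; if a replica is down, this waits on it.
ALTER TABLE events ON CLUSTER my_cluster
    ADD COLUMN user_agent String DEFAULT '';
```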

For super-low-latency use cases (under 20ms per query), you need to warm caches (take a look at the settings), because if you start sending traffic to a cold replica, latencies will go up like crazy. Watch the memory available for the mark_cache. There is also an uncompressed cache but, to be honest, we never used it successfully, so let me know if you did.

In general, alert on max_simultaneous_queries, connectivity issues, and queues growing without control. You will add more alerts on other things, but that depends heavily on what your cluster is designed for.

Last note: Altinity's knowledge base (and videos) is probably one of the best resources for understanding how to set up ClickHouse on your own (and in general, all the content Altinity publishes is really good; a must-read if you handle ClickHouse).

Other stuff

Other random stuff you may find useful:

Table deletion 

By default, ClickHouse refuses to drop any table larger than 50 GB. This makes total sense, and you may even be tempted to lower the limit to a few MB. There are two kinds of developers/data engineers: 1) those who have dropped a table by mistake, and 2) those who will. If you drop a table by mistake, you'll need to go to the backups, and that's a pain in the ass (unless you use the engine we developed, of course).
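The guard is the server-level max_table_size_to_drop setting. On recent versions you can check it from SQL (a sketch; it assumes your version exposes system.server_settings):

```sql
SELECT name, value
FROM system.server_settings
WHERE name IN ('max_table_size_to_drop', 'max_partition_size_to_drop');
```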

Materialized views

MVs are a killer feature, but they are hard to manage, they can lead to memory issues, and they generate a lot of parts. In general, you need to be careful with MVs because people tend to pile a lot of logic into them. MVs are executed on insert, so every new MV makes your ingestion slower.

Don't kill your server

Some column types can make your server go south. For example, a uniqExactState over a high-cardinality column (200M+ distinct values) can kill your cluster during merges. That usually happens with aggregating columns, so be careful with them.
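One common mitigation, sketched on a hypothetical schema: store an approximate uniq state instead of uniqExactState, which keeps merge memory bounded at the cost of a small counting error:

```sql
CREATE TABLE daily_uniques
(
    day     Date,
    uniques AggregateFunction(uniq, UInt64)   -- approximate, not uniqExact
)
ENGINE = AggregatingMergeTree
ORDER BY day;

INSERT INTO daily_uniques
SELECT toDate(timestamp) AS day, uniqState(user_id) AS uniques
FROM events
GROUP BY day;
```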

index_granularity can also kill your server if it's too low. My recommendation: do not go under 128. Low values are only useful for point-like queries.
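For reference, it's a per-table setting (hypothetical table; 8192 is the default):

```sql
CREATE TABLE kv_lookups
(
    id      UInt64,
    payload String
)
ENGINE = MergeTree
ORDER BY id
SETTINGS index_granularity = 256;  -- low-ish for point lookups; don't go under 128
```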

Final thoughts

I focused a lot on things that can go wrong with ClickHouse, but that doesn't mean ClickHouse is a bad database; quite the opposite. Handling a database at scale is hard; if you have enough experience, you know it.

Anyways, I hope this saves you some headaches. (If you didn't read the first post, here it is again.)
