Apr 23, 2021

ClickHouse tips #6: Filtering data in subqueries to avoid joins

Sometimes you can replace joins in ClickHouse with WHERE clauses and get the same performance as with the Join table engine. Learn how here.
Xoel López
Founder at TheirStack

Imagine that you want to join two tables and filter by a column that comes from the table on the right side of the join. In ClickHouse, the query looks a bit different from what you'd write in other databases like Postgres, and it results in a big performance improvement.

Let's say one of the tables is an events table with 100M rows, and the other is a products table with ~2M rows.
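
The original table definitions and query aren't reproduced here, so the following is only a minimal sketch of the join-then-filter pattern. It uses Python's built-in sqlite3 module as a stand-in for ClickHouse, and the tiny tables and their columns (product_id, category, price) are hypothetical. It illustrates the shape of the query, not its performance.

```python
import sqlite3

# Hypothetical schemas and data: the columns are assumptions for illustration.
# sqlite3 stands in for ClickHouse; this shows query shape, not speed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE events (event_id INTEGER, product_id INTEGER, price REAL);

    INSERT INTO products VALUES (1, 'shoes'), (2, 'hats'), (3, 'shoes');
    INSERT INTO events VALUES
        (10, 1, 20.0), (11, 2, 5.0), (12, 3, 30.0), (13, 1, 25.0), (14, 2, 7.5);
""")

# Join first, then filter by a column that only exists in the right-hand table.
join_then_filter = """
    SELECT e.event_id, e.product_id, e.price
    FROM events AS e
    JOIN products AS p ON e.product_id = p.id
    WHERE p.category = 'shoes'
    ORDER BY e.event_id
"""
rows = conn.execute(join_then_filter).fetchall()
print(rows)  # [(10, 1, 20.0), (12, 3, 30.0), (13, 1, 25.0)]
```

Note that the query only returns columns from events; products is used purely to decide which events to keep.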

If you always filter the result of the join by a column of the right-hand table, and only need columns from the left-hand table, you don't need to make a join at all. ClickHouse stores data column by column, so filtering by the values in a column is a very fast operation. If you rewrite the query so that it filters the events table with a subquery on the products table inside the WHERE clause, it is just as fast as using a Join table, and you don't have to create a Join table for it.
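
As a rough sketch of the rewrite, the snippet below runs both versions side by side and checks that they return the same rows. As before, sqlite3 stands in for ClickHouse and the tables and columns (product_id, category, price) are hypothetical; the IN-subquery form shown is one plausible way to express the filter, not necessarily the exact query from the original post.

```python
import sqlite3

# Hypothetical tables; column names are assumptions for illustration.
# sqlite3 stands in for ClickHouse and only shows that the results match.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE events (event_id INTEGER, product_id INTEGER, price REAL);

    INSERT INTO products VALUES (1, 'shoes'), (2, 'hats'), (3, 'shoes');
    INSERT INTO events VALUES
        (10, 1, 20.0), (11, 2, 5.0), (12, 3, 30.0), (13, 1, 25.0), (14, 2, 7.5);
""")

# Join-based version: join first, then filter on the right-hand table's column.
with_join = conn.execute("""
    SELECT e.event_id, e.product_id, e.price
    FROM events AS e
    JOIN products AS p ON e.product_id = p.id
    WHERE p.category = 'shoes'
    ORDER BY e.event_id
""").fetchall()

# Rewrite without a join: the subquery returns the matching product ids and
# the outer WHERE keeps only events whose product_id is in that set.
with_subquery = conn.execute("""
    SELECT event_id, product_id, price
    FROM events
    WHERE product_id IN (SELECT id FROM products WHERE category = 'shoes')
    ORDER BY event_id
""").fetchall()

print(with_subquery)  # [(10, 1, 20.0), (12, 3, 30.0), (13, 1, 25.0)]
assert with_join == with_subquery
```

The two forms are interchangeable only under the assumptions stated above: the query needs no columns from products, and products.id is unique. If an id appeared twice in products, the join would duplicate the matching events, while the IN filter would keep each event once.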

Tinybird lets you create real-time API endpoints in minutes instead of hours or days, powered by ClickHouse. We're still in private beta, but if you want to try out the product, create an account here.
