Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

ClickHouse tips #6: Filtering data in subqueries to avoid joins

Sometimes you can replace joins on ClickHouse using where clauses, having the same performance as with Join engines. Learn how here.
Xoel López
Apr 23, 2021
 ・ 
  min read

Imagine that you want to joint two tables, and filter by a column that comes from the table in the right side of the join. On ClickHouse the query a bit differently than what you’d do in other databases like Postgres will result in a great performance improvemen.

Let’s say one of the tables is this {% code-line %}events{% code-line-end %} table, with 100M rows:

And the other table is this {% code-line %}products{% code-line-end %} one, with ~2M rows

If you were in Postgres, you’d do this:

But that is slow. We’ve explained how to make joins like that faster in this guide. This would be the query avoce rewritten to join {% code-line %}events{% code-line-end %} with a {% code-line %}products{% code-line-end %} table that has a Join engine. As you’d expect, it’s much faster:

If you will always filter the result after making the join as in the query above, you don’t need to make a join at all. ClickHouse saves data column-by-column, so filtering by the values in a column is a very fast operation. If you rewrite the query as follows, it would be just as fast. And you wouldn’t have create a Join table for it:

Tinybird lets you create real-time API endpoints on in minutes instead of hours of days, powered by ClickHouse. We’re still in private beta, but if you want to try out product, create an account here.

Xoel López
Share this post

Subscribe to our newsletter

Musings on transformations, tables and everything in between.