PricingDocs
Bars

Data Platform

Managed ClickHouse
Production-ready with Tinybird's DX
Streaming ingestion
High-throughput streaming ingest
Schema iteration
Safe migrations with zero downtime
Connectors
Plug and play Kafka, S3, and GCS

Developer Experience

Instant SQL APIs
Turn SQL into an endpoint
BI & Tool Connections
Connect your BI tools and ORMs
Tinybird Code
Ingest and query from your terminal

Enterprise

Tinybird AI
AI resources for LLMs and agents
High availability
Fault-tolerance and auto failovers
Security and compliance
Certified SOC 2 Type II for enterprise
Sign inSign up
Product []

Data Platform

Managed ClickHouse
Production-ready with Tinybird's DX
Streaming ingestion
High-throughput streaming ingest
Schema iteration
Safe migrations with zero downtime
Connectors
Plug and play Kafka, S3, and GCS

Developer Experience

Instant SQL APIs
Turn SQL into an endpoint
BI & Tool Connections
Connect your BI tools and ORMs
Tinybird Code
Ingest and query from your terminal

Enterprise

Tinybird AI
AI resources for LLMs and agents
High availability
Fault-tolerance and auto failovers
Security and compliance
Certified SOC 2 Type II for enterprise
PricingDocs
Resources []

Learn

Blog
Musings on transformations, tables and everything in between
Customer Stories
We help software teams ship features with massive data sets
Videos
Learn how to use Tinybird with our videos
ClickHouse for Developers
Understand ClickHouse with our video series

Build

Templates
Explore our collection of templates
Tinybird Builds
We build stuff live with Tinybird and our partners
Changelog
The latest updates to Tinybird

Community

Slack Community
Join our Slack community to get help and share your ideas
Open Source Program
Get help adding Tinybird to your open source project
Schema > Evolution
Join the most read technical biweekly engineering newsletter

Our Columns:

Skip the infra work. Deploy your first ClickHouse
project now.

Get started for freeRead the docs
A geometric decoration with a matrix of rectangles.

Product /

ProductWatch the demoPricingSecurityRequest a demo

Company /

About UsPartnersShopCareers

Features /

Managed ClickHouseStreaming IngestionSchema IterationConnectorsInstant SQL APIsBI & Tool ConnectionsTinybird CodeTinybird AIHigh AvailabilitySecurity & Compliance

Support /

DocsSupportTroubleshootingCommunityChangelog

Resources /

ObservabilityBlogCustomer StoriesTemplatesTinybird BuildsTinybird for StartupsRSS FeedNewsletter

Integrations /

Apache KafkaConfluent CloudRedpandaGoogle BigQuerySnowflakePostgres Table FunctionAmazon DynamoDBAmazon S3

Use Cases /

User-facing dashboardsReal-time Change Data Capture (CDC)Gaming analyticsWeb analyticsReal-time personalizationUser-generated content (UGC) analyticsContent recommendation systemsVector search
All systems operational

Copyright © 2025 Tinybird. All rights reserved

|

Terms & conditionsCookiesTrust CenterCompliance Helpline
Tinybird wordmark
PricingDocs
Bars

Data Platform

Managed ClickHouse
Production-ready with Tinybird's DX
Streaming ingestion
High-throughput streaming ingest
Schema iteration
Safe migrations with zero downtime
Connectors
Plug and play Kafka, S3, and GCS

Developer Experience

Instant SQL APIs
Turn SQL into an endpoint
BI & Tool Connections
Connect your BI tools and ORMs
Tinybird Code
Ingest and query from your terminal

Enterprise

Tinybird AI
AI resources for LLMs and agents
High availability
Fault-tolerance and auto failovers
Security and compliance
Certified SOC 2 Type II for enterprise
Sign inSign up
Product []

Data Platform

Managed ClickHouse
Production-ready with Tinybird's DX
Streaming ingestion
High-throughput streaming ingest
Schema iteration
Safe migrations with zero downtime
Connectors
Plug and play Kafka, S3, and GCS

Developer Experience

Instant SQL APIs
Turn SQL into an endpoint
BI & Tool Connections
Connect your BI tools and ORMs
Tinybird Code
Ingest and query from your terminal

Enterprise

Tinybird AI
AI resources for LLMs and agents
High availability
Fault-tolerance and auto failovers
Security and compliance
Certified SOC 2 Type II for enterprise
PricingDocs
Resources []

Learn

Blog
Musings on transformations, tables and everything in between
Customer Stories
We help software teams ship features with massive data sets
Videos
Learn how to use Tinybird with our videos
ClickHouse for Developers
Understand ClickHouse with our video series

Build

Templates
Explore our collection of templates
Tinybird Builds
We build stuff live with Tinybird and our partners
Changelog
The latest updates to Tinybird

Community

Slack Community
Join our Slack community to get help and share your ideas
Open Source Program
Get help adding Tinybird to your open source project
Schema > Evolution
Join the most read technical biweekly engineering newsletter

Skip the infra work. Deploy your first ClickHouse
project now.

Get started for freeRead the docs
A geometric decoration with a matrix of rectangles.

Product /

ProductWatch the demoPricingSecurityRequest a demo

Company /

About UsPartnersShopCareers

Features /

Managed ClickHouseStreaming IngestionSchema IterationConnectorsInstant SQL APIsBI & Tool ConnectionsTinybird CodeTinybird AIHigh AvailabilitySecurity & Compliance

Support /

DocsSupportTroubleshootingCommunityChangelog

Resources /

ObservabilityBlogCustomer StoriesTemplatesTinybird BuildsTinybird for StartupsRSS FeedNewsletter

Integrations /

Apache KafkaConfluent CloudRedpandaGoogle BigQuerySnowflakePostgres Table FunctionAmazon DynamoDBAmazon S3

Use Cases /

User-facing dashboardsReal-time Change Data Capture (CDC)Gaming analyticsWeb analyticsReal-time personalizationUser-generated content (UGC) analyticsContent recommendation systemsVector search
All systems operational

Copyright © 2025 Tinybird. All rights reserved

|

Terms & conditionsCookiesTrust CenterCompliance Helpline
Tinybird wordmark
PricingDocs
Bars

Data Platform

Managed ClickHouse
Production-ready with Tinybird's DX
Streaming ingestion
High-throughput streaming ingest
Schema iteration
Safe migrations with zero downtime
Connectors
Plug and play Kafka, S3, and GCS

Developer Experience

Instant SQL APIs
Turn SQL into an endpoint
BI & Tool Connections
Connect your BI tools and ORMs
Tinybird Code
Ingest and query from your terminal

Enterprise

Tinybird AI
AI resources for LLMs and agents
High availability
Fault-tolerance and auto failovers
Security and compliance
Certified SOC 2 Type II for enterprise
Sign inSign up
Product []

Data Platform

Managed ClickHouse
Production-ready with Tinybird's DX
Streaming ingestion
High-throughput streaming ingest
Schema iteration
Safe migrations with zero downtime
Connectors
Plug and play Kafka, S3, and GCS

Developer Experience

Instant SQL APIs
Turn SQL into an endpoint
BI & Tool Connections
Connect your BI tools and ORMs
Tinybird Code
Ingest and query from your terminal

Enterprise

Tinybird AI
AI resources for LLMs and agents
High availability
Fault-tolerance and auto failovers
Security and compliance
Certified SOC 2 Type II for enterprise
PricingDocs
Resources []

Learn

Blog
Musings on transformations, tables and everything in between
Customer Stories
We help software teams ship features with massive data sets
Videos
Learn how to use Tinybird with our videos
ClickHouse for Developers
Understand ClickHouse with our video series

Build

Templates
Explore our collection of templates
Tinybird Builds
We build stuff live with Tinybird and our partners
Changelog
The latest updates to Tinybird

Community

Slack Community
Join our Slack community to get help and share your ideas
Open Source Program
Get help adding Tinybird to your open source project
Schema > Evolution
Join the most read technical biweekly engineering newsletter
Back to Blog
Share this article:
Back

ClickHouse®️ JOINs... 100x faster

We recently introduced two pull requests to ClickHouse that significantly improve JOIN performance in common scenarios.
The Data Base
Maksim Kita
Maksim KitaPrincipal Software Engineer

Recently, we introduced two new pull requests to ClickHouse that will be available with ClickHouse 24.4. These changes improve the performance of JOINs across many production scenarios, in some cases increasing query speed by multiple orders of magnitude.

PR #1: JOIN predicate pushdown using equivalence classes

Predicate pushdown is a query optimization technique used in database management systems to significantly reduce the amount of data a query processes.

In ClickHouse, as in most other SQL databases, query execution is divided into multiple stages:

  1. Query parsing.
  2. Query analysis.
  3. Build a logical query plan.
  4. Optimize logical query plan.
  5. Build a physical query plan.
  6. Optimize physical query plan.
  7. Execute physical query plan.

In most databases, the logical query plan is a tree where each node is a relational algebra operator, and the leaves of the query plan tree are data sources, usually table scans.

It's convenient to represent the steps of a query plan with relational algebra. The relational algebra and its operators are well-studied, and many known optimizations exist.

One such optimization is predicate pushdown.

Predicate pushdown improves query performance by pushing predicates closer to operators that scan data. Early filtering helps subsequent steps of a query plan to process much less data, improve index usage, and - in distributed databases - significantly reduce the amount of data transfer between nodes.

A predicate pushdown optimization can be applied to most relational algebra operators, such as projections, aggregations, sorts, and unions. But the most important optimization is the JOIN step predicate pushdown, simply because JOIN operators typically produce huge amounts of data.

It's important to note that for some operators it is safe to push only part of the predicate. In such a scenario, the predicate needs to be split into parts. In databases, it makes sense to store predicates in conjunctive normal form, applying each predicate pushdown optimization separately for each conjunction, if necessary.

Here's an example:

Let's look at the ClickHouse logical query plan for this query:

In this example, the predicate is pushed to the LEFT side of the JOIN. Note that I added SETTINGS optimize_move_to_prewhere = 0, because otherwise that Filter step would be converted to PREWHERE for the left table.

Before ClickHouse 24.4, a simple version of predicate pushdown optimization was used for JOINs. Notably, it did not consider equivalence classes of JOINed columns (that is, equivalent columns after the JOIN is performed).

In PR #61216, we introduced a more complex analysis of predicates that uses equivalence classes and can transform predicates that are applied to one side of JOIN to predicates that can be applied to another side of JOIN. In addition, the predicate will be split into different parts and only safe parts will be pushed down, if necessary.

Consider the example:

In this example, we know that the id column from test_table_1 is equivalent to the id column of test_table_2, and we can transform the test_table_1.id = 5 predicate to test_table_2.id = 5 and push it to the right table.

Filter pushdown optimization for different JOIN types has different logic:

  1. For INNER JOINs we can push all predicates to both sides of the JOIN. We can also transform predicates that use only equivalent columns from the LEFT side to the RIGHT side and vice versa.
  2. For LEFT/RIGHT JOINs we can push conditions that use only columns from the LEFT/RIGHT table to the LEFT/RIGHT JOIN side. We can also transform predicates that use only equivalent columns from the LEFT side to the RIGHT side for LEFT JOINs and from the RIGHT side to the LEFT side for RIGHT JOINs.

Let's look at the query plan after this optimization was implemented:

Now, the predicate is pushed to both the LEFT and RIGHT sides of the JOIN. We also see improvements in query performance of INNER, LEFT and RIGHT JOINs in absolute numbers:

Here are the complete results of ClickHouse performance tests, with the highlights shown below.

A chart showing performance improvements for changes made to ClickHosue 24.4 to speed up JOIN performance. The results indicate that 4 sample queries had speedups by between 7x and 187x.
These changes improved JOIN speed on INNER and LEFT JOINs by over 180x.

This optimization additionally resolves several related, preexisting issues in ClickHouse. Here are some examples:

  1. #10913
  2. #45242
  3. #55054
  4. #55058
  5. #26268

PR #2: Automatically converting OUTER to INNER JOIN

We introduced another change that allows ClickHouse to automatically convert an OUTER JOIN to an INNER JOIN if the predicate after JOIN filters all non-joined rows with default values.

This technique allows for additional optimization opportunities because after the JOIN is converted from OUTER to INNER, we can apply predicate pushdown in more scenarios.

Using the same table setup as the previous optimization...

Here's the ClickHouse logical query plan:

Note
With actions = 1, we can see more query plan details like JOIN type, concrete actions that will be executed, and other details. Note that I just kept the key part of the query plan so we can see that we have LEFT JOIN type.

In this example, we know the predicate test_table_2.id = 5 will always filter non-joined rows from the LEFT JOIN with default values.

In #62907, we introduced an analysis that can automatically convert an OUTER JOIN to an INNER JOIN. During that analysis, we can understand that the predicate after an OUTER JOIN will always filter non-joined rows with default values. In that case, we can convert an OUTER JOIN to INNER JOIN.

We do this by trying to perform constant folding of the predicate that is executed after the OUTER JOIN, where we replace all RIGHT/LEFT side columns for LEFT/RIGHT JOIN with constant default values columns. If the result of predicate constant folding is constant False or NULL we can convert the OUTER JOIN to an INNER JOIN.

Here's the query plan after this optimization was implemented:

Here, the LEFT JOIN is changed to an INNER JOIN. We can see the Filter step is pushed down because for an INNER JOIN it is safe to push test_table_2.id = 5 to both sides of the JOIN.

We also see improvements in initial query performance after the optimization was applied:

In the ClickHouse performance test results we see huge improvements for INNER, LEFT, and RIGHT JOINs:

A table showing performance improvements for sample queries after our work. Performance speedups ranged from 2x to over 250x depending on the query.
These changes improved ClickHouse JOIN performance significantly, often by several orders of magnitude.

Conclusion

In databases, You can achieve significant performance improvement by using high-level logical optimizations on top of the query plan. Such optimizations work well together and can be combined to provide even better performance improvements, such as we have shown here.

These two PRs, which are available in ClickHouse 24.4, have dramatically improved JOIN performance across many production scenarios, and will markedly improve performance for ClickHouse users.

Do you like this post? Spread it!

Skip the infra work. Deploy your first ClickHouse
project now.

Get started for freeRead the docs
A geometric decoration with a matrix of rectangles.

Product /

ProductWatch the demoPricingSecurityRequest a demo

Company /

About UsPartnersShopCareers

Features /

Managed ClickHouseStreaming IngestionSchema IterationConnectorsInstant SQL APIsBI & Tool ConnectionsTinybird CodeTinybird AIHigh AvailabilitySecurity & Compliance

Support /

DocsSupportTroubleshootingCommunityChangelog

Resources /

ObservabilityBlogCustomer StoriesTemplatesTinybird BuildsTinybird for StartupsRSS FeedNewsletter

Integrations /

Apache KafkaConfluent CloudRedpandaGoogle BigQuerySnowflakePostgres Table FunctionAmazon DynamoDBAmazon S3

Use Cases /

User-facing dashboardsReal-time Change Data Capture (CDC)Gaming analyticsWeb analyticsReal-time personalizationUser-generated content (UGC) analyticsContent recommendation systemsVector search
All systems operational

Copyright © 2025 Tinybird. All rights reserved

|

Terms & conditionsCookiesTrust CenterCompliance Helpline
Tinybird wordmark

Related posts

The Data Base
Apr 23, 2021
ClickHouse tips #6: Filtering data in subqueries to avoid joins
Xoel Lopez
Xoel LopezFounder at TheirStack
1ClickHouse tips #6: Filtering data in subqueries to avoid joins
The Data Base
Aug 13, 2024
Tinybird vs. ClickHouse®: What's the difference?
Ariel Perez
Ariel PerezHead of Product & Technology
1Tinybird vs. ClickHouse®: What's the difference?
The Data Base
Feb 16, 2021
Clickhouse, Open Source and Tinybird
Javi Santana
Javi SantanaCo-founder
1Clickhouse, Open Source and Tinybird
The Data Base
May 29, 2019
ClickHouse Meetup Madrid videos
Javi Santana
Javi SantanaCo-founder
1ClickHouse Meetup Madrid videos
The Data Base
Aug 20, 2024
Best managed ClickHouse services compared in 2025
Cameron Archer
Cameron ArcherTech Writer
1Best managed ClickHouse services compared in 2025
The Data Base
Feb 25, 2021
ClickHouse tips #3: the transform function
Javi Santana
Javi SantanaCo-founder
1ClickHouse tips #3: the transform function
The Data Base
Mar 07, 2021
ClickHouse Tips #4: Materializing UNION ALL queries
Elena Torro
Elena TorroSoftware Developer
1ClickHouse Tips #4: Materializing UNION ALL queries
The Data Base
Mar 24, 2021
ClickHouse tips #5: Adding and subtracting intervals
Alberto Romeu
Alberto RomeuSoftware Engineer
1ClickHouse tips #5: Adding and subtracting intervals
The Data Base
Sep 22, 2021
ClickHouse tips #11: Best way to get query types
Tinybird
TinybirdTeam
1ClickHouse tips #11: Best way to get query types
The Data Base
Jul 08, 2021
ClickHouse tips #9: Filling gaps in time-series on ClickHouse
Xoel Lopez
Xoel LopezFounder at TheirStack
1ClickHouse tips #9: Filling gaps in time-series on ClickHouse