PricingDocs
Bars

Data Platform

Managed ClickHouse
Production-ready with Tinybird's DX
Streaming ingestion
High-throughput streaming ingest
Schema iteration
Safe migrations with zero downtime
Connectors
Plug and play Kafka, S3, and GCS

Developer Experience

Instant SQL APIs
Turn SQL into an endpoint
BI & Tool Connections
Connect your BI tools and ORMs
Tinybird Code
Ingest and query from your terminal

Enterprise

Tinybird AI
AI resources for LLMs and agents
High availability
Fault-tolerance and auto failovers
Security and compliance
Certified SOC 2 Type II for enterprise
Sign inSign up
Product []

Data Platform

Managed ClickHouse
Production-ready with Tinybird's DX
Streaming ingestion
High-throughput streaming ingest
Schema iteration
Safe migrations with zero downtime
Connectors
Plug and play Kafka, S3, and GCS

Developer Experience

Instant SQL APIs
Turn SQL into an endpoint
BI & Tool Connections
Connect your BI tools and ORMs
Tinybird Code
Ingest and query from your terminal

Enterprise

Tinybird AI
AI resources for LLMs and agents
High availability
Fault-tolerance and auto failovers
Security and compliance
Certified SOC 2 Type II for enterprise
PricingDocs
Resources []

Learn

Blog
Musings on transformations, tables and everything in between
Customer Stories
We help software teams ship features with massive data sets
Videos
Learn how to use Tinybird with our videos
ClickHouse for Developers
Understand ClickHouse with our video series

Build

Templates
Explore our collection of templates
Tinybird Builds
We build stuff live with Tinybird and our partners
Changelog
The latest updates to Tinybird

Community

Slack Community
Join our Slack community to get help and share your ideas
Open Source Program
Get help adding Tinybird to your open source project
Schema > Evolution
Join the most read technical biweekly engineering newsletter

Our Columns:

Skip the infra work. Deploy your first ClickHouse
project now

Get started for freeRead the docs
A geometric decoration with a matrix of rectangles.

Product /

ProductWatch the demoPricingSecurityRequest a demo

Company /

About UsPartnersShopCareers

Features /

Managed ClickHouseStreaming IngestionSchema IterationConnectorsInstant SQL APIsBI & Tool ConnectionsTinybird CodeTinybird AIHigh AvailabilitySecurity & Compliance

Support /

DocsSupportTroubleshootingCommunityChangelog

Resources /

ObservabilityBlogCustomer StoriesTemplatesTinybird BuildsTinybird for StartupsRSS FeedNewsletter

Integrations /

Apache KafkaConfluent CloudRedpandaGoogle BigQuerySnowflakePostgres Table FunctionAmazon DynamoDBAmazon S3

Use Cases /

User-facing dashboardsReal-time Change Data Capture (CDC)Gaming analyticsWeb analyticsReal-time personalizationUser-generated content (UGC) analyticsContent recommendation systemsVector search
All systems operational

Copyright © 2025 Tinybird. All rights reserved

|

Terms & conditionsCookiesTrust CenterCompliance Helpline
Tinybird wordmark
PricingDocs
Bars

Data Platform

Managed ClickHouse
Production-ready with Tinybird's DX
Streaming ingestion
High-throughput streaming ingest
Schema iteration
Safe migrations with zero downtime
Connectors
Plug and play Kafka, S3, and GCS

Developer Experience

Instant SQL APIs
Turn SQL into an endpoint
BI & Tool Connections
Connect your BI tools and ORMs
Tinybird Code
Ingest and query from your terminal

Enterprise

Tinybird AI
AI resources for LLMs and agents
High availability
Fault-tolerance and auto failovers
Security and compliance
Certified SOC 2 Type II for enterprise
Sign inSign up
Product []

Data Platform

Managed ClickHouse
Production-ready with Tinybird's DX
Streaming ingestion
High-throughput streaming ingest
Schema iteration
Safe migrations with zero downtime
Connectors
Plug and play Kafka, S3, and GCS

Developer Experience

Instant SQL APIs
Turn SQL into an endpoint
BI & Tool Connections
Connect your BI tools and ORMs
Tinybird Code
Ingest and query from your terminal

Enterprise

Tinybird AI
AI resources for LLMs and agents
High availability
Fault-tolerance and auto failovers
Security and compliance
Certified SOC 2 Type II for enterprise
PricingDocs
Resources []

Learn

Blog
Musings on transformations, tables and everything in between
Customer Stories
We help software teams ship features with massive data sets
Videos
Learn how to use Tinybird with our videos
ClickHouse for Developers
Understand ClickHouse with our video series

Build

Templates
Explore our collection of templates
Tinybird Builds
We build stuff live with Tinybird and our partners
Changelog
The latest updates to Tinybird

Community

Slack Community
Join our Slack community to get help and share your ideas
Open Source Program
Get help adding Tinybird to your open source project
Schema > Evolution
Join the most read technical biweekly engineering newsletter

Skip the infra work. Deploy your first ClickHouse
project now

Get started for freeRead the docs
A geometric decoration with a matrix of rectangles.

Product /

ProductWatch the demoPricingSecurityRequest a demo

Company /

About UsPartnersShopCareers

Features /

Managed ClickHouseStreaming IngestionSchema IterationConnectorsInstant SQL APIsBI & Tool ConnectionsTinybird CodeTinybird AIHigh AvailabilitySecurity & Compliance

Support /

DocsSupportTroubleshootingCommunityChangelog

Resources /

ObservabilityBlogCustomer StoriesTemplatesTinybird BuildsTinybird for StartupsRSS FeedNewsletter

Integrations /

Apache KafkaConfluent CloudRedpandaGoogle BigQuerySnowflakePostgres Table FunctionAmazon DynamoDBAmazon S3

Use Cases /

User-facing dashboardsReal-time Change Data Capture (CDC)Gaming analyticsWeb analyticsReal-time personalizationUser-generated content (UGC) analyticsContent recommendation systemsVector search
All systems operational

Copyright © 2025 Tinybird. All rights reserved

|

Terms & conditionsCookiesTrust CenterCompliance Helpline
Tinybird wordmark
PricingDocs
Bars

Data Platform

Managed ClickHouse
Production-ready with Tinybird's DX
Streaming ingestion
High-throughput streaming ingest
Schema iteration
Safe migrations with zero downtime
Connectors
Plug and play Kafka, S3, and GCS

Developer Experience

Instant SQL APIs
Turn SQL into an endpoint
BI & Tool Connections
Connect your BI tools and ORMs
Tinybird Code
Ingest and query from your terminal

Enterprise

Tinybird AI
AI resources for LLMs and agents
High availability
Fault-tolerance and auto failovers
Security and compliance
Certified SOC 2 Type II for enterprise
Sign inSign up
Product []

Data Platform

Managed ClickHouse
Production-ready with Tinybird's DX
Streaming ingestion
High-throughput streaming ingest
Schema iteration
Safe migrations with zero downtime
Connectors
Plug and play Kafka, S3, and GCS

Developer Experience

Instant SQL APIs
Turn SQL into an endpoint
BI & Tool Connections
Connect your BI tools and ORMs
Tinybird Code
Ingest and query from your terminal

Enterprise

Tinybird AI
AI resources for LLMs and agents
High availability
Fault-tolerance and auto failovers
Security and compliance
Certified SOC 2 Type II for enterprise
PricingDocs
Resources []

Learn

Blog
Musings on transformations, tables and everything in between
Customer Stories
We help software teams ship features with massive data sets
Videos
Learn how to use Tinybird with our videos
ClickHouse for Developers
Understand ClickHouse with our video series

Build

Templates
Explore our collection of templates
Tinybird Builds
We build stuff live with Tinybird and our partners
Changelog
The latest updates to Tinybird

Community

Slack Community
Join our Slack community to get help and share your ideas
Open Source Program
Get help adding Tinybird to your open source project
Schema > Evolution
Join the most read technical biweekly engineering newsletter
Back to Blog
Share this article:
Back

Experimental ClickHouse: Projections

ClickHouse tags a major release around once a month, which is an order of magnitude more often than similar projects, and it does it while maintaining speed and stability.
The Data Base
Raúl Marín
Raúl MarínSoftware Engineer

ClickHouse tags a major release around once a month, which is an order of magnitude more often than similar projects, and it does it while maintaining speed and stability.

One of the drawbacks of having a fast release cycle is that large developments are harder to introduce, specially if they can’t be broken down into smaller changes. The way ClickHouse handles this is by introducing these features as experimental meaning they are not ready for production use, either because they are not fully implemented, the design needs more iterations, they require more testing or further reviews, etc. With time they might be promoted to a stable feature, replaced by a better alternative or discarded.

Some features that were experimental in the past are decimal or LowCardinality types; some that are currently experimental are Live Views or Window Functions. A new addition to this list are projections, introduced in 21.6 by Amos Bird, which can be thought of as materialized views for table parts. A nice aspect about them is that the database will decide whether to use, and which one, projections automatically based on the query.

An example

To present an example with real data I took the Github events dataset and loaded the events from the months of January and February of 2020.

Once fully merged, I have one partition per day and each of them is internally ordered by repository, so the data related to events in the same repository will stored together in disk. If I wanted to know how many events happened around the Postgis repository I could count them with:

To know the result, ClickHouse will go to each one of the table partitions (60), open the block containing the repository names and count how many of them contain this exact string. As the data is ordered it doesn’t need to check all names and instead it can look for where 'postgis/postgis' starts and ends. As a result it is only exploring ~500k rows instead of the whole 113 million.

On the other hand, if I wanted to know how many events were generated by my username, ClickHouse needs to read the whole username blocks completely since the ORDER BY doesn’t help us in this query:

‍

We will also need to read the complete blocks if we mix both columns and need to know which repositories are interacted with by the users interacting with Postgis:

One of the subqueries, the one looking for interactions with Postgis, benefits from ordering by repository while the subquery that counts events per username would benefit from ordering by username. There isn’t a good way to improve both subqueries by sorting the data in a different way.

A possible solution could be to create a materialized view. By having a second table, this time ordered by username, we could have the data ordered by repository and by username and use one or the other depending on the query needs. This has a clear drawback were you know need to remember which table to use depending on what’s best when writing the query.

Let’s see how we could address this with projections instead:

Now, using the exact same query requires reading way less rows from disk:

In the background ClickHouse will:

  • Find all the usernames that have interacted with the 'postgis/postgis' repository. To do that it will use the default partitions, ordered by repository. This requires reading only ~0.5 M rows.
  • Out of those usernames discard the ones ending in [bot]. No need to read anything from disk.
  • For each remaining username (50), find which repositories they have interacted with. Using the projection partitions, ordered by username, it can read only ~0.5M rows instead of the whole dataset. 50 usernames x 0.5M rows = 25M rows (if you go one by one).
  • Lastly, it groups everything together to count them. No extra reading required.

Without requiring any change from the caller we’ve reduced the disk usage by about 85% at the cost of duplicating the storage disk usage.

But wait, there is more. ClickHouse really shines when you know the kind of queries you are going to use. In the same way that we could have an aggregation in a materialized view, we could create a projection that stores the relation between usernames and repositories:

When we execute the exact same query, which is the magical part that wasn’t available in ClickHouse before, it will detect that using the new partition makes sense and use it:

Note that we could theoretically improve things further if the new projection was sorted to make it easier to do search later, but this isn’t currently supported. It’s an experimental feature at the end of the day.

Projections and Skipping Index

How does projections differ from other ClickHouse features like Skipping Index? Well, indexes are more limited since they are applied per granule ("the smallest indivisible data set that ClickHouse reads when selecting data") and you only have a few types to choose from, but they can be equally powerful depending on the query:

The result is the same with slightly more data read:

Both Skipping Indexes and Projections share the limitation that they are only available for the MergeTree family and the benefit that they are automatically applied over the queries. As indexes can be much smaller, I would consider them as the first option when optimizing the storage for a query and only think of projections when a more complex transformation or aggregation is needed.

Projections and Materialized Views

Materialized views have some clear benefits over projections aside from the fact that they are already a stable feature: you can use them with any table engine and you can output several views to the same target table. With materialized views you can easily enrich the input rows with data from multiple sources.

Having a target table is both a benefit and a drawback of materialized views. For simple cases, like aggregations or transformations of the source table, having to change your queries or know there are materialized tables aside from the main one can be inconvenient, but as queries become more complex, hiding them behind a materialized table can be a very good thing.

One important distinction is that even though both projections and materialized views are generated after the data has been inserted in the main table, how they act if the query used fails is completely different. On the one hand, a failure while adding data through a materialized view means that this data won’t be available in the target table. This situation leads to many headaches as it could happen that some data has been already inserted and some failed, and recovering from this inconsistent status requires extra information. On the other hand, as projections are generated in the background, they deal with failures silently. This is ok since you can mix data from both projections and normal parts, but since the issue was caused by the applied query, it is expected that once you go and apply that query later you will get the exception. I find this way of error handling preferable as the caller will either get a clear error from their SQL query or a full result with no missing data.

‍

Do you like this post? Spread it!

Skip the infra work. Deploy your first ClickHouse
project now

Get started for freeRead the docs
A geometric decoration with a matrix of rectangles.
Tinybird wordmark

Product /

ProductWatch the demoPricingSecurityRequest a demo

Company /

About UsPartnersShopCareers

Features /

Managed ClickHouseStreaming IngestionSchema IterationConnectorsInstant SQL APIsBI & Tool ConnectionsTinybird CodeTinybird AIHigh AvailabilitySecurity & Compliance

Support /

DocsSupportTroubleshootingCommunityChangelog

Resources /

ObservabilityBlogCustomer StoriesTemplatesTinybird BuildsTinybird for StartupsRSS FeedNewsletter

Integrations /

Apache KafkaConfluent CloudRedpandaGoogle BigQuerySnowflakePostgres Table FunctionAmazon DynamoDBAmazon S3

Use Cases /

User-facing dashboardsReal-time Change Data Capture (CDC)Gaming analyticsWeb analyticsReal-time personalizationUser-generated content (UGC) analyticsContent recommendation systemsVector search
All systems operational

Copyright © 2025 Tinybird. All rights reserved

|

Terms & conditionsCookiesTrust CenterCompliance Helpline

Related posts

The Data Base
Jul 08, 2021
ClickHouse tips #9: Filling gaps in time-series on ClickHouse
Xoel López
Xoel LópezFounder at TheirStack
1ClickHouse tips #9: Filling gaps in time-series on ClickHouse
The Data Base
Mar 16, 2021
Coming soon on ClickHouse: Window functions
Xoel López
Xoel LópezFounder at TheirStack
1Coming soon on ClickHouse: Window functions
The Data Base
Jun 01, 2021
ClickHouse tips #8: Generating time-series on ClickHouse
Xoel López
Xoel LópezFounder at TheirStack
1ClickHouse tips #8: Generating time-series on ClickHouse
The Data Base
Aug 20, 2024
Want a managed ClickHouse®? Here are some options
Cameron Archer
Cameron ArcherTech Writer
1Want a managed ClickHouse®? Here are some options
The Data Base
Jul 21, 2021
ClickHouse tips #10: Null behavior with LowCardinality columns
Alejandro del Amo
Alejandro del AmoEngineering Ops Tech Lead
1ClickHouse tips #10: Null behavior with LowCardinality columns
The Data Base
Apr 23, 2021
ClickHouse tips #6: Filtering data in subqueries to avoid joins
Xoel López
Xoel LópezFounder at TheirStack
1ClickHouse tips #6: Filtering data in subqueries to avoid joins
The Data Base
Aug 13, 2024
Tinybird vs. ClickHouse®: What's the difference?
Ariel Pérez
Ariel PérezHead of Product & Technology
1Tinybird vs. ClickHouse®: What's the difference?
The Data Base
Apr 01, 2025
I've operated petabyte-scale ClickHouse® clusters for 5 years
Javi Santana
Javi SantanaCo-founder
1I've operated petabyte-scale ClickHouse® clusters for 5 years
The Data Base
Sep 29, 2021
ClickHouse Tips #12: Apply Functions to Columns with a Single Call
Tinybird
TinybirdTeam
1ClickHouse Tips #12: Apply Functions to Columns with a Single Call
The Data Base
Mar 24, 2021
ClickHouse tips #5: Adding and subtracting intervals
Alberto Romeu
Alberto RomeuSoftware Engineer
1ClickHouse tips #5: Adding and subtracting intervals