Master Materialized Views in Tinybird

Advanced

Before you start building Materialized Views in Tinybird, you need to know how they work. Tinybird approaches Materialized Views a bit differently than you may be used to, so instead of launching right in, we recommend you read this guide first to understand how Materialized Views work in Tinybird.

What is a Materialized View in Tinybird?

In the simplest terms, a Materialized View is the streaming results of a transformation Pipe saved as a new Data Source. As new data is ingested into original Data Sources, the transformed results from that Pipe will continually populate in the new Materialized View Data Source, and you can query it just like you would query any other Data Source in Tinybird.

How is data populated into a Materialized View in Tinybird?

When a Materialized View is created in the UI, Tinybird initially populates the Materialized View with a subset of the data. This allows you to quickly validate results. If you only care about future data, you can leave it as is. If you’d like historical data to be populated in the Materialized View, you can select “Populate With All Data” on creation in the UI. Likewise in the CLI, you can choose to populate with all data or populate partially. This process ensures that all saved data in the original Data Source will be processed through the transformation Pipe and stored in the resulting Materialized View.

In Tinybird, data is ingested into Materialized Views in blocks. This process is explained in the diagram below. Every time data is ingested into the original Data Source, the materialization process is triggered, applying the transformation pipe over the data ingested and saving the output of that pipe, which is a partial result, in the Materialized View.

{% tip-box title="NOTE" %}Taking into account how data is populated into Materialized Views, there are certain operations that shouldn’t be included in the SQL for your transformation Pipe. Since materializations are performed only over the new data being ingested, and not over the whole data source, certain operations like those that follow don’t make sense to use, such as {%code-line%}ORDER BY{%code-line-end%}, {%code-line%}Neighbor{%code-line-end%}, {%code-line%}DISTINCT{%code-line-end%}.{% tip-box-end %}

Doing aggregations the right way with Materialized Views

There are two basic types of Materialized Views: those that aggregate and those that don’t. For aggregated Materialized Views, you’ll need to keep the following in mind.

Occasionally, a background process in Tinybird merges partial results saved in intermediate states during the ingestion process, compacting the results and reducing the number of rows. The diagram below explains this process in more detail through a simple example.

Let’s say we’re an ecommerce store and want to materialize the count of units sold by product. We are ingesting a JSON object every minute, with a product (represented by a capital letter e.g. “A” in the diagram) and the quantity sold during the last minute.

We could define in our Pipe some simple SQL to sum the count of units sold per minute as data is ingested, and this Pipe will be applied over each new block of appended data. The output will immediately be saved in intermediate states into the Materialized View.

Every 8-10 minutes, the background process we mentioned will merge the intermediate states, completing the aggregation across the entire Data Source.

Because we are working in real-time, we can’t always wait for this background process to take place. Fortunately, this is easily accounted for: When querying the Materialized View, you should use the proper merge combinator and {% code-line %}GROUP BY{% code-line-end %} in the query itself.

Understanding State and Merge combinators for Aggregates

As we mentioned, it’s not practical to track when the background process that merges aggregate results in a Materialized View has occurred. Because of this, you need to store intermediate states using the {% code-line %}-State{% code-line-end %} suffix. Note that if you’re creating from the UI, this will be done automatically.

Here’s an example of using {% code-line %}-State{% code-line-end %} when defining the transformation Pipe to calculate these intermediate states:

You also need to specifically define the appropriate schema for the Materialized View:

Finally, you’ll need to retrieve the data using the {% code-line %}-Merge{% code-line-end %} suffix in your endpoint node to make sure the merge process is completed for all data in the Materialized View:

Understanding the Materialized View parameters

When you create a Materialized View in the UI, Tinybird will automatically recommend the best parameters for most use cases. It is useful to understand these parameters for more complex use cases.

Sorting Key

The Sorting Key defines how data is sorted and is critical for great performance when filtering. You should choose the order of your sorting keys depending on how you are going to query them. Here are a few examples for a simple Materialized View containing {% code-line %}day{% code-line-end %}, {% code-line %}city{% code-line-end %}, and {% code-line %}avg{% code-line-end %} columns:

  • You want to query the average for all cities on a particular day: The {% code-line %}day{% code-line-end %} column should be the first sorting key.
  • You wanted the average over the last month for a particular city: The {% code-line %}city{% code-line-end %} column should be the first sorting key.

Note: For Materialized Views containing aggregations, every column in the {% code-line %}GROUP BY{% code-line-end %} statement has to be in the sorting keys, and only those columns can be sorting keys.

For non-aggregated Materialized Views, you can select other columns if they fit better for your use case, but we don’t recommend adding too many. You get only a negligible performance boost after the 4th sorting key column.

Partition By

A partition is a logical combination of records by a given criterion. In most cases you don't need a partition key, or a partition by month is enough.

Tinybird will guess the best partition key if your materialization query has a Date or DateTime column. If there aren’t any Date columns, Tinybird doesn't set a partition key. As we recommend here, having no partition is better than having the wrong one. If you are comfortable with partitions and you really want to group records by another criteria, you can switch to the advanced tab and add your custom partition code.

TTL

If you have certain lifetime requirements (such as to satisfy GDPR regulations) on the data in your Materialized Views, you can specify a Time To Live (TTL) parameter when creating a Materialized View.

This can also be useful if you only intend to query a brief history of the data. For example, if you always and only query for data from within the last week, you can set a TTL of 7 days. When a TTL is set, all rows older than the TTL will be removed from the Materialized View.

Advanced Tab (UI)

Most of the time, the defaults recommended by Tinybird will be the best parameters to use. Occasionally, however, you may need to tweak these parameters. For this, you can use the Advanced tab. Here you can directly write code that will be passed to the View creation.

Warning: You’ll need certain ClickHouse expertise to modify these parameters, so if you aren’t sure, stick with the defaults.

Create a Materialized View

Now that you understand how Materialized Views work in Tinybird, you can read our guide on how to create them in the Tinybird UI or CLI, and start building Materialized View Data Sources in your own Workspace!

ON THIS GUIDE