Iterating Data Sources & Materialized Views¶
When iterating data projects, sometimes you need to edit a Data Source schema. Tinybird makes it easy to add new columns, but other operations, such as changing the sorting key or changing a column's type, still require you to recreate the Data Source.
We are actively working to improve this workflow in Tinybird. Please reach out to Tinybird support (support@tinybird.co) if you have any questions.
This is an easy process when you are still defining your project but can be a bit tricky if the Data Source is already in production. This guide will walk you through the process for different scenarios.
This is an Advanced guide that assumes you are familiar with the Tinybird CLI. Never used the Tinybird CLI before? Follow the Quick Start CLI guide.
This guide covers four scenarios, and you should read the section that best suits you:
Scenario 1: I’m not in Production covers the steps you should take if you are not yet in production and don’t need to preserve data
Scenario 2: I can stop data ingestion covers the steps you should take if you are already in production, but you are able to pause your data ingestion
Scenario 3: I need to change a Materialized View & I can’t interrupt service covers the steps you should take if you are writing data to a Materialized View & can’t stop ingestion into the Data Source
Scenario 4: It’s too complex and I can’t figure it out
Scenario 1: I’m not in Production¶
This scenario assumes that you are not in Production and can accept losing any data you have already ingested.
If you are not in production, and you can accept losing data, you can use the Tinybird CLI to pull your Data Source down to a file, modify it, and push it back into Tinybird.
Begin with tb pull to pull your Tinybird resources down to files. Then, modify the .datasource file for the Data Source you want to change. When you’re finished modifying the Data Source, you need to delete the existing Data Source from Tinybird, either in the CLI with tb datasource rm or through the UI. Finally, push the new Data Source to Tinybird with tb push.
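A minimal sketch of the full cycle, assuming a Data Source named events_ds (a hypothetical name used only for illustration):
tb pull
# edit events_ds.datasource to change the schema, sorting key, or column types
tb datasource rm events_ds
tb push events_ds.datasource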
Scenario 2: I can stop data ingestion¶
This scenario assumes that you have stopped all ingestion into the affected Data Sources.
1. Use the CLI to pull your Tinybird resources down into files
Use tb pull --auto to pull your Tinybird resources down into files. The --auto flag will organise the resources into directories, with your Data Sources being placed into a datasources directory.
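For example, assuming a Workspace with a Data Source called original_ds and an existing Pipe (hypothetical names), the command and the resulting file layout would look roughly like this:
tb pull --auto
# creates, for example:
#   datasources/original_ds.datasource
#   pipes/my_endpoint.pipe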
2. Create the new Data Source
Create a copy of the Data Source file that you want to modify and rename it.
For example, datasources/original_ds.datasource -> datasources/new_ds.datasource.
Modify the new Data Source file to make the changes you need.
Now push the new Data Source to Tinybird with tb push datasources/new_ds.datasource.
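As an illustrative sketch, assuming the original Data Source had a Nullable(Int64) column called my_nullable_column and a sorting key you now want to change (these names and the sorting key are assumptions, not part of your project), new_ds.datasource might look like:
SCHEMA >
    `timestamp` DateTime,
    `user_id` String,
    `my_column` Int64

ENGINE "MergeTree"
ENGINE_SORTING_KEY "user_id, timestamp"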
3. Populate the new Data Source with existing data
If you want to move the existing data from the original Data Source to the new Data Source, you can do so by using a Pipe that materializes data into the new Data Source.
You can use the Pipe to transform data from the original schema into the modified schema of the new Data Source.
For example, let’s say our original Data Source included a Nullable(Int64) column, and the new Data Source replaces this with a plain Int64. In this case, we need to transform the original data, replacing any occurrences of NULL with a number, like 0.
We can do this using the following Pipe, paying particular attention to the TYPE and DATASOURCE configs at the end:
NODE temp_0
SQL >
    SELECT
        * EXCEPT (my_nullable_column),
        toInt64(coalesce(my_nullable_column, 0)) AS my_column
    FROM original_ds

TYPE materialized
DATASOURCE new_ds
We can then push the Pipe to Tinybird, passing the --populate flag to force it to immediately start processing data.
tb push pipes/temp.pipe --populate --wait
When it’s done, we can remove the Pipe.
tb pipe rm temp
At this point, you can review your new Data Source and ensure that everything is as expected.
4. Delete the original Data Source and rename the new Data Source
You can now delete the original Data Source and then rename the new Data Source to use the name of the original Data Source.
By renaming the new Data Source to use the same name as the original Data Source, any SQL in your Pipes or Endpoints that referred to the original Data Source will continue to work.
If you have a Materialized View that depends on the Data Source, you must first disconnect the Pipe that is materializing data. You can modify and reconnect your Pipe after completing the steps above.
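A minimal sketch of this step, assuming the hypothetical names used above (the rename itself can be done through the UI):
tb datasource rm original_ds
# then rename new_ds to original_ds, e.g. from the Data Source options in the UI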
Scenario 3: I need to change a Materialized View & I can’t interrupt service¶
This scenario assumes you want to modify a Materialized View that is actively receiving data and serving API Endpoints, and you want to avoid service downtime.
Because this is a complex scenario, we’ll introduce some names for the example resources to make it a bit easier to follow along.
Let’s assume that you have a Data Source that is actively receiving data; we’ll call this the Landing Data Source. From the Landing Data Source, you have a Pipe that is writing to a Materialized View: we will call these the Materializing Pipe and the Materialized View Data Source respectively.
1. Use the CLI to pull your Tinybird resources down into files
Use tb pull --auto to pull your Tinybird resources down into files. The --auto flag will organise the resources into directories, with your Data Sources being placed into a datasources directory.
2. Duplicate the Materializing Pipe & Materialized View Data Source
Duplicate the Materializing Pipe & Materialized View Data Source.
For example:
pipes/original_materializing_pipe.pipe -> pipes/new_materializing_pipe.pipe
datasources/original_materialized_view_data_source.datasource -> datasources/new_materialized_view_data_source.datasource
Modify the new files to change the schema as needed.
Lastly, you’ll need to add a WHERE clause to the new Materializing Pipe. This clause is going to filter out old rows, so that the Materializing Pipe only materializes rows newer than a specific time. For the purposes of this guide, we’ll call this the Future Timestamp. Do not use variable time functions for this timestamp (e.g. now()); pick a static time in the near future. For example, if it is currently 2022-01-12 13:00:00, you could choose 2022-01-12 13:15:00. Five to fifteen minutes ahead should be enough. The condition should be >.
For example:
WHERE … AND my_timestamp > '2022-01-12 13:15:00'
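Putting it together, the new Materializing Pipe might look like the following sketch. The node name, the landing_ds Data Source, and the column names are assumptions used only for illustration:
NODE materialize_new_schema
SQL >
    SELECT
        my_timestamp,
        user_id,
        toInt64(coalesce(my_nullable_column, 0)) AS my_column
    FROM landing_ds
    WHERE my_timestamp > '2022-01-12 13:15:00'

TYPE materialized
DATASOURCE new_materialized_view_data_source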
3. Push the Materializing Pipe & Materialized View Data Source
Push the Materializing Pipe & Materialized View Data Source to Tinybird:
tb push datasources/new_materialized_view_data_source.datasource
tb push pipes/new_materializing_pipe.pipe
4. Create a new Pipe to transform & materialize the old schema to the new schema
We now have two Materialized Views, the one with the original schema, and the new one with the new schema. We need to take the data from the original Materialized View, transform it into the new schema, and write it into the new Materialized View.
To do this, we will create a new Pipe, and for this guide we’ll refer to it as the Transform Pipe. In your Transform Pipe, you should first create the SQL SELECT logic that transforms the old schema to the new schema.
Lastly, your Transform Pipe should have a WHERE clause that only selects rows that are older than our Future Timestamp. The condition should be <=.
For example:
WHERE … AND my_timestamp <= '2022-01-12 13:15:00'
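As a sketch, reusing the hypothetical column names from above and saving the file as pipes/transform_pipe.pipe (both assumptions), the Transform Pipe reads the old rows from the original Materialized View and writes them into the new one:
NODE transform_old_rows
SQL >
    SELECT
        my_timestamp,
        user_id,
        toInt64(coalesce(my_nullable_column, 0)) AS my_column
    FROM original_materialized_view_data_source
    WHERE my_timestamp <= '2022-01-12 13:15:00'

TYPE materialized
DATASOURCE new_materialized_view_data_source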
5. Wait until after the Future Timestamp, then push & populate with the Transform Pipe
Now, to avoid any potential for creating duplicates or missing rows, wait until the Future Timestamp has passed. This means that there should no longer be any rows arriving with a timestamp older than the Future Timestamp.
Then, push the Transform Pipe (saved here as pipes/transform_pipe.pipe) and force a populate:
tb push pipes/transform_pipe.pipe --populate --wait
6. Wait for the populate to finish, then change your API Endpoint to read from the new Materialized View Data Source
You should wait until the previous command has completed, to ensure that all data from the original Materialized View has been written to the new Materialized View Data Source.
When it is complete, modify the API Endpoint that is querying the old Materialized View to query from the new Materialized View Data Source.
For example:
SELECT * from original_materialized_view_data_source
Would become:
SELECT * from new_materialized_view_data_source
7. Test, then clean up old resources
You should now test that your API Endpoint is serving the correct data. If everything looks good, you can tidy up your Workspace by deleting the original Materialized View & the new Transform Pipe.
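Assuming the hypothetical names used in this scenario, the cleanup might look like this:
tb pipe rm transform_pipe
# the original Materializing Pipe still writes to the old Materialized View,
# so remove it as well if it is no longer needed before deleting its target
tb pipe rm original_materializing_pipe
tb datasource rm original_materialized_view_data_source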
Scenario 4: It’s too complex and I can’t figure it out¶
If you are dealing with a very complex scenario, please contact Tinybird support (support@tinybird.co) and we can help guide you.