Deployment strategies


So, you’ve made your Workspace production-ready and started working on the Data Project following the Git workflow. You’ve configured your Continuous Integration (CI) pipeline and everything is green. Now, how do you bring those changes to your main Environment?

Deploying a Data Project can be complicated: you have to handle new, updated, or deleted resources while keeping streaming ingestion, API requests, and data operations running, all without compromising the health and lifecycle of your Data Product.

In this guide you’ll learn about the default method for implementing Continuous Deployment (CD), how to bypass the default deployment strategy to create your custom deployments, and finally strategies to take into account when migrating data.

How deployment works

Before Workspaces could be integrated with Git, you had to carefully use the tb push command to deploy your local changes to your Workspace. Git integration enables a better workflow.

With the Git integration:

  • The Data Project is the real source of truth.

  • The remote Workspace saves a reference to the Git commit deployed.

This way we can make deployments a little bit smarter and easier to execute.

There are two steps in the CI/CD pipeline where you need to deploy changes to the remote Workspace or Environment.

  • With CI pipelines, a new Environment is created from the main one and deployment is done with tb deploy --populate --fixtures.

  • With CD pipelines, the default deployment strategy just runs tb deploy, which means Data Operations (such as populations) are left out.

The new tb deploy command is just a smarter version of tb push that does the following:

  • Checks the current commit in the Workspace and validates that it is an ancestor of the commit in the Pull Request being deployed. If it isn’t, you usually need to git rebase your branch.

  • Performs a git diff from the current branch to the main branch to get the list of Datafiles that changed.

  • Deploys them in order (in the case of CI, to the remote Environment), also deploying downstream dependent resources. For instance, if you change a Materialized View or create a new version of it, the Pipes and API Endpoints that depend on it are deployed as well.
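The first two checks map onto plain git operations. A rough sketch of the idea (illustrative only, not the exact implementation; `WORKSPACE_COMMIT` is a placeholder for the commit reference stored in the Workspace):

```shell
# Is the commit deployed to the Workspace an ancestor of this branch?
# If not, the branch is out of date and needs a rebase.
git merge-base --is-ancestor "$WORKSPACE_COMMIT" HEAD || echo "rebase needed"

# Which Datafiles changed between main and this branch?
git diff --name-only main...HEAD -- '*.datasource' '*.pipe'
```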

At this point, if you run a tb diff between the Git branch and the remote Environment, there should not be any changes.

When used with the --populate --fixtures flags, once resources have been deployed, it populates the Materialized Views if needed and appends data fixtures, so API Endpoints are ready to be tested. These flags are recommended only in CI pipelines, not in the main Environment.

The strategy to deploy to the main Environment in the CD pipeline is the same, with the caveat that the user deploying the branch to the main Environment is responsible for running the Data Operations.

Guide preparation

You can follow along using the ecommerce_data_project.

Download the project by running:

Git clone the project
git clone
cd ecommerce_data_project

Then, create a new Workspace and authenticate using your user admin Auth Token. If you don’t know how to authenticate or use the CLI, check out the CLI Quick Start.

Authenticating to EU
tb auth -i

** List of available regions:
   [1] us-east (
   [2] eu (
   [0] Cancel

Use region [1]: 2

Copy the admin token from and paste it here :

Finally, push the Data Project to Tinybird:

Recreating the project
tb push --push-deps --fixtures

** Processing ./datasources/events.datasource
** Processing ./datasources/top_products_view.datasource
** Processing ./datasources/products.datasource
** Processing ./datasources/current_events.datasource
** Processing ./pipes/events_current_date_pipe.pipe
** Processing ./pipes/top_product_per_day.pipe
** Processing ./endpoints/top_products.pipe
** Processing ./endpoints/sales.pipe
** Processing ./endpoints/top_products_params.pipe
** Processing ./endpoints/top_products_agg.pipe
** Building dependencies
** Running products_join_by_id
** 'products_join_by_id' created
** Running current_events
** 'current_events' created
** Running events
** 'events' created
** Running products
** 'products' created
** Running top_products_view
** 'top_products_view' created
** Running products_join_by_id_pipe
** Materialized pipe 'products_join_by_id_pipe' using the Data Source 'products_join_by_id'
** 'products_join_by_id_pipe' created
** Running top_product_per_day
** Materialized pipe 'top_product_per_day' using the Data Source 'top_products_view'
** 'top_product_per_day' created
** Running events_current_date_pipe
** Materialized pipe 'events_current_date_pipe' using the Data Source 'current_events'
** 'events_current_date_pipe' created
** Running sales
** => Test endpoint at
** 'sales' created
** Running top_products_agg
** => Test endpoint at
** 'top_products_agg' created
** Running top_products_params
** => Test endpoint at
** 'top_products_params' created
** Running top_products
** => Test endpoint at
** 'top_products' created
** Pushing fixtures
** Warning: datasources/fixtures/products_join_by_id.ndjson file not found
** Warning: datasources/fixtures/current_events.ndjson file not found
** Checking ./datasources/events.datasource (appending 544.0 b)
**  OK
** Checking ./datasources/products.datasource (appending 134.0 b)
**  OK
** Warning: datasources/fixtures/top_products_view.ndjson file not found

Once you have the Data Project deployed to a Workspace, make sure you connect it to Git and push the CI/CD pipelines to the repository.
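At the time of writing, the CLI can do the Git connection and generate the CI/CD workflow files for you (treat the exact command and flags as an assumption and check tb init --help for your CLI version):

```shell
# from the Data Project root, authenticated against the Workspace
tb init --git
```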

Custom deployments

Think of tb deploy as a helper that lets you forget about deployments in the vast majority of cases.

Having said that, the complexity of data pipelines varies across Data Projects, and certain changes in a branch are not that “simple” to deploy. For those cases, the owner of the Git branch being merged can perform a custom deployment.

When to do a custom deployment?

  • You need full control over the sequence of commands required to deploy changes to the ephemeral Environments or the main one.

  • The default tb deploy reports an error and is not capable of performing the default deployment.

  • You need to perform some Data Operations before or after resources have been deployed.

To do a custom deployment follow these steps:

  • Edit the .tinyenv file at the root of your Data Project and increase the VERSION environment variable, following the semver notation. Let’s say you bump it from 0.0.0 to 0.0.1.

  • Create the CI and CD deployment scripts inside the deploy/0.0.1/ folder of the Data Project, and make sure they have execution permissions: chmod +x -R deploy/0.0.1/

  • Performing the custom deployment is as simple as writing the CLI commands you would run in your terminal to deploy the changes to the Environment or Workspace.

  • The CI and CD pipelines will find those scripts and run them in CI and CD respectively.

That way you have full control over the deployment commands. At the same time, you are contributing to the shared knowledge of your Data Project, since that custom deployment becomes part of the Git repository.

Once the branch has been merged, on the next Pull Request, remember to bump the VERSION in the .tinyenv file, so the custom deployment for the previous changes is not executed along with the changes in the new branch.
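The VERSION bump itself is trivial to script. A minimal POSIX shell sketch of the patch bump from 0.0.0 to 0.0.1 (reading and writing the .tinyenv file itself is left out):

```shell
# current value of the VERSION variable in .tinyenv
VERSION="0.0.0"

# split off the patch component and increment it
BASE=${VERSION%.*}                   # "0.0"
PATCH=${VERSION##*.}                 # "0"
NEW_VERSION="$BASE.$((PATCH + 1))"

echo "VERSION=$NEW_VERSION"          # prints "VERSION=0.0.1"
```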

Find below some examples of how and when to use custom deployments, especially when a data migration is required.

Data migration paths

There are several cases in which you have to migrate data from one Data Source to another. The complexity of the migration varies depending on several factors, mainly whether there is streaming ingestion or not.

There are mainly three scenarios covered by the Iterating Data Sources guide:

  • I’m not in production

  • I’m in production but I can stop data ingestion

  • I’m in production and I cannot stop data ingestion

Let’s see how to cover some of the most common scenarios with custom deployments.

When working on custom deployments, you might find the staging and production Workspaces deployment pattern useful, as described in this guide.

Practical examples

Example 1: Overwrite a Data Source

By default, tb deploy does not overwrite Data Sources, to avoid unintended deployments.

Certain changes in a Data Source that are not breaking changes are supported by the Tinybird APIs, for instance, adding a new column to a Data Source. Let’s see an example:

Edit the events.datasource Datafile to add a new new_column String column like this:

DESCRIPTION >
    this contains all the events produced by Kafka, there are 4 fixed columns
    plus a `json` column which contains the rest of the data for that event.
    See [documentation](url_for_docs) for the different events.

SCHEMA >
    `timestamp` DateTime,
    `product` String,
    `user_id` String,
    `action` String,
    `json` String,
    `new_column` String

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYear(timestamp)"

Now commit the change to a new git branch and create a Pull Request like this one

The CI pipeline fails in the deployment step with this error:

** Running events
** The description or schema of 'events' has changed.
**   -  ADD COLUMN `new_column` String
** Failed running ./datasources/events.datasource:
** Please confirm you want to apply the changes above y/N:
Error: Process completed with exit code 1.

As described in the Custom Deployments section, we need to provide the commands to run the deployment both in CI and CD. In this case, it is as simple as following the steps described in this git commit: you just need to use the --yes flag to overwrite the events.datasource.
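In practice, the custom deployment script for this example can be a single command (a sketch; the script lives under deploy/<VERSION>/ with whatever filename your pipelines expect):

```shell
#!/bin/bash
set -e

# Overwrite the changed Data Source without the interactive y/N prompt;
# adding a column is a non-breaking change supported by the Tinybird APIs
tb deploy --yes
```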

Example 2: Overwrite a Materialized View

You want to overwrite a Materialized View when you are not changing the resulting schema but just the query used to materialize.

Overwriting a Materialized Pipe is supported by the tb deploy command. If you don’t have to perform any further data migration, then that’s the way to go. For instance, let’s add a new filter to the top_product_per_day.pipe:

NODE only_buy_events
DESCRIPTION >
    filters all the buy events

SQL >
    SELECT
        toDate(timestamp) date,
        product,
        action,
        JSONExtractFloat(json, 'price') as price
    FROM events
    WHERE action = 'buy'

NODE top_per_day
SQL >
    SELECT
        date,
        action,
        topKState(10)(product) top_10,
        sumState(price) total_sales
    from only_buy_events
    where date > now() - interval 30 day -- <- THIS IS THE CHANGE
    group by date, action

TYPE materialized
DATASOURCE top_products_view

Take a look at the commit with the new filter. In this case the default tb deploy command will overwrite top_product_per_day.pipe so new rows ingested in the events Data Source will be materialized only if they are less than 30 days old.

Now, depending on whether we want to perform some Data Operation, we could go with a custom deployment like this one. For instance, imagine you want to apply the new filter and also get rid of data older than 30 days: you would first tb deploy and then perform the delete operation.

Of course the commands required to perform the data operation might vary depending on the nature of the change in the Materialized View.
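For this particular change, a custom deployment could first overwrite the Pipe and then issue a conditional delete through the Data Sources API (a sketch; the API host, token variable, and the exact delete condition are assumptions you would adapt to your region and schema):

```shell
#!/bin/bash
set -e

# overwrite the Materialized Pipe with the new filter
tb deploy --yes

# delete already-materialized rows older than 30 days
curl -X POST "https://api.tinybird.co/v0/datasources/top_products_view/delete" \
  -H "Authorization: Bearer $TB_ADMIN_TOKEN" \
  --data-urlencode "delete_condition=date < now() - interval 30 day"
```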

Example 3: Version a Materialized View with data migration

You want to version a Materialized View when there are some breaking changes that affect API Endpoints or when you made a change in a Materialized View that requires some complex data migration.

As an example let’s modify the top_product_per_day.pipe Materialized View to aggregate by user_id.

Let’s start by versioning both top_product_per_day.pipe and top_products_view.datasource:

NODE only_buy_events
DESCRIPTION >
    filters all the buy events

SQL >
    SELECT
        toDate(timestamp) date,
        product,
        action,
        user_id,
        JSONExtractFloat(json, 'price') as price
    FROM events
    WHERE action = 'buy'

NODE top_per_day
SQL >
    SELECT
        date,
        action,
        user_id,
        topKState(10)(product) top_10,
        sumState(price) total_sales
    from only_buy_events
    group by date, action, user_id

TYPE materialized
DATASOURCE top_products_view

SCHEMA >
    `date` Date,
    `action` String,
    `user_id` String,
    `top_10` AggregateFunction(topK(10), String),
    `total_sales` AggregateFunction(sum, Float64)

ENGINE "AggregatingMergeTree"
ENGINE_SORTING_KEY "date, action, user_id"

When you commit those changes to a Git branch and create a Pull Request, there are two interesting things that happen in the CI pipeline.

First, two Datafiles are detected to have changed:

changed: top_product_per_day
changed: top_products_view

Also, the Pipes using those resources are pushed to the Environment, so they make use of the new version of the Materialized View and can be tested for regressions.

** Building dependencies
** Running top_products_view => v2 (remote latest version: v1)
** 'top_products_view__v2' created
** Running top_product_per_day => v2 (remote latest version: v1)
** Materialized pipe 'top_product_per_day__v2' using the Data Source 'top_products_view__v2'
** Populating job url ***/v0/jobs/d7d8b5aa-306f-4cfd-a9f8-fac0d2b8ea48
** 'top_product_per_day__v2' created
** Running top_products_params
** Token read_token found, adding permissions
** => Test endpoint with:
** $ curl ***/v0/pipes/top_products_params.json?token=p.eyJ1IjogIjViNjdmNjg4LWZmYjktNDk2Mi1hNTczLTAwNjM5MTYxNDlmYiIsICJpZCI6ICIwYzNiMWU3Zi03NWFiLTQ4OTUtODBjOC1lMDEyOTA2NmJhNWYiLCAiaG9zdCI6ICJldV9zaGFyZWQifQ.nXG6hCJVo9fJOaTjM0cn5VttWNakBnxtmjEAypTO0ik
** 'top_products_params' created
** Running top_products_agg
** Token read_token found, adding permissions
** => Test endpoint with:
** $ curl ***/v0/pipes/top_products_agg.json?token=p.eyJ1IjogIjViNjdmNjg4LWZmYjktNDk2Mi1hNTczLTAwNjM5MTYxNDlmYiIsICJpZCI6ICIwYzNiMWU3Zi03NWFiLTQ4OTUtODBjOC1lMDEyOTA2NmJhNWYiLCAiaG9zdCI6ICJldV9zaGFyZWQifQ.nXG6hCJVo9fJOaTjM0cn5VttWNakBnxtmjEAypTO0ik
** 'top_products_agg' created
** Running top_products
** Token read_token found, adding permissions
** => Test endpoint with:
** $ curl ***/v0/pipes/top_products.json?token=p.eyJ1IjogIjViNjdmNjg4LWZmYjktNDk2Mi1hNTczLTAwNjM5MTYxNDlmYiIsICJpZCI6ICIwYzNiMWU3Zi03NWFiLTQ4OTUtODBjOC1lMDEyOTA2NmJhNWYiLCAiaG9zdCI6ICJldV9zaGFyZWQifQ.nXG6hCJVo9fJOaTjM0cn5VttWNakBnxtmjEAypTO0ik
** 'top_products' created
New release deployed: '78882650bbaefda891a7d41a2197a56d9dfddb79'

After that, the CI pipeline complains about regression tests failing:

==== Failures Detail ====

❌ top_products(coverage) - ***/v0/pipes/top_products.json?date_start=2020-04-24&date_end=2020-04-25&q=SELECT+%0A++date%2C%0A++count%28%29+total%0AFROM+top_products%0AGROUP+BY+date%0AHAVING+total+%3C+0%0A&cli_version=1.0.0b410+%28rev+145e3d7%29&pipe_checker=true

** 32.0 not less than 25 : Processed bytes has increased 32.0%
💡 Hint: Use `--assert-bytes-read-increase-percentage -1` if it's expected and want to skip the assert.

That’s good: since you are changing the aggregation of the Materialized View used by the top_products endpoint, regression testing warns you that the API Endpoint processes more data than the previous version.

At this point you have two options:

  • You can increase the VERSION number in the related pipes, so regression testing does not run over them. Then run a custom deployment to just deploy the changed files and not the related Pipes.

  • Ignore the regression with a --assert-bytes-read-increase-percentage -1 label as suggested in the 💡 Hint above.

For this example, let’s ignore the regression since the API endpoint interface did not change and there’s no need to create a new VERSION.

Once CI is green, we need to think about how to bring these changes to the main Environment, where data is being ingested and API Endpoints are receiving requests. A typical approach is as follows:

  • Deploy the versioned resources first, in this case top_product_per_day and top_products_view. Once deployed, they are connected to the ingestion but disconnected from the API Endpoints, which is exactly the scenario we want.

  • Once the Materialized View is deployed, it automatically starts materializing the data being ingested. Before connecting it to the API Endpoints, data needs to be backfilled.

  • Optionally, once data is backfilled, you may want to perform a data quality check between the current and previous versions.

  • Finally, you need to deploy the rest of the API Endpoints that depend on the changed resources so they start using the new versions.

Let’s go through this custom deployment using the .tinyenv file and the custom script described above. Bump the VERSION in .tinyenv to 0.0.1 and create the deployment script under deploy/0.0.1/:

# deploy the versioned resources alone
tb push datasources/top_products_view.datasource
BACKFILL_TIME=$(date +"%Y-%m-%d %H:%M:%S")
tb push pipes/top_product_per_day.pipe
# backfill old data with a populate
tb pipe populate top_product_per_day__v2 --node top_per_day --sql-condition "timestamp < '$BACKFILL_TIME'" --wait
# do the data quality check, checking that a sum in top_products_view__v1 and top_products_view__v2 return the same value
diff=$(tb --no-version-warning sql "with (select sumMerge(total_sales) from top_products_view__v2) as new, (select sumMerge(total_sales) from top_products_view__v1) as old select old - new as diff" --format json | python -c "import sys, json; print(json.load(sys.stdin)['data'][0]['diff'])")
echo "Diff: $diff"

if [ "$diff" -eq 0 ]; then
    echo "Diff is equal."
else
    echo "Diff is not equal."
    exit 1
fi
# deploy the depending API endpoints
tb deploy

To test this custom deployment, you can go through a staging-production Workspace setup or test it manually in a dedicated Environment created for that purpose.

Once deployment is validated, you can just merge the Pull Request and the script will run in the main Environment.

What should you do in case of a failure? Since we are versioning resources, you can roll back the deployment by removing the newly created resources top_products_view__v2 and top_product_per_day__v2.
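A rollback along those lines could look like this (a sketch assuming the CLI’s rm subcommands; double-check the exact commands and flags before running them against a production Workspace):

```shell
# remove the new versions; the __v1 resources are still serving the endpoints
tb pipe rm top_product_per_day__v2 --yes
tb datasource rm top_products_view__v2 --yes
```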

What to do in other cases?

Please read the Iterating Data Sources guide for other common use cases and scenarios, or reach out to us and we’ll help you find the best deployment path for your use case.

What’s coming next

Reading the above guide, you may have realized that to deploy changes, especially those that involve data migrations, to your main Environment you have to:

  • Use Versioning when there are breaking changes.

  • Carefully craft your Data Sources iterations especially when there’s streaming ingestion.

  • Perform a series of controlled steps which become part of the “tribal knowledge” of your Data Project.

This guide does not yet cover all possible deployment cases; reach out to us if you need help running a custom deployment.

We are working on a better way to deploy the changes of your Data Projects that will enable previewing and rolling back releases. These abilities will make it easier to control the life cycle of your Data Products and provide a clear path to iterate any resource. Stay tuned for more info.