Common Use Cases¶
This document shows some common use cases where the Command Line Interface (CLI) can help you in your day-to-day workflow.
We recommend you take a look at the quick intro and make use of the integrated help while you work with the CLI.
Data projects and source control repositories¶
The CLI works best when your data project is connected to a source control repository (such as Git).
That way all your data source schemas, exploration pipes, transformations and endpoints can be managed as source code, making it possible for several developers to collaborate on the same data project, trace endpoint changes through the commit history, or manage multiple Environments and versions of your endpoints seamlessly.
When asking for support, the Tinybird team will be able to work with you on the same code base, if that's what you want.
Some of the following use cases assume your data project is connected to a source control repository (SCR). Keep in mind that this is something you have to manage yourself: the CLI manages your data files and the connection to your Tinybird account, but you have to push code to the SCR separately.
When to use versions¶
Especially when any of your endpoints is in production, increase the version number when:
- You change the schema of a data source
- You change the output of an endpoint
See the section working with versions for more information.
That way you can keep both the old and the new version working. You can use the --force flag to overwrite pipes with the same version; however, you cannot overwrite data sources.
In the case of data sources, if you need to change the schema or the engine settings, you have to create a new version. Otherwise you'll have to delete and recreate the data source, which forces you to delete any endpoint that uses it as well.
When adding new columns, it's not strictly necessary to create a new data source version: adding new columns to a Data Source schema is backwards compatible in Tinybird.
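For example, here is a minimal sketch of a version bump on a hypothetical events.datasource (the file, column and pipe names are made up for illustration):
VERSION 2
SCHEMA >
    `user_id` Int64,
    `event_time` DateTime,
    `payload` String
You would then push the new version and --force push any pipe that should read from it:
tb push events.datasource
tb push pipes/events_by_user.pipe --force
Pushing the versioned Data Source creates events__v2 alongside the previous version, so existing endpoints keep working until you migrate them.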
Using workspaces as Environments¶
When you are developing, you often want to create a new development branch in your source control repository and push your data sources and pipes to your Tinybird account. We recommend creating a separate workspace for each environment to avoid colliding with your production one.
You can create a workspace for each of the Environments you want in Tinybird: production, staging, etc. You can also create separate workspaces (personal or per project) if you want to iterate before bringing changes into your work Environments. Tinybird allows you to organize your workspaces to best suit your needs. Once you have decided on the structure, you can share data sources between workspaces. For example, you can keep all your production data in a workspace with limited access and share it with other Environments to avoid direct changes to your production data, or define whatever strategy fits you best.
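As a sketch, and assuming your CLI version includes the tb workspace subcommands, switching between Environments from the command line could look like this (workspace names are hypothetical):
tb workspace ls
tb workspace use acme_staging
tb push --push-deps
tb workspace use acme_production
The idea is to validate your changes in the staging workspace first and only then switch to the production workspace to push them there.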
Downloading all the pipes and data sources from your account¶
There are two ways to start working with the CLI: either you start a new data project from scratch or, if you already have some data and endpoints in your Tinybird account, you pull them to your local disk to continue working from there.
For this second option, use the --match flag to filter pipes or data sources containing the string passed as parameter.
For instance, to pull all the files whose name contains project:
tb pull --match project
[D] writing project.datasource(demo)
[D] writing project_geoindex.datasource(demo)
[D] writing project_geoindex_pipe.pipe(demo)
[D] writing project_agg.pipe(demo)
[D] writing project_agg_API_endpoint_request_log_pipe_3379.pipe(demo)
[D] writing project_exploration.pipe(demo)
[D] writing project_moving_avg.pipe(demo)
Once the files are pulled you can diff or push the changes to your source control repository and continue working from the command line.
When you pull data sources or pipes, your data is not downloaded, just the data source schemas and pipe definitions, so they can be replicated easily.
The pull command does not preserve the directory structure, so all your data files will be downloaded to your current directory.
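If you prefer a conventional project layout (for example, datasources/ and pipes/ directories, as in the pipes/mypipe.pipe path used below), one way is to reorganize the pulled files yourself after the pull:
mkdir -p datasources pipes
mv *.datasource datasources/
mv *.pipe pipes/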
Pushing the whole data project¶
If you want to push the whole project you can run:
tb push --push-deps
Pushing a pipe with all its dependencies¶
tb push pipes/mypipe.pipe --push-deps
Adding a new column to a data source¶
Data Source schemas are mostly immutable, but you can append new columns at the end of an existing Data Source with an Engine from the MergeTree family. If you want to change existing columns, add columns in other positions or modify the engine, you need to first create a new version of the Data Source with the modified schema, then ingest the data and finally point the pipes to this new Data Source version. To force a pipe replacement, use the --force flag when pushing it.
Append new columns to an existing Data Source¶
Let's suppose you have a Data Source defined like this, and that it has already been pushed to Tinybird:
VERSION 0
SCHEMA >
    `test` Int16,
    `local_date` Date,
    `test3` Int64
If you want to append a new column, you need to change the *.datasource file, adding the new column new_column. You can append as many columns as you need at the same time:
VERSION 0
SCHEMA >
    `test` Int16,
    `local_date` Date,
    `test3` Int64,
    `new_column` Int64
Remember that the only kind of alter column operation supported is appending new columns to an existing Data Source and that the engine of that Data Source must be of the MergeTree family.
After that, you will need to execute tb push my_datasource.datasource --force and confirm the addition of the column(s). The --force parameter is required for this kind of operation.
Existing imports will continue working once the new columns are added, even if those imports don't carry values for the added columns. In those cases, the new columns will just contain empty values, like 0 for numeric types or '' for Strings, or, if defined, the default values in the schema.
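As a sketch, and assuming the schema accepts ClickHouse-style DEFAULT expressions (check the datafile reference for the exact syntax supported by your CLI version), the appended column could declare its own fallback value:
VERSION 0
SCHEMA >
    `test` Int16,
    `local_date` Date,
    `test3` Int64,
    `new_column` Int64 DEFAULT 0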
Create a new version of the Data Source for any other add/change column operations you may need¶
To create a new version of a Data Source, just increase the VERSION number inside the Data Source file, taking into account that VERSION must be an integer.
VERSION 1
SCHEMA >
    `test` Int16,
    `local_date` Date,
    `test3` Int64,
    `new_column` Int64
In this case we are creating VERSION 1 of our test.datasource, adding a new_column.
Once you have increased the VERSION number, you just have to push the Data Source, ingest new data and --force push any dependent pipe where you want to use the new Data Source version.
The version number is appended to the Data Source or pipe name, so if you have a pipe called this_is_my_pipe and you push VERSION 1 of it, the new version will be pushed as this_is_my_pipe__v1.
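For example, consumers would then call the versioned endpoint by its full name. A sketch of such a request, with the API host and token as placeholders (the host depends on your region):
curl "https://api.tinybird.co/v0/pipes/this_is_my_pipe__v1.json?token=$TOKEN"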
How to create materialized views¶
Materialized Views allow you to transform the data from an origin data source into a destination data source. There are several use cases where materialized views are really handy and can make a difference in the response times of your analyses. To name a few:
- Denormalize several normalized tables into one via a JOIN
- Transform data using an optimized ENGINE for a concrete analysis
- Transform your source data on the fly as you ingest data into your origin data source
One important thing to know is that materialized views are live views of your origin data source. Any time you append or replace data in your origin data source, all the destination data sources created as materialized views are properly synced. This means you don't have to worry about costly re-sync processes.
Let's say you have an origin data source (my_origin.datasource) like this one:
VERSION 0
SCHEMA >
    `id` Int16,
    `local_date` Date,
    `name` String,
    `count` Int64
And you need an optimized version of this data source that pre-aggregates the count for each ID. You should create a new data source that uses a SimpleAggregateFunction; this will be the materialized view.
First, define the destination data source (my_destination.datasource):
VERSION 0
SCHEMA >
    `id` Int16,
    `local_date` Date,
    `name` String,
    `total_count` SimpleAggregateFunction(sum, UInt64)

ENGINE "AggregatingMergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(local_date)"
ENGINE_SORTING_KEY "local_date,id"
And then you'll write a transformation pipe (my_transformation.pipe) like this:
VERSION 0
NODE transformation_node
SQL >
    SELECT
        id,
        local_date,
        name,
        sum(count) as total_count
    FROM
        my_origin
    GROUP BY
        id,
        local_date,
        name
TYPE materialized
DATASOURCE my_destination
Once you have the origin and destination data sources and the transformation pipe defined, you can push them:
tb push my_origin.datasource
tb push my_destination.datasource
tb push my_transformation.pipe --populate
Any time you ingest data into my_origin, the data in my_destination will be automatically updated.
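As a quick check, and assuming the tb datasource append and tb sql commands are available in your CLI version, you could append a local CSV with matching columns to the origin and then query the destination (the file name is hypothetical):
tb datasource append my_origin events.csv
tb sql "SELECT id, sum(total_count) AS total FROM my_destination GROUP BY id"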
Guided process: tb materialize¶
Alternatively, you can use the tb materialize command to generate the target .datasource file needed to push a new materialized view.
The goal of the command is to guide you through all the steps needed to create a materialized view. Given a pipe, tb materialize:
- Asks you which node of the pipe you want to materialize. By default, it selects the last one in the pipe; if there's only one, it's selected automatically without asking. From the selected query, the command guesses the best parameters for the following steps.
- Warns you of any errors in the query that prevent it from materializing. If everything is correct, it continues.
- Creates the target Data Source file that will receive the results of the materialization, setting default engine parameters. If you are materializing an aggregation, make sure the ENGINE_SORTING_KEY columns in the .datasource file are in the order you are going to filter the table by.
- Modifies the query to set up the materialization settings and pushes the pipe to create the materialization. You can skip the pipe checks here if needed.
- Asks you if you want to populate the materialized view with existing data. If you choose to populate, it will ask you whether you want to use a subset of the data (to populate faster and check that everything is correct) or fully populate with all existing data.
- Creates a backup file of the pipe (adding the _bak suffix to the file extension). We do this because the command modifies the file: it completes the aggregate functions with -State combinators (see docs) where needed and adds the target Data Source name. We keep the backup file in case you want to recover the original query.
The command generates and modifies the files involved in the materialization. If you run into an error or need to modify something in the materialization, you will already have the files, which can serve as a better starting point than creating everything on your own. We recommend you double-check the generated datafiles.
For the same case as above:
Given the same origin data source (my_origin.datasource), we only need to write the query that will materialize:
NODE transformation_node
SQL >
    SELECT
        id,
        local_date,
        name,
        sum(count) as total_count
    FROM
        my_origin
    GROUP BY
        id,
        local_date,
        name
Note that this time we don't need to define and create the destination Data Source first, only the query.
We then run tb materialize. As you can see, the command guides you through the whole process.
❯ tb materialize pipes/my_transformation.pipe mv_transformation
This feature is under development and released as a beta version. You can report any feedback (bugs, feature requests, etc.) to support@tinybird.co
** Pushing the pipe my_transformation to your workspace to analyze it
** Running my_transformation
** 'my_transformation' created
** Analyzing the pipe my_transformation
** Created backup file with name my_transformation.pipe_bak
** => Saved local file mv_transformation.datasource
Delete the Data Source from the workspace and push mv_transformation.datasource again? [y/N]: y
** Running mv_transformation
** 'mv_transformation' created
** => Saved local file my_transformation.pipe
[1] Push my_transformation.pipe and override if it exists
[2] Push my_transformation.pipe and override if it exists with no checks
=> Choose one of the above options to continue... [1]: 1
** Running my_transformation
** Materialized node 'transformation_node' using the Data Source 'mv_transformation'
** 'my_transformation' created
Do you want to populate the materialized view with existing data? (It will truncate the materialized view before population starts) [y/N]: y
[1] Partially populate: Uses a 10 percent subset of the data (quicker, useful to validate the materialized view)
[2] Fully populate
=> Choose one of the above options to continue... [1]: 2
Populating [████████████████████████████████████] 100%
** Materialized view mv_transformation created!
** Materialized views work as insert triggers, anytime you append data into the source Data Source the materialized node query will be triggered.
After the command has run successfully, you'll end up with:
- A new data source and its corresponding .datasource file: mv_transformation
- A modified my_transformation.pipe file, adapted to materialize from the chosen query.
- The materialized view properly created in the workspace: the next time you insert data into my_origin, the transformation_node SQL will be triggered and data will be aggregated on the fly in the mv_transformation Materialized View.
How to force populate materialized views¶
Sometimes you want to force-populate a materialized view, most likely because you have changed the transformation in the pipe and want the data from the origin data source to be re-ingested.
tb push my_may_view_pipe.pipe --force --populate
You'll get a Jobs API job_url as a response so you can check the job's progress and status.
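For example, you can poll that job URL with your Token until the job reports it is done (the job id and token below are placeholders):
curl -H "Authorization: Bearer $TOKEN" "https://api.tinybird.co/v0/jobs/<job_id>"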
Debug mode¶
Especially when you work with pipes that make use of several versions of different data sources, you might need to double-check which version of which data source the pipe is pointing at before you push it to your Tinybird account.
To do so, use the --dry-run --debug flags like this:
tb push my_pipe.pipe --dry-run --debug
Once you’ve validated the content of the pipe, you can just push your pipe normally.
Automatic regression tests for your API endpoints¶
Any time you --force push a pipe that has a public endpoint that has received requests, some automatic regression tests are executed.
The CLI checks, for the top ten requests, whether the previous version of the endpoint returns the same data as the version you are pushing. This can help you validate whether you are introducing a regression in your API.
Other times you are consciously --force pushing a new version that returns different data; in that case you can skip the regression tests with the --no-check flag:
tb push my_may_view_pipe.pipe --force --no-check
When pushing a Pipe with a public endpoint, the endpoint will be maintained based on the node name. If the existing endpoint node is renamed, the last node of the Pipe will be recreated as an endpoint. The latter is not an atomic operation: the endpoint will be down for a few moments while the new endpoint is created.
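As a sketch, one way to avoid that brief downtime is to keep the endpoint node name stable across pushes and change only its SQL. The pipe layout below is hypothetical; it reads from the my_destination Data Source used earlier:
NODE endpoint
SQL >
    SELECT
        local_date,
        sum(total_count) AS total_count
    FROM my_destination
    GROUP BY local_date
As long as the node keeps the same name, a --force push replaces its query while the existing endpoint is maintained.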