Tinybird CLI¶
The Tinybird CLI lets you use all of Tinybird's functionality directly from the command line. It also includes several functions to create and manage data projects easily, and it's used behind the scenes in Git and CI/CD workflows.
How to install¶
Option 1: Install it locally¶
You need Python 3 and pip installed:
Supported Python versions: 3.8, 3.9, 3.10, 3.11, 3.12
Example: creating a virtual environment for Python 3
python3 -m venv .venv
source .venv/bin/activate
If you're not used to Python virtual environments, you can read this guide about venv.
Install tinybird-cli
pip install tinybird-cli
Option 2: Use a prebuilt docker image¶
Let's say your project is in the projects/data path:
Run the prebuilt image, mounting your project path
docker run -v ~/projects/data:/mnt/data -it tinybirdco/tinybird-cli-docker
cd mnt/data
Authenticate¶
The first step is to check everything works correctly and that you're able to authenticate:
Authenticate
tb auth -i

** List of available regions:
   [1] us-east (https://ui.us-east.tinybird.co)
   [2] eu (https://ui.tinybird.co)
   [0] Cancel

Use region [1]:
Copy the "admin your@email" token from https://app.tinybird.co/tokens and paste it here: <pasted Token>
** Auth successful!
** Configuration written to .tinyb file, consider adding it to .gitignore
First, choose the Tinybird region. Note that dedicated regions don't appear in the list.
The CLI then asks for the admin Token associated with your account (user@domain.com); copy it from the Workspace Tokens page and paste it.
Note you can also pass the Token directly with the --token flag:
Authenticate
tb auth --token <your Token>

** Auth successful!
** Configuration written to .tinyb file, consider adding it to .gitignore
It saves your credentials in the .tinyb file in your current directory. Add it to .gitignore (or the ignore list in the SCM you use) because it contains Tinybird credentials.
The CLI tries its best to authenticate you in the proper region for your account, but you may want to override this behavior. In that case, provide the --host flag with the corresponding URL for your region.
See the API Reference docs for the list of supported regions.
You can also view this list of available regions within the tool using the command tb auth ls.
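For example, to authenticate against a specific region in one step, you can combine --host and --token. This is a hedged sketch: the host URL shown assumes the us-east region, so replace it with the API host listed for your own region.
Authenticate against a specific region
tb auth --host https://api.us-east.tinybird.co --token <your Token>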
Quick intro¶
The CLI works the same way with CSV, NDJSON, and Parquet files.
Create a new project:
Initialize
tb init
Generate a Data Source file (we will explain this later) based on a sample CSV file and add a few lines:
Generate Data Source
$ tb datasource generate /tmp/sample.csv
** Generated datasources/sample.datasource
** => Run `tb push datasources/sample.datasource` to create it on the server
** => Add data with `tb datasource append sample /tmp/sample.csv`
** => Generated fixture datasources/fixtures/sample.csv
You can also generate a Data Source file from an NDJSON or Parquet file. It uses the Analyze API behind the scenes, so it will guess and apply the jsonpath for each column in the schema.
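For instance, generating a Data Source file from an NDJSON file works the same way (the file name here is just an example):
Generate a Data Source from an NDJSON file
tb datasource generate /tmp/sample.ndjson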
Push it to Tinybird:
Push Data Source
$ tb push datasources/sample.datasource
** Processing datasources/sample.datasource
** Building dependencies
** Creating sample
** not pushing fixtures
Append some data:
Append data
$ tb datasource append sample datasources/fixtures/sample.csv
🥚 starting import process
🐥 done
Query the data:
Query the data
$ tb sql "select count() from sample"

Query took 0.000475 seconds, read 1 rows // 4.1 KB
-----------
| count() |
-----------
|   384   |
-----------
Check the Data Source is in the Data Sources list:
List Data Sources
$ tb datasource ls

name            row_count  size   created at                  updated at
--------------  ---------  -----  --------------------------  --------------------------
sample          384        20k    2020-06-24 15:09:00.409266  2020-06-24 15:09:00.409266
madrid_traffic  87123456   1.5Gb  2019-07-02 10:40:03.840151  2019-07-02 10:40:03.840152
...
Go to your Tinybird dashboard to check the Data Source is present there.
Data projects¶
A data project is a set of files that describes how your data should be stored, processed, and exposed through APIs.
In the same way you maintain source code files in a repository, use a CI, make deployments, run tests, etc., Tinybird provides a set of tools to work following a similar pattern but with data pipelines. In other words: the source code in your project would be the datafiles in Tinybird.
Following this approach, any data project can be managed with a list of text-based files that allow you to:
- Define how the data should flow, from the start (the schemas) to the end (the API).
- Manage your datafiles under version control.
- Use branches in your datafiles.
- Run tests.
- Deploy a data project like you'd deploy any other software application.
Let's see an example. Imagine an ecommerce site where we have events from users and a list of products with their attributes. Our purpose is to expose several API endpoints to return sales per day and top product per day.
The data project would look like this:
ecommerce_data_project/
    datasources/
        events.datasource
        products.datasource
        fixtures/
            events.csv
            products.csv
    pipes/
        top_product_per_day.pipe
    endpoints/
        sales.pipe
        top_products.pipe
Every file in this folder maps to a Data Source or a Pipe in Tinybird. You can create a project from scratch with tb init, but in this case let's assume it's already created and stored in a GitHub repository.
Uploading the project¶
Clone demo
git clone https://github.com/tinybirdco/ecommerce_data_project.git cd ecommerce_data_project
Refer to the how to install section to connect the ecommerce_data_project with your Tinybird account.
You can push the whole project to your Tinybird account to check everything is fine. The tb push command uploads the project to Tinybird, but first it checks the project dependencies and the SQL syntax, among other things. In this case, we use the --push-deps flag to push everything:
Push dependencies
$ tb push --push-deps
** Processing ./datasources/events.datasource
** Processing ./datasources/products.datasource
** Processing ./pipes/top_product_per_day.pipe
** Processing ./endpoints/top_products_params.pipe
** Processing ./endpoints/sales.pipe
** Processing ./endpoints/top_products.pipe
** Building dependencies
** Creating products
** Creating events
** Creating products_join_by_id
** Creating top_product_per_day
** Creating sales
** => Test endpoint at https://api.tinybird.co/v0/pipes/sales.json
** Creating products_join_by_id_pipe
** Creating top_products_params
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products_params.json
** Creating top_products
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products.json
** not pushing fixtures
Once it finishes, the endpoints defined in our project (sales and top_products) will be available and we can start pushing data to the different Data Sources. The project is ready.
Now, let's go through the different files in the project in order to understand how to deal with them individually.
Define Data Sources¶
Data Sources define how your data is going to be stored. You can add data to these Data Sources using the Data Sources API.
Each Data Source is defined by a schema and other properties we will explain later (more on this in the datafiles docs).
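As a hedged sketch of that API (the parameter names follow the Data Sources API reference, and the source CSV URL is a made-up example; double-check both there), appending a remote CSV file to the events Data Source over HTTP could look like this:
Append data through the Data Sources API
curl \
  -H "Authorization: Bearer $TOKEN" \
  -X POST "https://api.tinybird.co/v0/datasources?name=events&mode=append" \
  --data-urlencode "url=https://example.com/events.csv"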
Let's see events.datasource:
DESCRIPTION >
    # Events from users

    This contains all the events produced by Kafka, there are 4 fixed columns,
    plus a `json` column which contains the rest of the data for that event.

    See [documentation](url_for_docs) for the different events.

SCHEMA >
    timestamp DateTime,
    product String,
    user_id String,
    action String,
    json String

ENGINE MergeTree
ENGINE_SORTING_KEY timestamp
As we can see, there are three main sections:
- A general description (using markdown in this case).
- The schema.
- How the data is sorted. In this case, the access pattern is most of the time by the timestamp column. If no SORTING_KEY is set, Tinybird picks one by default, usually a date or datetime column.
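As a hedged example (not taken from the demo project), if most of your queries filtered first by product and then by time, you could set the sorting key to both columns:
ENGINE MergeTree
ENGINE_SORTING_KEY product, timestamp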
Now, let's push the Data Source:
Push the events Data Source
$ tb push datasources/events.datasource
** Processing datasources/events.datasource
** Building dependencies
** Creating events
** not pushing fixtures
You cannot override Data Sources. If you try to push a Data Source that already exists in your account, you'll get an output saying events already exists, skipping. If you need to override the Data Source, remove it or upload a new one with a different name.
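Since the project ships fixture files under datasources/fixtures (see the project layout above), you can load some sample data into the new Data Source right away:
Append the events fixture
tb datasource append events datasources/fixtures/events.csv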
Define data Pipes¶
You usually don't use the data as it comes in. For example, in this project we're dealing with Kafka events, so we could query the events Data Source directly, but generating a live Materialized View of that table is better.
For this purpose, we have Pipes. Let's see how to create a data Pipe that transforms the data as it's inserted. This is the content of pipes/top_product_per_day.pipe:
NODE only_buy_events
DESCRIPTION >
    filters all the buy events
SQL >
    SELECT
        toDate(timestamp) date,
        product,
        JSONExtractFloat(json, 'price') AS price
    FROM events
    WHERE action = 'buy'

NODE top_per_day
SQL >
    SELECT
        date,
        topKState(10)(product) top_10,
        sumState(price) total_sales
    FROM only_buy_events
    GROUP BY date

TYPE materialized
DATASOURCE top_per_day_mv
ENGINE AggregatingMergeTree
ENGINE_SORTING_KEY date
Each Pipe can have one or more Nodes. In this Pipe, as we can see, we're defining two Nodes: only_buy_events and top_per_day.
- The first one filters "buy" events and extracts some data from the json column.
- The second one runs the aggregation.
The pattern to define a "pipeline" is simple: use NODE to start a new Node and then use SQL > to define the SQL for that Node. Notice you can use other Nodes inside the SQL. In this case, the second Node uses the first one, only_buy_events.
Pushing a Pipe is the same as pushing a Data Source:
Populate
$ tb push pipes/top_product_per_day.pipe --populate
** Processing pipes/top_product_per_day.pipe
** Building dependencies
** Creating top_product_per_day
** Populate job url https://api.tinybird.co/v0/jobs/c7819921-aca0-4424-98c5-9223ca2475c3
** not pushing fixtures
In this case, it's a materialized Node. If you want to populate it with the existing data in the events table, you can use the --populate flag.
When using the --populate flag you get a job URL. Data population runs in the background, so you can check the status of the job at the URL provided.
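As a hedged sketch (the Jobs API endpoint and the Authorization header follow the pattern of the other Tinybird APIs; check the API Reference for the exact response fields), you could poll that job with curl using the id from the output above:
Check the status of the populate job
curl -H "Authorization: Bearer $TOKEN" \
  "https://api.tinybird.co/v0/jobs/c7819921-aca0-4424-98c5-9223ca2475c3"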
Define API Endpoints¶
API Endpoints are the way you expose the data to be consumed. They look pretty similar to Pipes and, well, they are actually Pipes that transform the data but add an extra step that exposes the data.
Let's look into endpoints/top_products.pipe:
NODE endpoint
DESCRIPTION >
    returns top 10 products for the last week
SQL >
    SELECT
        date,
        topKMerge(10)(top_10) AS top_10
    FROM top_per_day
    WHERE date > today() - interval 7 day
    GROUP BY date
The syntax is exactly the same as in the data transformation Pipes, but now the results can be accessed through the endpoint https://api.tinybird.co/v0/pipes/top_products.json?token=TOKEN.
When you push an endpoint, a Token with PIPE:READ permissions is automatically created. You can see it in the Tokens UI or directly from the CLI with the command tb pipe token_read <endpoint_name>.
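For instance, once the top_products endpoint from this project is pushed, you could retrieve its read Token like this:
Get the read Token of an endpoint
tb pipe token_read top_products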
Alternatively, you can use TOKEN token_name READ to automatically create a Token named token_name with READ permissions over the endpoint, or to add READ permissions over the endpoint to the existing token_name. This is a very convenient way of handling Tokens in your data project.
TOKEN public_read_token READ

NODE endpoint
DESCRIPTION >
    returns top 10 products for the last week
SQL >
    SELECT
        date,
        topKMerge(10)(top_10) AS top_10
    FROM top_per_day
    WHERE date > today() - interval 7 day
    GROUP BY date
Let's push it now:
Push the top products Pipe
$ tb push endpoints/top_products.pipe
** Processing endpoints/top_products.pipe
** Token public_read_token not found, creating one
** Building dependencies
** Creating top_products
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products.json?token=******
** not pushing fixtures
Note the Token public_read_token was created automatically and is provided in the test URL.
It's possible to add parameters to any endpoint. For example, let's parametrize the dates to be able to filter the data between two dates:
NODE endpoint
DESCRIPTION >
    returns top 10 products for the last week
SQL >
    %
    SELECT
        date,
        topKMerge(10)(top_10) AS top_10
    FROM top_per_day
    WHERE date between {{Date(start)}} AND {{Date(end)}}
    GROUP BY date
Now, the endpoint can receive start and end parameters: https://api.tinybird.co/v0/pipes/top_products.json?start=2018-09-07&end=2018-09-17&token=TOKEN.
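As a quick illustration (TOKEN is a placeholder for a Token with read permissions over the endpoint), the same call with curl would be:
Call the parametrized endpoint
curl "https://api.tinybird.co/v0/pipes/top_products.json?start=2018-09-07&end=2018-09-17&token=TOKEN"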
You can print the results from the CLI using the tb pipe data command. For instance, for the example above:
Print the results of the top products endpoint
$ tb pipe data top_products --start '2018-09-07' --end '2018-09-17' --format CSV

"date","top_10"
"2021-04-28","['sku_0001','sku_0004','sku_0003','sku_0002']"
Check tb pipe data --help for more options.
The supported types for the parameters are: Boolean, DateTime64, DateTime, Date, Float32, Float64, Int, Integer, Int8, Int16, UInt8, UInt16, UInt32, Int32, Int64, UInt64, Int128, UInt128, Int256, UInt256, Symbol, and String.
Note that for the parameters templating to work, you need to start your Node SQL definition with the % character.
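As a hedged sketch combining both points (the limit parameter, its default value of 10, and the assumption that template functions accept a default as a second argument are all for illustration only), a parametrized Node could look like this:
NODE endpoint
SQL >
    %
    SELECT
        date,
        topKMerge(10)(top_10) AS top_10
    FROM top_per_day
    WHERE date between {{Date(start)}} AND {{Date(end)}}
    GROUP BY date
    LIMIT {{Int32(limit, 10)}}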
Override an endpoint or a data Pipe¶
When working on a project, you usually need to push several versions of the same file. You can override a Pipe that has already been pushed using the --force flag.
Override the Pipe
$ tb push endpoints/top_products_params.pipe --force
** Processing endpoints/top_products_params.pipe
** Building dependencies
** Creating top_products_params
current https://api.tinybird.co/v0/pipes/top_products_params.json?start=2020-01-01&end=2010-01-01
    new https://api.tinybird.co/v0/pipes/top_products_params__checker.json?start=2020-01-01&end=2010-01-01 ... ok
current https://api.tinybird.co/v0/pipes/top_products_params.json?start=2010-01-01&end=2021-01-01
    new https://api.tinybird.co/v0/pipes/top_products_params__checker.json?start=2010-01-01&end=2021-01-01 ... ok
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products_params.json
It overrides the endpoint. If the endpoint has been called before, the push runs regression tests with the most frequent requests; if the new version doesn't return the same data, it's not pushed. You can see in the example above the requests that were checked (up to 10).
However, it's possible to force the push without running the checks using the --no-check flag:
Force override
$ tb push endpoints/top_products_params.pipe --force --no-check
** Processing endpoints/top_products_params.pipe
** Building dependencies
** Creating top_products_params
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products_params.json
This is a safety check to avoid breaking production environments. It's better to add an extra parameter than to be sorry.
Downloading datafiles from Tinybird¶
Sometimes you use the user interface to create Pipes, and then you want to store them in your data project. It's possible to download datafiles using the tb pull command:
Pull a specific file
$ tb pull --match endpoint_im_working_on
It downloads the endpoint_im_working_on.pipe file directly to the current folder.
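If you want to retrieve every datafile in the Workspace rather than a single one, you can run tb pull without the --match filter (this matches the command description shown in the integrated help below):
Pull all the project files
tb pull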
Members management¶
You can manage Workspace members using the web UI or the CLI. For the latter, use the tb workspace members commands.
You can add members:
Adding users to the current workspace
$ tb workspace members add "user1@example.com,user2@example.com,user3@example.com"
Remove members:
Removing members from current workspace
$ tb workspace members rm user3@example.com
And list them:
Listing current Workspace members
$ tb workspace members ls

---------------------
|       email       |
---------------------
| user1@example.com |
| user2@example.com |
---------------------
You can also manage roles:
Setting the admin role for a member of the current Workspace
$ tb workspace members set-role admin user@example.com
Setting the guest role for a member of the current Workspace
$ tb workspace members set-role guest user@example.com
Integrated help¶
Once you've installed the CLI you can access the integrated help:
Integrated help
$ tb --help

Usage: tb [OPTIONS] COMMAND [ARGS]...

Options:
  --debug / --no-debug            Prints internal representation, can be
                                  combined with any command to get more
                                  information.
  --token TEXT                    Use auth token, defaults to TB_TOKEN envvar,
                                  then to the .tinyb file
  --host TEXT                     Use custom host, defaults to TB_HOST envvar,
                                  then to https://api.tinybird.co
  --version-warning / --no-version-warning
                                  Don't print version warning message if
                                  there's a new available version. You can use
                                  TB_VERSION_WARNING envvar
  --version                       Show the version and exit.
  -h, --help                      Show this message and exit.

Commands:
  auth          Configure auth
  check         Check file syntax
  connection    Connection commands
  datasource    Data sources commands
  dependencies  Print all data sources dependencies
  diff          Diffs a local datafiles to the corresponding remote files in
                the workspace.
  fmt           Formats a .datasource, .pipe or .incl file
  init          Initialize folder layout
  job           Jobs commands
  materialize   Given a local Pipe datafile (.pipe) and a node name it
                generates the target Data Source and materialized Pipe ready
                to be pushed and guides you through the process to create the
                materialized view
  pipe          Pipes commands
  prompt        Learn how to include info about the CLI in your shell PROMPT
  pull          Retrieve latest versions for project files from Tinybird
  push          Push files to Tinybird
  sql           Run SQL query over data sources and pipes
  test          Test commands
  token         Token commands
  workspace     Workspace commands
You can do the same for each available command, so you don't need to remember every detail:
Integrated command help
$ tb datasource --help

Usage: tb datasource [OPTIONS] COMMAND [ARGS]...

  Data sources commands

Options:
  --help  Show this message and exit.

Commands:
  analyze   Analyze a URL or a file before creating a new data source.
  append    Create a data source from a URL, local file or a connector.
  connect   Create a new datasource from an existing connection.
  delete    Delete rows from a datasource.
  generate  Generates a data source file based on a sample CSV, NDJSON or
            Parquet file from local disk or url.
  ls        List data sources
  replace   Replaces the data in a data source from a URL, local file or...
  rm        Delete a data source.
  share     Share a datasource.
  truncate  Truncate a data source.
Supported platforms¶
The Tinybird CLI supports Linux and macOS 10.14 and above.
Configure the shell PROMPT¶
When working with the Tinybird CLI from the command line it's useful to have the current Workspace in the command line PROMPT, in the same way you have your active Git branch for instance.
The Tinybird CLI stores the credentials in a local file called .tinyb, so it's relatively easy to extract the information needed for the PROMPT from it and customize it to your needs.
You can copy this function to your shell config file (~/.zshrc, ~/.bashrc, etc.) and include it in your PROMPT:
Parse the .tinyb file to use the output in the PROMPT
prompt_tb() {
  if [ -e ".tinyb" ]; then
    TB_CHAR=$'\U1F423'
    branch_name=`grep '"name":' .tinyb | cut -d : -f 2 | cut -d '"' -f 2`
    region=`grep '"host":' .tinyb | cut -d / -f 3 | cut -d . -f 2 | cut -d : -f 1`
    if [ "$region" = "tinybird" ]; then
      region=`grep '"host":' .tinyb | cut -d / -f 3 | cut -d . -f 1`
    fi
    TB_BRANCH="${TB_CHAR}tb:${region}=>${branch_name}"
  else
    TB_BRANCH=''
  fi

  echo $TB_BRANCH
}
Once the function is available, how to make its output visible in the PROMPT depends on your shell installation; for instance, for zsh this should work in most cases:
Include info of the Tinybird CLI in the zsh PROMPT
echo 'export PROMPT="' $PS1 ' $(prompt_tb)"' >> ~/.zshrc
Once properly configured, when you are in the root directory of a data project (the one with the .tinyb file), you'll see the Tinybird region and Workspace in your PROMPT.
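For bash, a similar approach should work in most cases (a hedged sketch; adapt it to your existing PS1 setup in ~/.bashrc):
Include info of the Tinybird CLI in the bash PROMPT
echo 'export PS1="$PS1 \$(prompt_tb) "' >> ~/.bashrc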
CLI telemetry¶
Since version 1.0.0b272, the Tinybird CLI includes telemetry. The feature collects the use of the CLI commands and information about exceptions and crashes anonymously and sends it only to Tinybird. Telemetry data helps Tinybird understand how the commands are used so we can improve our command line experience. Information on undesired outputs helps the team resolve potential issues and fix bugs.
On each tb execution, we collect information about your system, your Python environment, the installed CLI version, and the command you ran.
How to opt out¶
The CLI telemetry feature is enabled by default. To opt out, set the TB_CLI_TELEMETRY_OPTOUT environment variable to 1 or true.
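For example, in your shell session or shell config file:
Opt out of CLI telemetry
export TB_CLI_TELEMETRY_OPTOUT=1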