Tinybird CLI

The Tinybird CLI lets you use all the Tinybird functionality directly from the command line. It also includes several commands to create and manage data projects easily, and it's used behind the scenes in Git and CI/CD workflows.

How to install

Option 1: Install it locally

You need Python 3 and pip installed:

Supported Python versions: 3.8, 3.9, 3.10, 3.11

Example: creating a virtual environment for Python 3
python3 -m venv .venv
source .venv/bin/activate

If you are not used to Python virtual environments, you can read this guide about venv.

Install tinybird-cli
pip install tinybird-cli

Option 2: Use a prebuilt docker image

Let's say your project is in the ~/projects/data path:

Run the prebuilt image, mounting your project path
docker run -v ~/projects/data:/mnt/data -it tinybirdco/tinybird-cli-docker
cd /mnt/data

Authenticate

The first step is to check everything works correctly and that you're able to authenticate:

Authenticate
tb auth -i

** List of available regions:
[1] us-east (https://ui.us-east.tinybird.co)
[2] eu (https://ui.tinybird.co)
[0] Cancel

Use region [1]:
Copy the admin token from https://ui.us-east.tinybird.co/tokens and paste it here: <pasted token>
** Auth successful!
** Configuration written to .tinyb file, consider adding it to .gitignore

First, choose the Tinybird region. Note that dedicated regions do not appear in the list.

It'll ask for the admin token associated with your account (user@domain.com): copy it from the Workspace tokens page and paste it.

Note you can also pass the token directly with the --token flag:

Authenticate
tb auth --token <your token>
** Auth successful!
** Configuration written to .tinyb file, consider adding it to .gitignore

Your credentials are saved in the .tinyb file in the current directory. Add it to .gitignore (or the ignore list of the SCM you use), since it contains Tinybird credentials.
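For example, from the project folder:

Ignore the credentials file
echo ".tinyb" >> .gitignore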

The CLI does its best to authenticate you in the proper region for your account, but you may want to override this behavior. In that case, provide the --host flag with the corresponding URL for your region.
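For example, a sketch of authenticating directly against the us-east region shown above (assuming its API host is https://api.us-east.tinybird.co):

Authenticate against a specific region
tb auth --host https://api.us-east.tinybird.co --token <your token>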

You can always view the up-to-date list of available regions within the tool using the command tb auth ls.

Quick intro

The CLI works with CSV, NDJSON, and Parquet files interchangeably.

Create a new project:

Initialize
tb init

Generate a Data Source file (we will explain this later) based on a sample CSV file containing a few rows:

Generate Data Source
$ tb datasource generate /tmp/sample.csv
** Generated datasources/sample.datasource
**   => Run `tb push datasources/sample.datasource` to create it on the server
**   => Add data with `tb datasource append sample /tmp/sample.csv`
**   => Generated fixture datasources/fixtures/sample.csv

You can also generate a Data Source file from an NDJSON or Parquet file. It uses the Analyze API behind the scenes, so it guesses and applies the JSONPath for each column in the schema.
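For instance, a sketch with a hypothetical local NDJSON file:

Generate a Data Source file from NDJSON
tb datasource generate /tmp/sample.ndjson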

Push it to Tinybird:

Push Data Source
$ tb push datasources/sample.datasource
** Processing datasources/sample.datasource
** Building dependencies
** Creating sample
** not pushing fixtures

Append some data:

Append data
$ tb datasource append sample datasources/fixtures/sample.csv
🥚 starting import process
🐥 done

Query the data:

Query the data
$ tb sql "select count() from sample"

Query took 0.000475 seconds, read 1 rows // 4.1 KB

-----------
| count() |
-----------
|     384 |
-----------

Check the Data Source is in the Data Sources list:

List Data Sources
$ tb datasource ls

    name                    row_count    size         created at                  updated at
-------------------------  -----------  -----------  --------------------------  --------------------------
sample                             384     20k       2020-06-24 15:09:00.409266  2020-06-24 15:09:00.409266
madrid_traffic                87123456     1.5Gb     2019-07-02 10:40:03.840151  2019-07-02 10:40:03.840152
...

Go to your Tinybird dashboard to check the Data Source is present there.

Data projects

A data project is a set of files that describes how your data should be stored, processed, and exposed through APIs.

In the same way you maintain source code files in a repository, use CI, make deployments, and run tests, Tinybird provides a set of tools to follow the same pattern with data pipelines. In other words: the data files are the source code of your Tinybird project.

Following this approach, any data project can be managed with a list of text-based files that allow you to:

  • Define how the data should flow, from the start (the schemas) to the end (the API).
  • Manage your data files under version control.
  • Use branches in your data files.
  • Run tests.
  • Deploy a data project like you'd deploy any other software application.

Let's see an example. Imagine an ecommerce site where we have events from users and a list of products with their attributes. Our purpose is to expose several API endpoints to return sales per day and top product per day.

The data project would look like this:

ecommerce_data_project/
    datasources/
        events.datasource
        products.datasource
        fixtures/
            events.csv
            products.csv
    pipes/
        top_product_per_day.pipe

    endpoints/
        sales.pipe
        top_products.pipe

Every file in this folder maps to a Data Source or a Pipe in Tinybird. You can create a project from scratch with tb init, but in this case let's assume it's already created and stored in a GitHub repository.

Uploading the project

Clone demo
git clone https://github.com/tinybirdco/ecommerce_data_project.git
cd ecommerce_data_project

Refer to the how to install section to connect the ecommerce_data_project with your Tinybird account.

You can push the whole project to your Tinybird account to check that everything is fine. The tb push command uploads the project to Tinybird, but first it checks the project dependencies and the SQL syntax, among other things. In this case, we use the --push-deps flag to push everything:

Push dependencies
$ tb push --push-deps
** Processing ./datasources/events.datasource
** Processing ./datasources/products.datasource
** Processing ./pipes/top_product_per_day.pipe
** Processing ./endpoints/top_products_params.pipe
** Processing ./endpoints/sales.pipe
** Processing ./endpoints/top_products.pipe
** Building dependencies
** Creating products
** Creating events
** Creating products_join_by_id
** Creating top_product_per_day
** Creating sales
** => Test endpoint at https://api.tinybird.co/v0/pipes/sales.json
** Creating products_join_by_id_pipe
** Creating top_products_params
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products_params.json
** Creating top_products
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products.json
** not pushing fixtures

Once it finishes, the endpoints defined in our project (sales and top_products) will be available and we can start pushing data to the different Data Sources. The project is ready.

Now, let's go through the different files in the project in order to understand how to deal with them individually.

Define Data Sources

Data Sources define how your data is stored. You can add data to them using the Data Sources API.

Each Data Source is defined by a schema and other properties we will explain later (more on this in the Datafile reference).

Let's see events.datasource:

DESCRIPTION >
    # Events from users
    This contains all the events produced by Kafka. There are 4 fixed columns,
    plus a `json` column which contains the rest of the data for that event.
    See [documentation](url_for_docs) for the different events.

SCHEMA >
    timestamp DateTime,
    product String,
    user_id String,
    action String,
    json String

ENGINE MergeTree
ENGINE_SORTING_KEY timestamp

As we can see, there are three main sections:

  • A general description (using markdown in this case).
  • The schema.
  • How the data is sorted. In this case, the data is accessed most of the time by the timestamp column. If no SORTING_KEY is set, Tinybird picks one by default, usually a date or datetime column.

Now, let's push the Data Source:

Push the events Data Source
$ tb push datasources/events.datasource
** Processing datasources/events.datasource
** Building dependencies
** Creating events
** not pushing fixtures

You cannot override Data Sources. If you try to push a Data Source that already exists in your account, you'll see an output saying events already exists, skipping. If you need to override the Data Source, remove it first or upload a new one with a different name.
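For example, one way to replace the events Data Source would be to remove it and push it again (note this deletes the stored data, so use it carefully):

Remove and push the Data Source again
tb datasource rm events
tb push datasources/events.datasource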

Define data pipes

You usually don't consume the data exactly as it comes in. For example, in this project we are dealing with Kafka events; we could query the events Data Source directly, but generating a live materialized view of that table is better.

For this purpose, we have pipes. Let's see how to create a data pipe that transforms the data as it's inserted. This is the content of pipes/top_product_per_day.pipe.

NODE only_buy_events
DESCRIPTION >
    filters all the buy events

SQL >
    SELECT
        toDate(timestamp) date,
        product,
        JSONExtractFloat(json, 'price') AS price
    FROM events
    WHERE action = 'buy'


NODE top_per_day
SQL >
    SELECT date,
           topKState(10)(product) top_10,
           sumState(price) total_sales
    FROM only_buy_events
    GROUP BY date

TYPE materialized
DATASOURCE top_per_day_mv
ENGINE AggregatingMergeTree
ENGINE_SORTING_KEY date

Each pipe can have one or more nodes. In this pipe, as we can see, we're defining two nodes: only_buy_events and top_per_day.

  • The first one filters "buy" events and extracts some data from the json column.
  • The second one runs the aggregation.

The pattern to define a pipeline is simple: use NODE to start a new node and then use SQL > to define the SQL for that node. Notice you can reference other nodes inside the SQL. In this case, the second node uses the first one, only_buy_events.

Pushing a pipe is the same as pushing a Data Source:

Populate
$ tb push pipes/top_product_per_day.pipe --populate
** Processing pipes/top_product_per_day.pipe
** Building dependencies
** Creating top_product_per_day
** Populate job url https://api.tinybird.co/v0/jobs/c7819921-aca0-4424-98c5-9223ca2475c3
** not pushing fixtures

In this case, it's a materialized node. If you want to populate it with the existing data in the events table, you can use the --populate flag.

When using the --populate flag, you get a job URL. Data population runs in the background, so you can check the status of the job using the URL provided.
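For instance, a sketch of checking the populate job with the URL from the output above, using your admin token:

Check the populate job status
curl \
    -H "Authorization: Bearer <your token>" \
    https://api.tinybird.co/v0/jobs/c7819921-aca0-4424-98c5-9223ca2475c3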

Define endpoints

Endpoints are the way you expose your data to be consumed. They look pretty similar to pipes, and in fact they are pipes that transform the data, with an extra step that exposes the results.

Let's look into endpoints/top_products.pipe:

NODE endpoint
DESCRIPTION >
    returns top 10 products for the last week
SQL >
    SELECT
        date,
        topKMerge(10)(top_10) AS top_10
    FROM top_per_day_mv
    WHERE date > today() - interval 7 day
    GROUP BY date

The syntax is exactly the same as in the data transformation pipes, but now the results can be accessed through the endpoint https://api.tinybird.co/v0/pipes/top_products.json?token=TOKEN.

When you push an endpoint, a Token with PIPE:READ permissions is automatically created. You can see it in the tokens UI or directly from the CLI with the command tb pipe token_read <endpoint_name>.
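For example, for the top_products endpoint of this project:

Get the endpoint read token
tb pipe token_read top_products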

Alternatively, you can add a TOKEN token_name READ line to the pipe file to automatically create a token named token_name with READ permissions over the endpoint, or to add READ permissions over the endpoint to an existing token_name. This is a very convenient way of handling tokens in your data project.

TOKEN public_read_token READ

NODE endpoint
DESCRIPTION >
    returns top 10 products for the last week
SQL >
    SELECT
        date,
        topKMerge(10)(top_10) AS top_10
    FROM top_per_day_mv
    WHERE date > today() - interval 7 day
    GROUP BY date

Let's push it now:

Push the top products pipe
$ tb push endpoints/top_products.pipe
** Processing endpoints/top_products.pipe
** Token public_read_token not found, creating one
** Building dependencies
** Creating top_products
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products.json?token=******
** not pushing fixtures

Note the token public_read_token was created automatically and it's provided in the test URL.

It's possible to add parameters to any endpoint. For example, let's parametrize the dates to be able to filter the data between two dates:

NODE endpoint
DESCRIPTION >
    returns top 10 products for the last week
SQL >
    %
    SELECT
        date,
        topKMerge(10)(top_10) AS top_10
    FROM top_per_day_mv
    WHERE date between {{Date(start)}} AND {{Date(end)}}
    GROUP BY date

Now, the endpoint can receive start and end parameters: https://api.tinybird.co/v0/pipes/top_products.json?start=2018-09-07&end=2018-09-17&token=TOKEN.
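For instance, a sketch of calling the parametrized endpoint with curl (replace TOKEN with the read token of the endpoint):

Call the endpoint with parameters
curl "https://api.tinybird.co/v0/pipes/top_products.json?start=2018-09-07&end=2018-09-17&token=TOKEN"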

You can print the results from the CLI using the pipe data command. For instance, for the example above:

Print the results of the top products endpoint
$ tb pipe data top_products --start '2018-09-07' --end '2018-09-17' --format CSV
"date","top_10"
"2021-04-28","['sku_0001','sku_0004','sku_0003','sku_0002']"

Check tb pipe data --help for more options.

The supported types for the parameters are: Boolean, DateTime64, DateTime, Date, Float32, Float64, Int, Integer, Int8, Int16, UInt8, UInt16, UInt32, Int32, Int64, UInt64, Int128, UInt128, Int256, UInt256, Symbol, String.

Note that for the parameter templating to work, you need to start your NODE SQL definition with the % character.
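For instance, a sketch of the same node with default values for the parameters (assuming the optional second argument of the template type functions sets the default):

NODE endpoint
DESCRIPTION >
    returns top 10 products between two dates, with default values
SQL >
    %
    SELECT
        date,
        topKMerge(10)(top_10) AS top_10
    FROM top_per_day_mv
    WHERE date BETWEEN {{Date(start, '2018-09-07')}} AND {{Date(end, '2018-09-17')}}
    GROUP BY date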

Override an endpoint or a data pipe

When working on a project, you usually need to push several versions of the same file. You can override a pipe that has already been pushed using the --force flag.

Override the pipe
$ tb push endpoints/top_products_params.pipe --force

** Processing endpoints/top_products_params.pipe
** Building dependencies
** Creating top_products_params
current https://api.tinybird.co/v0/pipes/top_products_params.json?start=2020-01-01&end=2010-01-01
    new https://api.tinybird.co/v0/pipes/top_products_params__checker.json?start=2020-01-01&end=2010-01-01 ... ok
current https://api.tinybird.co/v0/pipes/top_products_params.json?start=2010-01-01&end=2021-01-01
    new https://api.tinybird.co/v0/pipes/top_products_params__checker.json?start=2010-01-01&end=2021-01-01 ... ok
**    => Test endpoint at https://api.tinybird.co/v0/pipes/top_products_params.json

It will override the endpoint. If the endpoint has been called before, it runs regression tests with the most frequent requests; if the new version doesn't return the same data, it's not pushed. You can see in the example output the requests that were tested (up to 10).

However, it's possible to force the push without running the checks using the --no-check flag:

Force override
$ tb push endpoints/top_products_params.pipe --force --no-check
** Processing endpoints/top_products_params.pipe
** Building dependencies
** Creating top_products_params
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products_params.json

This is a security check to avoid breaking production environments. It's better to add an extra parameter than to be sorry.

Downloading data files from Tinybird

Sometimes you use the user interface to create pipes, and then you want to store them in your data project. It's possible to download data files using the pull command:

Pull a specific file
$ tb pull --match endpoint_im_working_on

It will download the endpoint_im_working_on.pipe directly to the current folder.
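It's also possible to retrieve every resource in the Workspace; a sketch, assuming you run the command without a filter:

Pull all the project files
tb pull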

Members management

You can manage Workspace members using the Web UI or the CLI. For the latter, use the workspace members commands.

You can add members:

Adding users to the current workspace
$ tb workspace members add "user1@example.com,user2@example.com,user3@example.com"

Remove members:

Removing members from current workspace
$ tb workspace members rm user3@example.com

And list them:

Listing current Workspace members
$ tb workspace members ls

---------------------
| email             |
---------------------
| user1@example.com |
| user2@example.com |
---------------------

You can also manage roles:

Setting the admin role for a Workspace member
$ tb workspace members set-role admin user@example.com

Setting the guest role for a Workspace member
$ tb workspace members set-role guest user@example.com

Integrated help

Once you've installed the CLI you can access the integrated help:

Integrated help
$ tb --help
Usage: tb [OPTIONS] COMMAND [ARGS]...

Options:
    --debug / --no-debug            Prints internal representation, can be
                                    combined with any command to get more
                                    information.
    --token TEXT                    Use auth token, defaults to TB_TOKEN envvar,
                                    then to the .tinyb file
    --host TEXT                     Use custom host, defaults to TB_HOST envvar,
                                    then to https://api.tinybird.co
    --version-warning / --no-version-warning
                                    Don't print version warning message if
                                    there's a new available version. You can use
                                    TB_VERSION_WARNING envar
    --version                       Show the version and exit.
    -h, --help                      Show this message and exit.

Commands:
    auth          Configure auth
    check         Check file syntax
    connection    Connection commands
    datasource    Data sources commands
    dependencies  Print all data sources dependencies
    diff          Diffs a local datafiles to the corresponding remote files in
                    the workspace.
    fmt           Formats a .datasource, .pipe or .incl file
    init          Initialize folder layout
    job           Jobs commands
    materialize   Given a local Pipe datafile (.pipe) and a node name it
                    generates the target Data Source and materialized Pipe ready
                    to be pushed and guides you through the process to create the
                    materialized view
    pipe          Pipes commands
    prompt        Learn how to include info about the CLI in your shell PROMPT
    pull          Retrieve latest versions for project files from Tinybird
    push          Push files to Tinybird
    sql           Run SQL query over data sources and pipes
    test          Test commands
    token         Token commands
    workspace     Workspace commands

And you can do the same for every available command, so you don't need to know every detail for every command:

Integrated command help
$ tb datasource --help
Usage: tb datasource [OPTIONS] COMMAND [ARGS]...

Data sources commands

Options:
    --help  Show this message and exit.

Commands:
    analyze   Analyze a URL or a file before creating a new data source.
    append    Create a data source from a URL, local file or a connector.
    connect   Create a new datasource from an existing connection.
    delete    Delete rows from a datasource.
    generate  Generates a data source file based on a sample CSV, NDJSON or
                Parquet file from local disk or url.
    ls        List data sources
    replace   Replaces the data in a data source from a URL, local file or...
    rm        Delete a data source.
    share     Share a datasource.
    truncate  Truncate a data source.

Supported platforms

The Tinybird CLI supports Linux and macOS 10.14 and later.

Configure the shell PROMPT

When working with the Tinybird CLI, it's useful to have the current Workspace in the command line PROMPT, in the same way you have your active Git branch, for instance.

The Tinybird CLI stores the credentials in a local file called .tinyb, so it's relatively easy to extract the information needed for the PROMPT from there and customize it to your needs.

You can copy this function to your shell config file (~/.zshrc, ~/.bashrc, etc.) and include it in your PROMPT:

Parse the .tinyb file to use the output in the PROMPT
prompt_tb() {
    if [ -e ".tinyb" ]; then
        TB_CHAR=$'\U1F423'
        branch_name=`grep '"name":' .tinyb | cut -d : -f 2 | cut -d '"' -f 2`
        region=`grep '"host":' .tinyb | cut -d / -f 3 | cut -d . -f 2 | cut -d : -f 1`
        if [ "$region" = "tinybird" ]; then
            region=`grep '"host":' .tinyb | cut -d / -f 3 | cut -d . -f 1`
        fi
        TB_BRANCH="${TB_CHAR}tb:${region}=>${branch_name}"
    else
        TB_BRANCH=''
    fi

    echo $TB_BRANCH
}

Once the function is available, how to make the output visible in the PROMPT depends on your shell installation. For instance, for zsh this should work in most cases:

Include info of the Tinybird CLI in the zsh PROMPT
echo 'export PROMPT="' $PS1 ' $(prompt_tb)"' >> ~/.zshrc

Once properly configured, when you are in the root directory of a data project (the one with the .tinyb file), you'll see the Tinybird region and Workspace in your PROMPT.


CLI telemetry

Since version 1.0.0b272, the Tinybird CLI includes telemetry. The feature anonymously collects usage of CLI commands and information about exceptions and crashes, and sends it only to Tinybird. Telemetry data helps Tinybird understand how the commands are used so we can improve the command line experience. Information on undesired outputs helps the team resolve potential issues and fix bugs.

On each tb execution, we collect information about your system, your Python environment, the CLI version installed and the command you ran.

How to opt out

The CLI telemetry feature is enabled by default. To opt out, set the TB_CLI_TELEMETRY_OPTOUT environment variable to 1 or true.
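For example, in a bash or zsh session:

Opt out of CLI telemetry
export TB_CLI_TELEMETRY_OPTOUT=1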