Getting started with the CLI

This section covers ingesting, querying and publishing your first data API using the Tinybird CLI. Prefer the UI? Get started with the UI here.

Example use case

For this guide, we are going to build an API that tells us what the top 10 most searched products are in an eCommerce website.

We’ll ingest some eCommerce events track the actions of our users, such as viewing an item, adding items to their cart or going through checkout. This data is available as a CSV file with 50 million rows.

Once we’ve ingested the data, we’ll write some queries to filter, aggregate & transform the data into the top 10 list.

Finally, we’ll publish that top 10 result as an HTTP API.

Your first Workspace

Wondering how to create an account? It’s free, start here.

After you create your account, you’ll be prompted to choose a region to work in. There are two regions, EU and US, and you’re welcome to pick the region that works best for you.

You’ll then be asked to set your Workspace name; you can call it whatever you want, generally we see people naming their Workspace after the project they are working on.

For this tutorial, you can ignore the box that asks if you want to use a Starter Kit, just leave it with the default None option selected.

The GIF below shows what this step looks like.

_images/quickstart-tb-workspace-15seconds.gif

Setting up the CLI

Let’s get our environment setup so we can get started with the CLI!

The Tinybird CLI is a Python package, so you need to have Python available on your machine. It is built for Python 3, and we recommend Python 3.7 or above.

Start by checking your Python version. On some systems, you may have both python and python3 available, so try the following commands and use the one that returns Python3.7 or above. This guide will use python3.

python --version
python3 --version

It’s not absolutely required, but it’s good practice to use a Python virtual environment. It’s easy to setup, just use the following command:

python3 -m venv .venv

This will create a venv directory that contains your virtual environment. You need to activate the virtual environment to use it.

source .venv/bin/activate

With our virtual environment created & activated, we can install the Tinybird CLI using pip.

pip install tinybird-cli

The following step requires an action in the UI. This is the last time you’ll need the UI for the rest of this CLI guide!

This quick start assumes that you created your workspace in the EU region. If you chose the US, bear in mind that all urls will have an additional us-east subdomain label, like: ui.us-east.tinybird.co or api.us-east.tinybird.co

With the Tinybird CLI installed, we need to authenticate it to connect to our Tinybird Workspace. We need to go into the Tinybird UI to copy our Admin token.

Go to https://ui.tinybird.co/ and sign in. On the left hand side navigation, go to the Auth tokens page (see Mark 1 below). Ensure the admin token is selected on the left hand side (see Mark 2 below), and then copy the token shown in the center of the page (see Mark 3 below).

_images/quickstart-cli-auth-token-1.png

Now that you have copied the admin token, return the CLI.

Run the following command to authenticate.

tb auth

When prompted, paste the token and press Enter.

Copy the admin token from https://ui.tinybird.co/tokens and paste it here :
** Auth successful!
** Configuration written to .tinyb file, consider adding it to .gitignore
** Remember to use https://api.tinybird.co in all your API calls.
** Warning: youre using an admin token that is not associated with any user. If you want to be able to switch between workspaces, please use an admin token associated with your account.

The CLI is now ready to use. Review the available commands with:

tb --help

Creating a data source

Tinybird is very flexible and can import data from many different sources, but let’s start off simple with a CSV file that Tinybird has published for you.

Data Source operations are performed using the tb datasource commands.

Step 1. Add the new Data Source

To create your first Data Source, you need to create a datasource file.

The tb datasource generate command can create this file for you. You can pass in a URL to a file, which it uses to generate the appropriate schema.

Run the following command, which will create a file called events_50M_1.datasource.

tb datasource generate https://storage.googleapis.com/tinybird-assets/datasets/guides/events_50M_1.csv

The contents of this file will look like this:

DESCRIPTION >
    Generated from https://storage.googleapis.com/tinybird-assets/datasets/guides/events_50M_1.csv

SCHEMA >
    ``date`` DateTime,
    ``product_id`` String,
    ``user_id`` Int64,
    ``event`` String,
    ``extra_data`` String%

The name of the file controls the name of the Data Source that is created. Let’s rename the file to shopping_data.datasource.

mv events_50M_1.datasource shopping_data.datasource

Now we can push this file to Tinybird to create the Data Source.

Pushing the Data Source file does not import the data, it simply creates the schema.

tb push shopping_data.datasource

List your Data Sources with the following command. Notice that the name of our Data Source is shopping_data.

tb datasource ls

** Data Sources:
-------------------------------
version:
shared from:
name: shopping_data
row_count: -
size: -
created at: 2022-11-11 21:49:22
updated at: 2022-11-11 21:52:05
connection:
-------------------------------

Step 2. Start the data import

Run the following command to import the data from the CSV file:

tb datasource append shopping_data https://storage.googleapis.com/tinybird-assets/datasets/guides/events_50M_1.csv shopping_data

The import doesn’t take very long, but you don’t need to wait for it.

You can ctrl+c to exit the command and the data import will continue in the background.

Creating a Pipe

In Tinybird, we write all of our SQL queries inside Pipes. One Pipe can be made up of many individual SQL queries, and we call these individual queries Nodes; each Node is simply a SQL SELECT statement. A Node can query the result of another Node in the same Pipe. This means that you can break large queries down into a few, smaller queries and chain them together, making it much easier to build & maintain in the future.

Pipe operations are performed using the tb pipe commands.

Step 1. Filter the data

To create your first Pipe, you need to create a pipe file.

In this Pipe, we want to create a list of the top 10 most searched products. When we created the Data Source, we saw the Schema in the shopping_data.datasource file. You’ll notice an event column, which describes what kind of event happened. This column has various values, including view, search, buy and more. We are only interested in rows where the event is search so the first thing we want to do in our Pipe is to filter the rows.

The CLI can generate the pipe file for us by giving it a SQL query.

Run the following command, which will generate a file called top_10_searched_products.pipe:

tb pipe generate top_10_searched_products "SELECT * FROM shopping_data WHERE event == 'search'"

The contents of this file will look like this:

NODE endpoint
DESCRIPTION >
    Generated from the command line
SQL >
    SELECT * FROM shopping_data WHERE event == 'search'

Let’s change the name, open the file in your favorite text editor.

The original name was endpoint which comes after the NODE text.

Change the name to search_events. You can also change the description text here.

The contents of the file should look like this:

NODE search_events
DESCRIPTION >
    Finds the top 10 products with the most searches
SQL >
    SELECT * FROM shopping_data WHERE event == 'search'

Step 2. Aggregate the data

Next, we want to work out how many times each individual product has been searched for. This means we’re going to want to do a count and aggregate by the product id. To keep our queries clean, let’s create a second Node to do this aggregation.

The really cool thing here is that this second Node can query the result of the search_events Node we just created, meaning that we do not need to duplicate our WHERE filter in this next query, as we’re already querying the filtered rows.

We’ll use the following query for our next Node:

SELECT product_id, count() as total FROM search_events
GROUP BY product_id
ORDER BY total DESC

Edit the top_10_searched_products.pipe file to add the second Node. Call this node aggregate_by_product_id.

NODE search_events
DESCRIPTION >
    Finds the top 10 products with the most searches
SQL >
    SELECT * FROM shopping_data WHERE event == 'search'

NODE aggregate_by_product_id
DESCRIPTION >
    Create a count of searches aggregated by the product ID
SQL >
    SELECT product_id, count() as total FROM search_events
    GROUP BY product_id
    ORDER BY total DESC

Step 3. Transform the result

Finally, let’s create the last Node that we will use to publish as an API and limit the results to the top 10 products.

We’ll use the following query:

SELECT product_id, total FROM aggregate_by_product_id
LIMIT 10

Edit the top_10_searched_products.pipe to add the final Node. Call this node endpoint.

NODE search_events
DESCRIPTION >
    Finds the top 10 products with the most searches
SQL >
    SELECT * FROM shopping_data WHERE event == 'search'

NODE aggregate_by_product_id
DESCRIPTION >
    Create a count of searches aggregated by the product ID
SQL >
    SELECT product_id, count() as total FROM search_events
    GROUP BY product_id
    ORDER BY total DESC

NODE endpoint
DESCRIPTION >
    Exposes top 10 rows as an API
SQL >
    SELECT product_id, total FROM aggregate_by_product_id
    LIMIT 10

With that, we have built the logic required to show the top 10 most searched products.

Step 4. Push to Tinybird

You can now push this Pipe to Tinybird.

tb push top_10_searched_products.pipe

** Processing top_10_searched_products.pipe
** Building dependencies
** Running top_10_searched_products
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_10_searched_products.json
** 'top_10_searched_products' created
** Not pushing fixtures

Publishing & using an API

Now, let’s say we have an application that wants to be able to get the top 10 result. The magic of Tinybird, is that we can choose any query and instantly publish the results as a REST API. Any applications can simply hit the API and get the very latest result.

Step 1. Publish the API

Although we have pushed the Pipe to Tinybird, we have not yet published an API from it, so let’s do that now.

We want to publish the endpoint Node from our top_10_searched_products Pipe.

We must give the name of the Pipe to the tb pipe publish command, but the Node name is optional. If you do not give the Node name, it will automatically select the final Node in the Pipe.

Run the following command to publish the API, you will see your API URL in the output.

tb pipe publish top_10_searched_products

** Using last node endpoint as endpoint
** Endpoint published!
** Pipe URL: https://api.tinybird.co/v0/pipes/top_10_searched_products

Step 2. Test the API

To finish up, let’s test the API so we can see the output.

Hitting the API triggers your Pipe to execute, and you get a nice JSON formatted response with the results.

Run the following command to test your API:

tb pipe data top_10_searched_products

{
    "meta": [{
        "name": "product_id",
        "type": "String"
    }, {
        "name": "total",
        "type": "UInt64"
    }],
    "data": [{
        "product_id": "6c28b658-1aaa-11eb-8cb5-acde48001122",
        "total": 33
    }, {
        "product_id": "6b50e868-1aaa-11eb-a226-acde48001122",
        "total": 32
    }, {
        "product_id": "6c482f92-1aaa-11eb-b193-acde48001122",
        "total": 31
    }, {
        "product_id": "6aaacc1c-1aaa-11eb-b4d1-acde48001122",
        "total": 31
    }, {
        "product_id": "6c0357be-1aaa-11eb-8e3f-acde48001122",
        "total": 31
    }, {
        "product_id": "6bc78434-1aaa-11eb-8a21-acde48001122",
        "total": 31
    }, {
        "product_id": "6c29f858-1aaa-11eb-bd1a-acde48001122",
        "total": 31
    }, {
        "product_id": "6c02518c-1aaa-11eb-9307-acde48001122",
        "total": 31
    }, {
        "product_id": "6c5b3222-1aaa-11eb-950e-acde48001122",
        "total": 30
    }, {
        "product_id": "6b0fcb8a-1aaa-11eb-bc59-acde48001122",
        "total": 30
    }],
    "rows": 10,
    "rows_before_limit_at_least": 1724264,
    "statistics": {
        "elapsed": 0.724678236,
        "rows_read": 50000000,
        "bytes_read": 3200003296
    }
}

🎉 Celebrate 🎉

Congrats! You’ve finished creating your first API in Tinybird!

If you’d like to try doing this in the UI, follow the Get started with the UI guide.

Ready for something new? Tinybird is great for working with streaming event data & you can integrate Tinybird’s Events API directly into your application to ingest events with no additional infrastructure. Check out this guide for the Events API.