Intro to transforming data with Tinybird

Tinybird transforms your data on-the-fly, either at ingestion or at request time. This guide summarizes how to do it using Pipes and SQL queries.

If you've followed the Intro to ingesting data guide, you should have two Data Sources already in your account. Now you're ready to create Pipes and start exploring and transforming your raw data.

What's a Pipe?

If you've used Jupyter Notebooks, you'll be familiar with Pipes. A Pipe is a series of transformation nodes, where you can work with your data using SQL and see the results iteratively, and in real time. Tinybird is built on ClickHouse, so all the power of SQL in ClickHouse is available to you.

Each Pipe can have multiple transformation nodes that behave similarly to cells in a notebook. Nodes in a Pipe can refer to the results of previous nodes (think of them as subqueries) or query raw Data Sources directly.

Pipes are a great way to keep your queries clean and short so that they're easy to maintain and debug. They allow for a smoother experience when working with (big) data, help you identify the parts of your queries that perform poorly and, ultimately, make the process much more enjoyable.

You can create new Pipes from the Data Source modal window, the sidebar, or your Dashboard view.

Click the + button in the Pipes section of the sidebar to create a new one, and give it a name like {% code-line %}ecommerce_example{% code-line-end %}.

{% tip-box title="GETTING STARTED PIPE" %}By default, you'll have a 'Getting Started Pipe' in your account where we introduce you to some analyses you can make on the NYC Taxi Trip Duration dataset using Tinybird, and to some of the functions available for you to use. You can edit the code in every node, run it and see the results immediately.{% tip-box-end %}

Exploring data with the UI

Most of the SQL syntax you already know also works with Tinybird. To get started, create an empty Pipe by clicking the small "+" button in the sidebar, and rename the Pipe as "Ecommerce". After doing so, let's write a first query to explore what's in the {% code-line %}products{% code-line-end %} Data Source.
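A minimal first exploration query might look like this; it assumes a Data Source named {% code-line %}products{% code-line-end %}, and the actual columns depend on your schema:

```sql
-- Peek at the first rows of the products Data Source
SELECT *
FROM products
LIMIT 10
```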

Clicking the run button shows the results of your query below, together with some very useful information about the rows processed, the memory used, and the duration of the query.

Apart from the query performance information, you can see quick actions for your transformation node right below your query results.

Now, let's create another transformation node and do exactly the same with the {% code-line %}events{% code-line-end %} Data Source.

Under normal circumstances, querying either Data Source takes less than 100ms. Given how fast it is, we recommend spending some time running simple filter or aggregation queries to get a feel for the shape of your data, such as looking at the number of events per day:
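A daily count could be written like this, assuming {% code-line %}events{% code-line-end %} has a DateTime column named {% code-line %}date{% code-line-end %} (column names are illustrative):

```sql
-- Count events per day
SELECT
    toDate(date) AS day,
    count() AS total_events
FROM events
GROUP BY day
ORDER BY day
```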

or ranking the different user_ids by number of events:
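A ranking query along those lines, assuming a {% code-line %}user_id{% code-line-end %} column in {% code-line %}events{% code-line-end %}:

```sql
-- Top 10 users by number of events
SELECT
    user_id,
    count() AS total_events
FROM events
GROUP BY user_id
ORDER BY total_events DESC
LIMIT 10
```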

Joining Data Sources

Joining Data Sources at scale is one of the most challenging operations in analytical platforms, and Tinybird excels at that. Let's now enrich the events data with info about the product related to each event by performing a simple JOIN.
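Such an enrichment JOIN could be sketched as follows; the join key and product columns are assumptions about your schemas:

```sql
-- Enrich each event with info about its product
-- (assumes events.product_id matches products.id)
SELECT
    e.date,
    e.user_id,
    e.event,
    p.title,
    p.price
FROM events AS e
JOIN products AS p ON p.id = e.product_id
```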

As you can see, joining 100M events with 2M products takes around a second or two in Tinybird. There are ways to make this query faster: check the best practices for writing faster SQL queries, or contact us for guidance.

{% tip-box title="OTHER TYPES OF JOINS" %}ClickHouse supports all standard SQL JOIN types.{% tip-box-end %}

Calculated columns

ClickHouse comes with lots of powerful functions that let you transform data in multiple ways. With Tinybird, you can use them all, so we encourage you to check out the ClickHouse docs for everything that's available. Let's see a few examples below.

JSON functions

As you might have seen, the {% code-line %}events{% code-line-end %} Data Source has a column containing JSON data for locations and prices. We can easily extract the price and the city values to columns by using the {% code-line %}JSONExtract{% code-line-end %} functions.
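For instance, assuming the JSON lives in a String column named {% code-line %}extra_data{% code-line-end %} with keys {% code-line %}price{% code-line-end %} and {% code-line %}city{% code-line-end %} (both names are assumptions about your schema):

```sql
-- Extract typed values from a JSON string column
SELECT
    JSONExtractFloat(extra_data, 'price') AS price,
    JSONExtractString(extra_data, 'city') AS city
FROM events
```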

{% tip-box title="WORKING WITH JSON COLUMNS" %}ClickHouse comes with a set of functions to work with JSON data, which is especially useful when working with Data Streams or events data.{% tip-box-end %}

Date functions

ClickHouse also comes with an extensive list of functions to transform dates and times. As you have seen above, extracting the date part from a Datetime column is easy and fast, and very useful when doing time filters:
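A time-filtered aggregation along those lines, again assuming a DateTime column named {% code-line %}date{% code-line-end %}:

```sql
-- Events per day over the last 7 days
SELECT
    toDate(date) AS day,
    count() AS total_events
FROM events
WHERE toDate(date) >= today() - 7
GROUP BY day
ORDER BY day
```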

Enriching data on ingestion

As you've seen, operations like these can be done on-the-fly with decent performance, but doing them at ingestion time speeds up queries, especially those that read lots of data. To do so, you can use Materialized Views to calculate data on ingestion, as we explain in this guide.
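As a rough sketch, a materializing transformation could precompute daily aggregates at ingestion time; here {% code-line %}countState{% code-line-end %} stores a partial aggregation state that ClickHouse merges at query time (the schema names are illustrative):

```sql
-- Precompute daily event counts on ingestion
SELECT
    toDate(date) AS day,
    countState() AS events_count
FROM events
GROUP BY day
```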