Ingest from RudderStack

Intermediate

To better understand the behaviour of their customers, companies need to unify timestamped data coming from a wide variety of products and platforms. Typical events to track would be 'sign up', 'login', 'page view' or 'item purchased'.

A customer data platform can be used to capture complete customer data from wherever your customers interact with your brand. It defines events, collects them from different platforms and products, and routes them to where they need to be consumed.

RudderStack is an open-source customer data pipeline tool. It collects, processes and routes data from your websites, apps, cloud tools, and data warehouse. By using Tinybird's event ingestion endpoint for high-frequency ingestion as a Webhook in RudderStack, you can stream customer data in real time to Data Sources.

This guide covers two different methods to send events from RudderStack to Tinybird. This 2-minute video shows you how to set up high-frequency ingestion through RudderStack using the steps in method 1.

Method 1: A separate Data Source for each event type

This preferred approach sends each type of event to its corresponding Data Source. The advantages of this method are:

  • Your data will be well organised from the get-go.
  • Different event types can have different attributes (columns in their Data Source).
  • Whenever new attributes are added to an event type you will be prompted to add new columns.
  • New event types will get a new Data Source.

We start by generating a token in the UI to allow RudderStack to write to Tinybird.

Create a Tinybird Auth Token

Go to the workspace in Tinybird where you want to receive data via RudderStack and click on Manage Auth Tokens in the side panel.

Create a new Auth token by clicking CREATE NEW (top right).

Give your token a descriptive name.

In the section DATA SOURCES SCOPES mark the Data Sources management checkbox (Enabled) to give your token permission to create Data Sources.

Click on "SAVE CHANGES" at the bottom of the page to create your token.

Create a token with scope to create Data Sources

Create a RudderStack Destination

Log in to RudderStack

Click on Destinations in the side panel and then on New destination (top right).

Select Webhook:

  • Give the destination a descriptive name.
  • Connect your source(s). You can test with the RudderStack Sample HTTP Source.
  • Input the following Connection Settings:
      • Webhook URL: https://api.tinybird.co/v0/events
      • URL Method: POST
      • Headers Key: Authorization
      • Headers Value: Bearer TINYBIRD_AUTH_TOKEN
Webhook connection settings for high-frequency ingestion

On the next page, click on CREATE NEW TRANSFORMATION

You can code a function in the box to apply to events when this transformation is active. Here we dynamically append the target Data Source to the target URL of the Webhook. Give your transformation a descriptive name and a helpful description.

You are free to change the transformation code to better suit your needs. The transformation in this guide builds the Data Source name from the prefix "rudderstack_" followed by the event name in lower case, with its words joined by underscores. For example, a "Product purchased" event goes to a Data Source named "rudderstack_product_purchased".
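The naming rule described above can be sketched as follows. This is a minimal sketch, not the guide's original transformation code. It assumes that Python transformations are available in your RudderStack plan, that a transformation is defined as a `transformEvent(event, metadata)` function, and that the Webhook destination appends the value of an `appendPath` field to the configured URL. The helper `data_source_name` and the fallback to the event `type` for events without a name are hypothetical additions for illustration.

```python
import re

# Prefix used for every Data Source created from RudderStack events.
PREFIX = "rudderstack_"


def data_source_name(event_name: str) -> str:
    """Lower-case the event name and join its words with underscores."""
    slug = re.sub(r"[\s.]+", "_", event_name.strip().lower())
    return PREFIX + slug


def transformEvent(event, metadata):
    # Assumption: track events carry their name in "event"; other events
    # (page, identify, ...) fall back to their "type".
    name = event.get("event") or event.get("type") or "unknown"
    # Assumption: the Webhook destination appends this value to the Webhook URL,
    # so the request goes to .../v0/events?name=<data source>.
    event["appendPath"] = "?name=" + data_source_name(name)
    return event
```

With this sketch, an event named "Product purchased" would be sent to `https://api.tinybird.co/v0/events?name=rudderstack_product_purchased`.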

Save the transformation.

Destination created successfully.

Test Ingestion

Click on Sources and select RudderStack Sample HTTP.

Click on Live events (top right).

Click on Send test event and paste the provided curl command into your terminal.

The event will appear on the screen and be sent to Tinybird.

If your Data Source exists in Tinybird but is still empty (0 rows) after you have sent a few events through RudderStack, the token you created needs permission to append data to that Data Source. Go to Manage Auth Tokens and select the token you created in the side panel. Under Data Sources management, click on Add Data Source scope, choose the name of the Data Source that you want to write to, mark the Append checkbox, and save the changes.

Method 2: All events in the same Data Source

This second approach sends all events into a single Data Source and then splits them using Tinybird. If you preconfigure the Data Source, every event that RudderStack sends is ingested with the full JSON object stored as a String in a single column. This is very useful when you have complex JSON objects, as explained in our docs, but be aware that using JSONExtract to parse data from the JSON object after ingestion has performance implications.

New columns from parsing the data will be detected and you will be asked if you want to save them. You can adjust the inferred data types before saving any new columns. Pipes can be used to filter the Data Source by different events.
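To make the idea of one raw column concrete, here is a small local illustration in Python. It is not Tinybird code: it only mimics what happens when each event is stored as a JSON string and parsed at read time, then grouped by event name. The sample rows and the function `split_by_event` are hypothetical.

```python
import json
from collections import defaultdict


def split_by_event(raw_rows):
    """Group raw JSON strings by event name, parsing each row at read time."""
    groups = defaultdict(list)
    for raw in raw_rows:
        payload = json.loads(raw)  # parsing cost is paid on every read
        key = payload.get("event") or payload.get("type") or "unknown"
        groups[key].append(payload)
    return dict(groups)


# Hypothetical rows, each one the full event stored as a single string.
rows = [
    '{"type": "track", "event": "Product purchased", "properties": {"price": 10}}',
    '{"type": "track", "event": "Sign up"}',
    '{"type": "track", "event": "Product purchased", "properties": {"price": 25}}',
]
by_event = split_by_event(rows)
```

Because every read has to parse the string again, saving frequently used fields as their own columns is usually cheaper than extracting them from the raw JSON in each query.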

Here we use Tinybird's CLI tool (see the installation guide).

Preconfigure a Data Source

Using Tinybird's CLI tool, authenticate to your workspace by running tb auth and entering the admin token for the workspace where you want to ingest data from RudderStack.

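Create a new file in your local workspace to configure the empty Data Source, named rudderstack_events.datasource for example. Its schema should define a single String column (called value in this guide) that holds the full JSON object.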

Push the file to your workspace using tb push rudderstack_events.datasource

Note that this preconfigured Data Source is only required if you need a column containing the full JSON object as a String. Otherwise, skip this step and let Tinybird infer the columns and data types when you send the first event. You will then be able to select which columns you wish to save and adjust their data types.

Next, create the Auth token. The steps are the same as in method 1, except for the scope you give the token.

Create a Tinybird Auth Token

Go to the workspace in Tinybird where you want to receive data via RudderStack and click on Manage Auth Tokens in the side panel.

Create a new Auth token by clicking CREATE NEW (top right).

Give your token a descriptive name.

In the section DATA SOURCES SCOPES click on Add Data Source scope, choose the name of the Data Source that you just created, and mark the Append checkbox.

Click on "SAVE CHANGES" at the bottom of the page to create your token.

Create a RudderStack Destination

Log in to RudderStack

Click on Destinations in the side panel and then on New destination (top right).

Select Webhook

  • Give it a name to help you identify this destination.
  • Connect your source(s). You can test with the RudderStack Sample HTTP Source.
  • Input the following Connection Settings:
      • Webhook URL: https://api.tinybird.co/v0/events?name=rudderstack_events
      • URL Method: POST
      • Headers Key: Authorization
      • Headers Value: Bearer TINYBIRD_AUTH_TOKEN
Webhook connection settings with Data Source name
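If you want to check the token and the Data Source name before wiring up RudderStack, the connection settings above correspond roughly to the HTTP request below. This is a minimal sketch using only the Python standard library. The function `build_events_request` and the sample payload are hypothetical, and the body is assumed to be newline-delimited JSON with one event per line.

```python
import json
import urllib.parse
import urllib.request

EVENTS_URL = "https://api.tinybird.co/v0/events"


def build_events_request(token, data_source, events):
    """Build a POST request that mirrors the Webhook connection settings."""
    query = urllib.parse.urlencode({"name": data_source})
    # One JSON object per line (newline-delimited JSON).
    body = "\n".join(json.dumps(event) for event in events).encode("utf-8")
    return urllib.request.Request(
        f"{EVENTS_URL}?{query}",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


# Usage (replace the placeholder token with your own before running):
# request = build_events_request(
#     "TINYBIRD_AUTH_TOKEN",
#     "rudderstack_events",
#     [{"type": "track", "event": "Product purchased"}],
# )
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Building the request separately from sending it keeps the URL, method, and header easy to inspect, which helps when comparing them against the values entered in RudderStack.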

Click on 'No transformation needed'

Destination created successfully.

Test Ingestion

Click on Sources and select RudderStack Sample HTTP.

Click on Live events (top right).

Click on Send test event and paste the provided curl command into your terminal.

The event will appear on the screen and be sent to Tinybird.

The value column contains the full JSON object. You will also have the option of having the data parsed into columns. When viewing the new columns you can select which ones to save and adjust their data types.

New columns detected not in schema

Whenever new columns are detected in the stream of events you will be asked if you wish to save them.