Data Sources¶
What is a Data Source?¶
Data Sources make it easy to bring your data into Tinybird. Think of a Data Source as a table in a database, but with a little extra on top.
When you ingest data, it is written to a Data Source. You can then write SQL to query data from a Data Source.
A Data Source combines two jobs in one: connecting to external data and writing that data to a table.
What should I use Data Sources for?¶
You will ingest your data into a Data Source, and build your queries against a Data Source.
If your event data lives in a Kafka topic, for instance, you can create a Data Source that connects directly to Kafka and writes the events to Tinybird. You can then create a Pipe to query your fresh event data.
A Data Source can also be the result of materializing a SQL query through a Pipe.
Creating Data Sources¶
Creating Data Sources in the UI¶
In your workspace, you’ll find the Data Sources section at the bottom of the left side navigation.
Click the Plus (+) icon to add a new Data Source (see Mark 1 below).

Events API¶
In the Data Source window, click on the Events API tab (see Mark 1 below). You can switch between code snippets for different languages (see Mark 2 below). Use the Copy snippet button to copy the desired snippet to your clipboard (see Mark 3 below).
The Events API does not require you to create a Data Source upfront: you can add the copied snippet directly to your application, and Tinybird will automatically create the Data Source for you when data is received.
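For example, a call to the Events API looks roughly like this (a minimal sketch: the Data Source name, token, and event fields below are illustrative placeholders):
# events_example and <your_write_token> are placeholders
curl \
  -X POST 'https://api.tinybird.co/v0/events?name=events_example' \
  -H 'Authorization: Bearer <your_write_token>' \
  -d '{"date": "2022-01-01 00:00:00", "event": "page_view", "user_id": 1}'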

Kafka¶
In the Data Source window, click on the Kafka tab (see Mark 1 below). Enter your connection details in the New connection form (see Mark 2 below).
When you are finished configuring the connection, click Connect to finish (see Mark 3 below).

In the next screen, you can select which Topic to consume from (see Mark 1 below) and configure the Consumer Group name (see Mark 2 below).
When you are finished configuring the Topic consumer, click Connect to finish (see Mark 3 below).

In the last screen, you can choose whether the consumer should start from the Earliest or the Latest offset (see Mark 1 below). You can also see a preview of the schema & data (see Mark 2 below).
Click Continue (see Mark 3 below) to start importing the data.

Remote URL¶
In the Data Source window, click on the Remote URL tab (see Mark 1 below). In the text box, you can enter a URL to a remote file available over HTTP (see Mark 2 below).
When you are finished entering the URL, click Add to finish (see Mark 3 below).

On the next screen you can give the Data Source a name & description (see Mark 1 below). You can also see a preview of the schema & data (see Mark 2 below).
Click Continue (see Mark 3 below) to start importing the data.

Local File¶
In the Data Source window, click on the Local file tab (see Mark 1 below). Click the Choose a CSV, NDJSON or Parquet file to upload text (see Mark 2 below) to open a file selector, and choose the file you want to upload.
When you are finished selecting a file, click Add to finish (see Mark 3 below).

On the next screen you can give the Data Source a name & description (see Mark 1 below). You can also see a preview of the schema & data (see Mark 2 below).
Click Continue (see Mark 3 below) to start importing the data.

Creating Data Sources in the CLI¶
Data Source operations are performed using the tb datasource commands.
Events API¶
The Events API does not require you to create a Data Source upfront; Tinybird will automatically create the Data Source for you when data is received.
Kafka¶
To create a Kafka Data Source from the CLI, you must first create a Kafka connection:
tb connection create kafka --bootstrap-server HOST:PORT --key KEY --secret SECRET --connection-name CONNECTION_NAME
You can then interactively create the Data Source using the connection. You will be prompted to enter the consumer details:
tb datasource connect CONNECTION_NAME DATASOURCE_NAME
Kafka topic:
Kafka group:
Kafka doesn't seem to have prior commits on this topic and group ID
Setting auto.offset.reset is required. Valid values:
latest Skip earlier messages and ingest only new messages
earliest Start ingestion from the first message
Kafka auto.offset.reset config:
Proceed? [y/N]:
You can also do this non-interactively:
tb datasource connect CONNECTION_NAME DATASOURCE_NAME --topic TOPIC --group GROUP --auto-offset-reset OFFSET
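For example, a complete non-interactive invocation might look like this (the connection, Data Source, topic, and consumer group names are illustrative):
tb datasource connect my_kafka_connection kafka_events --topic events --group tinybird_ingest --auto-offset-reset latest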
Remote URL¶
If you have a remote file available over HTTP, you can create & import the file into a new Data Source with the following command:
tb datasource append DATA_SOURCE_NAME URL
Alternatively, if you want to generate a .datasource file to version control your new Data Source, you can instead use this command:
tb datasource generate URL
After creating the .datasource file, you will need to push it to Tinybird:
tb push DATA_SOURCE_FILE
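For example, assuming a CSV of events hosted at a public URL (the Data Source name and URL are illustrative):
tb datasource append events https://example.com/data/events.csv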
Local File¶
If you have a local file available, you can create & import the file into a new Data Source with the following command:
tb datasource append DATA_SOURCE_NAME FILE_PATH
Alternatively, if you want to generate a .datasource file to version control your new Data Source, you can instead use this command:
tb datasource generate FILE_PATH
After creating the .datasource file, you will need to push it to Tinybird:
tb push DATA_SOURCE_FILE
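For example, to generate and push a version-controlled Data Source from a local NDJSON file (the file name is illustrative, and the generated file is assumed to be named events.datasource after the input file):
tb datasource generate ./events.ndjson
tb push events.datasource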
Setting Data Source TTL¶
You can apply a TTL (Time To Live) to a Data Source in Tinybird. A TTL allows you to define how long data should be stored for.
For example, you might define a TTL of 7 days, meaning any data older than 7 days is deleted automatically.
You must define the TTL when creating the Data Source, and your data must have a column whose type can represent a date. Valid types are any of the Date or Int types.
Setting Data Source TTL in the UI¶
This section describes setting the TTL when creating a new Data Source in the Tinybird UI.
When creating your new Data Source, you can select a TTL on the Schema preview modal (see Mark 1 below). You must select a column that represents a date (see Mark 2 below).
If you are using the Tinybird Events API & want to use a TTL, you must create the Data Source with the TTL before sending data.

After selecting a column, you can then define the TTL period in days (see Mark 1 below).

Alternatively, if you need to apply a transformation to the date column, or want to use more complex logic, you can select the Use custom SQL option (see Mark 1 below).

You can then enter some custom SQL to define your TTL (see Mark 1 below).
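For example, if your date lives in a DateTime column, a custom TTL expression could truncate it to a Date before adding the interval (the column name created_at is illustrative):
toDate(created_at) + toIntervalDay(30)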

Setting Data Source TTL in the CLI¶
This section describes setting the TTL when creating a new Data Source in the CLI.
When creating a new Data Source, you can add a TTL to the .datasource file.
At the end of a .datasource file you will find the Engine settings. Add a new setting called ENGINE_TTL and enter your TTL string enclosed in double quotes (").
SCHEMA >
    `date` DateTime,
    `product_id` String,
    `user_id` Int64,
    `event` String,
    `extra_data` String

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYear(date)"
ENGINE_SORTING_KEY "date, user_id, event, extra_data"
ENGINE_TTL "date + toIntervalDay(90)"
Changing Data Source TTL¶
It is possible to modify the TTL of an existing Data Source. You can add a TTL if one was not specified previously, or update an existing TTL.
Changing Data Source TTL in the UI¶
This section describes changing the TTL of an existing Data Source in the Tinybird UI.
First, navigate to the Data Source details page by clicking on the Data Source whose TTL you wish to change (see Mark 1 below). Then, click on the Schema tab (see Mark 2 below). You'll find the Data Source's TTL at the bottom of the right-hand column; click the TTL text (see Mark 3 below).

A dialog window will open. Click into the dropdown menu (see Mark 1 below) to show the available fields to use for the TTL. Click on an item from the dropdown to select it as the field for the TTL (see Mark 2 below).

With the field selected, you can change what the TTL interval will be (see Mark 1 below). When you are finished, click Save (see Mark 2 below).

Finally, you will see the updated TTL value in the Data Source’s Schema page (see Mark 1 below).

Changing Data Source TTL in the CLI¶
This section describes changing the TTL of an existing Data Source in the CLI.
At the end of a .datasource file you will find the Engine settings.
If no TTL has been applied, add a new setting called ENGINE_TTL and enter your TTL string enclosed in double quotes ("). If a TTL has already been applied, modify the existing TTL string between the double quotes.
The ENGINE_TTL setting looks like this:
ENGINE_TTL "date + toIntervalDay(90)"
When finished modifying the .datasource file, you must push the changes to Tinybird using the CLI:
tb push DATA_SOURCE_FILE -f
Data Sources supported ingestion methods¶
Kafka
Events API
Local files
Remote files reachable through a URL
Data Sources supported data formats¶
CSV
NDJSON
Parquet
The Quarantine Data Source¶
Every Data Source you create in your Workspace has an associated quarantine Data Source. If you send rows that don't fit the Data Source schema, they are automatically sent to the quarantine table. This way, the whole ingestion process doesn't fail, and you can review quarantined rows later or perform operations on them using Pipes. This is a great source of information for fixing your origin source, or a powerful way to apply the needed changes on the fly during ingestion.
By convention, the quarantine Data Source is named {datasource_name}_quarantine.
The quarantine Data Source schema contains the columns of the original row, plus some extra columns (c__error_column, c__error, c__import_id, and insertion_date) with information about the issues that sent the row to quarantine.
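You can inspect the most recent quarantined rows from the CLI, for example (a minimal sketch, assuming a Data Source named events):
tb sql "SELECT c__error_column, c__error, insertion_date FROM events_quarantine ORDER BY insertion_date DESC LIMIT 10"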
See the Quarantine Guide for practical examples of using the quarantine Data Source.