Data Sources¶
What is a Data Source?¶
Data Sources make it super easy to bring your data into Tinybird. Think of a Data Source as a table in a database, but with a little extra on top.
When you ingest data, it is written to a Data Source. You can then write SQL to query data from a Data Source.
A Data Source combines connecting to external data with storing that data in a table.
What should I use Data Sources for?¶
You ingest your data into a Data Source and build your queries against it.
If your event data lives in a Kafka topic, for instance, you can create a Data Source that connects directly to Kafka and writes the events to Tinybird. You can then create a Pipe to query your fresh event data.
A Data Source can also be the result of materializing a SQL query through a Pipe.
Creating Data Sources¶
Creating Data Sources in the UI¶
In your workspace, you’ll find the Data Sources section at the bottom of the left side navigation.
Click the Plus (+) icon to add a new Data Source (see Mark 1 below).

Events API¶
In the Data Source window, click on the Events API selector (see Mark 1 below).

You can switch between code snippets for different languages (see Mark 1 below). Use the Copy snippet button to copy the desired snippet to your clipboard (see Mark 2 below).

The Events API does not require you to create a Data Source upfront: you can add the copied snippet directly into your application, and Tinybird automatically creates the Data Source for you when data is received.
You can choose to create a Data Source schema upfront, and then start sending data to it over the Events API. This gives you greater control over the data types and sorting keys used by the Data Source. If you’re new to the CLI, check out the CLI Quick Start guide.
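For illustration, a minimal snippet of the kind the UI generates looks like this. It assumes a Workspace token in the TB_TOKEN environment variable and a hypothetical Data Source named my_events_ds; the request body is one JSON object per line (NDJSON):
curl \
  -H "Authorization: Bearer $TB_TOKEN" \
  -d '{"date": "2023-01-01 00:00:00", "event": "signup", "user_id": 1}' \
  'https://api.tinybird.co/v0/events?name=my_events_ds'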
Kafka¶
In the new Data Source modal, click on the Kafka connector (see Mark 1 below).

Enter your connection details in the form. When you are finished configuring the connection, click Next (see Mark 1 below).

In the next screen, you can select which Topic to consume from (see Mark 1 below) and configure the Consumer Group name (see Mark 2 below).
When you are finished configuring the Topic consumer, click Next (see Mark 3 below).

You can now choose where the consumer should start from, either the Earliest or Latest offset. Make your selection, then click Next (see Mark 1 below).

You will see a preview of the schema & data (see Mark 1 below). Click Create Data Source (see Mark 2 below) to finish and start importing the data.

Remote URL¶
In the new Data Source modal, click on the Remote URL connector (see Mark 1 below).

On the next screen, select the format of your data (see Mark 1 below). Then, enter the URL to a remote file available over HTTP(S) (see Mark 2 below). Click Add (see Mark 3 below).

You will now see a preview of the schema & data (see Mark 1 below). Click Create Data Source (see Mark 2 below) to start importing the data.

Local File¶
In the new Data Source modal, click on the File Upload connector (see Mark 1 below).

On the next screen, click into the Upload box (see Mark 1 below) to open a file selector and choose a file you want to upload. You can also drag & drop a file onto the box.

You’ll see a confirmation of the file name (see Mark 1 below). If it looks correct, click Add (see Mark 2 below).

You will now see a preview of the schema & data (see Mark 1 below). Click Create Data Source (see Mark 2 below) to start importing the data.

Creating Data Sources in the CLI¶
Data Source operations are performed using the tb datasource commands.
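For example, to list the Data Sources in your Workspace:
tb datasource ls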
Events API¶
The Events API does not require you to create a Data Source upfront: Tinybird automatically creates the Data Source for you when data is received.
Kafka¶
To create a Kafka Data Source from the CLI, you must first create a Kafka connection:
tb connection create kafka --bootstrap-servers HOST:PORT --key KEY --secret SECRET --connection-name CONNECTION_NAME
You can then interactively create the Data Source using the connection. You will be prompted to enter the consumer details:
tb datasource connect CONNECTION_NAME DATASOURCE_NAME
Kafka topic:
Kafka group:
Kafka doesn't seem to have prior commits on this topic and group ID
Setting auto.offset.reset is required. Valid values:
latest Skip earlier messages and ingest only new messages
earliest Start ingestion from the first message
Kafka auto.offset.reset config:
Proceed? [y/N]:
You can also do this non-interactively:
tb datasource connect CONNECTION_NAME DATASOURCE_NAME --topic TOPIC --group GROUP --auto-offset-reset OFFSET
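For example, assuming a connection named my_kafka, a topic named events, and a consumer group named tb_ingest (all hypothetical names):
tb datasource connect my_kafka kafka_events --topic events --group tb_ingest --auto-offset-reset latest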
Remote URL¶
If you have a remote file available over HTTP(S), you can create a new Data Source and import the file with the following command:
tb datasource append DATA_SOURCE_NAME URL
Alternatively, if you want to generate a .datasource file to version control your new Data Source, you can instead use this command:
tb datasource generate URL
After creating the .datasource file, you will need to push it to Tinybird:
tb push DATA_SOURCE_FILE
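For example, with a hypothetical CSV file hosted at https://example.com/events.csv, the flow could look like this (assuming the generated file is named after the remote file):
tb datasource generate https://example.com/events.csv
tb push events.datasource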
Local File¶
If you have a local file, you can create a new Data Source and import the file with the following command:
tb datasource append DATA_SOURCE_NAME FILE_PATH
Alternatively, if you want to generate a .datasource file to version control your new Data Source, you can instead use this command:
tb datasource generate FILE_PATH
After creating the .datasource file, you will need to push it to Tinybird:
tb push DATA_SOURCE_FILE
Setting Data Source TTL¶
You can apply a TTL (Time To Live) to a Data Source in Tinybird. A TTL defines how long data should be stored. For example, with a TTL of 7 days, any data older than 7 days is deleted automatically.
You must define the TTL when creating the Data Source, and your data must have a column whose type can represent a date. Valid types are any of the Date or Int types.
Setting Data Source TTL in the UI¶
This section describes setting the TTL when creating a new Data Source in the Tinybird UI.
If you are using the Tinybird Events API and want to use a TTL, you must create the Data Source with a TTL before sending data.
When creating your new Data Source, click the Advanced Settings tab (see Mark 1 below). Click onto the TTL dropdown (see Mark 2 below). You must select a column that represents a date (see Mark 3 below).

After selecting a column, you can then define the TTL period in days (see Mark 1 below).

Alternatively, if you need to apply a transformation to the date column, or want to use more complex logic, you can select the Code editor tab (see Mark 1 below) and enter custom SQL to define your TTL (see Mark 2 below).
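For example, if your Data Source has a DateTime column called timestamp (a hypothetical name), a custom TTL expression that keeps 30 days of data could look like this:
toDate(timestamp) + toIntervalDay(30)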

Setting Data Source TTL in the CLI¶
This section describes setting the TTL when creating a new Data Source in the CLI.
When creating a new Data Source, you can add a TTL to the .datasource file.
At the end of a .datasource file you will find the Engine settings. Add a new setting called ENGINE_TTL and enter your TTL string enclosed in double quotes (").
SCHEMA >
`date` DateTime,
`product_id` String,
`user_id` Int64,
`event` String,
`extra_data` String
ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYear(date)"
ENGINE_SORTING_KEY "date, user_id, event, extra_data"
ENGINE_TTL "date + toIntervalDay(90)"
Changing Data Source TTL¶
It is possible to modify the TTL of an existing Data Source. You can add a TTL if one was not specified previously, or update an existing TTL.
Changing Data Source TTL in the UI¶
This section describes changing the TTL of an existing Data Source in the Tinybird UI.
First, navigate to the Data Source details page by clicking on the Data Source whose TTL you wish to change (see Mark 1 below). Then, click on the Schema tab (see Mark 2 below). You'll find the Data Source's TTL at the bottom of the right-hand column; click the TTL text (see Mark 3 below).

A dialog window will open. Click into the dropdown menu (see Mark 1 below) to show the available fields to use for the TTL. Click on an item from the dropdown to select it as the field for the TTL (see Mark 2 below).

With the field selected, you can change what the TTL interval will be (see Mark 1 below). When you are finished, click Save (see Mark 2 below).

Finally, you will see the updated TTL value in the Data Source’s Schema page (see Mark 1 below).

Changing Data Source TTL in the CLI¶
This section describes changing the TTL of an existing Data Source in the CLI.
At the end of a .datasource file you will find the Engine settings.
If no TTL has been applied, add a new setting called ENGINE_TTL and enter your TTL string enclosed in double quotes ("). If a TTL has already been applied, modify the existing TTL string between the double quotes.
The ENGINE_TTL setting looks like this:
ENGINE_TTL "date + toIntervalDay(90)"
When finished modifying the .datasource file, you must push the changes to Tinybird using the CLI:
tb push DATA_SOURCE_FILE -f
Supported Engines & Settings¶
We do not recommend changing these settings unless you are familiar with ClickHouse and understand their impact. If you’re unsure, please contact us in our Slack community or email us at support@tinybird.co.
Tinybird uses ClickHouse as the underlying storage engine. ClickHouse features different strategies to store data; these strategies define not only where and how the data is stored, but also what kind of data access, queries, and availability your data has. In ClickHouse terms, a Tinybird Data Source uses a Table Engine that determines those factors.
Tinybird allows you to choose the ClickHouse Table Engine for your Data Source. To make things simpler, the API hides some of the complexity around ClickHouse’s Table Engines. For instance, it simplifies the replication configuration, so you don’t have to worry about it.
Currently, Tinybird supports the following Engines:
MergeTree
ReplacingMergeTree
SummingMergeTree
AggregatingMergeTree
CollapsingMergeTree
VersionedCollapsingMergeTree
Null
If you need to use any other Table Engine, contact us in our Slack community or email us at support@tinybird.co.
When using the Data Sources API, you can use the engine parameter to specify the name of any of the available engines, e.g. engine=ReplacingMergeTree. To specify the engine parameters and the engine options, you can use as many engine_* request parameters as needed. These settings can also be configured in .datasource files.
The supported parameters for each engine are:
| ENGINE | SIGNATURE | PARAMETER | DESCRIPTION |
|---|---|---|---|
| ReplacingMergeTree | ([ver]) | engine_ver | Optional. The column with version. |
| SummingMergeTree | ([columns]) | engine_columns | Optional. The names of columns where values will be summarized. |
| CollapsingMergeTree | (sign) | engine_sign | Name of the column for computing the state. |
| VersionedCollapsingMergeTree | (sign, version) | engine_sign | Name of the column for computing the state. |
| | | engine_version | Name of the column with the version of the object state. |
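As a sketch of how these parameters fit together, the following request creates a hypothetical products Data Source with a ReplacingMergeTree engine, using engine_ver to point at an updated_at column (all names are illustrative, and the token is assumed to be in TB_TOKEN):
curl \
  -H "Authorization: Bearer $TB_TOKEN" \
  -X POST "https://api.tinybird.co/v0/datasources" \
  --data-urlencode "mode=create" \
  --data-urlencode "name=products" \
  --data-urlencode "schema=product_id Int64, updated_at DateTime, price Float64" \
  --data-urlencode "engine=ReplacingMergeTree" \
  --data-urlencode "engine_ver=updated_at" \
  --data-urlencode "engine_sorting_key=product_id"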
The engine options, in particular the MergeTree engine options, match ClickHouse terminology: engine_partition_key, engine_sorting_key, engine_primary_key, engine_sampling_key, engine_ttl and engine_settings. Check the ClickHouse documentation for a detailed explanation of the different engine options available.
If engine_partition_key is left empty or not passed as a parameter, the underlying Data Source will not have any partition unless there is a Date column; in that case, the Data Source is partitioned by year. If you want to create a Data Source with no partitions, pass engine_partition_key=tuple().
engine_settings allows for fine-grained control over the parameters of the underlying Table Engine. In general, we do not recommend changing these settings unless you are absolutely sure about their impact.
The supported engine settings are:
index_granularity
merge_with_ttl_timeout
ttl_only_drop_parts
min_bytes_for_wide_part
min_rows_for_wide_part
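As a sketch, the end of a .datasource file combining an engine with custom engine settings might look like this, assuming the ENGINE_SETTINGS directive mirrors the engine_settings API parameter (values chosen for illustration only, not recommendations):
ENGINE "MergeTree"
ENGINE_SORTING_KEY "date"
ENGINE_TTL "date + toIntervalDay(90)"
ENGINE_SETTINGS "index_granularity = 8192, ttl_only_drop_parts = 1"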
See the Data Sources API for examples of creating Data Sources with custom engine settings using the Tinybird REST API.
Supported data types¶
The supported ClickHouse data types are:
Int8, Int16, Int32, Int64, Int128, Int256
UInt8, UInt16, UInt32, UInt64, UInt128, UInt256
Float32, Float64
String
UUID
Date, Date32
DateTime([TZ]), DateTime64(P, [TZ])
Array(T)
Supported data formats¶
CSV
The Quarantine Data Source¶
Every Data Source you create in your Workspace has an associated quarantine Data Source. If you send rows that don't fit the Data Source schema, they are automatically sent to the quarantine table, so the whole ingestion process doesn't fail. You can review quarantined rows later or perform operations on them using Pipes, which makes quarantine a great source of information for fixing errors at the origin, or a powerful way to repair rows on the fly during the ingestion process.
By convention, the quarantine Data Source is named {datasource_name}_quarantine.
The quarantine Data Source schema contains the columns of the original row, plus some extra columns (c__error_column, c__error, c__import_id, and insertion_date) with information about the issues that sent the row to quarantine.
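For example, to see the most common errors for a hypothetical events Data Source, you could group its quarantined rows by error in a Pipe:
SELECT c__error, count() AS rows
FROM events_quarantine
GROUP BY c__error
ORDER BY rows DESC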
See the Quarantine Guide for practical examples of using the quarantine Data Source.
Partitioning¶
Partitioning is configured using the ENGINE_PARTITION_KEY setting.
Partitioning is not intended to speed up SELECT queries; for that, we recommend experimenting with more efficient sorting keys (ENGINE_SORTING_KEY). Partitions are intended for data manipulation. A bad partition key (one that creates too many partitions) can negatively impact query performance.
Some options to consider when choosing a partition key:
Leave ENGINE_PARTITION_KEY empty. If the table is small (a few GB), or you aren't sure what the best partition key should be, leave it empty. The data will be placed in a single partition.
Use a date column. Depending on how you filter, you can choose more or less granularity based on your needs; toYYYYMM(date_column) or toYear(date_column) are usually good default choices.
If you have questions about choosing a partition key, contact us in our Slack community or email us at support@tinybird.co.
Examples¶
ENGINE_PARTITION_KEY "tuple()"
ENGINE_PARTITION_KEY "toYYYYMM(date_column)"
ENGINE_PARTITION_KEY "event_type % 8"
FAQs¶
Does Tinybird support updates and deletes?¶
Yes, but proceed with caution. Frequent deletes and replaces are not natively supported by ClickHouse, so we built this functionality on top; see this guide. Depending on the frequency you need, you may want to convert these upserts and deletes into an append problem, which is easier to solve with deduplication.
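As a sketch of the append-based approach, a hypothetical products Data Source could use a ReplacingMergeTree engine with a version column (assuming ENGINE_VER is the .datasource counterpart of the engine_ver parameter above), so every change is appended as a new row:
SCHEMA >
`product_id` Int64,
`updated_at` DateTime,
`price` Float64
ENGINE "ReplacingMergeTree"
ENGINE_SORTING_KEY "product_id"
ENGINE_VER "updated_at"
A query can then deduplicate by taking the most recent row per key, for example: SELECT product_id, argMax(price, updated_at) AS price FROM products GROUP BY product_id.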