Data Sources API - Importing Data and Managing your Data Sources

The Data Sources API enables you to create and manage your Data Sources, as well as import data into them.

In order to use the Data Sources API, you must use an Auth token with the right permissions, depending on whether you want to CREATE, APPEND, READ or DROP (or a combination of those).

Importing data into Tinybird Analytics

Tinybird Analytics is specifically designed to ingest, process and analyze data in CSV, NDJSON, and Parquet format. CSV files must have one line per row of data and comma-separated fields, with the column headers in the first row. See the API reference to learn how to ingest NDJSON and Parquet files.

You can import your data into Tinybird Analytics by creating a new Data Source. Tinybird Analytics will automatically detect and optimize your column types so you don’t have to worry about anything and can start analyzing your data right away.

Creating a Data Source from a remote file
curl \
-H "Authorization: Bearer <import_token>" \
-X POST "https://api.tinybird.co/v0/datasources?url=https://s3.amazonaws.com/nyc-tlc/trip+data/fhv_tripdata_2018-12.csv"

Advanced users can explicitly analyze their data before importing.

If you are looking for information on how to create a new Pipe, take a look at the Pipes API reference.

POST /v0/datasources/?

There are three modes for importing data into a Data Source: create, append, and replace:

  • create: Create the Data Source with its schema.

  • append: Load data into an existing Data Source. If the Data Source has dependent Materialized Views, data will be appended in cascade.

  • replace: Selectively replace data in an existing Data Source, or completely replace its content. If the Data Source has dependent Materialized Views, they will be replaced in cascade.

When creating a Data Source, you need to define its schema. This can be done in two different ways:

  • Automatically: When you have a CSV file or a URL to a CSV file, you can create a Data Source from the file itself. Tinybird will take care of guessing the schema. Setting type_guessing=false will disable guessing and all the columns will be set as String.

  • Manually: When you already know your schema or you want to change or optimize a schema.

You can guess the best schema for a CSV, NDJSON, or Parquet file (either local or remote) with the Analyze API or using the UI.

In the first case, you'll actually be doing two operations in one: first, a create operation using the guessed schema, and then an append operation that ingests the data.

Creating a Data Source from a schema

To create an empty Data Source, you must pass a schema with your desired column names and types and leave the url parameter empty.

When you want more granular control over the Data Source schema, you can specify it directly when creating the Data Source. These are some cases where setting the schema is the best option:

  • You are already familiar with your data and know the types in advance.

  • For optimization purposes. For example, you know a column is a 32-bit integer instead of a 64-bit one.

  • Making some columns optional by defining them as Nullable.

  • When Tinybird’s guessing fails and incorrectly identifies a column data type. We do our best, but sometimes we get it wrong!

Creating a CSV Data Source from a schema
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d "name=stocks" \
-d "schema=symbol String, date Date, close Float32"

The supported ClickHouse data types are:

  • Int8, Int16, Int32, Int64, Int128, Int256

  • UInt8, UInt16, UInt32, UInt64, UInt128, UInt256

  • Float32, Float64

  • String

  • UUID

  • Date, Date32

  • DateTime([TZ]), DateTime64(P, [TZ])

  • Array(T)

Quarantine rows

Imports do not stop when Tinybird finds rows that do not match the Data Source schema; instead, those rows are stored in a “quarantine” Data Source. This “quarantine” Data Source is automatically created along with each Data Source and contains the same columns as the original, but with Nullable(String) as the data type. Those records can be processed later and recovered. Learn more about quarantine Data Sources.
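
If you need to inspect the quarantined rows, you can query the quarantine Data Source with SQL. This is a minimal sketch using the Query API, assuming a Data Source named events and a token with read access to it:

Inspecting quarantined rows with the Query API
curl \
-H "Authorization: Bearer <token with read access>" \
-G "https://api.tinybird.co/v0/sql" \
--data-urlencode "q=SELECT * FROM events_quarantine LIMIT 10"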

Importing data from a CSV

You import data by specifying the URL of a CSV file or by sending the CSV data in the request body.

Tinybird guesses the Data Source schema (the columns and their data types) and the partition and sorting keys (how the data is stored) from the CSV contents. Once the Data Source has been created, it is recommended not to include the CSV header anymore, for performance reasons. Check out our guide for tuning CSVs for fast ingestion. However, if the header is included and contains all the names present in the Data Source schema, the ingestion will work even if the columns are in a different order than the one used when the Data Source was created.

Using a URL

When using a URL for the CSV file, the URL must be remotely accessible. If the server supports HTTP Range headers the import process will be parallelized. The URL must be encoded.

Creating a Data Source from a remote CSV file
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
--data-urlencode "url=http://example.com/file.csv"

When importing a CSV via a URL, the response will not be the final result of the import but a Job. You can check the job status and progress using the Jobs API. In the response, id, job_id, and import_id should have the same value.
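
For example, you could check the job with a request like the following (a sketch; <job_id> is the id returned in the import response):

Checking the status of an import job
curl \
-H "Authorization: Bearer <import_token>" \
-X GET "https://api.tinybird.co/v0/jobs/<job_id>"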

Using a local file

You can also use this endpoint to import data to a Data Source from a local file.

Creating a Data Source from local CSV files
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-F csv=@local_file.csv
Successful response
{
    "id": "e9ae235f-f139-43a6-7ad5-a1e17c0071c2",
    "datasource": {
        "id": "t_0ab7a11969fa4f67985cec481f71a5c2",
        "name": "your_datasource_name",
        "cluster": null,
        "tags": {},
        "created_at": "2019-03-12 17:45:04",
        "updated_at": "2019-03-12 17:45:04",
        "statistics": {
            "bytes": 1397,
            "row_count": 4
        },
        "replicated": false,
        "version": 0,
        "project": null,
        "used_by": [],
        "type": "csv"
    },
    "error": false
}

In this case, you will receive the final import result in the response, so you won’t need to check a job status. You can also track the progress by using the progress=true parameter.
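
For example, a sketch of sending a local CSV while tracking block status (the file name is illustrative):

Tracking progress while importing a local CSV
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources?progress=true" \
-F csv=@local_file.csv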

Since the default mode is create, if you want to append or replace data into a Data Source that already exists, you should specify both the mode and the name:

Appending data to a Data Source from a local CSV file
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources?mode=append&name=<data_source_name>" \
-F csv=@local_file.csv
Replacing a Data Source from a URL
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d mode='replace' \
-d name='data_source_name' \
--data-urlencode "url=http://example.com/file.csv"

There's a debug option, which is false by default. You can specify which debug information you want to receive: blocks, block_log, and/or hook_log.

Using debug parameter
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources?debug=blocks,block_log" \
-F csv=@local_file.csv

Response example:

Successful response
{
    "import_id": "e9ae235f-f139-43a6-7ad5-a1e17c0071c2",
    "datasource": {
        "id": "t_0ab7a11969fa4f67985cec481f71a5c2",
        "name": "your_datasource_name",
        "cluster": null,
        "tags": {},
        "created_at": "2019-03-12 17:45:04",
        "updated_at": "2019-03-12 17:45:04",
        "statistics": {
            "bytes": 1397,
            "row_count": 4
        },
        "replicated": false,
        "version": 0,
        "project": null,
        "used_by": []
    },
    "error": false,
    "blocks": [{
        "block_id": "0a261696-c531-4f33-bd95-b871a73eaa04",
        "process_return": [{
            "lines": 100,
            "parser": "python",
            "quarantine": 0,
            "time": 0.0274507999420166,
            "invalid_lines": 0,
            "empty_lines": 1,
            "bytes": 1234,
        }],
        "processing_time": 0.13909363746643066,
        "processing_error": null,
        "processing_error_type": null
    }],
    "block_log": [{
        "block_id": "0a261696-c531-4f33-bd95-b871a73eaa04",
        "status": "processing",
        "timestamp": 1594742501.624614
    }, {
        "block_id": "0a261696-c531-4f33-bd95-b871a73eaa04",
        "status": "guessing",
        "timestamp": 1594742501.624639
    }, {
        "block_id": "0a261696-c531-4f33-bd95-b871a73eaa04",
        "status": "inserting_chunk:0",
        "timestamp": 1594742501.747998
    }, {
        "block_id": "0a261696-c531-4f33-bd95-b871a73eaa04",
        "status": "done_inserting_chunk:0",
        "timestamp": 1594742501.753025
    }, {
        "block_id": "0a261696-c531-4f33-bd95-b871a73eaa04",
        "status": "done",
        "timestamp": 1594742501.753836
    }]
}

Importing data from NDJSON or Parquet

You import data by specifying the URL of an NDJSON or Parquet file or by sending the NDJSON/Parquet data in the request body, but before that you have to manually create the Data Source.

The process consists of:

  • (Optionally) Use the Analyze API to get the suggested schema and JSONPaths for creating the Data Source

  • Create the Data Source with mode=create, passing the schema and JSONPaths from the previous step or ones you have written manually.

  • Ingest data by passing the file with mode=append (or mode=replace, currently in beta).

Creating a Data Source using NDJSON/Parquet Data

To create a Data Source using NDJSON/Parquet data, you must pass a schema with your desired column names, types and JSONPaths, and indicate format=ndjson or format=parquet as appropriate. Let's see an example:

Creating a Data Source using NDJSON
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d "format=ndjson" \
-d "name=events" \
-d "mode=create" \
--data-urlencode "schema=date DateTime \`json:$.date\`, event String \`json:$.event\`"

Note that we are escaping the backticks with a backslash because it's a curl command; otherwise it's not needed.

JSONPaths

When creating a Data Source using NDJSON/Parquet data, for each column in the schema you have to provide a JSONPath using the JSONPath syntax.

Let’s see a complete example. Given this NDJSON object:

{ "field": "test", "nested": { "nested_field": "bla" }, "an_array": [1, 2, 3], "a_nested_array": { "nested_array": [1, 2, 3] } }

The schema would be something like this:

schema with jsonpath
a_nested_array_nested_array Array(Int16) `json:$.a_nested_array.nested_array[:]`,
an_array Array(Int16) `json:$.an_array[:]`,
field String `json:$.field`,
nested_nested_field String `json:$.nested.nested_field`

Our JSONPath syntax support has some limitations: it supports nested objects at multiple levels, but it only supports nested arrays at the first level, as in the example above. To ingest and transform more complex JSON objects, you should use the root object JSONPath syntax as described in the next section.

JSONPaths and the root object

Defining a column as column_name String `json:$` in the Data Source schema will ingest each line in the NDJSON file as a String in the column_name column.

This is very handy in these scenarios:

  1. When you have nested arrays, such as polygons:

Nested arrays
{
    "id": 49518,
    "polygon": [
        [
            [30.471785843000134,-1.066836591999916],
            [30.463855835000118,-1.075127054999925],
            [30.456156047000093,-1.086082457999908],
            [30.453003785000135,-1.097347919999962],
            [30.456311076000134,-1.108096617999891],
            [30.471785843000134,-1.066836591999916]
        ]
    ]
}

You can parse the id and then add the whole JSON string to the root column to extract the polygon with JSON functions.

schema definition
id String `json:$.id`,
root String `json:$`
  2. When you have complex objects:

Complex JSON objects
{
    "elem": {
        "payments": [
            {
                "users": [
                    {
                        "user_id": "Admin_XXXXXXXXX",
                        "value": 4
                    }
                ]
            }
        ]
    }
}

The same applies to more complex objects; in this case, we just push the whole JSON string into the root column:

schema definition
root String `json:$`
  3. Schemaless events:

Schemaless events
{
    "user_id": "1",
    "data": {
        "whatever": "bla",
        "whatever2": "bla"
    }
}

{
    "user_id": "1",
    "data": [1, 2, 3]
}
schema definition
root String `json:$`

Using the Analyze API

You don't need to remember the exact JSONPath notation when creating an NDJSON/Parquet Data Source: just rely on the Analyze API, and it will guess a valid schema based on a sample NDJSON/Parquet file:

analyze an NDJSON file to get a valid schema
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/analyze" \
-F "ndjson=@local_file_path"
Successful analyze response
{
    "analysis": {
        "columns": [
            {
                "path": "$.a_nested_array.nested_array[:]",
                "recommended_type": "Array(Int16)",
                "present_pct": 3,
                "name": "a_nested_array_nested_array"
            },
            {
                "path": "$.an_array[:]",
                "recommended_type": "Array(Int16)",
                "present_pct": 3,
                "name": "an_array"
            },
            {
                "path": "$.field",
                "recommended_type": "String",
                "present_pct": 1,
                "name": "field"
            },
            {
                "path": "$.nested.nested_field",
                "recommended_type": "String",
                "present_pct": 1,
                "name": "nested_nested_field"
            }
        ],
        "schema": "a_nested_array_nested_array Array(Int16) `json:$.a_nested_array.nested_array[:]`, an_array Array(Int16) `json:$.an_array[:]`, field String `json:$.field`, nested_nested_field String `json:$.nested.nested_field`"
    },
    "preview": {
        "meta": [
            {
                "name": "a_nested_array_nested_array",
                "type": "Array(Int16)"
            },
            {
                "name": "an_array",
                "type": "Array(Int16)"
            },
            {
                "name": "field",
                "type": "String"
            },
            {
                "name": "nested_nested_field",
                "type": "String"
            }
        ],
        "data": [
            {
                "a_nested_array_nested_array": [
                    1,
                    2,
                    3
                ],
                "an_array": [
                    1,
                    2,
                    3
                ],
                "field": "test",
                "nested_nested_field": "bla"
            }
        ],
        "rows": 1,
        "statistics": {
            "elapsed": 0.00032175,
            "rows_read": 2,
            "bytes_read": 142
        }
    }
}

Now you can create the Data Source using the schema in the analyze response.
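
For example, a minimal sketch that reuses the schema attribute from the analyze response above (the Data Source name events is illustrative):

Creating an NDJSON Data Source with the analyzed schema
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d "format=ndjson" \
-d "name=events" \
-d "mode=create" \
--data-urlencode "schema=a_nested_array_nested_array Array(Int16) \`json:$.a_nested_array.nested_array[:]\`, an_array Array(Int16) \`json:$.an_array[:]\`, field String \`json:$.field\`, nested_nested_field String \`json:$.nested.nested_field\`"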

Using a URL

When using a URL for the NDJSON/Parquet file, the URL must be remotely accessible and the url param must be encoded.

Creating a Data Source from a remote NDJSON file
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d "mode=append" \
-d "format=ndjson" \
-d "name=events" \
--data-urlencode "url=http://example.com/file.ndjson"

When importing an NDJSON/Parquet via a URL, the response will not be the final result of the import but a Job. You can check the job status and progress using the Jobs API. In the response, id, job_id, and import_id should have the same value.

Using a local NDJSON file

You can also use this endpoint to import data to a Data Source from a local file.

Creating a Data Source from local NDJSON files
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources?mode=append&format=ndjson&name=events" \
-F ndjson=@local_file.ndjson
Successful response
{
    "id": "e9ae235f-f139-43a6-7ad5-a1e17c0071c2",
    "datasource": {
        "id": "t_0ab7a11969fa4f67985cec481f71a5c2",
        "name": "your_datasource_name",
        "cluster": null,
        "tags": {},
        "created_at": "2019-03-12 17:45:04",
        "updated_at": "2019-03-12 17:45:04",
        "statistics": {
            "bytes": 1397,
            "row_count": 4
        },
        "replicated": false,
        "version": 0,
        "project": null,
        "used_by": [],
        "type": "ndjson"
    },
    "error": false
}

In this case, you will receive the final import result in the response, so you won’t need to check a job status.

Data Source Engines

Tinybird uses ClickHouse as the underlying storage technology. ClickHouse features different strategies to store data; these strategies define not only where and how the data is stored, but also what kind of data access, queries, and availability your data has. In ClickHouse terms, a Tinybird Data Source uses a Table Engine that determines those factors.

Tinybird allows you to choose the ClickHouse Table Engine for your Data Source. To make things simpler, the API hides some of the complexity around ClickHouse’s Table Engines. For instance, it simplifies the replication configuration, so you don’t have to worry about it.

Currently, Tinybird supports the following Engines:

  • MergeTree

  • ReplacingMergeTree

  • SummingMergeTree

  • AggregatingMergeTree

  • CollapsingMergeTree

  • VersionedCollapsingMergeTree

  • Null

If you need to use any other Table Engine, get in touch with us.

You can use the engine parameter to specify the name of any of the available engines, e.g. engine=ReplacingMergeTree. In order to specify the engine parameters and the engine options, you can use as many engine_* request parameters as needed.

Engine parameters and options
Engine options ------------------------
                                      ↓
                                -------------
ReplacingMergeTree(insert_date) ORDER BY (pk)
                   -----------
                        ↑
Engine parameters -------

Find the different parameters for each engine in the following table:

Engine parameters

  • ReplacingMergeTree([ver]): engine_ver (optional, the column with the version).

  • SummingMergeTree([columns]): engine_columns (optional, the names of the columns where values will be summarized).

  • CollapsingMergeTree(sign): engine_sign (name of the column for computing the state).

  • VersionedCollapsingMergeTree(sign, version): engine_sign (name of the column for computing the state) and engine_version (name of the column with the version of the object state).

The engine options, in particular the MergeTree engine options, match ClickHouse terminology: engine_partition_key, engine_sorting_key, engine_primary_key, engine_sampling_key, engine_ttl and engine_settings. Check the ClickHouse documentation for a detailed explanation of the different engine options available.

If engine_partition_key is left empty or not passed as a parameter, the underlying Data Source will not have any partition unless there is a Date column, in which case the Data Source will be partitioned by year. If you want to create a Data Source with no partitions, just send engine_partition_key=tuple()
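
For instance, a sketch of creating an unpartitioned Data Source (the name and schema are illustrative):

Creating a Data Source with no partitions
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d "name=events_unpartitioned" \
-d "schema=date DateTime, event String" \
-d "engine=MergeTree" \
-d "engine_sorting_key=date" \
--data-urlencode "engine_partition_key=tuple()"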

Creating a Data Source with a ReplacingMergeTree engine
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d "schema=pk UInt64, insert_date Date, close Float32" \
-d "engine=ReplacingMergeTree" \
-d "engine_sorting_key=pk" \
-d "engine_ver=insert_date" \
-d "name=test123"

Replacing data

This feature is currently in beta and may change in the future.

You can completely, or selectively, replace the content of a Data Source using the mode=replace parameter. By default, if you do not specify any condition, all the content of the Data Source will be destroyed and replaced by the new content you provide. The operation is atomic.

Replacing a Data Source from a URL
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d mode='replace' \
-d name='data_source_name' \
--data-urlencode "url=http://example.com/file.csv"

Alternatively, you might want to replace just certain rows of your Data Source. In that case, you have to use the replace_condition parameter as well. This parameter defines the filter that will be applied: all existing rows matching the condition will be deleted before ingesting the new file, and only the rows of the new file that match the condition will be ingested. Note that if the source file contains rows that do not match the condition, those rows will not be ingested. This operation is atomic.

Selective replacing rows on a Data Source from a URL
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d mode='replace' \
-d name='data_source_name' \
-d replace_condition='replacement_condition' \
--data-urlencode "url=http://example.com/file.csv"

The replacements are made by partition, so it is mandatory that the replace_condition references the partition field.
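
For example, assuming a Data Source partitioned by a date column, a sketch of replacing a single month of data could look like this (the condition is illustrative):

Selectively replacing one month of data
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d mode='replace' \
-d name='data_source_name' \
--data-urlencode "replace_condition=date >= '2020-01-01' and date < '2020-02-01'" \
--data-urlencode "url=http://example.com/file.csv"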

It is important to understand that all the dependencies of the Data Source (Materialized Views) will be recomputed, so your data will be consistent after the replacement. If you have n-level dependencies, they will also be updated by this operation. Taking the example A -> B -> C, if you replace data in A, Data Sources B and C will automatically be updated accordingly. Take into account that the Partition Key of Data Source C must also be compatible with Data Source A.

You can find examples of how to selectively replace data in our guide.

Important notes

Replacements are atomic, but we cannot assure data consistency if you continue appending data to any related Data Source while the replacement takes place; the new incoming data will be discarded. This is an important limitation, and we are working to provide a more flexible alternative.

Handling errors

When ingesting data using the create, append, and replace operations, it's possible to encounter errors, for example, when trying to ingest a file via a URL that doesn't exist. While in most cases we return an HTTP error, the following situation may also occur: as explained before, it's possible that while importing a file, some data is appended and some data goes to quarantine.

In this case, we return a 200 HTTP status code. In addition, the response includes an error message, along with the invalid_lines and the quarantine_rows:

Successful ingestion with errors
{
    "import_id": "e9ae235f-f139-43a6-7ad5-a1e17c0071c2",
    "datasource": {
        "id": "t_0ab7a11969fa4f67985cec481f71a5c2",
        "name": "your_datasource_name",
        "cluster": null,
        "tags": {},
        "created_at": "2019-03-12 17:45:04",
        "updated_at": "2019-03-12 17:45:04",
        "statistics": {
            "bytes": 1397,
            "row_count": 4
        },
        "replicated": false,
        "version": 0,
        "project": null,
        "used_by": []
    },
    "error": "There was an error with file contents: 2 rows in quarantine and 2 invalid lines",
    "quarantine_rows": 2,
    "invalid_lines": 2
}
Request parameters

  • mode (String): Default: create. Other modes: append and replace. The create mode creates a new Data Source and attempts to import the data of the CSV if a URL is provided or the body contains any data. The append mode inserts the new rows provided into an existing Data Source (it will also create it if it does not exist yet). The replace mode will remove the previous Data Source and its data and replace it with the new one; Pipes or queries pointing to this Data Source will immediately start returning data from the new one, without disruption, once the replace operation is complete. The create mode will automatically name the Data Source if no name parameter is provided; for the append and replace modes to work, the name parameter must be provided and the schema must be compatible.

  • name (String): Optional. Name of the Data Source to create, or to append or replace data in. This parameter is mandatory when using the append or replace modes.

  • url (String): Optional. The URL of the CSV with the data to be imported.

  • dialect_delimiter (String): Optional. The one-character string separating the fields. We try to guess the delimiter based on the CSV contents using some statistics, but sometimes we fail to identify the correct delimiter. If you know your CSV’s field delimiter, you can use this parameter to explicitly define it.

  • dialect_escapechar (String): Optional. The escapechar removes any special meaning from the following character. This is useful if the CSV does not use double quotes to encapsulate a column but uses double quotes in the content of a column and escapes them with, e.g., a backslash.

  • schema (String): Optional. Data Source schema in the format ‘column_name Type, column_name_2 Type2…’. When creating a Data Source with format ndjson, the schema must include the JSONPath for each column; see the JSONPaths section for more details.

  • engine (String): Optional. Engine for the underlying data. Requires the schema parameter.

  • engine_* (String): Optional. Engine parameters and options; check the Engines section for more details.

  • progress (String): Default: false. When set to true and sending the data in the request body, Tinybird will return block status while loading, using line-delimited JSON.

  • token (String): Auth token with create or append permissions. Required only if no Bearer Authorization header is found.

  • type_guessing (String): Default: true. When set to false, all columns are created as String; otherwise Tinybird tries to guess the column types based on the CSV contents. This parameter is not taken into account when replacing or appending data to an existing Data Source. Sometimes you are not familiar with the data and the first step is to get familiar with it: by disabling type guessing, you can quickly import everything as strings, explore it with SQL, and cast it to the right type or shape in whatever way you see fit via a Pipe.

  • debug (String): Optional. Enables returning debug information from logs. It can include blocks, block_log and/or hook_log.

  • replace_condition (String): Optional. When used in combination with the replace mode, it allows you to replace a portion of your Data Source that matches the replace_condition SQL statement with the contents of the url or query passed as a parameter. See this guide to learn more.

  • format (String): Default: csv. Indicates the format of the data to be ingested into the Data Source. Specify format=ndjson for NDJSON and format=parquet for Parquet files.

GET /v0/datasources/?
getting a list of your Data Sources
curl \
-H "Authorization: Bearer <DATASOURCES:READ token>" \
-X GET "https://api.tinybird.co/v0/datasources"

Get a list of the Data Sources in your account.

The token you use to query the available Data Sources will determine what Data Sources get returned: only those accessible with the token you are using will be returned in the response.

Successful response
{
    "datasources": [{
        "id": "t_a049eb516ef743d5ba3bbe5e5749433a",
        "name": "your_datasource_name",
        "cluster": "tinybird",
        "tags": {},
        "created_at": "2019-11-13 13:53:05.340975",
        "updated_at": "2022-02-11 13:11:19.464343",
        "replicated": true,
        "version": 0,
        "project": null,
        "headers": {},
        "shared_with": [
            "89496c21-2bfe-4775-a6e8-97f1909c8fff"
        ],
        "engine": {
            "engine": "MergeTree",
            "engine_sorting_key": "example_column_1",
            "engine_partition_key": "",
            "engine_primary_key": "example_column_1"
        },
        "description": "",
        "used_by": [],
        "type": "csv",
        "columns": [{
                "name": "example_column_1",
                "type": "Date",
                "codec": null,
                "default_value": null,
                "jsonpath": null,
                "nullable": false,
                "normalized_name": "example_column_1"
            },
            {
                "name": "example_column_2",
                "type": "String",
                "codec": null,
                "default_value": null,
                "jsonpath": null,
                "nullable": false,
                "normalized_name": "example_column_2"
            }
        ],
        "statistics": {
            "bytes": 77822,
            "row_count": 226188
        },
        "new_columns_detected": {},
        "quarantine_rows": 0
    }]
}
Request parameters

  • attrs (String): Comma-separated list of the Data Source attributes to return in the response. Example: attrs=name,id,engine. Leave empty to return a full response.

Note that the bytes and row_count attributes inside statistics might be null depending on how the Data Source was created.
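
For example, a sketch of requesting only a few attributes per Data Source:

Getting a reduced list of Data Source attributes
curl \
-H "Authorization: Bearer <DATASOURCES:READ token>" \
-X GET "https://api.tinybird.co/v0/datasources?attrs=name,id,engine"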

POST /v0/datasources/(.+)/alter

Modify the Data Source schema.

This endpoint supports altering the following fields of a Data Source:

Request parameters

  • schema (String): Optional. Sets the whole schema; new columns are added after the existing ones of the Data Source.

  • description (String): Optional. Sets the description of the Data Source.

  • kafka_store_raw_value (Boolean): Optional. Default: false. When set to true, the ‘value’ column of a Kafka Data Source will save the JSON as a raw string.

  • ttl (String): Optional. Set to any value accepted in ClickHouse for a TTL, or to ‘false’ to remove the TTL.

  • dry (Boolean): Optional. Default: false. Set to true to show what would be modified in the Data Source, without running any modification at all.

The schema parameter can be used to add new columns at the end of the existing ones in a Data Source.

Be aware that currently we don’t validate whether the change will affect existing MVs (Materialized Views) attached to the Data Source to be modified, so this change may break existing MVs. For example, avoid changing a Data Source that has an MV created with something like SELECT * FROM Data Source .... If you want MVs to be forward compatible with column additions, create them specifying the columns instead of using the * operator.

Also, take into account that, for now, the only engines that support adding new columns are those in the MergeTree family.

To add a column to a Data Source, call this endpoint with the Data Source name and the new schema definition.

For example, having a Data Source created like this:

Creating a Data Source from a schema
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources" \
-d "name=stocks" \
-d "schema=symbol String, date Date, close Float32"

If you want to add a new column ‘concept String’, you need to call this endpoint with the new schema:

Adding a new column to an existing Data Source
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
-d "schema=symbol String, date Date, close Float32, concept String"

If everything went OK, you will get the operations performed in the response:

ADD COLUMN operation resulting from the schema change.
{
    "operations": [
        "ADD COLUMN `concept` String"
    ]
}

You can also view the inferred operations without executing them by adding dry=true to the parameters.
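
For example, a sketch of previewing the same schema change without applying it:

Previewing a schema change with dry=true
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
-d "dry=true" \
-d "schema=symbol String, date Date, close Float32, concept String"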

  • To modify the description of a Data Source:

Modifying the description of a Data Source
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
-d "description=My new description"
  • To save the JSON as a raw string in the “value” column of a Kafka Data Source:

Saving the raw string in the value column of a Kafka Data Source
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
-d "kafka_store_raw_value=true"
  • To modify the TTL of a Data Source:

Modifying the TTL of a Data Source
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
-d "ttl=12 hours"
  • To remove the TTL of a Data Source:

Removing the TTL of a Data Source
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
-d "ttl=false"

You can also alter Data Sources of type ndjson/parquet; in that case, you have to specify the JSONPaths in the schema in the same way as when you created the Data Source.

POST /v0/datasources/(.+)/truncate

Truncates a Data Source in your account. If the Data Source has dependent Materialized Views, they won't be truncated in cascade; if you want to delete data from dependent Materialized Views, you'll have to make a subsequent call to this method for each of them. The Auth token in use must have the DATASOURCES:CREATE scope.

Truncating a Data Source
curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    -X POST "https://api.tinybird.co/v0/datasources/name/truncate"

This works as well for the quarantine table of a Data Source. Remember that the quarantine table for a Data Source has the same name but with the “_quarantine” suffix.

Truncating the quarantine table from a Data Source
curl \
    -H "Authorization: Bearer <DATASOURCES:DROP token>" \
    -X POST "https://api.tinybird.co/v0/datasources/:name_quarantine/truncate"
POST /v0/datasources/(.+)/delete

Deletes rows from a Data Source in your account given a SQL condition. Auth token in use must have the DATASOURCES:CREATE scope.

Deleting rows from a Data Source given a SQL condition
curl \
    -H "Authorization: Bearer <DATASOURCES:CREATE token>" \
    --data "delete_condition=(country='ES')" \
    "https://api.tinybird.co/v0/datasources/:name/delete"

When deleting rows from a Data Source, the response will not be the final result of the deletion but a Job. You can check the job status and progress using the Jobs API. In the response, id, job_id, and delete_id should have the same value:

Job created and waiting to start
{
    "id": "job_id",
    "job_id": "job_id",
    "delete_id": "job_id",
    "job_url": "http://api.tinybird.co/v0/jobs/job_id",
    "job": {
        "kind": "delete_data",
        "...": "Full job representation as described below"
    },
    "datasource": {
        "id": "t_0ab7a11969fa4f67985cec481f71a5c2",
        "name": "your_datasource_name"
    },
    "status": "waiting"
}

Job status can be one of the following:

  • waiting: The initial status of a job. When created, a job has to wait if there are other jobs running

  • working: Once the job operation has started

  • done: The job has finished successfully

  • error: The job has finished with an error

Job has finished successfully
{
    "id": "c8ae13ef-e739-40b6-8bd5-b1e07c8671c2",
    "job_id": "c8ae13ef-e739-40b6-8bd5-b1e07c8671c2",
    "kind": "delete_data",
    "status": "done",
    "delete_condition": "column_01=1",
    "datasource": {
        "id": "t_0ab7a11969fa4f67985cec481f71a5c2",
        "name": "your_datasource_name"
    }
}

If there has been an error in the delete operation, the job response will also include a detailed error:

Job has finished with errors
{
    "id": "1f6a5a3d-cfcb-4244-ba0b-0bfa1d1752fb",
    "job_id": "1f6a5a3d-cfcb-4244-ba0b-0bfa1d1752fb",
    "kind": "delete_data",
    "status": "error",
    "delete_condition": "column_01=1",
    "datasource": {
        "id": "t_0ab7a11969fa4f67985cec481f71a5c2",
        "name": "your_datasource_name"
    },
    "error": "[Error] Missing columns: 'random_column' while processing query"
}

Data Source engines supported

Tinybird uses ClickHouse as the underlying storage technology. ClickHouse features different strategies to store data; these strategies define not only where and how the data is stored, but also what kind of data access, queries, and availability your data has. In ClickHouse terms, a Tinybird Data Source uses a Table Engine that determines those factors.

Currently, Tinybird supports deleting data for data sources with the following Engines:

  • MergeTree

  • ReplacingMergeTree

  • SummingMergeTree

  • AggregatingMergeTree

  • CollapsingMergeTree

  • VersionedCollapsingMergeTree

Dependent views deletion

If the Data Source has dependent Materialized Views, rows won't be deleted from them in cascade. If you want to delete data from dependent Materialized Views, you'll have to make a subsequent call to this method for each affected view with a proper delete_condition. This also applies to the associated quarantine Data Source.

Request parameters

  • delete_condition (String): Mandatory. A string representing the WHERE SQL clause you’d add to a regular DELETE FROM <table> WHERE <delete_condition> statement. Most of the time you will want to write a simple delete_condition such as column_name=value, but any valid SQL expression, including conditional operators, is valid.

GET /v0/datasources/(.+)
Getting information about a particular Data Source
curl \
-H "Authorization: Bearer <DATASOURCES:READ token>" \
-X GET "https://api.tinybird.co/v0/datasources/datasource_name"

Get Data Source information and stats. The token provided must have read access to the Data Source.

Successful response
{
    "id": "t_bd1c62b5e67142bd9bf9a7f113a2b6ea",
    "name": "datasource_name",
    "statistics": {
        "bytes": 430833,
        "row_count": 3980
    },
    "used_by": [{
        "id": "t_efdc62b5e67142bd9bf9a7f113a34353",
        "name": "pipe_using_datasource_name"
    }],
    "updated_at": "2018-09-07 23:50:32.322461",
    "created_at": "2018-11-28 23:50:32.322461",
    "type": "csv"
}
Request parameters

  • attrs (String): Comma-separated list of the Data Source attributes to return in the response. Example: attrs=name,id,engine. Leave empty to return a full response.

id and name are two ways to refer to the Data Source in SQL queries and API endpoints. The only difference is that the id never changes; it will keep working even if you change the name (which is the name used to display the Data Source in the UI). In general, you can use id or name interchangeably:

Using the above response as an example:

select count(1) from datasource_name

is equivalent to

select count(1) from t_bd1c62b5e67142bd9bf9a7f113a2b6ea

The id t_bd1c62b5e67142bd9bf9a7f113a2b6ea is not a descriptive name, so you can add a description like t_my_events_datasource.bd1c62b5e67142bd9bf9a7f113a2b6ea

The statistics property contains information about the table. Those numbers are an estimation: bytes is the estimated data size on disk and row_count the estimated number of rows. These statistics are updated whenever data is appended to the Data Source.

The used_by property contains the list of Pipes that are using this Data Source. Only the Pipe id and name are returned.

The type property indicates the format used when the Data Source was created. Available formats are csv, ndjson, and parquet. The Data Source type indicates what file format you can use to ingest data.

DELETE /v0/datasources/(.+)
Dropping a Data Source
curl \
-H "Authorization: Bearer <DATASOURCES:DROP token>" \
-X DELETE "https://api.tinybird.co/v0/datasources/:name"

Drops a Data Source from your account.

Request parameters

  • force (String): Default: false. The force parameter is taken into account when trying to delete Materialized Views. By default (false), the deletion will not be carried out; you can enable it by setting it to true. If the given Data Source is being used as the trigger of a Materialized Node, it will not be deleted in any case.

  • dry_run (String): Default: false. Allows you to test the deletion. When set to true, it will execute all deletion validations and return the possibly affected materializations and other dependencies of the given Data Source.

  • token (String): Auth token. Only required if no Bearer Authorization header is sent. It must have the DROP:datasource_name scope for the given Data Source.
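
For example, a sketch of checking which dependencies a drop would affect before actually dropping the Data Source:

Testing a Data Source deletion with dry_run
curl \
-H "Authorization: Bearer <DATASOURCES:DROP token>" \
-X DELETE "https://api.tinybird.co/v0/datasources/:name?dry_run=true"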

PUT /v0/datasources/(.+)

Update Data Source attributes

Updating the name of a Data Source
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X PUT "https://api.tinybird.co/v0/datasources/:name?name=new_name"
Request parameters

  • name (String): New name for the Data Source.

  • token (String): Auth token. Only required if no Bearer Authorization header is sent. It should have the DATASOURCES:CREATE scope for the given Data Source.

POST /v0/analyze/?

The Analyze API takes a sample of a supported file (csv, ndjson, parquet) and guesses the file format, schema, columns, types, nullables and JSONPaths (in the case of NDJSON/Parquet files).

This is a helper endpoint to create Data Sources without having to write the schema manually.

Take into account that Tinybird’s guessing algorithm is not deterministic, since it takes a random portion of the file passed to the endpoint; that means it can guess different types or nullables depending on the sample analyzed. We recommend double-checking the guessed schema in case you need to make manual adjustments.

Analyze a local file
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "https://api.tinybird.co/v0/analyze" \
-F "file=@path_to_local_file"
Analyze a remote file
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-G -X POST "https://api.tinybird.co/v0/analyze" \
--data-urlencode "url=https://example.com/file"
Analyze response
{
    "analysis": {
        "columns": [
            {
                "path": "$.a_nested_array.nested_array[:]",
                "recommended_type": "Array(Int16)",
                "present_pct": 3,
                "name": "a_nested_array_nested_array"
            },
            {
                "path": "$.an_array[:]",
                "recommended_type": "Array(Int16)",
                "present_pct": 3,
                "name": "an_array"
            },
            {
                "path": "$.field",
                "recommended_type": "String",
                "present_pct": 1,
                "name": "field"
            },
            {
                "path": "$.nested.nested_field",
                "recommended_type": "String",
                "present_pct": 1,
                "name": "nested_nested_field"
            }
        ],
        "schema": "a_nested_array_nested_array Array(Int16) `json:$.a_nested_array.nested_array[:]`, an_array Array(Int16) `json:$.an_array[:]`, field String `json:$.field`, nested_nested_field String `json:$.nested.nested_field`"
    },
    "preview": {
        "meta": [
            {
                "name": "a_nested_array_nested_array",
                "type": "Array(Int16)"
            },
            {
                "name": "an_array",
                "type": "Array(Int16)"
            },
            {
                "name": "field",
                "type": "String"
            },
            {
                "name": "nested_nested_field",
                "type": "String"
            }
        ],
        "data": [
            {
                "a_nested_array_nested_array": [
                    1,
                    2,
                    3
                ],
                "an_array": [
                    1,
                    2,
                    3
                ],
                "field": "test",
                "nested_nested_field": "bla"
            }
        ],
        "rows": 1,
        "statistics": {
            "elapsed": 0.000310539,
            "rows_read": 2,
            "bytes_read": 142
        }
    }
}

The columns attribute contains the guessed columns and for each one:

  • path: The JSONPath syntax in the case of NDJSON/Parquet files

  • recommended_type: The guessed database type

  • present_pct: If the value is lower than 1, then there were nulls in the sample used for guessing

  • name: The recommended column name

The schema attribute is ready to be used in the Data Sources API

The preview contains up to 10 rows of the content of the file.

POST /v0/events

You can send individual events by just sending the JSON event in the request body.

Send NDJSON data at high frequency to Tinybird.
curl \
-H "Authorization: Bearer <import_token>" \
-d '{"date": "2020-04-05 00:05:38", "city": "Chicago"}' \
'https://api.tinybird.co/v0/events?name=events_test'

You can also send more than one event per request by using \n to separate them. This is far more efficient than sending one request per event: if you can batch events, ingestion will be faster.

Send many NDJSON events. Notice the $ before the JSON events: it's needed for Bash to interpret the \n; curl doesn't do it automatically.
curl \
-H "Authorization: Bearer <import_token>" \
-d $'{"date": "2020-04-05 00:05:38", "city": "Chicago"}\n{"date": "2020-04-05 00:07:22", "city": "Madrid"}\n' \
'https://api.tinybird.co/v0/events?name=events_test'

Appends several events in NDJSON format to a Data Source. If the Data Source doesn't exist, it is created with a guessed schema. Requests to this endpoint are not restricted by regular rate limits.

Request parameters

  • name (String): Name of the target Data Source to append data to.

  • wait (Boolean): Default: false. Set to true to wait until the write is acknowledged.
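
For example, a sketch of sending an event and waiting for the write to be acknowledged:

Sending an event and waiting for the acknowledgment
curl \
-H "Authorization: Bearer <import_token>" \
-d '{"date": "2020-04-05 00:05:38", "city": "Chicago"}' \
'https://api.tinybird.co/v0/events?name=events_test&wait=true'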