Data Sources API ¶
The Data Sources API allows you to list, create, update or delete your Tinybird Data Sources. You can also insert or delete data from Data Sources.
New to Data Sources? Read more about them here
All endpoints require authentication using a Token with the appropriate scope.
- POST /v0/datasources/?¶
This endpoint supports 3 modes to enable 3 distinct operations, depending on the parameters provided:
Create a new Data Source with a schema
Append data to an existing Data Source
Replace data in an existing Data Source
The mode is controlled by setting the
mode
parameter, for example,-d "mode=create"
. Each mode has different rate limits.When importing remote files by URL, if the server hosting the remote file supports HTTP Range headers, the import process will be parallelized.
¶ KEY
TYPE
DESCRIPTION
mode
String
Default:
create
. Other modes:append
andreplace
.
Thecreate
mode creates a new Data Source and attempts to import the data of the CSV if a URL is provided or the body contains any data.
Theappend
mode inserts the new rows provided into an existing Data Source (it will also create it if it does not exist yet).
Thereplace
mode will remove the previous Data Source and its data and replace it with the new one; Pipes or queries pointing to this Data Source will immediately start returning data from the new one and without disruption once the replace operation is complete.
Thecreate
mode will automatically name the Data Source if noname
parameter is provided; for theappend
andreplace
modes to work, thename
parameter must be provided and the schema must be compatible.name
String
Optional. Name of the Data Source to create, append or replace data. This parameter is mandatory when using the
append
orreplace
modes.url
String
Optional. The URL of the CSV with the data to be imported
dialect_delimiter
String
Optional. The one-character string separating the fields. We try to guess the delimiter based on the CSV contents using some statistics, but sometimes we fail to identify the correct one. If you know your CSV’s field delimiter, you can use this parameter to explicitly define it.
dialect_new_line
String
Optional. The one- or two-character string separating the records. We try to guess the delimiter based on the CSV contents using some statistics, but sometimes we fail to identify the correct one. If you know your CSV’s record delimiter, you can use this parameter to explicitly define it.
dialect_escapechar
String
Optional. The escapechar removes any special meaning from the following character. This is useful if the CSV does not use double quotes to encapsulate a column but uses double quotes in the content of a column and it is escaped with, e.g. a backslash.
schema
String
Optional. Data Source schema in the format ‘column_name Type, column_name_2 Type2…’. When creating a Data Source with format
ndjson
theschema
must include thejsonpath
for each column, see theJSONPaths
section for more details.engine
String
Optional. Engine for the underlying data. Requires the
schema
parameter.engine_*
String
Optional. Engine parameters and options, check the Engines section for more details
progress
String
Default:
false
. When usingtrue
and sending the data in the request body, Tinybird will return block status while loading using Line-delimited JSON.token
String
Auth token with create or append permissions. Required only if no Bearer Authorization header is found
type_guessing
String
Default:
true
Thetype_guessing
parameter is not taken into account when replacing or appending data to an existing Data Source. When usingfalse
all columns are created asString
otherwise it tries to guess the column types based on the CSV contents. Sometimes you are not familiar with the data and the first step is to get familiar with it: by disabling the type guessing, we enable you to quickly import everything as strings that you can explore with SQL and cast to the right type or shape in whatever way you see fit via a Pipe.debug
String
Optional. Enables returning debug information from logs. It can include
blocks
,block_log
and/orhook_log
replace_condition
String
Optional. When used in combination with the
replace
mode it allows you to replace a portion of your Data Source that matches thereplace_condition
SQL statement with the contents of theurl
or query passed as a parameter. See this guide to learn more.replace_truncate_when_empty
Boolean
Optional. When used in combination with the
replace
mode it allows truncating the Data Source when empty data is provided. Not supported whenreplace_condition
is specifiedformat
String
Default:
csv
. Indicates the format of the data to be ingested in the Data Source. By default iscsv
and you should specifyformat=ndjson
for NDJSON format, andformat=parquet
for Parquet files.Examples
curl \ -H "Authorization: Bearer <DATASOURCES:CREATE token>" \ -X POST "https://api.tinybird.co/v0/datasources" \ -d "name=stocks" \ -d "schema=symbol String, date Date, close Float32"
curl \ -H "Authorization: Bearer <DATASOURCES:CREATE token>" \ -X POST "https://api.tinybird.co/v0/datasources?name=stocks" \ -F csv=@local_file.csv
curl \ -H "Authorization: Bearer <DATASOURCES:CREATE token>" \ -X POST "https://api.tinybird.co/v0/datasources" \ -d "name=stocks" \ -d url='https://.../data.csv'
curl \ -H "Authorization: Bearer <DATASOURCES:CREATE token>" \ -X POST "https://api.tinybird.co/v0/datasources" \ -d "schema=pk UInt64, insert_date Date, close Float32" \ -d "engine=ReplacingMergeTree" \ -d "engine_sorting_key=pk" \ -d "engine_ver=insert_date" \ -d "name=test123" \ -d "engine_settings=index_granularity=2048, ttl_only_drop_parts=false"
curl \ -H "Authorization: Bearer <DATASOURCES:CREATE token>" \ -X POST "https://api.tinybird.co/v0/datasources?name=data_source_name&mode=append" \ -F csv=@local_file.csv
curl \ -H "Authorization: Bearer <DATASOURCES:CREATE token>" \ -X POST "https://api.tinybird.co/v0/datasources" \ -d mode='append' \ -d name='data_source_name' \ -d url='https://.../data.csv'
curl \ -H "Authorization: Bearer <DATASOURCES:CREATE token>" \ -X POST "https://api.tinybird.co/v0/datasources?name=data_source_name&mode=replace" \ -F csv=@local_file.csv
curl \ -H "Authorization: Bearer <DATASOURCES:CREATE token>" \ -X POST "https://api.tinybird.co/v0/datasources" \ -d mode='replace' \ -d name='data_source_name' \ --data-urlencode "url=http://example.com/file.csv"
- GET /v0/datasources/?¶
curl \ -H "Authorization: Bearer <DATASOURCES:READ token>" \ -X GET "https://api.tinybird.co/v0/datasources"
Get a list of the Data Sources in your account.
The token you use to query the available Data Sources will determine what Data Sources get returned: only those accessible with the token you are using will be returned in the response.
{ "datasources": [{ "id": "t_a049eb516ef743d5ba3bbe5e5749433a", "name": "your_datasource_name", "cluster": "tinybird", "tags": {}, "created_at": "2019-11-13 13:53:05.340975", "updated_at": "2022-02-11 13:11:19.464343", "replicated": true, "version": 0, "project": null, "headers": {}, "shared_with": [ "89496c21-2bfe-4775-a6e8-97f1909c8fff" ], "engine": { "engine": "MergeTree", "engine_sorting_key": "example_column_1", "engine_partition_key": "", "engine_primary_key": "example_column_1" }, "description": "", "used_by": [], "type": "csv", "columns": [{ "name": "example_column_1", "type": "Date", "codec": null, "default_value": null, "jsonpath": null, "nullable": false, "normalized_name": "example_column_1" }, { "name": "example_column_2", "type": "String", "codec": null, "default_value": null, "jsonpath": null, "nullable": false, "normalized_name": "example_column_2" } ], "statistics": { "bytes": 77822, "row_count": 226188 }, "new_columns_detected": {}, "quarantine_rows": 0 }] }
¶ Key
Type
Description
attrs
String
comma separated list of the Data Source attributes to return in the response. Example:
attrs=name,id,engine
. Leave empty to return a full responseNote that the
statistics
’sbytes
androw_count
attributes might benull
depending on how the Data Source was created.
- POST /v0/datasources/(.+)/alter¶
Modify the Data Source schema.
This endpoint supports the operation to alter the following fields of a Data Source:
¶ Key
Type
Description
schema
String
Optional. Set the whole schema that adds new columns to the existing ones of a Data Source.
description
String
Optional. Sets the description of the Data Source.
kafka_store_raw_value
Boolean
Optional. Default: false. When set to true, the ‘value’ column of a Kafka Data Source will save the JSON as a raw string.
kafka_store_headers
Boolean
Optional. Default: false. When set to true, the ‘headers’ of a Kafka Data Source will be saved as a binary map.
ttl
String
Optional. Set to any value accepted in ClickHouse for a TTL or to ‘false’ to remove the TTL.
dry
Boolean
Optional. Default: false. Set to true to show what would be modified in the Data Source, without running any modification at all.
The schema parameter can be used to add new columns at the end of the existing ones in a Data Source.
Be aware that currently we don’t validate if the change will affect the existing MVs (Materialized Views) attached to the Data Source to be modified, so this change may break existing MVs. For example, avoid changing a Data Source that has a MV created with something like
SELECT * FROM Data Source ...
. If you want to have forward compatible MVs with column additions, create them especifying the columns instead of using the*
operator.Also, take in account that, for now, the only engines supporting adding new columns are those inside the MergeTree family.
To add a column to a Data Source, call this endpoint with the Data Source name and the new schema definition.
For example, having a Data Source created like this:
curl \ -H "Authorization: Bearer <DATASOURCES:CREATE token>" \ -X POST "https://api.tinybird.co/v0/datasources" \ -d "name=stocks" \ -d "schema=symbol String, date Date, close Float32"
if you want to add a new column ‘concept String’, you need to call this endpoint with the new schema:
curl \ -H "Authorization: Bearer <DATASOURCES:CREATE token>" \ -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \ -d "schema=symbol String, date Date, close Float32, concept String"
If everything went ok, you will get the operations done in the response:
{ "operations": [ "ADD COLUMN `concept` String" ] }
You can also view the inferred operations without executing them adding
dry=true
in the parameters.To modify the description of a Data Source:
curl \ -H "Authorization: Bearer <DATASOURCES:CREATE token>" \ -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \ -d "name=stocks" \ -d "description=My new description"
To save in the “value” column of a Kafka Data Source the JSON as a raw string:
curl \ -H "Authorization: Bearer <DATASOURCES:CREATE token>" \ -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \ -d "name=stocks" \ -d "kafka_store_raw_value=true" -d "kafka_store_headers=true"
To modify the TTL of a Data Source:
curl \ -H "Authorization: Bearer <DATASOURCES:CREATE token>" \ -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \ -d "name=stocks" \ -d "ttl=12 hours"
To remove the TTL of a Data Source:
curl \ -H "Authorization: Bearer <DATASOURCES:CREATE token>" \ -X POST "https://api.tinybird.co/v0/datasources" \ -d "name=stocks" \ -d "ttl=false"
To add default values to the columns of a Data Source:
curl \ -H "Authorization: Bearer <DATASOURCES:CREATE token>" \ -X POST "https://api.tinybird.co/v0/datasources" \ -d "name=stocks" \ -d "schema=symbol String DEFAULT '-', date Date DEFAULT now(), close Float32 DEFAULT 1.1"
To add default values to the columns of a NDJSON Data Source, add the default definition after the jsonpath definition:
curl \ -H "Authorization: Bearer <DATASOURCES:CREATE token>" \ -X POST "https://api.tinybird.co/v0/datasources" \ -d "name=stocks" \ -d "schema=symbol String `json:$.symbol` DEFAULT '-', date Date `json:$.date` DEFAULT now(), close `json:$.close` Float32 DEFAULT 1.1"
To make a column nullable, change the type of the column adding the Nullable type prefix to old one:
curl \ -H "Authorization: Bearer <DATASOURCES:CREATE token>" \ -X POST "https://api.tinybird.co/v0/datasources" \ -d "name=stocks" \ -d "schema=symbol String `json:$.symbol, date Date `json:$.date`, close `json:$.close` Nullable(Float32)"
To drop a column, just remove the column from the schema definition. It will not be possible removing columns that are part of the primary or partition key:
curl \ -H "Authorization: Bearer <DATASOURCES:CREATE token>" \ -X POST "https://api.tinybird.co/v0/datasources" \ -d "name=stocks" \ -d "schema=symbol String `json:$.symbol, date Date `json:$.date`"
You can also alter the JSONPaths of existing Data Sources. In that case you have to specify the JSONPath in the schema in the same way as when you created the Data Source.
- POST /v0/datasources/(.+)/truncate¶
Truncates a Data Source in your account. If the Data Source has dependent Materialized Views, those won’t be truncated in cascade. In case you want to delete data from other dependent Materialized Views, you’ll have to do a subsequent call to this method. Auth token in use must have the
DATASOURCES:CREATE
scope.curl \ -H "Authorization: Bearer <DATASOURCES:CREATE token>" \ -X POST "https://api.tinybird.co/v0/datasources/name/truncate"
This works as well for the
quarantine
table of a Data Source. Remember that the quarantine table for a Data Source has the same name but with the “_quarantine” suffix.curl \ -H "Authorization: Bearer <DATASOURCES:DROP token>" \ -X POST "https://api.tinybird.co/v0/datasources/:name_quarantine/truncate"
- POST /v0/datasources/(.+)/delete¶
Deletes rows from a Data Source in your account given a SQL condition. Auth token in use must have the
DATASOURCES:CREATE
scope.curl \ -H "Authorization: Bearer <DATASOURCES:CREATE token>" \ --data "delete_condition=(country='ES')" \ "https://api.tinybird.co/v0/datasources/:name/delete"
When deleting rows from a Data Source, the response will not be the final result of the deletion but a Job. You can check the job status and progress using the Jobs API. In the response,
id
,job_id
, anddelete_id
should have the same value:{ "id": "64e5f541-xxxx-xxxx-xxxx-00524051861b", "job_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b", "job_url": "https://api.tinybird.co/v0/jobs/64e5f541-xxxx-xxxx-xxxx-00524051861b", "job": { "kind": "delete_data", "id": "64e5f541-xxxx-xxxx-xxxx-00524051861b", "job_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b", "status": "waiting", "created_at": "2023-04-11 13:52:32.423207", "updated_at": "2023-04-11 13:52:32.423213", "started_at": null, "is_cancellable": true, "datasource": { "id": "t_c45d5ae6781b41278fcee365f5bxxxxx", "name": "shopping_data" }, "delete_condition": "event = 'search'" }, "status": "waiting", "delete_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b" }
To check on the progress of the delete job, use the
job_id
from the Delete API response to query the Jobs API.For example, to check on the status of the above delete job:
curl \ -H "Authorization: Bearer <TOKEN>" \ https://api.tinybird.co/v0/jobs/64e5f541-xxxx-xxxx-xxxx-00524051861b
Would respond with:
{ "kind": "delete_data", "id": "64e5f541-xxxx-xxxx-xxxx-00524051861b", "job_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b", "status": "done", "created_at": "2023-04-11 13:52:32.423207", "updated_at": "2023-04-11 13:52:37.330020", "started_at": "2023-04-11 13:52:32.842861", "is_cancellable": false, "datasource": { "id": "t_c45d5ae6781b41278fcee365f5bc2d35", "name": "shopping_data" }, "delete_condition": " event = 'search'", "rows_affected": 100 }
Data Source engines supported
Tinybird uses ClickHouse as the underlying storage technology. ClickHouse features different strategies to store data, these different strategies define not only where and how the data is stored but what kind of data access, queries, and availability your data has. In ClickHouse terms, a Tinybird Data Source uses a Table Engine that determines those factors.
Currently, Tinybird supports deleting data for data sources with the following Engines:
MergeTree
ReplacingMergeTree
SummingMergeTree
AggregatingMergeTree
CollapsingMergeTree
VersionedCollapsingMergeTree
Dependent views deletion
If the Data Source has dependent Materialized Views, those won’t be cascade deleted. In case you want to delete data from other dependent Materialized Views, you’ll have to do a subsequent call to this method for the affected view with a proper
delete_condition
. This applies as well to the associatedquarantine
Data Source.¶ KEY
TYPE
DESCRIPTION
delete_condition
String
Mandatory. A string representing the WHERE SQL clause you’d add to a regular DELETE FROM <table> WHERE <delete_condition> statement. Most of the times you might want to write a simple
delete_condition
such ascolumn_name=value
but any valid SQL statement including conditional operators is validdry_run
String
Default:
false
. It allows you to test the deletion. When usingtrue
it will execute all deletion validations and return number of matchedrows_to_be_deleted
.
- GET /v0/datasources/(.+)¶
curl \ -H "Authorization: Bearer <DATASOURCES:READ token>" \ -X GET "https://api.tinybird.co/v0/datasources/datasource_name"
Get Data Source information and stats. The token provided must have read access to the Data Source.
{ "id": "t_bd1c62b5e67142bd9bf9a7f113a2b6ea", "name": "datasource_name", "statistics": { "bytes": 430833, "row_count": 3980 }, "used_by": [{ "id": "t_efdc62b5e67142bd9bf9a7f113a34353", "name": "pipe_using_datasource_name" }] "updated_at": "2018-09-07 23:50:32.322461", "created_at": "2018-11-28 23:50:32.322461", "type": "csv" }
¶ Key
Type
Description
attrs
String
comma separated list of the Data Source attributes to return in the response. Example:
attrs=name,id,engine
. Leave empty to return a full responseid
andname
are two ways to refer to the Data Source in SQL queries and API endpoints. The only difference is that theid
never changes; it will work even if you change thename
(which is the name used to display the Data Source in the UI). In general you can useid
orname
indistinctively:Using the above response as an example:
select count(1) from events_table
is equivalent to
select count(1) from t_bd1c62b5e67142bd9bf9a7f113a2b6ea
The id
t_bd1c62b5e67142bd9bf9a7f113a2b6ea
is not a descriptive name so you can add a description liket_my_events_datasource.bd1c62b5e67142bd9bf9a7f113a2b6ea
The
statistics
property contains information about the table. Those numbers are an estimation:bytes
is the estimated data size on disk androw_count
the estimated number of rows. These statistics are updated whenever data is appended to the Data Source.The
used_by
property contains the list of pipes that are using this data source. Only Pipeid
andname
are sent.The
type
property indicates theformat
used when the Data Source was created. Available formats arecsv
,ndjson
, andparquet
. The Data Sourcetype
indicates what file format you can use to ingest data.
- DELETE /v0/datasources/(.+)¶
curl \ -H "Authorization: Bearer <DATASOURCES:DROP token>" \ -X DELETE "https://api.tinybird.co/v0/datasources/:name"
Drops a Data Source from your account.
¶ Key
Type
Description
force
String
Default:
false
. Theforce
parameter is taken into account when trying to delete Materialized Views. By default, when usingfalse
the deletion will not be carried out; you can enable it by setting it totrue
. If the given Data Source is being used as the trigger of a Materialized Node, it will not be deleted in any case.dry_run
String
Default:
false
. It allows you to test the deletion. When usingtrue
it will execute all deletion validations and return the possible affected materializations and other dependencies of a given Data Source.token
String
Auth token. Only required if no Bearer Authorization header is sent. It must have
DROP:datasource_name
scope for the given Data Source.
- PUT /v0/datasources/(.+)¶
Update Data Source attributes
curl \ -H "Authorization: Bearer <DATASOURCES:CREATE token>" \ -X PUT "https://api.tinybird.co/v0/datasources/:name?name=new_name"
curl \ -H "Authorization: Bearer <DATASOURCES:CREATE token>" \ -X PUT "https://api.tinybird.co/v0/datasources/:name" \ -d "connector=1d8232bf-2254-4d68-beff-4dd9aa505ab0" \ -d "service=snowflake" \ -d "cron=*/30 * * * *" \ -d "query=select a, b, c from test" \ -d "mode=replace" \ -d "external_data_source=database.schema.table" \ -d "ingest_now=True" \
¶ Key
Type
Description
name
String
new name for the Data Source
token
String
Auth token. Only required if no Bearer Authorization header is sent. It should have
DATASOURCES:CREATE
scope for the given Data Sourceconnector
String
Connector ID to link it to
service
String
Type of service to promote it to. Only ‘snowflake’ or ‘bigquery’ allowed
cron
String
Cron-like pattern to execute the connector’s job
query
String
Optional: custom query to collect from the external data source
mode
String
Only replace is allowed for connectors
external_data_source
String
External data source to use for Snowflake
ingest_now
Boolean
To ingest the data immediately instead of waiting for the first execution determined by cron