# API Overview

You can control Tinybird services using the API as an alternative to the UI and CLI. The following APIs are available:

| API name | Description |
| --- | --- |
| [Analyze API](https://www.tinybird.co/docs/docs/api-reference/analyze-api) | Analyze a given NDJSON, CSV, or Parquet file to generate a Tinybird Data Source schema. |
| [Data Sources API](https://www.tinybird.co/docs/docs/api-reference/datasource-api) | List, create, update, or delete your Tinybird Data Sources, and insert or delete data from Data Sources. |
| [Events API](https://www.tinybird.co/docs/docs/api-reference/events-api) | Ingest NDJSON events with a simple HTTP POST request. |
| [Jobs API](https://www.tinybird.co/docs/docs/api-reference/jobs-api) | Get details on Tinybird jobs, and list the jobs for the last 48 hours or the last 100 jobs. |
| [Pipes API](https://www.tinybird.co/docs/docs/api-reference/pipe-api) | Interact with your Pipes, including Pipes themselves, API Endpoints, Materialized Views, and managing Copy jobs. |
| [Query API](https://www.tinybird.co/docs/docs/api-reference/query-api) | Query your Pipes and Data Sources inside Tinybird as if you were running SQL statements against a regular database. |
| [Environment Variables API](https://www.tinybird.co/docs/docs/api-reference/environment-variables-api) | Create, update, delete, and list variables that can be used in Pipes in a Workspace. |
| [Sink Pipes API](https://www.tinybird.co/docs/docs/api-reference/sink-pipes-api) | Create, delete, schedule, and trigger Sink Pipes. |
| [Tokens API](https://www.tinybird.co/docs/docs/api-reference/token-api) | List, create, update, or delete your Tinybird Static Tokens. |

Make all requests to Tinybird's API Endpoints over TLS (HTTPS). All response bodies, including errors, are encoded as JSON.

You can get information on several Workspace operations by [monitoring your jobs](https://www.tinybird.co/docs/docs/monitoring/jobs), using either the APIs or the built-in Tinybird Service Data Sources.

## Regions and endpoints

A Workspace belongs to one region. The API for each region has a specific API base URL that you use to make API requests. The following table lists the current regions and their corresponding API base URLs:

### Current Tinybird regions

| Region | Provider | Provider region | API base URL |
| --- | --- | --- | --- |
| Europe | GCP | europe-west3 | [https://api.tinybird.co](https://api.tinybird.co/) |
| US East | GCP | us-east4 | [https://api.us-east.tinybird.co](https://api.us-east.tinybird.co/) |
| Europe | AWS | eu-central-1 | [https://api.eu-central-1.aws.tinybird.co](https://api.eu-central-1.aws.tinybird.co/) |
| Europe | AWS | eu-west-1 | [https://api.eu-west-1.aws.tinybird.co](https://api.eu-west-1.aws.tinybird.co/) |
| US East | AWS | us-east-1 | [https://api.us-east.aws.tinybird.co](https://api.us-east.aws.tinybird.co/) |
| US West | AWS | us-west-2 | [https://api.us-west-2.aws.tinybird.co](https://api.us-west-2.aws.tinybird.co/) |
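
The base URL is the only thing that changes between regions. As a minimal sketch (the `$TOKEN` value and the `my_data_source` Data Source are placeholders), the same Query API call against two regions differs only in the host:

```bash
# Same query against two different regions; only the API base URL changes.
# $TOKEN and my_data_source are placeholders for your Token and Data Source.
curl -H "Authorization: Bearer $TOKEN" \
  "https://api.tinybird.co/v0/sql?q=SELECT+count()+FROM+my_data_source"

curl -H "Authorization: Bearer $TOKEN" \
  "https://api.us-east.aws.tinybird.co/v0/sql?q=SELECT+count()+FROM+my_data_source"
```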
## Authentication

Tinybird uses Tokens for every API call. This ensures that each user or application can only access data that they are authorized to access. See [Tokens](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens).

You must make all API requests over HTTPS. Don't make calls over plain HTTP or send API requests without authentication. Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace.

There are two ways to authenticate your requests in the Tinybird API: using an authorization header, or using a URL parameter.

### Authorization header

You can send a Bearer authorization header to authenticate API calls. With cURL, use `-H "Authorization: Bearer <TOKEN>"`. If you have a valid Token with read access to the particular Data Source, you can get a successful response by sending the following request:

##### Authenticated request using the Authorization header

```bash
curl \
  -X GET \
  -H "Authorization: Bearer <TOKEN>" \
  "https://api.tinybird.co/v0/sql?q=SELECT+*+FROM+<data_source_name>"
```

### URL parameter

You can also specify the Token using the `token=<TOKEN>` parameter in the URL. For example:

##### Authenticated request using a URL parameter

```bash
curl -X GET \
  "https://api.tinybird.co/v0/sql?q=SELECT+*+FROM+<data_source_name>&token=<TOKEN>"
```

## Compression

To compress API responses, add `Accept-Encoding: gzip` to your requests. For example:

##### Request with compressed response

```bash
curl \
  -X GET \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Accept-Encoding: gzip" \
  "https://api.tinybird.co/v0/sql?q=SELECT+*+FROM+<data_source_name>"
```

## Errors

Tinybird's API returns standard HTTP success or error status codes. In case of errors, responses include additional information in JSON format. The following table lists the error status codes.

### Response codes

| Code | Description |
| --- | --- |
| 400 | Bad request. This could be due to a missing parameter in a request, for instance |
| 403 | Forbidden. Provided auth token doesn't have the right scope or the Data Source isn't available |
| 404 | Not found |
| 405 | HTTP Method not allowed |
| 408 | Request timeout (e.g. query execution time was exceeded) |
| 409 | You need to resubmit the request due to a conflict with the current state of the target source (e.g. you need to delete a Materialized View) |
| 411 | No valid Content-Length header containing the length of the message-body |
| 413 | The message body is too large |
| 429 | Too many requests. When over the rate limits of your account |
| 500 | Unexpected error |

## Limits

Check the [limits page](https://www.tinybird.co/docs/docs/get-started/plans/limits) for limits on ingestion, queries, API Endpoints, and more.

## Versioning

All Tinybird APIs are versioned with a version string specified in the base URL. Always use the latest API available. When versioning services, Tinybird adheres to [semantic versioning](https://semver.org/) rules.

## Reserved words

The following keywords are reserved. You can't use them to name Data Sources, Pipes, nodes, or Workspaces. Case is ignored.
- `Array`, `Boolean`, `Date`, `Date32`, `DateTime`, `DateTime32`, `DateTime64`, `Decimal`, `Decimal128`, `Decimal256`, `Decimal32`, `Decimal64`, `Enum`, `Enum16`, `Enum8`, `FixedString`, `Float32`, `Float64`, `IPv4`, `IPv6`, `Int128`, `Int16`, `Int256`, `Int32`, `Int64`, `Int8`, `MultiPolygon`, `Point`, `Polygon`, `Ring`, `String`, `TABLE`, `UInt128`, `UInt16`, `UInt256`, `UInt32`, `UInt64`, `UInt8`, `UUID`, `_temporary_and_external_tables`
- `add`, `after`, `all`, `and`, `anti`, `any`, `array`, `as`, `asc`, `asof`, `between`, `by`, `case`, `collate`, `column`, `columns`, `cross`, `cube`, `custom_error`, `day_diff`, `default`, `defined`, `desc`, `distinct`, `else`, `end`, `enumerate_with_last`, `error`, `exists`, `from`, `full`, `functions`, `generateRandom`, `global`, `group`, `having`, `if`, `ilike`, `in`, `inner`, `insert`, `interval`, `into`, `join`, `left`, `like`, `limit`, `limits`, `max`, `min`, `not`, `null`, `numbers_mt`, `on`, `one`, `or`, `order`, `outer`, `prewhere`, `public`, `right`, `sample`, `select`, `semi`, `split_to_array`, `sql_and`, `sql_unescape`, `system`, `table`, `then`, `tinybird`, `to`, `union`, `using`, `where`, `with`, `zeros_mt`

Pipe, Data Source, and node names are globally unique. You can't use an alias for a column that matches a globally unique name.

# Analyze API

The Analyze API lets you analyze a given NDJSON, CSV, or Parquet file to generate a Tinybird Data Source schema.

## POST /v0/analyze/?

The Analyze API takes a sample of a supported file (`csv`, `ndjson`, `parquet`) and guesses the file format, schema, columns, types, nullables, and JSONPaths (in the case of NDJSON/Parquet files). This is a helper endpoint to create Data Sources without having to write the schema manually.

Tinybird's guessing algorithm isn't deterministic: it analyzes a random portion of the file passed to the endpoint, so it can guess different types or nullables depending on the sample. Double-check the guessed schema in case you need to make manual adjustments.
##### Analyze a local file

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X POST "https://api.tinybird.co/v0/analyze" \
  -F "file=@path_to_local_file"
```

##### Analyze a remote file

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -G -X POST "https://api.tinybird.co/v0/analyze" \
  --data-urlencode "url=https://example.com/file"
```

##### Analyze response

```json
{
  "analysis": {
    "columns": [
      { "path": "$.a_nested_array.nested_array[:]", "recommended_type": "Array(Int16)", "present_pct": 3, "name": "a_nested_array_nested_array" },
      { "path": "$.an_array[:]", "recommended_type": "Array(Int16)", "present_pct": 3, "name": "an_array" },
      { "path": "$.field", "recommended_type": "String", "present_pct": 1, "name": "field" },
      { "path": "$.nested.nested_field", "recommended_type": "String", "present_pct": 1, "name": "nested_nested_field" }
    ],
    "schema": "a_nested_array_nested_array Array(Int16) `json:$.a_nested_array.nested_array[:]`, an_array Array(Int16) `json:$.an_array[:]`, field String `json:$.field`, nested_nested_field String `json:$.nested.nested_field`"
  },
  "preview": {
    "meta": [
      { "name": "a_nested_array_nested_array", "type": "Array(Int16)" },
      { "name": "an_array", "type": "Array(Int16)" },
      { "name": "field", "type": "String" },
      { "name": "nested_nested_field", "type": "String" }
    ],
    "data": [
      { "a_nested_array_nested_array": [1, 2, 3], "an_array": [1, 2, 3], "field": "test", "nested_nested_field": "bla" }
    ],
    "rows": 1,
    "statistics": { "elapsed": 0.000310539, "rows_read": 2, "bytes_read": 142 }
  }
}
```

The `columns` attribute contains the guessed columns, and for each one:

- `path`: The JSONPath syntax, in the case of NDJSON/Parquet files
- `recommended_type`: The guessed database type
- `present_pct`: If the value is lower than 1, there were null values in the sample used for guessing
- `name`: The recommended column name

The `schema` attribute is ready to be used in the [Data Sources API](https://www.tinybird.co/docs/docs/api-reference/datasource-api).

The `preview` contains up to 10 rows of the content of the file.
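
As a workflow sketch that isn't part of the reference above: you can feed the guessed `schema` straight into the Data Sources API described next. This assumes `jq` is installed, and that `$TOKEN` and the local `events.ndjson` file are placeholders:

```bash
# Analyze a local NDJSON file, then create a Data Source from the guessed schema.
# Review the guessed schema before relying on it: the guess isn't deterministic.
set -euo pipefail

HOST="https://api.tinybird.co"

# 1. Let the Analyze API guess the schema and extract it with jq.
SCHEMA=$(curl -s \
  -H "Authorization: Bearer $TOKEN" \
  -X POST "$HOST/v0/analyze" \
  -F "file=@events.ndjson" | jq -r '.analysis.schema')

# 2. Create the Data Source using that schema (see the Data Sources API below).
curl \
  -H "Authorization: Bearer $TOKEN" \
  -X POST "$HOST/v0/datasources" \
  -d "name=events_guessed" \
  -d "format=ndjson" \
  --data-urlencode "schema=$SCHEMA"
```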
# Data Sources API

The Data Sources API enables you to create and manage your Data Sources, and to import data into them.

## POST /v0/datasources/?

This endpoint supports three modes that enable three distinct operations, depending on the parameters provided:

- Create a new Data Source with a schema
- Append data to an existing Data Source
- Replace data in an existing Data Source

The mode is controlled by setting the `mode` parameter, for example, `-d "mode=create"`. Each mode has different [rate limits](https://www.tinybird.co/docs/docs/api-reference/overview#limits).

When importing remote files by URL, if the server hosting the remote file supports HTTP Range headers, the import process will be parallelized.

| KEY | TYPE | DESCRIPTION |
| --- | --- | --- |
| mode | String | Default: `create`. Other modes: `append` and `replace`. The `create` mode creates a new Data Source and attempts to import the data of the CSV if a URL is provided or the body contains any data. The `append` mode inserts the new rows provided into an existing Data Source (it will also create it if it doesn't exist yet). The `replace` mode removes the previous Data Source and its data and replaces it with the new one; Pipes or queries pointing to this Data Source will immediately start returning data from the new one, without disruption, once the replace operation is complete. The `create` mode will automatically name the Data Source if no `name` parameter is provided; for the `append` and `replace` modes to work, the `name` parameter must be provided and the schema must be compatible. |
| name | String | Optional. Name of the Data Source to create, append to, or replace data in. This parameter is mandatory when using the `append` or `replace` modes. |
| url | String | Optional. The URL of the CSV with the data to be imported |
| dialect_delimiter | String | Optional. The one-character string separating the fields. Tinybird tries to guess the delimiter based on the CSV contents using some statistics, but sometimes it fails to identify the correct one. If you know your CSV's field delimiter, you can use this parameter to explicitly define it. |
| dialect_new_line | String | Optional. The one- or two-character string separating the records. Tinybird tries to guess the delimiter based on the CSV contents using some statistics, but sometimes it fails to identify the correct one. If you know your CSV's record delimiter, you can use this parameter to explicitly define it. |
| dialect_escapechar | String | Optional. The escapechar removes any special meaning from the following character. This is useful if the CSV doesn't use double quotes to encapsulate a column but uses double quotes in the content of a column and escapes them with, for example, a backslash. |
| schema | String | Optional. Data Source schema in the format 'column_name Type, column_name_2 Type2, ...'. When creating a Data Source with format `ndjson`, the `schema` must include the `jsonpath` for each column; see the `JSONPaths` section for more details. |
| engine | String | Optional. Engine for the underlying data. Requires the `schema` parameter. |
| engine_* | String | Optional. Engine parameters and options; check the [Engines](https://www.tinybird.co/docs/concepts/data-sources.html#supported-engines) section for more details |
| progress | String | Default: `false`. When using `true` and sending the data in the request body, Tinybird will return block status while loading, using line-delimited JSON. |
| token | String | Auth token with create or append permissions. Required only if no Bearer Authorization header is found |
| type_guessing | String | Default: `true`. When using `false`, all columns are created as `String`; otherwise Tinybird tries to guess the column types based on the CSV contents. The `type_guessing` parameter is not taken into account when replacing or appending data to an existing Data Source. If you aren't familiar with the data yet, disabling type guessing lets you quickly import everything as strings, explore it with SQL, and cast it to the right type or shape in whatever way you see fit via a Pipe. |
| debug | String | Optional. Enables returning debug information from logs. It can include `blocks`, `block_log`, and/or `hook_log` |
| replace_condition | String | Optional. When used in combination with the `replace` mode, it allows you to replace a portion of your Data Source that matches the `replace_condition` SQL statement with the contents of the `url` or query passed as a parameter. See this [guide](https://www.tinybird.co/guide/replacing-and-deleting-data#replace-data-selectively) to learn more. |
| replace_truncate_when_empty | Boolean | Optional. When used in combination with the `replace` mode, it allows truncating the Data Source when empty data is provided. Not supported when `replace_condition` is specified |
| format | String | Default: `csv`. Indicates the format of the data to be ingested in the Data Source. By default it's `csv`; specify `format=ndjson` for NDJSON and `format=parquet` for Parquet files. |

**Examples**

##### Creating a CSV Data Source from a schema

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X POST "https://api.tinybird.co/v0/datasources" \
  -d "name=stocks" \
  -d "schema=symbol String, date Date, close Float32"
```

##### Creating a CSV Data Source from a local CSV file with schema inference

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X POST "https://api.tinybird.co/v0/datasources?name=stocks" \
  -F csv=@local_file.csv
```

##### Creating a CSV Data Source from a remote CSV file with schema inference

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X POST "https://api.tinybird.co/v0/datasources" \
  -d "name=stocks" \
  -d url='https://.../data.csv'
```

##### Creating an empty Data Source with a ReplacingMergeTree engine and custom engine settings

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X POST "https://api.tinybird.co/v0/datasources" \
  -d "schema=pk UInt64, insert_date Date, close Float32" \
  -d "engine=ReplacingMergeTree" \
  -d "engine_sorting_key=pk" \
  -d "engine_ver=insert_date" \
  -d "name=test123" \
  -d "engine_settings=index_granularity=2048, ttl_only_drop_parts=false"
```

##### Appending data to a Data Source from a local CSV file

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X POST "https://api.tinybird.co/v0/datasources?name=data_source_name&mode=append" \
  -F csv=@local_file.csv
```

##### Appending data to a Data Source from a remote CSV file

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X POST "https://api.tinybird.co/v0/datasources" \
  -d mode='append' \
  -d name='data_source_name' \
  -d url='https://.../data.csv'
```

##### Replacing data with a local file

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X POST "https://api.tinybird.co/v0/datasources?name=data_source_name&mode=replace" \
  -F csv=@local_file.csv
```

##### Replacing data with a remote file from a URL

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X POST "https://api.tinybird.co/v0/datasources" \
  -d mode='replace' \
  -d name='data_source_name' \
  --data-urlencode "url=http://example.com/file.csv"
```
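
The same endpoint handles NDJSON and Parquet by setting `format`. A minimal NDJSON sketch, assuming a `$TOKEN` placeholder, hypothetical columns, and a local `events.ndjson` file; the `ndjson` multipart field name mirrors the `csv` field used above and is an assumption here:

```bash
# Create an NDJSON Data Source with explicit JSONPaths in the schema,
# then append a local NDJSON file to it. Names and files are placeholders.
curl \
  -H "Authorization: Bearer $TOKEN" \
  -X POST "https://api.tinybird.co/v0/datasources" \
  -d "name=events_ndjson" \
  -d "format=ndjson" \
  --data-urlencode 'schema=date DateTime `json:$.date`, city String `json:$.city`'

curl \
  -H "Authorization: Bearer $TOKEN" \
  -X POST "https://api.tinybird.co/v0/datasources?name=events_ndjson&mode=append&format=ndjson" \
  -F "ndjson=@events.ndjson"
```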
## GET /v0/datasources/?

##### Getting a list of your Data Sources

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X GET "https://api.tinybird.co/v0/datasources"
```

Get a list of the Data Sources in your account. The token you use to query the available Data Sources determines which Data Sources are returned: only those accessible with that token appear in the response.

##### Successful response

```json
{
  "datasources": [{
    "id": "t_a049eb516ef743d5ba3bbe5e5749433a",
    "name": "your_datasource_name",
    "cluster": "tinybird",
    "tags": {},
    "created_at": "2019-11-13 13:53:05.340975",
    "updated_at": "2022-02-11 13:11:19.464343",
    "replicated": true,
    "version": 0,
    "project": null,
    "headers": {},
    "shared_with": ["89496c21-2bfe-4775-a6e8-97f1909c8fff"],
    "engine": {
      "engine": "MergeTree",
      "engine_sorting_key": "example_column_1",
      "engine_partition_key": "",
      "engine_primary_key": "example_column_1"
    },
    "description": "",
    "used_by": [],
    "type": "csv",
    "columns": [
      { "name": "example_column_1", "type": "Date", "codec": null, "default_value": null, "jsonpath": null, "nullable": false, "normalized_name": "example_column_1" },
      { "name": "example_column_2", "type": "String", "codec": null, "default_value": null, "jsonpath": null, "nullable": false, "normalized_name": "example_column_2" }
    ],
    "statistics": { "bytes": 77822, "row_count": 226188 },
    "new_columns_detected": {},
    "quarantine_rows": 0
  }]
}
```

| Key | Type | Description |
| --- | --- | --- |
| attrs | String | Comma-separated list of the Data Source attributes to return in the response. Example: `attrs=name,id,engine`. Leave empty to return a full response |

Note that the `statistics`' `bytes` and `row_count` attributes might be `null` depending on how the Data Source was created.

## POST /v0/datasources/(.+)/alter

Modify the Data Source schema. This endpoint supports the operation to alter the following fields of a Data Source:

| Key | Type | Description |
| --- | --- | --- |
| schema | String | Optional. Set the whole schema that adds new columns to the existing ones of a Data Source. |
| description | String | Optional. Sets the description of the Data Source. |
| kafka_store_raw_value | Boolean | Optional. Default: false. When set to true, the 'value' column of a Kafka Data Source will save the JSON as a raw string. |
| kafka_store_headers | Boolean | Optional. Default: false. When set to true, the 'headers' of a Kafka Data Source will be saved as a binary map. |
| ttl | String | Optional. Set to any value accepted in ClickHouse for a TTL, or to 'false' to remove the TTL. |
| dry | Boolean | Optional. Default: false. Set to true to show what would be modified in the Data Source, without running any modification at all. |

The schema parameter can be used to add new columns at the end of the existing ones in a Data Source. Be aware that Tinybird currently doesn't validate whether the change will affect the existing Materialized Views (MVs) attached to the Data Source being modified, so this change may break existing MVs. For example, avoid changing a Data Source that has an MV created with something like `SELECT * FROM Data Source ...`. If you want MVs that are forward compatible with column additions, create them specifying the columns instead of using the `*` operator. Also, take into account that, for now, the only engines supporting adding new columns are those inside the MergeTree family.

To add a column to a Data Source, call this endpoint with the Data Source name and the new schema definition.
For example, having a Data Source created like this:

##### Creating a Data Source from a schema

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X POST "https://api.tinybird.co/v0/datasources" \
  -d "name=stocks" \
  -d "schema=symbol String, date Date, close Float32"
```

If you want to add a new column `concept String`, call this endpoint with the new schema:

##### Adding a new column to an existing Data Source

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
  -d "schema=symbol String, date Date, close Float32, concept String"
```

If everything goes well, the response lists the operations performed:

##### ADD COLUMN operation resulting from the schema change

```json
{
  "operations": [
    "ADD COLUMN `concept` String"
  ]
}
```

You can also view the inferred operations without executing them by adding `dry=true` to the parameters.

- To modify the description of a Data Source:

##### Modifying the description of a Data Source

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
  -d "name=stocks" \
  -d "description=My new description"
```

- To save the JSON as a raw string in the "value" column of a Kafka Data Source:

##### Saving the raw string in the value column of a Kafka Data Source

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
  -d "name=stocks" \
  -d "kafka_store_raw_value=true" \
  -d "kafka_store_headers=true"
```

- To modify the TTL of a Data Source:

##### Modifying the TTL of a Data Source

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
  -d "name=stocks" \
  -d "ttl=12 hours"
```

- To remove the TTL of a Data Source:

##### Removing the TTL of a Data Source

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
  -d "name=stocks" \
  -d "ttl=false"
```

- To add default values to the columns of a Data Source:

##### Modifying default values

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
  -d "name=stocks" \
  -d "schema=symbol String DEFAULT '-', date Date DEFAULT now(), close Float32 DEFAULT 1.1"
```

- To add default values to the columns of an NDJSON Data Source, add the default definition after the jsonpath definition:

##### Modifying default values in an NDJSON Data Source

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
  -d "name=stocks" \
  -d "schema=symbol String \`json:\$.symbol\` DEFAULT '-', date Date \`json:\$.date\` DEFAULT now(), close Float32 \`json:\$.close\` DEFAULT 1.1"
```

- To make a column nullable, change the type of the column, wrapping the old type in `Nullable()`:

##### Converting column "close" to Nullable

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
  -d "name=stocks" \
  -d "schema=symbol String \`json:\$.symbol\`, date Date \`json:\$.date\`, close Nullable(Float32) \`json:\$.close\`"
```

- To drop a column, just remove the column from the schema definition.
You can't remove columns that are part of the primary or partition key:

##### Removing column "close" from the Data Source

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X POST "https://api.tinybird.co/v0/datasources/stocks/alter" \
  -d "name=stocks" \
  -d "schema=symbol String \`json:\$.symbol\`, date Date \`json:\$.date\`"
```

You can also alter the JSONPaths of existing Data Sources. In that case you have to specify the [JSONPath](https://www.tinybird.co/docs/docs/guides/ingesting-data/ingest-ndjson-data) in the schema in the same way as when you created the Data Source.

## POST /v0/datasources/(.+)/truncate

Truncates a Data Source in your account. If the Data Source has dependent Materialized Views, those **won't** be truncated in cascade. If you want to delete data from dependent Materialized Views, make a subsequent call to this method for each of them. The Auth token in use must have the `DATASOURCES:CREATE` scope.

##### Truncating a Data Source

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X POST "https://api.tinybird.co/v0/datasources/name/truncate"
```

This works as well for the `quarantine` table of a Data Source. Remember that the quarantine table for a Data Source has the same name, with the "_quarantine" suffix.

##### Truncating the quarantine table of a Data Source

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X POST "https://api.tinybird.co/v0/datasources/:name_quarantine/truncate"
```

## POST /v0/datasources/(.+)/delete

Deletes rows from a Data Source in your account given a SQL condition. The Auth token in use must have the `DATASOURCES:CREATE` scope.

##### Deleting rows from a Data Source given a SQL condition

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  --data "delete_condition=(country='ES')" \
  "https://api.tinybird.co/v0/datasources/:name/delete"
```

When deleting rows from a Data Source, the response is not the final result of the deletion but a Job. You can check the job status and progress using the [Jobs API](https://www.tinybird.co/docs/docs/api-reference/jobs-api). In the response, `id`, `job_id`, and `delete_id` should have the same value:

##### Delete API response

```json
{
  "id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
  "job_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
  "job_url": "https://api.tinybird.co/v0/jobs/64e5f541-xxxx-xxxx-xxxx-00524051861b",
  "job": {
    "kind": "delete_data",
    "id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
    "job_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
    "status": "waiting",
    "created_at": "2023-04-11 13:52:32.423207",
    "updated_at": "2023-04-11 13:52:32.423213",
    "started_at": null,
    "is_cancellable": true,
    "datasource": {
      "id": "t_c45d5ae6781b41278fcee365f5bxxxxx",
      "name": "shopping_data"
    },
    "delete_condition": "event = 'search'"
  },
  "status": "waiting",
  "delete_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b"
}
```
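
Before running a destructive delete, you can test the condition first. A minimal sketch reusing the condition above together with the `dry_run` parameter documented below (nothing is deleted; the response reports the matched `rows_to_be_deleted`):

```bash
# Validate a delete condition without removing any rows (dry_run=true).
# $TOKEN and :name are placeholders, as in the examples above.
curl \
  -H "Authorization: Bearer $TOKEN" \
  --data "delete_condition=(country='ES')" \
  --data "dry_run=true" \
  "https://api.tinybird.co/v0/datasources/:name/delete"
```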
To check on the progress of the delete job, use the `job_id` from the Delete API response to query the [Jobs API](https://www.tinybird.co/docs/docs/api-reference/jobs-api). For example, to check on the status of the above delete job:

##### Checking the status of the delete job

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  "https://api.tinybird.co/v0/jobs/64e5f541-xxxx-xxxx-xxxx-00524051861b"
```

Would respond with:

##### Job API response

```json
{
  "kind": "delete_data",
  "id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
  "job_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b",
  "status": "done",
  "created_at": "2023-04-11 13:52:32.423207",
  "updated_at": "2023-04-11 13:52:37.330020",
  "started_at": "2023-04-11 13:52:32.842861",
  "is_cancellable": false,
  "datasource": {
    "id": "t_c45d5ae6781b41278fcee365f5bc2d35",
    "name": "shopping_data"
  },
  "delete_condition": " event = 'search'",
  "rows_affected": 100
}
```

### Data Source engines supported

Tinybird uses ClickHouse as the underlying storage technology. ClickHouse features different strategies to store data; these strategies define not only where and how the data is stored, but also what kind of data access, queries, and availability your data has. In ClickHouse terms, a Tinybird Data Source uses a [Table Engine](https://clickhouse.tech/docs/en/engines/table_engines/) that determines those factors.

Currently, Tinybird supports deleting data for Data Sources with the following engines:

- MergeTree
- ReplacingMergeTree
- SummingMergeTree
- AggregatingMergeTree
- CollapsingMergeTree
- VersionedCollapsingMergeTree

### Dependent views deletion

If the Data Source has dependent Materialized Views, those won't be deleted in cascade. If you want to delete data from dependent Materialized Views, make a subsequent call to this method for each affected view with a proper `delete_condition`. This applies as well to the associated `quarantine` Data Source.

| KEY | TYPE | DESCRIPTION |
| --- | --- | --- |
| delete_condition | String | Mandatory. A string representing the WHERE SQL clause you'd add to a regular DELETE FROM WHERE statement. Most of the time you'll write a simple `delete_condition` such as `column_name=value`, but any valid SQL statement, including conditional operators, is valid |
| dry_run | String | Default: `false`. It allows you to test the deletion. When using `true`, it will execute all deletion validations and return the number of matched `rows_to_be_deleted`. |

## GET /v0/datasources/(.+)

##### Getting information about a particular Data Source

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X GET "https://api.tinybird.co/v0/datasources/datasource_name"
```

Get Data Source information and stats. The token provided must have read access to the Data Source.

##### Successful response

```json
{
  "id": "t_bd1c62b5e67142bd9bf9a7f113a2b6ea",
  "name": "datasource_name",
  "statistics": {
    "bytes": 430833,
    "row_count": 3980
  },
  "used_by": [{
    "id": "t_efdc62b5e67142bd9bf9a7f113a34353",
    "name": "pipe_using_datasource_name"
  }],
  "updated_at": "2018-09-07 23:50:32.322461",
  "created_at": "2018-11-28 23:50:32.322461",
  "type": "csv"
}
```

| Key | Type | Description |
| --- | --- | --- |
| attrs | String | Comma-separated list of the Data Source attributes to return in the response. Example: `attrs=name,id,engine`. Leave empty to return a full response |
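
As a sketch, the same request trimmed down with the `attrs` parameter from the table above (`datasource_name` and `$TOKEN` are placeholders):

```bash
# Return only a subset of attributes for a Data Source.
curl \
  -H "Authorization: Bearer $TOKEN" \
  -X GET "https://api.tinybird.co/v0/datasources/datasource_name?attrs=name,id,engine"
```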
`id` and `name` are two ways to refer to the Data Source in SQL queries and API endpoints. The only difference is that the `id` never changes: it will keep working even if you change the `name` (which is the name used to display the Data Source in the UI). In general you can use `id` or `name` interchangeably. Using the above response as an example, `select count(1) from datasource_name` is equivalent to `select count(1) from t_bd1c62b5e67142bd9bf9a7f113a2b6ea`. The id `t_bd1c62b5e67142bd9bf9a7f113a2b6ea` is not a descriptive name, so you can add a descriptive prefix and refer to it as, for example, `t_my_events_datasource.bd1c62b5e67142bd9bf9a7f113a2b6ea`.

The `statistics` property contains information about the table. Those numbers are an estimation: `bytes` is the estimated data size on disk and `row_count` the estimated number of rows. These statistics are updated whenever data is appended to the Data Source.

The `used_by` property contains the list of Pipes that are using this Data Source. Only the Pipe `id` and `name` are sent.

The `type` property indicates the `format` used when the Data Source was created. Available formats are `csv`, `ndjson`, and `parquet`. The Data Source `type` indicates what file format you can use to ingest data.

## DELETE /v0/datasources/(.+)

##### Dropping a Data Source

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X DELETE "https://api.tinybird.co/v0/datasources/:name"
```

Drops a Data Source from your account.

| Key | Type | Description |
| --- | --- | --- |
| force | String | Default: `false`. The `force` parameter is taken into account when trying to delete Materialized Views. By default, when using `false`, the deletion is not carried out; you can enable it by setting it to `true`. If the given Data Source is being used as the trigger of a Materialized Node, it will not be deleted in any case. |
| dry_run | String | Default: `false`. It allows you to test the deletion. When using `true`, it will execute all deletion validations and return the possible affected materializations and other dependencies of a given Data Source. |
| token | String | Auth token. Only required if no Bearer Authorization header is sent. It must have the `DROP:datasource_name` scope for the given Data Source. |

## PUT /v0/datasources/(.+)

Update Data Source attributes.

##### Updating the name of a Data Source

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X PUT "https://api.tinybird.co/v0/datasources/:name?name=new_name"
```

##### Promoting a Data Source to a Snowflake one

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X PUT "https://api.tinybird.co/v0/datasources/:name" \
  -d "connector=1d8232bf-2254-4d68-beff-4dd9aa505ab0" \
  -d "service=snowflake" \
  -d "cron=*/30 * * * *" \
  -d "query=select a, b, c from test" \
  -d "mode=replace" \
  -d "external_data_source=database.schema.table" \
  -d "ingest_now=True"
```

| Key | Type | Description |
| --- | --- | --- |
| name | String | New name for the Data Source |
| token | String | Auth token. Only required if no Bearer Authorization header is sent. It should have the `DATASOURCES:CREATE` scope for the given Data Source |
| connector | String | Connector ID to link it to |
| service | String | Type of service to promote it to. Only 'snowflake' or 'bigquery' allowed |
| cron | String | Cron-like pattern to execute the connector's job |
| query | String | Optional: custom query to collect from the external data source |
| mode | String | Only replace is allowed for connectors |
| external_data_source | String | External data source to use for Snowflake |
| ingest_now | Boolean | To ingest the data immediately instead of waiting for the first execution determined by cron |

# Environment Variables API

Use the Environment Variables API to create, update, delete, and list environment variables that you can use in Pipes in a Workspace.

Environment variables allow you to store sensitive information, such as access secrets and host names, in your Workspace. Environment variables are encrypted at rest.

Using the Environment Variables API requires a Workspace admin token.

## Environment variables types

The Environment Variables API supports different types of environment variables:

| Environment variable type | Comments |
| --- | --- |
| `secret` | Used to store passwords and other secrets; automatically prevents an Endpoint from exposing its value. It's the default type. |

## Limits

Check the [limits page](https://www.tinybird.co/docs/docs/get-started/plans/limits) for limits on ingestion, queries, API Endpoints, and more.

The Environment Variables API has the following limits:

- 5 requests per second.
- 100 environment variables per Workspace.
- 8 KB max size of the `value` attribute.

## Templating

After creating environment variables in a Workspace, you can use the `tb_secret` template function to replace the original value:

```sql
%
SELECT * FROM postgresql('host:port', 'database', 'table', 'user', {{tb_secret('pg_password')}})
```

Environment variable values are rendered as the `String` data type. If you need to use a different type, use any of the [functions to cast a String value to a given type](https://www.tinybird.co/docs/docs/sql-reference/functions). For example:

```sql
%
SELECT * FROM table WHERE int_value = toUInt8({{tb_secret('int_secret')}})
```

## Staging and production use

If you have staging and production Workspaces, create the same environment variables with the same name in both Workspaces, changing only their corresponding value.

Tinybird doesn't allow you to create an API Endpoint when exposing environment variables with `type=secret` in a SELECT clause. So, while it's possible to have a node that uses the logic `SELECT {{tb_secret('username')}}`, you can't publish that node as a Copy Pipe or API Endpoint.

## Branch use

You can use environment variables in Branches, but you must create them in the main Workspace initially. Environment variables have the same value in the main Workspace as in the Branches. You can't create an environment variable in a Branch to be deployed in the main Workspace.

## POST /v0/variables/?

Creates a new environment variable.

### Restrictions

Environment variable names are unique for a Workspace.
### Example

```bash
curl \
  -X POST "https://$TB_HOST/v0/variables" \
  -H "Authorization: Bearer $TOKEN" \
  -d "type=secret" \
  -d "name=test_password" \
  -d "value=test"
```

### Request parameters

| Key | Type | Description |
| --- | --- | --- |
| type | String (optional) | The type of the variable. Defaults to `secret` |
| name | String | The name of the variable |
| value | String | The variable value |

### Successful response example

```json
{
  "name": "test_token",
  "created_at": "2024-06-21T10:27:57",
  "updated_at": "2024-06-21T10:27:57",
  "edited_by": "token: 'admin token'"
}
```

### Response codes

| Code | Description |
| --- | --- |
| 200 | OK |
| 400 | Invalid or missing parameters |
| 403 | Limit reached or invalid token |
| 404 | Workspace not found |

## DELETE /v0/variables/(.\+)

Deletes an environment variable.

### Example

```bash
curl \
  -X DELETE "https://$TB_HOST/v0/variables/test_password" \
  -H "Authorization: Bearer $TOKEN"
```

### Successful response example

```json
{
  "ok": true
}
```

### Response codes

| Code | Description |
| --- | --- |
| 200 | OK |
| 400 | Invalid or missing parameters |
| 403 | Limit reached or token invalid |
| 404 | Workspace or variable not found |

## PUT /v0/variables/(.\+)

Updates an environment variable.

### Example

```bash
curl \
  -X PUT "https://$TB_HOST/v0/variables/test_password" \
  -H "Authorization: Bearer $TOKEN" \
  -d "value=new_value"
```

### Successful response example

```json
{
  "name": "test_password",
  "type": "secret",
  "created_at": "2024-06-21T10:27:57",
  "updated_at": "2024-06-21T10:29:57",
  "edited_by": "token: 'admin token'"
}
```

### Response codes

| Code | Description |
| --- | --- |
| 200 | OK |
| 400 | Invalid or missing parameters |
| 403 | Limit reached or token invalid |
| 404 | Workspace or variable not found |

## GET /v0/variables/?

Retrieves all Workspace environment variables. The value isn't returned.

### Example

```bash
curl \
  -X GET "https://$TB_HOST/v0/variables" \
  -H "Authorization: Bearer $TOKEN"
```

### Successful response example

```json
{
  "variables": [
    {
      "name": "test_token",
      "type": "secret",
      "created_at": "2024-06-21T10:27:57",
      "updated_at": "2024-06-21T10:27:57",
      "edited_by": "token: 'admin token'"
    },
    {
      "name": "test_token2",
      "type": "secret",
      "created_at": "2024-06-21T10:27:57",
      "updated_at": "2024-06-21T10:29:57",
      "edited_by": "token: 'admin token'"
    }
  ]
}
```

### Response codes

| Code | Description |
| --- | --- |
| 200 | OK |
| 400 | Invalid or missing parameters |
| 403 | Limit reached or token invalid |
| 404 | Workspace not found |

## GET /v0/variables/(.\+)

Fetches information about a particular environment variable. The value isn't returned.

### Example

```bash
curl \
  -X GET "https://$TB_HOST/v0/variables/test_password" \
  -H "Authorization: Bearer $TOKEN"
```

### Successful response example

```json
{
  "name": "test_password",
  "type": "secret",
  "created_at": "2024-06-21T10:27:57",
  "updated_at": "2024-06-21T10:27:57",
  "edited_by": "token: 'admin token'"
}
```

### Response codes

| Code | Description |
| --- | --- |
| 200 | OK |
| 400 | Invalid or missing parameters |
| 403 | Limit reached or token invalid |
| 404 | Workspace or variable not found |
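
As a sketch of the staging and production pattern described earlier in this section, you'd create the same variable in each Workspace with its own value. `$TB_HOST`, `$STAGING_ADMIN_TOKEN`, and `$PROD_ADMIN_TOKEN` are placeholders:

```bash
# Create the same secret in a staging and a production Workspace,
# changing only the value. The tokens are each Workspace's admin token.
curl -X POST "https://$TB_HOST/v0/variables" \
  -H "Authorization: Bearer $STAGING_ADMIN_TOKEN" \
  -d "type=secret" \
  -d "name=pg_password" \
  -d "value=staging_value"

curl -X POST "https://$TB_HOST/v0/variables" \
  -H "Authorization: Bearer $PROD_ADMIN_TOKEN" \
  -d "type=secret" \
  -d "name=pg_password" \
  -d "value=production_value"
```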
# Events API

The Events API allows you to ingest JSON events with a simple HTTP POST request. See [Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api).

All endpoints require authentication using a Token with the `DATASOURCE:APPEND` or `DATASOURCE:CREATE` scope.

## POST /v0/events

Use this endpoint to send NDJSON (newline-delimited JSON) events to a [Data Source](https://www.tinybird.co/docs/docs/get-data-in/data-sources).

### Request parameters

| Key | Type | Description |
| --- | --- | --- |
| name | String | Name or ID of the target Data Source to append data to |
| wait | Boolean | 'false' by default. Set to 'true' to wait until the write is acknowledged by the database. Enabling this flag makes it possible to retry on database errors, but it introduces additional latency. It's recommended to enable it in use cases where data loss avoidance is critical; disable it otherwise. |

### Return HTTP status codes

| Status code | Description |
| --- | --- |
| 200 | The data has been inserted into the database. The write has been acknowledged. The request 'wait' parameter was enabled. |
| 202 | The data has been processed, and it will be sent to the database eventually. The write hasn't been acknowledged yet. The request 'wait' parameter was disabled. |
| 400 | The request is invalid. The body will contain more information. A common cause is missing the 'name' parameter. No data has been inserted, but the request shouldn't be retried. |
| 403 | The token isn't valid. The request shouldn't be retried. |
| 404 | The token's Workspace doesn't belong to this cluster. The Workspace has probably been removed or is in another cluster. The request shouldn't be retried; ensure the token's region and the Tinybird domain match. |
| 422 | The ingestion has been partially completed due to an error in a Materialized View. Retrying may result in data duplication, but not retrying may result in data loss. The general advice is to not retry, review attached Materialized Views, and contact us if the issue persists. |
| 429 | The request/second limit has been reached. The default limit is 1000 requests/second; contact us for increased capacity. The request may be retried after a while. Use exponential backoff with a limited amount of retries. |
| 500 | An unexpected error has occurred. The body will contain more information. Retrying is the general advice; contact us if the issue persists. |
| 503 | The service is temporarily unavailable. The body may contain more information. A common cause is having reached a throughput limit, or having attached a Materialized View with an issue. No data has been inserted, and it's safe to retry. Contact us if the issue persists. |
| 0x07 GOAWAY | HTTP/2 only. Too many alive connections. Recreate the connection and retry. |

### Compression

You can compress JSON events with Gzip and send the compressed payload to the Events API. You must include the header `Content-Encoding: gzip` with the request.

## Examples

### NDJSON messages

The following example shows how to push single NDJSON messages using the Events API:

##### Send individual NDJSON messages

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -d '{"date": "2020-04-05 00:05:38", "city": "Chicago"}' \
  'https://api.tinybird.co/v0/events?name=events_test'
```

### JSON messages

The following example shows how to push single JSON messages using the Events API:

##### Send individual JSON messages

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -d $'{ \
    "date": "2020-04-05 00:05:38", \
    "city": "Chicago" \
  }' \
  'https://api.tinybird.co/v0/events?name=events_test&format=json'
```
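
When you need the write acknowledged before moving on (at the cost of extra latency), add the `wait` parameter described above. A minimal sketch of the same call with `wait=true` (the token is a placeholder):

```bash
# Ask the Events API to acknowledge the write: a 200 response means the data
# was inserted, instead of the usual 202 "accepted for processing" response.
curl \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"date": "2020-04-05 00:05:38", "city": "Chicago"}' \
  'https://api.tinybird.co/v0/events?name=events_test&wait=true'
```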
### Multiple NDJSON messages

The following example shows how to push multiple NDJSON messages using the Events API. Notice the `$` before the string of JSON events: it's needed for Bash to interpret the `\n` escapes as newlines, since curl doesn't do it automatically.

##### Send many NDJSON events

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -d $'{"date": "2020-04-05 00:05:38", "city": "Chicago"}\n{"date": "2020-04-05 00:07:22", "city": "Madrid"}\n' \
  'https://api.tinybird.co/v0/events?name=events_test'
```

### Gzip compressed payload

The following example shows how to push a Gzip compressed payload using the Events API, where `body.gz` is a batch of NDJSON events.

##### Send a Gzip compressed payload

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Content-Encoding: gzip" \
  --data-binary @body.gz \
  'https://api.tinybird.co/v0/events?name=events_example'
```

# Jobs API

With the Jobs API, you can list the jobs for the last 48 hours or the last 100 jobs, and get the details of a specific job.

## GET /v0/jobs/?

You can get a list of the last 100 jobs created in the last 48 hours, and filter them by kind, status, pipe_id, pipe_name, created_after, and created_before.

| Key | Type | Description |
| --- | --- | --- |
| kind | String | Returns only the jobs of that particular kind. Example: `kind=populateview`, `kind=copy`, or `kind=import` |
| status | String | Returns only the jobs with the status provided. Example: `status=done`, `status=waiting`, `status=working`, or `status=error` |
| pipe_id | String | Returns only the jobs associated with the provided pipe id. Example: `pipe_id=t_31a0ff508c9843b59c32f7f81a156968` |
| pipe_name | String | Returns only the jobs associated with the provided pipe name. Example: `pipe_name=test_pipe` |
| created_after | String | Returns jobs that were created after the provided date, in the ISO 8601 date format. Example: `created_after=2023-06-15T18:13:25.855Z` |
| created_before | String | Returns jobs that were created before the provided date, in the ISO 8601 date format. Example: `created_before=2023-06-19T18:13:25.855Z` |

##### Getting the latest jobs

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  "https://api.tinybird.co/v0/jobs"
```

You will get a list of jobs with the `kind`, `status`, `id`, and the `url` to access the specific information about each job.

##### Jobs list

```json
{
  "jobs": [
    {
      "id": "c8ae13ef-e739-40b6-8bd5-b1e07c8671c2",
      "kind": "import",
      "status": "done",
      "created_at": "2020-12-04 15:08:33.214377",
      "updated_at": "2020-12-04 15:08:33.396286",
      "job_url": "https://api.tinybird.co/v0/jobs/c8ae13ef-e739-40b6-8bd5-b1e07c8671c2",
      "datasource": {
        "id": "t_31a0ff508c9843b59c32f7f81a156968",
        "name": "my_datasource_1"
      }
    },
    {
      "id": "1f6a5a3d-cfcb-4244-ba0b-0bfa1d1752fb",
      "kind": "import",
      "status": "error",
      "created_at": "2020-12-04 15:08:09.051310",
      "updated_at": "2020-12-04 15:08:09.263055",
      "job_url": "https://api.tinybird.co/v0/jobs/1f6a5a3d-cfcb-4244-ba0b-0bfa1d1752fb",
      "datasource": {
        "id": "t_49806938714f4b72a225599cdee6d3ab",
        "name": "my_datasource_2"
      }
    }
  ]
}
```

Job details in `job_url` are available for 48 hours after the job's creation.
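
As a sketch combining the filters above, for example to review recent failed imports (the token is a placeholder):

```bash
# List recent import jobs that ended in error; -G turns the -d pairs
# into query-string parameters.
curl \
  -H "Authorization: Bearer $TOKEN" \
  -G "https://api.tinybird.co/v0/jobs" \
  -d "kind=import" \
  -d "status=error"
```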
## POST /v0/jobs/(.+)/cancel

With this endpoint you can try to cancel an existing Job. Any job can be cancelled while it's in the "waiting" status, but you can't cancel a Job in the "done" or "error" status. Jobs of type "populate" can also be cancelled while in the "working" status.

After successfully starting the cancellation process, you will see one of two statuses in the job:

- "cancelling": The Job can't be immediately cancelled as it's doing some work, but the cancellation will eventually happen.
- "cancelled": The Job has been completely cancelled.

A Job cancellation doesn't guarantee a complete rollback of the changes being made by it; sometimes you will need to delete newly inserted rows or created Data Sources. The fastest way to know whether a job is cancellable is to read the "is_cancellable" key in the job's JSON description.

Depending on the Job and its status, when you try to cancel it you may get different responses:

- HTTP Code 200: The Job has successfully started the cancellation process. Remember that if the Job now has a "cancelling" status, it may need some time to completely cancel itself. This request returns the status of the job.
- HTTP Code 404: Job not found.
- HTTP Code 403: The token provided doesn't have access to this Job.
- HTTP Code 400: The Job is not in a cancellable status, or you are trying to cancel a job that is already in the "cancelling" state.

##### Try to cancel a Job

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -X POST "https://api.tinybird.co/v0/jobs/:job_id/cancel"
```

##### Populate Job in cancelling state right after the cancellation request

```json
{
  "kind": "populateview",
  "id": "32c3438d-582e-4a6f-9b57-7d7a3bfbeb8c",
  "job_id": "32c3438d-582e-4a6f-9b57-7d7a3bfbeb8c",
  "status": "cancelling",
  "created_at": "2021-03-17 18:56:23.939380",
  "updated_at": "2021-03-17 18:56:44.343245",
  "is_cancellable": false,
  "datasource": {
    "id": "t_02043945875b4070ae975f3812444b76",
    "name": "your_datasource_name",
    "cluster": null,
    "tags": {},
    "created_at": "2020-07-15 10:55:12.427269",
    "updated_at": "2020-07-15 10:55:12.427270",
    "statistics": null,
    "replicated": false,
    "version": 0,
    "project": null,
    "used_by": []
  },
  "query_id": "01HSZ9WJES5QEZZM4TGDD3YFZ2",
  "pipe_id": "t_7fa8009023a245b696b4f2f7195b23c3",
  "pipe_name": "top_product_per_day",
  "queries": [
    { "query_id": "01HSZ9WJES5QEZZM4TGDD3YFZ2", "status": "done" },
    { "query_id": "01HSZ9WY6QS6XAMBHZMSNB1G75", "status": "done" },
    { "query_id": "01HSZ9X8YVEQ0PXA6T2HZQFFPX", "status": "working" },
    { "query_id": "01HSZQ5YX0517X81JBF9G9HB2P", "status": "waiting" },
    { "query_id": "01HSZQ6PZJA3P81RC6Q6EF6HMK", "status": "waiting" },
    { "query_id": "01HSZQ76D7YYFB16TFT32KXMCY", "status": "waiting" }
  ],
  "progress_percentage": 50.0
}
```

## GET /v0/jobs/(.+)

Get the details of a specific Job by using its ID.

##### Get the details of a Job

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  "https://api.tinybird.co/v0/jobs/:job_id"
```

You will get a JSON response with the details of the Job, including the `kind`, `status`, `id`, `created_at`, `updated_at`, and the `datasource` associated with the Job. This is available for 48 hours after the Job creation. After that, you can consult the Job details in the jobs_log Service Data Source.
##### Job details

```json
{
  "kind": "import",
  "id": "d5b869ed-3a74-45f9-af54-57350aae4cef",
  "job_id": "d5b869ed-3a74-45f9-af54-57350aae4cef",
  "status": "done",
  "created_at": "2024-07-22 11:47:58.207606",
  "updated_at": "2024-07-22 11:48:52.971327",
  "started_at": "2024-07-22 11:47:58.351734",
  "is_cancellable": false,
  "mode": "append",
  "datasource": {
    "id": "t_caf95c54174e48f488ea65d181eb5b75",
    "name": "events",
    "cluster": "default",
    "tags": {},
    "created_at": "2024-07-22 11:47:51.807384",
    "updated_at": "2024-07-22 11:48:52.726243",
    "replicated": true,
    "version": 0,
    "project": null,
    "headers": { "cached_delimiter": "," },
    "shared_with": [],
    "engine": {
      "engine": "MergeTree",
      "partition_key": "toYear(date)",
      "sorting_key": "date, user_id, event, extra_data"
    },
    "description": "",
    "used_by": [],
    "last_commit": {
      "content_sha": "",
      "status": "changed",
      "path": ""
    },
    "errors_discarded_at": null,
    "type": "csv"
  },
  "import_id": "d5b869ed-3a74-45f9-af54-57350aae4cef",
  "url": "https://storage.googleapis.com/tinybird-assets/datasets/guides/events_50M_1.csv",
  "statistics": {
    "bytes": 1592720209,
    "row_count": 50000000
  },
  "quarantine_rows": 0,
  "invalid_lines": 0
}
```

# Pipes API

The Pipes API enables you to manage your Pipes. With Pipes you can transform data via SQL queries and publish the results of those queries as API Endpoints.

## GET /v0/pipes/?

Get a list of Pipes in your account.

##### Getting a list of your Pipes

```bash
curl -X GET \
  -H "Authorization: Bearer <TOKEN>" \
  "https://api.tinybird.co/v0/pipes"
```

The Pipes in the response are those accessible using the provided token, which must have read permissions for them.

##### Successful response

```json
{
  "pipes": [{
    "id": "t_55c39255e6b548dd98cb6da4b7d62c1c",
    "name": "my_pipe",
    "description": "This is a description",
    "endpoint": "t_h65c788b42ce4095a4789c0d6b0156c3",
    "created_at": "2022-11-10 12:39:38.106380",
    "updated_at": "2022-11-29 13:33:40.850186",
    "parent": null,
    "nodes": [{
      "id": "t_h65c788b42ce4095a4789c0d6b0156c3",
      "name": "my_node",
      "sql": "SELECT col_a, col_b FROM my_data_source",
      "description": null,
      "materialized": null,
      "cluster": null,
      "tags": {},
      "created_at": "2022-11-10 12:39:47.852303",
      "updated_at": "2022-11-10 12:46:54.066133",
      "version": 0,
      "project": null,
      "result": null,
      "ignore_sql_errors": false,
      "node_type": "default"
    }],
    "url": "https://api.tinybird.co/v0/pipes/my_pipe.json"
  }]
}
```

| Key | Type | Description |
| --- | --- | --- |
| dependencies | boolean | The response will include the nodes' dependent Data Sources and Pipes; default is `false` |
| attrs | String | Comma-separated list of the Pipe attributes to return in the response. Example: `attrs=name,description` |
| node_attrs | String | Comma-separated list of the node attributes to return in the response. Example: `node_attrs=id,name` |

Pipe ids are immutable, so you can always refer to them in your third-party applications and stay compatible with Pipes even after they are renamed.

For lighter JSON responses, consider using the `attrs` and `node_attrs` params to return exactly the attributes you need to consume.
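
For example, a trimmed-down listing might look like this sketch, which combines the parameters above (the token is a placeholder):

```bash
# List Pipes returning only a few Pipe and node attributes, plus dependencies.
curl -X GET \
  -H "Authorization: Bearer $TOKEN" \
  "https://api.tinybird.co/v0/pipes?attrs=name,description&node_attrs=id,name&dependencies=true"
```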
## POST /v0/pipes/?

Creates a new Pipe. There are three ways to create a Pipe.

##### Creating a Pipe providing full JSON

```bash
curl -X POST \
  -H "Authorization: Bearer <TOKEN>" \
  -H "Content-Type: application/json" \
  "https://api.tinybird.co/v0/pipes" \
  -d '{
    "name": "pipe_name",
    "description": "my first pipe",
    "nodes": [
      {"sql": "select * from my_datasource limit 10", "name": "node_00", "description": "sampled data"},
      {"sql": "select count() from node_00", "name": "node_01"}
    ]
  }'
```

If you prefer to create a minimal Pipe and then append your transformation nodes, set the name and the first transformation node's SQL in your POST request:

##### Creating a Pipe with a name and a SQL query

```bash
curl -X POST \
  -H "Authorization: Bearer <TOKEN>" \
  "https://api.tinybird.co/v0/pipes?name=pipename&sql=select%20*%20from%20events"
```

Pipes can also be created as copies of other Pipes. Just use the `from` argument:

##### Creating a Pipe from another Pipe

```bash
curl -X POST \
  -H "Authorization: Bearer <TOKEN>" \
  "https://api.tinybird.co/v0/pipes?name=pipename&from=src_pipe"
```

Bear in mind that if you use this method to overwrite an existing Pipe, the endpoint is only maintained if the node name is the same.

## POST /v0/pipes/(.+)/nodes

Appends a new node to a Pipe.

##### Adding a new node to a Pipe

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -d 'select * from node_0' \
  "https://api.tinybird.co/v0/pipes/:name/nodes?name=node_name&description=explanation"
```

Appends a new node that creates a Materialized View:

##### Adding a Materialized View using a materialized node

```bash
curl \
  -H "Authorization: Bearer <TOKEN>" \
  -d 'select id, sum(amount) as amount, date from my_datasource' \
  "https://api.tinybird.co/v0/pipes/:name/nodes?name=node_name&description=explanation&type=materialized&datasource=new_datasource&engine=AggregatingMergeTree"
```

| Key | Type | Description |
| --- | --- | --- |
| name | String | The referenceable name for the node. |
| description | String | Use it to store a more detailed explanation of the node. |
| token | String | Auth token. Ensure it has the `PIPE:CREATE` scope on it |
| type | String | Optional. Available options are `standard` (default), `materialized`, and `endpoint`. Use `materialized` to create a Materialized View from your new node. |
| datasource | String | Required with `type=materialized`. Specifies the name of the destination Data Source where the Materialized View schema is defined. |
| override_datasource | Boolean | Optional. Default `false`. When the target Data Source of the Materialized View exists in the Workspace, it'll be overridden by the `datasource` specified in the request. |
| populate | Boolean | Optional. Default `false`. When `true`, a job is triggered to populate the destination Data Source. |
| populate_subset | Float | Optional. Populate with a subset percent of the data (limited to a maximum of 2M rows); this is useful to quickly test a materialized node with some data. The subset must be greater than 0 and lower than 0.1. A subset of 0.1 means 10 percent of the data in the source Data Source will be used to populate the Materialized View. Use it together with `populate=true`; it has precedence over `populate_condition` |
| populate_condition | String | Optional. Populate with a SQL condition to be applied to the trigger Data Source of the Materialized View. For instance, `populate_condition='date == toYYYYMM(now())'` populates the Materialized View with all the rows from the trigger Data Source whose `date` is in the current month. Use it together with `populate=true`. `populate_condition` is not taken into account if the `populate_subset` param is present. Including in the `populate_condition` any column present in the Data Source `engine_sorting_key` will make the populate job process less data. |
For instance, `populate_condition='date == toYYYYMM(now())'` it’ll populate taking all the rows from the trigger Data Source which `date` is the current month. Use it together with `populate=true` . `populate_condition` is not taken into account if the `populate_subset` param is present. Including in the `populate_condition` any column present in the Data Source `engine_sorting_key` will make the populate job process less data. | | unlink_on_populate_error | String | Optional. Default is `false` . If the populate job fails the Materialized View is unlinked and new data won’t be ingested in the Materialized View. | | engine | String | Optional. Engine for destination Materialized View. Requires the `type` parameter as `materialized` . | | engine_* | String | Optional. Engine parameters and options. Requires the `type` parameter as `materialized` and the `engine` parameter.[ Check Engine Parameters and Options for more details](https://www.tinybird.co/docs/docs/api-reference/datasource-api) | SQL query for the transformation node must be sent in the body encoded in utf-8 | Code | Description | | --- | --- | | 200 | No error | | 400 | empty or wrong SQL or API param value | | 403 | Forbidden. Provided token doesn’t have permissions to append a node to the pipe, it needs `ADMIN` or `PIPE:CREATE` | | 404 | Pipe not found | | 409 | There’s another resource with the same name, names must be unique | The Materialized View already exists | `override_datasource` cannot be performed | DELETE /v0/pipes/(.+)/nodes/(.+) [¶](https://www.tinybird.co/docs/about:blank#delete--v0-pipes-(.+)-nodes-(.+)) Drops a particular transformation node in the Pipe. It does not remove related nodes so this could leave the Pipe in an unconsistent state. For security reasons, enabled nodes can’t be removed. removing a node from a pipe [¶](https://www.tinybird.co/docs/about:blank#id11) curl -X DELETE "https://api.tinybird.co/v0/pipes/:name/nodes/:node_id"| Code | Description | | --- | --- | | 204 | No error, removed node | | 400 | The node is published. Published nodes can’t be removed | | 403 | Forbidden. Provided token doesn’t have permissions to change the last node of the pipe, it needs ADMIN or IMPORT | | 404 | Pipe not found | PUT /v0/pipes/(.+)/nodes/(.+) [¶](https://www.tinybird.co/docs/about:blank#put--v0-pipes-(.+)-nodes-(.+)) Changes a particular transformation node in the Pipe Editing a Pipe’s transformation node [¶](https://www.tinybird.co/docs/about:blank#id13) curl -X PUT \ -H "Authorization: Bearer " \ -d 'select * from node_0' "https://api.tinybird.co/v0/pipes/:name/nodes/:node_id?name=new_name&description=updated_explanation"| Key | Type | Description | | --- | --- | --- | | name | String | new name for the node | | description | String | new description for the node | | token | String | Auth token. Ensure it has the `PIPE:CREATE` scope on it | Please, note that the desired SQL query should be sent in the body encoded in utf-8. | Code | Description | | --- | --- | | 200 | No error | | 400 | Empty or wrong SQL | | 403 | Forbidden. Provided token doesn’t have permissions to change the last node to the pipe, it needs `ADMIN` or `PIPE:CREATE` | | 404 | Pipe not found | | 409 | There’s another resource with the same name, names must be unique | GET /v0/pipes/(.+)\.(json|csv|ndjson|parquet|prometheus) [¶](https://www.tinybird.co/docs/about:blank#get--v0-pipes-(.+)%5C.(json%7Ccsv%7Cndjson%7Cparquet%7Cprometheus)) Returns the published node data in a particular format. 
Getting data for a pipe [¶](https://www.tinybird.co/docs/about:blank#pipe-get-data) curl -X GET \ -H "Authorization: Bearer " \ "https://api.tinybird.co/v0/pipes/:name.format"| Key | Type | Description | | --- | --- | --- | | q | String | Optional, query to execute, see API Query endpoint | | output_format_json_quote_64bit_integers | int | (Optional) Controls quoting of 64-bit or bigger integers (like UInt64 or Int128) when they are output in a JSON format. Such integers are enclosed in quotes by default. This behavior is compatible with most JavaScript implementations. Possible values: 0 — Integers are output without quotes. 1 — Integers are enclosed in quotes. Default value is 0 | | output_format_json_quote_denormals | int | (Optional) Controls representation of inf and nan on the UI instead of null e.g when dividing by 0 - inf and when there is no representation of a number in Javascript - nan. Possible values: 0 - disabled, 1 - enabled. Default value is 0 | | output_format_parquet_string_as_string | int | (Optional) Use Parquet String type instead of Binary for String columns. Possible values: 0 - disabled, 1 - enabled. Default value is 0 | The `q` parameter is a SQL query (see [Query API](https://www.tinybird.co/docs/docs/api-reference/query-api) ). When using this endpoint to query your Pipes, you can use the `_` shortcut, which refers to your Pipe name | format | Description | | --- | --- | | csv | CSV with header | | json | JSON including data, statistics and schema information | | ndjson | One JSON object per each row | | parquet | A Parquet file. Some libraries might not properly process `UInt*` data types, if that’s your case cast those columns to signed integers with `toInt*` functions. `String` columns are exported as `Binary` , take that into account when reading the resulting Parquet file, most libraries convert from Binary to String (e.g. Spark has this configuration param: `spark.sql.parquet.binaryAsString` ) | | prometheus | Prometheus text-based format. The output table must include name (String) and value (number) as required columns, with optional help (String), timestamp (number), and type (String) (valid values: counter, gauge, histogram, summary, untyped, or empty). Labels should be a Map(String, String), and rows for the same metric with different labels must appear consecutively. The table must be sorted by the name column. | POST /v0/pipes/(.+)\.(json|csv|ndjson|parquet|prometheus) [¶](https://www.tinybird.co/docs/about:blank#post--v0-pipes-(.+)%5C.(json%7Ccsv%7Cndjson%7Cparquet%7Cprometheus)) Returns the published node data in a particular format, passing the parameters in the request body. Use this endpoint when the query is too long to be passed as a query string parameter. When using the post endpoint, there are no traces of the parameters in the pipe_stats_rt Data Source. See the get endpoint for more information. GET /v0/pipes/(.+\.pipe) [¶](https://www.tinybird.co/docs/about:blank#get--v0-pipes-(.+%5C.pipe)) Get pipe information. Provided Auth Token must have read access to the Pipe. Getting information about a particular pipe [¶](https://www.tinybird.co/docs/about:blank#id16) curl -X GET \ -H "Authorization: Bearer " \ "https://api.tinybird.co/v0/pipes/:name" `pipe_id` and `pipe_name` are two ways to refer to the pipe in SQL queries and API endpoints the only difference is `pipe_id` never changes so it’ll work even if you change the `pipe_name` (which is the name used to display the pipe). 
In general you can use `pipe_id` or `pipe_name` indistinctly: Successful response [¶](https://www.tinybird.co/docs/about:blank#id17) { "id": "t_bd1c62b5e67142bd9bf9a7f113a2b6ea", "name": "events_pipe", "pipeline": { "nodes": [{ "name": "events_ds_0" "sql": "select * from events_ds_log__raw", "materialized": false }, { "name": "events_ds", "sql": "select * from events_ds_0 where valid = 1", "materialized": false }] } } You can make your Pipe’s id more descriptive by prepending information such as `t_my_events_table.bd1c62b5e67142bd9bf9a7f113a2b6ea` DELETE /v0/pipes/(.+\.pipe) [¶](https://www.tinybird.co/docs/about:blank#delete--v0-pipes-(.+%5C.pipe)) Drops a Pipe from your account. Auth token in use must have the `DROP:NAME` scope. Dropping a pipe [¶](https://www.tinybird.co/docs/about:blank#id18) curl -X DELETE \ -H "Authorization: Bearer " \ "https://api.tinybird.co/v0/pipes/:name" PUT /v0/pipes/(.+\.pipe) [¶](https://www.tinybird.co/docs/about:blank#put--v0-pipes-(.+%5C.pipe)) Changes Pipe’s metadata. When there is another Pipe with the same name an error is raised. editing a pipe [¶](https://www.tinybird.co/docs/about:blank#id19) curl -X PUT \ -H "Authorization: Bearer " \ "https://api.tinybird.co/v0/pipes/:name?name=new_name"| Key | Type | Description | | --- | --- | --- | | name | String | new name for the pipe | | description | String | new Markdown description for the pipe | | token | String | Auth token. Ensure it has the `PIPE:CREATE` scope on it | GET /v0/pipes/(.+) [¶](https://www.tinybird.co/docs/about:blank#get--v0-pipes-(.+)) Get pipe information. Provided Auth Token must have read access to the Pipe. Getting information about a particular pipe [¶](https://www.tinybird.co/docs/about:blank#id21) curl -X GET \ -H "Authorization: Bearer " \ "https://api.tinybird.co/v0/pipes/:name" `pipe_id` and `pipe_name` are two ways to refer to the pipe in SQL queries and API endpoints the only difference is `pipe_id` never changes so it’ll work even if you change the `pipe_name` (which is the name used to display the pipe). In general you can use `pipe_id` or `pipe_name` indistinctly: Successful response [¶](https://www.tinybird.co/docs/about:blank#id22) { "id": "t_bd1c62b5e67142bd9bf9a7f113a2b6ea", "name": "events_pipe", "pipeline": { "nodes": [{ "name": "events_ds_0" "sql": "select * from events_ds_log__raw", "materialized": false }, { "name": "events_ds", "sql": "select * from events_ds_0 where valid = 1", "materialized": false }] } } You can make your Pipe’s id more descriptive by prepending information such as `t_my_events_table.bd1c62b5e67142bd9bf9a7f113a2b6ea` DELETE /v0/pipes/(.+) [¶](https://www.tinybird.co/docs/about:blank#delete--v0-pipes-(.+)) Drops a Pipe from your account. Auth token in use must have the `DROP:NAME` scope. Dropping a pipe [¶](https://www.tinybird.co/docs/about:blank#id23) curl -X DELETE \ -H "Authorization: Bearer " \ "https://api.tinybird.co/v0/pipes/:name" PUT /v0/pipes/(.+) [¶](https://www.tinybird.co/docs/about:blank#put--v0-pipes-(.+)) Changes Pipe’s metadata. When there is another Pipe with the same name an error is raised. editing a pipe [¶](https://www.tinybird.co/docs/about:blank#id24) curl -X PUT \ -H "Authorization: Bearer " \ "https://api.tinybird.co/v0/pipes/:name?name=new_name"| Key | Type | Description | | --- | --- | --- | | name | String | new name for the pipe | | description | String | new Markdown description for the pipe | | token | String | Auth token. 
Ensure it has the `PIPE:CREATE` scope on it | --- URL: https://www.tinybird.co/docs/api-reference/pipe-api/api-endpoints Last update: 2024-12-30T16:50:50.000Z Content: --- title: "Pipes API endpoints reference · Tinybird Docs" theme-color: "#171612" description: "The Pipes API helps you manage your Pipes. Use the API Endpoints service to publish or unpublish your Pipes as API Endpoints." --- POST /v0/pipes/(.+)/nodes/(.+)/endpoint [¶](https://www.tinybird.co/docs/about:blank#post--v0-pipes-(.+)-nodes-(.+)-endpoint) Publishes an API endpoint Publishing an endpoint [¶](https://www.tinybird.co/docs/about:blank#id1) curl -X POST \ -H "Authorization: Bearer " \ "https://api.tinybird.co/v0/pipes/:pipe/nodes/:node/endpoint" Successful response [¶](https://www.tinybird.co/docs/about:blank#id2) { "id": "t_60d8f84ce5d349b28160013ce99758c7", "name": "my_pipe", "description": "this is my pipe description", "nodes": [{ "id": "t_bd1e095da943494d9410a812b24cea81", "name": "get_all", "sql": "SELECT * FROM my_datasource", "description": "This is a description for the **first** node", "materialized": null, "cluster": null, "dependencies": [ "my_datasource" ], "tags": {}, "created_at": "2019-09-03 19:56:03.704840", "updated_at": "2019-09-04 07:05:53.191437", "version": 0, "project": null, "result": null, "ignore_sql_errors": false }], "endpoint": "t_bd1e095da943494d9410a812b24cea81", "created_at": "2019-09-03 19:56:03.193446", "updated_at": "2019-09-10 07:18:39.797083", "parent": null } The response will contain a `token` if there’s a **unique READ token** for this pipe. You could use this token to share your endpoint. | Code | Description | | --- | --- | | 200 | No error | | 400 | Wrong node id | | 403 | Forbidden. Provided token doesn’t have permissions to publish a pipe, it needs `ADMIN` or `PIPE:CREATE` | | 404 | Pipe not found | DELETE /v0/pipes/(.+)/nodes/(.+)/endpoint [¶](https://www.tinybird.co/docs/about:blank#delete--v0-pipes-(.+)-nodes-(.+)-endpoint) Unpublishes an API endpoint Unpublishing an endpoint [¶](https://www.tinybird.co/docs/about:blank#id4) curl -X DELETE \ -H "Authorization: Bearer " \ "https://api.tinybird.co/v0/pipes/:pipe/nodes/:node/endpoint"| Code | Description | | --- | --- | | 200 | No error | | 400 | Wrong node id | | 403 | Forbidden. Provided token doesn’t have permissions to publish a pipe, it needs `ADMIN` or `PIPE:CREATE` | | 404 | Pipe not found | --- URL: https://www.tinybird.co/docs/api-reference/pipe-api/copy-pipes-api Last update: 2024-12-30T16:50:50.000Z Content: --- title: "Pipes API Copy Pipes reference · Tinybird Docs" theme-color: "#171612" description: "The Pipes API enables you to manage your Pipes. Use the Copy Pipes service to create, delete, schedule, and trigger Copy jobs." --- POST /v0/pipes/(.+)/nodes/(.+)/copy [¶](https://www.tinybird.co/docs/about:blank#post--v0-pipes-(.+)-nodes-(.+)-copy) Calling this endpoint sets the pipe as a Copy one with the given settings. Scheduling is optional. To run the actual copy after you set the pipe as a Copy one, you must call the POST `/v0/pipes/:pipe/copy` endpoint. If you need to change the target Data Source or the scheduling configuration, you can call PUT endpoint. Restrictions: - You can set only one schedule per Copy pipe. - You can’t set a Copy pipe if the pipe is already materializing. You must unlink the Materialization first. - You can’t set a Copy pipe if the pipe is already an endpoint. You must unpublish the endpoint first. 
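If the Pipe is currently published as an API Endpoint, unpublish it first using the API Endpoints service documented above. A minimal sketch, assuming your admin Token is in the `TOKEN` environment variable and `:pipe` and `:node` stand for your Pipe and node names:

curl -X DELETE \
  -H "Authorization: Bearer $TOKEN" \
  "https://api.tinybird.co/v0/pipes/:pipe/nodes/:node/endpoint"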
Setting the pipe as a Copy pipe [¶](https://www.tinybird.co/docs/about:blank#id1) curl -X POST \ -H "Authorization: Bearer " \ "https://api.tinybird.co/v0/pipes/:pipe/nodes/:node/copy" \ -d "target_datasource=my_destination_datasource" \ -d "schedule_cron=*/15 * * * *"| Key | Type | Description | | --- | --- | --- | | token | String | Auth token. Ensure it has the `PIPE:CREATE` and `DATASOURCE:APPEND` scopes on it | | target_datasource | String | Name or id of the target Data Source. | | schedule_cron | String | Optional. A crontab expression. | Successful response [¶](https://www.tinybird.co/docs/about:blank#id3) { "id": "t_3aa11a5cabd1482c905bc8dfc551a84d", "name": "my_copy_pipe", "description": "This is a pipe to copy", "type": "copy", "endpoint": null, "created_at": "2023-03-01 10:14:04.497505", "updated_at": "2023-03-01 10:34:19.113518", "parent": null, "copy_node": "t_33ec8ac3c3324a53822fded61a83dbbd", "copy_target_datasource": "t_0be6161a5b7b4f6180b10325643e0b7b", "copy_target_workspace": "5a70f2f5-9635-47bf-96a9-7b50362d4e2f", "nodes": [{ "id": "t_33ec8ac3c3324a53822fded61a83dbbd", "name": "emps", "sql": "SELECT * FROM employees WHERE starting_date > '2016-01-01 00:00:00'", "description": null, "materialized": null, "cluster": null, "mode": "append", "tags": { "copy_target_datasource": "t_0be6161a5b7b4f6180b10325643e0b7b", "copy_target_workspace": "5a70f2f5-9635-47bf-96a9-7b50362d4e2f" }, "created_at": "2023-03-01 10:14:04.497547", "updated_at": "2023-03-01 10:14:04.497547", "version": 0, "project": null, "result": null, "ignore_sql_errors": false, "dependencies": [ "employees" ], "params": [] }] } DELETE /v0/pipes/(.+)/nodes/(.+)/copy [¶](https://www.tinybird.co/docs/about:blank#delete--v0-pipes-(.+)-nodes-(.+)-copy) Removes the Copy type from the pipe. Removing the Copy type deletes neither the node nor the pipe. The pipe remains, but any schedule and copy settings stop applying. Unsetting the pipe as a Copy pipe [¶](https://www.tinybird.co/docs/about:blank#id4) curl -X DELETE \ -H "Authorization: Bearer " \ "https://api.tinybird.co/v0/pipes/:pipe/nodes/:node/copy"| Code | Description | | --- | --- | | 204 | No error | | 400 | Wrong node id | | 403 | Forbidden. Provided token doesn’t have permissions to publish a pipe, it needs `ADMIN` or `PIPE:CREATE` | | 404 | Pipe not found | PUT /v0/pipes/(.+)/nodes/(.+)/copy [¶](https://www.tinybird.co/docs/about:blank#put--v0-pipes-(.+)-nodes-(.+)-copy) Calling this endpoint updates a Copy pipe with the given settings: you can change its target Data Source, as well as add or modify its schedule. Updating a Copy Pipe [¶](https://www.tinybird.co/docs/about:blank#id6) curl -X PUT \ -H "Authorization: Bearer " \ "https://api.tinybird.co/v0/pipes/:pipe/nodes/:node/copy" \ -d "target_datasource=other_destination_datasource" \ -d "schedule_cron=*/15 * * * *"| Key | Type | Description | | --- | --- | --- | | token | String | Auth token. Ensure it has the `PIPE:CREATE` scope on it | | target_datasource | String | Optional. Name or id of the target Data Source. | | schedule_cron | String | Optional. A crontab expression.
If `schedule_cron='None'`, the schedule, if it was defined, is removed from the Copy pipe | Successful response [¶](https://www.tinybird.co/docs/about:blank#id8) { "id": "t_3aa11a5cabd1482c905bc8dfc551a84d", "name": "my_copy_pipe", "description": "This is a pipe to copy", "type": "copy", "endpoint": null, "created_at": "2023-03-01 10:14:04.497505", "updated_at": "2023-03-01 10:34:19.113518", "parent": null, "copy_node": "t_33ec8ac3c3324a53822fded61a83dbbd", "copy_target_datasource": "t_2f046a4b2cc44137834a35420a533465", "copy_target_workspace": "5a70f2f5-9635-47bf-96a9-7b50362d4e2f", "nodes": [{ "id": "t_33ec8ac3c3324a53822fded61a83dbbd", "name": "emps", "sql": "SELECT * FROM employees WHERE starting_date > '2016-01-01 00:00:00'", "description": null, "materialized": null, "cluster": null, "mode": "append", "tags": { "copy_target_datasource": "t_2f046a4b2cc44137834a35420a533465", "copy_target_workspace": "5a70f2f5-9635-47bf-96a9-7b50362d4e2f" }, "created_at": "2023-03-01 10:14:04.497547", "updated_at": "2023-03-07 09:08:34.206123", "version": 0, "project": null, "result": null, "ignore_sql_errors": false, "dependencies": [ "employees" ], "params": [] }] } POST /v0/pipes/(.+)/copy [¶](https://www.tinybird.co/docs/about:blank#post--v0-pipes-(.+)-copy) Runs a copy job, using the settings previously set in the pipe. You can use this URL to do an on-demand copy. This URL is also used by the scheduler to make the scheduled calls. This URL accepts parameters, just like a regular endpoint. This operation is asynchronous and copies the output of the endpoint to an existing Data Source. Runs a copy job on a Copy pipe [¶](https://www.tinybird.co/docs/about:blank#id9) curl -H "Authorization: Bearer " \ -X POST "https://api.tinybird.co/v0/pipes/:pipe/copy?param1=test&param2=test2"| Key | Type | Description | | --- | --- | --- | | token | String | Auth token. Ensure it has the `PIPE:READ` scope on it | | parameters | String | Optional. The value of the parameters to run the Copy with. They are regular URL query parameters. | | _mode | String | Optional. One of ‘append’ or ‘replace’. Default is ‘append’. | | Code | Description | | --- | --- | | 200 | No error | | 400 | Pipe is not a Copy pipe or there is a problem with the SQL query | | 400 | The columns in the SQL query don’t match the columns in the target Data Source | | 403 | Forbidden. The provided token doesn’t have permissions to append a node to the pipe ( `ADMIN` or `PIPE:READ` and `DATASOURCE:APPEND` ) | | 403 | Job limits exceeded. Tried to copy more than 100M rows, or there are too many active (working and waiting) Copy jobs. | | 404 | Pipe not found, Node not found or Target Data Source not found | The response will not be the final result of the copy but a Job. You can check the job status and progress using the [Jobs API](https://www.tinybird.co/docs/docs/api-reference/jobs-api).
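For example, a minimal sketch of triggering an on-demand copy and then polling the returned job, assuming your Token is in the `TOKEN` environment variable and that `jq` is available to pull the job id out of the response:

# Trigger the copy and capture the job id from the "job" object in the response
JOB_ID=$(curl -s -X POST \
  -H "Authorization: Bearer $TOKEN" \
  "https://api.tinybird.co/v0/pipes/:pipe/copy" | jq -r '.job.id')

# Check the job status and progress (for example "waiting", "working" or "done")
curl -s \
  -H "Authorization: Bearer $TOKEN" \
  "https://api.tinybird.co/v0/jobs/$JOB_ID"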
Successful response [¶](https://www.tinybird.co/docs/about:blank#id12) { "id": "t_33ec8ac3c3324a53822fded61a83dbbd", "name": "emps", "sql": "SELECT * FROM employees WHERE starting_date > '2016-01-01 00:00:00'", "description": null, "materialized": null, "cluster": null, "tags": { "copy_target_datasource": "t_0be6161a5b7b4f6180b10325643e0b7b", "copy_target_workspace": "5a70f2f5-9635-47bf-96a9-7b50362d4e2f" }, "created_at": "2023-03-01 10:14:04.497547", "updated_at": "2023-03-01 10:14:04.497547", "version": 0, "project": null, "result": null, "ignore_sql_errors": false, "dependencies": [ "employees" ], "params": [], "job": { "kind": "copy", "id": "f0b2f107-0af8-4c28-a83b-53053cb45f0f", "job_id": "f0b2f107-0af8-4c28-a83b-53053cb45f0f", "status": "waiting", "created_at": "2023-03-01 10:41:07.398102", "updated_at": "2023-03-01 10:41:07.398128", "started_at": null, "is_cancellable": true, "datasource": { "id": "t_0be6161a5b7b4f6180b10325643e0b7b" }, "query_id": "19a8d613-b424-4afd-95f1-39cfbd87e827", "query_sql": "SELECT * FROM d_b0ca70.t_25f928e33bcb40bd8e8999e69cb02f94 AS employees WHERE starting_date > '2016-01-01 00:00:00'", "pipe_id": "t_3aa11a5cabd1482c905bc8dfc551a84d", "pipe_name": "copy_emp", "job_url": "https://api.tinybird.co/v0/jobs/f0b2f107-0af8-4c28-a83b-53053cb45f0f" } } POST /v0/pipes/(.+)/copy/pause [¶](https://www.tinybird.co/docs/about:blank#post--v0-pipes-(.+)-copy-pause) Pauses the scheduling. This affects any future scheduled Copy job. Any copy operation currently copying data will be completed. Pauses a scheduled copy [¶](https://www.tinybird.co/docs/about:blank#id13) curl -X POST \ -H "Authorization: Bearer " \ "https://api.tinybird.co/v0/pipes/:pipe/copy/pause"| Code | Description | | --- | --- | | 200 | Scheduled copy paused correctly | | 400 | Pipe is not copy | | 404 | Pipe not found, Scheduled copy for pipe not found | POST /v0/pipes/(.+)/copy/resume [¶](https://www.tinybird.co/docs/about:blank#post--v0-pipes-(.+)-copy-resume) Resumes a previously paused scheduled copy. Resumes a Scheduled copy [¶](https://www.tinybird.co/docs/about:blank#id15) curl -X POST \ -H "Authorization: Bearer " \ "https://api.tinybird.co/v0/pipes/:pipe/copy/resume"| Code | Description | | --- | --- | | 200 | Scheduled copy resumed correctly | | 400 | Pipe is not copy | | 404 | Pipe not found, Scheduled copy for pipe not found | POST /v0/pipes/(.+)/copy/cancel [¶](https://www.tinybird.co/docs/about:blank#post--v0-pipes-(.+)-copy-cancel) Cancels jobs that are working or waiting that are tied to the pipe and pauses the scheduling of copy jobs for this pipe. To allow scheduled copy jobs to run for the pipe, the copy pipe must be resumed and the already cancelled jobs will not be resumed. 
Cancels scheduled copy jobs tied to the pipe [¶](https://www.tinybird.co/docs/about:blank#id17) curl -X POST \ -H "Authorization: Bearer " \ "https://api.tinybird.co/v0/pipes/:pipe/copy/cancel"| Code | Description | | --- | --- | | 200 | Scheduled copy pipe cancelled correctly | | 400 | Pipe is not copy | | 400 | Job is not in cancellable status | | 400 | Job is already being cancelled | | 404 | Pipe not found, Scheduled copy for pipe not found | Successful response [¶](https://www.tinybird.co/docs/about:blank#id19) { "id": "t_fb56a87a520441189a5a6d61f8d968f4", "name": "scheduled_copy_pipe", "description": "none", "endpoint": "none", "created_at": "2023-06-09 10:54:21.847433", "updated_at": "2023-06-09 10:54:21.897854", "parent": "none", "type": "copy", "copy_node": "t_bb96e50cb1b94ffe9e598f870d88ad1b", "copy_target_datasource": "t_3f7e6534733f425fb1add6229ca8be4b", "copy_target_workspace": "8119d519-80b2-454a-a114-b092aea3b9b0", "schedule": { "timezone": "Etc/UTC", "cron": "0 * * * *", "status": "paused" }, "nodes": [ { "id": "t_bb96e50cb1b94ffe9e598f870d88ad1b", "name": "scheduled_copy_pipe_0", "sql": "SELECT * FROM landing_ds", "description": "none", "materialized": "none", "cluster": "none", "tags": { "copy_target_datasource": "t_3f7e6534733f425fb1add6229ca8be4b", "copy_target_workspace": "8119d519-80b2-454a-a114-b092aea3b9b0" }, "created_at": "2023-06-09 10:54:21.847451", "updated_at": "2023-06-09 10:54:21.847451", "version": 0, "project": "none", "result": "none", "ignore_sql_errors": "false", "node_type": "copy", "dependencies": [ "landing_ds" ], "params": [] } ], "cancelled_jobs": [ { "id": "ced3534f-8b5e-4fe0-8dcc-4369aa256b11", "kind": "copy", "status": "cancelled", "created_at": "2023-06-09 07:54:21.921446", "updated_at": "2023-06-09 10:54:22.043272", "job_url": "https://api.tinybird.co/v0/jobsjobs/ced3534f-8b5e-4fe0-8dcc-4369aa256b11", "is_cancellable": "false", "pipe_id": "t_fb56a87a520441189a5a6d61f8d968f4", "pipe_name": "pipe_test_scheduled_copy_pipe_cancel_multiple_jobs", "datasource": { "id": "t_3f7e6534733f425fb1add6229ca8be4b", "name": "target_ds_test_scheduled_copy_pipe_cancel_multiple_jobs" } }, { "id": "b507ded9-9862-43ae-8863-b6de17c3f914", "kind": "copy", "status": "cancelling", "created_at": "2023-06-09 07:54:21.903036", "updated_at": "2023-06-09 10:54:22.044837", "job_url": "https://api.tinybird.co/v0/jobsb507ded9-9862-43ae-8863-b6de17c3f914", "is_cancellable": "false", "pipe_id": "t_fb56a87a520441189a5a6d61f8d968f4", "pipe_name": "pipe_test_scheduled_copy_pipe_cancel_multiple_jobs", "datasource": { "id": "t_3f7e6534733f425fb1add6229ca8be4b", "name": "target_ds_test_scheduled_copy_pipe_cancel_multiple_jobs" } } ] } --- URL: https://www.tinybird.co/docs/api-reference/pipe-api/materialized-views Last update: 2024-12-30T16:50:50.000Z Content: --- title: "Pipes API Materialized Views and Populates reference · Tinybird Docs" theme-color: "#171612" description: "The Pipes API enables you to manage your Pipes. Use the Materialized Views and Populates service to create, delete, or populate Materialized Views." --- POST /v0/pipes/(.+)/nodes/(.+)/population [¶](https://www.tinybird.co/docs/about:blank#post--v0-pipes-(.+)-nodes-(.+)-population) Populates a Materialized View Populating a Materialized View [¶](https://www.tinybird.co/docs/about:blank#id1) curl -H "Authorization: Bearer " \ -X POST "https://api.tinybird.co/v0/pipes/:pipe/nodes/:node/population" \ -d "populate_condition=toYYYYMM(date) = 202203" The response will not be the final result of the import but a Job. 
You can check the job status and progress using the [Jobs API](https://www.tinybird.co/docs/docs/api-reference/jobs-api). Alternatively you can use a query like this to check the operations related to the populate Job: Check populate jobs in the datasources_ops_log including dependent Materialized Views triggered [¶](https://www.tinybird.co/docs/about:blank#id2) SELECT * FROM tinybird. datasources_ops_log WHERE timestamp > now () - INTERVAL 1 DAY AND operation_id IN ( SELECT operation_id FROM tinybird. datasources_ops_log WHERE timestamp > now () - INTERVAL 1 DAY and datasource_id = '{the_datasource_id}' and job_id = '{the_job_id}' ) ORDER BY timestamp ASC When a populate job fails for the first time, the Materialized View is automatically unlinked. In that case you can get failed population jobs and their errors to fix them with a query like this: Check failed populate jobs [¶](https://www.tinybird.co/docs/about:blank#id3) SELECT * FROM tinybird. datasources_ops_log WHERE datasource_id = '{the_datasource_id}' AND pipe_name = '{the_pipe_name}' AND event_type LIKE 'populateview%' AND result = 'error' ORDER BY timestamp ASC Alternatively you can use the `unlink_on_populate_error='true'` flag to always unlink the Materialized View if the populate job does not work as expected. | Key | Type | Description | | --- | --- | --- | | token | String | Auth token. Ensure it has the `PIPE:CREATE` scope on it | | populate_subset | Float | Optional. Populate with a subset percent of the data (limited to a maximum of 2M rows), this is useful to quickly test a materialized node with some data. The subset must be greater than 0 and lower than 0.1. A subset of 0.1 means a 10 percent of the data in the source Data Source will be used to populate the Materialized View. It has precedence over `populate_condition` | | populate_condition | String | Optional. Populate with a SQL condition to be applied to the trigger Data Source of the Materialized View. For instance, `populate_condition='date == toYYYYMM(now())'` it’ll populate taking all the rows from the trigger Data Source which `date` is the current month. `populate_condition` is not taken into account if the `populate_subset` param is present. Including in the `populate_condition` any column present in the Data Source `engine_sorting_key` will make the populate job process less data. | | truncate | String | Optional. Default is `false` . Populates over existing data, useful to populate past data while new data is being ingested. Use `true` to truncate the Data Source before populating. | | unlink_on_populate_error | String | Optional. Default is `false` . If the populate job fails the Materialized View is unlinked and new data won’t be ingested in the Materialized View. | | Code | Description | | --- | --- | | 200 | No error | | 400 | Node is not materialized | | 403 | Forbidden. Provided token doesn’t have permissions to append a node to the pipe, it needs `ADMIN` or `PIPE:CREATE` | | 404 | Pipe not found, Node not found | POST /v0/pipes/(.+)/nodes/(.+)/materialization [¶](https://www.tinybird.co/docs/about:blank#post--v0-pipes-(.+)-nodes-(.+)-materialization) Creates a Materialized View Creating a Materialized View [¶](https://www.tinybird.co/docs/about:blank#id6) curl \ -H "Authorization: Bearer " \ -X POST "https://api.tinybird.co/v0/pipes/:pipe/nodes/:node/materialization?datasource=my_data_source_name&populate=true"| Key | Type | Description | | --- | --- | --- | | token | String | Auth token. 
Ensure it has the `PIPE:CREATE` scope on it | | datasource | String | Required. Specifies the name of the destination Data Source where the Materialized View schema is defined. If the Data Source doesn't exist, it's created automatically with the default settings. | | override_datasource | Boolean | Optional. Default `false` . When the target Data Source of the Materialized View exists in the Workspace, it's overridden by the `datasource` specified in the request. | | populate | Boolean | Optional. Default `false` . When `true` , a job is triggered to populate the destination Data Source. | | populate_subset | Float | Optional. Populate with a subset percent of the data (limited to a maximum of 2M rows). This is useful to quickly test a materialized node with some data. The subset must be greater than 0 and lower than 0.1. A subset of 0.1 means 10 percent of the data in the source Data Source will be used to populate the Materialized View. Use it together with `populate=true` ; it takes precedence over `populate_condition` | | populate_condition | String | Optional. Populate with a SQL condition to be applied to the trigger Data Source of the Materialized View. For instance, `populate_condition='date == toYYYYMM(now())'` populates using all the rows from the trigger Data Source whose `date` falls in the current month. Use it together with `populate=true` . `populate_condition` is not taken into account if the `populate_subset` param is present. Including in the `populate_condition` any column present in the Data Source `engine_sorting_key` will make the populate job process less data. | | unlink_on_populate_error | String | Optional. Default is `false` . If the populate job fails, the Materialized View is unlinked and new data won’t be ingested into it. | | engine | String | Optional. Engine for the destination Materialized View. If the Data Source already exists, the settings are not overridden. | | engine_* | String | Optional. Engine parameters and options. Requires the `engine` parameter. If the Data Source already exists, the settings are not overridden.[ Check Engine Parameters and Options for more details](https://www.tinybird.co/docs/docs/api-reference/datasource-api) | The SQL query for the materialized node must be sent in the body, encoded in UTF-8 | Code | Description | | --- | --- | | 200 | No error | | 400 | Node already being materialized | | 403 | Forbidden. Provided token doesn’t have permissions to append a node to the pipe, it needs `ADMIN` or `PIPE:CREATE` | | 404 | Pipe not found, Node not found | | 409 | The Materialized View already exists or `override_datasource` cannot be performed | DELETE /v0/pipes/(.+)/nodes/(.+)/materialization [¶](https://www.tinybird.co/docs/about:blank#delete--v0-pipes-(.+)-nodes-(.+)-materialization) Removes a Materialized View. Removing a Materialized View deletes neither the Data Source nor the Node. The Data Source remains, but stops receiving data from the Node. Removing a Materialized View [¶](https://www.tinybird.co/docs/about:blank#id9) curl -H "Authorization: Bearer " \ -X DELETE "https://api.tinybird.co/v0/pipes/:pipe/nodes/:node/materialization"| Code | Description | | --- | --- | | 204 | No error, Materialized View removed | | 403 | Forbidden.
Provided token doesn’t have permissions to append a node to the pipe, it needs `ADMIN` or `PIPE:CREATE` | | 404 | Pipe not found, Node not found | --- URL: https://www.tinybird.co/docs/api-reference/query-api Last update: 2024-12-30T16:50:50.000Z Content: --- title: "Query API Reference · Tinybird Docs" theme-color: "#171612" description: "The Query API allows you to query your Pipes inside Tinybird as if you were running SQL statements against a regular database." --- GET /v0/sql [¶](https://www.tinybird.co/docs/about:blank#get--v0-sql) Executes a SQL query using the engine. Running sql queries against your data [¶](https://www.tinybird.co/docs/about:blank#id1) curl --data "SELECT * FROM " https://api.tinybird.co/v0/sql As a response, it gives you the query metadata, the resulting data and some performance statistics. Successful response [¶](https://www.tinybird.co/docs/about:blank#id2) { "meta": [ { "name": "VendorID", "type": "Int32" }, { "name": "tpep_pickup_datetime", "type": "DateTime" } ], "data": [ { "VendorID": 2, "tpep_pickup_datetime": "2001-01-05 11:45:23", "tpep_dropoff_datetime": "2001-01-05 11:52:05", "passenger_count": 5, "trip_distance": 1.53, "RatecodeID": 1, "store_and_fwd_flag": "N", "PULocationID": 71, "DOLocationID": 89, "payment_type": 2, "fare_amount": 7.5, "extra": 0.5, "mta_tax": 0.5, "tip_amount": 0, "tolls_amount": 0, "improvement_surcharge": 0.3, "total_amount": 8.8 }, { "VendorID": 2, "tpep_pickup_datetime": "2002-12-31 23:01:55", "tpep_dropoff_datetime": "2003-01-01 14:59:11" } ], "rows": 3, "rows_before_limit_at_least": 4, "statistics": { "elapsed": 0.00091042, "rows_read": 4, "bytes_read": 296 } } Data can be fetched in different formats. Just append `FORMAT ` to your SQL query: Requesting different formats with SQL [¶](https://www.tinybird.co/docs/about:blank#id3) SELECT count () from < pipe > FORMAT JSON| Key | Type | Description | | --- | --- | --- | | q | String | The SQL query | | pipeline | String | (Optional) The name of the pipe. It allows writing a query like ‘SELECT * FROM _’ where ‘_’ is a placeholder for the ‘pipeline’ parameter | | output_format_json_quote_64bit_integers | int | (Optional) Controls quoting of 64-bit or bigger integers (like UInt64 or Int128) when they are output in a JSON format. Such integers are enclosed in quotes by default. This behavior is compatible with most JavaScript implementations. Possible values: 0 — Integers are output without quotes. 1 — Integers are enclosed in quotes. Default value is 0 | | output_format_json_quote_denormals | int | (Optional) Controls representation of inf and nan on the UI instead of null e.g when dividing by 0 - inf and when there is no representation of a number in Javascript - nan. Possible values: 0 - disabled, 1 - enabled. Default value is 0 | | output_format_parquet_string_as_string | int | (Optional) Use Parquet String type instead of Binary for String columns. Possible values: 0 - disabled, 1 - enabled. Default value is 0 | | format | Description | | --- | --- | | CSV | CSV without header | | CSVWithNames | CSV with header | | JSON | JSON including data, statistics and schema information | | TSV | TSV without header | | TSVWithNames | TSV with header | | PrettyCompact | Formatted table | | JSONEachRow | Newline-delimited JSON values (NDJSON) | | Parquet | Apache Parquet | | Prometheus | Prometheus text-based format | As you can see in the example above, timestamps do not include a time zone in their serialization. 
Let’s see how that relates to timestamps ingested from your original data: - If the original timestamp had no time zone associated, you’ll read back the same date and time verbatim. If you ingested the timestamp `2022-11-14 11:08:46` , for example, Tinybird sends `"2022-11-14 11:08:46"` back. This is so regardless of the time zone of the column in ClickHouse. - If the original timestamp had a time zone associated, you’ll read back the corresponding date and time in the time zone of the destination column in ClickHouse, which is UTC by default. If you ingested `2022-11-14 12:08:46.574295 +0100` , for instance, Tinybird sends `"2022-11-14 11:08:46"` back for a `DateTime` , and `"2022-11-14 06:08:46"` for a `DateTime('America/New_York')` . POST /v0/sql [¶](https://www.tinybird.co/docs/about:blank#post--v0-sql) Executes a SQL query using the engine, while providing a templated or non templated query string and the custom parameters that will be translated into the query. The custom parameters provided should not have the same name as the request parameters for this endpoint (outlined below), as they are reserved in order to get accurate results for your query. Running sql queries against your data [¶](https://www.tinybird.co/docs/about:blank#id6) For example: 1. Providing the value to the query via the POST body: curl -X POST \ -H "Authorization: Bearer " \ -H "Content-Type: application/json" \ "https://api.tinybird.co/v0/sql" -d \ '{ "q":"% SELECT * FROM where column_name = {{String(column_name)}}", "column_name": "column_name_value" }' 2. Providing a new value to the query from the one defined within the pipe in the POST body: curl -X POST \ -H "Authorization: Bearer " \ -H "Content-Type: application/json" \ "https://api.tinybird.co/v0/sql" -d \ '{ "q":"% SELECT * FROM where column_name = {{String(column_name, "column_name_value")}}", "column_name": "new_column_name_value" }' 3. Providing a non template query in the POST body: curl -X POST \ -H "Authorization: Bearer " \ -H "Content-Type: application/json" \ "https://api.tinybird.co/v0/sql" -d \ '{ "q":"SELECT * FROM " }' 4. Providing a non template query as a string in the POST body with a content type of "text/plain" : curl -X POST \ -H "Authorization: Bearer " \ -H "Content-Type: text/plain" \ "https://api.tinybird.co/v0/sql" -d "SELECT * FROM "| Key | Type | Description | | --- | --- | --- | | pipeline | String | (Optional) The name of the pipe. It allows writing a query like ‘SELECT * FROM _’ where ‘_’ is a placeholder for the ‘pipeline’ parameter | | output_format_json_quote_64bit_integers | int | (Optional) Controls quoting of 64-bit or bigger integers (like UInt64 or Int128) when they are output in a JSON format. Such integers are enclosed in quotes by default. This behavior is compatible with most JavaScript implementations. Possible values: 0 — Integers are output without quotes. 1 — Integers are enclosed in quotes. Default value is 0 | | output_format_json_quote_denormals | int | (Optional) Controls representation of inf and nan on the UI instead of null e.g when dividing by 0 - inf and when there is no representation of a number in Javascript - nan. Possible values: 0 - disabled, 1 - enabled. Default value is 0 | | output_format_parquet_string_as_string | int | (Optional) Use Parquet String type instead of Binary for String columns. Possible values: 0 - disabled, 1 - enabled. Default value is 0 | As a response, it gives you the query metadata, the resulting data and some performance statistics. 
Successful response [¶](https://www.tinybird.co/docs/about:blank#id8) { "meta": [ { "name": "VendorID", "type": "Int32" }, { "name": "tpep_pickup_datetime", "type": "DateTime" } ], "data": [ { "VendorID": 2, "tpep_pickup_datetime": "2001-01-05 11:45:23", "tpep_dropoff_datetime": "2001-01-05 11:52:05", "passenger_count": 5, "trip_distance": 1.53, "RatecodeID": 1, "store_and_fwd_flag": "N", "PULocationID": 71, "DOLocationID": 89, "payment_type": 2, "fare_amount": 7.5, "extra": 0.5, "mta_tax": 0.5, "tip_amount": 0, "tolls_amount": 0, "improvement_surcharge": 0.3, "total_amount": 8.8 }, { "VendorID": 2, "tpep_pickup_datetime": "2002-12-31 23:01:55", "tpep_dropoff_datetime": "2003-01-01 14:59:11" } ], "rows": 3, "rows_before_limit_at_least": 4, "statistics": { "elapsed": 0.00091042, "rows_read": 4, "bytes_read": 296 } } Data can be fetched in different formats. Just append `FORMAT ` to your SQL query: Requesting different formats with SQL [¶](https://www.tinybird.co/docs/about:blank#id9) SELECT count () from < pipe > FORMAT JSON| format | Description | | --- | --- | | CSV | CSV without header | | CSVWithNames | CSV with header | | JSON | JSON including data, statistics and schema information | | TSV | TSV without header | | TSVWithNames | TSV with header | | PrettyCompact | Formatted table | | JSONEachRow | Newline-delimited JSON values (NDJSON) | | Parquet | Apache Parquet | As you can see in the example above, timestamps do not include a time zone in their serialization. Let’s see how that relates to timestamps ingested from your original data: - If the original timestamp had no time zone associated, you’ll read back the same date and time verbatim. If you ingested the timestamp `2022-11-14 11:08:46` , for example, Tinybird sends `"2022-11-14 11:08:46"` back. This is so regardless of the time zone of the column in ClickHouse. - If the original timestamp had a time zone associated, you’ll read back the corresponding date and time in the time zone of the destination column in ClickHouse, which is UTC by default. If you ingested `2022-11-14 12:08:46.574295 +0100` , for instance, Tinybird sends `"2022-11-14 11:08:46"` back for a `DateTime` , and `"2022-11-14 06:08:46"` for a `DateTime('America/New_York')` . --- URL: https://www.tinybird.co/docs/api-reference/sink-pipes-api Last update: 2025-01-07T15:48:10.000Z Content: --- title: "Sink Pipes API Reference · Tinybird Docs" theme-color: "#171612" description: "The Sink Pipes API allows you to create, delete, schedule, and trigger Sink Pipes." --- # Sink Pipes API¶ The Sink Pipes API allows you to create, delete, schedule, and trigger Sink Pipes. ## POST /v0/pipes/\{pipe\_id\}/nodes/\{node\_id\}/sink¶ Set the Pipe as a Sink Pipe, optionally scheduled. Required token permission is `PIPES:CREATE`. ### Restrictions¶ - You can set only one schedule per Sink Pipe. - You can't set a Sink Pipe if the Pipe is already materializing. You must unlink the Materialization first. - You can't set a Sink Pipe if the Pipe is already an API Endpoint. You must unpublish the API Endpoint first. - You can't set a Sink Pipe if the Pipe is already copying. You must unset the copy first. 
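For example, if the Pipe is currently a Copy pipe, a minimal sketch of unsetting the copy first, using the Copy Pipes endpoint documented earlier (admin Token assumed in the `TOKEN` environment variable, `:pipe` and `:node` are placeholders):

curl -X DELETE \
  -H "Authorization: Bearer $TOKEN" \
  "https://api.tinybird.co/v0/pipes/:pipe/nodes/:node/copy"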
### Example¶ ##### Setting the Pipe as a Sink Pipe curl \ -X POST "https://api.tinybird.co/v0/pipes/:pipe/nodes/:node/sink" \ -H "Authorization: Bearer " \ -d "connection=my_connection_name" \ -d "path=s3://bucket-name/prefix" \ -d "file_template=exported_file_template" \ -d "format=csv" \ -d "compression=gz" \ -d "schedule_cron=0 */1 * * *" \ -d "write_strategy=new" ### Request parameters¶ | Key | Type | Description | | --- | --- | --- | | connection | String | Name of the connection to holding the credentials to run the sink | | path | String | Object store prefix into which the sink will write data | | file_template | String | File template string. See[ file template](https://www.tinybird.co/docs/docs/publish/sinks/s3-sink#file-template) for more details | | format | String | Optional. Format of the exported files. Default: CSV | | compression | String | Optional. Compression of the output files. Default: None | | schedule_cron | String | Optional. The sink's execution schedule, in crontab format. | | write_strategy | String | Optional. Default: `new` . The sink's write strategy for filenames already existing in the bucket. Values: `new` , `truncate` ; `new` adds a new file with a suffix, while `truncate` replaces the existent file. | ### Successful response example¶ { "id": "t_529f46626c324674b3a84cd820ac2649", "name": "p_test", "description": null, "endpoint": null, "created_at": "2024-01-18 12:57:36.503834", "updated_at": "2024-01-18 13:01:21.435012", "parent": null, "type": "sink", "last_commit": { "content_sha": "", "path": "", "status": "changed" }, "sink_node": "t_6e8afdb8c691459b80e16541433f951b", "schedule": { "timezone": "Etc/UTC", "cron": "0 */1 * * *", "status": "running" }, "nodes": [ { "id": "t_6e8afdb8c691459b80e16541433f951b", "name": "p_test_0", "sql": "SELECT * FROM test", "description": null, "materialized": null, "cluster": null, "tags": {}, "created_at": "2024-01-18 12:57:36.503843", "updated_at": "2024-01-18 12:57:36.503843", "version": 0, "project": null, "result": null, "ignore_sql_errors": false, "node_type": "sink", "dependencies": [ "test" ], "params": [] } ] } ### Response codes¶ | Code | Description | | --- | --- | | 200 | OK | | 404 | Pipe, Node, or data connector not found, bucket doesn't exist | | 403 | Limit reached, Query includes forbidden keywords, Pipe is already a Sink Pipe, can't assume role | | 401 | Invalid credentials (from connection) | | 400 | Invalid or missing parameters, bad ARN role, invalid region name | ## DELETE /v0/pipes/\{pipe\_id\}/nodes/\{node\_id\}/sink¶ Removes the Sink from the Pipe. This doesn't delete the Pipe nor the Node, only the sink configuration and any associated settings. 
### Example¶ curl \ -X DELETE "https://api.tinybird.co/v0/pipes/$1/nodes/$2/sink" \ -H "Authorization: Bearer " Successful response example { "id": "t_529f46626c324674b3a84cd820ac2649", "name": "p_test", "description": null, "endpoint": null, "created_at": "2024-01-18 12:57:36.503834", "updated_at": "2024-01-19 09:27:12.069650", "parent": null, "type": "default", "last_commit": { "content_sha": "", "path": "", "status": "changed" }, "nodes": [ { "id": "t_6e8afdb8c691459b80e16541433f951b", "name": "p_test_0", "sql": "SELECT * FROM test", "description": null, "materialized": null, "cluster": null, "tags": {}, "created_at": "2024-01-18 12:57:36.503843", "updated_at": "2024-01-19 09:27:12.069649", "version": 0, "project": null, "result": null, "ignore_sql_errors": false, "node_type": "standard", "dependencies": [ "test" ], "params": [] } ], "url": "https://api.split.tinybird.co/v0/pipes/p_test.json" } ### Response codes¶ | Code | Description | | --- | --- | | 200 | OK | | 404 | Pipe, Node, or data connector not found | | 403 | Limit reached, Query includes forbidden keywords, Pipe is already a Sink Pipe | | 400 | Invalid or missing parameters, Pipe isn't a Sink Pipe | ## POST /v0/pipes/\{pipe\_id\}/sink¶ Triggers the Sink Pipe, creating a sink job. Allows overriding some of the sink settings for this particular execution. ### Example¶ ##### Trigger a Sink Pipe with some overrides curl \ -X POST "https://api.tinybird.co/v0/pipes/p_test/sink" \ -H "Authorization: Bearer " \ -d "file_template=export_file" \ -d "format=csv" \ -d "compression=gz" \ -d "write_strategy=truncate" \ -d {key}={val} ### Request parameters¶ | Key | Type | Description | | --- | --- | --- | | connection | String | Name of the connection to holding the credentials to run the sink | | path | String | Object store prefix into which the sink will write data | | file_template | String | File template string. See[ file template](https://www.tinybird.co/docs/docs/publish/sinks/s3-sink#file-template) for more details | | format | String | Optional. Format of the exported files. Default: CSV | | compression | String | Optional. Compression of the output files. Default: None | | write_strategy | String | Optional. The sink's write strategy for filenames already existing in the bucket. Values: `new` , `truncate` ; `new` adds a new file with a suffix, while `truncate` replaces the existent file. | | {key} | String | Optional. Additional variables to be injected into the file template. 
See[ file template](https://www.tinybird.co/docs/docs/publish/sinks/s3-sink#file-template) for more details | ### Successful response example¶ { "id": "t_6e8afdb8c691459b80e16541433f951b", "name": "p_test_0", "sql": "SELECT * FROM test", "description": null, "materialized": null, "cluster": null, "tags": {}, "created_at": "2024-01-18 12:57:36.503843", "updated_at": "2024-01-19 09:27:12.069649", "version": 0, "project": null, "result": null, "ignore_sql_errors": false, "node_type": "sink", "dependencies": [ "test" ], "params": [], "job": { "id": "685e7395-3b08-492b-9fe8-2944859d6a06", "kind": "sink", "status": "waiting", "created_at": "2024-01-19 15:58:46.688525", "updated_at": "2024-01-19 15:58:46.688532", "is_cancellable": true, "job_url": "https://api.split.tinybird.co/v0/jobs/685e7395-3b08-492b-9fe8-2944859d6a06", "pipe": { "id": "t_529f46626c324674b3a84cd820ac2649", "name": "p_test" } } } ### Response codes¶ | Code | Description | | --- | --- | | 200 | OK | | 404 | Pipe, Node, or data connector not found | | 403 | Limit reached, Query includes forbidden keywords, Pipe is already a Sink Pipe | | 400 | Invalid or missing parameters, Pipe isn't a Sink Pipe | ## GET /v0/integrations/s3/policies/trust-policy¶ Retrieves the trust policy to be attached to the IAM role that will be used for the connection. External IDs are different for each Workspace, but shared between Branches of the same Workspace to avoid having to change the trust policy for each Branch. ### Example¶ curl \ -X GET "https://$TB_HOST/v0/integrations/s3/policies/trust-policy" \ -H "Authorization: Bearer " Successful response example { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "sts:AssumeRole", "Principal": { "AWS": "arn:aws:iam::123456789:root" }, "Condition": { "StringEquals": { "sts:ExternalId": "c6ee2795-aae3-4a55-a7a1-92d92fab0e41" } } } ] } ### Response codes¶ | Code | Description | | --- | --- | | 200 | OK | | 404 | S3 integration not supported in your region | ## GET /v0/integrations/s3/policies/write-access-policy¶ Retrieves the trust policy to be attached to the IAM Role that will be used for the connection. External IDs are different for each Workspace, but shared between branches of the same Workspace to avoid having to change the trust policy for each branch. ### Example¶ curl \ -X GET "https://$TB_HOST/v0/integrations/s3/policies/write-access-policy?bucket=test-bucket" \ -H "Authorization: Bearer " ### Request parameters¶ | Key | Type | Description | | --- | --- | --- | | bucket | Optional[String] | Bucket to use for rendering the template. If not provided the '' placeholder is used | ### Successful response example¶ { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:GetBucketLocation", "s3:ListBucket" ], "Resource": "arn:aws:s3:::" }, { "Effect": "Allow", "Action": [ "s3:GetObject", "s3:PutObject", "s3:PutObjectAcl" ], "Resource": "arn:aws:s3:::/*" } ] } ### Response codes¶ | Code | Description | | --- | --- | | 200 | OK | ## GET /v0/integrations/s3/settings¶ Retrieves the settings to be attached to the IAM role that will be used for the connection. External IDs are different for each Workspace, but shared between Branches of the same Workspace to avoid having to generate specific IAM roles for each of the Branches. 
### Example¶

curl \
-X GET "https://$TB_HOST/v0/integrations/s3/settings" \
-H "Authorization: Bearer "

Successful response example

{ "principal": "arn:aws:iam:::root", "external_id": "" }

### Response codes¶

| Code | Description |
| --- | --- |
| 200 | OK |
| 404 | S3 integration not supported in your region |

## GET /v0/datasources-bigquery-credentials¶

Retrieves the Workspace's GCP service account that must be authorized to write to the destination bucket.

### Example¶

curl \
-X POST "${TINYBIRD_HOST}/v0/connectors" \
-H "Authorization: Bearer " \
-d "service=gcs_service_account" \
-d "name="

### Request parameters¶

None

### Successful response example¶

{ "account": "cdk-E-d83f6d01-b5c1-40-43439d@development-353413.iam.gserviceaccount.com" }

### Response codes¶

| Code | Description |
| --- | --- |
| 200 | OK |
| 503 | Feature not enabled in your region |

--- URL: https://www.tinybird.co/docs/api-reference/token-api Last update: 2024-12-30T16:50:50.000Z Content: --- title: "Token API Reference · Tinybird Docs" theme-color: "#171612" description: "In order to read, append or import data into your Tinybird account, you'll need a Token with the right permissions." ---

## GET /v0/tokens/?¶

Retrieves all Workspace Static Tokens.

Get all tokens

curl -X GET \
-H "Authorization: Bearer " \
"https://api.tinybird.co/v0/tokens"

A list of your Static Tokens and their scopes is sent in the response.

Successful response

{ "tokens": [ { "name": "admin token", "description": "", "scopes": [ { "type": "ADMIN" } ], "token": "p.token" }, { "name": "import token", "description": "", "scopes": [ { "type": "DATASOURCES:CREATE" } ], "token": "p.token0" }, { "name": "token name 1", "description": "", "scopes": [ { "type": "DATASOURCES:READ", "resource": "table_name_1" }, { "type": "DATASOURCES:APPEND", "resource": "table_name_1" } ], "token": "p.token1" }, { "name": "token name 2", "description": "", "scopes": [ { "type": "PIPES:READ", "resource": "pipe_name_2" } ], "token": "p.token2" } ] }

## POST /v0/tokens/?¶

Creates a new Token, either Static or JWT.

Creating a new Static Token

curl -X POST \
-H "Authorization: Bearer " \
"https://api.tinybird.co/v0/tokens/" \
-d "name=test&scope=DATASOURCES:APPEND:table_name&scope=DATASOURCES:READ:table_name"

| Key | Type | Description |
| --- | --- | --- |
| name | String | Name of the token |
| description | String | Optional. Markdown text with a description of the token. |
| scope | String | Scope(s) to set. Format is `SCOPE:TYPE[:arg][:filter]` . This is only used for Static Tokens |

Successful response

{ "name": "token_name", "description": "", "scopes": [ { "type": "DATASOURCES:APPEND", "resource": "table_name" }, { "type": "DATASOURCES:READ", "resource": "table_name", "filter": "department = 1" } ], "token": "p.token" }

When you create a token with a `filter`, any read that uses the token is filtered accordingly. For example, if the table is `events_table` and the `filter` is `date > '2018-01-01' and type == 'foo'` , a query like `select count(1) from events_table` becomes `select count(1) from events_table where date > '2018-01-01' and type == 'foo'`

Creating a new token with filter

curl -X POST \
-H "Authorization: Bearer " \
"https://api.tinybird.co/v0/tokens/" \
-d "name=test&scope=DATASOURCES:READ:table_name:column==1"

If you provide an `expiration_time` in the URL, the token is created as a JWT Token.

Creating a new JWT Token

curl -X POST \
-H "Authorization: Bearer " \
"https://api.tinybird.co/v0/tokens?name=jwt_token&expiration_time=1710000000" \
-d '{"scopes": [{"type": "PIPES:READ", "resource": "requests_per_day", "fixed_params": {"user_id": 3}}]}'

In multi-tenant applications, you can use this endpoint to create a JWT token for a specific tenant, so that each user has their own token with a fixed set of scopes and parameters.

## POST /v0/tokens/(.+)/refresh¶

Refreshes a Static Token without modifying its name, scopes, or any other attribute. Especially useful when a Token is leaked, or when you need to rotate a Token.

Refreshing a Static Token

curl -X POST \
-H "Authorization: Bearer " \
"https://api.tinybird.co/v0/tokens/:token_name/refresh"

When a token is successfully refreshed, the new information is sent in the response.

Successful response

{ "name": "token name", "description": "", "scopes": [ { "type": "DATASOURCES:READ", "resource": "table_name" } ], "token": "NEW_TOKEN" }

| Key | Type | Description |
| --- | --- | --- |
| auth_token | String | Token. Ensure it has the `TOKENS` scope on it |

| Code | Description |
| --- | --- |
| 200 | No error |
| 403 | Forbidden. The provided token doesn't have permissions to drop the token. A token is not allowed to remove itself; it needs the `ADMIN` or `TOKENS` scope |

## GET /v0/tokens/(.+)¶

Fetches information about a particular Static Token.

Getting token info

curl -X GET \
-H "Authorization: Bearer " \
"https://api.tinybird.co/v0/tokens/:token"

Returns a JSON object with the token's name and scopes.

Successful response

{ "name": "token name", "description": "", "scopes": [ { "type": "DATASOURCES:READ", "resource": "table_name" } ], "token": "p.TOKEN" }

## DELETE /v0/tokens/(.+)¶

Deletes a Static Token.

Deleting a token

curl -X DELETE \
-H "Authorization: Bearer " \
"https://api.tinybird.co/v0/tokens/:token"

## PUT /v0/tokens/(.+)¶

Modifies a Static Token. More than one scope can be sent per request, and all of them are added as Token scopes. Every time a Token's scopes are modified, the new scopes override the existing ones.

Editing a token

curl -X PUT \
-H "Authorization: Bearer " \
"https://api.tinybird.co/v0/tokens/?" \
-d "name=test_new_name&description=this is a test token&scope=PIPES:READ:test_pipe&scope=DATASOURCES:CREATE"

| Key | Type | Description |
| --- | --- | --- |
| token | String | Token. Ensure it has the `TOKENS` scope on it |
| name | String | Optional. Name of the token. |
| description | String | Optional. Markdown text with a description of the token. |
| scope | String | Optional. Scope(s) to set. Format is `SCOPE:TYPE[:arg][:filter]` . New scope(s) override the existing ones. |

Successful response

{ "name": "test", "description": "this is a test token", "scopes": [ { "type": "PIPES:READ", "resource": "test_pipe" }, { "type": "DATASOURCES:CREATE" } ] }

--- URL: https://www.tinybird.co/docs/changelog Content: --- title: "Changelog · Tinybird" theme-color: "#171612" description: "Tinybird helps data teams build real-time Data Products at scale through SQL-based API endpoints." ---

# Tinybird Local container¶

We are launching Tinybird Local, a free Docker image that replicates the service we provide in the Cloud. Pull the Docker image, and you'll have a local version of Tinybird! Use Tinybird Local when you want to:

- Test your app or service locally.
- Add it to your CI/CD pipeline.
- Run things locally because you prefer it.

![Image](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fassets%2Fchangelog%2Ftinybird-local-container.png&w=3840&q=75)

If you require connectors or other advanced capabilities, let us know; it'll help us improve the Docker image. [See the docs](https://www.tinybird.co/docs/cli/local-container) for installation instructions.

## Switch regions from the command bar¶

When you press the `s` key in Tinybird, the command bar opens and you can search for the resource you're looking for, such as Data Sources or Pipes. Now you can also search for the region you want to switch to.

## Use job\_timestamp instead of now() in Copy Pipes¶

Jobs run asynchronously, so `now()` in a Copy Pipe can differ from the time the job was scheduled, depending on the queue state at launch time; at peak times and in shared environments the gap can be a few seconds. To avoid this, use the `job_timestamp` column, which contains the job launch time, instead of the execution time. We have updated the [Copy Pipe best practices](https://www.tinybird.co/docs/work-with-data/process-and-copy/copy-pipes#best-practices) to recommend this.

## Improvements and bug fixes¶

- Fixed a bug where maximum active Copy Jobs were not being limited when running in Branches. The [normal Copy Pipe limits](https://www.tinybird.co/docs/get-started/plans/limits#copy-pipe-limits) are now applied.

--- URL: https://www.tinybird.co/docs/cli Last update: 2024-12-18T11:12:31.000Z Content: --- title: "Tinybird CLI · Tinybird Docs" theme-color: "#171612" description: "Use the Tinybird CLI to access all the Tinybird features from the command line." ---

# Tinybird command-line interface (CLI)¶

Use the Tinybird CLI to access all the Tinybird features directly from the command line. You can test and run commands from the terminal or integrate the CLI in your pipelines, as shown in the example after this list.

- Read the Quick start guide. See [Quick start](https://www.tinybird.co/docs/docs/cli/quick-start).
- Install Tinybird CLI on your machine. See [Install](https://www.tinybird.co/docs/docs/cli/install).
- Learn about Tinybird datafiles and their format. See [Datafiles](https://www.tinybird.co/docs/docs/cli/datafiles).
- Organize your CLI projects. See [Data projects](https://www.tinybird.co/docs/docs/cli/data-projects).
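As a quick orientation, the following is a minimal sketch of a first CLI session. It uses only commands and options described in the command reference later in this document; the token value is a placeholder.

##### Example: a minimal CLI session

# Authenticate against your region (defaults to https://api.tinybird.co)
tb auth --token <your_admin_token>

# Download the datafiles that already exist in your Workspace into their default directories
tb pull --auto

# Push local changes back, including any dependencies
tb push --push-deps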
--- URL: https://www.tinybird.co/docs/cli/advanced-templates Last update: 2025-01-07T15:48:10.000Z Content: --- title: "Advanced templates · Tinybird Docs" theme-color: "#171612" description: "Advanced usage of Tinybird datafiles, including how to work with query parameters." --- # Advanced templates¶ When developing multiple use cases, you might want to reuse certain parts or steps of an analysis, such as data filters or similar table operations. Read on to learn about advanced usage of the datafile system when using the [Tinybird CLI](https://www.tinybird.co/docs/docs/cli/quick-start) . Before reading this page, familiarize yourself with [query parameters](https://www.tinybird.co/docs/docs/work-with-data/query/query-parameters). ## Templates reuse¶ You can set up templates to reuse certain parts or steps of an analysis, such as data filters or similar table operations. To follow along, clone the [ecommerce_data_project_advanced](https://github.com/tinybirdco/ecommerce_data_project_advanced) repository: ##### Clone demo git clone https://github.com/tinybirdco/ecommerce_data_project_advanced.git cd ecommerce_data_project_advanced The repository contains the following file structure: ##### File structure ecommerce_data_project/ datasources/ events.datasource mv_top_per_day.datasource products.datasource fixtures/ events.csv products.csv endpoints/ sales.pipe top_products_between_dates.pipe top_products_last_week.pipe includes/ only_buy_events.incl top_products.incl pipes/ top_product_per_day.pipe Take a look at the `sales.pipe` API Endpoint and the `top_product_per_day.pipe` Pipe that materializes to a `mv_top_per_day` Data Source. Both use the same node, `only_buy_events` , through the usage of [include](https://www.tinybird.co/docs/docs/cli/datafiles/include-files) files: - only_buy_events - sales_pipes - top_products ##### includes/only\_buy\_events.incl NODE only_buy_events SQL > SELECT toDate(timestamp) date, product, joinGet('products_join_by_id', 'color', product) as color, JSONExtractFloat(json, 'price') as price FROM events where action = 'buy' When using include files to reuse logic in .datasource files, the extension of the file must be `.datasource.incl`. ## Include variables¶ You can include variables in a node template. The following example shows two API Endpoints that display the 10 top products, each filtered by different date intervals: - top_products_1 - top_products_2 - top_products_3 ##### includes/top\_products.incl NODE endpoint DESCRIPTION > returns top 10 products for the last week SQL > select date, topKMerge(10)(top_10) as top_10 from top_product_per_day {% if '$DATE_FILTER' = 'last_week' %} where date > today() - interval 7 day {% else %} where date between {{Date(start)}} and {{Date(end)}} {% end %} group by date In the previous examples, the `DATE_FILTER` variable is sent to the `top_products` include, where the variable content is retrieved using the `$` prefix with the `DATE_FILTER` reference. You can also assign an array of values to an include variable. To do this, parse the variable using function templates, as explained in [Template functions](https://www.tinybird.co/docs/docs/cli/template-functions). ### Variables and parameters¶ Parameters are variables whose value you can change through the API Endpoint request parameters. Variables only live in the template and you can set them when declaring the `INCLUDE` or with the `set` template syntax. 
For example: ##### Using 'set' to declare a variable {% set my_var = 'default' %} By default, variables are interpreted as parameters. To prevent variables or private parameters from appearing in the auto-generated API Endpoint documentation, they need to start with `_` . For example: ##### Define private variables % SELECT date FROM my_table WHERE a > 10 {% if defined(_private_param) %} and b = {{Int32(_private_param)}} {% end %} You also need to use `_` as a prefix when using variables in template functions. See [Template functions](https://www.tinybird.co/docs/docs/cli/template-functions) for more information. --- URL: https://www.tinybird.co/docs/cli/command-ref Last update: 2025-01-20T11:43:08.000Z Content: --- title: "Tinybird CLI command reference · Tinybird Docs" theme-color: "#171612" description: "The Tinybird CLI allows you to use all the Tinybird functionality directly from the command line. Get to know the command reference." --- # CLI command reference¶ The following list shows all available commands in the Tinybird command-line interface, their options, and their arguments. For examples on how to use them, see the [Quick start guide](https://www.tinybird.co/docs/docs/cli/quick-start), [Data projects](https://www.tinybird.co/docs/docs/cli/data-projects) , and [Common use cases](https://www.tinybird.co/docs/docs/cli/common-use-cases). ## tb auth¶ Configure your Tinybird authentication. **auth commands** | Command | Description | | --- | --- | | info OPTIONS | Gets information about the authentication that is currently being used. | | ls OPTIONS | Lists available regions to authenticate. | | use OPTIONS REGION_NAME_OR_HOST_OR_ID | Switches to a different region. You can pass the region name, the region host url, or the region index after listing available regions with `tb auth ls` . | The previous commands accept the following options: - `--token INTEGER` : Use auth Token, defaults to TB_TOKEN envvar, then to the .tinyb file. - `--host TEXT` : Set custom host if it's different than https://api.tinybird.co. Check[ this page](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) for the available list of regions. - `--region TEXT` : Set region. Run 'tb auth ls' to show available regions. - `--connector [bigquery|snowflake]` : Set credentials for one of the supported connectors. - `--interactive,-i` : Show available regions and select where to authenticate to. ## tb branch¶ Manage your Workspace branches. **Branch commands** | Command | Description | Options | | | --- | --- | --- | | | create BRANCH_NAME | Creates a new Branch in the current 'main' Workspace. | `--last-partition` : Attaches the last modified partition from 'main' to the new Branch. `-i, --ignore-datasource DATA_SOURCE_NAME` : Ignore specified Data Source partitions. `--wait / --no-wait` : Wait for Branch jobs to finish, showing a progress bar. Disabled by default. | | | current | Shows the Branch you're currently authenticated to. | | | | data | Performs a data branch operation to bring data into the current Branch. | `--last-partition` : Attaches the last modified partition from 'main' to the new Branch. `-i, --ignore-datasource DATA_SOURCE_NAME` : Ignore specified Data Source partitions. `--wait / --no-wait` : Wait for Branch jobs to finish, showing a progress bar. Disabled by default. | | | datasource copy DATA_SOURCE_NAME | Copies data source from Main. | `--sql SQL` : Freeform SQL query to select what is copied from Main into the Environment Data Source. 
`--sql-from-main` : SQL query selecting all from the same Data Source in Main. `--wait / --no-wait` : Wait for copy job to finish. Disabled by default. | | | ls | Lists all the Branches available. | `--sort / --no-sort` : Sorts the list of Branches by name. Disabled by default. | | | regression-tests | Regression test commands. | `-f, --filename PATH` : The yaml file with the regression-tests definition. `--skip-regression-tests / --no-skip-regression-tests` : Flag to skip execution of regression tests. This is handy for CI Branches where regression might be flaky. `--main` : Runs regression tests in the main Branch. For this flag to work all the resources in the Branch Pipe Endpoints need to exist in the main Branch. `--wait / --no-wait` : Waits for regression job to finish, showing a progress bar. Disabled by default. | | | regression-tests coverage PIPE_NAME | Runs regression tests using coverage requests for Branch vs Main Workspace. It creates a regression-tests job. The argument supports regular expressions. Using '.*' if no Pipe name is provided. | `--assert-result / --no-assert-result` : Whether to perform an assertion on the results returned by the Endpoint. Enabled by default. Use `--no-assert-result` if you expect the endpoint output is different from current version. `--assert-result-no-error / --no-assert-result-no-error` : Whether to verify that the Endpoint doesn't return errors. Enabled by default. Use `--no-assert-result-no-error` if you expect errors from the endpoint. `--assert-result-rows-count / --no-assert-result-rows-count` : Whether to verify that the correct number of elements are returned in the results. Enabled by default. Use `--assert-result-rows-count` if you expect the numbers of elements in the endpoint output is different from current version. `--assert-result-ignore-order / --no-assert-result-ignore-order` : Whether to ignore the order of the elements in the results. Disabled by default. Use `--assert-result-ignore-order` if you expect the endpoint output is returning same elements but in different order. `--assert-time-increase-percentage INTEGER` : Allowed percentage increase in Endpoint response time. Default value is 25%. Use -1 to disable assert. `--assert-bytes-read-increase-percentage INTEGER` : Allowed percentage increase in the amount of bytes read by the endpoint. Default value is 25%. Use -1 to disable assert. `--assert-max-time FLOAT` : Max time allowed for the endpoint response time. If the response time is lower than this value then the `--assert-time-increase-percentage` isn't taken into account. `--ff, --failfast` : When set, the checker exits as soon one test fails. `--wait` : Waits for regression job to finish, showing a progress bar. Disabled by default. `--skip-regression-tests / --no-skip-regression-tests` : Flag to skip execution of regression tests. This is handy for CI environments where regression might be flaky. `--main` : Runs regression tests in the main Branch. For this flag to work all the resources in the Branch Pipe Endpoints need to exist in the main Branch. | | | regression-tests last PIPE_NAME | Runs regression tests using coverage requests for Branch vs Main Workspace. It creates a regression-tests job. The argument supports regular expressions. Using '.*' if no Pipe name is provided. | `--assert-result / --no-assert-result` : Whether to perform an assertion on the results returned by the Endpoint. Enabled by default. Use `--no-assert-result` if you expect the endpoint output is different from current version. 
`--assert-result-no-error / --no-assert-result-no-error` : Whether to verify that the Endpoint doesn't return errors. Enabled by default. Use `--no-assert-result-no-error` if you expect errors from the endpoint. `--assert-result-rows-count / --no-assert-result-rows-count` : Whether to verify that the correct number of elements are returned in the results. Enabled by default. Use `--assert-result-rows-count` if you expect the numbers of elements in the endpoint output is different from current version. `--assert-result-ignore-order / --no-assert-result-ignore-order` : Whether to ignore the order of the elements in the results. Disabled by default. Use `--assert-result-ignore-order` if you expect the endpoint output is returning same elements but in different order. `--assert-time-increase-percentage INTEGER` : Allowed percentage increase in Endpoint response time. Default value is 25%. Use -1 to disable assert. `--assert-bytes-read-increase-percentage INTEGER` : Allowed percentage increase in the amount of bytes read by the endpoint. Default value is 25%. Use -1 to disable assert. `--assert-max-time FLOAT` : Max time allowed for the endpoint response time. If the response time is lower than this value then the `--assert-time-increase-percentage` isn't taken into account. `--ff, --failfast` : When set, the checker exits as soon one test fails. `--wait` : Waits for regression job to finish, showing a progress bar. Disabled by default. `--skip-regression-tests / --no-skip-regression-tests` : Flag to skip execution of regression tests. This is handy for CI environments where regression might be flaky. | | | regression-tests manual PIPE_NAME | Runs regression tests using coverage requests for Branch vs Main Workspace. It creates a regression-tests job. The argument supports regular expressions. Using '.*' if no Pipe name is provided. | `--assert-result / --no-assert-result` : Whether to perform an assertion on the results returned by the Endpoint. Enabled by default. Use `--no-assert-result` if you expect the endpoint output is different from current version. `--assert-result-no-error / --no-assert-result-no-error` : Whether to verify that the Endpoint doesn't return errors. Enabled by default. Use `--no-assert-result-no-error` if you expect errors from the endpoint. `--assert-result-rows-count / --no-assert-result-rows-count` : Whether to verify that the correct number of elements are returned in the results. Enabled by default. Use `--assert-result-rows-count` if you expect the numbers of elements in the endpoint output is different from current version. `--assert-result-ignore-order / --no-assert-result-ignore-order` : Whether to ignore the order of the elements in the results. Disabled by default. Use `--assert-result-ignore-order` if you expect the endpoint output is returning same elements but in different order. `--assert-time-increase-percentage INTEGER` : Allowed percentage increase in Endpoint response time. Default value is 25%. Use -1 to disable assert. `--assert-bytes-read-increase-percentage INTEGER` : Allowed percentage increase in the amount of bytes read by the endpoint. Default value is 25%. Use -1 to disable assert. `--assert-max-time FLOAT` : Max time allowed for the endpoint response time. If the response time is lower than this value then the `--assert-time-increase-percentage` isn't taken into account. `--ff, --failfast` : When set, the checker exits as soon one test fails. `--wait` : Waits for regression job to finish, showing a progress bar. Disabled by default. 
`--skip-regression-tests / --no-skip-regression-tests` : Flag to skip execution of regression tests. This is handy for CI Branches where regression might be flaky. | | | rm [BRANCH_NAME_OR_ID] | Removes a Branch from the Workspace (not Main). It can't be recovered. | `--yes` : Don't ask for confirmation. | | | use [BRANCH_NAME_OR_ID] | Switches to another Branch. | | | ## tb check¶ Checks file syntax. It only allows one option, `--debug` , which prints the internal representation. ## tb connection¶ Connection commands. | Command | Description | Options | | --- | --- | --- | | create COMMAND [ARGS] | Creates a connection. Available subcommands or types are `bigquery` , `kafka` , `s3` , `s3_iamrole` , and `snowflake` . | See the next table. | | ls [OPTIONS] | Lists connections. | `--connector TYPE` : Filters by connector. Available types are `bigquery` , `kafka` , `s3` , `s3_iamrole` , and `snowflake` . | | rm [OPTIONS] CONNECTION_ID_OR_NAME | Removes a connection. | `--force BOOLEAN` : Forces connection removal even if there are Data Sources using it. | ### tb connection create¶ The following subcommands and settings are available for each `tb connection create` subcommand: | Command | Description | Options | | --- | --- | --- | | create bigquery [OPTIONS] | Creates a BigQuery connection. | `--no-validate` : Doesn't validate GCP permissions. | | create kafka [OPTIONS] | Creates a Kafka connection. | `--bootstrap-servers TEXT` : Kafka Bootstrap Server in the form mykafka.mycloud.com:9092. `--key TEXT` : Key. `--secret TEXT` : Secret. `--connection-name TEXT` : Name of your Kafka connection. If not provided, it's set as the bootstrap server. `--auto-offset-reset TEXT` : Offset reset, can be 'latest' or 'earliest'. Defaults to 'latest'. `--schema-registry-url TEXT` : Avro Confluent Schema Registry URL. `--sasl-mechanism TEXT` : Authentication method for connection-based protocols. Defaults to 'PLAIN'. `--ssl-ca-pem TEXT` : Path or content of the CA Certificate file in PEM format. | | create s3 [OPTIONS] | Creates an S3 connection. | `--key TEXT` : Your Amazon S3 key with access to the buckets. `--secret TEXT` : The Amazon S3 secret for the key. `--region TEXT` : The Amazon S3 region where you buckets are located. `--connection-name TEXT` : The name of the connection to identify it in Tinybird. `--no-validate` : Don't validate S3 permissions during connection creation. | | create s3_iamrole [OPTIONS] | Creates an S3 connection (IAM role). | `--connection-name TEXT` : Name of the connection to identify it in Tinybird. `--role-arn TEXT` : The ARN of the IAM role to use for the connection. `--region TEXT` : The Amazon S3 region where the bucket is located. `--policy TEXT` : The Amazon S3 access policy: write or read. `--no-validate` : Don't validate S3 permissions during connection creation. | | create snowflake [OPTIONS] | Creates a Snowflake connection. | `--account TEXT` : The account identifier of your Snowflake account. For example, myorg-account123. `--username TEXT` : The Snowflake user you want to use for the connection. `--password TEXT` : The Snowflake password of the chosen user. `--warehouse TEXT` : If not provided, it's set to your Snowflake user default. Warehouse to run the export sentences. `--role TEXT` : If not provided, it's set to your Snowflake user default. Snowflake role use in the export process. `--connection-name TEXT` : The name of your Snowflake connection. If not provided, it's set as the account identifier. 
`--integration-name TEXT` : The name of your Snowflake integration. If not provided, Tinybird creates one. `--stage-name TEXT` : The name of your Snowflake stage. If not provided, Tinybird creates one. `--no-validate` : Don't validate Snowflake permissions during connection creation. | ## tb datasource¶ Data Sources commands. | Command | Description | Options | | --- | --- | --- | | analyze OPTIONS URL_OR_FILE | Analyzes a URL or a file before creating a new data source. | | | append OPTIONS DATASOURCE_NAME URL | Appends data to an existing Data Source from URL, local file or a connector. | | | connect OPTIONS CONNECTION DATASOURCE_NAME | Deprecated. Use `tb connection create` instead. | `--kafka-topic TEXT` : For Kafka connections: topic. `--kafka-group TEXT` : For Kafka connections: group ID. `--kafka-auto-offset-reset [latest|earliest]` : Kafka auto.offset.reset config. Valid values are: ["latest", "earliest"]. `--kafka-sasl-mechanism [PLAIN|SCRAM-SHA-256|SCRAM-SHA-512]` : Kafka SASL mechanism. Valid values are: ["PLAIN", "SCRAM-SHA-256", "SCRAM-SHA-512"]. Default: "PLAIN". | | copy OPTIONS DATASOURCE_NAME | Copies data source from Main. | `--sql TEXT` : Freeform SQL query to select what is copied from Main into the Branch Data Source. `--sql-from-main` : SQL query selecting * from the same Data Source in Main. `--wait` : Wait for copy job to finish, disabled by default. | | delete OPTIONS DATASOURCE_NAME | Deletes rows from a Data Source. | `--yes` : Doesn't ask for confirmation. `--wait` : Wait for delete job to finish, disabled by default. `--dry-run` : Run the command without deleting anything. `--sql-condition` : Delete rows with SQL condition. | | generate OPTIONS FILENAMES | Generates a Data Source file based on a sample CSV file from local disk or URL. | `--force` : Overrides existing files. | | ls OPTIONS | Lists Data Sources. | `--match TEXT` : Retrieves any resources matching the pattern. eg `--match _test` . `--format [json]` : Force a type of the output. `--dry-run` : Run the command without deleting anything. | | replace OPTIONS DATASOURCE_NAME URL | Replaces the data in a Data Source from a URL, local file or a connector. | `--sql` : The SQL to extract from. `--connector` : Connector name. `--sql-condition` : Delete rows with SQL condition. | | rm OPTIONS DATASOURCE_NAME | Deletes a Data Source. | `--yes` : Doesn't ask for confirmation. | | share OPTIONS DATASOURCE_NAME WORKSPACE_NAME_OR_ID | Shares a Data Source. | `--user_token TEXT` : User token. `--yes` : Don't ask for confirmation. | | sync OPTIONS DATASOURCE_NAME | Syncs from connector defined in .datasource file. | `--yes` : Doesn't ask for confirmation. | | truncate OPTIONS DATASOURCE_NAME | Truncates a Data Source. | `--yes` : Doesn't ask for confirmation. `--cascade` : Truncate dependent Data Source attached in cascade to the given Data Source. | | unshare OPTIONS DATASOURCE_NAME WORKSPACE_NAME_OR_ID | Unshares a Data Source. | `--user_token TEXT` : When passed, Tinybird won't prompt asking for it. `--yes` : Don't ask for confirmation. | | scheduling resume DATASOURCE_NAME | Resumes the scheduling of a Data Source. | | | scheduling pause DATASOURCE_NAME | Pauses the scheduling of a Data Source. | | | scheduling status DATASOURCE_NAME | Gets the scheduling status of a Data Source (paused or running). | | ## tb dependencies¶ Prints all Data Sources dependencies. Its options: - `--no-deps` : Prints only Data Sources with no Pipes using them. - `--match TEXT` : Retrieves any resource matching the pattern. 
- `--pipe TEXT` : Retrieves any resource used by the Pipe.
- `--datasource TEXT` : Retrieves resources depending on this Data Source.
- `--check-for-partial-replace` : Retrieves dependent Data Sources that have their data replaced if a partial replace is executed in the selected Data Source.
- `--recursive` : Calculates recursive dependencies.

## tb deploy¶

Deploys to Tinybird, pushing the resources that changed since the previous release using Git. These are the options available for the `deploy` command:

- `--dry-run` : Runs the command with static checks, without creating resources on the Tinybird account or any side effect. Doesn't check for runtime errors.
- `-f, --force` : Overrides Pipes when they already exist.
- `--override-datasource` : When pushing a Pipe with a materialized node, if the target Data Source exists it tries to override it.
- `--populate` : Populates materialized nodes when pushing them.
- `--subset FLOAT` : Populates with a subset percent of the data (limited to a maximum of 2M rows), which is useful to quickly test a materialized node with some data. The subset must be greater than 0 and lower than 0.1. A subset of 0.1 means 10% of the data in the source Data Source is used to populate the Materialized View. Use it together with `--populate` ; it takes precedence over `--sql-condition` .
- `--sql-condition TEXT` : Populates with a SQL condition to be applied to the trigger Data Source of the Materialized View. For instance, `--sql-condition='date == toYYYYMM(now())'` populates using all the rows from the trigger Data Source whose `date` is in the current month. Use it together with `--populate` . `--sql-condition` isn't taken into account if the `--subset` param is present. Including any column present in the Data Source `engine_sorting_key` in the `sql_condition` makes the populate job process less data.
- `--unlink-on-populate-error` : If the populate job fails, the Materialized View is unlinked and new data isn't ingested there. The first time a populate job fails, the Materialized View is always unlinked.
- `--wait` : Use it together with `--populate` . Waits for populate jobs to finish, showing a progress bar. Disabled by default.
- `--yes` : Doesn't ask for confirmation.
- `--workspace_map TEXT..., --workspace TEXT...` : Adds a Workspace path to the list of external Workspaces, usage: `--workspace name path/to/folder` .
- `--timeout FLOAT` : Timeout you want to use for the populate job.
- `--user_token TOKEN` : The user Token is required for sharing a Data Source that contains the SHARED_WITH entry.

## tb diff¶

Diffs local datafiles against the corresponding remote files in the Workspace. It works like a regular `diff` command and is useful to know whether the remote resources have changed. Some caveats:

- Resources in the Workspace might mismatch due to slightly different SQL syntax, for instance a parenthesis mismatch, `INTERVAL` expressions, or changes in the schema definitions.
- If you didn't specify an `ENGINE_PARTITION_KEY` and `ENGINE_SORTING_KEY` , resources in the Workspace might have default ones.

The recommendation in these cases is to use `tb pull` to keep your local files in sync. Remote files are downloaded and stored locally in a `.diff_tmp` directory; if you work with git, you can add it to `.gitignore`. These are the options for this command; a usage example follows the list:

- `--fmt / --no-fmt` : Format files before doing the diff, default is True so both files match the format.
- `--no-color` : Don't colorize diff.
- `--no-verbose` : List the resources changed, not the content of the diff.
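As a sketch of how these commands combine, and using only the options documented above, a review-then-deploy loop might look like the following:

##### Example: review and deploy changes

# Compare local datafiles against the Workspace, listing only the changed resources
tb diff --no-verbose

# Check what a deployment would do, without creating resources or side effects
tb deploy --dry-run

# Deploy, populating materialized nodes and waiting for the populate jobs to finish
tb deploy --populate --wait --yes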
## tb fmt¶

Formats a .datasource, .pipe or .incl file. These are the options available for the `fmt` command:

- `--line-length INTEGER` : A number indicating the maximum characters per line in the node SQL; lines are split based on the SQL syntax and the number of characters passed as a parameter.
- `--dry-run` : Don't ask to override the local file.
- `--yes` : Don't ask for confirmation to overwrite the local file.
- `--diff` : Formats the local file, prints the diff, and exits 1 if different, 0 if equal.

This command removes comments starting with # from the file, so use DESCRIPTION or a comment block instead:

##### Example comment block

% {% comment this is a comment and fmt keeps it %} SELECT {% comment this is another comment and fmt keeps it %} count() c FROM stock_prices_1m

You can add `tb fmt` to your git `pre-commit` hook to have your files properly formatted. If the SQL formatting results aren't what you expect, you can disable formatting for just the blocks that need it. Read [how to disable fmt](https://docs.sqlfmt.com/getting-started/disabling-sqlfmt).

## tb init¶

Initializes the folder layout. It comes with these options:

- `--generate-datasources` : Generates Data Sources based on CSV, NDJSON and Parquet files in this folder.
- `--folder DIRECTORY` : Folder where datafiles are placed.
- `-f, --force` : Overrides existing files.
- `-ir, --ignore-remote` : Ignores remote files not present in the local data project on `tb init --git` .
- `--git` : Initializes Workspace with Git commits.
- `--override-commit TEXT` : Use this option to manually override the reference commit of your Workspace. This is useful if a commit isn't recognized in your Git log, such as after a force push ( `git push -f` ).

## tb job¶

Jobs commands.

| Command | Description | Options |
| --- | --- | --- |
| cancel JOB_ID | Tries to cancel a job. | None |
| details JOB_ID | Gets details for any job created in the last 48h. | None |
| ls [OPTIONS] | Lists jobs. | `--status [waiting|working|done|error]` or `-s` : Shows results with the desired status. |

## tb materialize¶

Analyzes the `node_name` SQL query to generate the .datasource and .pipe files needed to push a new materialized view. This command guides you through generating the Materialized View with name TARGET_DATASOURCE; the only requirement is having a valid Pipe datafile locally. Use `tb pull` to download resources from your Workspace when needed. It accepts these options:

- `--push-deps` : Push dependencies, disabled by default.
- `--workspace TEXT...` : Add a Workspace path to the list of external Workspaces, usage: `--workspace name path/to/folder` .
- `--no-versions` : When set, resource dependency versions aren't used; the dependencies are pushed as-is.
- `--verbose` : Prints more logs.
- `--unlink-on-populate-error` : If the populate job fails, the Materialized View is unlinked and new data isn't ingested in the Materialized View. The first time a populate job fails, the Materialized View is always unlinked.

## tb pipe¶

Use the following commands to manage Pipes. | Command | Description | Options | | --- | --- | --- | | append OPTIONS PIPE_NAME_OR_UID SQL | Appends a node to a Pipe. | | | copy pause OPTIONS PIPE_NAME_OR_UID | Pauses a running Copy Pipe. | | | copy resume OPTIONS PIPE_NAME_OR_UID | Resumes a paused Copy Pipe. | | | copy run OPTIONS PIPE_NAME_OR_UID | Runs an on-demand copy job. | `--wait` : Waits for the copy job to finish. `--yes` : Doesn't ask for confirmation. `--param TEXT` : Key and value of the params you want the Copy Pipe to be called with.
For example: `tb pipe copy run --param foo=bar` . | | data OPTIONS PIPE_NAME_OR_UID PARAMETERS | Prints data returned by a Pipe. You can pass query parameters to the command, for example `--param_name value` . | `--query TEXT` : Runs SQL over Pipe results. `--format [json|csv]` : Return format (CSV, JSON). `-- value` : Query parameter. You can define multiple parameters and their value. For example, `--paramOne value --paramTwo value2` . | | generate OPTIONS NAME QUERY | Generates a Pipe file based on a sql query. Example: `tb pipe generate my_pipe 'select * from existing_datasource'` . | `--force` : Overrides existing files. | | ls OPTIONS | Lists Pipes. | `--match TEXT` : Retrieves any resourcing matching the pattern. For example `--match _test` . `--format [json|csv]` : Force a type of the output. | | populate OPTIONS PIPE_NAME | Populates the result of a Materialized node into the target Materialized View. | `--node TEXT` : Name of the materialized Node. Required. `--sql-condition TEXT` : Populate with a SQL condition to be applied to the trigger Data Source of the Materialized View. For instance, `--sql-condition='date == toYYYYMM(now())'` it populates taking all the rows from the trigger Data Source which `date` is the current month. Use it together with `--populate` . `--sql-condition` isn't taken into account if the `--subset` param is present. Including in the `sql_condition` any column present in the Data Source `engine_sorting_key` makes the populate job process less data. `--truncate` : Truncates the materialized Data Source before populating it. `--unlink-on-populate-error` : If the populate job fails the Materialized View is unlinked and new data isn't ingested in the Materialized View. First time a populate job fails, the Materialized View is always unlinked. `--wait` : Waits for populate jobs to finish, showing a progress bar. Disabled by default. | | publish OPTIONS PIPE_NAME_OR_ID NODE_UID | Changes the published node of a Pipe. | | | regression-test OPTIONS FILENAMES | Runs regression tests using last requests. | `--debug` : Prints internal representation, can be combined with any command to get more information. `--only-response-times` : Checks only response times. `--workspace_map TEXT..., --workspace TEXT...` : Add a Workspace path to the list of external Workspaces, usage: `--workspace name path/to/folder` . `--no-versions` : When set, resource dependency versions aren't used, it pushes the dependencies as-is. `-l, --limit INTEGER RANGE` : Number of requests to validate [0<=x<=100]. `--sample-by-params INTEGER RANGE` : When set, aggregates the pipe_stats_rt requests by `extractURLParameterNames(assumeNotNull(url))` and for each combination takes a sample of N requests [1<=x<=100]. `-m, --match TEXT` : Filters the checker requests by specific parameter. You can pass multiple parameters -m foo -m bar. `-ff, --failfast` : When set, the checker exits as soon as one test fails. `--ignore-order` : When set, the checker ignores the order of list properties. `--validate-processed-bytes` : When set, the checker validates that the new version doesn't process more than 25% than the current version. `--relative-change FLOAT` : When set, the checker validates the new version has less than this distance with the current version. | | rm OPTIONS PIPE_NAME_OR_ID | Deletes a Pipe. PIPE_NAME_OR_ID can be either a Pipe name or id in the Workspace or a local path to a .pipe file. | `--yes` : Doesn't ask for confirmation. 
| | set_endpoint OPTIONS PIPE_NAME_OR_ID NODE_UID | Same as 'publish', changes the published node of a Pipe. | | | sink run OPTIONS PIPE_NAME_OR_UID | Runs an on-demand sink job. | `--wait` : Waits for the sink job to finish. `--yes` : Don't ask for confirmation. `--dry-run` : Run the command without executing the sink job. `--param TEXT` : Key and value of the params you want the Sink Pipe to be called with. For example: `tb pipe sink run --param foo=bar` . | | stats OPTIONS PIPES | Prints Pipe stats for the last 7 days. | `--format [json]` : Forces a type of the output. To parse the output, keep in mind to use `tb --no-version-warning pipe stats` option. | | token_read OPTIONS PIPE_NAME | Retrieves a Token to read a Pipe. | | | unlink OPTIONS PIPE_NAME NODE_UID | Unlinks the output of a Pipe, whatever its type: Materialized Views, Copy Pipes, or Sinks. | | | unpublish OPTIONS PIPE_NAME NODE_UID | Unpublishes the endpoint of a Pipe. | | ## tb prompt¶ Provides instructions to configure the shell prompt for Tinybird CLI. See [Configure your shell prompt](https://www.tinybird.co/docs/docs/cli/install#configure-your-shell-prompt). ## tb pull¶ Retrieves the latest version for project files from your Workspace. With these options: - `--folder DIRECTORY` : Folder where files are placed. - `--auto / --no-auto` : Saves datafiles automatically into their default directories (/datasources or /pipes). Default is True. - `--match TEXT` : Retrieve any resourcing matching the pattern. eg `--match _test` . - `-f, --force` : Override existing files. - `--fmt` : Format files, following the same format as `tb fmt` . ## tb push¶ Push files to your Workspace. You can use this command with these options: - `--dry-run` : Runs the command with static checks, without creating resources on the Tinybird account or any side effect. Doesn't check for runtime errors. - `--check / --no-check` : Enables/disables output checking, enabled by default. - `--push-deps` : Pushes dependencies, disabled by default. - `--only-changes` : Pushes only the resources that have changed compared to the destination Workspace. - `--debug` : Prints internal representation, can be combined with any command to get more information. - `-f, --force` : Overrides Pipes when they already exist. - `--override-datasource` : When pushing a Pipe with a materialized node if the target Data Source exists it tries to override it. - `--populate` : Populates materialized nodes when pushing them. - `--subset FLOAT` : Populates with a subset percent of the data (limited to a maximum of 2M rows), this is useful to quickly test a materialized node with some data. The subset must be greater than 0 and lower than 0.1. A subset of 0.1 means a 10 percent of the data in the source Data Source is used to populate the Materialized View. Use it together with `--populate` , it has precedence over `--sql-condition` . - `--sql-condition TEXT` : Populates with a SQL condition to be applied to the trigger Data Source of the Materialized View. For instance, `--sql-condition='date == toYYYYMM(now())'` it populates taking all the rows from the trigger Data Source which `date` is the current month. Use it together with `--populate` . `--sql-condition` isn't taken into account if the `--subset` param is present. Including in the `sql_condition` any column present in the Data Source `engine_sorting_key` makes the populate job process less data. 
- `--unlink-on-populate-error` : If the populate job fails the Materialized View is unlinked and new data isn't ingested in the Materialized View. First time a populate job fails, the Materialized View is always unlinked. - `--fixtures` : Appends fixtures to Data Sources. - `--wait` : To be used along with `--populate` command. Waits for populate jobs to finish, showing a progress bar. Disabled by default. - `--yes` : Doesn't ask for confirmation. - `--only-response-times` : Checks only response times, when --force push a Pipe. - `--workspace TEXT..., --workspace_map TEXT...` : Add a Workspace path to the list of external Workspaces, usage: `--workspace name path/to/folder` . - `--no-versions` : When set, resource dependency versions aren't used, it pushes the dependencies as-is. - `--timeout FLOAT` : Timeout you want to use for the populate job. - `-l, --limit INTEGER RANGE` : Number of requests to validate [0<=x<=100]. - `--sample-by-params INTEGER RANGE` : When set, aggregates the `pipe_stats_rt` requests by `extractURLParameterNames(assumeNotNull(url))` and for each combination takes a sample of N requests [1<=x<=100]. - `-ff, --failfast` : When set, the checker exits as soon one test fails. - `--ignore-order` : When set, the checker ignores the order of list properties. - `--validate-processed-bytes` : When set, the checker validates that the new version doesn't process more than 25% than the current version. - `--user_token TEXT` : The User Token is required for sharing a Data Source that contains the SHARED_WITH entry. ## tb sql¶ Runs SQL queries over Data Sources and Pipes. - `--rows_limit INTEGER` : Max number of rows retrieved. - `--pipeline TEXT` : The name of the Pipe to run the SQL Query. - `--pipe TEXT` : The path to the .pipe file to run the SQL Query of a specific NODE. - `--node TEXT` : The NODE name. - `--format [json|csv|human]` : Output format. - `--stats / --no-stats` : Shows query stats. ## tb test¶ Test commands. | Command | Description | Options | | --- | --- | --- | | init | Initializes a file list with a simple test suite. | `--force` : Overrides existing files. | | parse [OPTIONS] [FILES] | Reads the contents of a test file list. | | | run [OPTIONS] [FILES] | Runs the test suite, a file, or a test. | `--verbose` or `-v` : Shows results. `--fail` : Show only failed/error tests. `--concurrency [INTEGER RANGE]` or `-c [INTEGER RANGE]` : How many tests to run concurrently. | ## tb token¶ Manage your Workspace Tokens. | Command | Description | Options | | --- | --- | --- | | copy OPTIONS TOKEN_ID | Copies a Token. | | | ls OPTIONS | Lists Tokens. | `--match TEXT` : Retrieves any Token matching the pattern. eg `--match _test` . | | refresh OPTIONS TOKEN_ID | Refreshes a Token. | `--yes` : Doesn't ask for confirmation. | | rm OPTIONS TOKEN_ID | Removes a Token. | `--yes` : Doesn't ask for confirmation. | | scopes OPTIONS TOKEN_ID | Lists Token scopes. | | | create static OPTIONS TOKEN_NAME | Creates a static Token that lasts forever. | `--scope` : Scope for the Token (e.g., `DATASOURCES:READ` ). Required. `--resource` : Resource you want to associate the scope with. `--filter` : SQL condition used to filter the values when calling with this token (eg. `--filter=value > 0` ). | | create jwt OPTIONS TOKEN_NAME | Creates a JWT Token with a fixed expiration time. | `--ttl` : Time to live (e.g., '1h', '30min', '1d'). Required. `--scope` : Scope for the token (only `PIPES:READ` is allowed for JWT tokens).Required. `--resource` : Resource associated with the scope. Required. 
`--fixed-params` : Fixed parameters in key=value format, multiple values separated by commas. | ## tb workspace¶ Manage your Workspaces. | Command | Description | Options | | --- | --- | --- | | clear OPTIONS | Drop all the resources inside a project. This command is dangerous because it removes everything, use with care. | `--yes` : Don't ask for confirmation. `--dry-run` : Run the command without removing anything. | | create OPTIONS WORKSPACE_NAME | Creates a new Workspace for your Tinybird user. | `--starter_kit TEXT` : Uses a Tinybird starter kit as a template. `--user_token TEXT` : When passed, Tinybird won't prompt asking for it. `--fork` : When enabled, Tinybird shares all Data Sources from the current Workspace to the new created one. | | current OPTIONS | Shows the Workspace you're currently authenticated to. | | | delete OPTIONS WORKSPACE_NAME_OR_ID | Deletes a Workspace where you are an admin. | `--user_token TEXT` : When passed, Tinybird won't prompt asking for it. `--yes` : Don't ask for confirmation. | | ls OPTIONS | Lists all the Workspaces you have access to in the account you're currently authenticated to. | | | members add OPTIONS MEMBERS_EMAILS | Adds members to the current Workspace. | `--user_token TEXT` : When passed, Tinybird won't prompt asking for it. | | members ls OPTIONS | Lists members in the current Workspace. | | | members rm OPTIONS | Removes members from the current Workspace. | `--user_token TEXT` : When passed, Tinybird won't prompt asking for it. | | members set-role OPTIONS [guest|viewer|admin] MEMBERS_EMAILS | Sets the role for existing Workspace members. | `--user_token TEXT` : When passed, Tinybird won't prompt asking for it. | | use OPTIONS WORKSPACE_NAME_OR_ID | Switches to another workspace. Use `tb workspace ls` to list the workspaces you have access to. | | ## tb tag¶ Manage your Workspace tags. | Command | Description | Options | | --- | --- | --- | | create TAG_NAME | Creates a tag in the current Workspace. | | | ls | List all the tags of the current Workspace. | | | ls TAG_NAME | List all the resources tagged with the given tag. | | | rm TAG_NAME | Removes a tag from the current Workspace. All resources aren't tagged by the given tag anymore. | `--yes` : Don't ask for confirmation. | --- URL: https://www.tinybird.co/docs/cli/common-use-cases Last update: 2025-01-20T11:43:08.000Z Content: --- title: "CLI common use cases · Tinybird Docs" theme-color: "#171612" description: "This document shows some common use cases where the Command Line Interface (CLI) can help you on your day to day workflow." --- # Common use cases¶ The following uses cases illustrate how Tinybird CLI solve common situations using available commands. ## Download Pipes and data sources from your account¶ There are two ways you can start working with the CLI. You can either [start a new data project](https://www.tinybird.co/docs/docs/cli/quick-start) from scratch, or if you already have some data and API Endpoints in your Tinybird account, pull it to your local disk to continue working from there. For this second option, use the `--match` flag to filter Pipes or data sources containing the string passed as parameter. 
For instance, to pull all the files named `project`:

##### Pull all the project files

tb pull --match project
[D] writing project.datasource(demo)
[D] writing project_geoindex.datasource(demo)
[D] writing project_geoindex_pipe.pipe(demo)
[D] writing project_agg.pipe(demo)
[D] writing project_agg_API_endpoint_request_log_pipe_3379.pipe(demo)
[D] writing project_exploration.pipe(demo)
[D] writing project_moving_avg.pipe(demo)

The pull command doesn't preserve the directory structure, so all your datafiles are downloaded to your current directory. Once the files are pulled, you can `diff` or `push` the changes to your source control repository and continue working from the command line. When you pull Data Sources or Pipes, your data isn't downloaded, just the Data Source schemas and Pipe definitions, so they can be replicated easily.

## Push the entire data project¶

##### Push the whole project

tb push --push-deps

## Push a Pipe with all its dependencies¶

##### Push dependencies

tb push pipes/mypipe.pipe --push-deps

## Adding a new column to a Data Source¶

Data Source schemas are mostly immutable, but you can append new columns at the end of an existing Data Source whose engine is from the MergeTree family or the Null engine. If you want to change columns, add columns in other positions, or modify the engine, you must first create a new version of the Data Source with the modified schema. Then ingest the data and finally point the Pipes to the new Data Source. To force a Pipe replacement, use the `--force` flag when pushing it.

If you create a new column with a `DEFAULT` or `MATERIALIZED` value, only the rows inserted after the column is added write the value to disk. Data already in the Data Source when the column is added isn't modified, and the value is computed at query time. That's not problematic when the expression is constant, for example a specific date or number, but for dynamic expressions like `now()` or `now64()` , the returned value might change every time a select query is performed.

### Append new columns to an existing Data Source¶

As an example, imagine you have the following Data Source defined, and it has already been pushed to Tinybird:

##### Appending a new column to a Data Source

SCHEMA > `test` Int16, `local_date` Date, `test3` Int64

If you want to append a new column, you must change the `*.datasource` file to add the new column `new_column` . You can append as many columns as you need at the same time:

##### Appending a new column to a Data Source

SCHEMA > `test` Int16, `local_date` Date, `test3` Int64, `new_column` Int64

Remember that when **appending or deleting columns in an existing Data Source** , the engine of that Data Source must be of the **MergeTree** family. After appending the new column, execute `tb push my_datasource.datasource --force` and confirm the addition of the column(s). The `--force` parameter is required for this kind of operation.

Existing imports will continue working once the new columns are added, even if those imports don't carry values for the added columns. In those cases, the new columns contain empty values like `0` for numeric values or `''` for Strings, or, if defined, the default values in the schema.

### Create a new version of the Data Source to make additional add/change column operations¶

To create a new version of a Data Source, create a separate datafile with a different name. You can choose a helpful naming convention such as adding a `_version` suffix (e.g. `my_ds_1.datasource` ).
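As a rough sketch of that versioning workflow, using a hypothetical `my_ds.datasource` file and only commands covered in this guide:

##### Example: create a new Data Source version

# Start the new version from a copy of the current datafile, then edit its schema
cp datasources/my_ds.datasource datasources/my_ds_1.datasource

# Push the new Data Source
tb push datasources/my_ds_1.datasource

# Ingest data into it (for example with `tb datasource append`), point your Pipes
# at the new Data Source, and force-push the updated Pipes
tb push pipes/my_pipe.pipe --force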
## Debug mode¶

When you work with Pipes that use several versions of different Data Sources, you might need to double-check which version of which Data Source the Pipe is pointing at before you push it to your Tinybird account. To do so, use the `--dry-run --debug` flags like this:

##### Debug mode

tb push my_pipe.pipe --dry-run --debug

After you've validated the content of the Pipe, push your Pipe as normal.

## Automatic regression tests for your API Endpoints¶

Any time you `--force` push a Pipe that has a public API Endpoint that has received requests, some automatic regression tests are executed: the CLI replays the top ten requests and checks that the new version returns the same data as the previous version of the API Endpoint. This can help you validate whether you are introducing a regression in your API. Other times, you are consciously `--force` pushing a new version that returns different data. In that case you can skip the regression tests with the `--no-check` flag:

##### Avoid regression tests

tb push my_may_view_pipe.pipe --force --no-check

When pushing a Pipe with a public API Endpoint, the API Endpoint is maintained based on the node name. If the existing API Endpoint node is renamed, the last node of the Pipe is recreated as the API Endpoint. The latter isn't an atomic operation: the API Endpoint is down for a few moments while the new API Endpoint is created.

--- URL: https://www.tinybird.co/docs/cli/data-projects Last update: 2025-01-20T11:43:08.000Z Content: --- title: "Organize files in data projects · Tinybird Docs" theme-color: "#171612" description: "Learn how to best organize your Tinybird CLI files in versioned projects." ---

# Organize files in data projects¶

A data project is a set of files that describes how your data must be stored, processed, and exposed through APIs. In the same way you maintain source code files in a repository, use a CI, make deployments, run tests, and so on, Tinybird provides tools to follow a similar pattern with data pipelines. The [datafiles](https://www.tinybird.co/docs/docs/cli/datafiles) in Tinybird are the source code of your project. With a data project you can:

- Define how the data should flow, from schemas to API Endpoints.
- Manage your datafiles using version control.
- Branch your datafiles.
- Run tests.
- Deploy data projects.

## Ecommerce site example¶

Consider an ecommerce site where you have events from users and a list of products with their attributes. Your goal is to expose several API Endpoints to return sales per day and top product per day. The data project file structure would look like the following: ecommerce_data_project/ datasources/ events.datasource products.datasource fixtures/ events.csv products.csv pipes/ top_product_per_day.pipe endpoints/ sales.pipe top_products.pipe

To follow this tutorial, download and open the example using the following commands:

##### Clone demo

git clone https://github.com/tinybirdco/ecommerce_data_project.git
cd ecommerce_data_project

### Upload the project¶

You can push the whole project to your Tinybird account to check that everything is fine. The `tb push` command uploads the project to Tinybird, first checking the project dependencies and the SQL syntax. In this case, use the `--push-deps` flag to push everything:

##### Push dependencies

tb push --push-deps

After the upload completes, the endpoints defined in our project, `sales` and `top_products` , are available and you can start pushing data to the different Data Sources.
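For example, you could load the sample fixtures included in the repository with the `tb datasource append` command described in the CLI reference. This is a sketch; the Data Source names come from the project structure above, and you may need to adjust the fixture paths to where the CSV files live in your checkout:

##### Example: load the sample fixtures

tb datasource append events fixtures/events.csv
tb datasource append products fixtures/products.csv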
### Define Data Sources¶

Data Sources define how your data is ingested and stored. You can add data to Data Sources using the [Data Sources API](https://www.tinybird.co/docs/docs/api-reference/datasource-api). Each Data Source is defined by a schema and other properties. See [Datasource files](https://www.tinybird.co/docs/docs/cli/datafiles/datasource-files).

The following snippet shows the content of the `events.datasource` file from the ecommerce example:

DESCRIPTION > # Events from users This contains all the events produced by Kafka. There are 4 fixed columns, plus a `json` column which contains the rest of the data for each event. See [documentation](https://www.tinybird.co/docs/url_for_docs) for the different events.

SCHEMA > timestamp DateTime, product String, user_id String, action String, json String

ENGINE MergeTree
ENGINE_SORTING_KEY timestamp

The file describes the schema and how the data is sorted. In this case, the access pattern is most of the time by the `timestamp` column. If no `SORTING_KEY` is set, Tinybird picks one by default: date or datetime columns in most cases. To push the Data Source, run:

##### Push the events Data Source

tb push datasources/events.datasource

You can't override Data Sources. If you try to push a Data Source that already exists in your account, you get an error. To override a Data Source, remove it or upload a new one with a different name.

### Define data Pipes¶

The content of the `pipes/top_product_per_day.pipe` file creates a data Pipe that transforms the data as it's inserted:

NODE only_buy_events
DESCRIPTION > filters all the buy events
SQL > SELECT toDate(timestamp) date, product, JSONExtractFloat(json, 'price') AS price FROM events WHERE action = 'buy'

NODE top_per_day
SQL > SELECT date, topKState(10)(product) top_10, sumState(price) total_sales FROM only_buy_events GROUP BY date
TYPE materialized
DATASOURCE top_per_day_mv
ENGINE AggregatingMergeTree
ENGINE_SORTING_KEY date

Each Pipe can have one or more nodes. The previous Pipe defines two nodes, `only_buy_events` and `top_per_day`.

- The first node filters `buy` events and extracts some data from the `json` column.
- The second node runs the aggregation.

In general, use `NODE` to start a new node and then use `SQL >` to define the SQL for that node. You can use other nodes inside the SQL. In this case, the second node uses the first one, `only_buy_events`. To push the Pipe, run:

##### Populate

tb push pipes/top_product_per_day.pipe --populate

If you want to populate with the existing data in the `events` table, use the `--populate` flag. When using the `--populate` flag, you get a job URL so you can check the status of the job by checking the URL provided. See [Populate and copy data](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/populate-data) for more information on how populate jobs work.

### Define API Endpoints¶

API Endpoints are the way you expose the data to be consumed. The following snippet shows the content of the `endpoints/top_products.pipe` file:

NODE endpoint
DESCRIPTION > returns top 10 products for the last week
SQL > SELECT date, topKMerge(10)(top_10) AS top_10 FROM top_per_day WHERE date > today() - interval 7 day GROUP BY date

The syntax is the same as in the data transformation Pipes, though you can access the results through the `{% user("apiHost") %}/v0/top_products.json?token=TOKEN` endpoint. When you push an endpoint, a Token with `PIPES:READ` permissions is automatically created.
You can see it from the [Tokens UI](https://app.tinybird.co/tokens) or directly from the CLI with the command `tb pipe token_read `. Alternatively, you can use the `TOKEN token_name READ` command to automatically create a Token named `token_name` with `READ` permissions over the endpoint, or to add `READ` permissions over the endpoint to the existing `token_name` . For example:

TOKEN public_read_token READ

NODE endpoint
DESCRIPTION > returns top 10 products for the last week
SQL > SELECT date, topKMerge(10)(top_10) AS top_10 FROM top_per_day WHERE date > today() - interval 7 day GROUP BY date

To push the endpoint, run:

##### Push the top products Pipe

tb push endpoints/top_products.pipe

The Token `public_read_token` was created automatically and it's provided in the test URL. You can add parameters to any endpoint. For example, parametrize the dates to be able to filter the data between two dates:

NODE endpoint
DESCRIPTION > returns top 10 products for the last week
SQL > % SELECT date, topKMerge(10)(top_10) AS top_10 FROM top_per_day WHERE date between {{Date(start)}} AND {{Date(end)}} GROUP BY date

Now, the endpoint can receive `start` and `end` parameters: `{% user("apiHost") %}/v0/top_products.json?start=2018-09-07&end=2018-09-17&token=TOKEN`. You can print the results from the CLI using the `pipe data` command. For instance, for the previous example:

##### Print the results of the top products endpoint

tb pipe data top_products --start '2018-09-07' --end '2018-09-17' --format CSV

For the parameter templating to work, you need to start your NODE SQL definition with the `%` character.

### Override an endpoint or a data Pipe¶

When working on a project, you might need to push several versions of the same file. You can override a Pipe that has already been pushed using the `--force` flag. For example:

##### Override the Pipe

tb push endpoints/top_products_params.pipe --force

If the endpoint has been called before, it runs regression tests with the most frequent requests. If the new version doesn't return the same data, then it's not pushed. You can see in the example how to run all the requests tested. You can force the push without running the checks using the `--no-check` flag if needed. For example:

##### Force override

tb push endpoints/top_products_params.pipe --force --no-check

### Downloading datafiles from Tinybird¶

You can download datafiles using the `pull` command. For example:

##### Pull a specific file

tb pull --match endpoint_im_working_on

The previous command downloads the `endpoint_im_working_on.pipe` file to the current folder.

--- URL: https://www.tinybird.co/docs/cli/datafiles Content: --- title: "Datafiles · Tinybird Docs" theme-color: "#171612" description: "Datafiles describe your Tinybird resources: Data Sources, Pipes, and so on. They're the source code of your project." ---

# Datafiles¶

Datafiles describe your Tinybird resources, like Data Sources, Pipes, and so on. They're the source code of your project. You can use datafiles to manage your projects as source code and take advantage of version control. Tinybird CLI helps you produce and push datafiles to the Tinybird platform.

## Types of datafiles¶

Tinybird uses the following types of datafiles:

- Datasource files (.datasource) represent Data Sources. See [Datasource files](https://www.tinybird.co/docs/docs/cli/datafiles/datasource-files).
- Pipe files (.pipe) represent Pipes of various types. See [Pipe files](https://www.tinybird.co/docs/docs/cli/datafiles/pipe-files).
- Include files (.incl) are reusable fragments you can include in .datasource or .pipe files. See [Include files](https://www.tinybird.co/docs/docs/cli/datafiles/include-files). ## Syntactic conventions¶ All types of datafiles follow the same syntactic conventions. ### Casing¶ Instructions always appear at the beginning of a line in upper case. For example: ##### Basic syntax COMMAND value ANOTHER_INSTR "Value with multiple words" ### Multiple lines¶ Instructions can span multiple lines. For example: ##### Multiline syntax SCHEMA > `d` DateTime, `total` Int32, `from_novoa` Int16 ## File structure¶ The following example shows a typical `tinybird` project directory that includes subdirectories for supported types: ##### Example file structure tinybird ├── datasources/ │ └── connections/ │ └── my_connector_name.incl │ └── my_datasource.datasource ├── endpoints/ ├── includes/ ├── pipes/ ## Next steps¶ - Understand [CI/CD processes on Tinybird](https://www.tinybird.co/docs/docs/work-with-data/organize-your-work/continuous-integration). - Read about [implementing test strategies](https://www.tinybird.co/docs/docs/work-with-data/strategies/implementing-test-strategies). --- URL: https://www.tinybird.co/docs/cli/datafiles/datasource-files Last update: 2025-01-07T15:48:10.000Z Content: --- title: "Datasource files · Tinybird Docs" theme-color: "#171612" description: "Datasource files describe your Data Sources. Define the schema, engine, and other settings." --- # Datasource files (.datasource)¶ Datasource files describe your Data Sources. You can use .datasource files to define the schema, engine, and other settings of your Data Sources. See [Data Sources](https://www.tinybird.co/docs/docs/get-data-in/data-sources). ## Available instructions¶ The following instructions are available for .datasource files. | Declaration | Required | Description | | --- | --- | --- | | `SCHEMA ` | Yes | Defines a block for a Data Source schema. The block must be indented. | | `DESCRIPTION ` | No | Description of the Data Source. | | `TOKEN APPEND` | No | Grants append access to a Data Source to the token named . If the token isn't specified or doesn't exist, it will be automatically created. | | `TAGS ` | No | Comma-separated list of tags. Tags are used to [organize your data project](https://www.tinybird.co/docs/docs/work-with-data/organize-your-work/organizing-resources). | | `ENGINE ` | No | Sets the engine for the Data Source. Default value is `MergeTree` . | | `ENGINE_SORTING_KEY ` | No | Sets the `ORDER BY` expression for the Data Source. If unset, it defaults to DateTime, numeric, or String columns, in that order. | | `ENGINE_PARTITION_KEY ` | No | Sets the `PARTITION` expression for the Data Source. | | `ENGINE_TTL ` | No | Sets the `TTL` expression for the Data Source. | | `ENGINE_VER ` | No | Column with the version of the object state. Required when using `ENGINE ReplacingMergeTree` . | | `ENGINE_SIGN ` | No | Column to compute the state. Required when using `ENGINE CollapsingMergeTree` or `ENGINE VersionedCollapsingMergeTree` . | | `ENGINE_VERSION ` | No | Column with the version of the object state. Required when using `ENGINE VersionedCollapsingMergeTree` . | | `ENGINE_SETTINGS ` | No | Comma-separated list of key-value pairs that describe engine settings for the Data Source. | | `INDEXES ` | No | Defines one or more indexes for the Data Source. See [Data Skipping Indexes](https://www.tinybird.co/docs/docs/sql-reference/engines/mergetree#data-skipping-indexes) for more information.
| | `SHARED_WITH ` | No | Shares the Data Source with one or more Workspaces. Use in combination with `--user_token` with admin rights in the origin Workspace. | The following example shows a typical .datasource file: ##### tinybird/datasources/example.datasource # A comment TOKEN tracker APPEND DESCRIPTION > Analytics events **landing data source** TAGS stock, recommendations SCHEMA > `timestamp` DateTime `json:$.timestamp`, `session_id` String `json:$.session_id`, `action` LowCardinality(String) `json:$.action`, `version` LowCardinality(String) `json:$.version`, `payload` String `json:$.payload` ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(timestamp)" ENGINE_SORTING_KEY "timestamp" ENGINE_TTL "timestamp + toIntervalDay(60)" ENGINE_SETTINGS "index_granularity=8192" INDEXES > INDEX idx1 action TYPE bloom_filter GRANULARITY 3 SHARED_WITH > analytics_production analytics_staging ### SCHEMA¶ A `SCHEMA` declaration is a newline, comma-separated list of columns definitions. For example: ##### Example SCHEMA declaration SCHEMA > `timestamp` DateTime `json:$.timestamp`, `session_id` String `json:$.session_id`, `action` LowCardinality(String) `json:$.action`, `version` LowCardinality(String) `json:$.version`, `payload` String `json:$.payload` Each column in a `SCHEMA` declaration is in the format ` ` , where: - `` is the name of the column in the Data Source. - `` is one of the supported[ Data Types](https://www.tinybird.co/docs/docs/get-data-in/data-sources#supported-data-types) . - `` is optional and only required for NDJSON Data Sources. See[ JSONpaths](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-ndjson-data#jsonpaths) . - `` sets a default value to the column when it's null. A common use case is to set a default date to a column, like `updated_at DateTime DEFAULT now()` . To change or update JSONPaths or other default values in the schema, push a new version of the schema using `tb push --force` or use the [alter endpoint on the Data Sources API](https://www.tinybird.co/docs/docs/api-reference/datasource-api#post--v0-datasources-(.+)-alter). ### JSONPath expressions¶ `SCHEMA` definitions support JSONPath expressions. For example: ##### Schema syntax with jsonpath DESCRIPTION Generated from /Users/username/tmp/sample.ndjson SCHEMA > `d` DateTime `json:$.d`, `total` Int32 `json:$.total`, `from_novoa` Int16 `json:$.from_novoa` See [JSONPaths](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-ndjson-data#jsonpaths) for more information. ### ENGINE settings¶ `ENGINE` declares the engine used for the Data Source. The default value is `MergeTree`. See [Engines](https://www.tinybird.co/docs/docs/sql-reference/engines) for more information. ## Connectors¶ Connector settings are part of the .datasource content. You can use include files to reuse connection settings and credentials. When working with connectors, it’s important to understand how tokens interact with .datasource files. If a token doesn’t exist or isn't explicitly specified in the .datasource, it will be automatically created. This ensures that connectors can establish a working connection by default. However, once a token is created and associated with a connector, it's crucial to handle it with care. Avoid deleting the token or modifying its scopes, as this can break the connection and disrupt the import process. The token is a critical component for maintaining a stable connection and ensuring that data is imported correctly. 
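Before moving on to specific connectors, here is a minimal sketch of the push and force-push flow described in this section. It assumes the example datafile above is saved as `datasources/example.datasource` (an illustrative path) and that the CLI is already authenticated against the target Workspace.

```bash
# Push the Data Source for the first time.
tb push datasources/example.datasource

# After editing JSONPaths or default values in the schema,
# push a new version of the same datafile over the existing resource.
tb push datasources/example.datasource --force
```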
### Kafka, Confluent, RedPanda¶ The Kafka, Confluent, and RedPanda connectors use the following settings: | Instruction | Required | Description | | --- | --- | --- | | `KAFKA_CONNECTION_NAME` | Yes | The name of the configured Kafka connection in Tinybird. | | `KAFKA_BOOTSTRAP_SERVERS` | Yes | Comma-separated list of one or more Kafka brokers, including Port numbers. | | `KAFKA_KEY` | Yes | Key used to authenticate with Kafka. Sometimes called Key, Client Key, or Username, depending on the Kafka distribution. | | `KAFKA_SECRET` | Yes | Secret used to authenticate with Kafka. Sometimes called Secret, Secret Key, or Password, depending on the Kafka distribution. | | `KAFKA_TOPIC` | Yes | Name of the Kafka topic to consume from. | | `KAFKA_GROUP_ID` | Yes | Consumer Group ID to use when consuming from Kafka. | | `KAFKA_AUTO_OFFSET_RESET` | No | Offset to use when no previous offset can be found, for example when creating a new consumer. Supported values are `latest` , `earliest` . Default: `latest` . | | `KAFKA_STORE_HEADERS` | No | Store Kafka headers as field `__headers` for later processing. Default value is `'False'` . | | `KAFKA_STORE_BINARY_HEADERS` | No | Stores all Kafka headers as binary data in field `__headers` as a binary map of type `Map(String, String)` . To access the header `'key'` run: `__headers['key']` . Default value is `'True'` . This field only applies if `KAFKA_STORE_HEADERS` is set to `True` . | | `KAFKA_STORE_RAW_VALUE` | No | Stores the raw message in its entirety as an additional column. Supported values are `'True'` , `'False'` . Default: `'False'` . | | `KAFKA_SCHEMA_REGISTRY_URL` | No | URL of the Kafka schema registry. | | `KAFKA_TARGET_PARTITIONS` | No | Target partitions to place the messages. | | `KAFKA_KEY_AVRO_DESERIALIZATION` | No | Key for decoding Avro messages. | | `KAFKA_SSL_CA_PEM` | No | CA certificate in PEM format for SSL connections. | | `KAFKA_SASL_MECHANISM` | No | SASL mechanism to use for authentication. Supported values are `'PLAIN'` , `'SCRAM-SHA-256'` , `'SCRAM-SHA-512'` . Default values is `'PLAIN'` . | The following example defines a Data Source with a new Kafka, Confluent, or RedPanda connection in a .datasource file: ##### Data Source with a new Kafka/Confluent/RedPanda connection SCHEMA > `value` String, `topic` LowCardinality(String), `partition` Int16, `offset` Int64, `timestamp` DateTime, `key` String ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(timestamp)" ENGINE_SORTING_KEY "timestamp" KAFKA_CONNECTION_NAME my_connection_name KAFKA_BOOTSTRAP_SERVERS my_server:9092 KAFKA_KEY my_username KAFKA_SECRET my_password KAFKA_TOPIC my_topic KAFKA_GROUP_ID my_group_id The following example defines a Data Source that uses an existing Kafka, Confluent, or RedPanda connection: ##### Data Source with an existing Kafka/Confluent/RedPanda connection SCHEMA > `value` String, `topic` LowCardinality(String), `partition` Int16, `offset` Int64, `timestamp` DateTime, `key` String ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(timestamp)" ENGINE_SORTING_KEY "timestamp" KAFKA_CONNECTION_NAME my_connection_name KAFKA_TOPIC my_topic KAFKA_GROUP_ID my_group_id Refer to the [Kafka Connector](https://www.tinybird.co/docs/docs/get-data-in/connectors/kafka), [Amazon MSK](https://www.tinybird.co/docs/docs/get-data-in/connectors/msk), [Confluent Connector](https://www.tinybird.co/docs/docs/get-data-in/connectors/confluent) , or [RedPanda Connector](https://www.tinybird.co/docs/docs/get-data-in/connectors/redpanda) documentation for more details. 
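To wire this up from the CLI, one possible flow is sketched below. It assumes a `tb connection create kafka` subcommand is available in your CLI version (prompting interactively for the bootstrap servers, key, and secret, similar to the BigQuery flow described later) and that the datafile above is saved as `datasources/kafka_events.datasource`, a hypothetical path.

```bash
# Assumed subcommand: create the Kafka connection interactively in the main Workspace.
tb connection create kafka

# Push the .datasource file that references the connection by name.
tb push datasources/kafka_events.datasource
```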
### BigQuery and Snowflake¶ The BigQuery and Snowflake connectors use the following settings: | Instruction | Required | Description | | --- | --- | --- | | `IMPORT_SERVICE` | Yes | Name of the import service to use. Use `bigquery` or `snowflake` . | | `IMPORT_SCHEDULE` | Yes | Cron expression, in UTC time, with the frequency to run imports. Must be higher than 5 minutes. For example, `*/5 * * * *` . Use `@auto` to sync once per minute when using `s3` , or `@on-demand` to only run manually. | | `IMPORT_CONNECTION_NAME` | Yes | Name given to the connection inside Tinybird. For example, `'my_connection'` . | | `IMPORT_STRATEGY` | Yes | Strategy to use when inserting data. Use `REPLACE` for BigQuery and Snowflake. | | `IMPORT_EXTERNAL_DATASOURCE` | No | Fully qualified name of the source table in BigQuery and Snowflake. For example, `project.dataset.table` . | | `IMPORT_QUERY` | No | The `SELECT` query to retrieve your data from BigQuery or Snowflake when you don't need all the columns or want to make a transformation before ingest. The `FROM` clause must reference a table using the full scope. For example, `project.dataset.table` . | See [BigQuery Connector](https://www.tinybird.co/docs/docs/get-data-in/connectors/bigquery) or [Snowflake Connector](https://www.tinybird.co/docs/docs/get-data-in/connectors/snowflake) for more details. #### BigQuery example¶ The following example shows a BigQuery Data Source described in a .datasource file: ##### Data Source with a BigQuery connection DESCRIPTION > bigquery demo data source SCHEMA > `timestamp` DateTime `json:$.timestamp`, `id` Integer `json:$.id`, `orderid` LowCardinality(String) `json:$.orderid`, `status` LowCardinality(String) `json:$.status`, `amount` Integer `json:$.amount` ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(timestamp)" ENGINE_SORTING_KEY "timestamp" ENGINE_TTL "timestamp + toIntervalDay(60)" IMPORT_SERVICE bigquery IMPORT_SCHEDULE */5 * * * * IMPORT_EXTERNAL_DATASOURCE mydb.raw.events IMPORT_STRATEGY REPLACE IMPORT_QUERY > select timestamp, id, orderid, status, amount from mydb.raw.events #### Snowflake example¶ The following example shows a Snowflake Data Source described in a .datasource file: ##### tinybird/datasources/snowflake.datasource - Data Source with a Snowflake connection DESCRIPTION > Snowflake demo data source SCHEMA > `timestamp` DateTime `json:$.timestamp`, `id` Integer `json:$.id`, `orderid` LowCardinality(String) `json:$.orderid`, `status` LowCardinality(String) `json:$.status`, `amount` Integer `json:$.amount` ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(timestamp)" ENGINE_SORTING_KEY "timestamp" ENGINE_TTL "timestamp + toIntervalDay(60)" IMPORT_SERVICE snowflake IMPORT_CONNECTION_NAME my_snowflake_connection IMPORT_EXTERNAL_DATASOURCE mydb.raw.events IMPORT_SCHEDULE */5 * * * * IMPORT_STRATEGY REPLACE IMPORT_QUERY > select timestamp, id, orderid, status, amount from mydb.raw.events ### S3¶ The S3 connector uses the following settings: | Instruction | Required | Description | | --- | --- | --- | | `IMPORT_SERVICE` | Yes | Name of the import service to use. Use `s3` for S3 connections. | | `IMPORT_CONNECTION_NAME` | Yes | Name given to the connection inside Tinybird. For example, `'my_connection'` . | | `IMPORT_STRATEGY` | Yes | Strategy to use when inserting data. Use `APPEND` for S3 connections. | | `IMPORT_BUCKET_URI` | Yes | Full bucket path, including the `s3://` protocol, bucket name, object path, and an optional pattern to match against object keys. 
For example, `s3://my-bucket/my-path` discovers all files in the bucket `my-bucket` under the prefix `/my-path` . You can use patterns in the path to filter objects, for example, ending the path with `*.csv` matches all objects that end with the `.csv` suffix. | | `IMPORT_FROM_DATETIME` | No | Sets the date and time from which to start ingesting files on an S3 bucket. The format is `YYYY-MM-DDTHH:MM:SSZ` . | See [S3 Connector](https://www.tinybird.co/docs/docs/get-data-in/connectors/s3) for more details. #### S3 example¶ The following example shows an S3 Data Source described in a .datasource file: ##### tinybird/datasources/s3.datasource - Data Source with an S3 connection DESCRIPTION > Analytics events landing data source SCHEMA > `timestamp` DateTime `json:$.timestamp`, `session_id` String `json:$.session_id`, `action` LowCardinality(String) `json:$.action`, `version` LowCardinality(String) `json:$.version`, `payload` String `json:$.payload` ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(timestamp)" ENGINE_SORTING_KEY "timestamp" ENGINE_TTL "timestamp + toIntervalDay(60)" IMPORT_SERVICE s3 IMPORT_CONNECTION_NAME connection_name IMPORT_BUCKET_URI s3://my-bucket/*.csv IMPORT_SCHEDULE @auto IMPORT_STRATEGY APPEND --- URL: https://www.tinybird.co/docs/cli/datafiles/include-files Last update: 2024-12-18T11:12:31.000Z Content: --- title: "Include files · Tinybird Docs" theme-color: "#171612" description: "Include files help you organize settings so that you can reuse them across .datasource and .pipe files." --- # Include files (.incl)¶ Include files (.incl) help separate connector settings and reuse them across multiple .datasource files or .pipe templates. Include files are referenced using `INCLUDE` instruction. ## Connector settings¶ Use .incl files to separate connector settings from .datasource files. For example, the following .incl file contains Kafka Connector settings: ##### tinybird/datasources/connections/kafka\_connection.incl KAFKA_CONNECTION_NAME my_connection_name KAFKA_BOOTSTRAP_SERVERS my_server:9092 KAFKA_KEY my_username KAFKA_SECRET my_password While the .datasource file only contains a reference to the .incl file using `INCLUDE`: ##### tinybird/datasources/kafka\_ds.datasource SCHEMA > `value` String, `topic` LowCardinality(String), `partition` Int16, `offset` Int64, `timestamp` DateTime, `key` String ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(timestamp)" ENGINE_SORTING_KEY "timestamp" INCLUDE "connections/kafka_connection.incl" KAFKA_TOPIC my_topic KAFKA_GROUP_ID my_group_id ### Pipe nodes¶ You can use .incl datafiles to [reuse node templates](https://www.tinybird.co/docs/docs/cli/advanced-templates#reusing-templates). 
For example, the following .incl file contains a node template: ##### tinybird/includes/only\_buy\_events.incl NODE only_buy_events SQL > SELECT toDate(timestamp) date, product, color, JSONExtractFloat(json, 'price') as price FROM events where action = 'buy' The .pipe file starts with the `INCLUDE` reference to the template: ##### tinybird/endpoints/sales.pipe INCLUDE "../includes/only_buy_events.incl" NODE endpoint DESCRIPTION > return sales for a product with color filter SQL > % select date, sum(price) total_sales from only_buy_events where color in {{Array(colors, 'black')}} group by date A different .pipe file can reuse the sample template: ##### tinybird/pipes/top\_per\_day.pipe INCLUDE "../includes/only_buy_events.incl" NODE top_per_day SQL > SELECT date, topKState(10)(product) top_10, sumState(price) total_sales from only_buy_events group by date TYPE MATERIALIZED DATASOURCE mv_top_per_day ### Include with variables¶ You can templatize .incl files. For instance you can reuse the same .incl template with different variable values: ##### tinybird/includes/top\_products.incl NODE endpoint DESCRIPTION > returns top 10 products for the last week SQL > % select date, topKMerge(10)(top_10) as top_10 from top_product_per_day {% if '$DATE_FILTER' == 'last_week' %} where date > today() - interval 7 day {% else %} where date between {{Date(start)}} and {{Date(end)}} {% end %} group by date The `$DATE_FILTER` parameter is a variable in the .incl file. The following examples show how to create two separate endpoints by injecting a value for the `DATE_FILTER` variable. The following .pipe file references the template using a `last_week` value for `DATE_FILTER`: ##### tinybird/endpoints/top\_products\_last\_week.pipe INCLUDE "../includes/top_products.incl" "DATE_FILTER=last_week" Whereas the following .pipe file references the template using a `between_dates` value for `DATE_FILTER`: ##### tinybird/endpoints/top\_products\_between\_dates.pipe INCLUDE "../includes/top_products.incl" "DATE_FILTER=between_dates" ### Include with environment variables¶ Because you can expand `INCLUDE` files using the Tinybird CLI, you can use environment variables. 
For example, if you have configured the `KAFKA_BOOTSTRAP_SERVERS`, `KAFKA_KEY` , and `KAFKA_SECRET` environment variables, you can create an .incl file as follows: ##### tinybird/datasources/connections/kafka\_connection.incl KAFKA_CONNECTION_NAME my_connection_name KAFKA_BOOTSTRAP_SERVERS ${KAFKA_BOOTSTRAP_SERVERS} KAFKA_KEY ${KAFKA_KEY} KAFKA_SECRET ${KAFKA_SECRET} You can then use the values in your .datasource datafiles: ##### tinybird/datasources/kafka\_ds.datasource SCHEMA > `value` String, `topic` LowCardinality(String), `partition` Int16, `offset` Int64, `timestamp` DateTime, `key` String ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(timestamp)" ENGINE_SORTING_KEY "timestamp" INCLUDE "connections/kafka_connection.incl" KAFKA_TOPIC my_topic KAFKA_GROUP_ID my_group_id Alternatively, you can create separate .incl files per environment variable: ##### tinybird/datasources/connections/kafka\_connection\_prod.incl KAFKA_CONNECTION_NAME my_connection_name KAFKA_BOOTSTRAP_SERVERS production_servers KAFKA_KEY the_kafka_key KAFKA_SECRET ${KAFKA_SECRET} ##### tinybird/datasources/connections/kafka\_connection\_stg.incl KAFKA_CONNECTION_NAME my_connection_name KAFKA_BOOTSTRAP_SERVERS staging_servers KAFKA_KEY the_kafka_key KAFKA_SECRET ${KAFKA_SECRET} And then include both depending on the environment: ##### tinybird/datasources/kafka\_ds.datasource SCHEMA > `value` String, `topic` LowCardinality(String), `partition` Int16, `offset` Int64, `timestamp` DateTime, `key` String ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(timestamp)" ENGINE_SORTING_KEY "timestamp" INCLUDE "connections/kafka_connection_${TB_ENV}.incl" KAFKA_TOPIC my_topic KAFKA_GROUP_ID my_group_id Where `$TB_ENV` is one of `stg` or `prod`. See [deploy to staging and production environments](https://www.tinybird.co/docs/docs/work-with-data/organize-your-work/staging-and-production-workspaces) to learn how to leverage environment variables. --- URL: https://www.tinybird.co/docs/cli/datafiles/pipe-files Last update: 2025-01-07T15:48:10.000Z Content: --- title: "Pipe files · Tinybird Docs" theme-color: "#171612" description: "Pipe files describe your Tinybird Pipes. Define the type, Data Source, and other settings." --- # Pipe files (.pipe)¶ Pipe files describe your Pipes. You can use .pipe files to define the type, starting node, Data Source, and other settings of your Pipes. See [Data Sources](https://www.tinybird.co/docs/docs/work-with-data/query/pipes). ## Available instructions¶ The following instructions are available for .pipe files. | Instruction | Required | Description | | --- | --- | --- | | `%` | No | Use as the first character of a node to indicate the node uses the[ templating system](https://www.tinybird.co/docs/docs/cli/template-functions) . | | `DESCRIPTION ` | No | Sets the description for a node or the complete file. | | `TAGS ` | No | Comma-separated list of tags. Tags are used to[ organize your data project](https://www.tinybird.co/docs/docs/work-with-data/organize-your-work/organizing-resources) . | | `NODE ` | Yes | Starts the definition of a new node. All the instructions until a new `NODE` instruction or the end of the file are related to this node. | | `SQL ` | Yes | Defines a block for the SQL of a node. The block must be indented. | | `INCLUDE ` | No | Includes are pieces of a Pipe that you can reuse in multiple .pipe files. | | `TYPE ` | No | Sets the type of the node. Valid values are `ENDPOINT` , `MATERIALIZED` , `COPY` , or `SINK` . 
| | `DATASOURCE ` | Yes | Required when `TYPE` is `MATERIALIZED` . Sets the destination Data Source for materialized nodes. | | `TARGET_DATASOURCE ` | Yes | Required when `TYPE` is `COPY` . Sets the destination Data Source for copy nodes. | | `TOKEN READ` | No | Grants read access to a Pipe or Endpoint to the token named . If the token isn't specified or doesn't exist, it will be automatically created. | | `COPY_SCHEDULE` | No | Cron expression with the frequency to run copy jobs. Must be higher than 5 minutes. For example, `*/5 * * * *` . If undefined, it defaults to `@on-demand` . | | `COPY_MODE` | No | Strategy to ingest data for copy jobs. One of `append` or `replace` . If empty, the default strategy is `append` . | ## Materialized Pipe¶ In a .pipe file you can define how to materialize each row ingested in the earliest Data Source in the Pipe query to a materialized Data Source. Materialization happens at ingest. The following example shows how to describe a Materialized Pipe. See [Materialized Views](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/materialized-views). ##### tinybird/pipes/sales\_by\_hour\_mv.pipe DESCRIPTION Materialized Pipe to aggregate sales per hour in the sales_by_hour Data Source NODE daily_sales SQL > SELECT toStartOfDay(starting_date) day, country, sum(sales) as total_sales FROM teams GROUP BY day, country TYPE MATERIALIZED DATASOURCE sales_by_hour ## Copy Pipe¶ In a .pipe file you can define how to export the result of a Pipe to a Data Source, optionally with a schedule. The following example shows how to describe a Copy Pipe. See [Copy Pipes](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/copy-pipes). ##### tinybird/pipes/sales\_by\_hour\_cp.pipe DESCRIPTION Copy Pipe to export sales hour every hour to the sales_hour_copy Data Source NODE daily_sales SQL > % SELECT toStartOfDay(starting_date) day, country, sum(sales) as total_sales FROM teams WHERE day BETWEEN toStartOfDay(now()) - interval 1 day AND toStartOfDay(now()) and country = {{ String(country, 'US')}} GROUP BY day, country TYPE COPY TARGET_DATASOURCE sales_hour_copy COPY_SCHEDULE 0 * * * * ## API Endpoint Pipe¶ In a .pipe file you can define how to export the result of a Pipe as an HTTP endpoint. The following example shows how to describe an API Endpoint Pipe. See [API Endpoints](https://www.tinybird.co/docs/docs/publish/api-endpoints). ##### tinybird/pipes/sales\_by\_hour\_endpoint.pipe TOKEN dashboard READ DESCRIPTION endpoint to get sales by hour filtering by date and country TAGS sales NODE daily_sales SQL > % SELECT day, country, sum(total_sales) as total_sales FROM sales_by_hour WHERE day BETWEEN toStartOfDay(now()) - interval 1 day AND toStartOfDay(now()) and country = {{ String(country, 'US')}} GROUP BY day, country NODE result SQL > % SELECT * FROM daily_sales LIMIT {{Int32(page_size, 100)}} OFFSET {{Int32(page, 0) * Int32(page_size, 100)}} TYPE ENDPOINT ## Sink Pipe¶ The following parameters are available when defining Sink Pipes: | Instruction | Required | Description | | --- | --- | --- | | `EXPORT_SERVICE` | Yes | One of `gcs_hmac` , `s3` , `s3_iamrole` , or `kafka` . | | `EXPORT_CONNECTION_NAME` | Yes | The name of the export connection. | | `EXPORT_SCHEDULE` | No | Cron expression, in UTC time. Must be higher than 5 minutes. For example, `*/5 * * * *` . 
| ### Blob storage Sink¶ When setting `EXPORT_SERVICE` as one of `gcs_hmac`, `s3` , or `s3_iamrole` , you can use the following instructions: | Instruction | Required | Description | | --- | --- | --- | | `EXPORT_BUCKET_URI` | Yes | The desired bucket path for the exported file. Path must not include the filename and extension. | | `EXPORT_FILE_TEMPLATE` | Yes | Template string that specifies the naming convention for exported files. The template can include dynamic attributes between curly braces based on columns' data that will be replaced with real values when exporting. For example: `export_{category}{date,'%Y'}{2}` . | | `EXPORT_FORMAT` | Yes | Format in which the data is exported. The default value is `csv` . | | `EXPORT_COMPRESSION` | No | Compression file type. Accepted values are `none` , `gz` for gzip, `br` for brotli, `xz` for LZMA, `zst` for zstd. Default value is `none` . | | `EXPORT_STRATEGY` | Yes | One of the available strategies. The default is `@new` . | ### Kafka Sink¶ Kafka Sinks are currently in private beta. If you have any feedback or suggestions, contact Tinybird at [support@tinybird.co](mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). When setting `EXPORT_SERVICE` as `kafka` , you can use the following instructions: | Instruction | Required | Description | | --- | --- | --- | | `EXPORT_KAFKA_TOPIC` | Yes | The desired topic for the export data. | --- URL: https://www.tinybird.co/docs/cli/install Last update: 2024-12-18T11:12:31.000Z Content: --- title: "Install Tinybird CLI · Tinybird Docs" theme-color: "#171612" description: "Install the Tinybird CLI on Linux or macOS, or use the prebuilt Docker image." --- # Install the Tinybird CLI¶ You can install Tinybird CLI on your local machine or use a prebuilt Docker image. Read on to learn how to install and configure Tinybird CLI for use. ## Installation¶ Install Tinybird CLI locally to use it on your machine. ### Prerequisites¶ Tinybird CLI supports Linux and macOS 10.14 and higher. Supported Python versions are 3.8, 3.9, 3.10, 3.11, and 3.12. ### Install tinybird-cli¶ Create a virtual environment before installing the `tinybird-cli` package: ##### Creating a virtual environment for Python 3 python3 -m venv .venv source .venv/bin/activate Then, install `tinybird-cli`: ##### Install tinybird-cli pip install tinybird-cli To update the `tinybird-cli` package, run the following command: ##### Update tinybird-cli pip install --upgrade tinybird-cli ## Docker image¶ The official `tinybird-cli-docker` image provides a Tinybird CLI executable ready to use in your projects and pipelines. To run Tinybird CLI using Docker from the terminal, run the following commands: ##### Run the image mounting your project path # Assuming a projects/data path docker run -v ~/projects/data:/mnt/data -it tinybirdco/tinybird-cli-docker cd mnt/data ## Authentication¶ Before you start using Tinybird CLI, check that you can authenticate by running `tb auth`: ##### Authenticate tb auth -i A list of available regions appears. Select your Tinybird region, then provide your admin Token. See [Tokens](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens). You can also pass the Token directly with the `--token` flag. For example: ##### Authenticate tb auth --token See the API Reference docs for the [list of supported regions](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints). You can also get the list using `tb auth ls`.
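Putting the authentication options together, a non-interactive session might look like the following sketch, where `TB_ADMIN_TOKEN` is a placeholder environment variable holding your admin Token.

```bash
# List the available regions, then authenticate with the admin Token (placeholder variable).
tb auth ls
tb auth --token "$TB_ADMIN_TOKEN"

# Confirm which Workspace the CLI is now pointing at.
tb workspace current
```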
The `tb auth` command saves your credentials in a .tinyb file in your current directory. Add it to your .gitignore file to avoid leaking credentials. ## Integrated help¶ After you've installed the Tinybird CLI you can access the integrated help by using the `--help` flag: ##### Integrated help tb --help You can do the same for every available command. For example: ##### Integrated command help tb datasource --help ## Telemetry¶ Starting from version 1.0.0b272, the Tinybird CLI collects telemetry on the use of the CLI commands and information about exceptions and crashes and sends it to Tinybird. Telemetry helps Tinybird improve the command-line experience. On each `tb` execution, the CLI collects information about your system, Python environment, the CLI version installed and the command you ran. All data is completely anonymous. To opt out of the telemetry feature, set the `TB_CLI_TELEMETRY_OPTOUT` environment variable to `1` or `true`. ## Configure your shell prompt¶ You can extract the current Tinybird Workspace and region from your .tinyb file and show it in your zsh or bash shell prompt. To extract the information programmatically, paste the following function to your shell config file: ##### Parse the .tinyb file to use the output in the PROMPT prompt_tb() { if [ -e ".tinyb" ]; then TB_CHAR=$'\U1F423' branch_name=`grep '"name":' .tinyb | cut -d : -f 2 | cut -d '"' -f 2` region=`grep '"host":' .tinyb | cut -d / -f 3 | cut -d . -f 2 | cut -d : -f 1` if [ "$region" = "tinybird" ]; then region=`grep '"host":' .tinyb | cut -d / -f 3 | cut -d . -f 1` fi TB_BRANCH="${TB_CHAR}tb:${region}=>${branch_name}" else TB_BRANCH='' fi echo $TB_BRANCH } When the function is available, you need to make the output visible on the prompt of your shell. The following example shows how to do this for zsh: ##### Include Tinybird information in the zsh prompt echo 'export PROMPT="' $PS1 ' $(prompt_tb)"' >> ~/.zshrc Restart your shell and go to the root of your project to see the Tinybird region and Workspace in your prompt. --- URL: https://www.tinybird.co/docs/cli/local-container Last update: 2025-01-16T14:47:27.000Z Content: --- title: "Tinybird Local container · Tinybird Docs" theme-color: "#171612" description: "Learn how to run Tinybird locally using the local container." --- # Tinybird Local container¶ You can run your own Tinybird instance locally using the `tinybird-local` container. This is useful for testing and development. For example, you can test Data Sources and Pipes in your data project before deploying them to production. Tinybird Local doesn't include the following features: - Tinybird UI - Connectors - Scheduled operations - Batch operations ## Prerequisites¶ To get started, you need a container runtime, like Docker or podman. ## Run Tinybird Local¶ To run Tinybird locally, run the following command: docker run --platform linux/amd64 -p 80:80 --name tinybird-local -d tinybirdco/tinybird-local:latest By default, Tinybird Local runs on port 80, although you can expose it locally using any other port. ## Local authentication¶ To authenticate with Tinybird Local, retrieve the Workspace admin token and pass it through the CLI: TOKEN=$(curl -s http://localhost:80/tokens | jq -r ".workspace_admin_token") tb --host http://localhost:80 --token $TOKEN auth After you've authenticated, you can get the default Workspace with the `tb workspace ls` CLI command. 
For example: tb workspace ls ** Workspaces: -------------------------------------------------------------------------------------------- | name | id | role | plan | current | -------------------------------------------------------------------------------------------- | Tinybird_Local_Testing | 7afc6330-3aae-4df5-8712-eaad216c5d7d | admin | Custom | True | -------------------------------------------------------------------------------------------- ## Next steps¶ - Learn about datafiles and their format. See[ Datafiles](https://www.tinybird.co/docs/docs/cli/datafiles) . - Learn how advanced templates can help you. See[ Advanced templates](https://www.tinybird.co/docs/docs/cli/advanced-templates) . - Browse the full CLI reference. See[ Command reference](https://www.tinybird.co/docs/docs/cli/command-ref) . --- URL: https://www.tinybird.co/docs/cli/quick-start Last update: 2025-01-09T09:46:35.000Z Content: --- title: "Quick start · Tinybird Docs" theme-color: "#171612" description: "Get started with Tinybird CLI as quickly as possible. Ingest, query, and publish data in minutes." --- # Quick start for Tinybird command-line interface¶ With Tinybird, you can ingest data from anywhere, query and transform it using SQL, and publish your data as high-concurrency, low-latency REST API endpoints. After you've [familiarized yourself with Tinybird](https://www.tinybird.co/docs/docs/get-started/quick-start) , you're ready to start automating and scripting the management of your Workspace using the Tinybird command-line interface (CLI). The Tinybird CLI is essential for all [CI/CD workflows](https://www.tinybird.co/docs/docs/work-with-data/organize-your-work/continuous-integration). Read on to learn how to download and configure the Tinybird CLI, create a Workspace, ingest data, create a query, publish an API, and confirm your setup works properly. ## Step 1: Create your Tinybird account¶ [Create a Tinybird account](https://www.tinybird.co/signup) . It's free and no credit card is required. See [Tinybird pricing plans](https://www.tinybird.co/docs/docs/get-started/plans/billing) for more information. [Sign up for Tinybird](https://www.tinybird.co/signup) ## Step 2: Download and install the Tinybird CLI¶ [Follow the instructions](https://www.tinybird.co/docs/docs/cli/install) to download and install the Tinybird command-line interface (CLI). Complete the setup and authenticate with your Tinybird account in the cloud and region you prefer. ## Step 3: Create your Workspace¶ A [Workspace](https://www.tinybird.co/docs/docs/get-started/administration/workspaces) is an area that contains a set of Tinybird resources, including Data Sources, Pipes, nodes, API Endpoints, and Tokens. Create a Workspace named `customer_rewards` . Use a unique name. tb workspace create customer_rewards ## Step 4: Download and ingest sample data¶ Download the following sample data from a fictitious online coffee shop: [Download data file](https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2024-05.parquet) The following Tinybird CLI commands infer the schema from the datafile, generate and push a .datasource file and ingest the data. tb datasource generate orders.ndjson # Infer the schema tb push orders.datasource # Upload the datasource file tb datasource append orders orders.ndjson # Ingest the data ## Step 5: Query data using a Pipe and Publish it as an API¶ In Tinybird, you can create [Pipes](https://www.tinybird.co/docs/docs/work-with-data/query/pipes) to query your data using SQL. 
The following commands create a Pipe with an SQL instruction that returns the number of records Tinybird has ingested from the data file: tb pipe generate rewards 'select count() from orders' tb push rewards.pipe When you push a Pipe, Tinybird publishes it automatically as a high-concurrency, low-latency API Endpoint. ## Step 6: Call the API Endpoint¶ You can test your API Endpoint using a curl command. First, create and obtain the read Token for the API Endpoint. tb token create static rewards_read_token --scope PIPES:READ --resource rewards tb token copy rewards_read_token Copy the read Token and insert it into a curl command. curl --compressed -H 'Authorization: Bearer your_read_token_here' https://api.us-east.aws.tinybird.co/v0/pipes/rewards.json You have now created your first API Endpoint in Tinybird using the CLI. ## Next steps¶ - Learn about datafiles and their format. See[ Datafiles](https://www.tinybird.co/docs/docs/cli/datafiles) . - Learn how advanced templates can help you. See[ Advanced templates](https://www.tinybird.co/docs/docs/cli/advanced-templates) . - Browse the full CLI reference. See[ Command reference](https://www.tinybird.co/docs/docs/cli/command-ref) . --- URL: https://www.tinybird.co/docs/cli/template-functions Last update: 2025-01-07T15:54:46.000Z Content: --- title: "Template functions · Tinybird Docs" theme-color: "#171612" description: "Template functions available in Tinybird datafiles." --- # Template functions¶ The following template functions are available. You can use them in [datafiles](https://www.tinybird.co/docs/docs/cli/datafiles) to accomplish different tasks. See [Advanced templates](https://www.tinybird.co/docs/docs/cli/advanced-templates) for more information on templating. ## defined¶ Checks whether a variable is defined. ##### defined function % SELECT date FROM my_table {% if defined(param) %} WHERE ... {% end %} ## column¶ Retrieves the column by its name from a variable. ##### column function % {% set var_1 = 'name' %} SELECT {{column(var_1)}} FROM my_table ## columns¶ Retrieves columns by their name from a variable. ##### columns function % {% set var_1 = 'name,age,address' %} SELECT {{columns(var_1)}} FROM my_table ## date\_diff\_in\_seconds¶ Returns the absolute value of the difference in seconds between two `DateTime` . See [DateTime](https://www.tinybird.co/docs/docs/sql-reference/data-types/datetime). The function accepts the following parameters: - `date_1` : the first date or DateTime. - `date_2` : the second date or DateTime. - `date_format` : (optional) the format of the dates. Defaults to `'%Y-%m-%d %H:%M:%S'` , so you can pass `DateTime` as `YYYY-MM-DD hh:mm:ss` when calling the function. - `backup_date_format` : (optional) the format of the dates if the first format doesn't match. Use it when your default input format is a DateTime ( `2022-12-19 18:42:22` ) but you receive a date instead ( `2022-12-19` ). - `none_if_error` : (optional) whether to return `None` if the dates don't match the provided formats. Defaults to `False` . Use it to provide an alternate logic in case any of the dates are specified in a different format. 
An example of how to use the function: date_diff_in_seconds('2022-12-19T18:42:23.521Z', '2022-12-19T18:42:23.531Z', date_format='%Y-%m-%dT%H:%M:%S.%fz') The following example shows how to use the function in a datafile: ##### date\_diff\_in\_seconds function % SELECT date, events {% if date_diff_in_seconds(date_end, date_start, date_format="%Y-%m-%dT%H:%M:%Sz") < 3600 %} FROM my_table_raw {% else %} FROM my_table_hourly_agg {% end %} WHERE date BETWEEN parseDateTimeBestEffort({{String(date_start,'2023-01-11T12:24:04Z')}}) AND parseDateTimeBestEffort({{String(date_end,'2023-01-11T12:24:05Z')}}) See [working with time](https://www.tinybird.co/docs/docs/work-with-data/query/guides/working-with-time) for more information on how to work with time in Tinybird. ## date\_diff\_in\_minutes¶ Same behavior as [date_diff_in_seconds](https://www.tinybird.co/docs/about:blank#date_diff_in_seconds) , but returns the difference in minutes. ## date\_diff\_in\_hours¶ Same behavior as [date_diff_in_seconds](https://www.tinybird.co/docs/about:blank#date_diff_in_seconds) , but returns the difference in hours. ## date\_diff\_in\_days¶ Returns the absolute value of the difference in days between two dates or DateTime. ##### date\_diff\_in\_days function % SELECT date FROM my_table {% if date_diff_in_days(date_end, date_start) < 7 %} WHERE ... {% end %} `date_format` is optional and defaults to `'%Y-%m-%d` , so you can pass DateTime as `YYYY-MM-DD` when calling the function. As with `date_diff_in_seconds`, `date_diff_in_minutes` , and `date_diff_in_hours` , other [date_formats](https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes) are supported. ## split\_to\_array¶ Splits comma separated values into an array. The function accepts the following parameters: `split_to_array(arr, default, separator=',')` - `arr` : the value to split. - `default` : the default value to use if `arr` is empty. - `separator` : the separator to use. Defaults to `,` . The following example splits `code` into an array of integers: ##### split\_to\_array function % SELECT arrayJoin(arrayMap(x -> toInt32(x), {{split_to_array(code, '')}})) as codes FROM my_table The following example splits `param` into an array of strings using `|` as the custom separator: ##### split\_to\_array with a custom separator function % SELECT {{split_to_array(String(param, 'hi, how are you|fine thanks'), separator='|')}} ## enumerate\_with\_last¶ Creates an iterable array, returning a boolean value that allows to check if the current element is the last element in the array. You can use it alongside the [split_to_array function](https://www.tinybird.co/docs/about:blank#split_to_array). ## symbol¶ Retrieves the value of a variable. The function accepts the following parameters: `symbol(x, quote)` For example: ##### enumerate\_with\_last function % SELECT {% for _last, _x in enumerate_with_last(split_to_array(attr, 'amount')) %} sum({{symbol(_x)}}) as {{symbol(_x)}} {% if not _last %}, {% end %} {% end %} FROM my_table ## sql\_and¶ Creates a list of "WHERE" clauses, along with "AND" separated filters, that checks if a field () is or isn't () in a list/tuple (). The function accepts the following parameters: `sql_and(__= [, ...] )` - `` : any column in the table. - `` : one of: `in` , `not_in` , `gt` (>), `lt` (<), `gte` (>=), `lte` (<=) - `` : any of the transform type functions ( `Array(param, 'Int8')` , `String(param)` , etc.). If one parameter isn't specified, then the filter is ignored. 
For example: ##### sql\_and function % SELECT * FROM my_table WHERE 1 {% if defined(param) or defined(param2_not_in) %} AND {{sql_and( param__in=Array(param, 'Int32', defined=False), param2__not_in=Array(param2_not_in, 'String', defined=False))}} {% end %} If this is queried with `param=1,2` and `param2_not_in=ab,bc,cd` , it translates to: ##### sql\_and function - generated sql SELECT * FROM my_table WHERE 1 AND param IN [1,2] AND param2 NOT IN ['ab','bc','cd'] If this is queried with `param=1,2` only, but `param2_not_in` isn't specified, it translates to: ##### sql\_and function - generated sql param missing SELECT * FROM my_table WHERE 1 AND param IN [1,2] ## Transform type functions¶ The following functions validate the type of a template variable and cast it to the desired data type. They also provide a default value if no value is passed. - `Boolean(x)` - `DateTime64(x)` - `DateTime(x)` - `Date(x)` - `Float32(x)` - `Float64(x)` - `Int8(x)` - `Int16(x)` - `Int32(x)` - `Int64(x)` - `Int128(x)` - `Int256(x)` - `UInt8(x)` - `UInt16(x)` - `UInt32(x)` - `UInt64(x)` - `UInt128(x)` - `UInt256(x)` - `String(x)` - `Array(x)` Each function accepts the following parameters: `type(x, default, description=, required=)` - `x` : the parameter or value. - `default` : (optional) the default value to use if `x` is empty. - `description` : (optional) the description of the value. - `required` : (optional) whether the value is required. For example, with `Int32` in the following query, `lim` is the parameter to cast to an `Int32` , `10` is the default value, and so on: ##### transform\_type\_functions % SELECT * FROM TR LIMIT {{Int32(lim, 10, description="Limit the number of rows in the response", required=False)}} --- URL: https://www.tinybird.co/docs/cli/workspaces Last update: 2024-12-18T11:12:31.000Z Content: --- title: "Manage Workspaces using the CLI · Tinybird Docs" theme-color: "#171612" description: "Learn how to switch between different Tinybird Workspaces and how to manage members using the CLI." --- # Manage Workspaces using the CLI¶ If you are a member of several Workspaces, you might need to frequently switch between them when working on a project using Tinybird CLI. This requires authenticating and selecting the right Workspace. ## Authenticate¶ Authenticate using the admin Token. For example: ##### Authenticate tb auth --token ## List Workspaces¶ List the Workspaces you have access to, and the one that you're currently authenticated to: ##### List Workspaces tb workspace ls ## Create a Workspace¶ You can create new empty Workspaces or create a Workspace from a template. To create Workspaces using Tinybird CLI, you need [your user Token](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens#your-user-token). Run the following command and follow the interactive instructions to create a Workspace: ##### Create a Workspace tb workspace create You can create a Workspace directly by passing the user Token with the `--user_token` flag: ##### Create a Workspace with a user Token tb workspace create workspace_name --user_token ## Switch to another Workspace¶ You can switch to another Workspace using the `workspace use` command.
For example: ##### Switch to a Workspace using the Workspace id or the Workspace name # Use the Workspace ID tb workspace use 841717b1-2472-44f9-9a81-42f1263cabe7 # Use the Workspace name tb workspace use Production To find out the IDs and names of available Workspaces, run `tb workspace ls`: tb workspace ls You can also check which Workspace you're currently in: ##### Show current Workspace tb workspace current ## Manage Workspace members¶ You can manage Workspace members using the `workspace members` commands. ### List members¶ To list members, run `tb workspace members ls` . For example: ##### Listing current Workspace members tb workspace members ls ### Add members¶ To add members, run `tb workspace members add` . For example: ##### Adding users to the current Workspace tb workspace members add "user1@example.com,user2@example.com,user3@example.com" ### Remove members¶ To remove members, run `tb workspace members rm` . For example: ##### Removing members from current Workspace tb workspace members rm user3@example.com You can also manage roles. For example, to set a user as admin: ##### Add admin role to user tb workspace members set-role admin user@example.com --- URL: https://www.tinybird.co/docs/get-data-in Content: --- title: "Get data in · Tinybird Docs" theme-color: "#171612" description: "Tinybird can get your data in from a variety of sources, then create Data Sources that can be queried, published, materialized, and more." --- # Get your data into Tinybird¶ You can bring your data into Tinybird from a variety of sources, then create Tinybird Data Sources that you can query, publish, materialize, and more. To get started, you only need to select your Data Source and get connected. ## Create your schema¶ You define schemas using a specific file type: .datasource files. See [.datasource](https://www.tinybird.co/docs/docs/cli/datafiles/datasource-files). You can either define the schema first or send data directly and let Tinybird infer the schema. ### Define the schema yourself¶ If you want to define your schema first, use one of the following methods: #### Option 1: Create an empty Data Source in the UI¶ Follow these steps to create an empty Data Source: 1. Select the** +** icon then "Data Source". 2. Select the** Write schema** option. 3. Review the .datasource file that appears. Update the generated parameters to match your desired schema. 4. Select** Next** . 5. Review your column names and types and rename the Data Source. 6. Select** Create Data Source** . Make sure you understand the definition and syntax for the schema, JSONPaths, engine, partition key, sorting keys, and TTL. They are all essential for efficient data operations later down the line. #### Option 2: Upload a .datasource file with the desired schema¶ You can define your schema locally in a .datasource file. Drag the .datasource file to the UI and import the resource. You can also select the **+** icon followed by **Write schema** and drag the file onto the dialog. #### Other options¶ Among other options, you can create a data source in the following ways: - Use the[ Data Sources API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/datasource-api) . - Use the[ Tinybird CLI](https://www.tinybird.co/docs/docs/cli/datafiles/datasource-files) . 
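For the Data Sources API option, the following is a hedged sketch of creating an empty Data Source from a schema. The host, Data Source name, and columns are placeholders, `$TOKEN` is assumed to be a Token allowed to create Data Sources (for example, the Workspace admin Token), and the exact parameters are documented in the Data Sources API reference.

```bash
# Sketch: create an empty Data Source by sending its name and schema as query parameters.
# -G appends the --data-urlencode values to the URL; -X POST keeps the request a POST.
curl -G -X POST "https://api.tinybird.co/v0/datasources" \
  -H "Authorization: Bearer $TOKEN" \
  --data-urlencode "name=events_example" \
  --data-urlencode "schema=timestamp DateTime, product String, user_id String, action String"
```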
### Send data and use inferred schema¶ If you want to send data directly and let Tinybird infer the schema, use one of the Connectors, upload a CSV, NDJSON, or Parquet file, or send data to the [Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api#send-individual-json-events). To use the Events API to infer the schema, send a single event over the API and let Tinybird create the schema automatically. If the schema is incorrect, go to the **Data Source schema** screen and download the schema as a .datasource file. Edit the file to adjust the schema if required, then drag the file back. You can't override an existing Data Source. Delete the existing one or rename the new Data Source. ## Supported data types, file types, and compression formats¶ See [Concepts > Data Sources](https://www.tinybird.co/docs/docs/get-data-in/data-sources#supported-data-types) for more information on supported types and formats. ## Update your schema¶ If your data doesn't match the schema, it ends up in the quarantine Data Source. See [Quarantine Data Source](https://www.tinybird.co/docs/docs/get-data-in/data-operations/recover-from-quarantine). You might also want to change your schema for optimization purposes. See [Iterate a Data Source](https://www.tinybird.co/docs/docs/get-data-in/data-operations/iterate-a-data-source). ## Limits¶ Check the [limits page](https://www.tinybird.co/docs/docs/get-started/plans/limits) for limits on ingestion, queries, API Endpoints, and more. --- URL: https://www.tinybird.co/docs/get-data-in/connectors Content: --- title: "Connectors · Tinybird Docs" theme-color: "#171612" description: "Learn how to connect to your data sources using Tinybird connectors." --- # Connectors¶ Tinybird Connectors are native integrations that let you seamlessly connect to and ingest data from popular data platforms and services. They provide a managed solution to stream or batch import data into Tinybird with minimal configuration. The following Connectors are available: - [ Amazon DynamoDB](https://www.tinybird.co/docs/docs/get-data-in/connectors/dynamodb) - [ Amazon MSK](https://www.tinybird.co/docs/docs/get-data-in/connectors/msk) - [ Amazon S3](https://www.tinybird.co/docs/docs/get-data-in/connectors/s3) - [ Confluent](https://www.tinybird.co/docs/docs/get-data-in/connectors/confluent) - [ Google BigQuery](https://www.tinybird.co/docs/docs/get-data-in/connectors/bigquery) - [ Kafka](https://www.tinybird.co/docs/docs/get-data-in/connectors/kafka) - [ Redpanda](https://www.tinybird.co/docs/docs/get-data-in/connectors/redpanda) - [ Snowflake](https://www.tinybird.co/docs/docs/get-data-in/connectors/snowflake) Each Connector is fully managed by Tinybird and requires minimal setup - typically just authentication credentials and basic configuration. They handle the complexities of: - Authentication and secure connections - Schema detection and mapping - Incremental updates and change data capture - Error handling and monitoring - Scheduling and orchestration You can configure Connectors through either the Tinybird UI or CLI, making it easy to incorporate them into your data workflows and CI/CD pipelines. 
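From the CLI, a minimal sketch of working with Connectors might look like the following. It assumes a recent `tinybird-cli` version where a `tb connection ls` subcommand is available, and it uses the BigQuery connection flow covered in the next section as the example.

```bash
# Assumed subcommand: list the connections already configured for the Workspace.
tb connection ls

# Create a new connection interactively; BigQuery is shown as an example.
tb connection create bigquery
```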
--- URL: https://www.tinybird.co/docs/get-data-in/connectors/bigquery Last update: 2025-01-07T15:48:10.000Z Content: --- title: "BigQuery Connector · Tinybird Docs" theme-color: "#171612" description: "Documentation for how to use the Tinybird BigQuery Connector" --- # BigQuery Connector¶ Use the BigQuery Connector to load data from BigQuery into Tinybird so that you can quickly turn it into high-concurrency, low-latency API Endpoints. You can load full tables or the result of an SQL query. The BigQuery Connector is fully managed and requires no additional tooling. You can define a sync schedule inside Tinybird and execution is taken care of for you. With the BigQuery Connector you can: - Connect to your BigQuery database with a handful of clicks. Select which tables to sync and set the schedule. - Use an SQL query to get the data you need from BigQuery and then run SQL queries on that data in Tinybird. - Use authentication tokens to control access to API endpoints. Implement access policies as you need, with support for row-level security. Check the [use case examples](https://github.com/tinybirdco/use-case-examples) repository for examples of BigQuery Data Sources iteration using Git integration. The BigQuery Connector can't access BigQuery external tables, like connected Google Sheets. If you need this functionality, reach out to [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co). ## Prerequisites¶ - Tinybird CLI. See[ the Tinybird CLI quick start](https://www.tinybird.co/docs/docs/cli/quick-start) . - Tinybird CLI[ authenticated with the desired Workspace](https://www.tinybird.co/docs/docs/cli/install) . You can switch the Tinybird CLI to the correct Workspace using `tb workspace use `. To use version control, connect your Tinybird Workspace with [your repository](https://www.tinybird.co/docs/docs/work-with-data/organize-your-work/working-with-version-control#connect-your-workspace-to-git-from-the-cli) , and set the [CI/CD configuration](https://www.tinybird.co/docs/docs/work-with-data/organize-your-work/continuous-integration) . For testing purposes, use a different connection than in the main branches or Workspaces. For instance to create the connections in the main branch or Workspace using the CLI: tb auth # Use the main Workspace admin Token tb connection create bigquery # Prompts are interactive and ask you to insert the necessary information You can only create connections in the main Workspace. Even when creating the connection in the branch or as part of a Data Source creation flow, it's created in the main workspace and from there it's available for every branch. ## Load a BigQuery table¶ ### Load a BigQuery table in the UI¶ Open the [Tinybird UI](https://app.tinybird.co/) and add a new Data Source by clicking **Create new (+)** next to the Data Sources section on the left hand side navigation bar (see Mark 1 below). <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-bigquery-connector-sync-first-table-ui-1.png&w=3840&q=75) In the modal, select the BigQuery option from the list of Data Sources. The next modal screen shows the **Connection details** . Follow the instructions and configure access to your BigQuery. Access the GCP IAM Dashboard by selecting the **IAM & Admin** link, and use the provided **principal** name from this modal. In the GCP IAM Dashboard, click the **Grant Access** button (see Mark 1 below). 
<-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-bigquery-connector-sync-first-table-ui-4.png&w=3840&q=75) In the box that appears on the right-hand side, paste the **principal** name you just copied into the **New principals** box (see Mark 1 below). Next, in the **Role** box, find and select the role **BigQuery Data Viewer** (see Mark 2 below). <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-bigquery-connector-sync-first-table-ui-5.png&w=3840&q=75) Click **Save** to complete. The principal should now be listed in the **View By Principals** list (see Mark 1 below). <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-bigquery-connector-sync-first-table-ui-7.png&w=3840&q=75) OK! Now return to the Tinybird UI. In the modal, click **Next** (see Mark 1 below). Note: It can take a few seconds for the GCP permissions to apply. The next screen allows you to browse the tables available in BigQuery, and select the table you wish to load. Start by selecting the **project** that the table belongs to (see Mark 1 below), then the **dataset** (see Mark 2 below) and finally the **table** (see Mark 3 below). Finish by clicking **Next** (see Mark 4 below). <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-bigquery-connector-sync-first-table-ui-9.png&w=3840&q=75) Note: the maximum allowed table size is 50 million rows, the result will be truncated if it exceeds that limit. You can now configure the schedule on which you wish to load data. You can configure a schedule in minutes, hours, or days by using the drop down selector, and set the value for the schedule in the text field (see Mark 1 below). The screenshot below shows a schedule of 10 minutes. Next, you can configure the **Import Strategy** . The strategy **Replace data** is selected by default (see Mark 2 below). Finish by clicking **Next** (see Mark 3 below). <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-bigquery-connector-sync-first-table-ui-10.png&w=3840&q=75) Note: the maximum allowed frequency is 5 minutes. The final screen of the modal shows you the interpreted schema of the table, which you can edit as needed. You can also modify what the Data Source in Tinybird will be called by changing the name at the top (see Mark 1 below). To finish, click **Create Data Source** (see Mark 2 below). <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-bigquery-connector-sync-first-table-ui-11.png&w=3840&q=75) You are now on the Data Source data page, where you can view the data that has been loaded (see Mark 1 below) and a status chart showing executions of the loading schedule (see Mark 2 below). <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-bigquery-connector-sync-first-table-ui-12.png&w=3840&q=75) ### Load a BigQuery table in the CLI¶ You need to create a connection before you can load a BigQuery table into Tinybird using the CLI. Creating a connection grants your Tinybird Workspace the appropriate permissions to view data from BigQuery. [Authenticate your CLI](https://www.tinybird.co/docs/docs/cli/install#authentication) and switch to the desired Workspace. Then run: tb connection create bigquery The output of this command includes instructions to configure a GCP principal with read only access to your data in BigQuery. The instructions include the URL to access the appropriate page in GCP's IAM Dashboard. Copy the **principal name** shown in the output. In the GCP IAM Dashboard, select the **Grant Access** button (see Mark 1 below). 
<-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-bigquery-connector-sync-first-table-ui-4.png&w=3840&q=75) In the box that appears on the right-hand side, paste the **principal** name you just copied into the **New principals** box (see Mark 1 below). Next, in the **Role** box, find and select the role **BigQuery Data Viewer** (see Mark 2 below). <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-bigquery-connector-sync-first-table-ui-5.png&w=3840&q=75) Click **Save** to complete. The principal should now be listed in the **View By Principals** list (see Mark 1 below). <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-bigquery-connector-sync-first-table-ui-7.png&w=3840&q=75) Note: It can take a few seconds for the GCP permissions to apply. Once done, select **yes** (y) to create the connection. A new `bigquery.connection` file is created in your project files. Note: At the moment, the `.connection` file isn't used and can't be pushed to Tinybird. It is safe to delete this file. A future release will allow you to push this file to Tinybird to automate creation of connections, similar to Kafka connection. Now that your connection is created, you can create a Data Source and configure the schedule to import data from BigQuery. The BigQuery import is configured using the following options, which can be added at the end of your .datasource file: - `IMPORT_SERVICE` : name of the import service to use, in this case, `bigquery` - `IMPORT_SCHEDULE` : a cron expression (UTC) with the frequency to run imports, must be higher than 5 minutes, e.g. `*/5 * * * *` - `IMPORT_STRATEGY` : the strategy to use when inserting data, either `REPLACE` or `APPEND` - `IMPORT_EXTERNAL_DATASOURCE` : (optional) the fully qualified name of the source table in BigQuery e.g. `project.dataset.table` - `IMPORT_QUERY` : (optional) the SELECT query to extract your data from BigQuery when you don't need all the columns or want to make a transformation before ingestion. The FROM must reference a table using the full scope: `project.dataset.table` Both `IMPORT_EXTERNAL_DATASOURCE` and `IMPORT_QUERY` are optional, but you must provide one of them for the connector to work. Note: For `IMPORT_STRATEGY` only `REPLACE` is supported today. The `APPEND` strategy will be enabled in a future release. For example: ##### bigquery.datasource file DESCRIPTION > bigquery demo data source SCHEMA > `timestamp` DateTime `json:$.timestamp`, `id` Integer `json:$.id`, `orderid` LowCardinality(String) `json:$.orderid`, `status` LowCardinality(String) `json:$.status`, `amount` Integer `json:$.amount` ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(timestamp)" ENGINE_SORTING_KEY "timestamp" ENGINE_TTL "timestamp + toIntervalDay(60)" IMPORT_SERVICE bigquery IMPORT_SCHEDULE */5 * * * * IMPORT_EXTERNAL_DATASOURCE mydb.raw.events IMPORT_STRATEGY REPLACE IMPORT_QUERY > select timestamp, id, orderid, status, amount from mydb.raw.events The columns you select in the `IMPORT_QUERY` must match the columns defined in the Data Source schema. For example, if your Data Source has the columns `ColumnA, ColumnB` then your `IMPORT_QUERY` must contain `SELECT ColumnA, ColumnB FROM ...` . A mismatch of columns causes data to arrive in the [quarantine Data Source](https://www.tinybird.co/docs/docs/get-data-in/data-operations/recover-from-quarantine). With your connection created and Data Source defined, you can now push your project to Tinybird using: tb push The first run of the import will begin on the next lapse of the CRON expression. 
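If a scheduled import has a column mismatch, the affected rows land in the quarantine Data Source mentioned above rather than in your Data Source. A quick way to confirm this is to query the companion quarantine Data Source directly. The following is a minimal sketch that assumes a Data Source named `bigquery_sample` (a hypothetical name); quarantine Data Sources follow the `{datasource_name}_quarantine` naming convention.

##### Inspect quarantined rows (assumes a Data Source named bigquery_sample)

-- Rows that failed to match the Data Source schema during the last imports
SELECT *
FROM bigquery_sample_quarantine
LIMIT 10

If this query returns rows, review the `IMPORT_QUERY` and the Data Source schema until the selected columns match.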
## Configure granular permissions¶

If you need to configure more granular permissions for BigQuery, you can grant access at the dataset or individual object level. The first step is creating a new role in your [IAM & Admin Console in GCP](https://console.cloud.google.com/iam-admin/roles/create), and assigning the `resourcemanager.projects.get` permission.

<-figure->
![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-bigquery-connector-custom-role-1.png&w=3840&q=75)

The Connector needs this permission to list the projects the generated Service Account has access to, so you can explore the BigQuery tables and views in the Tinybird UI. After that, you can grant the Service Account permissions on specific datasets by clicking **Sharing** > **Permissions**:

<-figure->
![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-bigquery-connector-custom-role-2.png&w=3840&q=75)

Then **ADD PRINCIPAL**:

<-figure->
![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-bigquery-connector-custom-role-3.png&w=3840&q=75)

Finally, paste the **principal** name copied earlier into the **New principals** box. Next, in the **Role** box, find and select the role **BigQuery Data Viewer**:

<-figure->
![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-bigquery-connector-custom-role-4.png&w=3840&q=75)

The Tinybird Connector UI now only shows the specific resources you've granted permissions to.

## Schema evolution¶

The BigQuery Connector supports backwards compatible changes made in the source table. This means that, if you add a new column in BigQuery, the next sync job will automatically add it to the Tinybird Data Source. Non-backwards compatible changes, such as dropping or renaming columns, aren't supported and will cause the next sync to fail.

## Iterate a BigQuery Data Source¶

To iterate a BigQuery Data Source, use the Tinybird CLI and the version control integration to handle your resources.

You can only create connections in the main Workspace. When creating the connection in a Branch, it's created in the main Workspace and from there is available to every Branch.

### Add a new BigQuery Data Source¶

You can add a new Data Source directly with the UI or the CLI tool, following the steps in the [Load a BigQuery table](https://www.tinybird.co/docs/about:blank#load-a-BigQuery-table) section. When adding a Data Source in a Tinybird Branch, it works for testing purposes, but doesn't have any connection details internally. You must add the connection and BigQuery configuration in the .datasource Datafile when moving to production.

To add a new Data Source using the recommended version control workflow, check the instructions in the [examples repository](https://github.com/tinybirdco/use-case-examples).

### Update a Data Source¶

- BigQuery Data Sources can't be modified directly from the UI.
- When you create a new Tinybird Branch, the existing BigQuery Data Sources won't be connected. You need to re-create them in the Branch.
- In Branches, it's usually useful to work with [fixtures](https://www.tinybird.co/docs/docs/work-with-data/strategies/implementing-test-strategies#fixture-tests), as they'll be applied as part of the CI/CD, allowing the full process to be deterministic in every iteration and avoiding quota consumption from external services.
BigQuery Data Sources can be modified from the CLI tool: tb auth # modify the .datasource Datafile with your editor tb push --force {datafile} # check the command output for errors To update it using the recommended version control workflow check the instructions in the [examples repository](https://github.com/tinybirdco/use-case-examples). ### Delete a Data Source¶ BigQuery Data Sources can be deleted directly from UI or CLI like any other Data Source. To delete it using the recommended version control workflow check the instructions in the [examples repository](https://github.com/tinybirdco/use-case-examples). ## Logs¶ Job executions are logged in the `datasources_ops_log` [Service Data Source](https://www.tinybird.co/docs/docs/monitoring/service-datasources) . This log can be checked directly in the Data Source view page in the UI. Filter by `datasource_id` to monitor ingestion through the BigQuery Connector from the `datasources_ops_log`: SELECT timestamp, event_type, result, error, job_id FROM tinybird.datasources_ops_log WHERE datasource_id = 't_1234' AND event_type = 'replace' ORDER BY timestamp DESC ## Limits¶ See [BigQuery Connector limits](https://www.tinybird.co/docs/docs/get-started/plans/limits#bigquery-connector-limits). --- URL: https://www.tinybird.co/docs/get-data-in/connectors/confluent Last update: 2025-01-07T15:48:10.000Z Content: --- title: "Confluent Connector · Tinybird Docs" theme-color: "#171612" description: "Use the Confluent Connector to bring data from Confluent to Tinybird." --- # Confluent Connector¶ Use the Confluent Connector to bring data from your existing Confluent Cloud cluster into Tinybird so that you can quickly turn them into high-concurrency, low-latency REST API Endpoints and query using SQL. The Confluent Connector is fully managed and requires no additional tooling. Connect Tinybird to your Confluent Cloud cluster, select a topic, and Tinybird automatically begins consuming messages from Confluent Cloud. ## Prerequisites¶ You need to grant `READ` permissions to both the Topic and the Consumer Group to ingest data from Confluent into Tinybird. The Confluent Cloud Schema Registry is only supported for decoding Avro messages. When using Confluent Schema Registry, the Schema name must match the Topic name. For example, if you're ingesting the Kafka Topic `my-kafka-topic` using a Connector with Schema Registry enabled, it expects to find a Schema named `my-kafka-topic-value`. ## Create the Data Source using the UI¶ To connect Tinybird to your Confluent Cloud cluster, select **Create new (+)** next to the data project section, select **Data Source** , and then select **Confluent** from the list of available Data Sources. Enter the following details: - ** Connection name** : A name for the Confluent Cloud connection in Tinybird. - ** Bootstrap Server** : The comma-separated list of bootstrap servers, including port numbers. - ** Key** : The key component of the Confluent Cloud API Key. - ** Secret** : The secret component of the Confluent Cloud API Key. - ** Decode Avro messages with schema registry** : (Optional) Turn on Schema Registry support to decode Avro messages. Enter the Schema Registry URL, username, and password. After you've entered the details, select **Connect** . This creates the connection between Tinybird and Confluent Cloud. A list of your existing topics appears and you can select the topic to consume from. Tinybird creates a **Group ID** that specifies the name of the consumer group that this Kafka consumer belongs to. 
You can customize the Group ID, but ensure that your Group ID has **Read** permissions to the topic. After you've chosen a topic, you can select the starting offset to consume from. You can consume from the earliest offset or the latest offset: - If you consume from the earliest offset, Tinybird consumes all messages from the beginning of the topic. - If you consume from the latest offset, Tinybird only consumes messages that are produced after the connection is created. After selecting the offset, select **Next** . Tinybird consumes a sample of messages from the topic and displays the schema. You can adjust the schema and Data Source settings as needed, then select **Create Data Source**. Tinybird begins consuming messages from the topic and loading them into the Data Source. ## Configure the connector using .datasource files¶ If you are managing your Tinybird resources in files, there are several settings available to configure the Confluent Connector in .datasource files. See the [datafiles docs](https://www.tinybird.co/docs/docs/cli/datafiles/datasource-files#kafka-confluent-redpanda) for more information. The following is an example of Kafka .datasource file for an already existing connection: ##### Example data source for Confluent Connector SCHEMA > `__value` String, `__topic` LowCardinality(String), `__partition` Int16, `__offset` Int64, `__timestamp` DateTime, `__key` String `__headers` Map(String,String) ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(timestamp)" ENGINE_SORTING_KEY "timestamp" # Connection is already available. If you # need to create one, add the required fields # on an include file with the details. KAFKA_CONNECTION_NAME my_connection_name KAFKA_TOPIC my_topic KAFKA_GROUP_ID my_group_id KAFKA_STORE_HEADERS true ### Columns of the Data Source¶ When you connect a Kafka producer to Tinybird, Tinybird consumes optional metadata columns from that Kafka record and writes them to the Data Source. The following fields represent the raw data received from Kafka: - `__value` : A String representing the entire unparsed Kafka record inserted. - `__topic` : The Kafka topic that the message belongs to. - `__partition` : The kafka partition that the message belongs to. - `__offset` : The Kafka offset of the message. - `__timestamp` : The timestamp stored in the Kafka message received by Tinybird. - `__key` : The key of the kafka message. - `__headers` : Headers parsed from the incoming topic messages. See[ Using custom Kafka headers for advanced message processing](https://www.tinybird.co/blog-posts/using-custom-kafka-headers) . Metadata fields are optional. Omit the fields you don't need to reduce your data storage. ### Use INCLUDE to store connection settings¶ To avoid configuring the same connection settings across many files, or to prevent leaking sensitive information, you can store connection details in an external file and use `INCLUDE` to import them into one or more .datasource files. You can find more information about `INCLUDE` in the [Advanced Templates](https://www.tinybird.co/docs/docs/cli/advanced-templates) documentation. For example, you might have two Confluent Cloud .datasource files, which re-use the same Confluent Cloud connection. You can create an include file which stores the Confluent Cloud connection details. 
The Tinybird project might use the following structure: ##### Tinybird data project file structure ecommerce_data_project/ datasources/ connections/ my_connector_name.incl my_confluent_datasource.datasource another_datasource.datasource endpoints/ pipes/ Where the file `my_connector_name.incl` has the following content: ##### Include file containing Confluent Cloud connection details KAFKA_CONNECTION_NAME my_connection_name KAFKA_BOOTSTRAP_SERVERS my_server:9092 KAFKA_KEY my_username KAFKA_SECRET my_password And the Confluent Cloud .datasource files look like the following: ##### Data Source using includes for Confluent Cloud connection details SCHEMA > `value` String, `topic` LowCardinality(String), `partition` Int16, `offset` Int64, `timestamp` DateTime, `key` String ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(timestamp)" ENGINE_SORTING_KEY "timestamp" INCLUDE "connections/my_connection_name.incl" KAFKA_TOPIC my_topic KAFKA_GROUP_ID my_group_id When using `tb pull` to pull a Confluent Cloud Data Source using the CLI, the `KAFKA_KEY` and `KAFKA_SECRET` settings aren't included in the file to avoid exposing credentials. ### Internal fields¶ The `__` fields stored in the Kafka datasource represent the raw data received from Kafka: - `__value` : A String representing the whole Kafka record inserted. - `__topic` : The Kafka topic that the message belongs to. - `__partition` : The kafka partition that the message belongs to. - `__offset` : The Kafka offset of the message. - `__timestamp` : The timestamp stored in the Kafka message received by Tinybird. - `__key` : The key of the kafka message. ## Compressed messages¶ Tinybird can consume from Kafka topics where Kafka compression is enabled, as decompressing the message is a standard function of the Kafka Consumer. However, if you compressed the message before passing it through the Kafka Producer, then Tinybird can't do post-Consumer processing to decompress the message. For example, if you compressed a JSON message through gzip and produced it to a Kafka topic as a `bytes` message, it's ingested by Tinybird as `bytes` . If you produced a JSON message to a Kafka topic with the Kafka Producer setting `compression.type=gzip` , while it's stored in Kafka as compressed bytes, it's decoded on ingestion and arrive to Tinybird as JSON. ## Confluent logs¶ You can find global logs in the `datasources_ops_log` [Service Data Source](https://www.tinybird.co/docs/docs/monitoring/service-datasources#tinybird-datasources-ops-log) . Filter by `datasource_id` to select the correct datasource, and set `event_type` to `append-kafka`. To select all Kafka releated logs in the last day, run the following query: SELECT * FROM tinybird.datasources_ops_log WHERE datasource_id = 't_1234' AND event_type = 'append-kafka' AND timestamp > now() - INTERVAL 1 day ORDER BY timestamp DESC If you can't find logs in `datasources_ops_log` , the `kafka_ops_log` [Service Data Source](https://www.tinybird.co/docs/docs/monitoring/service-datasources#tinybird-kafka-ops-log) contains more detailed logs. Filter by `datasource_id` to select the correct datasource, and use `msg_type` to select the desired log level ( `info`, `warning` , or `error` ). 
SELECT * FROM tinybird.kafka_ops_log WHERE datasource_id = 't_1234' AND timestamp > now() - interval 1 day AND msg_type IN ['info', 'warning', 'error'] --- URL: https://www.tinybird.co/docs/get-data-in/connectors/dynamodb Last update: 2024-12-18T09:46:02.000Z Content: --- title: "DynamoDB Connector · Tinybird Docs" theme-color: "#171612" description: "Bring your DynamoDB data to Tinybird using the DynamoDB Connector." --- # DynamoDB Connector¶ Use the DynamoDB Connector to ingest historical and change stream data from Amazon DynamoDB to Tinybird. The DynamoDB Connector is fully managed and requires no additional tooling. Connect Tinybird to DynamoDB, select your tables, and Tinybird keeps in sync with DynamoDB. With the DynamoDB Connector you can: - Connect to your DynamoDB tables and start ingesting data in minutes. - Query your DynamoDB data using SQL and enrich it with dimensions from your streaming data, warehouse, or files. - Use Auth tokens to control access to API endpoints. Implement access policies as you need. Support for row-level security. DynamoDB Connector only works with Workspaces created in AWS Regions. ## Prerequisites¶ - Tinybird CLI version 5.3.0 or higher. See[ the Tinybird CLI quick start](https://www.tinybird.co/docs/docs/cli/quick-start) . - Tinybird CLI[ authenticated with the desired Workspace](https://www.tinybird.co/docs/docs/cli/install) . - DynamoDB Streams is active on the target DynamoDB tables with `NEW_IMAGE` or `NEW_AND_OLD_IMAGE` type. - Point-in-time recovery (PITR) is active on the target DynamoDB table. You can switch the Tinybird CLI to the correct Workspace using `tb workspace use `. Supported characters for column names are letters, numbers, underscores, and dashes. Tinybird automatically sanitizes invalid characters like dots or dollar signs. ## Required permissions¶ The DynamoDB Connector requires certain permissions to access your tables. The IAM Role needs the following permissions: - `dynamodb:Scan` - `dynamodb:DescribeStream` - `dynamodb:DescribeExport` - `dynamodb:GetRecords` - `dynamodb:GetShardIterator` - `dynamodb:DescribeTable` - `dynamodb:DescribeContinuousBackups` - `dynamodb:ExportTableToPointInTime` - `dynamodb:UpdateTable` - `dynamodb:UpdateContinuousBackups` The following is an example of AWS Access Policy: When configuring the connector, the UI, CLI and API all provide the necessary policy templates. { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "dynamodb:Scan", "dynamodb:DescribeStream", "dynamodb:DescribeExport", "dynamodb:GetRecords", "dynamodb:GetShardIterator", "dynamodb:DescribeTable", "dynamodb:DescribeContinuousBackups", "dynamodb:ExportTableToPointInTime", "dynamodb:UpdateTable", "dynamodb:UpdateContinuousBackups" ], "Resource": [ "arn:aws:dynamodb:*:*:table/", "arn:aws:dynamodb:*:*:table//stream/*", "arn:aws:dynamodb:*:*:table//export/*" ] }, { "Effect": "Allow", "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"], "Resource": [ "arn:aws:s3:::", "arn:aws:s3:::/*" ] } ] } The following is an example trust policy: { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "sts:AssumeRole", "Principal": { "AWS": "arn:aws:iam::473819111111111:root" }, "Condition": { "StringEquals": { "sts:ExternalId": "ab3caaaa-01aa-4b95-bad3-fff9b2ac789f8a9" } } } ] } ## Load a table using the CLI¶ To load a DynamoDB table into Tinybird using the CLI, create a connection and then a Data Source. 
The connection grants your Tinybird Workspace the necessary permissions to access AWS and your tables in DynamoDB. The Data Source then maps a table in DynamoDB to a table in Tinybird and manages the historical and continuous sync.

### Create the DynamoDB connection¶

The connection grants your Tinybird Workspace the necessary permissions to access AWS and your tables in DynamoDB. To connect, run the following command:

tb connection create dynamodb

This command initiates the process of creating a connection. When prompted, type `y` to proceed.

### Create a new IAM Policy in AWS¶

The Tinybird CLI provides a policy template.

1. Replace the table name placeholder with the name of your DynamoDB table.
2. Replace the bucket placeholder with the name of the S3 bucket you want to use for the initial load.
3. In AWS, go to **IAM**, **Policies**, **Create Policy**.
4. Select the **JSON** tab and paste the modified policy text.
5. Save and create the policy.

### Create a new IAM Role in AWS¶

1. Return to the Tinybird CLI to get a trust policy template.
2. In AWS, go to **IAM**, **Roles**, **Create Role**.
3. Select **Custom Trust Policy** and paste the trust policy copied from the CLI.
4. In the **Permissions** tab, attach the policy created in the previous step.
5. Complete the role creation process.

### Complete the connection¶

In the AWS IAM console, find the role you've created. Copy its Amazon Resource Name (ARN), which looks like `arn:aws:iam::111111111111:role/my-awesome-role`.

Provide the following information to the Tinybird CLI:

- The Role ARN
- AWS region of your DynamoDB tables
- Connection name

Tinybird uses the connection name to identify the connection. The name can only contain alphanumeric characters `a-zA-Z` and underscores `_`, and must start with a letter.

When the CLI prompts are completed, Tinybird creates the connection. The CLI generates a `.connection` file in your project directory. This file isn't used and is safe to delete. A future release will allow you to push this file to Tinybird to automate the creation of connections, similar to Kafka connections.

### Create a DynamoDB Data Source file¶

The Data Source maps a table in DynamoDB to a table in Tinybird and manages the historical and continuous sync. [Data Source files](https://www.tinybird.co/docs/docs/cli/datafiles/datasource-files) contain the table schema and the specific DynamoDB properties that target the table Tinybird imports.

Create a Data Source file called `mytable.datasource`. There are two approaches to defining the schema for a DynamoDB Data Source:

1. Define the Partition Key and Sort Key from your DynamoDB table, and access other properties from JSON at query time.
2. Define all DynamoDB item properties as columns.

The Partition Key and Sort Key, if any, from your DynamoDB table must be defined in the Data Source schema. These are the only properties that are mandatory to define, as they're used for deduplication of records (upserts and deletes).

#### Approach 1: Define only the Partition Key and Sort Key¶

If you don't want to map all properties from your DynamoDB table, you can define only the Partition Key and Sort Key. The entire DynamoDB item is stored as JSON in a `_record` column, and you can extract properties using `JSONExtract*` functions.
For example, if you have a DynamoDB table with `transaction_id` as the Partition Key, you can define your Data Source schema like this: ##### mytable.datasource SCHEMA > transaction_id String `json:$.transaction_id` IMPORT_SERVICE "dynamodb" IMPORT_CONNECTION_NAME IMPORT_TABLE_ARN IMPORT_EXPORT_BUCKET Replace the `` with the name of the connection created in the first step. Replace `` with the ARN of the table you'd like to import. Replace `` with the name of the S3 bucket you want to use for the initial sync. #### Approach 2: Define all DynamoDB item properties as columns¶ If you want to strictly define all your properties and their types, you can map them into your Data Source as columns. You can map properties to [any of the supported types in Tinybird](https://www.tinybird.co/docs/docs/get-data-in/data-sources#supported-data-types) . Properties can be also arrays of the previously mentioned types, and nullable. Use the nullable type when there are properties that might not have a value in every item within your DynamoDB table. For example, if you have a DynamoDB with items like this: { "timestamp": "2024-07-25T10:46:37.380Z", "transaction_id": "399361d5-10fc-4777-8187-88aaa4623569", "name": "Chris Donnelly", "passport_number": 4904040, "flight_from": "Burien", "flight_to": "Sanford", "airline": "BrianAir" } Where `transaction_id` is the partition key, you can define your Data Source schema like this: ##### mytable.datasource SCHEMA > `timestamp` DateTime64(3) `json:$.timestamp`, `transaction_id` String `json:$.transaction_id`, `name` String `json:$.name`, `passport_number` Int64 `json:$.passport_number`, `flight_from` String `json:$.flight_from`, `flight_to` String `json:$.flight_to`, `airline` String `json:$.airline` IMPORT_SERVICE "dynamodb" IMPORT_CONNECTION_NAME IMPORT_TABLE_ARN IMPORT_EXPORT_BUCKET Replace `` with the name of the connection created in the first step. Replace `` with the ARN of the table you'd like to import. Replace `` with the name of the S3 bucket you want to use for the initial sync. You can map properties with basic types (String, Number, Boolean, Binary, String Set, Number Set) at the root item level. Follow this schema definition pattern: `json:$.` - `PropertyName` is the name of the column within your Tinybird Data Source. - `PropertyType` is the type of the column within your Tinybird Data Source. It must match the type in the DynamoDB Data Source: - Strings correspond to `String` columns. - All `Int` , `UInt` , or `Float` variants correspond to `Number` columns. - `Array(String)` corresponds to `String Set` columns. - `Array(UInt)` and all numeric variants correspond to `Number Set` columns. - `PropertyNameInDDB` is the name of the property in your DynamoDB table. It must match the letter casing. Map properties within complex types, like `Maps` , using JSONPaths. For example, you can map a property at the first level in your Data Source schema like: MyString String `json:$..`. For `Lists` , standalone column mapping isn't supported. Those properties require extraction using `JSONExtract*` functions or consumed after a transformation with a Materialized View. ### Push the Data Source¶ With your connection created and Data Source defined, push your Data Source to Tinybird using `tb push`. For example, if your Data Source file is `mytable.datasource` , run: tb push mytable.datasource Due to how Point-in-time recovery works, data might take some minutes before it appears in Tinybird. This delay only happens the first time Tinybird retrieves the table. 
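If you follow Approach 1 and only define the key columns, the remaining item properties are still available at query time through the `_record` column. The following is a minimal sketch, not a definitive pattern: it assumes the flight-booking item shown above, a Data Source named `mytable`, and that `_record` holds the plain JSON form of the item, and uses `JSONExtract*` functions to read individual properties.

##### Extract properties from _record at query time

-- transaction_id is the only column defined in the schema;
-- the other properties are read from the JSON stored in _record
SELECT
    transaction_id,
    JSONExtractString(_record, 'name') AS name,
    JSONExtractString(_record, 'airline') AS airline,
    JSONExtractInt(_record, 'passport_number') AS passport_number
FROM mytable
LIMIT 10

If you query these properties frequently, consider Approach 2 and mapping them as real columns instead.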
## Load a table using the UI¶ To load a DynamoDB table into Tinybird using the UI, select the DynamoDB option in the Data Source dialog. You need an existing connection to your DynamoDB table. The UI guides you through the process of creating a connection and finally creating a Data Source that imports the data from your DynamoDB table. ### Create a DynamoDB connection¶ When you create a connection, provide the following information: - AWS region of your DynamoDB tables - ARN of the table you want to import - Name of the S3 bucket you want to use for the initial sync. In the next step, provide the ARN of the IAM Role you created in AWS. This role must have the necessary permissions to access your DynamoDB tables and S3 bucket. ### Create a Data Source¶ After you've created the connection, a preview of the imported data appears. You can change the schema columns, the sorting key, or the TTL. Due to the schemaless nature of DynamoDB, the preview might not show all the columns in your table. You can manually add columns to the schema in the **Code Editor** tab. When you're ready, select **Create Data Source**. Due to how Point-in-time recovery works, data might take some minutes before it appears in Tinybird. This delay only happens the first time Tinybird retrieves the table. ## Columns added by Tinybird¶ When loading a DynamoDB table, Tinybird automatically adds the following columns: | Column | Type | Description | | --- | --- | --- | | `_record` | `Nullable(String)` | Contents of the event, in JSON format. Added to `NEW_IMAGES` and `NEW_AND_OLD_IMAGES` streams. | | `_old_record` | `Nullable(String)` | Stores the previous state of the record. Added to `NEW_AND_OLD_IMAGES` streams. | | `_timestamp` | `DateTime64(3)` | Date and time of the event. | | `_event_type` | `LowCardinality(String)` | Type of the event. | | `_is_deleted` | `UInt8` | Whether the record has been deleted. | If an existing table with stream type `NEW_AND_OLD_IMAGES` is missing the `_old_record` column, add it manually with the following configuration: `_old_record` Nullable(String) `json:$.OldImage`. ## Iterate a Data Source¶ To iterate a DynamoDB Data Source, use the Tinybird CLI and the [version control integration](https://www.tinybird.co/docs/docs/work-with-data/organize-your-work/working-with-version-control) to handle your resources. You can only create connections in the main Workspace. When creating the connection in a Branch, it's created in the main Workspace and from there is available to every Branch. DynamoDB Data Sources created in a Branch aren't connected to your source. AWS DynamoDB documentation discourages [reading the same Stream from various processes](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html#Streams.Processing) , because it can result in throttling. This can affect the ingestion in the main Branch. Browse the [use case examples](https://github.com/tinybirdco/use-case-examples) repository to find basic instructions and examples to handle DynamoDB Data Sources iteration using git integration. ### Add a new DynamoDB Data Source¶ You can add a new Data Source directly with the Tinybird CLI. See [load of a DynamoDB table](https://www.tinybird.co/docs/about:blank#load-a-table-using-the-cli). To add a new Data Source using the recommended version control workflow, see the [examples repository](https://github.com/tinybirdco/use-case-examples/tree/main/iterate_dynamodb). When you add a Data Source to a Tinybird Branch, it doesn't have any connection details. 
You must add the connection and DynamoDB configuration in the .datasource Datafile when moving to a production environment or Branch.

### Update a Data Source¶

You can modify DynamoDB Data Sources using the Tinybird CLI. For example:

tb auth
# modify the .datasource Datafile with your editor
tb push --force {datafile}
# check the command output for errors

When updating an existing DynamoDB Data Source, the initial sync isn't repeated; only new item modifications are synchronized by the CDC process.

To update a Data Source using the recommended version control workflow, see the [examples repository](https://github.com/tinybirdco/use-case-examples/tree/main/iterate_dynamodb).

In Branches, work with [fixtures](https://www.tinybird.co/docs/docs/work-with-data/strategies/implementing-test-strategies#fixture-tests), as they'll be applied as part of the CI/CD, allowing the full process to be deterministic in every iteration and avoiding quota usage from external services.

### Delete a Data Source¶

You can delete DynamoDB Data Sources like any other Data Source. To delete one using the recommended version control workflow, see the [examples repository](https://github.com/tinybirdco/use-case-examples/tree/main/iterate_dynamodb).

## DynamoDB logs¶

You can find DynamoDB logs in the `datasources_ops_log` [Service Data Source](https://www.tinybird.co/docs/docs/monitoring/service-datasources#tinybird-datasources-ops-log). Filter by `datasource_id` to select the correct Data Source, and use `event_type` to select between initial synchronization logs (`sync-dynamodb`) and update logs (`append-dynamodb`).

To select all DynamoDB related logs in the last day, run the following query:

SELECT *
FROM tinybird.datasources_ops_log
WHERE
    datasource_id = 't_1234'
    AND event_type in ['sync-dynamodb', 'append-dynamodb']
    AND timestamp > now() - INTERVAL 1 day
ORDER BY timestamp DESC

## Connector architecture¶

AWS provides two free, built-in features for DynamoDB:

- DynamoDB Streams captures change events for a given DynamoDB table and provides an API to access events as a stream. This allows CDC-like access to the table for continuous updates.
- Point-in-time recovery (PITR) takes snapshots of your DynamoDB table and saves the export to S3. This allows historical access to table data for batch uploads.

The DynamoDB Connector uses these features to send DynamoDB data to Tinybird:

<-figure->
![Connecting DynamoDB to Tinybird architecture](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fassets%2Fguides%2Fingest-from-dynamodb%2Ftinybird-dynamodb-connector-arch.png&w=3840&q=75)
<-figcaption->
Connecting DynamoDB to Tinybird architecture

## Schema evolution¶

The DynamoDB Connector supports backwards compatible changes made in the source table. This means that, if you add a new column in DynamoDB, the next sync job automatically adds it to the Tinybird Data Source. Non-backwards compatible changes, such as dropping or renaming columns, aren't supported by default and might cause the next sync to fail.

## Considerations on queries¶

The DynamoDB Connector uses the ReplacingMergeTree engine to remove duplicate entries with the same sorting key. Deduplication occurs during a merge, which happens at an unknown time in the background, so `SELECT * FROM ddb_ds` might yield duplicated rows after an insertion. To account for this, force the merge at query time by adding `FINAL` to the query. For example, `SELECT * FROM ddb_ds FINAL`. Adding `FINAL` also filters out the rows where `_is_deleted = 1`.
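To make this concrete, the following is a minimal sketch that reuses the `ddb_ds` name from this section and assumes it has the `transaction_id` column from the earlier flight-booking example; adapt the column and value to your own table.

##### Query the current, deduplicated state of a DynamoDB-backed Data Source

-- FINAL forces the merge at query time: each item appears once
-- and rows marked with _is_deleted = 1 are filtered out
SELECT *
FROM ddb_ds FINAL
WHERE transaction_id = '399361d5-10fc-4777-8187-88aaa4623569'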
## Override sort and partition keys¶ The DynamoDB Connector automatically sets values for the Sorting Key and the Partition Key properties based on the source DynamoDB table. You might want to override the default values to fit your needs. To override Sorting and Partition key values, open your .datasource file and edit the values for `ENGINE_PARTITION_KEY` and `ENGINE_SORTING_KEY` . For the Sorting key, you must append the additional columns and leave `pk` and `sk` in place. For example: ENGINE "ReplacingMergeTree" ENGINE_PARTITION_KEY "toYYYYMM(toDateTime64(_timestamp, 3))" ENGINE_SORTING_KEY "pk, sk, " ENGINE_VER "_timestamp" Sorting key is used for deduplication of data. When adding columns to `ENGINE_SORTING_KEY` , make sure they contain the same value across record changes. You can then push the new .datasource configuration using `tb push`: tb push updated-ddb.datasource Don't edit the values for `ENGINE` or `ENGINE_VER` . The DynamoDB Connector requires the ReplacingMergeTree engine and a version based on the timestamp. ## Limits¶ See [DynamoDB Connector limits](https://www.tinybird.co/docs/docs/get-started/plans/limits#dynamodb-connector-limits). --- URL: https://www.tinybird.co/docs/get-data-in/connectors/kafka Last update: 2024-12-18T09:46:02.000Z Content: --- title: "Kafka Connector · Tinybird Docs" theme-color: "#171612" description: "Documentation for the Tinybird Kafka Connector" --- # Kafka Connector¶ Use the Kafka Connector to ingest data streams from your Kafka cluster into Tinybird so that you can quickly turn them into high-concurrency, low-latency REST APIs. The Kafka Connector is fully managed and requires no additional tooling. Connect Tinybird to your Kafka cluster, select a topic, and Tinybird automatically begins consuming messages from Kafka. You can transform or enrich your Kafka topics with JOINs using serverless Data Pipes. Auth tokens control access to API endpoints. Secure connections through AWS PrivateLink or Multi-VPC for MSK are available for Enterprise customers on an Enterprise plan. Reach out to support@tinybird.co for more information. ## Prerequisites¶ Grant `READ` permissions to both the Topic and the Consumer Group to ingest data from Kafka into Tinybird. You must secure your Kafka brokers with SSL/TLS and SASL. Tinybird uses `SASL_SSL` as the security protocol for the Kafka consumer. Connections are rejected if the brokers only support `PLAINTEXT` or `SASL_PLAINTEXT`. Kafka Schema Registry is supported only for decoding Avro messages. ## Add a Kafka connection¶ You can create a connection to Kafka using the Tinybird CLI or the UI. ### Using the CLI¶ Run the following commands to add a Kafka connection: tb auth # Use the main Workspace admin Token tb connection create kafka --bootstrap-servers --key --secret --connection-name --ssl-ca-pem ### Using the UI¶ Follow these steps to add a new connection using the UI: 1. Go to** Data Project** . 2. Select the** +** icon, then select** Data Source** . 3. Select** Kafka** . 4. Follow the steps to configure the connection. ### Add a CA certificate¶ You can add a CA certificate in PEM format when configuring your Kafka connection from the UI. Tinybird checks the certificate for issues before creating the connection. To add a CA certificate using the Tinybird CLI, pass the `--ssl-ca-pem ` argument to `tb connection create` , where `` is the location or value of the CA certificate. CA certificates don't work with Kafka Sinks and Streaming Queries. 
#### Aiven Kafka¶ Aiven for Apache Kafka service instances expose multiple SASL ports with two different kinds of SASL certificates: Private CA (self-signed) and Public CA, signed by Let's Encrypt. If you are using the Public CA port, you can connect to Aiven Kafka without any additional configuration. However, if you are using the Private CA port, you need to provide the CA certificate by pointing to the path of the CA certificate file using the `KAFKA_SSL_CA_PEM` setting. ## Update a Kafka connection¶ You can update your credentials or cluster details only from the Tinybird UI. Follow these steps: 1. Go to** Data Project** , select the** +** icon, then select** Data Source** . 2. Select** Kafka** and then the connection you want to edit or delete using the three-dots menu. Any Data Source that depends on this connection is affected by updates. ## Use .datasource files¶ You can configure the Kafka Connector using .datasource files. See the [datafiles documentation](https://www.tinybird.co/docs/docs/cli/datafiles/datasource-files#kafka-confluent-redpanda). The following is an example of Kafka .datasource file for an already existing connection: SCHEMA > `__value` String, `__topic` LowCardinality(String), `__partition` Int16, `__offset` Int64, `__timestamp` DateTime, `__key` String `__headers` Map(String,String) ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(timestamp)" ENGINE_SORTING_KEY "timestamp" # Connection is already available. If you # need to create one, add the required fields # on an include file with the details. KAFKA_CONNECTION_NAME my_connection_name KAFKA_TOPIC my_topic KAFKA_GROUP_ID my_group_id KAFKA_STORE_HEADERS true To add connection details in an INCLUDE file, see [Use INCLUDE to store connection settings](https://www.tinybird.co/docs/about:blank#use-include-to-store-connection-settings). ### Columns of the Data Source¶ When you connect a Kafka producer to Tinybird, Tinybird consumes optional metadata columns from that Kafka record and writes them to the Data Source. The following fields represent the raw data received from Kafka: - `__value` : A String representing the entire unparsed Kafka record inserted. - `__topic` : The Kafka topic that the message belongs to. - `__partition` : The kafka partition that the message belongs to. - `__offset` : The Kafka offset of the message. - `__timestamp` : The timestamp stored in the Kafka message received by Tinybird. - `__key` : The key of the kafka message. - `__headers` : Headers parsed from the incoming topic messages. See[ Using custom Kafka headers for advanced message processing](https://www.tinybird.co/blog-posts/using-custom-kafka-headers) . Metadata fields are optional. Omit the fields you don't need to reduce your data storage. ### Use INCLUDE to store connection settings¶ To avoid configuring the same connection settings across many files, or to prevent leaking sensitive information, you can store connection details in an external file and use `INCLUDE` to import them into one or more .datasource files. You can find more information about `INCLUDE` in the [Advanced Templates](https://www.tinybird.co/docs/docs/cli/advanced-templates) documentation. For example, you might have two Kafka .datasource files that reuse the same Kafka connection. You can create an include file which stores the Kafka connection details. 
The Tinybird project would use the following structure:

ecommerce_data_project/
    datasources/
        connections/
            my_connector_name.incl
            ca.pem # CA certificate (optional)
        my_kafka_datasource.datasource
        another_datasource.datasource
    endpoints/
    pipes/

Where the file `my_connector_name.incl` has the following content:

KAFKA_CONNECTION_NAME my_connection_name
KAFKA_BOOTSTRAP_SERVERS my_server:9092
KAFKA_KEY my_username
KAFKA_SECRET my_password
KAFKA_SSL_CA_PEM ca.pem # CA certificate (optional)

And the Kafka .datasource files look like the following:

SCHEMA >
    `__value` String,
    `__topic` LowCardinality(String),
    `__partition` Int16,
    `__offset` Int64,
    `__timestamp` DateTime,
    `__key` String

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp"

INCLUDE "connections/my_connection_name.incl"

KAFKA_TOPIC my_topic
KAFKA_GROUP_ID my_group_id

When using `tb pull` to pull a Kafka Data Source using the CLI, the `KAFKA_KEY`, `KAFKA_SECRET`, `KAFKA_SASL_MECHANISM`, and `KAFKA_SSL_CA_PEM` settings aren't included in the file to avoid exposing credentials.

## Iterate a Kafka Data Source¶

The following instructions use Branches. Be sure you're familiar with the behavior of Branches in Tinybird when using the Kafka Connector - see [Prerequisites](https://www.tinybird.co/docs/about:blank#prerequisites). Use Branches to test different Kafka connections and settings. See [Branches](https://www.tinybird.co/docs/docs/work-with-data/organize-your-work/branches).

Connections created using the UI are created in the main Workspace, so if you create a new Branch from a Workspace with existing Kafka Data Sources, the Branch Data Sources don't receive that streaming data automatically. Use the CLI to recreate the Kafka Data Source.

### Update a Kafka Data Source¶

When you create a Branch that has existing Kafka Data Sources, the Data Sources in the Branch aren't connected to Kafka. Therefore, if you want to update the schema, you need to recreate the Kafka Data Source in the Branch.

In Branches, Tinybird automatically appends `_{BRANCH}` to the Kafka group ID to prevent collisions. It also forces the consumers in Branches to always consume the `latest` messages, to reduce the performance impact.

### Add a new Kafka Data Source¶

To create and test a Kafka Data Source in a Branch, start by using an existing connection. You can create and use existing connections from the Branch using the UI: these connections are always created in the main Workspace.

You can create a Kafka Data Source in a Branch as in production. This Data Source doesn't have any connection details internally, so it's useful for testing purposes. To move the Data Source to production, define the connection settings and the Kafka parameters used in production in the Data Source .datafile, as explained in the [.datafiles documentation](https://www.tinybird.co/docs/docs/cli/datafiles/datasource-files#kafka-confluent-redpanda).

### Delete a Kafka Data Source¶

If you've created a Data Source in a Branch, it remains active until it's removed from the Branch or the entire Branch is removed. If you delete an existing Kafka Data Source in a Branch, it isn't deleted in the main Workspace. To delete a Kafka Data Source, do it against the main Workspace. You can also use the CLI and include the deletion in your CI/CD workflows as necessary.
## Kafka logs¶ You can find global logs in the `datasources_ops_log` [Service Data Source](https://www.tinybird.co/docs/docs/monitoring/service-datasources#tinybird-datasources-ops-log) . Filter by `datasource_id` to select the correct datasource, and set `event_type` to `append-kafka`. To select all Kafka releated logs in the last day, run the following query: SELECT * FROM tinybird.datasources_ops_log WHERE datasource_id = 't_1234' AND event_type = 'append-kafka' AND timestamp > now() - INTERVAL 1 day ORDER BY timestamp DESC If you can't find logs in `datasources_ops_log` , the `kafka_ops_log` [Service Data Source](https://www.tinybird.co/docs/docs/monitoring/service-datasources#tinybird-kafka-ops-log) contains more detailed logs. Filter by `datasource_id` to select the correct datasource, and use `msg_type` to select the desired log level ( `info`, `warning` , or `error` ). SELECT * FROM tinybird.kafka_ops_log WHERE datasource_id = 't_1234' AND timestamp > now() - interval 1 day AND msg_type IN ['info', 'warning', 'error'] ## Limits¶ The limits for the Kafka connector are: - Minimum flush time: 4 seconds - Throughput (uncompressed): 20MB/s - Up to 3 connections per Workspace If you're regularly hitting these limits, contact [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) for support. ## Troubleshooting¶ ### If you aren't receiving data¶ When Kafka commits a message for a topic and a group id, it always sends data from the latest committed offset. In Tinybird, each Kafka Data Source receives data from a topic and uses a group id. The combination of `topic` and `group id` must be unique. If you remove a Kafka Data Source and you recreate it again with the same settings after having received data, you' only get data from the latest committed offset, even if `KAFKA_AUTO_OFFSET_RESET` is set to `earliest`. This happens both in the main Workspace and in Branches, if you're using them, because connections are always created in the main Workspace and are shared across Branches. Recommended next steps: - Use always a different group id when testing Kafka Data Sources. - Check in the `tinybird.datasources_ops_log`[ Service Data Source](https://www.tinybird.co/docs/docs/monitoring/service-datasources) to see global errors. - Check in the `tinybird.kafka_ops_log`[ Service Data Source](https://www.tinybird.co/docs/docs/monitoring/service-datasources) to see if you've already used a group id to ingest data from a topic. ### Compressed messages¶ Tinybird can consume from Kafka topics where Kafka compression is enabled, as decompressing the message is a standard function of the Kafka Consumer. If you compressed the message before passing it through the Kafka Producer, Tinybird can't do post-Consumer processing to decompress the message. For example, if you compressed a JSON message through gzip and produced it to a Kafka topic as a `bytes` message, it would be ingested by Tinybird as `bytes` . If you produced a JSON message to a Kafka topic with the Kafka Producer setting `compression.type=gzip` , while it would be stored in Kafka as compressed bytes, it would be decoded on ingestion and arrive to Tinybird as JSON. --- URL: https://www.tinybird.co/docs/get-data-in/connectors/msk Last update: 2024-12-18T09:46:02.000Z Content: --- title: "Amazon MSK · Tinybird Docs" theme-color: "#171612" description: "Get Amazon MSK data into Tinybird." 
--- # Amazon MSK¶ Use the Kafka Connector to ingest data streams from Amazon Managed Streaming for Apache Kafka (MSK) into Tinybird so that you can quickly turn them into high-concurrency, low-latency REST APIs. The Kafka Connector is fully managed and requires no additional tooling. Connect Tinybird to your Kafka cluster, select a topic, and Tinybird automatically begins consuming messages from Kafka. You can transform or enrich your Kafka topics with JOINs using serverless Data Pipes. Auth tokens control access to API endpoints. Secure connections through AWS PrivateLink or Multi-VPC for MSK are available for Enterprise customers on a Dedicated infrastructure plan. Reach out to support@tinybird.co for more information. ## Prerequisites¶ Grant `READ` permissions to both the Topic and the Consumer Group to ingest data from Kafka into Tinybird. You must secure your Kafka brokers with SSL/TLS and SASL. Tinybird uses `SASL_SSL` as the security protocol for the Kafka consumer. Connections are rejected if the brokers only support `PLAINTEXT` or `SASL_PLAINTEXT`. Kafka Schema Registry is supported only for decoding Avro messages. ## Add a Kafka connection¶ You can create a connection to Kafka using the Tinybird CLI or the UI. ### Using the CLI¶ Run the following commands to add a Kafka connection: ##### Adding a Kafka connection in the main Workspace tb auth # Use the main Workspace admin Token tb connection create kafka --bootstrap-servers --key --secret --connection-name --ssl-ca-pem ### Using the UI¶ Follow these steps to add a new connection using the UI: 1. Go to** Data Project** . 2. Select the** +** icon, then select** Data Source** . 3. Select** Kafka** . 4. Follow the steps to configure the connection. ### Add a CA certificate¶ You can add a CA certificate in PEM format when configuring your Kafka connection from the UI. Tinybird checks the certificate for issues before creating the connection. To add a CA certificate using the Tinybird CLI, pass the `--ssl-ca-pem ` argument to `tb connection create` , where `` is the location or value of the CA certificate. CA certificates don't work with Kafka Sinks and Streaming Queries. ## Update a Kafka connection¶ You can update your credentials or cluster details only from the Tinybird UI. Follow these steps: 1. Go to** Data Project** , select the** +** icon, then select** Data Source** . 2. Select** Kafka** and then the connection you want to edit or delete using the three-dots menu. Any Data Source that depends on this connection is affected by updates. ## Use .datasource files¶ You can configure the Kafka Connector using .datasource files. See the [datafiles documentation](https://www.tinybird.co/docs/docs/cli/datafiles/datasource-files#kafka-confluent-redpanda). The following is an example of Kafka .datasource file for an already existing connection: ##### Example data source for Kafka Connector SCHEMA > `__value` String, `__topic` LowCardinality(String), `__partition` Int16, `__offset` Int64, `__timestamp` DateTime, `__key` String `__headers` Map(String,String) ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(timestamp)" ENGINE_SORTING_KEY "timestamp" # Connection is already available. If you # need to create one, add the required fields # on an include file with the details. 
KAFKA_CONNECTION_NAME my_connection_name KAFKA_TOPIC my_topic KAFKA_GROUP_ID my_group_id KAFKA_STORE_HEADERS true To add connection details in an INCLUDE file, see [Use INCLUDE to store connection settings](https://www.tinybird.co/docs/about:blank#use-include-to-store-connection-settings). ### Columns of the Data Source¶ When you connect a Kafka producer to Tinybird, Tinybird consumes optional metadata columns from that Kafka record and writes them to the Data Source. The following fields represent the raw data received from Kafka: - `__value` : A String representing the entire unparsed Kafka record inserted. - `__topic` : The Kafka topic that the message belongs to. - `__partition` : The kafka partition that the message belongs to. - `__offset` : The Kafka offset of the message. - `__timestamp` : The timestamp stored in the Kafka message received by Tinybird. - `__key` : The key of the kafka message. - `__headers` : Headers parsed from the incoming topic messages. See[ Using custom Kafka headers for advanced message processing](https://www.tinybird.co/blog-posts/using-custom-kafka-headers) . Metadata fields are optional. Omit the fields you don't need to reduce your data storage. ### Use INCLUDE to store connection settings¶ To avoid configuring the same connection settings across many files, or to prevent leaking sensitive information, you can store connection details in an external file and use `INCLUDE` to import them into one or more .datasource files. You can find more information about `INCLUDE` in the [Advanced Templates](https://www.tinybird.co/docs/docs/cli/advanced-templates) documentation. For example, you might have two Kafka .datasource files that reuse the same Kafka connection. You can create an include file which stores the Kafka connection details. The Tinybird project would use the following structure: ##### Tinybird data project file structure ecommerce_data_project/ datasources/ connections/ my_connector_name.incl ca.pem # CA certificate (optional) my_kafka_datasource.datasource another_datasource.datasource endpoints/ pipes/ Where the file `my_connector_name.incl` has the following content: ##### Include file containing Kafka connection details KAFKA_CONNECTION_NAME my_connection_name KAFKA_BOOTSTRAP_SERVERS my_server:9092 KAFKA_KEY my_username KAFKA_SECRET my_password KAFKA_SSL_CA_PEM ca.pem # CA certificate (optional) And the Kafka .datasource files look like the following: ##### Data Source using includes for Kafka connection details SCHEMA > `__value` String, `__topic` LowCardinality(String), `__partition` Int16, `__offset` Int64, `__timestamp` DateTime, `__key` String ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(timestamp)" ENGINE_SORTING_KEY "timestamp" INCLUDE "connections/my_connection_name.incl" KAFKA_TOPIC my_topic KAFKA_GROUP_ID my_group_id When using `tb pull` to pull a Kafka Data Source using the CLI, the `KAFKA_KEY`, `KAFKA_SECRET`, `KAFKA_SASL_MECHANISM` and `KAFKA_SSL_CA_PEM` settings aren't included in the file to avoid exposing credentials. ## Iterate a Kafka Data Source¶ The following instructions use Branches. Be sure you're familiar with the behavior of Branches in Tinybird when using the Kafka Connector - see [Prerequisites](https://www.tinybird.co/docs/about:blank#prerequisites). Use Branches to test different Kafka connections and settings. See [Branches](https://www.tinybird.co/docs/docs/work-with-data/organize-your-work/branches). Connections created using the UI are created in the main Workspace. 
so if you create a new Branch from a Workspace with existing Kafka Data Sources, the Branch Data Sources don't receive that streaming data automatically. Use the CLI to recreate the Kafka Data Source. ### Update a Kafka Data Source¶ When you create a Branch that has existing Kafka Data Sources, the Data Sources in the Branch aren't connected to Kafka. Therefore, if you want to update the schema, you need to recreate the Kafka Data Source in the Branch. In branches, Tinybird automatically appends `_{BRANCH}` to the Kafka group ID to prevent collisions. It also forces the consumers in Branches to always consume the `latest` messages, to reduce the performance impact. ### Add a new Kafka Data Source¶ To create and test a Kafka Data Source in a Branch, start by using an existing connection. You can create and use existing connections from the Branch using the UI: these connections are always created in the main Workspace. You can create a Kafka Data Source in a Branch as in production. This Data Source doesn't have any connection details internally, so you it's useful for testing purposes. Define the connection in the .datafile and Kafka parameters that are used in production. To move the Data Source to production, include the connection settings in the Data Source .datafile, as explained in the [.datafiles documentation](https://www.tinybird.co/docs/docs/cli/datafiles/datasource-files#kafka-confluent-redpanda). ### Delete a Kafka Data Source¶ If you've created a Data Source in a Branch, the Data Source is active until the Data Source is removed from the Branch or when the entire Branch is removed. If you delete an existing Kafka Data Source in a Branch, it isn't deleted in the main Workspace. To delete a Kafka Data Source, do it against the main Workspace. You can also use the CLI and include it in the CI/CD workflows as necessary. ## MSK logs¶ You can find global logs in the `datasources_ops_log` [Service Data Source](https://www.tinybird.co/docs/docs/monitoring/service-datasources#tinybird-datasources-ops-log) . Filter by `datasource_id` to select the correct datasource, and set `event_type` to `append-kafka`. To select all Kafka releated logs in the last day, run the following query: SELECT * FROM tinybird.datasources_ops_log WHERE datasource_id = 't_1234' AND event_type = 'append-kafka' AND timestamp > now() - INTERVAL 1 day ORDER BY timestamp DESC If you can't find logs in `datasources_ops_log` , the `kafka_ops_log` [Service Data Source](https://www.tinybird.co/docs/docs/monitoring/service-datasources#tinybird-kafka-ops-log) contains more detailed logs. Filter by `datasource_id` to select the correct datasource, and use `msg_type` to select the desired log level ( `info`, `warning` , or `error` ). SELECT * FROM tinybird.kafka_ops_log WHERE datasource_id = 't_1234' AND timestamp > now() - interval 1 day AND msg_type IN ['info', 'warning', 'error'] ## Limits¶ The limits for the Kafka connector are: - Minimum flush time: 4 seconds - Throughput (uncompressed): 20MB/s - Up to 3 connections per Workspace If you're regularly hitting these limits, contact [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) for support. ## Troubleshooting¶ ### If you aren't receiving data¶ When Kafka commits a message for a topic and a group id, it always sends data from the latest committed offset. In Tinybird, each Kafka Data Source receives data from a topic and uses a group id. The combination of `topic` and `group id` must be unique. 
If you remove a Kafka Data Source and recreate it with the same settings after having received data, you'll only get data from the latest committed offset, even if `KAFKA_AUTO_OFFSET_RESET` is set to `earliest`. This happens both in the main Workspace and in Branches, if you're using them, because connections are always created in the main Workspace and are shared across Branches. Recommended next steps: - Always use a different group id when testing Kafka Data Sources. - Check the `tinybird.kafka_ops_log`[ Service Data Source](https://www.tinybird.co/docs/docs/monitoring/service-datasources) to see if you've already used a group id to ingest data from a topic. ### Compressed messages¶ Tinybird can consume from Kafka topics where Kafka compression is enabled, as decompressing the message is a standard function of the Kafka Consumer. If you compressed the message before passing it through the Kafka Producer, Tinybird can't do post-Consumer processing to decompress the message. For example, if you compressed a JSON message through gzip and produced it to a Kafka topic as a `bytes` message, it would be ingested by Tinybird as `bytes` . If you produced a JSON message to a Kafka topic with the Kafka Producer setting `compression.type=gzip` , while it would be stored in Kafka as compressed bytes, it would be decoded on ingestion and arrive in Tinybird as JSON. --- URL: https://www.tinybird.co/docs/get-data-in/connectors/redpanda Last update: 2025-01-12T22:34:39.000Z Content: --- title: "Redpanda Connector · Tinybird Docs" theme-color: "#171612" description: "Documentation for the Tinybird Redpanda Connector" --- # Redpanda Connector¶ The Redpanda Connector allows you to ingest data from your existing Redpanda cluster and load it into Tinybird so that you can quickly turn it into high-concurrency, low-latency REST APIs. The Redpanda Connector is fully managed and requires no additional tooling. Connect Tinybird to your Redpanda cluster, choose a topic, and Tinybird will automatically begin consuming messages from Redpanda. The Redpanda Connector is: - ** Easy to use** . Connect to your Redpanda cluster in seconds. Choose your topics, define your schema, and ingest millions of events per second into a fully-managed OLAP. - ** SQL-based** . Using nothing but SQL, query your Redpanda data and enrich it with dimensions from your database, warehouse, or files. - ** Secure** . Use Auth tokens to control access to API endpoints. Implement access policies as you need. Support for row-level security. Note that you need to grant READ permissions to both the Topic and the Consumer Group to ingest data from Redpanda into Tinybird. ## Using the UI¶ To connect Tinybird to your Redpanda cluster, click the `+` icon next to the data project section on the left navigation menu, select **Data Source** , and select **Redpanda** from the list of available Data Sources. Enter the following details: - ** Connection name** : A name for the Redpanda connection in Tinybird. - ** Bootstrap Server** : The comma-separated list of bootstrap servers (including port numbers). - ** Key** : The** Key** component of the Redpanda API Key. - ** Secret** : The** Secret** component of the Redpanda API Key. - ** Decode Avro messages with schema registry** : Optionally, you can enable Schema Registry support to decode Avro messages. You will be prompted to enter the Schema Registry URL, username and password. Once you have entered the details, select **Connect** . This creates the connection between Tinybird and Redpanda.
You will then see a list of your existing topics and can select the topic to consume from. Tinybird will create a **Group ID** that specifies the name of the consumer group this consumer belongs to. You can customize the Group ID, but ensure that your Group ID has **read** permissions to the topic. Once you have chosen a topic, you can select the starting offset to consume from. You can choose to consume from the **latest** offset or the **earliest** offset. If you choose to consume from the earliest offset, Tinybird will consume all messages from the beginning of the topic. If you choose to consume from the latest offset, Tinybird will only consume messages that are produced after the connection is created. Select the offset, and click **Next**. Tinybird will then consume a sample of messages from the topic and display the schema. You can adjust the schema and Data Source settings as needed, then click **Create Data Source** to create the Data Source. Tinybird will now begin consuming messages from the topic and loading them into the Data Source. ## Using .datasource files¶ If you are managing your Tinybird resources in files, there are several settings available to configure the Redpanda Connector in .datasource files. See the [datafiles docs](https://www.tinybird.co/docs/docs/cli/datafiles/datasource-files#kafka-confluent-redpanda) for more information. The following is an example of a Kafka .datasource file for an already existing connection: ##### Example data source for Redpanda Connector SCHEMA > `__value` String, `__topic` LowCardinality(String), `__partition` Int16, `__offset` Int64, `__timestamp` DateTime, `__key` String, `__headers` Map(String,String) ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(__timestamp)" ENGINE_SORTING_KEY "__timestamp" # Connection is already available. If you # need to create one, add the required fields # on an include file with the details. KAFKA_CONNECTION_NAME my_connection_name KAFKA_TOPIC my_topic KAFKA_GROUP_ID my_group_id KAFKA_STORE_HEADERS true ### Columns of the Data Source¶ When you connect a Kafka producer to Tinybird, Tinybird consumes optional metadata columns from that Kafka record and writes them to the Data Source. The following fields represent the raw data received from Kafka: - `__value` : A String representing the entire unparsed Kafka record inserted. - `__topic` : The Kafka topic that the message belongs to. - `__partition` : The Kafka partition that the message belongs to. - `__offset` : The Kafka offset of the message. - `__timestamp` : The timestamp stored in the Kafka message received by Tinybird. - `__key` : The key of the Kafka message. - `__headers` : Headers parsed from the incoming topic messages. See[ Using custom Kafka headers for advanced message processing](https://www.tinybird.co/blog-posts/using-custom-kafka-headers) . Metadata fields are optional. Omit the fields you don't need to reduce your data storage. ### Using INCLUDE to store connection settings¶ To avoid configuring the same connection settings across many files, or to prevent leaking sensitive information, you can store connection details in an external file and use `INCLUDE` to import them into one or more .datasource files. You can find more information about `INCLUDE` in the [Advanced Templates](https://www.tinybird.co/docs/docs/cli/advanced-templates) documentation. As an example, you may have two Redpanda .datasource files, which reuse the same Redpanda connection. You can create an INCLUDE file that stores the Redpanda connection details.
The Tinybird project may use the following structure: ##### Tinybird data project file structure ecommerce_data_project/ ├── datasources/ │ └── connections/ │ └── my_connector_name.incl │ └── my_kafka_datasource.datasource │ └── another_datasource.datasource ├── endpoints/ ├── pipes/ Where the file `my_connector_name.incl` has the following content: ##### Include file containing Redpanda connection details KAFKA_CONNECTION_NAME my_connection_name KAFKA_BOOTSTRAP_SERVERS my_server:9092 KAFKA_KEY my_username KAFKA_SECRET my_password And the Redpanda .datasource files look like the following: ##### Data Source using includes for Redpanda connection details SCHEMA > `__value` String, `__topic` LowCardinality(String), `__partition` Int16, `__offset` Int64, `__timestamp` DateTime, `__key` String ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(__timestamp)" ENGINE_SORTING_KEY "__timestamp" INCLUDE "connections/my_connector_name.incl" KAFKA_TOPIC my_topic KAFKA_GROUP_ID my_group_id When using `tb pull` to pull a Redpanda Data Source using the CLI, the `KAFKA_KEY` and `KAFKA_SECRET` settings will **not** be included in the file to avoid exposing credentials. ## Redpanda logs¶ You can find global logs in the `datasources_ops_log` [Service Data Source](https://www.tinybird.co/docs/docs/monitoring/service-datasources#tinybird-datasources-ops-log) . Filter by `datasource_id` to select the correct datasource, and set `event_type` to `append-kafka`. To select all Kafka-related logs in the last day, run the following query: SELECT * FROM tinybird.datasources_ops_log WHERE datasource_id = 't_1234' AND event_type = 'append-kafka' AND timestamp > now() - INTERVAL 1 day ORDER BY timestamp DESC If you can't find logs in `datasources_ops_log` , the `kafka_ops_log` [Service Data Source](https://www.tinybird.co/docs/docs/monitoring/service-datasources#tinybird-kafka-ops-log) contains more detailed logs. Filter by `datasource_id` to select the correct datasource, and use `msg_type` to select the desired log level ( `info`, `warning` , or `error` ). SELECT * FROM tinybird.kafka_ops_log WHERE datasource_id = 't_1234' AND timestamp > now() - interval 1 day AND msg_type IN ['info', 'warning', 'error'] --- URL: https://www.tinybird.co/docs/get-data-in/connectors/s3 Last update: 2025-01-07T15:48:10.000Z Content: --- title: "S3 Connector · Tinybird Docs" theme-color: "#171612" description: "Bring your S3 data to Tinybird using the S3 Connector." --- # S3 Connector¶ Use the S3 Connector to ingest files from your Amazon S3 buckets into Tinybird so that you can turn them into high-concurrency, low-latency REST APIs. You can load a full bucket or load files that match a pattern. In both cases you can also set an update date from which the files are loaded. With the S3 Connector you can load your CSV, NDJSON, or Parquet files into your S3 buckets and turn them into APIs. Tinybird detects new files in your buckets and ingests them automatically. You can then run serverless transformations using Data Pipes or implement auth tokens in your API Endpoints. ## Prerequisites¶ The S3 Connector requires permissions to access objects in your Amazon S3 bucket. The IAM Role needs the following permissions: - `s3:GetObject` - `s3:ListBucket` - `s3:ListAllMyBuckets` When configuring the connector, the UI, CLI and API all provide the necessary policy templates. The following is an example of an AWS access policy:
{ "Version": "2012-10-17", "Statement": [ { "Action": [ "s3:GetObject", "s3:ListBucket" ], "Resource": [ "arn:aws:s3:::", "arn:aws:s3:::/*" ], "Effect": "Allow" }, { "Sid": "Statement1", "Effect": "Allow", "Action": [ "s3:ListAllMyBuckets" ], "Resource": [ "*" ] } ] } The following is an example trust policy: { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": "sts:AssumeRole", "Principal": { "AWS": "arn:aws:iam::473819111111111:root" }, "Condition": { "StringEquals": { "sts:ExternalId": "ab3caaaa-01aa-4b95-bad3-fff9b2ac789f8a9" } } } ] } ## Supported file types¶ The S3 Connector supports the following file types: | File type | Accepted extensions | Compression formats supported | | --- | --- | --- | | CSV | `.csv` , `.csv.gz` | `gzip` | | NDJSON | `.ndjson` , `.ndjson.gz` | `gzip` | | | `.jsonl` , `.jsonl.gz` | | | | `.json` , `.json.gz` | | | Parquet | `.parquet` , `.parquet.gz` | `snappy` , `gzip` , `lzo` , `brotli` , `lz4` , `zstd` | You can upload files with .json extension, provided they follow the Newline Delimited JSON (NDJSON) format. Each line must be a valid JSON object and every line has to end with a `\n` character. Parquet schemas use the same format as NDJSON schemas, using [JSONPath](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-ndjson-data#jsonpaths) syntax. ## S3 file URI¶ Use the full S3 File URI and wildcards to select multiple files. The S3 Connector supports the following wildcard patterns: - Single Asterisk ( `*` ): matches zero or more characters within a single directory level, excluding `/` . It doesn't cross directory boundaries. For example, `s3://bucket-name/*.ndjson` matches all `.ndjson` files in the root of your bucket but doesn't match files in subdirectories. - Double Asterisk ( `**` ): matches zero or more characters across multiple directory levels, including `/` . It can cross directory boundaries recursively. For example: `s3://bucket-name/**/*.ndjson` matches all `.ndjson` files in the bucket, regardless of their directory depth. The file extension is required to accurately match the desired files in your pattern. ### Examples¶ The following are examples of patterns you can use and whether they'd match the example file path: | File path | S3 File URI | Will match? | | --- | --- | --- | | example.ndjson | `s3://bucket-name/*.ndjson` | Yes. Matches files in the root directory with the `.ndjson` extension. | | example.ndjson.gz | `s3://bucket-name/**/*.ndjson.gz` | Yes. Recursively matches `.ndjson.gz` files anywhere in the bucket. | | example.ndjson | `s3://bucket-name/example.ndjson` | Yes. Exact match to the file path. | | pending/example.ndjson | `s3://bucket-name/*.ndjson` | No. `*` doesn't cross directory boundaries. | | pending/example.ndjson | `s3://bucket-name/**/*.ndjson` | Yes. Recursively matches `.ndjson` files in any subdirectory. | | pending/example.ndjson | `s3://bucket-name/pending/example.ndjson` | Yes. Exact match to the file path. | | pending/example.ndjson | `s3://bucket-name/pending/*.ndjson` | Yes. Matches `.ndjson` files within the `pending` directory. | | pending/example.ndjson | `s3://bucket-name/pending/**/*.ndjson` | Yes. Recursively matches `.ndjson` files within `pending` and all its subdirectories. | | pending/example.ndjson | `s3://bucket-name/**/pending/example.ndjson` | Yes. Matches the exact path to `pending/example.ndjson` within any preceding directories. | | pending/example.ndjson | `s3://bucket-name/other/example.ndjson` | No. 
Does not match because the path includes directories which aren't part of the file's actual path. | | pending/example.ndjson.gz | `s3://bucket-name/pending/*.csv.gz` | No. The file extension `.ndjson.gz` doesn't match `.csv.gz` | | pending/o/inner/example.ndjson | `s3://bucket-name/*.ndjson` | No. `*` doesn't cross directory boundaries. | | pending/o/inner/example.ndjson | `s3://bucket-name/**/*.ndjson` | Yes. Recursively matches `.ndjson` files anywhere in the bucket. | | pending/o/inner/example.ndjson | `s3://bucket-name/**/inner/example.ndjson` | Yes. Matches the exact path to `inner/example.ndjson` within any preceding directories. | | pending/o/inner/example.ndjson | `s3://bucket-name/**/ex*.ndjson` | Yes. Recursively matches `.ndjson` files starting with `ex` at any depth. | | pending/o/inner/example.ndjson | `s3://bucket-name/**/**/*.ndjson` | Yes. Matches `.ndjson` files at any depth, even with multiple `**` wildcards. | | pending/o/inner/example.ndjson | `s3://bucket-name/pending/**/*.ndjson` | Yes. Matches `.ndjson` files within `pending` and all its subdirectories. | | pending/o/inner/example.ndjson | `s3://bucket-name/inner/example.ndjson` | No. Does not match because the path includes directories which aren't part of the file's actual path. | | pending/o/inner/example.ndjson | `s3://bucket-name/pending/example.ndjson` | No. Does not match because the path includes directories which aren't part of the file's actual path. | | pending/o/inner/example.ndjson.gz | `s3://bucket-name/pending/*.ndjson.gz` | No. `*` doesn't cross directory boundaries. | | pending/o/inner/example.ndjson.gz | `s3://bucket-name/other/example.ndjson.gz` | No. Does not match because the path includes directories which aren't part of the file's actual path. | ### Considerations¶ When using patterns: - Use specific directory names or even specific file URIs to limit the scope of your search. The more specific your pattern, the narrower the search. - Combine wildcards: you can combine `**` with other patterns to match files in subdirectories selectively. For example, `s3://bucket-name/**/logs/*.ndjson` matches `.ndjson` files within any logs directory at any depth. - Avoid unintended matches: be cautious with `**` as it can match a large number of files, which might impact performance and return partial matches. To test your patterns and see a sample of your matching files before proceeding, use the **Preview** step in the Connector. ### Sample file URL¶ When files that match the pattern you've provided exceed the file size limits of your plan, or when the preview step reaches request limits, Tinybird prompts you to provide a sample file URL. The sample file is used to infer the schema of the data, ensuring compatibility with the ingestion process. After the schema is inferred, all files matching the initial pattern are ingested. A sample file URL must point to a single file and must follow the full S3 URI format, including the bucket name and directory path. For example, if the initial bucket URI is `s3://example-bucket-name/data/**/*.ndjson` then the Sample file URL would be `s3://example-bucket-name/data/2024-12-01/sample-file.ndjson`. The following considerations apply: - Make sure the sample file is representative of the overall dataset to avoid mismatched schemas during ingestion or quarantined data. - When using compression format, for example .gz, make sure that the sample file is compressed in the same way as the other files in the dataset. 
- After the preview, all files matching the pattern are ingested, not just the ones processed for the preview. ## Set up the connection¶ You can set up an S3 connection using the UI or the CLI. The steps are as follows: 1. Create a new Data Source in Tinybird. 2. Create the AWS S3 connection. 3. Configure the scheduling options and path/file names. 4. Start ingesting the data. ## Load files using the CLI¶ Before you can load files from Amazon S3 into Tinybird using the CLI, you must create a connection. Creating a connection grants your Tinybird Workspace the appropriate permissions to view files in Amazon S3. To create a connection, you need to use the Tinybird CLI version 3.8.3 or higher. [Authenticate your CLI](https://www.tinybird.co/docs/docs/cli/install#authentication) and switch to the desired Workspace. Follow these steps to create a connection: 1. Run `tb connection create s3_iamrole --policy read` command and press `y` to confirm. 2. Copy the suggested policy and replace the bucket placeholder `` with your bucket name. 3. In AWS, create a new policy in** IAM** ,** Policies (JSON)** using the edited policy. 4. Return to the Tinybird CLI, press `y` , and copy the next policy. 5. In AWS, go to** IAM** ,** Roles** and copy the new custom trust policy. Attach the policy you edited in the previous step. 6. Return to the CLI, press `y` , and paste the ARN of the role you've created in the previous step. 7. Enter the region of the bucket. For example, `us-east-1` . 8. Provide a name for your connection in Tinybird. The `--policy` flag allows to switch between write (sink) and read (ingest) policies. Now that you've created a connection, you can add a Data Source to configure the import of files from Amazon S3. Configure the Amazon S3 import using the following options in your .datasource file: - `IMPORT_SERVICE` : name of the import service to use, in this case, `s3_iamrole` . - `IMPORT_SCHEDULE` : either `@auto` to sync once per minute, or `@on-demand` to only execute manually (UTC). - `IMPORT_STRATEGY` : the strategy used to import data. Only `APPEND` is supported. - `IMPORT_BUCKET_URI` : a full bucket path, including the `s3://` protocol , bucket name, object path and an optional pattern to match against object keys. You can use patterns in the path to filter objects. For example, ending the path with `*.csv` matches all objects that end with the `.csv` suffix. - `IMPORT_CONNECTION_NAME` : name of the S3 connection to use. - `IMPORT_FROM_TIMESTAMP` : (optional) set the date and time from which to start ingesting files. Format is `YYYY-MM-DDTHH:MM:SSZ` . When Tinybird discovers new files, it appends the data to the existing data in the Data Source. Replacing data isn't supported. The following is an example of a .datasource file for S3: ##### s3.datasource file DESCRIPTION > Analytics events landing data source SCHEMA > `timestamp` DateTime `json:$.timestamp`, `session_id` String `json:$.session_id`, `action` LowCardinality(String) `json:$.action`, `version` LowCardinality(String) `json:$.version`, `payload` String `json:$.payload` ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(timestamp)" ENGINE_SORTING_KEY "timestamp" ENGINE_TTL "timestamp + toIntervalDay(60)" IMPORT_SERVICE s3_iamrole IMPORT_CONNECTION_NAME connection_name IMPORT_BUCKET_URI s3://bucket-name/*.csv IMPORT_SCHEDULE @auto IMPORT_STRATEGY APPEND With your connection created and Data Source defined, you can now push your project to Tinybird using: tb push ## Load files using the UI¶ ### 1. 
Create a new Data Source¶ In Tinybird, go to **Data Sources** and select **Create Data Source**. Select **Amazon S3** and enter the bucket name and region, then select **Continue**. ### 2. Create the AWS S3 connection¶ Follow these steps to create the connection: 1. Open the AWS console and navigate to IAM. 2. Create and name the policy using the provided copyable option. 3. Create and name the role with the trust policy using the provided copyable option. 4. Select** Connect** . 5. Paste the connection name and ARN. ### 3. Select the data¶ Select the data you want to ingest by providing the [S3 File URI](https://www.tinybird.co/docs/about:blank#s3-file-uri) and selecting **Preview**. You can also set the ingestion to start from a specific date and time, so that the ingestion process ignores all files added or updated before the set date and time: 1. Select** Ingest since ISO date and time** . 2. Write the desired date or datetime in the input, following the format `YYYY-MM-DDTHH:MM:SSZ` . ### 4. Preview and create¶ The next screen shows a preview of the incoming data. You can review and modify any of the incoming columns, adjust their names, change their types, or delete them. You can also configure the name of the Data Source. After reviewing your incoming data, select **Create Data Source** . On the Data Source details page, you can see the sync history in the tracker chart and the current status of the connection. ## Schema evolution¶ The S3 Connector supports adding new columns to the schema of the Data Source using the CLI. Non-backwards compatible changes, such as dropping, renaming, or changing the type of columns, aren't supported. Any rows from these files are sent to the [quarantine Data Source](https://www.tinybird.co/docs/docs/get-data-in/data-sources#the-quarantine-data-source). ## Iterate an S3 Data Source¶ To iterate an S3 Data Source, use the Tinybird CLI and the [version control integration](https://www.tinybird.co/docs/docs/work-with-data/organize-your-work/working-with-version-control) to handle your resources. Create a connection using the CLI: tb auth # use the main Workspace admin Token tb connection create s3_iamrole To iterate an S3 Data Source through a Branch, create the Data Source using a connector that already exists. The S3 Connector doesn't ingest any data, as it isn't configured to work in Branches. To test it on CI, you can directly append the files to the Data Source. After you've merged it and are running CD checks, run `tb datasource sync ` to force the sync in the main Workspace. ## Limits¶ The following limits apply to the S3 Connector: - When using the `auto` mode, execution of imports runs once every minute. - Tinybird ingests a maximum of 5 files per minute. This is a Workspace-level limit, so it's shared across all Data Sources. The following limits apply to maximum file size per type: | File type | Max file size | | --- | --- | | CSV | 10 GB for the Free plan, 32 GB for Dev and Enterprise | | NDJSON | 10 GB for the Free plan, 32 GB for Dev and Enterprise | | Parquet | 1 GB for the Free plan, 5 GB for Dev and Enterprise | Check the [limits page](https://www.tinybird.co/docs/docs/get-started/plans/limits) for limits on ingestion, queries, API Endpoints, and more. To adjust these limits, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). 
## Monitoring¶ You can follow the standard recommended practices for monitoring Data Sources as explained in our [ingestion monitoring guide](https://www.tinybird.co/docs/docs/monitoring/monitor-data-ingestion) . There are specific metrics for the S3 Connector. If a sync finishes unsuccessfully, Tinybird adds a new event to `datasources_ops_log`: - If all the files in the sync failed, the event has the `result` field set to `error` . - If some files failed and some succeeded, the event has the `result` field set to `partial-ok` . Failures in syncs are atomic, meaning that if one file fails, no data from that file is ingested. A JSON object with the list of files that failed is included in the `error` field. Some errors can happen before the file list can be retrieved (for instance, an AWS connection failure), in which case there are no files in the `error` field. Instead, the `error` field contains the error message and the files to be retried in the next execution. In scheduled runs, Tinybird retries all failed files in the next executions, so that rate limits or temporary issues don't cause data loss. In on-demand runs, since there is no next execution, truncate the Data Source and sync again. You can distinguish between individual failed files and failed syncs by looking at the `error` field: - If the `error` field contains a JSON object, the sync failed and the object contains the error message with the list of files that failed. - If the `error` field contains a string, a file failed to ingest and the string contains the error message. You can see the file that failed by looking at the `Options.Values` field. For example, you can use the following query to see the sync error messages for the last day: SELECT JSONExtractString(error, 'message') message, * FROM tinybird.datasources_ops_log WHERE datasource_id = '' AND timestamp > now() - INTERVAL 1 day AND message IS NOT NULL ORDER BY timestamp DESC --- URL: https://www.tinybird.co/docs/get-data-in/connectors/snowflake Last update: 2024-12-18T09:46:02.000Z Content: --- title: "Snowflake Connector · Tinybird Docs" theme-color: "#171612" description: "Documentation for how to use the Tinybird Snowflake Connector" --- # Snowflake Connector¶ Use the Snowflake Connector to load data from your existing Snowflake account into Tinybird so that you can quickly turn them into high-concurrency, low-latency REST APIs. The Snowflake Connector is fully managed and requires no additional tooling. You can define a sync schedule inside Tinybird and execution is taken care of for you. With the Snowflake Connector you can: - Start ingesting data instantly from Snowflake using SQL. - Use SQL to query, shape, and join your Snowflake data with other sources. - Use Auth tokens to control access to API endpoints. Implement access policies as you need. Support for row-level security. Snowflake IP filtering isn't supported by the Snowflake Connector. If you need to filter IPs, use the GCS/S3 Connector. ## Load a Snowflake table¶ ### Load a Snowflake table in the UI¶ To add a Snowflake table as a Data Source, follow these steps. #### Create a connection¶ Create a new Data Source using the Snowflake Connector dialog: 1. Open Tinybird and add a new Data Source by selecting** Create new (+)** next to the** Data Sources** section. 2. In the Data Sources dialog, select the Snowflake connector. 3. Enter your Snowflake Account Identifier. To find this, log into Snowflake, find the account info and then copy the Account Identifier. 4. 
In Tinybird, in the** Connection details** dialog, configure authentication with your Snowflake account. Enter your user, password, and Account Identifier. 5. Select the role and warehouse to access your data. 6. Copy the SQL snippet from the text box. The snippet creates a new Snowflake Storage Integration linking your Snowflake account with a Tinybird staging area for your Workspace. It also grants permission to the given role to create new Stages to unload data from your Snowflake Account into Tinybird. 7. With the SQL query copied, open a new SQL Worksheet inside Snowflake. Paste the SQL into the Worksheet query editor. You must edit the query and replace the `` fragment with the name of your Snowflake database. 8. Select** Run** . The statement must be executed with a Snowflake `ACCOUNTADMIN` role, since Snowflake Integrations operate at Account level and usually need administrator permissions. #### Select the database, table, and schema¶ After running the query, the `Statement executed successfully` message appears. Return to your Tinybird tab to resume the configuration of the Snowflake connector. Set a name for the Snowflake connection and complete this step by selecting **Next**. The Snowflake Connector now has enough permissions to inspect your Snowflake objects available to the given role. Browse the tables available in Snowflake and select the table you wish to load. Start by selecting the database to which the table belongs, then the schema, and the table. Finish by selecting **Next**. Maximum allowed table size is 50 million rows. The result is truncated if it exceeds that limit. #### Configure the schedule¶ You can configure the schedule on which you wish to load data. By default, the frequency is set to **One-off** which performs a one-time sync of the table. You can change this by selecting a different option from the menu. To configure a schedule that runs a regular sync, select the **Interval** option. You can configure a schedule in minutes, hours, or days by using the menu, and set the value for the schedule in the text field. You can also select whether the sync should run immediately, or if it should wait until the first scheduled sync. The **Replace data** import strategy is selected by default. Finish by selecting **Next**. Maximum allowed frequency is 5 minutes. #### Complete the configuration¶ The final screen of the dialog shows the interpreted schema of the table, which you can change as needed. You can also modify the name of the Data Source in Tinybird. Select **Create Data Source** to complete the process. After you've created the Data Source, a status chart appears showing executions of the loading schedule. The Data Source takes a moment to create the resources required to perform the first sync. When the first sync has completed, a green bar appears indicating the status. Details about the data, such as storage size and number of rows, are shown. You can also see a preview of the data. ### Load a Snowflake table in the CLI¶ To add a Snowflake table as a Data Source using the Tinybird CLI, follow these steps. #### Create a connection¶ You need to create a connection before you can load a table from Snowflake into Tinybird using the CLI. Creating a connection grants your Tinybird Workspace the appropriate permissions to view data from Snowflake. [Authenticate your CLI](https://www.tinybird.co/docs/docs/cli/install#authentication) and switch to the desired Workspace.
Then run: tb connection create snowflake The output includes instructions to configure read-only access to your data in Snowflake. Enter your user, password, account identifier, role, warehouse, and a name for the connection. After introducing the required information, copy the SQL block that appears. ** Creating a new Snowflake connection at the xxxx workspace. User (must have create stage and create integration in Snowflake): Password: Account identifier: Role (optional): Warehouse (optional): Connection name (optional, current xxxx): Enter this SQL statement in Snowflake using your admin account to create the connection: ------ create storage integration if not exists "tinybird_integration_role" type = external_stage storage_provider = 'GCS' enabled = true comment = 'Tinybird Snowflake Connector Integration' storage_allowed_locations = ('gcs://tinybird-cdk-production-europe-west3/id'); grant create stage on all schemas in database to role ACCOUNTADMIN; grant ownership on integration "tinybird_integration_ACCOUNTADMIN" to role ACCOUNTADMIN; ------ Ready? (y, N): ** Validating connection... ** xxxx.connection created successfully! Connection details saved into the .env file and referenced automatically in your connection file. With the SQL query copied, open a new SQL Worksheet inside Snowflake. Paste the SQL into the Worksheet query editor. You must edit the query and replace the `` fragment with the name of your Snowflake database. Select **Run** . This statement must be executed with a Snowflake `ACCOUNTADMIN` role, since Snowflake Integrations operate at Account level and usually need administrator permissions. The `Statement executed successfully` message appears. Return to your terminal, select **yes** (y) and the connection is created.A new `snowflake.connection` file appears in your project files. The `.connection` file can be safely deleted. #### Create the Data Source¶ After you've created the connection, you can create a Data Source and configure the schedule to import data from Snowflake. The Snowflake import is configured using the following options, which you can add at the end of your .datasource file: - `IMPORT_SERVICE` : Name of the import service to use. In this case, `snowflake` . - `IMPORT_CONNECTION_NAME` : The name given to the Snowflake connection inside Tinybird. For example, `'my_connection'` . - `IMPORT_EXTERNAL_DATASOURCE` : The fully qualified name of the source table in Snowflake. For example, `database.schema.table` . - `IMPORT_SCHEDULE` : A cron expression (UTC) with the frequency to run imports. Must be higher than 5 minutes. For example, `*/5 * * * *` . - `IMPORT_STRATEGY` : The strategy to use when inserting data, either `REPLACE` or `APPEND` . - `IMPORT_QUERY` : (Optional) The SELECT query to extract your data from Snowflake when you don't need all the columns or want to make a transformation before ingestion. The FROM must reference a table using the full scope: `database.schema.table` . Note: For `IMPORT_STRATEGY` only `REPLACE` is supported today. The `APPEND` strategy will be enabled in a future release. 
The following example shows a configured .datasource file for a Snowflake Data Source: ##### snowflake.datasource file DESCRIPTION > Snowflake demo data source SCHEMA > `timestamp` DateTime `json:$.timestamp`, `id` Integer `json:$.id`, `orderid` LowCardinality(String) `json:$.orderid`, `status` LowCardinality(String) `json:$.status`, `amount` Integer `json:$.amount` ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(timestamp)" ENGINE_SORTING_KEY "timestamp" ENGINE_TTL "timestamp + toIntervalDay(60)" IMPORT_SERVICE snowflake IMPORT_CONNECTION_NAME my_snowflake_connection IMPORT_EXTERNAL_DATASOURCE mydb.raw.events IMPORT_SCHEDULE */5 * * * * IMPORT_STRATEGY REPLACE IMPORT_QUERY > select timestamp, id, orderid, status, amount from mydb.raw.events The columns you select in the `IMPORT_QUERY` must match the columns defined in the Data Source schema. For example, if your Data Source has columns `ColumnA, ColumnB` then your `IMPORT_QUERY` must contain `SELECT ColumnA, ColumnB FROM ...` . A mismatch of columns causes data to arrive in the [quarantine Data Source](https://www.tinybird.co/docs/docs/get-data-in/data-operations/recover-from-quarantine). #### Push the configuration to Tinybird¶ With your connection created and Data Source defined, you can now push your project to Tinybird using: tb push The first import runs at the next occurrence of the cron expression. ## Iterate a Snowflake Data Source¶ ### Prerequisites¶ Use the CLI and the version control integration to handle your resources. To use the advantages of version control, connect your Workspace with [your repository](https://www.tinybird.co/docs/docs/work-with-data/organize-your-work/working-with-version-control#connect-your-workspace-to-git-from-the-cli) , and set the [CI/CD configuration](https://www.tinybird.co/docs/docs/work-with-data/organize-your-work/continuous-integration). Check the [use case examples](https://github.com/tinybirdco/use-case-examples) repository, where you can find basic instructions and examples for iterating Snowflake Data Sources using Git integration, under the `iterate_snowflake` section. To use the [Tinybird CLI](https://www.tinybird.co/docs/docs/cli/quick-start), check its documentation. For instance, to create the connections in the main Workspace using the CLI: tb auth # use the main Workspace admin Token tb connection create snowflake # these prompts are interactive and will ask you to insert the necessary information You can only create connections in the main Workspace. When creating the connection in a Branch, it's created in the main Workspace and from there is available to every Branch. For testing purposes, use connections that are different from the ones used in the main Workspace. ### Add a new Snowflake Data Source¶ You can add a new Data Source directly with the UI or the CLI tool, following [the load of a Snowflake table section](https://www.tinybird.co/docs/about:blank#load-a-snowflake-table). This works for testing purposes, but doesn't carry any connection details. You must add the connection and Snowflake configuration in the .datasource file when moving to production. To add a new Data Source using the recommended version control workflow, check the instructions in the [examples repository](https://github.com/tinybirdco/use-case-examples/tree/main/iterate_snowflake). ### Update a Data Source¶ - Snowflake Data Sources can't be modified directly from the UI. - When you create a new Tinybird Branch, the existing Snowflake Data Sources won't be connected.
You need to re-create them in the Branch. - In Branches, it's usually useful to work with[ fixtures](https://www.tinybird.co/docs/docs/work-with-data/strategies/implementing-test-strategies#fixture-tests) , as they'll be applied as part of the CI/CD, allowing the full process to be deterministic in every iteration and avoiding quota consumption from external services. Snowflake Data Sources can be modified from the CLI tool: tb auth # modify the .datasource Datafile with your editor tb push --force {datafile} # check the command output for errors To update it using the recommended version control workflow, check the instructions in the [examples repository](https://github.com/tinybirdco/use-case-examples/tree/main/iterate_snowflake). ### Delete a Data Source¶ Snowflake Data Sources can be deleted directly from the UI or CLI like any other Data Source. To delete it using the recommended version control workflow, check the instructions in the [examples repository](https://github.com/tinybirdco/use-case-examples/tree/main/iterate_snowflake). ## Logs¶ Job executions are logged in the `datasources_ops_log` [Service Data Source](https://www.tinybird.co/docs/docs/monitoring/service-datasources) . You can check this log directly in the Data Source view page in the UI. Filter by `datasource_id` to monitor ingestion through the Snowflake Connector from the `datasources_ops_log`: SELECT timestamp, event_type, result, error, job_id FROM tinybird.datasources_ops_log WHERE datasource_id = 't_1234' AND event_type = 'replace' ORDER BY timestamp DESC ## Schema evolution¶ The Snowflake Connector supports backwards compatible changes made in the source table. This means that, if you add a new column in Snowflake, the next sync job automatically adds it to the Tinybird Data Source. Non-backwards compatible changes, such as dropping or renaming columns, aren't supported and might cause the next sync to fail. ## Limits¶ See [Snowflake Connector limits](https://www.tinybird.co/docs/docs/get-started/plans/limits#snowflake-connector-limits). --- URL: https://www.tinybird.co/docs/get-data-in/data-operations Content: --- title: "Data operations · Tinybird Docs" theme-color: "#171612" description: "Replace and delete data, recover data from quarantine, iterate a data source, and more." --- # Data operations¶ After ingesting data into Tinybird, you might need to perform various operations to maintain and optimize your data. This section covers common data operations like: - [ Replacing and deleting data](https://www.tinybird.co/docs/docs/get-data-in/data-operations/replace-and-delete-data) : Update or remove data selectively or entirely from your Data Sources. - [ Iterating a Data Source](https://www.tinybird.co/docs/docs/get-data-in/data-operations/iterate-a-data-source) : Make schema changes and evolve your Data Sources over time. - [ Scheduling data operations](https://www.tinybird.co/docs/docs/get-data-in/data-operations/scheduling-with-github-actions-and-cron) : Automate data operations using cron jobs or GitHub Actions. - [ Recovering from quarantine](https://www.tinybird.co/docs/docs/get-data-in/data-operations/recover-from-quarantine) : Handle and fix data that didn't match your schema during ingestion. These operations help you maintain data quality, adapt to changing requirements, and ensure your data pipeline runs smoothly. Whether you need to fix data issues, modify schemas, or automate routine tasks, Tinybird provides the tools to manage your data effectively.
--- URL: https://www.tinybird.co/docs/get-data-in/data-operations/iterate-a-data-source Last update: 2025-01-07T15:48:10.000Z Content: --- title: "Iterate a Data Source · Tinybird Docs" theme-color: "#171612" description: "In this guide, you'll learn how to change the schema of a Data Source without using version control." --- # Iterating a Data Source (change or update schema)¶ Creating a Data Source for the first time is really straightforward. However, when iterating data projects, sometimes you need to edit the Data Source schema. This can be challenging when the data is already in production, and there are a few different scenarios. With Tinybird you can easily add more columns, but other operations (such as changing the sorting key or changing a column type) require you to fully recreate the Data Source. This guide is for Workspaces that **aren't** using version control. If your Workspace is linked using the Git<>Tinybird integration, see the repo of [common use cases for iterating when using version control](https://github.com/tinybirdco/use-case-examples). ## Overview¶ This guide walks through the iteration process for 4 different scenarios. Pick the one that's most relevant for you: - ** Scenario 1: I'm not in production** - ** Scenario 2: I can stop/pause data ingestion** - ** Scenario 3: I need to change a Materialized View & I can't stop data ingest** - ** Scenario 4: It's too complex and I can't figure it out** ## Prerequisites¶ You'll need to be familiar with the Tinybird CLI to follow along with this guide. Never used it before? [Read the docs here](https://www.tinybird.co/docs/docs/cli/quick-start). All of the guide examples have the same setup - a Data Source with a `Nullable(Int64)` column that the user wants to change to an `Int64` for performance reasons. This requires editing the schema and, to keep the existing data, replacing any occurrences of `NULL` with a number, like `0`. ## Scenario 1: I'm not in production¶ This scenario assumes that you aren't in production and can accept losing any data you have already ingested. If you aren't in production, and you can accept losing data, use the Tinybird CLI to pull your Data Source down to a file, modify it, and push it back into Tinybird. Begin with `tb pull` to pull your Tinybird resources down to files. Then, modify the .datasource file for the Data Source you want to change. When you're finished modifying the Data Source, delete the existing Data Source from Tinybird, either in the CLI with `tb datasource rm` or through the UI. Finally, push the new Data Source to Tinybird with `tb push`. See a screencast example: [https://www.youtube.com/watch?v=gzpuQfk3Byg](https://www.youtube.com/watch?v=gzpuQfk3Byg). ## Scenario 2: I can stop data ingestion¶ This scenario assumes that you have stopped all ingestion into the affected Data Sources. ### 1. Use the CLI to pull your Tinybird resources down into files¶ Use `tb pull --auto` to pull your Tinybird resources down into files. The `--auto` flag will organize the resources into directories, with your Data Sources being placed into a `datasources` directory. ### 2. Create the new Data Source¶ Create a copy of the Data Source file that you want to modify and rename it. For example, `datasources/original_ds.datasource` -> `datasources/new_ds.datasource`. Modify the new Data Source schema in the file to make the changes you need. Now push the new Data Source to Tinybird with `tb push datasources/new_ds.datasource`.
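Putting steps 1 and 2 together, here's a minimal shell sketch using the example file names above; the schema edit itself is whatever change you need, shown here only as a comment:

##### Scenario 2, steps 1 and 2 from the CLI
tb pull --auto
cp datasources/original_ds.datasource datasources/new_ds.datasource
# Edit datasources/new_ds.datasource to apply the schema change,
# for example Nullable(Int64) -> Int64.
tb push datasources/new_ds.datasource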
### 3. Backfill the new Data Source with existing data¶ If you want to move the existing data from the original Data Source to the new Data Source, use a Copy Pipe or a Pipe that materializes data into the new Data Source. #### 3.1 Recommended option: Copy Pipe¶ A [Copy Pipe](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/copy-pipes) is a Pipe used to copy data from one Data Source to another Data Source. This method is useful for one-time moves of data or scheduled executions. Move your data using the following Copy Pipe, paying particular attention to the `TYPE`, `TARGET_DATASOURCE` and `COPY_SCHEDULE` configs at the end: NODE copy_node SQL > SELECT * EXCEPT (my_nullable_column), toInt64(coalesce(my_nullable_column,0)) as my_column -- adjust query to your changes FROM original_ds TYPE COPY TARGET_DATASOURCE new_ds COPY_SCHEDULE @on-demand Push it to the Workspace: tb push pipes/temp_copy.pipe And run the Copy: tb pipe copy run temp_copy When it's done, remove the Pipe: tb pipe rm temp_copy #### 3.2 Alternative option: A Populate¶ Alternatively, you can create a Materialized View Pipe and run a Populate to transform data from the original schema into the modified schema of the new Data Source. Do this using the following Pipe, paying particular attention to the `TYPE` and `DATASOURCE` configs at the end: NODE temp_populate SQL > SELECT * EXCEPT (my_nullable_column), toInt64(coalesce(my_nullable_column,0)) as my_column FROM original_ds TYPE materialized DATASOURCE new_ds Then push the Pipe to Tinybird, passing the `--populate` flag to force it to immediately start processing data: tb push pipes/temp.pipe --populate --wait When it's done, remove the Pipe: tb pipe rm temp At this point, review your new Data Source and ensure that everything is as expected. ### 4. Delete the original Data Source and rename the new Data Source¶ You can now go to the UI, delete the original Data Source, and rename the new Data Source to use the name of the original Data Source. By renaming the new Data Source to use the same name as the original Data Source, any SQL in your Pipes or Endpoints that referred to the original Data Source will continue to work. If you have a Materialized View that depends on the Data Source, you must unlink the Pipe that is materializing data before removing the Data Source. You can modify and reconnect your Pipe after completing the steps above. ## Scenario 3: I need to change a Materialized View & I can't interrupt service¶ This scenario assumes you want to modify a Materialized View that is actively receiving data and serving API Endpoints, *and* you want to avoid service downtime. ### Before you begin¶ Because this is a complex scenario, let's introduce some names for the example resources to make it a bit easier to follow along. Let's assume that you have a Data Source that is actively receiving data; let's call this the `Landing Data Source` . From the `Landing Data Source` , you have a Pipe that is writing to a Materialized View; let's call these the `Materializing Pipe` and `Materialized View Data Source` respectively. ### 1. Use the CLI to pull your Tinybird resources down into files¶ Use `tb pull --auto` to pull your Tinybird resources down into files. The `--auto` flag organizes the resources into directories, with your Data Sources being placed into a `datasources` directory. ### 2. Duplicate the Materializing Pipe & Materialized View Data Source¶ Duplicate the `Materializing Pipe` & `Materialized View Data Source`.
For example: pipes/original_materializing_pipe.pipe -> pipes/new_materializing_pipe.pipe datasources/original_materialized_view_data_source.datasource -> datasources/new_materialized_view_data_source.datasource Modify the new files to change the schema as needed. Lastly, you'll need to add a `WHERE` clause to the new `Materializing Pipe` . This clause is going to filter out old rows, so that the `Materializing Pipe` is only materializing rows newer than a specific time. For the purpose of this guide, let's call this the `Future Timestamp` . Do **not** use variable time functions for this timestamp (e.g. `now()` ). Pick a static time that is in the near future; five to fifteen minutes should be enough. The condition should be `>` , for example: WHERE … AND my_timestamp > "2024-04-12 13:15:00" ### 3. Push the Materializing Pipe & Materialized View Data Source¶ Push the `Materializing Pipe` & `Materialized View Data Source` to Tinybird: tb push datasources/new_materialized_view_data_source.datasource tb push pipes/new_materializing_pipe.pipe ### 4. Create a new Pipe to transform & materialize the old schema to the new schema¶ You now have two Materialized Views: the one with the original schema, and the new one with the new schema. You need to take the data from the original Materialized View, transform it into the new schema, and write it into the new Materialized View. To do this, create a new Pipe. In this guide, it's called the `Transform Pipe` . In your `Transform Pipe` create the SQL `SELECT` logic that transforms the old schema to the new schema. Lastly, your `Transform Pipe` should have a `WHERE` clause that only selects rows that are **older** than our `Future Timestamp` . The condition should be `<=` and use the same `Future Timestamp` as before, for example: WHERE … AND my_timestamp <= "2024-04-12 13:15:00" ### 5. Wait until after the Future Timestamp, then push & populate with the Transform Pipe¶ Now, to avoid any potential for creating duplicates or missing rows, wait until after the `Future Timestamp` time has passed. This means that there should no longer be any rows arriving that have a timestamp that is **older** than the `Future Timestamp`. Then, push the `Transform Pipe` (saved here as `pipes/transform_pipe.pipe`) and force a populate: tb push pipes/transform_pipe.pipe --populate --wait ### 6. Wait for the populate to finish, then change your API Endpoint to read from the new Materialized View Data Source¶ Wait until the previous command has completed to ensure that all data from the original Materialized View has been written to the new `Materialized View Data Source`. When it's complete, modify the API Endpoint that is querying the old Materialized View to query from the new `Materialized View Data Source`. For example: SELECT * from original_materialized_view_data_source Would become: SELECT * from new_materialized_view_data_source ### 7. Test, then clean up old resources¶ Test that your API Endpoint is serving the correct data. If everything looks good, you can tidy up your Workspace by deleting the original Materialized View & the new `Transform Pipe`. ## Scenario 4: It's too complex and I can't figure it out¶ If you are dealing with a very complex scenario, don't worry! Contact Tinybird support ( [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) ). ## Next steps¶ - Got your schema sorted and ready to make some queries? Understand[ how to work with time](https://www.tinybird.co/docs/docs/work-with-data/query/guides/working-with-time) .
- Learn how to[ monitor your ingestion](https://www.tinybird.co/docs/docs/monitoring/monitor-data-ingestion) . --- URL: https://www.tinybird.co/docs/get-data-in/data-operations/recover-from-quarantine Last update: 2025-01-17T07:58:01.000Z Content: --- title: "Recover data in quarantine · Tinybird Docs" theme-color: "#171612" description: "Learn how to recover data from quarantine, and how to fix common errors that cause data to be sent to quarantine." --- # Recover data from quarantine¶ In this guide you'll learn about the quarantine Data Source, and how to use it to detect and fix errors on your Data Sources. The quarantine Data Source is named `{datasource_name}_quarantine` and can be queried using Pipes like a regular Data Source. ## Prerequisites¶ This guide assumes you're familiar with the concept of the [quarantine Data Source](https://www.tinybird.co/docs/docs/get-data-in/data-sources#the-quarantine-data-source). ## Example scenario¶ This guide uses the Tinybird CLI, but all steps can be performed in the UI as well. ### Setup¶ This example uses an NDJSON Data Source that looks like this: { "store_id": 1, "purchase": { "product_name": "shoes", "datetime": "2022-01-05 12:13:14" } } But you could use any ingestion method. Let's say you generate a Data Source file from this JSON snippet, push the Data Source to Tinybird, and ingest the JSON as a single row: ##### Push the NDJSON\_DS Data Source echo '{"store_id":1,"purchase":{"product_name":"shoes","datetime":"2022-01-05 12:13:14"}}' > ndjson_ds.ndjson tb datasource generate ndjson_ds.ndjson tb push --fixtures datasources/ndjson_ds.datasource tb sql "select * from ndjson_ds" The schema generated from the JSON will look like this: ##### NDJSON\_DS.DATASOURCE DESCRIPTION > Generated from ndjson_ds.ndjson SCHEMA > purchase_datetime DateTime `json:$.purchase.datetime`, purchase_product_name String `json:$.purchase.product_name`, store_id Int16 `json:$.store_id` At this point, you can check in the UI and confirm that your Data Source has been created and the row ingested. Hooray! <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-quarantine-1.png&w=3840&q=75) <-figcaption-> Data Source details can be accessed from your Sidebar ### Add data that doesn't match the schema¶ Now, if you append some rows that don't match the Data Source schema, these rows will end up in the quarantine Data Source. ##### Append rows with wrong schema echo '{"store_id":2,"purchase":{"datetime":"2022-01-05 12:13:14"}}\n{"store_id":"3","purchase":{"product_name":"shirt","datetime":"2022-01-05 12:13:14"}}' > ndjson_quarantine.ndjson tb datasource append ndjson_ds ndjson_quarantine.ndjson tb sql "select * from ndjson_ds_quarantine" This time, if you check in the UI, you'll see a notification warning you about quarantined rows: <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-quarantine-2.png&w=3840&q=75) <-figcaption-> The quarantine Data Source is always accessible (if it contains any rows) from the Data Source modal window In the Data Source view you'll find the Log tab, which shows you details about all operations performed on a Data Source. If you're following the steps of this guide, you should see a row with `event_type` as **append** and `written_rows_quarantine` as **2**.
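You can confirm the same thing from the CLI by querying the `datasources_ops_log` Service Data Source. This is a minimal sketch; the `written_rows` and `written_rows_quarantine` columns are assumed from the Log tab fields described above:

##### Check the quarantined append from the CLI
tb sql "
  SELECT timestamp, event_type, written_rows, written_rows_quarantine
  FROM tinybird.datasources_ops_log
  WHERE datasource_name = 'ndjson_ds'
    AND event_type = 'append'
  ORDER BY timestamp DESC
  LIMIT 5"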
From the quarantine warning notification, navigate to the quarantine Data Source page, and review the problematic rows: <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-quarantine-3.png&w=3840&q=75) <-figcaption-> Within the quarantine view you can see both, a summary of errors and the rows that have failed The **Errors** view shows you a summary of all the errors and the number of occurrences for each of those, so you can prioritize fixing the most common ones. The **Rows** view shows you all the rows that have failed, so you can further investigate why. ## Fix quarantine errors¶ There are generally three ways of fixing quarantine errors: ### 1. Modify your data producer¶ Usually, the best solution is to fix the problem at the source. This means updating the applications or systems that are producing the data, before they send it to Tinybird. The benefit of this is that you don't need to do additional processing to normalize the data after it has been ingested, which helps to save cost and reduce overall latency. However, it can come at the cost of having to push changes into a production application, which can be complex or have side effects on other systems. ### 2. Modify the Data Source schema¶ Often, the issue that causes a row to end up in quarantine is a mismatch of data types. A simple solution is to [modify the Data Source schema](https://www.tinybird.co/docs/docs/get-data-in/data-operations/iterate-a-data-source) to accept the new type. For example, if an application is starting to send integers that are too large for `Int8` , you might update the schema to use `Int16`. Avoid Nullable columns, as they can have significantly worse performance. Instead, send alternative values like `0` for any `Int` type, or an empty string for a `String` type. ### 3. Transform data with Pipes and Materialized Views¶ This is one of the most powerful capabilities of Tinybird. If you aren't able to modify the data producer, you can apply a transformation to the erroring columns at ingestion time and materialize the result into another Data Source. You can read more about this in the [Materialized Views docs](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/materialized-views). ## Recover rows from quarantine¶ The quickest way to recover rows from quarantine is to fix the cause of the errors and then re-ingest the data. However, that isn't always possible. You can recover rows from the quarantine using a recovery Pipe and the Tinybird API: ### Create a recovery Pipe¶ You can create a Pipe to select the rows from the quarantine Data Source and transform them into the appropriate schema. The previous example showed rows where the `purchase_product_name` contained `null` or the `store_id` contained a `String` rather than an `Int16`: <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-quarantine-5.png&w=3840&q=75) <-figcaption-> Remember that quarantined columns are Nullable(String) All columns in a quarantine Data Source are `Nullable()` , which means that you must use the [coalesce()](https://www.tinybird.co/docs/docs/sql-reference/functions/functions-for-nulls#coalesce) function if you want to transform them into a non-nullable type. This example uses coalesce to set a default value of `DateTime(0)`, `''` , or `0` for `DateTime`, `String` and `Int16` types respectively. Additionally, all columns in a quarantine Data Source are stored as `String` . This means that you must specifically transform any non-String column into its desired type as part of the recovery Pipe. 
This example transforms the `purchase_datetime` and `store_id` columns to `DateTime` and `Int16` types respectively. The quarantine Data Source contains additional meta-columns `c__error_column`, `c__error`, `c__import_id`, and `insertion_date` with information about the errors and the rows, so you should not use `SELECT *` to recover rows from quarantine. The following SQL transforms the quarantined rows from this example into the original Data Source schema: SELECT coalesce( parseDateTimeBestEffortOrNull( purchase_datetime ), toDateTime(0) ) as purchase_datetime, coalesce( purchase_product_name, '' ) as purchase_product_name, coalesce( toInt16(store_id), 0 ) as store_id FROM ndjson_ds_quarantine <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-quarantine-6.png&w=3840&q=75) <-figcaption-> Recover endpoint Just as with any other Pipe, you can publish the results of this recovery Pipe as an API Endpoint. ### Ingest the fixed rows and truncate quarantine¶ You can then use the Tinybird CLI to append the fixed data back into the original Data Source, by hitting the API Endpoint published from the recovery Pipe: tb datasource append To avoid dealing with JSONPaths, you can hit the recovery Pipe's CSV endpoint: tb datasource append ndjson_ds https://api.tinybird.co/v0/pipes/quarantine_recover.csv?token= Check that your Data Source now has the fixed rows, either in the UI, or from the CLI using: tb sql "select * from ndjson_ds" Finally, truncate the quarantine Data Source to clear out the recovered rows, either in the UI, or from the CLI using: tb datasource truncate ndjson_ds_quarantine --yes You should see that your Data Source now has all of the rows, and the quarantine notification has disappeared. <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-quarantine-7.png&w=3840&q=75) <-figcaption-> Data Source with the recovered rows and truncated quarantine If your quarantine has too many rows, you may need to add pagination based on the `insertion_date` and/or `c__import_id` columns. If you're using a Kafka Data Source, remember to add the Kafka metadata columns. ## Recover rows from quarantine with CI/CD¶ When you connect your Workspace to Git and it becomes read-only, you want all your workflows to go through CI/CD. This is how you recover rows from quarantine in your data project using Git, automating the workflow. ### Prototype the process in a Branch¶ This step is optional, but it's good practice. When you need to perform a change to your data project and it's read-only, you can create a new Branch and prototype the changes there, then later bring them to Git. To test this process: 1. Create a Branch 2. Ingest a file that creates rows in quarantine 3. Prototype a Copy Pipe 4. Run it 5. Validate that the data is recovered ### A practical example with Git¶ There is an additional guide showing how to [recover quarantine rows from Git using CI/CD](https://github.com/tinybirdco/use-case-examples/tree/main/recover_data_from_quarantine) , where the data project is the [Web Analytics template](https://www.tinybird.co/templates). When your rows end up in quarantine, you receive an email like this: <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fgit-quarantine.jpg&w=3840&q=75) In this additional example, the issue is the `timestamp` column: instead of being a DateTime, it's a String Unix time, so the rows can't be properly ingested.
{"timestamp":"1697393030","session_id":"b7b1965c-620a-402a-afe5-2d0eea0f9a34","action":"page_hit","version":"1","payload":"{ \"user-agent\":\"Mozilla\/5.0 (Linux; Android 13; SM-A102U) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/106.0.5249.118 Mobile Safari\/537.36\", \"locale\":\"en-US\", \"location\":\"FR\", \"referrer\":\"https:\/\/www.github.com\", \"pathname\":\"\/pricing\", \"href\":\"https:\/\/www.tinybird.co\/pricing\"}"} To convert the `timestamp` values in quarantine to a `DateTime` , you'd build a Copy Pipe like this: NODE copy_quarantine SQL > SELECT toDateTime(fromUnixTimestamp64Milli(toUInt64(assumeNotNull(timestamp)) * 1000)) timestamp, assumeNotNull(session_id) session_id, assumeNotNull(action) action, assumeNotNull(version) version, assumeNotNull(payload) payload FROM analytics_events_quarantine TYPE COPY TARGET_DATASOURCE analytics_events To test the changes, you'd need to do a custom deployment: #!/bin/bash # use set -e to raise errors for any of the commands below and make the CI pipeline to fail set -e tb datasource append analytics_events datasources/fixtures/analytics_events_errors.ndjson tb deploy tb pipe copy run analytics_events_quarantine_to_final --wait --yes sleep 10 First append a sample of the quarantined rows, then deploy the Copy Pipe, and finally run the copy operation. Once changes have been deployed in a test Branch, you can write data quality tests to validate the rows are effectively being copied: - analytics_events_quarantine: max_bytes_read: null max_time: null sql: | SELECT count() as c FROM analytics_events_quarantine HAVING c <= 0 - copy_is_executed: max_bytes_read: null max_time: null sql: | SELECT count() c, sum(rows) rows FROM tinybird.datasources_ops_log WHERE datasource_name = 'analytics_events' AND event_type = 'copy' HAVING rows != 74 and c = 1 `analytics_events_quarantine` checks that effectively some of the rows are in quarantine while `copy_is_executed` tests that the rows in quarantine have been copied to the `analytics_events` Data Source. Lastly, you need to deploy the Branch: # use set -e to raise errors for any of the commands below and make the CI pipeline to fail set -e tb deploy tb pipe copy run analytics_events_quarantine_to_final --wait You can now merge the Pull Request, the Copy Pipe will be deployed to the Workspace and the copy operation will be executed ingesting all rows in quarantine. After that you can optionally truncate the quarantine Data Source using `tb datasource truncate analytics_events_quarantine`. This is a [working Pull Request](https://github.com/tinybirdco/use-case-examples/pull/6) with all the steps mentioned above. ## Next steps¶ - Make sure you're familiar with the[ challenges of backfilling real-time data](https://www.tinybird.co/docs/docs/work-with-data/strategies/backfill-strategies#the-challenge-of-backfilling-real-time-data) - Learn how to[ monitor your ingestion](https://www.tinybird.co/docs/docs/monitoring/monitor-data-ingestion) . --- URL: https://www.tinybird.co/docs/get-data-in/data-operations/replace-and-delete-data Last update: 2025-01-17T07:58:01.000Z Content: --- title: "Replace and delete data · Tinybird Docs" theme-color: "#171612" description: "Update & delete operations are common in transactional databases over operational data, but sometimes you also need to make these changes on your analytical data in Tinybird." 
--- # Replace and delete data in your Tinybird Data Sources¶ Update and delete operations are common in transactional databases over operational data, but sometimes you also need to make these changes on your analytical data in Tinybird. Sometimes, you need to delete or replace some of your data in Tinybird. Perhaps there was a bug in your app, a transient error in your operational database, or simply an evolution of requirements due to product or regulatory changes. It's **not safe** to replace data in the partitions where you are actively ingesting data. You may lose the data inserted during the process. Tinybird works well with append-only workloads but also fully supports replacing and deleting data. It abstracts away the tricky complexities of data replication, partition management and mutations rewriting, allowing you to focus on your data engineering flows and not the internals of real-time analytical databases. This guide shows you using different examples, how to selectively delete or update data in Tinybird using the REST API. You can then adapt these processes for your own needs. All operations on this page require a Token with the correct scope. In the code snippets, replace `` by a Token whose [scope](https://www.tinybird.co/docs/docs/api-reference/token-api) is `DATASOURCES:CREATE` or `ADMIN`. ## Delete data selectively¶ To delete data that's within a condition, send a POST request to the [Data Sources /delete API](https://www.tinybird.co/docs/docs/api-reference/datasource-api#post--v0-datasources-(.+)-delete) , providing the name of one of your Data Sources in Tinybird and a `delete_condition` parameter, which is an SQL expression filter. Delete operations don't automatically cascade to downstream Materialized Views. You may need to perform separate delete operations on Materialized Views. Imagine you have a Data Source called `events` and you want to remove all the transactions for November 2019. You'd send a POST request like this: - CLI - API ##### Delete data selectively tb datasource delete events --sql-condition "toDate(date) >= '2019-11-01' and toDate(date) <= '2019-11-30'" Once you make the request, you can see that the `POST` request to the delete API Endpoint is asynchronous. It returns a [job response](https://www.tinybird.co/docs/docs/api-reference/jobs-api#jobs-api-getting-information-about-jobs) , indicating an ID for the job, the status of the job, the `delete_condition` , and some other metadata. Although the delete operation runs asynchronously, the operation waits synchronously for all the mutations to be rewritten and delete the data replicas. Queries reading data either see the state before the operation or after it's complete. { "id": "64e5f541-xxxx-xxxx-xxxx-00524051861b", "job_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b", "job_url": "https://api.tinybird.co/v0/jobs/64e5f541-xxxx-xxxx-xxxx-00524051861b", "job": { "kind": "delete_data", "id": "64e5f541-xxxx-xxxx-xxxx-00524051861b", "job_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b", "status": "waiting", "created_at": "2023-04-11 13:52:32.423207", "updated_at": "2023-04-11 13:52:32.423213", "started_at": null, "is_cancellable": true, "datasource": { "id": "t_c45d5ae6781b41278fcee365f5bxxxxx", "name": "shopping_data" }, "delete_condition": "event = 'search'" }, "status": "waiting", "delete_id": "64e5f541-xxxx-xxxx-xxxx-00524051861b" } You can periodically poll the `job_url` with the given ID to check the status of the deletion process. 
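For example, a minimal sketch of this flow using only the API might look like the following. It sends the delete request shown above and then polls the returned `job_url`. The `$TB_TOKEN` value, the `events` Data Source name, and the use of `jq` to read fields from the JSON responses are assumptions for illustration; check the Data Sources API and Jobs API references for the exact parameters and response shapes. ##### Delete data selectively and poll the job (sketch)
#!/bin/bash
TB_TOKEN="<your_token>"

# Send the asynchronous delete request (API counterpart of the CLI command above)
RESPONSE=$(curl -s \
  -H "Authorization: Bearer $TB_TOKEN" \
  -X POST \
  --data-urlencode "delete_condition=toDate(date) >= '2019-11-01' and toDate(date) <= '2019-11-30'" \
  "https://api.tinybird.co/v0/datasources/events/delete")

# Extract the job URL from the job response
JOB_URL=$(echo "$RESPONSE" | jq -r '.job_url')

# Poll until the job reaches a final state
while true; do
  STATUS=$(curl -s -H "Authorization: Bearer $TB_TOKEN" "$JOB_URL" | jq -r '.status')
  echo "delete job status: $STATUS"
  [ "$STATUS" = "done" ] && break
  [ "$STATUS" = "error" ] && exit 1
  sleep 5
done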
When the status is `done`, the job has deleted the data matching the SQL expression filter, and all your Pipes and API Endpoints continue running with the remaining data in the Data Source. ### Truncate a Data Source¶ Sometimes you want to delete all data contained in a Data Source. You can perform this action from the UI or the API. Using the API, the [truncate](https://www.tinybird.co/docs/docs/api-reference/datasource-api#post--v0-datasources-(.+)-truncate) endpoint deletes all rows in a Data Source as shown in this example: - CLI - API ##### Truncate a Data Source tb datasource truncate You can also truncate a Data Source directly from the UI: <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Freplacing-and-deleting-data-1.png&w=3840&q=75) <-figcaption-> Deleting selectively is only available via the API, but truncating a Data Source to delete all its data is available via the UI. ## Replace data selectively¶ The ability to update data is often not the top priority when designing analytical databases, but there are always scenarios where you need to update or replace your analytical data. For example, you might have reconciliation processes over your transactions that affect your original data. Or maybe your ingestion process was simply faulty, and you ingested inaccurate data for a period of time. In Tinybird, you can specify a condition to replace only part of the data during the ingestion process. For instance, imagine you want to reingest a CSV with the data for November 2019 and update your Data Source accordingly. To update the data, you pass the `replace_condition` parameter with the `toDate(date) >= '2019-11-01' and toDate(date) <= '2019-11-30'` condition. - CLI - API ##### Replace data selectively tb datasource replace events \ https://storage.googleapis.com/tinybird-assets/datasets/guides/events_1M_november2019_1.csv \ --sql-condition "toDate(date) >= '2019-11-01' and toDate(date) <= '2019-11-30'" The response to the previous API call looks like this: ##### Response after replacing data { "id": "a83fcb35-8d01-47b9-842c-a288d87679d0", "job_id": "a83fcb35-8d01-47b9-842c-a288d87679d0", "job_url": "https://api.tinybird.co/v0/jobs/a83fcb35-8d01-47b9-842c-a288d87679d0", "job": { "kind": "import", "id": "a83fcb35-8d01-47b9-842c-a288d87679d0", "job_id": "a83fcb35-8d01-47b9-842c-a288d87679d0", "import_id": "a83fcb35-8d01-47b9-842c-a288d87679d0", "status": "waiting", "statistics": null, "datasource": { ... }, "quarantine_rows": 0, "invalid_lines": 0 }, "status": "waiting", "import_id": "a83fcb35-8d01-47b9-842c-a288d87679d0" } As in the case of the selective deletion, selective replacement also runs as an asynchronous request, so [check the status of the job](https://www.tinybird.co/docs/docs/api-reference/jobs-api#jobs-api-getting-information-about-jobs) periodically. You can see the status of the job by using the `job_url` returned in the previous response. ### About the replace condition¶ Conditional replaces apply over partitions: the rows in the new data that match the condition determine which partitions are involved in the operation. Always include the partition key in the replace condition to maintain consistency. The replace condition filters the new data that's appended, meaning it excludes rows that don't match the condition. The condition is also applied to the selected partitions in the Data Source, removing existing rows that don't match the condition in these partitions. Rows in other partitions that don't match the condition remain untouched.
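If you call the API directly instead of the CLI, the selective replace shown above is a POST to the Data Sources API with `mode=replace` and the `replace_condition` parameter. A minimal sketch (the Token is a placeholder, and curl handles the URL encoding of the condition): ##### Replace data selectively with the API (sketch)
#!/bin/bash
TB_TOKEN="<your_token>"
CSV_URL="https://storage.googleapis.com/tinybird-assets/datasets/guides/events_1M_november2019_1.csv"

# Re-ingest the November 2019 file, replacing only the rows that match the condition
curl \
  -H "Authorization: Bearer $TB_TOKEN" \
  -X POST \
  -d mode='replace' \
  -d name='events' \
  -d url="$CSV_URL" \
  --data-urlencode "replace_condition=(toDate(date) >= '2019-11-01' and toDate(date) <= '2019-11-30')" \
  "https://api.tinybird.co/v0/datasources"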
See the [example](https://www.tinybird.co/docs/docs/get-data-in/data-operations/replace-and-delete-data#example) that follows for a better understanding of selectively replacing data in a Data Source. ### Linked Materialized Views¶ If you have several connected Materialized Views, then selective replaces proceed in a cascading fashion. For example, if datasource A materializes data to datasource B and from there to datasource C, then when you replace data in datasource A, datasources B and C automatically update accordingly. All three Data Sources need to have compatible partition keys, since replaces are processed by partition. The command `tb dependencies --datasource the_data_source --check-for-partial-replace` returns the dependencies for both Data Sources and Materialized Views, and raises an error if any of the dependencies have incompatible partition keys. Remember: The provided Token must have the `DATASOURCES:CREATE` [scope](https://www.tinybird.co/docs/docs/api-reference/token-api). ### Example¶ For this example, consider this Data Source: <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Freplacing-example-1.jpeg&w=3840&q=75) Its partition key is `ENGINE_PARTITION_KEY "profession"` . If you want to replace the last two rows with new data, you can send this request with the replace condition `replace_condition=(profession='Jedi')`: - CLI - API ##### Replace with partition in condition echo "50,Mace Windu,Jedi" > jedi.csv tb datasource replace characters jedi.csv --sql-condition "profession='Jedi'" Since the replace condition column matches the partition key, the result is: <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Freplacing-example-2.jpeg&w=3840&q=75) However, consider what happens if you create the Data Source with `ENGINE_PARTITION_KEY "name"`: ##### characters.datasource SCHEMA > `age` Int16, `name` String, `profession` String ENGINE "MergeTree" ENGINE_SORTING_KEY "age, name, profession" ENGINE_PARTITION_KEY "name" If you were to run the same replace request, the result probably doesn't make sense: <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Freplacing-example-3.jpeg&w=3840&q=75) Why weren't the existing rows removed? Because the `replace` process uses the payload rows to identify which partitions to work on. The Data Source is now partitioned by name and not profession, so the process didn't delete the other "Jedi" rows. They're in different partitions because they have different names. The rule of thumb is this: **Always make sure the replace condition uses the partition key as the filter field**. ## Replace a Data Source completely¶ To replace a complete Data Source, make an API call similar to the previous example, without providing a `replace_condition`: - CLI - API ##### Replace Data Source completely tb datasource replace events https://storage.googleapis.com/tinybird-assets/datasets/guides/events_1M_november2019_1.csv The example request replaces a Data Source with the data found at a given URL pointing to a CSV file. You can also replace a Data Source in the Tinybird UI: <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Freplacing-and-deleting-data-2.png&w=3840&q=75) <-figcaption-> Replacing a Data Source completely through the User Interface Schemas must be identical. When replacing data either selectively or entirely, the schema of the new inbound data must match that of the original Data Source. Rows that don't match the schema go to quarantine.
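Because mismatched rows go to quarantine instead of failing the request, it's worth checking the job metadata after a replace. A sketch, assuming `jq` is available and using the `job_url` returned by the replace request; the exact location of the `quarantine_rows` and `invalid_lines` fields may vary, so adjust the filter to the response you get: ##### Check a replace job for quarantined rows (sketch)
#!/bin/bash
TB_TOKEN="<your_token>"
JOB_URL="https://api.tinybird.co/v0/jobs/a83fcb35-8d01-47b9-842c-a288d87679d0"

# Inspect the import job: non-zero quarantine_rows or invalid_lines point to schema mismatches
curl -s -H "Authorization: Bearer $TB_TOKEN" "$JOB_URL" \
  | jq '{status, quarantine_rows, invalid_lines}'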
## Next steps¶ - Learn how[ to get rows out of quarantine](https://www.tinybird.co/docs/docs/get-data-in/data-sources#the-quarantine-data-source) . - Need to[ iterate a Data Source, including the schema](https://www.tinybird.co/docs/docs/get-data-in/data-operations/iterate-a-data-source) ? Read how here. --- URL: https://www.tinybird.co/docs/get-data-in/data-operations/scheduling-with-github-actions-and-cron Last update: 2025-01-17T07:58:01.000Z Content: --- title: "Schedule data ingestion · Tinybird Docs" theme-color: "#171612" description: "Cronjobs are the universal way of scheduling tasks. In this guide, you'll learn how to keep your data in sync with cron jobs or GitHub Actions and the Tinybird API." --- # Schedule data ingestion with cron and GitHub Actions¶ Cronjobs are the universal way of scheduling tasks. In this guide, you'll learn how to keep your data in sync with cronjobs or GitHub Actions and the Tinybird REST API. ## Overview¶ For this example, let's assume you've already imported a Data Source to your Tinybird account and that you have properly defined its schema and partition key. Once everything is set, you can easily perform some operations using the [Data Sources API](https://www.tinybird.co/docs/docs/api-reference/datasource-api) to **periodically append to or replace data** in your Data Sources. This guide shows you some examples. ## About crontab¶ Crontab is a native Unix tool that schedules command execution at a specified time or time interval. It works by defining the schedule, and the command to execute, in a text file. This can be achieved using `sudo crontab -e` . You can learn more about crontab from many online resources like [crontab.guru](https://crontab.guru/crontab.5.html) and [the man page for crontab](https://man7.org/linux/man-pages/man5/crontab.5.html). ### The cron table format¶ Cron follows a table format like the following (note that you can also use [external tools like crontab.guru](https://crontab.guru/) to help you define the cron job schedule): ##### Cron syntax explanation
* * * * * Command_to_execute
| | | | |
| | | | Day of the Week ( 0 - 6 ) ( Sunday = 0 )
| | | |
| | | Month ( 1 - 12 )
| | |
| | Day of Month ( 1 - 31 )
| |
| Hour ( 0 - 23 )
|
Min ( 0 - 59 )
Using this format, the following would be typical cron schedules to execute commands at different times: - Every five minutes: `*/5 * * * *` - Every day at midnight: `0 0 * * *` - Every first day of the month at midnight: `0 0 1 * *` - Every Sunday at midnight: `0 0 * * 0` Be sure to save your scripts in the right location. This guide saves its shell scripts in the `/opt/cronjobs/` folder. ## Append data periodically¶ It's very common to have a Data Source that grows over time. There is often also an ETL process extracting this data from the transactional database and generating CSV files with the last X hours or days of data, so you might want to append those recently-generated rows to your Tinybird Data Source. For this example, imagine you generate new CSV files every day at 00:00 that you want to append to Tinybird every day at 00:10.
### Option 1: With a shell script¶ First, you need to create a shell script file containing the Tinybird API request operation: ##### Contents of append.sh #!/bin/bash TOKEN=your_token CSV_URL="http://your_url.com" curl \ -H "Authorization: Bearer $TOKEN" \ -X POST \ -d url=$CSV_URL \ -d mode='append' \ -d name='events' \ https://api.tinybird.co/v0/datasources Then, add a new line to your crontab file (using `sudo crontab -e` ): 10 0 * * * sh -c /opt/cronjobs/append.sh ### Option 2: Using GitHub Actions¶ If your project is hosted on GitHub, you can also use GitHub Actions to schedule periodic jobs. Create a new file called `.github/workflows/append.yml` with the following code to append data from a CSV, given its URL, every day at 00:10. ##### Contents of .github/workflows/append.yml name: Append data every day at 00:10 on: push: workflow_dispatch: schedule: - cron: '10 0 * * *' jobs: scheduled: runs-on: ubuntu-latest steps: - name: Check out this repo uses: actions/checkout@v2 - name: Append new data run: |- curl \ -H "Authorization: Bearer $TOKEN" \ -X POST \ -d url=$CSV_URL \ -d mode='append' \ -d name='events' \ https://api.tinybird.co/v0/datasources ## Replace data periodically¶ Let's use another example. With this new fictional Data Source, imagine a scenario where you want to replace the whole Data Source with a CSV file sitting at a publicly accessible URL every first day of the month. ### Option 1: With a shell script¶ ##### Contents of replace.sh #!/bin/bash TOKEN=your_token CSV_URL="http://your_url.com" curl \ -H "Authorization: Bearer $TOKEN" \ -X POST \ -d url=$CSV_URL \ -d mode='replace' \ -d name='events' \ https://api.tinybird.co/v0/datasources Then edit the crontab file which takes care of periodically executing your script. Run `sudo crontab -e`: ##### Setting up a crontab to run a script periodically 0 0 1 * * sh -c /opt/cronjobs/replace.sh ### Option 2: With GitHub Actions¶ Create a new file called `.github/workflows/replace.yml` with the following code to replace all your data with the CSV at the given URL every day at 00:10. ##### Contents of .github/workflows/replace.yml name: Replace all data every day at 00:10 on: push: workflow_dispatch: schedule: - cron: '10 0 * * *' jobs: scheduled: runs-on: ubuntu-latest steps: - name: Check out this repo uses: actions/checkout@v2 - name: Replace all data run: |- curl \ -H "Authorization: Bearer $TOKEN" \ -X POST \ -d url=$CSV_URL \ -d mode='replace' \ -d name='events' \ https://api.tinybird.co/v0/datasources ## Replace just one month of data¶ Having your API call inside a shell script allows you to script more complex ingestion processes. For example, imagine you want to replace the last month of events data, every day. Then each day, you would export a CSV file to a publicly accessible URL and name it something like `events_YYYY-MM-DD.csv`.
### Option 1: With a shell script¶ You could script a process that does a conditional data replacement as follows: ##### Script to replace data selectively on Tinybird #!/bin/bash TODAY=`date +"%Y-%m-%d"` ONE_MONTH_AGO=`date -v -1m +%Y-%m-%d` TOKEN=your_token DATASOURCE=events CSV_URL="http://your_url.com" curl \ -H "Authorization: Bearer $TOKEN" \ -X POST \ -d url=$CSV_URL \ -d mode='replace' \ -d "replace_condition=(created_at+BETWEEN+'${ONE_MONTH_AGO}'+AND+'${TODAY}')" \ -d name=$DATASOURCE \ https://api.tinybird.co/v0/datasources Then, after saving that file to `/opt/cronjobs/daily_replace.sh` , add the following line to `crontab` to run it every day at midnight: ##### Setting up a crontab to run a script periodically 0 0 * * * sh -c /opt/cronjobs/daily_replace.sh ### Option 2: With GitHub Actions¶ Create a new file called `.github/workflows/replace_last_month.yml` with the following code to replace all the data for the last month every day at 00:10. ##### Contents of .github/workflows/replace_last_month.yml name: Replace last month of data every day at 00:10 on: push: workflow_dispatch: schedule: - cron: '10 0 * * *' jobs: scheduled: runs-on: ubuntu-latest steps: - name: Check out this repo uses: actions/checkout@v2 - name: Replace last month of data run: |- TODAY=`date +"%Y-%m-%d"` ONE_MONTH_AGO=`date -v -1m +%Y-%m-%d` DATASOURCE=events # could also be set via github secrets CSV_URL="http://your_url.com" # could also be set via github secrets curl \ -H "Authorization: Bearer $TOKEN" \ -X POST \ -d url=$CSV_URL \ -d mode='replace' \ -d "replace_condition=(created_at+BETWEEN+'${ONE_MONTH_AGO}'+AND+'${TODAY}')" \ -d name=$DATASOURCE \ https://api.tinybird.co/v0/datasources Use GitHub secrets: Store `TOKEN` as an [encrypted secret](https://docs.github.com/en/actions/reference/encrypted-secrets) to avoid hardcoding secret keys in your repositories, and replace `DATASOURCE` and `CSV_URL` with their values or save them as secrets as well. ## Next steps¶ - Learn more about[ GitHub Actions and CI/CD processes on Tinybird](https://www.tinybird.co/docs/docs/work-with-data/organize-your-work/continuous-integration) . - Understand how to[ work with time](https://www.tinybird.co/docs/docs/work-with-data/query/guides/working-with-time) . --- URL: https://www.tinybird.co/docs/get-data-in/data-sources Last update: 2025-01-13T13:56:57.000Z Content: --- title: "Data Sources · Tinybird Docs" theme-color: "#171612" description: "Data Sources contain all the data you bring into Tinybird, acting like tables in a database." --- # Data Sources¶ When you get data into Tinybird, it's stored in a Data Source. You then write SQL queries to explore the data from the Data Source. For example, if your event data lives in a Kafka topic, you can create a Data Source that connects directly to Kafka and writes the events to Tinybird. You can then [create a Pipe](https://www.tinybird.co/docs/docs/work-with-data/query/pipes#creating-pipes-in-the-ui) to query fresh event data. A Data Source can also be the result of materializing a SQL query through a [Pipe](https://www.tinybird.co/docs/docs/work-with-data/query/pipes#creating-pipes-in-the-ui). ## Create Data Sources¶ You can use Tinybird's UI, CLI, and API to create Data Sources. ### Using the UI ¶ Follow these steps to create a new Data Source: 1. In your Workspace, go to** Data Sources** . 2. Select** +** to add a new Data Source. ### Using the CLI ¶ You can create a Data Source using the `tb datasource` command.
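Another common flow is to define the schema in a .datasource file and push it with the CLI, like the .datasource examples shown later on this page; a minimal sketch (the file name and columns are illustrative): ##### Create a Data Source from a .datasource file (sketch)
#!/bin/bash
# Write a minimal schema to a .datasource file (illustrative columns)
cat > events.datasource <<'EOF'
SCHEMA >
    `date` DateTime,
    `user_id` Int64,
    `event` String

ENGINE "MergeTree"
ENGINE_SORTING_KEY "date, user_id"
EOF

# Push the new Data Source to your Workspace
tb push events.datasource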
See [tb datasource](https://www.tinybird.co/docs/docs/cli/command-ref#tb-datasource) in the CLI reference. ### Using the Events API ¶ If you send data to the Events API and the Data Source doesn't exist, the Events API creates a Data Source by guessing the types from the data you send. See [Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api). You can still modify the Data Source using the alter options or adding a TTL as explained in the following section. ## Set the Data Source TTL¶ You can apply a TTL (Time To Live) to a Data Source in Tinybird. Use a TTL to define how long you want to store data. For example, you can define a TTL of 7 days, which means Tinybird automatically deletes data older than 7 days. You may set the TTL at the time of creating the Data Source, or set it later. Your data must have a column with a type that represents a date or datetime. Valid types are `Date` and `DateTime`. ### Using the UI ¶ Follow these steps to set a TTL using the Tinybird UI: 1. Select** Advanced Settings** . 2. Open the** TTL** menu. 3. Select a column that represents a date. 4. Define the TTL period in days. If you need to apply transformations to the date column, or want to use more complex logic, select the **Code editor** tab and enter SQL code to define your TTL. ### Using the CLI ¶ Follow these steps to set a TTL using the Tinybird CLI: 1. Create a new Data Source and .datasource file using the `tb datasource` command. 2. Edit the .datasource file you've created. 3. Go to the Engine settings. 4. Add a new setting called `ENGINE_TTL` and enter your TTL string enclosed in double quotes. 5. Save the file. The following example shows a .datasource file with TTL defined: SCHEMA > `date` DateTime, `product_id` String, `user_id` Int64, `event` String, `extra_data` String ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYear(date)" ENGINE_SORTING_KEY "date, user_id, event, extra_data" ENGINE_TTL "date + toIntervalDay(90)" ## Change Data Source TTL¶ You can modify the TTL of an existing Data Source, either by adding a new TTL or by updating an existing TTL. ### Using the UI ¶ Follow these steps to modify a TTL using the Tinybird UI: 1. Go to the Data Source details page by clicking on the Data Source with the TTL you wish to change. 2. Select the** Schema** tab. 3. Select the TTL text. 4. A dialog opens. Select the menu. 5. Select the field to use for the TTL. 6. Change the TTL interval. 7. Select** Save** . The updated TTL value appears in the Data Source's schema page. ### Using the CLI ¶ Follow these steps to modify a TTL using the Tinybird CLI: 1. Open the .datasource file. 2. Go to the Engine settings. 3. If `ENGINE_TTL` doesn't exist, add it and enter your TTL enclosed in double quotes. 4. If a TTL is already defined, modify the existing setting. The following is an example TTL setting: ENGINE_TTL "date + toIntervalDay(90)" When ready, save the .datasource file and push the changes to Tinybird using the CLI: tb push DATA_SOURCE_FILE -f ## Share a Data Source¶ Workspace administrators can share a Data Source with another Workspace they've access to on the same region and cluster. To share a Data Source, follow these steps: 1. Find the Data Source you want to share inside** Data Project** . 2. Select the** More actions (⋯)** icon next to Data Source. 3. Select** Share** . 4. Type the Workspace name or ID. 5. Select** Share** . You can use the shared Data Source to create Pipes and Materialized Views in the target Workspace. 
Users that have access to a shared Data Source can access the `tinybird.datasources_ops_log` and the `tinybird.kafka_ops_log` Service Data Sources. ### Limitations¶ The following limitations apply to shared Data Sources: - Shared Data Sources are read-only. - You can't share a shared Data Source, only the original. - You can't check the quarantine of a shared Data Source. ## Supported engines¶ Tinybird features different strategies to store data, which define where and how the data is stored and also what kind of data access, queries, and availability your data has. A Tinybird Data Source uses a table engine that determines those factors. See [Engines](https://www.tinybird.co/docs/docs/sql-reference/engines). ## Supported data types¶ Data types specify how Tinybird stores and processes values in a database. They determine what kind of data can fit in a column (like numbers, text, dates, etc.), how much storage space the data uses, and what operations you can perform on the values. Choosing the most appropriate data type is important for both data integrity and query performance. See [Data types](https://www.tinybird.co/docs/docs/sql-reference/data-types). ### Set a different codec¶ Tinybird applies compression codecs to data types to optimize performance. You can override the default compression codecs by adding the `CODEC()` statement after the type declarations in your .datasource schema. For example: SCHEMA > `product_id` Int32 `json:$.product_id`, `timestamp` DateTime64(3) `json:$.timestamp` CODEC(DoubleDelta, ZSTD(1)), ## Supported file types and compression formats for ingest¶ Tinybird supports these file types and compression formats at ingest time: | File type | Method | Accepted extensions | Compression formats supported | | --- | --- | --- | --- | | CSV | File upload, URL | `.csv` , `.csv.gz` | `gzip` | | NDJSON | File upload, URL, Events API | `.ndjson` , `.ndjson.gz` | `gzip` | | Parquet | File upload, URL | `.parquet` , `.parquet.gz` | `gzip` | | Avro | Kafka | | `gzip` | ## Quarantine Data Sources¶ Every Data Source you create in your Workspace has an associated quarantine Data Source that stores data that doesn't fit the schema. If you send rows that don't fit the Data Source schema, they're automatically sent to the quarantine table so that the ingest process doesn't fail. By convention, quarantine Data Sources follow the naming pattern `{datasource_name}_quarantine` . You can review quarantined rows at any time or perform operations on them using Pipes. This is a useful source of information when fixing issues in the origin source or applying changes during ingest. The quarantine Data Source schema contains the columns of the original row and the following columns with information about the issues that caused the quarantine: - `c__error_column` Array(String) contains an array of all the columns that contain an invalid value. - `c__error` Array(String) contains an array of all the errors that caused the ingestion to fail and led to the values being stored in quarantine. This column, along with `c__error_column` , lets you easily identify which columns have problems and what the errors are. - `c__import_id` Nullable(String) contains the job's identifier in case the row was imported through a job. - `insertion_date` (DateTime) contains the timestamp at which the ingestion happened. See the [Quarantine guide](https://www.tinybird.co/docs/docs/get-data-in/data-operations/recover-from-quarantine) for practical examples on using the quarantine Data Source.
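For example, to get a quick summary of the most frequent quarantine errors, you can unfold the `c__error` array and count occurrences through the Query API. A sketch, assuming a Data Source named `events` (so its quarantine Data Source is `events_quarantine`) and a placeholder Token: ##### Summarize quarantine errors via the Query API (sketch)
#!/bin/bash
TB_TOKEN="<your_token>"

# Count how many quarantined rows carry each distinct error message
curl -s \
  -H "Authorization: Bearer $TB_TOKEN" \
  -G "https://api.tinybird.co/v0/sql" \
  --data-urlencode "q=SELECT error, count() AS occurrences FROM events_quarantine ARRAY JOIN c__error AS error GROUP BY error ORDER BY occurrences DESC FORMAT JSON"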
## Partitioning¶ Use partitions for data manipulation. Partitioning isn't intended to speed up `SELECT` queries: experiment with more efficient sorting keys, as defined by `ENGINE_SORTING_KEY` , for that. A bad partition key, or creating too many partitions, can negatively impact query performance. Configure partitioning using the `ENGINE_PARTITION_KEY` setting. When choosing a partition key: - Leave the `ENGINE_PARTITION_KEY` key empty. If the table is small or you aren't sure what the best partition key should be, leave it empty: Tinybird places all data in a single partition. - Use a date column. Depending on the filter, you can opt for more or less granularity based on your needs. `toYYYYMM(date_column)` or `toYear(date_column)` are valid default choices. Don't use too granular a partition key, like a customer ID or name. This could lead to the `TOO_MANY_PARTS` error. ### Examples¶ The following examples show how to define partitions. ##### Using an empty tuple to create a single partition ENGINE_PARTITION_KEY "tuple()" ##### Using a Date column to create monthly partitions ENGINE_PARTITION_KEY "toYYYYMM(date_column)" ##### Using a column to partition by event types ENGINE_PARTITION_KEY "event_type % 8" ### TOO\_MANY\_PARTS error¶ Each insert operation creates a new part containing compressed data files and index files for each column. Tinybird merges smaller parts into bigger parts in the background based on specific rules. The goal is to maintain one large part, or a few large parts, per partition. The `TOO_MANY_PARTS` error happens when you insert data faster than Tinybird can merge the parts. Inserting data to many partitions at once multiplies the problem by the number of partitions. To prevent this error: - Batch your inserts into larger chunks instead of making many small inserts. - Limit the number of partitions you write to simultaneously . - Define a less granular partition key. If the error persists, contact [support](https://www.tinybird.co/docs/docs/get-started/plans/support). ## Upserts and deletes¶ See [this guide](https://www.tinybird.co/docs/docs/get-data-in/data-operations/replace-and-delete-data) . Depending on the frequency needed, you might want to convert upserts and deletes into an append operation that you can solve through [deduplication](https://www.tinybird.co/docs/docs/work-with-data/strategies/deduplication-strategies). ## Limits¶ There is a limit of 100 Data Sources per Workspace. --- URL: https://www.tinybird.co/docs/get-data-in/guides Content: --- title: "Ingest guides · Tinybird Docs" theme-color: "#171612" description: "Guides for ingesting data into Tinybird." --- # Ingest guides¶ Tinybird provides multiple ways to bring data into the platform. While [native connectors](https://www.tinybird.co/docs/docs/get-data-in/connectors) offer the most streamlined experience, you can use the [Ingest APIs](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis) and other mechanisms to bring data from virtually any source. Each guide provides step-by-step instructions and best practices for setting up reliable data ingestion pipelines. Whether you're working with batch files, streaming events, or database synchronization, you can find examples of how to effectively bring that data into Tinybird. 
- [ Auth0](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-auth0-logs) - [ AWS Kinesis](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-aws-kinesis) - [ Clerk](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-clerk) - [ CSV files](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-csv-files) - [ Dub](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-dub) - [ DynamoDB Single-Table](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-dynamodb-single-table-design) - [ Estuary](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-with-estuary) - [ GitHub](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-github) - [ GitLab](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-gitlab) - [ Google Cloud Storage](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-google-gcs) - [ Google Pub/Sub](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-google-pubsub) - [ HTTP Requests](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-the-events-api) - [ Knock](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-knock) - [ Mailgun](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-mailgun) - [ MongoDB](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-mongodb) - [ NDJSON data](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-ndjson-data) - [ Orb](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-orb) - [ PagerDuty](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-pagerduty) - [ Postgres CDC with Redpanda Connect](https://www.tinybird.co/docs/docs/get-data-in/guides/postgres-cdc-with-redpanda-connect) - [ PostgreSQL](https://www.tinybird.co/docs/docs/get-data-in/guides/postgresql) - [ Python logs](https://www.tinybird.co/docs/docs/get-data-in/guides/python-sdk) - [ Resend](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-resend) - [ RudderStack](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-rudderstack) - [ Sentry](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-sentry) - [ Snowflake](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-snowflake-via-unloading) - [ Stripe](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-stripe) - [ Vercel (log drains)](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-vercel-logdrains) - [ Vercel (webhooks)](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-vercel) --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-auth0-logs Last update: 2025-01-08T09:32:25.000Z Content: --- title: "Send Auth0 Log Streams to Tinybird · Tinybird Docs" theme-color: "#171612" description: "In this guide you'll learn how to send Auth0 Log Streams to Tinybird using webhooks and the Events API." --- # Send Auth0 Logs Streams to Tinybird¶ [Auth0](https://auth0.com/) is a developer-focused user management platform to handle user authentication with many prebuilt UI components. By integrating Auth0 with Tinybird, you can analyze your user authentication data in real time and enrich it with other data sources. Some common use cases for sending Auth0 logs to Tinybird include: 1. Tracking net user and organization growth. 2. Monitoring user churn. 3. Identifying common auth errors. 4. Creating custom dashboards for auth analysis. 5. User authentication audit logs. Read on to learn how to send data from Auth0 Logs Streams to Tinybird. 
## Before you start¶ Before you connect Auth0 Logs Streams to Tinybird, ensure: - You have an Auth0 account. - You have a Tinybird Workspace. ## Connect Auth0 to Tinybird¶ 1. From the Auth0 dashboard, select** Monitoring** >** Streams** . 2. Select** Create Stream** . 3. In Tinybird, create a Data Source, called `auth0` . You can follow this[ schema](https://github.com/tinybirdco/tinynest/blob/main/tinybird/datasources/auth0.datasource) : SCHEMA > `event_time` DateTime64(3) `json:$.tinybirdIngestTime` DEFAULT now(), `event_type` String `json:$.data.type` DEFAULT 'unknown', `event` JSON `json:$` DEFAULT '{}' ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(event_time)" ENGINE_SORTING_KEY "event_time" Using the [JSON Data Type](https://www.tinybird.co/docs/docs/sql-reference/data-types/json) you can store the semi-structured data you receive from Auth0 Logs Streams in a single column. You can later retrieve various events and their metadata as needed in your Pipes. The `JSON` data type is in private beta. If you are interested in using this type, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). 1. In Tinybird, copy a token with privileges to append to the Data Source you created. You can use the admin token or create one with the required scope. 2. Back in Auth0, paste the Events API URL in your Webhook Endpoint URL. Use the query parameter `name` to match the name of the Data Source you created in Tinybird. For example: https://api.tinybird.co/v0/events?name=auth0&token= Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace. Content Type is `application/json` and Content Format is `JSON Lines`. 1. Select the any event category to filter, like `All` , and a date in case you want to perform some backfilling. Then select** Save** . 2. You're done. Any of the Auth0 Log Streams events you selected is automatically sent to Tinybird through the[ Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) . You can check the status of the integration from the **Health** tab in the created webhook or from the **Log** tab in the Tinybird `auth0` Data Source. ## See also¶ - [ Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) - [ Auth0 Logs Streams](https://auth0.com/docs/customize/log-streams/custom-log-streams) --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-from-aws-kinesis Last update: 2025-01-17T07:58:01.000Z Content: --- title: "Stream from AWS Kinesis · Tinybird Docs" theme-color: "#171612" description: "In this guide, you'll learn how to send data from AWS Kinesis to Tinybird." --- # Stream from AWS Kinesis¶ In this guide, you'll learn how to send data from AWS Kinesis to Tinybird. If you have a [Kinesis Data Stream](https://aws.amazon.com/kinesis/data-streams/) that you want to send to Tinybird, it should be pretty quick thanks to [Kinesis Firehose](https://aws.amazon.com/kinesis/data-firehose/) . This page explains how to integrate Kinesis with Tinybird using Firehose. ## 1. 
Push messages from Kinesis to Tinybird¶ ### Create a Token with the right scope¶ In your Workspace, create a Token with the `Create new Data Sources or append data to existing ones` scope: <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fingest-from-aws-kinesis-1.png&w=3840&q=75) <-figcaption-> Create a Token with the right scope ### Create a new Data Stream¶ Start by creating a new Data Stream in AWS Kinesis (see the [AWS documentation](https://docs.aws.amazon.com/streams/latest/dev/working-with-streams.html) for more information). <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fingest-from-aws-kinesis-2.png&w=3840&q=75) <-figcaption-> Create a Kinesis Data Stream ### Create a Firehose Delivery Stream¶ Next, [create a Kinesis Data Firehose Delivery Stream](https://docs.aws.amazon.com/firehose/latest/dev/basic-create.html). Set the **Source** to **Amazon Kinesis Data Streams** and the **Destination** to **HTTP Endpoint**. In the **Destination Settings** , set **HTTP Endpoint URL** to point to the [Tinybird Events API](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-the-events-api). https://api.tinybird.co/v0/events?name=&wait=true&token= This example is for Workspaces in the `GCP` --> `europe-west3` region. If necessary, replace with the [correct region for your Workspace](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) . Additionally, note the `wait=true` parameter. Learn more about it [in the Events API docs](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-the-events-api#wait-for-acknowledgement). You don't need to create the Data Source in advance; it's created automatically for you. ### Send sample messages and check that they arrive in Tinybird¶ If you don't have an active data stream, use [this Python script](https://gist.github.com/GnzJgo/f1a80186a301cd8770a946d02343bafd) to generate dummy data. Back in Tinybird, you should see three columns filled with data in your Data Source. `timestamp` and `requestId` are self-explanatory, and your messages are in `records__data`: <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fingest-from-aws-kinesis-3.png&w=3840&q=75) <-figcaption-> Firehose Data Source ## 2. Decode message data¶ The `records__data` column contains an array of encoded messages. To get one row per element of the array, use the ARRAY JOIN clause. You'll also need to decode the messages with the base64Decode() function. Once the raw JSON is in a column, you can use [JSONExtract functions](https://www.tinybird.co/docs/docs/sql-reference/functions/json-functions) to extract the desired fields: ##### Decoding messages NODE decode_messages SQL > SELECT base64Decode(encoded_m) message, fromUnixTimestamp64Milli(timestamp) kinesis_ts FROM firehose ARRAY JOIN records__data as encoded_m NODE extract_message_fields SQL > SELECT kinesis_ts, toDateTime64(JSONExtractString(message, 'datetime'), 3) datetime, JSONExtractString(message, 'event') event, JSONExtractString(message, 'product') product FROM decode_messages <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fingest-from-aws-kinesis-4.png&w=3840&q=75) <-figcaption-> Decoding messages ## Recommended settings¶ When configuring AWS Kinesis as a Data Source, use the following settings: - Set `wait=true` when calling the Events API. See[ the Events API docs](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-the-events-api#wait-for-acknowledgement) for more information.
- Set the buffer size lower than 10 Mb in Kinesis. - Set 128 shards as the maximum in Kinesis. ## Performance optimizations¶ It is highly recommended to persist the decoded and unrolled result in a different Data Source. You can do it with a [Materialized View](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/materialized-views) : A combination of a Pipe and a Data Source that leaves the transformed data into the destination Data Source as soon as new data arrives to the Firehose Data Source. Don't store what you won't need. In this example, some of the extra columns could be skipped. [Add a TTL](https://www.tinybird.co/docs/docs/get-data-in/data-sources#setting-data-source-ttl) to the Firehose Data Source to prevent keeping more data than you need. Another alternative is to create the Firehose Data Source with a Null Engine. This way, data ingested there can be transformed and fill the destination Data Source without being persisted in the Data Source with the Null Engine. ## Next steps¶ - Ingest from other sources - see the[ Overview page](https://www.tinybird.co/docs/docs/get-data-in) and explore. - Build your first[ Tinybird Pipe](https://www.tinybird.co/docs/docs/work-with-data/query/pipes) . --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-from-clerk Last update: 2025-01-08T09:32:25.000Z Content: --- title: "Send Clerk webhooks to Tinybird · Tinybird Docs" theme-color: "#171612" description: "In this guide you'll learn how to send data from Clerk to Tinybird." --- # Send Clerk webhooks to Tinybird¶ [Clerk](https://clerk.com/) is a developer-focused user management platform to handle user authentication with many prebuilt UI components. By integrating Clerk with Tinybird, you can analyze your user authentication data in real time and enrich it with other data sources. Some common use cases for sending Clerk webhooks to Tinybird include: 1. Tracking net user and organization growth. 2. Monitoring user churn. 3. Identifying common auth errors. 4. Creating custom dashboards for auth analysis. 5. Enriching other data sources with real-time auth metrics. Read on to learn how to send data from Clerk to Tinybird. ## Before you start¶ Before you connect Clerk to Tinybird, ensure: - You have a Clerk account. - You have a Tinybird Workspace. ## Connect Clerk to Tinybird¶ 1. From the Clerk UI, select** Configure** >** Webhooks** . 2. Select** Add Endpoint** . 3. In Tinybird, create a Data Source, called `clerk` . You can follow this[ schema](https://github.com/tinybirdco/tinynest/blob/main/tinybird/datasources/clerk.datasource) : SCHEMA > `event_time` DateTime64(3) `json:$.tinybirdIngestTime` DEFAULT now(), `event_type` String `json:$.type` DEFAULT 'unknown', `event` JSON `json:$` DEFAULT '{}' ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(event_time)" ENGINE_SORTING_KEY "event_time" Using the [JSON Data Type](https://www.tinybird.co/docs/docs/sql-reference/data-types/json) you can store the semi-structured data you receive from Clerk in a single column. You can later retrieve various events and their metadata as needed in your Pipes. The `JSON` data type is in private beta. If you are interested in using this type, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). 1. Back in Clerk, paste the Events API URL in your Webhook Endpoint URL. 
Use the query parameter `name` to match the name of the Data Source you created in Tinybird, for example: https://api.tinybird.co/v0/events?name=clerk Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace. 1. Return to Tinybird, and copy a token with privileges to write to the Data Source you created. You can use the admin token or create one with the required scope. 2. Return to the Clerk Webhooks page, and update the URL to add a new search parameter `token` with the token you copied. The final URL looks like the following: https://api.tinybird.co/v0/events?name=clerk&token=p.eyXXXXX 1. Select the checkboxes for the Clerk events you want to send to Tinybird, and select** Create** . 2. You're done. Any of the Clerk events you selected is automatically sent to Tinybird through the[ Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) . You can test the integration from the** Testing** tab in the Clerk Webhooks UI. ## See also¶ - [ Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-from-csv-files Last update: 2025-01-17T07:58:01.000Z Content: --- title: "Ingest CSV files · Tinybird Docs" theme-color: "#171612" description: "In this guide, you'll learn how to ingest data into Tinybird using CSV (comma-separated values) files." --- # Ingest CSV files¶ CSV (comma-separated values) is one of the most widely used formats out there. However, it's used in different ways; some people don't use commas, and other people escape values differently, or are unsure about using headers. The Tinybird platform is smart enough to handle many scenarios. If your data doesn't comply with format and syntax best practices, Tinybird will still aim to understand your file and ingest it, but following certain best practices can speed up your CSV processing by up to 10x. ## Syntax best practices¶ By default, Tinybird processes your CSV file assuming the file follows the most common standard ( [RFC4180](https://datatracker.ietf.org/doc/html/rfc4180#section-2) ). Key points: - Separate values with commas. - Each record is a line (with CRLF as the line break). The last line may or may not have a line break. - First line as a header is optional (though not using one is faster in Tinybird). - Double quotes are optional but using them means you can escape values (for example, if your content has commas or line breaks). Example: Instead of using the backslash `\` as an escape character, like this: 1234567890,0,0,0,0,2021-01-01 10:00:00,"{\"authorId\":\"123456\",\"handle\":\"aaa\"}" Use two double quotes: ##### More performant 1234567890,0,0,0,0,2021-01-01 10:00:00,"{""authorId"":""123456"",""handle"":""aaa""}" - Fields containing line breaks, double quotes, and commas should be enclosed in double quotes. - Double quotes can also be escaped by using another double quote (""aaa"",""b""""bb"",""ccc"") In addition to the previous points, it's also recommended to: 1. Format `DateTime` columns as `YYYY-MM-DD HH:MM:SS` and `Date` columns as `YYYY-MM-DD` . 2. Send the encoding in the `charset` part of the `content-type` header, if it's different to UTF-8. The expectation is UTF-8, so it should look like this: `Content-Type: text/html; charset=utf-8` . 3. You can set values as `null` in different ways, for example, `""[]""` , `""""` (empty space), `N` and `"N"` . 4.
If you use a delimiter other than a comma, explicitly define it with the API parameter* ``dialect_delimiter``.* 5. If you use an escape character other than a ", explicitly define it with the API parameter* ``dialect_escapechar``.* 6. If you have no option but to use a different line break character, explicitly define it with the API parameter `dialect_new_line` . For more information, check the [Data Sources API docs](https://www.tinybird.co/docs/docs/api-reference/datasource-api). ## Append data¶ Once the Data Source schema has been created, you can optimize your performance by not including the header. Just keep the data in the same order. However, if the header is included and it contains all the names present in the Data Source schema the ingestion will still work (even if the columns follow a different order to the initial creation). ## Next steps¶ - Got your schema sorted and ready to make some queries? Understand[ how to work with time](https://www.tinybird.co/docs/docs/work-with-data/query/guides/working-with-time) . - Learn how to[ monitor your ingestion](https://www.tinybird.co/docs/docs/monitoring/monitor-data-ingestion) . --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-from-dub Last update: 2025-01-14T18:33:52.000Z Content: --- title: "Send Dub webhooks to Tinybird · Tinybird Docs" theme-color: "#171612" description: "In this guide you'll learn how to send data from Dub to Tinybird." --- # Send Dub webhooks to Tinybird¶ With [Dub](https://dub.co/) , you can shorten any link and get powerful [conversion analytics](https://dub.co/analytics) . By integrating Dub with Tinybird, you can analyze your events and usage data in real time. Some common use cases for sending Dub webhooks to Tinybird include: 1. Tracking link clicks. 2. Monitoring link performance. 3. Analyzing user engagement patterns. 4. Creating custom dashboards for link performance. 5. Enriching other data sources with real-time link metrics. Read on to learn how to send data from Dub to Tinybird. ## Before you start¶ Before you connect Dub to Tinybird, ensure: - You have a Dub account. - You have a Tinybird Workspace. ## Connect Dub to Tinybird¶ 1. Open the Dub UI and go to the** Settings** >** Webhooks** page. 2. Select** Create Webhook** . 3. In Tinybird, create a Data Source, called `dub` . You can follow this[ schema](https://github.com/tinybirdco/tinynest/blob/main/tinybird/datasources/dub.datasource) : SCHEMA > `event_time` DateTime `json:$.tinybirdIngestTime` DEFAULT now(), `event_type` String `json:$.event` DEFAULT 'unknown', `event` JSON(max_dynamic_types=2, max_dynamic_paths=16) `json:$` DEFAULT '{}' ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(event_time)" ENGINE_SORTING_KEY "event_time" Using the [JSON Data Type](https://www.tinybird.co/docs/docs/sql-reference/data-types/json) you can store the semi-structured data you receive from Dub in a single column. You can later retrieve various events and their metadata as needed in your Pipes. The `JSON` data type is in private beta. If you are interested in using this type, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). 1. In Tinybird, copy a token with privileges to append to the Data Source you created. You can use the admin token or create one with the required scope. 2. Back in Dub, paste the Events API URL as your webhook URL. 
Use the query parameter `name` to match the name of the Data Source you created in Tinybird. For example: https://api.tinybird.co/v0/events?name=dub&token= Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace. 1. Select the checkboxes for the Dub events you want to send to Tinybird, and select** Create webhook** . 2. You're done. Dub will now push events to Tinybird via the[ Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) . ## See also¶ - [ Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) . - [ Dub webhooks docs](https://dub.co/docs/integrations/webhooks) . --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-from-dynamodb-single-table-design Last update: 2025-01-07T15:48:10.000Z Content: --- title: "Working with DynamoDB Single-Table Design · Tinybird Docs" theme-color: "#171612" description: "In this guide, you'll learn how to work with data that follows DynamoDB Single-Table Design." --- # Working with DynamoDB Single-Table Design¶ Single-Table Design is a common pattern [recommended by AWS](https://aws.amazon.com/blogs/compute/creating-a-single-table-design-with-amazon-dynamodb/) in which different table schemas are stored in the same table. Single-table design makes it easier to support many-to-many relationships and avoid the need for JOINs, which DynamoDB doesn't support. Single-Table Design is a good pattern for DynamoDB, but it's not optimal for analytics. To achieve higher performance in Tinybird, normalize data from DynamoDB into multiple tables that support the access patterns of your analytical queries. The normalization process is achieved entirely within Tinybird by ingesting the raw DynamoDB data into a landing Data Source and then creating Materialized Views to extract items into separate tables. This guide assumes you're familiar with DynamoDB, Tinybird, creating DynamoDB Data Sources in Tinybird, and Materialized Views. ## Example DynamoDB Table¶ For example, if Tinybird metadata were stored in DynamoDB using Single-Table Design, the table might look like this: - ** Partition Key** : `Org#Org_name` , example values:** Org#AWS** or** Org#Tinybird** . - ** Sort Key** : `Item_type#Id` , example values:** USER#1** or** WS#2** . - ** Attributes** : the information stored for each kind of item, like user email or Workspace cores. ## Create the DynamoDB Data Source¶ Use the [DynamoDB Connector](https://www.tinybird.co/docs/docs/get-data-in/connectors/dynamodb) to ingest your DynamoDB table into a Data Source. Rather than defining all columns in this landing Data Source, set only the Partition Key (PK) and Sort Key (SK) columns. The rest of the attributes are stored in the `_record` column as JSON. You don't need to define the `_record` column in the schema, as it's created automatically. SCHEMA > `PK` String `json:$.Org#Org_name`, `SK` String `json:$.Item_type#Id` ENGINE "ReplacingMergeTree" ENGINE_SORTING_KEY "PK, SK" ENGINE_VER "_timestamp" ENGINE_IS_DELETED "_is_deleted" IMPORT_SERVICE 'dynamodb' IMPORT_CONNECTION_NAME IMPORT_TABLE_ARN IMPORT_EXPORT_BUCKET The following image shows how data looks. 
The DynamoDB Connector creates some additional columns, such as `_timestamp`, that aren't in the .datasource file: <-figure-> ![DynamoDB Table storing users and workspaces information](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-ddb-std-2.png&w=3840&q=75) <-figcaption-> DynamoDB Table storing users and workspaces information ## Use a Pipe to filter and extract items¶ Data is now available in your landing Data Source. However, you need to use the `JSONExtract` functions to access attributes from the `_record` column. To optimize performance, use [Materialized Views](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/materialized-views) to extract and store item types in separate Data Sources with their own schemas. Create a Pipe, use the PK and SK columns as needed to filter for a particular item type, and parse the attributes from the JSON in the `_record` column. The example table has User and Workspace items, requiring a total of two Materialized Views, one for each item type. <-figure-> ![Workspace Data Flow showing std connection, landing DS and users and workspaces Materialized Views](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-ddb-std-4.png&w=3840&q=75) <-figcaption-> Two Materialized Views from landing DS To extract the Workspace items, the Pipe uses the SK to filter for Workspace items and parses the attributes from the JSON in the `_record` column. For example: SELECT toLowCardinality(splitByChar('#', PK)[2]) org, toUInt32(splitByChar('#', SK)[2]) workspace_id, JSONExtractString(_record,'ws_name') ws_name, toUInt16(JSONExtractUInt(_record,'cores')) cores, JSONExtractUInt(_record,'storage_tb') storage_tb, _record, _old_record, _timestamp, _is_deleted FROM dynamodb_ds_std WHERE splitByChar('#', SK)[1] = 'WS' ## Create the Materialized Views¶ Create a Materialized View from the Pipe to store the extracted data in a new Data Source. The Materialized View must use the ReplacingMergeTree engine to handle the deduplication of rows, supporting updates and deletes from DynamoDB. Use the following engine settings and configure them as needed for your table: - `ENGINE "ReplacingMergeTree"` : the ReplacingMergeTree engine is used to deduplicate rows. - `ENGINE_SORTING_KEY "key1, key2"` : the columns used to identify unique items. This can be one or more columns, typically the parts of the PK and SK that don't identify the item type. - `ENGINE_VER "_timestamp"` : the column used to identify the most recent row for each key. - `ENGINE_IS_DELETED "_is_deleted"` : the column used to identify whether a row has been deleted. For example, the Materialized View for the Workspace items uses the following schema and engine settings: SCHEMA > `org` LowCardinality(String), `workspace_id` UInt32, `ws_name` String, `cores` UInt16, `storage_tb` UInt64, `_record` String, `_old_record` Nullable(String), `_timestamp` DateTime64(3), `_is_deleted` UInt8 ENGINE "ReplacingMergeTree" ENGINE_SORTING_KEY "org, workspace_id" ENGINE_VER "_timestamp" ENGINE_IS_DELETED "_is_deleted" Repeat the same process for each item type. <-figure-> ![Materialized View for extracting Users attributes](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-ddb-std-3.png&w=3840&q=75) <-figcaption-> Materialized View for extracting Users attributes Your Data Sources with the extracted columns are now ready to be queried. ## Review performance gains¶ This process offers significant performance gains over querying the landing Data Source.
To demonstrate this, you can use a Playground to compare the performance of querying the raw data vs the extracted data. For the example table, the following queries aggregate the total number of users, workspaces, cores, and storage per organization using the unoptimized raw data and the optimized extracted data. The query over raw data took 335 ms, while the query over the extracted data took 144 ms, for a 2.3x improvement. NODE users_stats SQL > SELECT org, count() total_users FROM ddb_users_mv FINAL GROUP BY org NODE ws_stats SQL > SELECT org, count() total_workspaces, sum(cores) total_cores, sum(storage_tb) total_storage_tb FROM ddb_workspaces_mv FINAL GROUP BY org NODE users_stats_raw SQL > SELECT toLowCardinality(splitByChar('#', PK)[2]) org, count() total_users FROM dynamodb_ds_std FINAL WHERE splitByChar('#', SK)[1] = 'USER' GROUP BY org NODE ws_stats_raw SQL > SELECT toLowCardinality(splitByChar('#', PK)[2]) org, count() total_ws, sum(toUInt16(JSONExtractUInt(_record,'cores'))) total_cores, sum(JSONExtractUInt(_record,'storage_tb')) total_storage_tb FROM dynamodb_ds_std FINAL WHERE splitByChar('#', SK)[1] = 'WS' GROUP BY org NODE org_stats SQL > SELECT * FROM users_stats JOIN ws_stats using org NODE org_stats_raw SQL > SELECT * FROM users_stats_raw JOIN ws_stats_raw using org This is how the outcome looks in Tinybird: <-figure-> ![Comparison of same query](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-ddb-std-5.png&w=3840&q=75) <-figcaption-> Same info, faster and more efficient from Materialized Views --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-from-github Last update: 2025-01-08T09:32:25.000Z Content: --- title: "Send GitHub Events to Tinybird · Tinybird Docs" theme-color: "#171612" description: "Learn how to send GitHub events to Tinybird using webhooks and the Events API." --- # Send GitHub events to Tinybird¶ [GitHub](https://github.com/) is a platform for building and deploying web applications. By integrating GitHub with Tinybird, you can analyze your GitHub events in real time and enrich it with other data sources. Some common use cases for sending GitHub events to Tinybird include: 1. Analyze GitHub issues and pull requests. 2. Analyze GitHub push events. 3. Analyze and monitor GitHub pipeline. 4. Analyze custom DORA metrics. All this allows you to build a more complete picture of your GitHub events and improve your DevOps processes. Read on to learn how to send events from GitHub to Tinybird. ## Before you start¶ Before you connect GitHub to Tinybird, ensure: - You have a GitHub account. - You have a Tinybird Workspace. ## Connect GitHub to Tinybird¶ GitHub provides a variety of webhooks (+70) that you can use to send events to Tinybird at organization, repository or application level. This guide covers the base case for sending GitHub events from a repository to Tinybird. 1. In GitHub, go to your repository** Settings** >** Webhooks** . 2. Select** Add webhook** . 3. Webhooks payloads vary depending on the event type. You can check here the list of[ GitHub events](https://docs.github.com/en/webhooks/webhook-events-and-payloads) . Select **Send me everything**. 1. In Tinybird, create a Data Source, called `github` . 
You can follow this[ schema](https://github.com/tinybirdco/tinynest/blob/main/tinybird/datasources/github.datasource) : SCHEMA > `event_time` DateTime `json:$.tinybirdIngestTime` DEFAULT now(), `event_type` String `json:$.type` DEFAULT 'unknown', `event` JSON `json:$` DEFAULT '{}' ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(event_time)" ENGINE_SORTING_KEY "event_time" Using the [JSON Data Type](https://www.tinybird.co/docs/docs/sql-reference/data-types/json) you can store the semi-structured data you receive from GitHub in a single column. You can later retrieve various events and their metadata as needed in your Pipes. The `JSON` data type is in private beta. If you are interested in using this type, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). 1. In Tinybird, copy a token with privileges to append to the Data Source you created. You can use the admin token or create one with the required scope. 2. Back in GitHub, paste the Events API URL in your Webhook URL. Use the query parameter `name` to match the name of the Data Source you created in Tinybird. https://api.tinybird.co/v0/events?name=github&token= Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace. 1. Select** application/json** as the content type. 2. You're done. Check the status of the integration from the `Recent deliveries` in the GitHub webhooks panel or from the **Log** tab in the Tinybird `github` Data Source. ## See also¶ - [ Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) - [ GitHub Webhook events and payloads](https://docs.github.com/en/webhooks/webhook-events-and-payloads) - [ GitHub Webhooks](https://docs.github.com/en/webhooks/using-webhooks/creating-webhooks) --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-from-gitlab Last update: 2025-01-08T09:32:25.000Z Content: --- title: "Send GitLab Events to Tinybird · Tinybird Docs" theme-color: "#171612" description: "Learn how to send GitLab events to Tinybird using webhooks and the Events API." --- # Send GitLab events to Tinybird¶ [GitLab](https://gitlab.com/) is a platform for building and deploying web applications. By integrating GitLab with Tinybird, you can analyze your GitLab events in real time and enrich it with other data sources. Some common use cases for sending GitLab events to Tinybird include: 1. Analyze GitLab issues and merge requests. 2. Analyze GitLab push events. 3. Analyze and monitor GitLab pipeline. 4. Analyze custom DORA metrics. All this allows you to build a more complete picture of your GitLab events and improve your DevOps processes. Read on to learn how to send events from GitLab to Tinybird. ## Before you start¶ Before you connect GitLab to Tinybird, ensure: - You have a GitLab account. - You have a Tinybird Workspace. ## Connect GitLab to Tinybird¶ 1. In GitLab, go to** Settings** >** Webhooks** . 2. Select** Add new webhook** . 3. Webhooks payloads vary depending on the event type. You can check here the list of[ GitLab events](https://docs.gitlab.com/ee/user/project/integrations/webhook_events.html) . Select **Issues Events**. 1. In Tinybird, create a Data Source, called `gitlab` . 
You can follow this[ schema](https://github.com/tinybirdco/tinynest/blob/main/tinybird/datasources/gitlab.datasource) : SCHEMA > `event_time` DateTime `json:$.tinybirdIngestTime` DEFAULT now(), `event_type` String `json:$.object_kind` DEFAULT 'unknown', `event` JSON `json:$` DEFAULT '{}' ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(event_time)" ENGINE_SORTING_KEY "event_time" Using the [JSON Data Type](https://www.tinybird.co/docs/docs/sql-reference/data-types/json) you can store the semi-structured data you receive from GitLab in a single column. You can later retrieve various events and their metadata as needed in your Pipes. The `JSON` data type is in private beta. If you are interested in using this type, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). 1. In Tinybird, copy a token with privileges to append to the Data Source you created. You can use the admin token or create one with the required scope. 2. Back in GitLab, paste the Events API URL in your Webhook URL. Use the query parameter `name` to match the name of the Data Source you created in Tinybird. https://api.tinybird.co/v0/events?name=gitlab Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace. 1. Select** Add custom header** and add 'Authorization' as** Header name** and paste the token you created in Tinybird as** Header value** . Bearer 1. You're done. You can select** Test** to check if the webhook is working. Check the status of the integration from the **Log** tab in the Tinybird `gitlab` Data Source. ## See also¶ - [ Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) - [ GitLab Webhooks](https://docs.gitlab.com/ee/user/project/integrations/webhook_events.html) - [ Tinybird Data Sources](https://github.com/tinybirdco/tinynest/blob/main/tinybird/datasources/) --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-from-google-gcs Last update: 2025-01-17T07:58:01.000Z Content: --- title: "Ingest from Google Cloud Storage · Tinybird Docs" theme-color: "#171612" description: "In this guide, you'll learn how to automatically synchronize all the CSV files in a Google GCS bucket to a Tinybird Data Source." --- # Ingest from Google Cloud Storage¶ In this guide, you'll learn how to automatically synchronize all the CSV files in a Google GCS bucket to a Tinybird Data Source. ## Prerequisites¶ This guide assumes you have familiarity with [Google GCS buckets](https://cloud.google.com/storage/docs/buckets) and the basics of [ingesting data into Tinybird](https://www.tinybird.co/docs/docs/get-data-in). ## Perform a one-off load¶ When building on Tinybird, people often want to load historical data that comes from another system (called 'seeding' or 'backfilling'). A very common pattern is exporting historical data by creating a dump of CSV files into a Google GCS bucket, then ingesting these CSV files into Tinybird. You can append these files to a Data Source in Tinybird using the Data Sources API. 
Let's assume you have a set of CSV files in your GCS bucket: ##### List of events files tinybird-assets/datasets/guides/events/events_0.csv tinybird-assets/datasets/guides/events/events_1.csv tinybird-assets/datasets/guides/events/events_10.csv tinybird-assets/datasets/guides/events/events_11.csv tinybird-assets/datasets/guides/events/events_12.csv tinybird-assets/datasets/guides/events/events_13.csv tinybird-assets/datasets/guides/events/events_14.csv tinybird-assets/datasets/guides/events/events_15.csv tinybird-assets/datasets/guides/events/events_16.csv tinybird-assets/datasets/guides/events/events_17.csv tinybird-assets/datasets/guides/events/events_18.csv tinybird-assets/datasets/guides/events/events_19.csv tinybird-assets/datasets/guides/events/events_2.csv tinybird-assets/datasets/guides/events/events_20.csv tinybird-assets/datasets/guides/events/events_21.csv tinybird-assets/datasets/guides/events/events_22.csv tinybird-assets/datasets/guides/events/events_23.csv tinybird-assets/datasets/guides/events/events_24.csv tinybird-assets/datasets/guides/events/events_25.csv tinybird-assets/datasets/guides/events/events_26.csv tinybird-assets/datasets/guides/events/events_27.csv tinybird-assets/datasets/guides/events/events_28.csv tinybird-assets/datasets/guides/events/events_29.csv tinybird-assets/datasets/guides/events/events_3.csv tinybird-assets/datasets/guides/events/events_30.csv tinybird-assets/datasets/guides/events/events_31.csv tinybird-assets/datasets/guides/events/events_32.csv tinybird-assets/datasets/guides/events/events_33.csv tinybird-assets/datasets/guides/events/events_34.csv tinybird-assets/datasets/guides/events/events_35.csv tinybird-assets/datasets/guides/events/events_36.csv tinybird-assets/datasets/guides/events/events_37.csv tinybird-assets/datasets/guides/events/events_38.csv tinybird-assets/datasets/guides/events/events_39.csv tinybird-assets/datasets/guides/events/events_4.csv tinybird-assets/datasets/guides/events/events_40.csv tinybird-assets/datasets/guides/events/events_41.csv tinybird-assets/datasets/guides/events/events_42.csv tinybird-assets/datasets/guides/events/events_43.csv tinybird-assets/datasets/guides/events/events_44.csv tinybird-assets/datasets/guides/events/events_45.csv tinybird-assets/datasets/guides/events/events_46.csv tinybird-assets/datasets/guides/events/events_47.csv tinybird-assets/datasets/guides/events/events_48.csv tinybird-assets/datasets/guides/events/events_49.csv tinybird-assets/datasets/guides/events/events_5.csv tinybird-assets/datasets/guides/events/events_6.csv tinybird-assets/datasets/guides/events/events_7.csv tinybird-assets/datasets/guides/events/events_8.csv tinybird-assets/datasets/guides/events/events_9.csv ### Ingest a single file¶ To ingest a single file, [generate a signed URL in GCP](https://cloud.google.com/storage/docs/access-control/signed-urls) , and send the URL to the Data Sources API using the `append` mode flag: ##### Example POST request with append mode flag curl -H "Authorization: Bearer " \ -X POST "https://api.tinybird.co/v0/datasources?name=&mode=append" \ --data-urlencode "url=" ### Ingest multiple files¶ If you want to ingest multiple files, you probably don't want to manually write each cURL. Instead, create a script to iterate over the files in the bucket and generate the cURL commands automatically. The following script example requires the [gsutil tool](https://cloud.google.com/storage/docs/gsutil) and assumes you have already created your Tinybird Data Source. 
You can use the `gsutil` tool to list the files in the bucket, extract the name of the CSV file, and create a signed URL. Then, generate a cURL to send the signed URL to Tinybird. To avoid hitting [API rate limits](https://www.tinybird.co/docs/docs/api-reference#limits) you should delay 15 seconds between each request. Here's an example script in bash: ##### Ingest CSV files from a Google Cloud Storage Bucket to Tinybird TB_HOST= TB_TOKEN= BUCKET=gs:// DESTINATION_DATA_SOURCE= GOOGLE_APPLICATION_CREDENTIALS= REGION= for url in $(gsutil ls $BUCKET | grep csv) do echo $url SIGNED=`gsutil signurl -r $REGION $GOOGLE_APPLICATION_CREDENTIALS $url | tail -n 1 | python3 -c "import sys; print(sys.stdin.read().split('\t')[-1])"` curl -H "Authorization: Bearer $TB_TOKEN" \ -X POST "$TB_HOST/v0/datasources?name=$DESTINATION_DATA_SOURCE&mode=append" \ --data-urlencode "url=$SIGNED" echo sleep 15 done The script uses the following variables: - `TB_HOST` as the corresponding URL for[ your region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) . - `TB_TOKEN` as a Tinybird[ Token](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens) with `DATASOURCE:CREATE` or `DATASOURCE:APPEND` scope. See the[ Tokens API](https://www.tinybird.co/docs/docs/api-reference/token-api) for more information. - `BUCKET` as the GCS URI of the bucket containing the events CSV files. - `DESTINATION_DATA_SOURCE` as the name of the Data Source in Tinybird, in this case `events` . - `GOOGLE_APPLICATION_CREDENTIALS` as the local path of a Google Cloud service account JSON file. - `REGION` as the Google Cloud region name. ## Automatically sync files with Google Cloud Functions¶ The previous scenario covered a one-off dump of CSV files in a bucket to Tinybird. A slightly more complex scenario is appending to a Tinybird Data Source each time a new CSV file is dropped into a GCS bucket, which can be done using Google Cloud Functions. That way you can have your ETL process exporting data from your Data Warehouse (such as Snowflake or BigQuery) or any other origin and you don't have to think about manually synchronizing those files to Tinybird. Imagine you have a GCS bucket named `gs://automatic-ingestion-poc/` and each time you put a CSV there you want to sync it automatically to an `events` Data Source previously created in Tinybird: 1. Clone this GitHub repository ( `gcs-cloud-function` ) . 2. Install and configure the `gcloud` command line tool. 3. Run `cp .env.yaml.sample .env.yaml` and set the `TB_HOST` , and `TB_TOKEN` variable 4. 
Run: ##### Syncing from GCS to Tinybird with Google Cloud Functions # set some environment variables before deploying PROJECT_NAME= SERVICE_ACCOUNT_NAME= BUCKET_NAME= REGION= TB_FUNCTION_NAME= # grant permissions to deploy the cloud function and read from storage to the service account gcloud projects add-iam-policy-binding $PROJECT_NAME --member serviceAccount:$SERVICE_ACCOUNT_NAME --role roles/storage.admin gcloud projects add-iam-policy-binding $PROJECT_NAME --member serviceAccount:$SERVICE_ACCOUNT_NAME --role roles/iam.serviceAccountTokenCreator gcloud projects add-iam-policy-binding $PROJECT_NAME --member serviceAccount:$SERVICE_ACCOUNT_NAME --role roles/editor # deploy the cloud function gcloud functions deploy $TB_FUNCTION_NAME \ --runtime python38 \ --trigger-resource $BUCKET_NAME \ --trigger-event google.storage.object.finalize \ --region $REGION \ --env-vars-file .env.yaml \ --service-account $SERVICE_ACCOUNT_NAME It deploys a Google Cloud Function with name `TB_FUNCTION_NAME` to your Google Cloud account, which listens for new files in the `BUCKET_NAME` provided (in this case `automatic-ingestion-poc` ), and automatically appends them to the Tinybird Data Source described by the `FILE_REGEXP` environment variable. <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fsyncing-data-from-s3-or-gcs-buckets-3.png&w=3840&q=75) <-figcaption-> Cloud function to sync a GCS bucket to Tinybird Now you can drop CSV files into the configured bucket: <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fsyncing-data-from-s3-or-gcs-buckets-4.gif&w=3840&q=75) <-figcaption-> Drop files to a GCS bucket and check the datasources_ops_log A recommended pattern is naming the CSV files in the format `datasourcename_YYYYMMDDHHMMSS.csv` so they are automatically appended to `datasourcename` in Tinybird. For instance, `events_20210125000000.csv` will be appended to the `events` Data Source. ## Next steps¶ - Got your schema sorted and ready to make some queries? Understand[ how to work with time](https://www.tinybird.co/docs/docs/work-with-data/query/guides/working-with-time) . - Learn how to[ monitor your ingestion](https://www.tinybird.co/docs/docs/monitoring/monitor-data-ingestion) . --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-from-google-pubsub Last update: 2025-01-17T07:58:01.000Z Content: --- title: "Ingest from Google Pub/Sub · Tinybird Docs" theme-color: "#171612" description: "In this guide you'll learn how to send data from Google Pub/Sub to Tinybird." --- # Stream from Google Pub/Sub¶ In this guide you'll learn how to send data from Google Pub/Sub to Tinybird. ## Overview¶ Tinybird is a Google Cloud partner & supports integrating with Google Cloud services. [Google Pub/Sub](https://cloud.google.com/pubsub) is often used as a messaging middleware that decouples event stream sources from the end destination. Pub/Sub streams are usually consumed by Google's DataFlow which can send events on to destinations such as BigQuery, BigTable, or Google Cloud Storage. This DataFlow pattern works with Tinybird too, however, Pub/Sub also has a feature called [Push subscriptions](https://cloud.google.com/pubsub/docs/push) which can forward messages directly from Pub/Sub to Tinybird. The following guide steps use the subscription approach. ## Push messages from Pub/Sub to Tinybird¶ ### 1. Create a Pub/Sub topic¶ Start by creating a topic in Google Pub/Sub following the [Google Pub/Sub documentation](https://cloud.google.com/pubsub/docs/admin#create_a_topic). ### 2. 
Create a push subscription¶ Next, [create a Push subscription in Pub/Sub](https://cloud.google.com/pubsub/docs/create-subscription#push_subscription). Set the **Delivery Type** to **Push**. In the **Endpoint URL** field, use the following snippet (which uses the [Tinybird Events API](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-the-events-api)) and pass your own Token, which you can find in your Workspace > Tokens: ##### Endpoint URL https://api.tinybird.co/v0/events?wait=true&name=&token= Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace. If you are sending single-line JSON payloads through Pub/Sub, tick the **Enable payload unwrapping** option to enable unwrapping. This means that the data isn't base64 encoded before it's sent to Tinybird. If you are sending any other format via Pub/Sub, leave this unchecked (you'll need to follow the decoding steps at the bottom of this guide). Set **Retry policy** to **Retry after exponential backoff delay**. Set the **Minimum backoff** to **1** and **Maximum backoff** to **60**. You don't need to create the Data Source in advance; it's created automatically for you. This snippet also includes the `wait=true` parameter, which is explained in the [Events API docs](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-the-events-api#wait-for-acknowledgement). ### 3. Send sample messages¶ Generate and send some sample messages to test your connection. If you don't have your own messages to test, use [this script](https://gist.github.com/alejandromav/dec8e092ef62d879e6821da06f6459c2). ### 4. Check the Data Source¶ Pub/Sub starts to push data to Tinybird. Check the Tinybird UI to see that the Data Source has been created and events are arriving. ### (Optional) Decode the payload¶ If you enabled the **Enable payload unwrapping** option, there is nothing else to do. However, if you aren't sending single-line JSON payloads (NDJSON, JSONL) through Pub/Sub, the data continues to be base64 encoded before it's sent to Tinybird. When the data arrives in Tinybird, you can decode it using the `base64Decode` function, like this: SELECT message_message_id as message_id, message_publish_time, base64Decode(message_data) as message_data FROM events_demo ## Next steps¶ - Explore other Google <> Tinybird integrations like [how to query Google Sheets with SQL](https://www.tinybird.co/blog-posts/query-google-sheets-with-sql-in-real-time). - Ready to start querying your data? Make sure you're familiar with [how to work with time](https://www.tinybird.co/docs/docs/work-with-data/query/guides/working-with-time). --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-from-knock Last update: 2025-01-08T09:32:25.000Z Content: --- title: "Send Knock Events to Tinybird · Tinybird Docs" theme-color: "#171612" description: "Learn how to send Knock events to Tinybird using webhooks and the Events API." --- # Send Knock events to Tinybird¶ [Knock](https://knock.app/) is a platform for notifications and alerts, and it provides a way to send events to Tinybird using webhooks. Some common use cases for sending Knock events to Tinybird include: 1. Monitor Knock message events. 2. Run analytical workflows based on Knock events. 3. Create custom dashboards based on Knock events. 4. Create alerts and notifications based on Knock events. 5. Join Knock message events with other Data Sources to enrich your user data.
Read on to learn how to send events from Knock to Tinybird. ## Before you start¶ Before you connect Knock to Tinybird, ensure: - You have a Knock account. - You have a Tinybird Workspace. ## Connect Knock to Tinybird¶ Knock provides a variety of [webhook event types](https://docs.knock.app/developer-tools/outbound-webhooks/event-types#message-events) that you can use to send events to Tinybird. This guide covers the base case for sending Knock Message events to Tinybird. 1. In Knock, go to your repository** Developers** >** Webhooks** . 2. Select** Create webhook** . 3. Webhooks payloads vary depending on the event type. You can check here the list of[ Knock events](https://docs.knock.app/developer-tools/outbound-webhooks/event-types#message-events) . For this guide, select events related to `message`. 1. In Tinybird, create a Data Source, called `knock` . You can follow this[ schema](https://github.com/tinybirdco/tinynest/blob/main/tinybird/datasources/knock.datasource) : SCHEMA > `event_time` DateTime `json:$.tinybirdIngestTime` DEFAULT now(), `event_type` String `json:$.type` DEFAULT 'unknown', `event` JSON `json:$` DEFAULT '{}' ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(event_time)" ENGINE_SORTING_KEY "event_time" Using the [JSON Data Type](https://www.tinybird.co/docs/docs/sql-reference/data-types/json) you can store the semi-structured data you receive from Knock in a single column. You can later retrieve various events and their metadata as needed in your Pipes. The `JSON` data type is in private beta. If you are interested in using this type, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). 1. In Tinybird, copy a token with privileges to append to the Data Source you created. You can use the admin token or create one with the required scope. 2. Back in Knock, paste the Events API URL in your Webhook Endpoint URL. Use the query parameter `name` to match the name of the Data Source you created in Tinybird. https://api.tinybird.co/v0/events?name=knock&token= Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace. 1. Select** Save webhook** . 2. You're done. Check the status of the integration from the `Logs` tab in the Knock webhook or from the **Log** tab in the Tinybird `knock` Data Source. ## See also¶ - [ Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) - [ Knock Webhooks](https://docs.knock.app/developer-tools/outbound-webhooks/overview) --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-from-mailgun Last update: 2025-01-08T09:32:25.000Z Content: --- title: "Send Mailgun Events to Tinybird · Tinybird Docs" theme-color: "#171612" description: "Learn how to send Mailgun events to Tinybird using webhooks and the Events API." --- # Send Mailgun events to Tinybird¶ [Mailgun](https://www.mailgun.com/) is a platform for sending email, and it provides a way to send events to Tinybird using webhooks. Read on to learn how to send events from Mailgun to Tinybird. ## Before you start¶ Before you connect Mailgun to Tinybird, ensure: - You have a Mailgun account. - You have a Tinybird Workspace. ## Connect Mailgun to Tinybird¶ Mailgun provides a variety of [webhook event types](https://mailgun-docs.redoc.ly/docs/mailgun/user-manual/events/#event-structure) that you can use to send events to Tinybird. 
This guide covers the base case for sending Mailgun events to Tinybird. 1. In Mailgun, go to **Send** > **Sending** > **Webhooks**. 2. Select **Domain** and **Add webhook**. 3. In Tinybird, create a Data Source called `mailgun`. You can follow this [schema](https://github.com/tinybirdco/tinynest/blob/main/tinybird/datasources/mailgun.datasource): SCHEMA > `event_time` DateTime `json:$.tinybirdIngestTime` DEFAULT now(), `event_type` String `json:$.event-data.event` DEFAULT 'unknown', `event` JSON `json:$` DEFAULT '{}' ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(event_time)" ENGINE_SORTING_KEY "event_time" Using the [JSON Data Type](https://www.tinybird.co/docs/docs/sql-reference/data-types/json) you can store the semi-structured data you receive from Mailgun in a single column. You can later retrieve various events and their metadata as needed in your Pipes. The `JSON` data type is in private beta. If you are interested in using this type, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). 1. In Tinybird, copy a token with privileges to append to the Data Source you created. You can use the admin token or create one with the required scope. 2. Back in Mailgun, paste the Events API URL in your Webhook Endpoint URL. Use the query parameter `name` to match the name of the Data Source you created in Tinybird. https://api.tinybird.co/v0/events?name=mailgun&format=json&token= Make sure to use the `format=json` query parameter. Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace. 1. Select **Event type** and choose the event you want to send to Tinybird. You can use the same Tinybird Data Source for multiple events. 2. Select **Create webhook** and you're done. Check the status of the integration from the **Log** tab in the Tinybird `mailgun` Data Source. ## See also¶ - [Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) - [Mailgun Events](https://mailgun-docs.redoc.ly/docs/mailgun/user-manual/events/#event-structure) --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-from-mongodb Last update: 2025-01-17T07:58:01.000Z Content: --- title: "Ingest data from MongoDB · Tinybird Docs" theme-color: "#171612" description: "In this guide, you'll learn how to ingest data into Tinybird from MongoDB." --- # Connect MongoDB to Tinybird¶ In this guide, you'll learn how to ingest data into Tinybird from MongoDB. You'll use: - MongoDB Atlas as the source MongoDB database. - Confluent Cloud's MongoDB Atlas Source connector to capture change events from MongoDB Atlas and push them to Kafka. - The Tinybird Confluent Cloud connector to ingest the data from Kafka. This guide uses Confluent Cloud as a managed Kafka service, and MongoDB Atlas as a managed MongoDB service. You can use any Kafka service and MongoDB instance, but the setup steps may vary. ## Prerequisites¶ This guide assumes you have: - An existing Tinybird account & Workspace - An existing Confluent Cloud account - An existing MongoDB Atlas account & collection ## 1. Create Confluent Cloud MongoDB Atlas Source¶ [Create a new MongoDB Atlas Source in Confluent Cloud](https://docs.confluent.io/cloud/current/connectors/cc-mongo-db-source.html#get-started-with-the-mongodb-atlas-source-connector-for-ccloud).
Use the following template to configure the Source: { "name": "", "config": { "name": "", "connection.host": "", "connection.user": "", "connection.password": "", "database": "", "collection": "", "cloud.provider": "", "cloud.environment": "", "kafka.region": "", "kafka.auth.mode": "KAFKA_API_KEY", "kafka.api.key": "", "kafka.api.secret": "", "kafka.endpoint": "", "topic.prefix": "", "errors.deadletterqueue.topic.name": "", "startup.mode": "copy_existing", "copy.existing": "true", "copy.existing.max.threads": "1", "copy.existing.queue.size": "16000", "poll.await.time.ms": "5000", "poll.max.batch.size": "1000", "heartbeat.interval.ms": "10000", "errors.tolerance": "all", "max.batch.size": "100", "connector.class": "MongoDbAtlasSource", "output.data.format": "JSON", "output.json.format": "SimplifiedJson", "json.output.decimal.format": "NUMERIC", "change.stream.full.document": "updateLookup", "change.stream.full.document.before.change": "whenAvailable", "tasks.max": "1" } } When the Source is created, you should see a new Kafka topic in your Confluent Cloud account. This topic will contain the change events from your MongoDB collection. ## 2. Create Tinybird Data Source (CLI)¶ Using the Tinybird CLI, create a new Kafka connection `tb connection create kafka` The CLI will prompt you to enter the connection details to your Kafka service. You'll also provide a name for the connection, which is used by Tinybird to reference the connection, and you'll need it below. Next, create a new file called `kafka_ds.datasource` (you can use any name you want, just use the .datasource extension). Add the following content to the file: SCHEMA > `_id` String `json:$.documentKey._id` DEFAULT JSONExtractString(__value, '_id._id'), `operation_type` LowCardinality(String) `json:$.operationType`, `database` LowCardinality(String) `json:$.ns.db`, `collection` LowCardinality(String) `json:$.ns.coll` ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(__timestamp)" ENGINE_SORTING_KEY "__timestamp, _id" KAFKA_CONNECTION_NAME '' KAFKA_TOPIC '' KAFKA_GROUP_ID '' KAFKA_AUTO_OFFSET_RESET 'earliest' KAFKA_STORE_RAW_VALUE 'True' KAFKA_STORE_HEADERS 'False' KAFKA_STORE_BINARY_HEADERS 'True' KAFKA_TARGET_PARTITIONS 'auto' KAFKA_KEY_AVRO_DESERIALIZATION '' Now push the Data Source to Tinybird using: tb push kafka_ds.datasource ## 3. Validate the Data Source¶ Go to the Tinybird UI and validate that a Data Source has been created. As changes occur in MongoDB, you should see the data being ingested into Tinybird. Note that this is an append log of all changes, so you will see multiple records for the same document as it's updated. ## 4. Deduplicate with ReplacingMergeTree¶ To deduplicate the data, you can use a `ReplacingMergeTree` engine on a Materialized View. This is explained in more detail in the [deduplication guide](https://www.tinybird.co/docs/docs/work-with-data/strategies/deduplication-strategies#use-the-replacingmergetree-engine). Tinybird creates a new Data Source using the ReplacingMergeTree engine to store the deduplicated data, and a Pipe to process the data from the original Data Source and write to the new Data Source. First, create a new Data Source to store the deduplicated data. 
Create a new file called `deduped_ds.datasource` and add the following content: SCHEMA > `fullDocument` String, `_id` String, `database` LowCardinality(String), `collection` LowCardinality(String), `k_timestamp` DateTime, `is_deleted` UInt8 ENGINE "ReplacingMergeTree" ENGINE_SORTING_KEY "_id" ENGINE_VER "k_timestamp" ENGINE_IS_DELETED "is_deleted" Now push the Data Source to Tinybird using: tb push deduped_ds.datasource Then, create a new file called `dedupe_mongo.pipe` and add the following content: NODE mv SQL > SELECT JSONExtractRaw(__value, 'fullDocument') as fullDocument, _id, database, collection, __timestamp as k_timestamp, if(operation_type = 'delete', 1, 0) as is_deleted FROM TYPE materialized DATASOURCE Now push the Pipe to Tinybird using: tb push dedupe_mongo.pipe As new data arrives via Kafka, it will be processed automatically through the Materialized View, writing it into the `ReplacingMergeTree` Data Source. Query this new Data Source to access the deduplicated data: SELECT * FROM deduped_ds FINAL --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-from-orb Last update: 2025-01-08T09:32:25.000Z Content: --- title: "Send Orb events to Tinybird · Tinybird Docs" theme-color: "#171612" description: "In this guide you'll learn how to send Orb events to Tinybird using webhooks and the Events API." --- # Send Orb events to Tinybird¶ [Orb](https://withorb.com/) is a developer-focused platform to manage your subscription billing and revenue operations. By integrating Orb with Tinybird, you can analyze your subscription billing data in real time and enrich it with other data sources. Some common use cases for sending Orb events to Tinybird include: 1. Tracking and monitoring subscriptions. 2. Monitoring user churn. 3. Creating custom dashboards for subscription analysis. 4. Subscriptions logs. Read on to learn how to send events from Orb to Tinybird. ## Before you start¶ Before you connect Orb to Tinybird, ensure: - You have an Orb account. - You have a Tinybird Workspace. ## Connect Orb to Tinybird¶ 1. From the Orb dashboard, select** Developers** >** Webhooks** . 2. Select** Add Endpoint** . 3. In Tinybird, create a Data Source, called `orb` . You can follow this[ schema](https://github.com/tinybirdco/tinynest/blob/main/tinybird/datasources/orb.datasource) : SCHEMA > `event_time` DateTime `json:$.tinybirdIngestTime` DEFAULT now(), `event_type` String `json:$.type` DEFAULT 'unknown', `event` JSON `json:$` DEFAULT '{}' ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(event_time)" ENGINE_SORTING_KEY "event_time" Using the [JSON Data Type](https://www.tinybird.co/docs/docs/sql-reference/data-types/json) you can store the semi-structured data you receive from Orb in a single column. You can later retrieve various events and their metadata as needed in your Pipes. The `JSON` data type is in private beta. If you are interested in using this type, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). 1. In Tinybird, copy a token with privileges to append to the Data Source you created. You can use the admin token or create one with the required scope. 2. Back in Orb, paste the Events API URL in your Webhook Endpoint URL. Use the query parameter `name` to match the name of the Data Source you created in Tinybird. 
For example: https://api.tinybird.co/v0/events?name=orb&token= Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace. 1. Select **Send test request** to test the connection and check that the data arrives in the `orb` Data Source in Tinybird. 2. You're done. Any Orb event is automatically sent to Tinybird through the [Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api). You can check the status of the integration by clicking on the Webhook endpoint in Orb or from the **Log** tab in the Tinybird `orb` Data Source. ## See also¶ - [Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) - [Orb webhooks](https://docs.withorb.com/guides/integrations-and-exports/webhooks) --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-from-pagerduty Last update: 2025-01-08T09:32:25.000Z Content: --- title: "Send PagerDuty events to Tinybird · Tinybird Docs" theme-color: "#171612" description: "In this guide you'll learn how to send PagerDuty events to Tinybird using webhooks and the Events API." --- # Send PagerDuty events to Tinybird¶ [PagerDuty](https://www.pagerduty.com/) is a platform for incident management and alerting. By integrating PagerDuty with Tinybird, you can analyze your incident data in real time and enrich it with other data sources. Some common use cases for sending PagerDuty events to Tinybird include: 1. Monitoring and alerting on incidents. 2. Creating custom dashboards for incident analysis. 3. Keeping incident logs. Read on to learn how to send events from PagerDuty to Tinybird. ## Before you start¶ Before you connect PagerDuty to Tinybird, ensure: - You have a PagerDuty account. - You have a Tinybird Workspace. ## Connect PagerDuty to Tinybird¶ 1. From the PagerDuty dashboard, select **Integrations** > **Developer Tools** > **Webhooks**. 2. Select **New Webhook**. 3. In Tinybird, create a Data Source called `pagerduty`. You can follow this [schema](https://github.com/tinybirdco/tinynest/blob/main/tinybird/datasources/pagerduty.datasource): SCHEMA > `event_time` DateTime `json:$.tinybirdIngestTime` DEFAULT now(), `event_type` String `json:$.event.event_type` DEFAULT 'unknown', `event` JSON `json:$` DEFAULT '{}' ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(event_time)" ENGINE_SORTING_KEY "event_time" Using the [JSON Data Type](https://www.tinybird.co/docs/docs/sql-reference/data-types/json) you can store the semi-structured data you receive from PagerDuty in a single column. You can later retrieve various events and their metadata as needed in your Pipes. The `JSON` data type is in private beta. If you are interested in using this type, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). 1. In Tinybird, copy a token with privileges to append to the Data Source you created. You can use the admin token or create one with the required scope. 2. Back in PagerDuty, paste the Events API URL in your Webhook URL. Use the query parameter `name` to match the name of the Data Source you created in Tinybird. For example: https://api.tinybird.co/v0/events?name=pagerduty Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace. 1.
Select **Add custom header**, add 'Authorization' as the **Name**, and paste the token you created in Tinybird as the **Value**. Bearer 1. Select all event subscriptions and select **Add webhook**. 2. You're done. The PagerDuty events you subscribed to are automatically sent to Tinybird through the [Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api). You can check the status of the integration by testing the Webhook integration in PagerDuty or from the **Log** tab in the Tinybird `pagerduty` Data Source. ## See also¶ - [Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) - [PagerDuty webhooks](https://support.pagerduty.com/main/docs/webhooks) - [PagerDuty webhook payload](https://developer.pagerduty.com/docs/webhooks-overview#webhook-payload) --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-from-resend Last update: 2025-01-13T13:37:38.000Z Content: --- title: "Send Resend webhooks to Tinybird · Tinybird Docs" theme-color: "#171612" description: "In this guide you'll learn how to send data from Resend to Tinybird." --- # Send Resend webhooks to Tinybird¶ With [Resend](https://resend.com/) you can send and receive emails programmatically. By integrating Resend with Tinybird, you can analyze your email data in real time. Some common use cases for sending Resend webhooks to Tinybird include: 1. Tracking email opens and clicks. 2. Monitoring delivery rates and bounces. 3. Analyzing user engagement patterns. 4. Creating custom dashboards for email performance. 5. Enriching other data sources with real-time email metrics. Read on to learn how to send data from Resend to Tinybird. ## Before you start¶ Before you connect Resend to Tinybird, ensure: - You have a Resend account. - You have a Tinybird Workspace. ## Connect Resend to Tinybird¶ 1. Open the Resend UI and go to the Webhooks page. 2. Select **Add Webhook**. 3. In Tinybird, create a Data Source called `resend`. You can follow this [schema](https://github.com/tinybirdco/tinynest/blob/main/tinybird/datasources/resend.datasource): SCHEMA > `event_time` DateTime `json:$.tinybirdIngestTime` DEFAULT now(), `event_type` String `json:$.type` DEFAULT 'unknown', `event` JSON `json:$` DEFAULT '{}' ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(event_time)" ENGINE_SORTING_KEY "event_time" Using the [JSON Data Type](https://www.tinybird.co/docs/docs/sql-reference/data-types/json) you can store the semi-structured data you receive from Resend in a single column. You can later retrieve various events and their metadata as needed in your Pipes. The `JSON` data type is in private beta. If you are interested in using this type, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). 1. In Tinybird, copy a token with privileges to append to the Data Source you created. You can use the admin token or create one with the required scope. 2. Back in Resend, paste the Events API URL in your Webhook URL. Use the query parameter `name` to match the name of the Data Source you created in Tinybird. For example: https://api.tinybird.co/v0/events?name=resend&token= Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace. 1. Select the checkboxes for the Resend events you want to send to Tinybird, and select **Add**. 2. You're done.
Sending emails to Resend will now push events to Tinybird via the[ Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) . ## See also¶ - [ Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) . - [ Resend event types](https://resend.com/docs/dashboard/webhooks/event-types) --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-from-rudderstack Last update: 2025-01-17T07:58:01.000Z Content: --- title: "Stream from RudderStack · Tinybird Docs" theme-color: "#171612" description: "In this guide, you'll learn two different methods to send events from RudderStack to Tinybird." --- # Stream from RudderStack¶ In this guide, you'll learn two different methods to send events from RudderStack to Tinybird. To better understand the behavior of their customers, companies need to unify timestamped data coming from a wide variety of products and platforms. Typical events to track would be 'sign up', 'login', 'page view' or 'item purchased'. A customer data platform can be used to capture complete customer data like this from wherever your customers interact with your brand. It defines events, collects them from different platforms and products, and routes them to where they need to be consumed. [RudderStack](https://www.rudderstack.com/) is an open-source customer data pipeline tool. It collects, processes and routes data from your websites, apps, cloud tools, and data warehouse. By using Tinybird's event ingestion endpoint for [high-frequency ingestion](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-the-events-api) as a Webhook in RudderStack, you can stream customer data in real time to Data Sources. ## Option 1: A separate Data Source for each event type¶ This is the preferred approach. It sends each type of event to a corresponding Data Source. This [2-minute video](https://www.youtube.com/watch?v=z3TkPvo5CRQ) shows you how to set up high-frequency ingestion through RudderStack using these steps. The advantages of this method are: - Your data is well organized from the start. - Different event types can have different attributes (columns in their Data Source). - Whenever new attributes are added to an event type you will be prompted to add new columns. - New event types will get a new Data Source. Start by generating a Token in the UI to allow RudderStack to write to Tinybird. ### Create a Tinybird Token¶ Go to the Workspace in Tinybird where you want to receive data and select "Tokens" in the side panel. Create a new Token by selecting "Create Token" (top right). Give your Token a descriptive name. In the section "DATA SOURCES SCOPES" mark the "Data Sources management" checkbox (Enabled) to give your Token permission to create Data Sources. Select "Save changes". ### Create a RudderStack Destination¶ In RudderStack, Select "Destinations" in the side panel and then "New destination" (top right). Select Webhook: 1. Give the destination a descriptive name. 2. Connect your source(s), you can test with the Rudderstack Sample HTTP Source. 3. Input the following Connection Settings: - Webhook URL:* < https://api.tinybird.co /v0/events>* - URL Method:* POST* - Headers Key:* Authorization* - Headers Value:* Bearer TINYBIRD_AUTH_TOKEN* <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fstreaming-via-rudderstack-1.png&w=3840&q=75) <-figcaption-> Webhook connection settings for high-frequency ingestion On the next page, select "Create new transformation". 
You can code a function in the box to apply to events when this transformation is active using the example snippet below (feel free to update it to suit your needs). In this function, you can dynamically append the target Data Source to the target URL of the Webhook. Give your transformation a descriptive name and a helpful description. ##### Transformation code export function transformEvent(event, metadata){ event.appendPath="?name=rudderstack_"+event.event.toLowerCase().replace(/[\s\.]/g, '_') return event; } This example snippet uses the prefix `*rudderstack\_*` followed by the name of the event in lower case, with its words separated by an underscore (for instance, a "Product purchased" event would go to a Data Source named `rudderstack_product_purchased` ). Save the transformation. Your destination has been created successfully! ### Test Ingestion¶ In Rudderstack, select Sources --> Rudderstack Sample HTTP --> Live events (top right) --> "Send test event" and paste the provided curl command into your terminal. The event will appear on the screen and be sent to Tinybird. If, after sending some events through RudderStack, you see that your Data Source in Tinybird exists but is empty (0 rows after sending a few events), you will need to authorize the Token that you created to **append** data to the Data Source. In the UI, navigate to "Tokens", select the Token you created, select "Data Sources management" --> "Add Data Source scope", and choose the name of the Data Source that you want to write to. Mark the "Append" checkbox and save the changes. ## Option 2: All events in the same Data Source¶ This alternative approach consists of sending all events into a single Data Source and then splitting them using Tinybird. By pre-configuring the Data Source, any events that RudderStack sends will be ingested with the JSON object in full as a String in a single column. This is very useful when you have complex JSON objects as explained in the [ingesting NDJSON docs](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-ndjson-data#jsonpaths) but be aware that using JSONExtract to parse data from the JSON object after ingestion has an impact on performance. New columns from parsing the data will be detected and you will be asked if you want to save them. You can adjust the inferred data types before saving any new columns. Pipes can be used to filter the Data Source by different events. The following example assumes you have already installed the Tinybird CLI. If you're not familiar with how to use or install it, [read the CLI docs](https://www.tinybird.co/docs/docs/cli/install). ### Pre-configure a Data Source¶ Authenticate to your Workspace by typing **tb auth** and entering your Token for the Workspace into which you want to ingest data from RudderStack. Create a new file in your local Workspace, named `rudderstack_events.datasource` , for example, to configure the empty Data Source. ##### Data Source schema SCHEMA > 'value' String 'json:$' ENGINE "MergeTree" ENGINE_SORTING_KEY "value" Push the file to your Workspace using `tb push rudderstack_events.datasource`. Note that this pre-configured Data Source is only required if you need a column containing the JSON object in full as a String. Otherwise, just skip this step and let Tinybird infer the columns and data types when you send the first event. You will then be able to select which columns you wish to save and adjust their data types. Create the Token as in method 1. 
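Before moving on to the Token setup, here's a minimal sketch of the kind of Pipe node you could later use to split events back out of the single `rudderstack_events` Data Source. The event name and the `event`, `anonymousId`, and `timestamp` JSON fields are assumptions based on a typical RudderStack track payload, not part of this guide; adjust them to match your own events. ##### Example Pipe node to filter one event type (sketch)
SELECT
    -- every attribute has to be parsed out of the raw JSON stored in the `value` column
    JSONExtractString(value, 'event') AS event,
    JSONExtractString(value, 'anonymousId') AS anonymous_id,
    parseDateTimeBestEffort(JSONExtractString(value, 'timestamp')) AS event_timestamp
FROM rudderstack_events
WHERE JSONExtractString(value, 'event') = 'Product purchased'
As noted above, parsing with `JSONExtract` functions at query time has a performance cost, which is one reason Option 1 (a separate Data Source for each event type) is the preferred approach.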
### Create a Tinybird Token¶ Go to the Workspace in Tinybird where you want to receive data and select "Tokens" in the side panel. Create a new Token by selecting "Create Token" (top right). Give your Token a descriptive name. In the section "DATA SOURCES SCOPES", select "Add Data Source scope", choose the name of the Data Source that you just created, and mark the "Append" checkbox. Select "Save changes". ### Create a RudderStack Destination¶ In RudderStack, Select "Destinations" in the side panel and then "New destination" (top right). Select Webhook: 1. Give the destination a descriptive name. 2. Connect your source(s), you can test with the Rudderstack Sample HTTP Source. 3. Input the following Connection Settings: - Webhook URL:* < https://api.tinybird.co /v0/events?name=rudderstack_events>* - URL Method:* POST* - Headers Key:* Authorization* - Headers Value:* Bearer TINYBIRD_AUTH_TOKEN* <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fstreaming-via-rudderstack-3.png&w=3840&q=75) <-figcaption-> Webhook connection settings with Data Source name Select 'No transformation needed' and save. Your destination has been created successfully! ### Test Ingestion¶ Select Sources --> Rudderstack Sample HTTP --> "Live events" (top right) --> "Send test event" and paste the provided curl command into your terminal. The event will appear on the screen and be sent to Tinybird. The `value` column contains the full JSON object. You will also have the option of having the data parsed into columns. When viewing the new columns you can select which ones to save and adjust their data types. <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fstreaming-via-rudderstack-4.png&w=3840&q=75) <-figcaption-> New columns detected not in schema Whenever new columns are detected in the stream of events you will be asked if you want to save them. ## Next steps¶ - Need to[ iterate a Data Source, including the schema](https://www.tinybird.co/docs/docs/get-data-in/data-operations/iterate-a-data-source) ? Read how here. - Want to schedule your data ingestion? Read the docs on[ cron and GitHub Actions](https://www.tinybird.co/docs/docs/get-data-in/data-operations/scheduling-with-github-actions-and-cron) . --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-from-sentry Last update: 2025-01-08T09:32:25.000Z Content: --- title: "Send Sentry Webhooks to Tinybird · Tinybird Docs" theme-color: "#171612" description: "Learn how to send Sentry events to Tinybird using webhooks and the Events API." --- # Send Sentry events to Tinybird¶ [Sentry](https://sentry.io/) is a platform for monitoring and alerting on errors in your applications. By integrating Sentry with Tinybird, you can analyze your Sentry events in real time and enrich it with other data sources. Some common use cases for sending Sentry events to Tinybird include: 1. Analyze errors from your applications. 2. Detect patterns in your error data. 3. Build an alert system based on error patterns. 4. Build custom analytical dashboards. Read on to learn how to send logs from Sentry to Tinybird. ## Before you start¶ Before you connect Sentry to Tinybird, ensure: - You have a Sentry account. - You have a Tinybird Workspace. ## Connect Sentry to Tinybird¶ 1. In Sentry, go to** Settings** >** Developer Settings** >** Custom Integrations** . 2. Select** Create New Integration** . 3. In Tinybird, create a Data Source, called `sentry` . 
You can follow this[ schema](https://github.com/tinybirdco/tinynest/blob/main/tinybird/datasources/sentry.datasource) : SCHEMA > `event_time` DateTime `json:$.tinybirdIngestTime` DEFAULT now(), `event_type` String `json:$.action` DEFAULT 'unknown', `event` JSON `json:$` DEFAULT '{}' ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(event_time)" ENGINE_SORTING_KEY "event_time" Using the [JSON Data Type](https://www.tinybird.co/docs/docs/sql-reference/data-types/json) you can store the semi-structured data you receive from Sentry in a single column. You can later retrieve various events and their metadata as needed in your Pipes. The `JSON` data type is in private beta. If you are interested in using this type, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). 1. In Tinybird, copy a token with privileges to append to the Data Source you created. You can use the admin token or create one with the required scope. 2. Back in Sentry, paste the Events API URL in your Custom Integration. Use the query parameter `name` to match the name of the Data Source you created in Tinybird. https://api.tinybird.co/v0/events?name=sentry&token= Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace. 1. Select** Alert Rule Action** . 2. In the** Permissions** box** Issue and Event** >** Read** . 3. Check all webhooks and** Save Changes** . 4. If you also want to send alerts to Tinybird, select** Alerts** from the left menu, click on an alert and select** Edit Rule** . You can select** Send Notifications via** your previously created Custom Integration. 5. You can then select** Send Test Notification** to check the connection. 6. You're done. Any of the Sentry events you selected are automatically sent to Tinybird through the[ Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) . Check the status of the integration from the **Log** tab in the Tinybird `sentry` Data Source. ## See also¶ - [ Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) - [ Sentry Webhooks](https://docs.sentry.io/organization/integrations/integration-platform/webhooks/) --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-from-snowflake-via-unloading Last update: 2025-01-17T07:58:01.000Z Content: --- title: "Ingest from Snowflake via unloading · Tinybird Docs" theme-color: "#171612" description: "In this guide you'll learn how to send data from Snowflake to Tinybird via unloading." --- # Ingest from Snowflake via unloading¶ In this guide you'll learn how to send data from Snowflake to Tinybird, for scenarios where the [native connector](https://www.tinybird.co/docs/docs/get-data-in/connectors/snowflake) can't be used —things outside a one-off load or periodical full replaces of the table, or where [limits](https://www.tinybird.co/docs/docs/get-data-in/connectors/snowflake#limits) apply—. This process relies on [unloading](https://docs.snowflake.com/en/user-guide/data-unload-overview) (aka bulk export) data as gzipped CSVs and then ingesting via [Data Sources API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/datasource-api). 
This guide explains the process using Azure Blob Storage, but it's easy to replicate using Amazon S3, Google Cloud Storage, or any storage service where you can unload data from Snowflake and share presigned URLs to access the files. This guide is a walkthrough of the most common, basic process: Unload the table from Snowflake, then ingest this export into Tinybird.

## Prerequisites¶

This guide assumes you have a Tinybird account and are familiar with creating a Tinybird Workspace and pushing resources to it. You will also need access to Snowflake, and permissions to create SAS Tokens for Azure Blob Storage or their equivalents in AWS S3 and Google Cloud Storage.

## 1. Unload the Snowflake table¶

Snowflake makes it easy to [unload](https://docs.snowflake.com/en/user-guide/data-unload-overview) query results as flat files to an external storage service.

COPY INTO 'azure://myaccount.blob.core.windows.net/unload/'
FROM mytable
CREDENTIALS = ( AZURE_SAS_TOKEN='****' )
FILE_FORMAT = ( TYPE = CSV COMPRESSION = GZIP )
HEADER = FALSE;

The most basic implementation is [unloading directly](https://docs.snowflake.com/en/sql-reference/sql/copy-into-location#unloading-data-from-a-table-directly-to-files-in-an-external-location), but for production use cases consider adding a [named stage](https://docs.snowflake.com/en/user-guide/data-unload-azure#unloading-data-into-an-external-stage) as suggested in the docs. Stages give you more fine-grained control over access rights.

## 2. Create a SAS token for the file¶

Using the [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli), generate a [shared access signature (SAS) token](https://learn.microsoft.com/en-us/azure/ai-services/translator/document-translation/how-to-guides/create-sas-tokens?tabs=blobs) so Tinybird can read the file:

az storage blob generate-sas \
  --account-name myaccount \
  --account-key '****' \
  --container-name unload \
  --name data.csv.gz \
  --permissions r \
  --expiry  \
  --https-only \
  --output tsv \
  --full-uri

> 'https://myaccount.blob.core.windows.net/unload/data.csv.gz?se=2024-05-31T10%3A57%3A41Z&sp=r&spr=https&sv=2022-11-02&sr=b&sig=PMC%2E9ZvOFtKATczsBQgFSsH1%2BNkuJvO9dDPkTpxXH0g%5D'

Follow the equivalent process in S3 and GCS to generate presigned URLs.

## 3. Ingest into Tinybird¶

Take that generated URL and make a call to Tinybird. You'll need a [Token](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens#tokens) with `DATASOURCES:CREATE` permissions:

curl \
  -H "Authorization: Bearer " \
  -X POST "https://api.tinybird.co/v0/datasources?name=my_datasource_name" \
  -d url='https://myaccount.blob.core.windows.net/unload/data.csv.gz?se=2024-05-31T10%3A57%3A41Z&sp=r&spr=https&sv=2022-11-02&sr=b&sig=PMC%2E9ZvOFtKATczsBQgFSsH1%2BNkuJvO9dDPkTpxXH0g%5D'

You should now have your Snowflake table in Tinybird.

## Automation¶

To adapt this to more real-life scenarios, such as appending data on a schedule or replacing data that has been updated in Snowflake, you may need to define scheduled actions to move the data. You can see examples in the [Ingest from Google Cloud Storage guide](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-google-gcs#automatically-sync-files-with-google-cloud-functions) and in the [Schedule data ingestion with cron and GitHub Actions guide](https://www.tinybird.co/docs/docs/get-data-in/data-operations/scheduling-with-github-actions-and-cron).
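As a starting point for such automation, the ingest call from step 3 can be scripted. The following is a minimal sketch in Python, assuming you already have a presigned (SAS) URL for the unloaded file, a Token with `DATASOURCES:CREATE` scope, and a target Data Source named `my_datasource_name`; `mode=replace` swaps the contents on each run:

```python
import requests

TB_TOKEN = "TINYBIRD_AUTH_TOKEN"
TB_HOST = "https://api.tinybird.co"  # Use the base URL for your region
presigned_url = "https://myaccount.blob.core.windows.net/unload/data.csv.gz?..."  # SAS URL from step 2

# Replace the Data Source contents with the latest unloaded file
response = requests.post(
    f"{TB_HOST}/v0/datasources",
    params={"name": "my_datasource_name", "mode": "replace"},
    headers={"Authorization": f"Bearer {TB_TOKEN}"},
    data={"url": presigned_url},
)
response.raise_for_status()
print(response.json())  # Contains the import job details; poll the Jobs API if needed
```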
## Limits¶ You will be using Data Sources API, so its [limits](https://www.tinybird.co/docs/docs/api-reference#limits) apply: | Description | Limit | | --- | --- | | Append/Replace data to Data Source | 5 times per minute | | Max file size (uncompressed) | Free plan 10GB | | Max file size (uncompressed) | pro and enterprise 32GB | As a result of these limits, you may need to adjust your [COPY INTO ](https://docs.snowflake.com/en/sql-reference/sql/copy-into-location) expression adding `PARTITION` or `MAX_FILE_SIZE = 5000000000`. COPY INTO 'azure://myaccount.blob.core.windows.net/unload/' FROM mytable CREDENTIALS=( AZURE_SAS_TOKEN='****') FILE_FORMAT = ( TYPE = CSV COMPRESSION = GZIP ) HEADER = FALSE MAX_FILE_SIZE = 5000000000; ## Next steps¶ These resources may be useful: - [ Tinybird Snowflake Connector](https://www.tinybird.co/docs/docs/get-data-in/connectors/snowflake) - [ Tinybird S3 Connector](https://www.tinybird.co/docs/docs/get-data-in/connectors/s3) - [ Guide: Ingest from Google Cloud Storage](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-google-gcs) --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-from-stripe Last update: 2025-01-08T09:32:25.000Z Content: --- title: "Send Stripe Events to Tinybird · Tinybird Docs" theme-color: "#171612" description: "Learn how to send Stripe events to Tinybird using webhooks and the Events API." --- # Send Stripe events to Tinybird¶ [Stripe](https://stripe.com/) is a platform for payments and financial services, and it provides a way to send events to Tinybird using webhooks. Some common use cases for sending Stripe events to Tinybird include: 1. Monitor Stripe events. 2. Run analytical workflows based on Stripe events. 3. Create custom dashboards based on Stripe events. 4. Create alerts and notifications based on Stripe events. 5. Join Stripe events with other Data Sources to enrich your user data. Read on to learn how to send events from Stripe to Tinybird. ## Before you start¶ Before you connect Stripe to Tinybird, ensure: - You have a Stripe account. - You have a Tinybird Workspace. ## Connect Stripe to Tinybird¶ Stripe provides a variety of [webhook event types](https://docs.stripe.com/api/events/object) that you can use to send events to Tinybird. This guide covers the base case for sending Stripe events to Tinybird. 1. In Stripe, go to[ Webhooks](https://dashboard.stripe.com/webhooks) 2. Select** Add endpoint** . 3. In Tinybird, create a Data Source, called `stripe` . You can follow this[ schema](https://github.com/tinybirdco/tinynest/blob/main/tinybird/datasources/stripe.datasource) : SCHEMA > `event_time` DateTime `json:$.tinybirdIngestTime` DEFAULT now(), `event_type` String `json:$.type` DEFAULT 'unknown', `event` JSON `json:$` DEFAULT '{}' ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(event_time)" ENGINE_SORTING_KEY "event_time" Using the [JSON Data Type](https://www.tinybird.co/docs/docs/sql-reference/data-types/json) you can store the semi-structured data you receive from Stripe in a single column. You can later retrieve various events and their metadata as needed in your Pipes. The `JSON` data type is in private beta. If you are interested in using this type, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). 1. In Tinybird, copy a token with privileges to append to the Data Source you created. You can use the admin token or create one with the required scope. 2. 
Back in Stripe, paste the Events API URL in your Webhook Endpoint URL. Use the query parameter `name` to match the name of the Data Source you created in Tinybird. https://api.tinybird.co/v0/events?name=stripe&format=json&token= Make sure to use the `format=json` query parameter. Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace. 1. Select **Select events** and choose the events you want to send to Tinybird. 2. Save. You're done. Check the status of the integration by selecting the webhook in Stripe or from the **Log** tab in the Tinybird `stripe` Data Source. ## See also¶ - [ Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) - [ Stripe Webhooks](https://docs.stripe.com/webhooks) - [ Stripe Events](https://docs.stripe.com/api/events/object) --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-from-the-events-api Last update: 2025-01-17T07:58:01.000Z Content: --- title: "Stream with HTTP Requests · Tinybird Docs" theme-color: "#171612" description: "In this guide, you'll learn how to use the Tinybird Events API to ingest thousands of JSON messages per second." --- # Stream with HTTP Requests¶

In this guide, you'll learn how to use the Tinybird Events API to ingest thousands of JSON messages per second with HTTP Requests. For more details about the Events API endpoint, read the [Events API](https://www.tinybird.co/docs/docs/api-reference/events-api) docs.

## Setup: Create the target Data Source¶

First, create an NDJSON Data Source. You can use the [API](https://www.tinybird.co/docs/docs/api-reference/datasource-api), or simply drag and drop a file onto the UI. Even though you can add new columns later on, you have to upload an initial file. The Data Source will be created and ordered based on those initial values. As an example, upload this NDJSON file:

{"date": "2020-04-05 00:05:38", "city": "New York"}

## Ingest from the browser: JavaScript¶

Ingesting from the browser requires making a standard POST request; see below for an example. Input your own Token and change the name of the target Data Source to the one you created. Check that your URL ( `const url` ) is the corresponding [URL for your region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints).

##### Browser High-Frequency Ingest

async function sendEvents(events) {
    const date = new Date();
    events.forEach(ev => { ev.date = date.toISOString() });

    const headers = {
        'Authorization': 'Bearer TOKEN_HERE',
    };
    const url = 'https://api.tinybird.co/' // you may be on a different host
    const rawResponse = await fetch(`${url}v0/events?name=hfi_multiple_events_js`, {
        method: 'POST',
        body: events.map(JSON.stringify).join('\n'),
        headers: headers,
    });
    const content = await rawResponse.json();
    console.log(content);
}

sendEvents([
    { 'city': 'Jamaica', 'action': 'view'},
    { 'city': 'Jamaica', 'action': 'click'},
]);

Remember: Publishing your Admin Token on a public website is a security vulnerability. It is **highly recommended** that you [create a new Token](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens#create-a-token) with a more restricted scope.

## Ingest from the backend: Python¶

Ingesting from the backend is a similar process to ingesting from the browser. Use the following Python snippet and replace the Auth Token and Data Source name, as in the example above.
##### Python High-Frequency Ingest import requests import json import datetime def send_events(events): params = { 'name': 'hfi_multiple_events_py', 'token': 'TOKEN_HERE', } for ev in events: ev['date'] = datetime.datetime.now().isoformat() data = '\n'.join([json.dumps(ev) for ev in events]) r = requests.post('https://api.tinybird.co/v0/events', params=params, data=data) print(r.status_code) print(r.text) send_events([ {'city': 'Pretoria', 'action': 'view'}, {'city': 'Pretoria', 'action': 'click'}, ]) ## Ingest from the command line: curl¶ The following curl snippet sends two events in the same request: ##### curl High-Frequency Ingest curl -i -d $'{"date": "2020-04-05 00:05:38", "city": "Chicago"}\n{"date": "2020-04-05 00:07:22", "city": "Madrid"}\n' -H "Authorization: Bearer $TOKEN" 'https://api.tinybird.co/v0/events?name=hfi_test' ## Add new columns from the UI¶ As you add extra information in the form of new JSON fields, the UI will prompt you to include those new columns on the Data Source. For instance, if you send a new event with an extra field: ##### curl High-Frequency Ingest curl -i -d '{"date": "2020-04-05 00:05:38", "city": "Chicago", "country": "US"}' -H "Authorization: Bearer $TOKEN" 'https://api.tinybird.co/v0/events?name=hfi_test' And navigate to the UI's Data Source screen, you'll be asked if you want to add the new column: <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fhigh-frequency-ingestion-1.png&w=3840&q=75) Here, you'll be able to select the desired columns and adjust the types: <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fhigh-frequency-ingestion-2.png&w=3840&q=75) After you confirm the addition of the column, it will be populated by new events: <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fhigh-frequency-ingestion-3.png&w=3840&q=75) ## Error handling and retries¶ Read more about the possible [responses returned by the Events API](https://www.tinybird.co/docs/docs/api-reference/events-api). When using the Events API to send data to Tinybird, you can choose to 'fire and forget' by sending a POST request and ignoring the response. This is a common choice for non-critical data, such as tracking page hits if you're building Web Analytics, where some level of loss is acceptable. However, if you're sending data where you can't tolerate events being missed, you must implement some error handling & retry logic in your application. ### Wait for acknowledgement¶ When you send data to the Events API, you'll usually receive a `HTTP202` response, which indicates that the request was successful. However, it's important to note that this response only indicates that the Events API successfully **accepted** your HTTP request. It doesn't confirm that the data has been **committed** into the underlying database. Using the `wait` parameter with your request will ask the Events API to wait for acknowledgement that the data you sent has been committed into the underlying database. If you use the `wait` parameter, you will receive a `HTTP200` response that confirms data has been committed. To use this, your Events API request should include `wait` as a query parameter, with a value of `true`. For example: https://api.tinybird.co/v0/events?wait=true It is good practice to log your requests to, and responses from, the Events API. This will help give you visibility into any failures for reporting or recovery. ### When to retry¶ Failures are indicated by a `HTTP4xx` or `HTTP5xx` response. 
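The following sections describe when and how to retry. As a reference for that guidance, here is a minimal sketch in Python that sends a batch of NDJSON events with `wait=true`, retries only `HTTP5xx` responses, and applies exponential backoff between attempts; the names and limits are illustrative:

```python
import json
import time
import requests

TB_TOKEN = "TINYBIRD_AUTH_TOKEN"
URL = "https://api.tinybird.co/v0/events"  # Use the base URL for your region

def send_with_retries(events, name, max_attempts=5):
    body = "\n".join(json.dumps(ev) for ev in events)
    for attempt in range(1, max_attempts + 1):
        response = requests.post(
            URL,
            params={"name": name, "wait": "true"},  # Wait for commit acknowledgement
            headers={"Authorization": f"Bearer {TB_TOKEN}"},
            data=body,
        )
        if response.status_code < 500:
            # 2xx: committed. 4xx: log and investigate rather than retrying blindly.
            return response
        # 5xx: likely transient, back off exponentially before the next attempt
        time.sleep(2 ** attempt)
    # Persist the batch somewhere (Kafka, S3, a local file) to resend later
    raise RuntimeError(f"Events API request failed after {max_attempts} attempts")
```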
It's recommended to only implement automatic retries for `HTTP5xx` responses, which indicate that a retry might be successful. `HTTP4xx` responses should be logged and investigated, as they often indicate issues that can't be resolved by simply retrying with the same request. For HTTP2 clients, you may receive the `0x07 GOAWAY` error. This indicates that there are too many alive connections. It is safe to recreate the connection and retry these errors. ### How to retry¶ You should aim to retry any requests that fail with a `HTTP5xx` response. In general, you should retry these requests 3-5 times. If the failure persists beyond these retries, log the failure, and attempt to store the data in a buffer to resend later (for example, in Kafka, or a file in S3). It's recommended to use an exponential backoff between retries. This means that, after a retry fails, you should increase the amount of time you wait before sending the next retry. If the issue causing the failure is transient, this gives you a better chance of a successful retry. Be careful when calculating backoff timings, so that you don't run into memory limits on your application. ## Next steps¶ - Learn more about[ the schema](https://www.tinybird.co/docs/docs/get-data-in#create-your-schema) and why it's important. - Ingested your data and ready to go? Start[ querying your Data Sources](https://www.tinybird.co/docs/docs/work-with-data/query) and build some Pipes! --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-from-vercel Last update: 2025-01-08T09:32:25.000Z Content: --- title: "Send Vercel Webhooks to Tinybird · Tinybird Docs" theme-color: "#171612" description: "In this guide you'll learn how to send Vercel events to Tinybird using webhooks and the Events API." --- # Send Vercel events to Tinybird¶ [Vercel](https://vercel.com/) is a platform for building and deploying web applications. By integrating Vercel with Tinybird, you can analyze your Vercel events in real time and enrich it with other data sources. Some common use cases for sending Vercel events to Tinybird include: 1. Tracking deployments, projects, integrations and domains status and errors. 2. Creating custom analytical dashboards. 3. Monitoring attacks. Read on to learn how to send data from Vercel to Tinybird. ## Before you start¶ Before you connect Vercel webhooks to Tinybird, ensure: - You have a Vercel account. - You have a Tinybird Workspace. ## Connect Vercel to Tinybird¶ 1. Choose your team scope on the dashboard, and go to** Settings** >** Webhooks** . 2. Select the Webhooks and Projects you want to send to Tinybird. 3. In Tinybird, create a Data Source, called `vercel` . You can follow this[ schema](https://github.com/tinybirdco/tinynest/blob/main/tinybird/datasources/vercel.datasource) : SCHEMA > `event_time` DateTime `json:$.tinybirdIngestTime` DEFAULT now(), `event_type` String `json:$.type` DEFAULT 'unknown', `event` JSON `json:$` DEFAULT '{}' ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(event_time)" ENGINE_SORTING_KEY "event_time" Using the [JSON Data Type](https://www.tinybird.co/docs/docs/sql-reference/data-types/json) you can store the semi-structured data you receive from Vercel webhooks in a single column. You can later retrieve various events and their metadata as needed in your Pipes. The `JSON` data type is in private beta. 
If you are interested in using this type, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). 1. In Tinybird, copy a token with privileges to append to the Data Source you created. You can use the admin token or create one with the required scope. 2. Back in Vercel, paste the Events API URL in your Webhook Endpoint URL. Use the query parameter `name` to match the name of the Data Source you created in Tinybird. For example: https://api.tinybird.co/v0/events?name=vercel&token= Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace. 1. You're done. Any of the Vercel events you selected is automatically sent to Tinybird through the[ Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) . You can check the status of the integration from the **Log** tab in the Tinybird `vercel` Data Source. ## See also¶ - [ Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) - [ Vercel webhooks](https://vercel.com/docs/observability/webhooks-overview) --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-ndjson-data Last update: 2025-01-17T07:58:01.000Z Content: --- title: "Ingest NDJSON data · Tinybird Docs" theme-color: "#171612" description: "In this guide you'll learn how to ingest unstructured data, like NDJSON to Tinybird." --- # Ingest NDJSON data¶ In this guide you'll learn how to ingest unstructured NDJSON data into Tinybird. ## Overview¶ A common scenario is having a document-based database, using nested records on your data warehouse or generated events in JSON format from a web application. For cases like this, the process used to be: Export the `JSON` objects as if they were a `String` in a CSV file, ingest them to Tinybird, and then use the built-in `JSON` functions to prepare the data for real-time analytics as it was being ingested. But this isn't needed anymore, as Tinybird now accepts JSON imports by default! Although Tinybird allows you to ingest `.json` and `.ndjson` files, it only accepts the [Newline Delimited JSON](https://github.com/ndjson/ndjson-spec) as content. Each line must be a valid JSON object and every line has to end with `\n` . The API will return an error if each line isn't a valid JSON value. ## Ingest to Tinybird¶ This guide will use an example scenario including this [100k rows NDJSON file](https://storage.googleapis.com/tinybird-assets/datasets/guides/how-to-ingest-ndjson-data/events_100k.ndjson) , which contains events from an ecommerce website with different properties. ### With the API¶ Ingesting NDJSON files using the API is similar to the CSV process. There are only two differences to be managed in the query parameters: - ** format** : It has to be "ndjson" - ** schema** : Usually, the name and the type are provided for every column but in this case it needs an additional property, called the `jsonpath` (see the[ JSONPath syntax](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-ndjson-data#jsonpaths) ). Example:* "schema=event_name String `json:$.event.name`"* You can guess the `schema` by first calling the [Analyze API](https://www.tinybird.co/docs/docs/api-reference/analyze-api) . 
It's a very handy way to not have to remember the `schema` and `jsonpath` syntax: Just send a sample of your file and the Analyze API will describe what's inside (columns, types, schema, a preview, etc.). ##### Analyze API request curl \ -H "Authorization: Bearer $TOKEN" \ -G -X POST "https://api.tinybird.co/v0/analyze" \ --data-urlencode "url=https://storage.googleapis.com/tinybird-assets/datasets/guides/how-to-ingest-ndjson-data/events_100k.ndjson" Take the `schema` attribute in the response and either use it right away in the next API request to create the Data Source, or modify as you wish: Column names, types, remove any columns, etc. ##### Analyze API response excerpt { "analysis": { "columns": [ { "path": "$.date", "recommended_type": "DateTime", "present_pct": 1, "name": "date" }, ... ... ... "schema": "date DateTime `json:$.date`, event LowCardinality(String) `json:$.event`, extra_data_city LowCardinality(String) `json:$.extra_data.city`, product_id String `json:$.product_id`, user_id Int32 `json:$.user_id`, extra_data_term Nullable(String) `json:$.extra_data.term`, extra_data_price Nullable(Float64) `json:$.extra_data.price`" }, "preview": { "meta": [ { "name": "date", "type": "DateTime" }, ... ... ... } Now you've analyzed the file, create the Data Source. In the example below, you will ingest the 100k rows NDJSON file only taking 3 columns from it: date, event, and product_id. The `jsonpath` allows Tinybird to match the Data Source column with the JSON property path: ##### Ingest NDJSON to Tinybird TOKEN= curl \ -H "Authorization: Bearer $TOKEN" \ -X POST "https://api.tinybird.co/v0/datasources" \ -G --data-urlencode "name=events_example" \ -G --data-urlencode "mode=create" \ -G --data-urlencode "format=ndjson" \ -G --data-urlencode "schema=date DateTime \`json:$.date\`, event String \`json:$.event\`, product_id String \`json:$.product_id\`" \ -G --data-urlencode "url=https://storage.googleapis.com/tinybird-assets/datasets/guides/how-to-ingest-ndjson-data/events_100k.ndjson" ### With the Command Line Interface¶ There are no changes in the CLI in order to ingest an NDJSON file. Just run the command you are used to with CSV: ##### Generate Data Source schema from NDJSON tb datasource generate https://storage.googleapis.com/tinybird-assets/datasets/guides/how-to-ingest-ndjson-data/events_100k.ndjson Once it's finished, it automatically generates a .datasource file with all the columns, with their proper types, and `jsonpaths` . For example: ##### Generated Data Source schema DESCRIPTION generated from https://storage.googleapis.com/tinybird-assets/datasets/guides/how-to-ingest-ndjson-data/events_100k.ndjson SCHEMA > date DateTime `json:$.date`, event String `json:$.event`, extra_data_city String `json:$.extra_data.city`, product_id String `json:$.product_id`, user_id Int32 `json:$.user_id`, extra_data_price Nullable(Float64) `json:$.extra_data.price`, extra_data_city Nullable(String) `json:$.extra_data.city` You can then push that .datasource file to Tinybird and start using it in your Pipes or append new data to it: ##### Push Data Source to Tinybird and append new data tb push events_100k.datasource tb datasource append events_100k https://storage.googleapis.com/tinybird-assets/datasets/guides/how-to-ingest-ndjson-data/events_100k.ndjson ### With the UI¶ To create a new Data Source from an NDJSON file, navigate to your Workspace and select the **Add Data Source** button. In the modal, select "File upload" and upload the NDJSON/JSON file or drag and drop onto the modal. 
You can also provide a URL, such as the one used in this guide. Confirm you're happy with the schema and data, and select "Create Data Source". Once your data is imported, you will have a Data Source with your JSON data structured in columns, which are easy to transform and consume in any Pipe. Ingest just the columns you need. After exploring your data, create a Data Source that has only the columns needed for your analyses. That helps keep your ingestion, materialization, and your real-time data project fast.

## When new JSON fields are added¶

Tinybird can automatically detect when a new JSON property appears in newly ingested data. Using the Data Source import example from the previous paragraph, you can include a new property that records the origin country of the event, complementing the city. Append new JSON data with the extra property ([using this example file](https://storage.googleapis.com/tinybird-assets/datasets/guides/how-to-ingest-ndjson-data/events_with_country.ndjson)). After finishing the import, open the Data Source modal and confirm that a new blue banner appears, warning you about the new properties detected in the last ingestion:

<-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fhow-to-ingest-ndjson-data-4.png&w=3840&q=75) <-figcaption-> Automatically suggesting new columns

Once you choose to view those new columns, the application allows you to add them and to change the column types and names, as in the preview step during the import:

<-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fhow-to-ingest-ndjson-data-5.png&w=3840&q=75) <-figcaption-> Accepting new columns

From now on, whenever you append new data where the new column is defined and has a value, it will appear in the Data Source and will be available to be consumed from your Pipes:

<-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fhow-to-ingest-ndjson-data-6.png&w=3840&q=75) <-figcaption-> New column receiving data

Tinybird automatically detects if there are new columns available. If you ingest data periodically into your NDJSON Data Source (from a file or a Kafka connection) and new columns are coming in, you will see a blue dot in the Data Source icon that appears in the sidebar (see Mark 1 below). Click on the Data Source to view the new columns and add them to the schema, following the steps above.

<-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fhow-to-ingest-ndjson-data-7.png&w=3840&q=75) <-figcaption-> New columns detected, notified by a blue dot

## JSONPaths¶

This section applies to both NDJSON **and** Parquet data. When creating a Data Source using NDJSON/Parquet data, for each column in the `schema` you have to provide a JSONPath using the [JSONPath syntax](https://goessner.net/articles/JsonPath). This is easy for simple schemas, but it can get complex if you have nested fields and arrays. For example, given this NDJSON object:

{ "field": "test", "nested": { "nested_field": "bla" }, "an_array": [1, 2, 3], "a_nested_array": { "nested_array": [1, 2, 3] } }

The schema would be something like this:

##### schema with jsonpath

a_nested_array_nested_array Array(Int16) `json:$.a_nested_array.nested_array[:]`,
an_array Array(Int16) `json:$.an_array[:]`,
field String `json:$.field`,
nested_nested_field String `json:$.nested.nested_field`

Tinybird's JSONPath syntax support has some limitations: it supports nested objects at multiple levels, but nested arrays only at the first level, as in the example above.
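When you build these schema strings programmatically, an HTTP client library handles the URL encoding of the backticks and JSONPaths for you. As an illustration, here is a minimal sketch in Python that creates the three-column Data Source from the earlier example in this guide; the token and base URL are placeholders:

```python
import requests

TB_TOKEN = "TINYBIRD_AUTH_TOKEN"
TB_HOST = "https://api.tinybird.co"  # Use the base URL for your region

# Same columns and JSONPaths as the curl example above
schema = (
    "date DateTime `json:$.date`, "
    "event String `json:$.event`, "
    "product_id String `json:$.product_id`"
)

response = requests.post(
    f"{TB_HOST}/v0/datasources",
    headers={"Authorization": f"Bearer {TB_TOKEN}"},
    params={
        "name": "events_example",
        "mode": "create",
        "format": "ndjson",
        "schema": schema,
        "url": "https://storage.googleapis.com/tinybird-assets/datasets/guides/how-to-ingest-ndjson-data/events_100k.ndjson",
    },
)
print(response.status_code, response.json())
```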
To ingest and transform more complex JSON objects, use the root object JSONPath syntax as described in the next section. ### JSONPaths and the root object¶ Defining a column as "column_name String `json:$` " in the Data Source schema will ingest each line in the NDJSON file as a String in the `column_name` column. This is very useful in some scenarios. When you have nested arrays, such as polygons: ##### Nested arrays { "id": 49518, "polygon": [ [ [30.471785843000134, -1.066836591999916], [30.463855835000118, -1.075127054999925], [30.456156047000093, -1.086082457999908], [30.453003785000135, -1.097347919999962], [30.456311076000134, -1.108096617999891], [30.471785843000134, -1.066836591999916] ] ] } You can parse the `id` and then add the whole JSON string to the root column to extract the polygon with JSON functions. ##### schema definition id String `json:$.id`, root String `json:$` When you have complex objects: ##### Complex JSON objects { "elem": { "payments": [ { "users": [ { "user_id": "Admin_XXXXXXXXX", "value": 4 } ] } ] } } Or if you have variable schema ("schemaless") events: ##### Schemaless events { "user_id": "1", "data": { "whatever": "bla", "whatever2": "bla" } } { "user_id": "1", "data": [1, 2, 3] } You can simply put the whole event in the root column and parse as needed: ##### schema definition root String `json:$` ## JSON data type BETA¶ You can use the `JSON` data type as an alternative to ingesting NDJSON data. The `JSON` data type is in private beta. If you are interested in using this type, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). ### Schemaless JSON type¶ If you expect a completely variable schema, you can store the payload as JSON as follows: ##### events\_100k.datasource SCHEMA > `payload` JSON `json:$` ENGINE "MergeTree" ENGINE_SORTING_KEY "tuple()" Fields are stored as `Dynamic` type columns, so you might want to cast them for operations where type matters. ### JSON type queries¶ You can use dot notation with `json.field` or `getSubcolumn(json, 'field')` . To query nested JSONs, like `extra_data` in the example, use the `^` syntax. SELECT payload, payload.date as date, payload.event as event, payload.^extra_data as extra_data, getSubcolumn(payload, 'product_id') as product_id, getSubcolumn(payload, 'user_id') as user_id FROM events_100k The previous example uses `getSubcolumn(payload, 'date')` and `payload.date` indistinctly. If you ever find any issue with dot notation, try `getSubcolumn(json,'path')` syntax. ### Explicit JSON typing¶ Storing everything as `Dynamic` has performance implications. If you know the schema you can use JSONPaths as described in previous sections, use `JSON(field_name field_type)` , or a mix of both approaches. For example: ##### events\_100k\_typed\_json.datasource SCHEMA > `payload` JSON(date DateTime, event LowCardinality(String), product_id String, user_id Int32) `json:$`, `extra_data` JSON(city String, price Float32) `json:$.extra_data` ENGINE "MergeTree" ENGINE_SORTING_KEY "tuple()" ### JSON dynamic paths and types¶ In addition to explicit typing, the `JSON` type has two optional arguments to control how data is stored: `max_dynamic_paths` and `max_dynamic_types` . See [JSOn data type](https://www.tinybird.co/docs/docs/sql-reference/data-types/json) for details. 
If you don't specify these arguments, the default values are the following: - `max_dynamic_paths` : 16 - `max_dynamic_types` : 2 This is to avoid potential problems with JSON objects containing many properties: up to `max_dynamic_paths` could be stored into specific sets of files for each property. A number too big might cause performance degradation in your data ingest and endpoints. If your JSON objects contain many properties, consider using the `SKIP` argument to omit those not needed. Using complex objects can degrade the performance of ingestion and reads. Use the default values. Check with Tinybird support before using custom values. ## Next steps¶ - Learn more about the[ Data Sources API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/datasource-api) . - Want to schedule your data ingestion? Read the docs on[ cron and GitHub Actions](https://www.tinybird.co/docs/docs/get-data-in/data-operations/scheduling-with-github-actions-and-cron) . --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-vercel-logdrains Last update: 2024-12-24T11:01:26.000Z Content: --- title: "Send Vercel log drains to Tinybird · Tinybird Docs" theme-color: "#171612" description: "Learn how to send Vercel events to Tinybird using webhooks and the Events API." --- # Send Vercel log drains to Tinybird¶ [Vercel](https://vercel.com/) is a platform for building and deploying web applications. By integrating Vercel with Tinybird, you can analyze your Vercel events in real time and enrich it with other data sources. Some common use cases for sending Vercel Log Drains to Tinybird include: 1. Analyze logs from your applications. 2. Monitor logs from your applications. 3. Create custom analytical dashboards. 4. Build an alert system based on logging patterns. Read on to learn how to send logs from Vercel to Tinybird. ## Before you start¶ Before you connect Vercel Log Drains to Tinybird, ensure: - You have a Vercel account. - You have a Tinybird Workspace. ## Connect Vercel to Tinybird¶ 1. Choose your team scope on the dashboard, and go to** Team Settings** >** Log Drains** . 2. Select the** Projects** to send logs to Tinybird. 3. Select** Sources** you want to send logs to Tinybird. 4. Select** NDJSON** as Delivery Format. 5. Select** Environments** and** Sampling Rate** . 6. In Tinybird, create a Data Source, called `vercel_logs` . You can follow this[ schema](https://github.com/tinybirdco/tinynest/blob/main/tinybird/datasources/vercel_logs.datasource) : SCHEMA > `event_time` DateTime `json:$.tinybirdIngestTime` DEFAULT now(), `event_type` String `json:$.type` DEFAULT 'unknown', `event` JSON `json:$` DEFAULT '{}' ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYYYYMM(event_time)" ENGINE_SORTING_KEY "event_time" The proxy column is a JSON string. Use the [JSONExtract](https://www.tinybird.co/docs/docs/sql-reference/functions/json-functions) functions to extract the data you need in your Pipes. 1. In Tinybird, copy a token with privileges to append to the Data Source you created. You can use the admin token or create one with the required scope. 2. Back in Vercel, paste the Events API URL in your Log Drains Endpoint. Use the query parameter `name` to match the name of the Data Source you created in Tinybird. Log Drains webhook needs to be verified by Vercel. You can do this by adding the `x-vercel-verify` parameter to the request. 
https://api.tinybird.co/v0/events?name=vercel_logs&x-vercel-verify= Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace. 1. Select **Custom Headers**, add `Authorization` with the value `Bearer ` and select **Add**. 2. Select **Verify** and optionally use **Test Log Drain** from Vercel to check that data gets to the `vercel_logs` Data Source in Tinybird. 3. You're done. Any of the Vercel Log Drains you selected are automatically sent to Tinybird through the [Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api). Check the status of the integration from the **Log** tab in the Tinybird `vercel_logs` Data Source. ## See also¶ - [ Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) - [ Vercel Log Drains](https://vercel.com/docs/observability/log-drains/log-drains-reference) --- URL: https://www.tinybird.co/docs/get-data-in/guides/ingest-with-estuary Last update: 2025-01-17T08:20:45.000Z Content: --- title: "Ingest with Estuary · Tinybird Docs" theme-color: "#171612" description: "In this guide, you'll learn how to use Estuary to push data streams to Tinybird." --- # Ingest with Estuary¶

In this guide, you'll learn how to use Estuary to push data streams to Tinybird. [Estuary](https://estuary.dev/) is a real-time ETL tool that allows you to capture data from a range of sources and push it to a range of destinations. Using Estuary's Dekaf, you can connect Tinybird to Estuary as if it were a Kafka broker, meaning you can use Tinybird's native Kafka Connector to consume data from Estuary. [Read more about Estuary Dekaf.](https://docs.estuary.dev/guides/dekaf_reading_collections_from_kafka/#connection-details)

## Prerequisites¶

- An Estuary account & collection
- A Tinybird account & Workspace

## Connecting to Estuary¶

In Estuary, create a new token to use for the Tinybird connection. You can do this from the Estuary Admin Dashboard.

In your Tinybird Workspace, create a new Data Source and use the [Kafka Connector](https://www.tinybird.co/docs/docs/get-data-in/connectors/kafka). To configure the connection details, use the following settings (these can also be found in the [Estuary Dekaf docs](https://docs.estuary.dev/guides/dekaf_reading_collections_from_kafka/#connection-details)).

- Bootstrap servers: `dekaf.estuary-data.com`
- SASL Mechanism: `PLAIN`
- SASL Username: `{}`
- SASL Password: Estuary Refresh Token (Generate your token in the Estuary Admin Dashboard)

Tick the `Decode Avro messages with Schema Register` box, and use the following settings:

- URL: `https://dekaf.estuary-data.com`
- Username: `{}`
- Password: The same Estuary Refresh Token as above

Click **Next** and you will see a list of topics. These topics are the collections you have in Estuary. Select the collection you want to ingest into Tinybird, and click **Next**. Configure your consumer group as needed. Finally, you will see a preview of the Data Source schema. Feel free to make any modifications as required, then click **Create Data Source**. This completes the connection with Estuary, and new data from the Estuary collection will arrive in your Tinybird Data Source in real time.
--- URL: https://www.tinybird.co/docs/get-data-in/guides/postgres-cdc-with-redpanda-connect Last update: 2024-12-18T15:47:50.000Z Content: --- title: "Postgres CDC with Redpanda Connect · Tinybird Docs" theme-color: "#171612" description: "Learn how to ingest data from a Postgres database using Redpanda Connect and the Postgres CDC input." --- # PostgreSQL CDC with Redpanda Connect¶ [Redpanda Connect](https://www.redpanda.com/blog/redpanda-connect) is an ecosystem of high-performance streaming connectors that serves as a simplified and powerful alternative to Kafka Connect. Tinybird is the ideal complement to Postgres for handling OLAP workloads. The following guide shows you how to use Redpanda Connect to ingest data from a Postgres database into Tinybird. ## Before you start¶ Before you connect Postgres to Redpanda, ensure: - You have a Redpanda cluster and Redpanda Connect installed with version 4.43.0 or higher. The following instructions use Redpanda Serverless, but you can use Redpanda Cloud Dedicated or self-hosted. - You have a PostgreSQL database with logical replication enabled. ## Connect Postgres to Redpanda¶ 1. In the Redpanda Cloud console, select** Connect** from the navigation menu, then select** Create Pipeline** . 2. Add the pipeline configuration. You need the following information: - Postgres connection string ( `dsn` ) - Redpanda brokers ( `seed_brokers` ) - SASL mechanism ( `mechanism` ) - Username ( `username` ) - Password ( `password` ) Use the following YAML template: input: label: "postgres_cdc" postgres_cdc: dsn: <> include_transaction_markers: false slot_name: test_slot_native_decoder snapshot_batch_size: 100000 stream_snapshot: true temporary_slot: true schema: public tables: - <
> output: redpanda: seed_brokers: - ${REDPANDA_BROKERS} topic: <> tls: enabled: false sasl: - mechanism: SCRAM-SHA-512 password: <> username: <> See the Redpanda Connect docs for more information on the `redpanda` output and `postgres_cdc` input. 1. Start the Redpanda Connect pipeline Select **Create** to save and create the pipeline. This takes you back to the pipeline screen, where you can find your new pipeline. Open the new pipeline to view the logs and confirm that the pipeline is running. Select the Topics page from the navigation menu and confirm that the topic exists and that messages are being produced. 1. Connect Redpanda to Tinybird In Tinybird, add a new Data Source and select the **Redpanda** connector. Enter your Redpanda connection details, then select the topic used in the Redpanda Connect pipeline. Confirm your schema and select **Create Data Source**. Redpanda Connect continuosly consumes changes from Postgres and pushes them to your Redpanda topic. Tinybird consumes the changes from Redpanda in real time, making them available to query with minimal latency. ## See also¶ - [ Tinybird Redpanda Connector](https://www.tinybird.co/docs/docs/get-data-in/connectors/redpanda) - `redpanda` output - `postgres_cdc` input --- URL: https://www.tinybird.co/docs/get-data-in/guides/postgresql Last update: 2025-01-20T11:43:08.000Z Content: --- title: "PostgreSQL table function · Tinybird Docs" theme-color: "#171612" description: "Documentation for the Tinybird PostgreSQL table function" --- # PostgreSQL table function BETA¶ The Tinybird `postgresql` table function is currently in public beta. The Tinybird `postgresql()` table function allows you to read data from your existing PostgreSQL database into Tinybird, then schedule a regular Copy Pipe to orchestrate synchronization. You can load full tables, and every run performs a full replace on the Data Source. To use it, define a node using standard SQL and the `postgresql` function keyword, then publish the node as a Copy Pipe that does a sync on every run. ## Set up¶ ### Prerequisites¶ Your postgres database needs to be open and public (exposed to the internet, with publicly-signed certs), so you can connect it to Tinybird via the hostname and port using your username and password. You'll also need familiarity with making cURL requests to [manage your secrets](https://www.tinybird.co/docs/about:blank#about-secrets). ### Type support and inference¶ Here's a detailed conversion table: | PostgreSQL data type | Tinybird data type | | --- | --- | | BOOLEAN | UInt8 or Bool | | SMALLINT | Int16 | | INTEGER | Int32 | | BIGINT | Int64 | | REAL | Float32 | | DOUBLE PRECISION | Float64 | | NUMERIC or DECIMAL | Decimal(p, s) | | CHAR(n) | FixedString(n) | | VARCHAR (n) | String | | TEXT | String | | BYTEA | String | | TIMESTAMP | DateTime | | TIMESTAMP WITH TIME ZONE | DateTime (with appropriate timezone handling) | | DATE | Date | | TIME | String (since there is no direct TIME type) | | TIME WITH TIME ZONE | String | | INTERVAL | String | | UUID | UUID | | ARRAY | Array(T) where T is the array element type | | JSON | String or JSON | | JSONB | String | | INET | String | | CIDR | String | | MACADDR | String | | ENUM | Enum8 or Enum16 | | GEOMETRY | String | Notes: - Tinybird doesn't support all PostgreSQL types directly, so some types are mapped to String in Tinybird, which is the most flexible type for arbitrary data. - For the NUMERIC and DECIMAL types, Decimal(p, s) in Tinybird requires specifying precision (p) and scale (s). 
- Time zone support in Tinybird's DateTime can be managed via additional functions or by ensuring consistent storage and retrieval time zones. - Some types like INTERVAL don't have a direct equivalent in Tinybird and are usually stored as String or decomposed into separate fields. ## About secrets¶ The Environment Variables API is currently only accessible at API level. UI support will be released in the near future. Pasting your credentials into a Pipe node or `.datafile` as plain text is a security risk. Instead, use the Environment Variables API to [create two new secrets](https://www.tinybird.co/docs/docs/api-reference/environment-variables-api#post-v0secrets) for your postgres username and password. In the next step, you'll then be ready to interpolate your new secrets using the `tb_secret` function: {{tb_secret('pg_username')}} {{tb_secret('pg_password')}} ## Load a PostgreSQL table¶ In the Tinybird UI, create a new Pipe Node. Call the `postgresql` table function and pass the hostname & port, database, table, user, and password: ##### Example node logic with actual values SELECT * FROM postgresql( 'aws-0-eu-central-1.TODO.com:3866', 'postgres', 'orders', {{tb_secret('pg_username')}}, {{tb_secret('pg_password')}}, ) Publish this node as a Copy Pipe, thereby running the query manually. You can choose to append only new data, or replace all data. ### Alternative: Use datafiles¶ As well as using the UI, you can also define node logic in Pipe `.datafile` files . An example for an ecommerce `orders_backfill` scenario, with a node called `all_orders` , would be: NODE all_orders SQL > % SELECT * FROM postgresql( 'aws-0-eu-central-1.TODO.com:3866', 'postgres', 'orders', {{tb_secret('pg_username')}}, {{tb_secret('pg_password')}}, ) TYPE copy TARGET_DATASOURCE orders COPY_SCHEDULE @on-demand COPY_MODE replace ## Include filters¶ You can use a source column in postgres and filter by a value in Tinybird, for example: ##### Example Copy Pipe with postgresql function and filters SELECT * FROM postgresql( 'aws-0-eu-central-1.TODO.com:3866', 'postgres', 'orders', {{tb_secret('pg_username')}}, {{tb_secret('pg_password')}}, ) WHERE orderDate > (select max(orderDate) from orders) ## Schedule runs¶ When publishing as a Copy Pipe, most users set it to run at a frequent interval using a cron expression. It's also possible to trigger manually: curl -H "Authorization: Bearer " \ -X POST "https:/tinybird.co/api/v0/pipes//run" Having manual Pipes in your Workspace is helpful, as you can run a full sync manually any time you need it - sometimes delta updates aren't 100% accurate. Some users also leverage them for weekly full syncs. ## Synchronization strategies¶ When copying data from PostgreSQL to Tinybird you can use one of the following strategies: - Use `COPY_MODE replace` to synchronize small dimensions tables, up to a few million rows, in a frequent schedule (1 to 5 minutes). - Use `COPY_MODE append` to do incremental appends. For example, you can append events data tagged with a timestamp. Combine it with `COPY_SCHEDULE` and filters in the Copy Pipe SQL to sync the new events. ### Timeouts¶ When synchronizing dimensions tables with `COPY_MODE replace` and 1 minute schedule, the copy job might timeout because it can't ingest the whole table in the defined schedule. Timeouts depend on several factors: - The `statement_timeout` configured in PostgreSQL. - The PostgreSQL database load. - Network connectivity, for example when copying data from different cloud regions. 
Follow these steps to avoid timeouts using incremental appends: 1. Make sure your PostgreSQL dimensions rows are tagged with an updated timestamp. Use the column to filter the copy Pipe SQL. In the following example, the column is `updated_at`: CREATE TABLE users ( created_at TIMESTAMPTZ(6) NOT NULL, updated_at TIMESTAMPTZ(6) NOT NULL, name TEXT, user_id TEXT PRIMARY KEY ); 1. Create the target Data Source as a[ ReplacingMergeTree](https://www.tinybird.co/docs/docs/work-with-data/strategies/deduplication-strategies#use-the-replacingmergetree-engine) using a unique or primary key as the `ENGINE_SORTING_KEY` in the Postgres table. Rows with the same `ENGINE_SORTING_KEY` are deduplicated. SCHEMA > `created_at` DateTime64(6), `updated_at` DateTime64(6), `name` String, `user_id` String ENGINE "ReplacingMergeTree" ENGINE_SORTING_KEY "user_id" 1. Configure the Copy Pipe with an incremental append strategy and 1 minute schedule. That way you make sure only new records in the last minute are ingested, thus optimizing the copy job duration. NODE copy_pg_users_rmt_0 SQL > % SELECT * FROM postgresql( 'aws-0-eu-central-1.TODO.com:6543', 'postgres', 'users', {{ tb_secret('pg_username') }}, {{ tb_secret('pg_password') }} ) WHERE updated_at > (SELECT max(updated_at) FROM pg_users_rmt)::String TYPE copy TARGET_DATASOURCE pg_users_rmt COPY_MODE append COPY_SCHEDULE * * * * * Optionally, you can create an index in the PostgreSQL table to speed up filtering: -- Create an index on updated_at for faster queries CREATE INDEX idx_updated_at ON users (updated_at); 1. A Data Source with `ReplacingMergeTree` engine deduplicates records based on the sorting key in batch mode. As you can't ensure when deduplication is going to happen, use the `FINAL` keyword when querying the Data Source to force deduplication at query time. SELECT * FROM pg_users FINAL 1. You can combine this approach with an hourly or daily replacement to get rid of deleted rows. Learn about[ how to handle deleted rows](https://www.tinybird.co/docs/docs/work-with-data/strategies/deduplication-strategies#use-the-replacingmergetree-engine) when using `ReplacingMergeTree` . Learn more about [how to migrate from Postgres to Tinybird](https://www.tinybird.co/docs/docs/get-data-in/migrate/migrate-from-postgres). ## Observability¶ Job executions are logged in the `datasources_ops_log` [Service Data Source](https://www.tinybird.co/docs/docs/monitoring/service-datasources) . This log can be checked directly in the Data Source view page in the UI. Filter by `datasource_id` to monitor ingestion through the PostgreSQL table function from the `datasources_ops_log`: ##### Example query to the datasources\_ops\_log Service Data Source SELECT timestamp, event_type, result, error, job_id FROM tinybird.datasources_ops_log WHERE datasource_id = 't_1234' AND event_type = 'copy' ORDER BY timestamp DESC ## Limits¶ The table function inherits all the [limits of Copy Pipes](https://www.tinybird.co/docs/docs/get-started/plans/limits#copy-pipe-limits). Secrets are created at a Workspace level, so you will be able to connect one PostgreSQL database per Tinybird Workspace. Check the [limits page](https://www.tinybird.co/docs/docs/get-started/plans/limits) for limits on ingestion, queries, API Endpoints, and more. ## Billing¶ When set up, this functionality is a Copy Pipe with a query (processed data). There are no additional or specific costs for the table function itself. 
See the [billing docs](https://www.tinybird.co/docs/docs/get-started/plans/billing) for more information on data operations and how they're charged. --- URL: https://www.tinybird.co/docs/get-data-in/guides/python-sdk Last update: 2024-12-18T09:46:02.000Z Content: --- title: "Send Python logs to Tinybird · Tinybird Docs" theme-color: "#171612" description: "Send your Python logs to Tinybird using the standard logging library and Tinybird Python SDK." --- # Send Python logs to Tinybird¶ You can send logs from a Python application or service to Tinybird using the standard Python logging library and the [tinybird-python-sdk](https://pypi.org/project/tinybird-python-sdk/). ## Prerequisites¶ To use the Tinybird Python SDK you need Python 3.11 or higher. ## Configure the logging handler¶ First, configure a Tinybird logging handler in your application. For example: import logging from multiprocessing import Queue from tb.logger import TinybirdLoggingQueueHandler logger = logging.getLogger('your-logger-name') handler = TinybirdLoggingHandler(, , 'your-app-name') formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s') handler.setFormatter(formatter) logger.addHandler(handler) Each time you call the logger, the SDK sends an event to the `tb_logs` Data Source in your Workspace. To configure the Data Source name, initialize the `TinybirdLoggingHandler` like this: handler = TinybirdLoggingHandler(, , 'your-app-name', ds_name="your_tb_ds_name") ## Non-blocking logging¶ If you want to avoid blocking the main thread, use a queue to send the logs to a different thread. For example: import logging from multiprocessing import Queue from tb.logger import TinybirdLoggingQueueHandler from dotenv import load_dotenv load_dotenv() TB_API_URL = os.getenv("") TB_WRITE_TOKEN = os.getenv("") logger = logging.getLogger('your-logger-name') handler = TinybirdLoggingQueueHandler(Queue(-1), TB_API_URL, TB_WRITE_TOKEN, 'your-app-name', ds_name="your_tb_ds_name") formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s') handler.setFormatter(formatter) logger.addHandler(handler) --- URL: https://www.tinybird.co/docs/get-data-in/ingest-apis Content: --- title: "Ingest APIs · Tinybird Docs" theme-color: "#171612" description: "Ingest data into Tinybird using the Datasource and Events APIs." --- # Ingest APIs¶ Tinybird provides the following APIs for ingesting data: - The** Data Sources API** lets you create and manage Data Sources, as well as import data from files (CSV, NDJSON, Parquet). You can use it to create new Data Sources from files, append data to existing Data Sources, or replace data selectively. The Data Sources API supports both local and remote files, with automatic schema inference for CSV files. - The** Events API** provides high-throughput streaming ingestion through a simple HTTP API. It's designed for sending JSON events individually or in batches using NDJSON format. The Events API is optimized for real-time data ingestion, supporting compression and write acknowledgements when needed. Both APIs require authentication using tokens with appropriate scopes. 
For detailed information about each API's capabilities and usage, see: - [ Data Sources API documentation](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/datasource-api) - [ Events API documentation](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) - [ Data Sources API Reference](https://www.tinybird.co/docs/docs/api-reference/datasource-api) - [ Events API Reference](https://www.tinybird.co/docs/docs/api-reference/events-api) --- URL: https://www.tinybird.co/docs/get-data-in/ingest-apis/datasource-api Last update: 2024-12-18T09:46:02.000Z Content: --- title: "Data Sources API · Tinybird Docs" theme-color: "#171612" description: "Use the Data Sources API to create and manage your Data Sources as well as importing data into them." --- # Data Sources API¶ Use Tinybird's Data Sources API to import files into your Tinybird Data Sources. With the Data Sources API you can use files to create new Data Sources, and append data to, or replace data from, an existing Data Source. See [Data Sources](https://www.tinybird.co/docs/docs/get-data-in/data-sources). The following examples show how to use the Data Sources API to perform various tasks. See the [Data Sources API Reference](https://www.tinybird.co/docs/docs/api-reference/datasource-api) for more information. ## Import a file into a new Data Source¶ Tinybird can create a Data Source from a file. This operation supports CSV, NDJSON, and Parquet files. You can create a Data Source from local or remote files. Automatic schema inference is supported for CSV files, but isn't supported for NDJSON or Parquet files. ### CSV files¶ CSV files must follow these requirements: - One line per row - Comma-separated Tinybird supports Gzip compressed CSV files with .csv.gz extension. The Data Sources API automatically detects and optimizes your column types, so you don't need to manually define a schema. You can use the `type_guessing=false` parameter to force Tinybird to use `String` for every column. CSV headers are optional. When creating a Data Source from a CSV file, if your file contains a header row, Tinybird uses the header to name your columns. If no header is present, your columns receive default names with an incrementing number. When appending a CSV file to an existing Data Source, if your file has a header, Tinybird uses the headers to identify the columns. If no header is present, Tinybird uses the order of columns. If the order of columns in the CSV file is always the same, you can omit the header line. For example, to create a new Data Source from a local file using cURL: ##### Creating a Data Source from a local CSV file curl \ -H "Authorization: Bearer " \ -X POST "https://api.tinybird.co/v0/datasources?name=my_datasource_name" \ -F csv=@local_file.csv From a remote file: ##### Creating a Data Source from a remote CSV file curl \ -H "Authorization: Bearer " \ -X POST "https://api.tinybird.co/v0/datasources?name=my_datasource_name" \ -d url='https://s3.amazonaws.com/nyc-tlc/trip+data/fhv_tripdata_2018-12.csv' When importing a remote file from a URL, the response contains the details of an import Job. To see the status of the import, use the [Jobs API](https://www.tinybird.co/docs/docs/api-reference/jobs-api). ### NDJSON and Parquet files¶ The Data Sources API doesn't support automatic schema inference for NDJSON and Parquet files. You must specify the `schema` parameter with a valid schema to parse the files. 
The schema for both NDJSON and Parquet files uses [JSONPaths](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-ndjson-data#jsonpaths) to identify columns in the data. You can add default values to the schema. Tinybird supports compressed NDJSON and Parquet files with .ndjson.gz and .parquet.gz extensions. You can use the [Analyze API](#generate-schemas-with-the-analyze-api) to automatically generate a schema definition from a file. For example, assume your NDJSON or Parquet data looks like this: ##### Simple NDJSON data example { "id": 123, "name": "Al Brown"} Your schema definition must provide the JSONPath expressions to identify the columns `id` and `name`: ##### Simple NDJSON schema definition id Int32 `json:$.id`, name String `json:$.name` To create a new Data Source from a local file using cURL, you must URL encode the schema as a query parameter. The following examples use NDJSON. To use Parquet, adjust the `format` parameter to `format=parquet`: ##### Creating a Data Source from a local NDJSON file curl \ -H "Authorization: Bearer " \ -X POST "https://api.tinybird.co/v0/datasources?name=events&mode=create&format=ndjson&schema=id%20Int32%20%60json%3A%24.id%60%2C%20name%20String%20%60json%3A%24.name%60" \ -F ndjson=@local_file.ndjson From a remote file: ##### Creating a Data Source from a remote NDJSON file curl \ -H "Authorization: Bearer " \ -X POST "https://api.tinybird.co/v0/datasources?name=events&mode=create&format=ndjson" \ --data-urlencode "schema=id Int32 \`json:$.id\`, name String \`json:$.name\`" \ -d url='http://example.com/file.json' The escape characters in this example are only required because the shell would otherwise interpret the backticks. When importing a remote file from a URL, the response contains the details of an import Job. To see the status of the import, use the [Jobs API](https://www.tinybird.co/docs/docs/api-reference/jobs-api). To add default values to the schema, use the `DEFAULT` parameter after the JSONPath expressions. For example: ##### Simple NDJSON schema definition with default values id Int32 `json:$.id` DEFAULT 1, name String `json:$.name` DEFAULT 'Unknown' ## Create an NDJSON Data Source from a schema using JSON type¶ You can create an NDJSON Data Source using the JSON column type through the `schema` parameter. For example: curl \ -H "Authorization: Bearer " \ -X POST "https://api.tinybird.co/v0/datasources" \ -d "name=example" \ -d "format=ndjson" \ -d "schema=data JSON `json:$`" ## Append a file into an existing Data Source¶ If you already have a Data Source, you can append the contents of a file to the existing data. This operation supports CSV, NDJSON, and Parquet files. You can append data from local or remote files. When appending CSV files, you can improve performance by excluding the CSV header line. However, in this case, you must ensure the CSV columns are ordered. If you can't guarantee the order of columns in your CSV, include the CSV header.
For example, to append data into an existing Data Source from a local file using cURL: ##### Appending data to a Data Source from a local CSV file curl \ -H "Authorization: Bearer " \ -X POST "https://api.tinybird.co/v0/datasources?mode=append&name=my_datasource_name" \ -F csv=@local_file.csv From a remote file: ##### Appending data to a Data Source from a remote CSV file curl \ -H "Authorization: Bearer " \ -X POST "https://api.tinybird.co/v0/datasources?mode=append&name=my_datasource_name" \ -d url='https://s3.amazonaws.com/nyc-tlc/trip+data/fhv_tripdata_2018-12.csv' If the Data Source has dependent Materialized Views, data is appended in cascade. ## Replace data in an existing Data Source with a file¶ If you already have a Data Source, you can replace existing data with the contents of a file. You can replace all data or a selection of data. This operation supports CSV, NDJSON, and Parquet files. You can replace with data from local or remote files. For example, to replace all the data in a Data Source with data from a local file using cURL: ##### Replacing a Data Source from a URL curl \ -H "Authorization: Bearer " \ -X POST "https://api.tinybird.co/v0/datasources?mode=replace&name=data_source_name&format=csv" \ -F csv=@local_file.csv From a remote file: ##### Replacing a Data Source from a URL curl \ -H "Authorization: Bearer " \ -X POST "https://api.tinybird.co/v0/datasources?mode=replace&name=data_source_name&format=csv" \ --data-urlencode "url=http://example.com/file.csv" Rather than replacing all data, you can also replace specific partitions of data. This operation is atomic. To do this, use the `replace_condition` parameter. This parameter defines the filter that's applied, where all matching rows are deleted before finally ingesting the new file. Only the rows matching the condition are ingested. If the source file contains rows that don't match the filter, the rows are ignored. Replacements are made by partition, so make sure that the `replace_condition` filters on the partition key of the Data Source. To replace filtered data in a Data Source with data from a local file using cURL, you must URL encode the `replace_condition` as a query parameter. For example: ##### Replace filtered data in a Data Source with data from a local file curl \ -H "Authorization: Bearer " \ -X POST "https://api.tinybird.co/v0/datasources?mode=replace&name=data_source_name&format=csv&replace_condition=my_partition_key%20%3E%20123" \ -F csv=@local_file.csv From a remote file: ##### Replace filtered data in a Data Source with data from a remote file curl \ -H "Authorization: Bearer " \ -X POST "https://api.tinybird.co/v0/datasources?mode=replace&name=data_source_name&format=csv" \ -d replace_condition='my_partition_key > 123' \ --data-urlencode "url=http://example.com/file.csv" All the dependencies of the Data Source, for example Materialized Views, are recalculated so that your data is consistent after the replacement. If you have n-level dependencies, they're also updated by this operation. Taking the example `A --> B --> C` , if you replace data in A, Data Sources B and C are automatically updated. The Partition Key of Data Source C must also be compatible with Data Source A. You can find more examples in the [Replace and delete data](https://www.tinybird.co/docs/docs/get-data-in/data-operations/replace-and-delete-data#replace-data-selectively) guide. 
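Imports and replacements that use a remote URL run as background jobs, so check the job status before relying on the result. A minimal sketch, assuming `<JOB_ID>` is the `id` returned in the response that created the job and the Token has access to the Jobs API:

```bash
# Check the status of an import or replace job with the Jobs API.
# <JOB_ID> is the "id" field from the response that created the job.
curl \
  -H "Authorization: Bearer <TOKEN>" \
  "https://api.tinybird.co/v0/jobs/<JOB_ID>"

# The response includes a "status" field you can poll until the job finishes
# (for example, "done") or fails (for example, "error").
```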
Although replacements are atomic, Tinybird can't assure data consistency if you continue appending data to any related Data Source at the same time the replacement takes place. The new incoming data is discarded. ## Creating an empty Data Source from a schema¶ When you want to have more granular control about the Data Source schema, you can manually create the Data Source with a specified schema. For example, to create an empty Data Source with a set schema using cURL: ##### Create an empty Data Source with a set schema curl \ -H "Authorization: Bearer " \ -X POST "https://api.tinybird.co/v0/datasources?name=stocks" \ -d "schema=symbol String, date Date, close Float32" To create an empty Data Source, you must pass a `schema` with your desired column names and types and leave the `url` parameter empty. ## Generate schemas with the Analyze API¶ The Analyze API can analyze a given NDJSON or Parquet file to produce a valid schema. The column names, types, and JSONPaths are inferred from the file. For example, to analyze a local NDJSON file using cURL: ##### analyze an NDJSON file to get a valid schema curl \ -H "Authorization: Bearer " \ -X POST "https://api.tinybird.co/v0/analyze" \ -F "ndjson=@local_file_path" The response contains a `schema` field that can be used to create your Data Source. For example: ##### Successful analyze response { "analysis": { "columns": [{ "path": "$.a_nested_array.nested_array[:]", "recommended_type": "Array(Int16)", "present_pct": 3, "name": "a_nested_array_nested_array" }, { "path": "$.an_array[:]", "recommended_type": "Array(Int16)", "present_pct": 3, "name": "an_array" }, { "path": "$.field", "recommended_type": "String", "present_pct": 1, "name": "field" }, { "path": "$.nested.nested_field", "recommended_type": "String", "present_pct": 1, "name": "nested_nested_field" } ], "schema": "a_nested_array_nested_array Array(Int16) `json:$.a_nested_array.nested_array[:]`, an_array Array(Int16) `json:$.an_array[:]`, field String `json:$.field`, nested_nested_field String `json:$.nested.nested_field`" }, "preview": { "meta": [{ "name": "a_nested_array_nested_array", "type": "Array(Int16)" }, { "name": "an_array", "type": "Array(Int16)" }, { "name": "field", "type": "String" }, { "name": "nested_nested_field", "type": "String" } ], "data": [{ "a_nested_array_nested_array": [ 1, 2, 3 ], "an_array": [ 1, 2, 3 ], "field": "test", "nested_nested_field": "bla" }], "rows": 1, "statistics": { "elapsed": 0.00032175, "rows_read": 2, "bytes_read": 142 } } } ## Error handling¶ Most errors return an HTTP Error code, for example `HTTP 4xx` or `HTTP 5xx`. However, if the imported file is valid, but some rows failed to ingest due to an incompatible schema, you might still receive an `HTTP 200` . In this case, the Response body contains two keys, `invalid_lines` and `quarantine_rows` , which tell you how many rows failed to ingest. Additionally, an `error` key is present with an error message. 
##### Successful ingestion with errors { "import_id": "e9ae235f-f139-43a6-7ad5-a1e17c0071c2", "datasource": { "id": "t_0ab7a11969fa4f67985cec481f71a5c2", "name": "your_datasource_name", "cluster": null, "tags": {}, "created_at": "2019-03-12 17:45:04", "updated_at": "2019-03-12 17:45:04", "statistics": { "bytes": 1397, "row_count": 4 }, "replicated": false, "version": 0, "project": null, "used_by": [] }, "error": "There was an error with file contents: 2 rows in quarantine and 2 invalid lines", "quarantine_rows": 2, "invalid_lines": 2 } --- URL: https://www.tinybird.co/docs/get-data-in/ingest-apis/events-api Last update: 2025-01-07T15:48:10.000Z Content: --- title: "Events API · Tinybird Docs" theme-color: "#171612" description: "Documentation for the Tinybird Events API" --- # Events API¶ The Events API enables high-throughput streaming ingestion into Tinybird from an easy-to-use HTTP API. This page gives examples of how to use the Events API to perform various tasks. For more information, read the [Events API Reference](https://www.tinybird.co/docs/docs/api-reference/events-api) docs. ## Send individual JSON events¶ You can send individual JSON events to the Events API by including the JSON event in the Request body. Supported event formats are JSON and NDJSON (newline delimited JSON). For example, to send an individual NDJSON event using cURL: ##### Sending individual NDJSON events curl \ -H "Authorization: Bearer " \ -d '{"date": "2020-04-05 00:05:38", "city": "Chicago"}' \ 'https://api.tinybird.co/v0/events?name=events_test' The `name` parameter defines the name of the Data Source in which to insert events. If the Data Source doesn't exist, Tinybird creates the Data Source by inferring the schema of the JSON. The Token used to send data to the Events API needs the appropriate scopes. To append data to an existing Data Source, the `DATASOURCE:APPEND` scope is required. If the Data Source doesn't already exist, the `DATASOURCE:CREATE` scope is required to create the new Data Source. ### Define the schema¶ Defining your schema allows you to set data types, sorting key, TTL and more. Read the [schema definition docs here](https://www.tinybird.co/docs/docs/get-data-in#define-the-schema-yourself). ## Send batches of JSON events¶ Sending batches of events enables you to achieve much higher total throughput than sending individual events. You can send batches of JSON events to the Events API by formatting the events as NDJSON (newline delimited JSON). Each individual JSON event should be separated by a newline ( `\n` ) character. ##### Sending batches of JSON events curl \ -H "Authorization: Bearer " \ -d $'{"date": "2020-04-05 00:05:38", "city": "Chicago"}\n{"date": "2020-04-05 00:07:22", "city": "Madrid"}\n' \ 'https://api.tinybird.co/v0/events?name=events_test' The `name` parameter defines the name of the Data Source in which to insert events. If the Data Source doesn't exist, Tinybird creates the Data Source by inferring the schema of the JSON. The Token used to send data to the Events API must have the appropriate scopes. To append data to an existing Data Source, the `DATASOURCE:APPEND` scope is required. If the Data Source doesn't already exist, the `DATASOURCE:CREATE` scope is required to create the new Data Source. ## Limits¶ The Events API delivers a default capacity of: - Up to 1000 requests/second per Data Source - Up to 20MB/s per Data Source - Up to 10MB per request per Data Source Throughput beyond these limits is offered as best-effort. 
The Events API can scale beyond these limits. If you are reaching these limits, contact [support@tinybird.co](mailto:support@tinybird.co). **Rate limit headers** | Header Name | Description | | --- | --- | | `X-RateLimit-Limit` | The maximum number of requests you're permitted to make in the current limit window. | | `X-RateLimit-Remaining` | The number of requests remaining in the current rate limit window. | | `X-RateLimit-Reset` | The time in seconds after the current rate limit window resets. | | `Retry-After` | The time to wait before making another request. Only present on 429 responses. | The Events API is a high-throughput, distributed streaming ingestion service, so the values in these headers are offered on a best-effort basis. ## Compression¶ NDJSON events sent to the Events API can be compressed with Gzip. However, it's only recommended to do this when necessary, such as when you have big events that are grouped into large batches. Compressing events adds overhead to the ingestion process, which can introduce latency, although it's typically minimal. Here is an example of sending a JSON event compressed with Gzip from the command line: echo '{"timestamp":"2022-10-27T11:43:02.099Z","transaction_id":"8d1e1533-6071-4b10-9cda-b8429c1c7a67","name":"Bobby Drake","email":"bobby.drake@pressure.io","age":42,"passport_number":3847665,"flight_from":"Barcelona","flight_to":"London","extra_bags":1,"flight_class":"economy","priority_boarding":false,"meal_choice":"vegetarian","seat_number":"15D","airline":"Red Balloon"}' | gzip > body.gz curl \ -X POST 'https://api.tinybird.co/v0/events?name=gzip_events_example' \ -H "Authorization: Bearer " \ -H "Content-Encoding: gzip" \ --data-binary @body.gz ## Write acknowledgements¶ When you send data to the Events API, you usually receive an `HTTP 202` response, which indicates that the request was successful. However, it doesn't confirm that the data has been committed to the underlying database. This is useful when guarantees on writes aren't strictly necessary. Typically, it should take under 2 seconds to receive a response from the Events API in this case. curl \ -X POST 'https://api.tinybird.co/v0/events?name=events_example' \ -H "Authorization: Bearer " \ -d $'{"timestamp":"2022-10-27T11:43:02.099Z"}' < HTTP/2 202 < content-type: application/json < content-length: 42 < {"successful_rows":2,"quarantined_rows":0} However, if your use case requires absolute guarantees that data is committed, use the `wait` parameter. The `wait` parameter is a boolean that accepts a value of `true` or `false` . A value of `false` is the default behavior, equivalent to omitting the parameter entirely. Using `wait=true` with your request asks the Events API to wait for acknowledgement that the data you sent has been committed to the underlying database. You then receive an `HTTP 200` response that confirms data has been committed. Note that adding `wait=true` to your request can result in a slower response time. Use a time-out of at least 10 seconds when waiting for the response. For example: curl \ -X POST 'https://api.tinybird.co/v0/events?name=events_example&wait=true' \ -H "Authorization: Bearer " \ -d $'{"timestamp":"2022-10-27T11:43:02.099Z"}' < HTTP/2 200 < content-type: application/json < content-length: 42 < {"successful_rows":2,"quarantined_rows":0} It is good practice to log your requests to, and responses from, the Events API. This gives you visibility into any failures for reporting or recovery.
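As a minimal sketch of that practice, the following shell snippet (not part of the original examples) sends a small NDJSON batch with `wait=true`, then logs the HTTP status code and response body so failures or quarantined rows are easy to spot later:

```bash
# Send a small NDJSON batch with wait=true and keep a log line with the outcome.
# <TOKEN> must be able to append to (or create) the events_example Data Source.
response=$(curl -s -w '\n%{http_code}' --max-time 30 \
  -X POST 'https://api.tinybird.co/v0/events?name=events_example&wait=true' \
  -H "Authorization: Bearer <TOKEN>" \
  -d $'{"timestamp":"2022-10-27T11:43:02.099Z"}\n{"timestamp":"2022-10-27T11:43:03.000Z"}\n')

status_code=$(echo "$response" | tail -n 1)   # last line is the HTTP status code
body=$(echo "$response" | sed '$d')           # the rest is the JSON response body

echo "$(date -u +%FT%TZ) events_api status=${status_code} body=${body}" >> events_api.log
```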
--- URL: https://www.tinybird.co/docs/get-data-in/migrate Content: --- title: "Migrate data into Tinybird · Tinybird Docs" theme-color: "#171612" description: "Learn how to migrate data into Tinybird from other data platforms." --- # Migrate data into Tinybird¶ Tinybird provides several options for migrating data from external platforms. Whether you're moving from a managed service like DoubleCloud, a real-time analytics platform like Rockset, or a traditional database like PostgreSQL, Tinybird offers migration paths to help you transition your data and workloads. Each migration guide provides detailed, step-by-step instructions for: - Moving your existing data into Tinybird. - Understanding how concepts and features map between platforms. - Setting up equivalent functionality in Tinybird. - Maintaining data consistency during the migration. The guides also cover important considerations like: - Data volume and performance requirements. - Schema management and data types. - Authentication and security. - Monitoring and observability. Select your current platform to get started with your migration to Tinybird. - [ Migrate from DoubleCloud](https://www.tinybird.co/docs/docs/get-data-in/migrate/migrate-from-doublecloud) - [ Migrate from Postgres](https://www.tinybird.co/docs/docs/get-data-in/migrate/migrate-from-postgres) - [ Migrate from Rockset](https://www.tinybird.co/docs/docs/get-data-in/migrate/migrate-from-rockset) --- URL: https://www.tinybird.co/docs/get-data-in/migrate/migrate-from-doublecloud Last update: 2025-01-17T07:58:01.000Z Content: --- title: "Migrate from DoubleCloud · Tinybird Docs" theme-color: "#171612" description: "In this guide, you'll learn how to migrate from DoubleCloud to Tinybird, and the overview of how to quickly & safely recreate your setup." --- # Migrate from DoubleCloud¶ In this guide, you'll learn how to migrate from DoubleCloud to Tinybird, and the overview of how to quickly & safely recreate your setup. DoubleCloud, a managed data services platform that offers ClickHouse® as a service, is [shutting down operations](https://double.cloud/blog/posts/2024/10/doublecloud-final-update/) . As of October 1, 2024 you can't create new DoubleCloud accounts, and all existing DoubleCloud services must be migrated by March 1, 2025. Tinybird offers a solution that can be a suitable alternative for existing users of DoubleCloud's ClickHouse service. Follow this guide to learn two approaches for migrating data from your DoubleCloud instance to Tinybird: 1. Option 1: Use the S3 table function to export data from DoubleCloud Managed ClickHouse to Amazon S3, then use the Tinybird S3 Connector to import data from S3. 2. Option 2: Export your ClickHouse tables locally, then import files into Tinybird using the Datasources API. Wondering how to create a Tinybird account? It's free! [Start here](https://www.tinybird.co/signup) . Need DoubleCloud migration assistance? Please [contact us](https://www.tinybird.co/doublecloud). ## Prerequisites¶ You don't need an active Tinybird Workspace to read through this guide, but it's good idea to understand the foundational concepts and how Tinybird integrates with your team. If you're new to Tinybird, read the [team integration guide](https://www.tinybird.co/docs/docs/get-started/administration/team-integration-governance). ## At a high level¶ Tinybird is a great alternative to DoubleCloud's managed ClickHouse implementation. 
Tinybird is a data platform built for data and engineering teams to solve complex real-time, operational, and user-facing analytics use cases at any scale, with end-to-end latency in milliseconds for streaming ingest and high QPS workloads. It offers the same or comparable performance as DoubleCloud, with additional features such as native, managed ingest connectors, multi-node SQL notebooks, and scalable REST APIs for public use or secured with JWTs. Tinybird is a managed platform that scales transparently, requiring no cluster operations, shard management, or worrying about replicas. See how Tinybird is used by industry-leading companies today in the [Customer Stories](https://www.tinybird.co/customer-stories). ## Migrate from DoubleCloud to Tinybird using Amazon S3¶ In this approach, you'll use the `s3` table function to export tables to an Amazon S3 bucket, and then import them into Tinybird with the S3 Connector. This guide assumes that you already have IAM Roles with the necessary permissions to write to the S3 bucket (from DoubleCloud) and to read from it (to Tinybird). ### Export your table to Amazon S3¶ In this guide, we're using a table on our DoubleCloud ClickHouse Cluster called `timeseriesdata` . The data has 3 columns and 1M rows. ![Example timeseries data table in DoubleCloud](/docs/img/migrate-from-doublecloud-1.png) You can export data in your DoubleCloud ClickHouse tables to Amazon S3 with the `s3` table function. Note: If you don't want to expose your AWS credentials in the query, use a named collection. INSERT INTO FUNCTION s3( 'https://tmp-doublecloud-migration.s3.us-east-1.amazonaws.com/exports/timeseriesdata.csv', 'AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'CSV' ) SELECT * FROM timeseriesdata SETTINGS s3_create_new_file_on_insert = 1 ### Import to Tinybird with the S3 Connector¶ Once your table is exported to Amazon S3, import it to Tinybird using the [Amazon S3 Connector](https://www.tinybird.co/docs/docs/get-data-in/connectors/s3). ![Select the S3 Connector in the Tinybird UI](/docs/img/migrate-from-doublecloud-2.png) The basic steps for using the S3 Connector are: 1. Define an S3 Connection with an IAM Policy and Role that allow Tinybird to read from S3. Tinybird will automatically generate the JSON for these policies. ![Create an S3 Connection with automatically generated IAM policies](/docs/img/migrate-from-doublecloud-3.png) 1. Supply the file URI (with wildcards as necessary) to define the file(s) containing the contents of your ClickHouse table(s). ![Specify the file URI for the files containing your ClickHouse tables](/docs/img/migrate-from-doublecloud-4.png) 1. Create an On Demand (one-time) sync. 2. Define the schema of the resulting table in Tinybird. You can do this within the S3 Connector UI… ![Define the schema of your tables within the Tinybird UI](/docs/img/migrate-from-doublecloud-5.png) ...or by creating a .datasource file and pushing it to Tinybird.
An example .datasource file for the `timeseriesdata` table that matches the DoubleCloud schema and creates the import job from the existing S3 Connection would look like this: SCHEMA > `tank_id` String, `volume` Float32, `usage` Float32 ENGINE "MergeTree" ENGINE_SORTING_KEY "tank_id" IMPORT_SERVICE 's3_iamrole' IMPORT_CONNECTION_NAME 'DoubleCloudS3' IMPORT_BUCKET_URI 's3://tmp-doublecloud-migration/timeseriesdata.csv' IMPORT_STRATEGY 'append' IMPORT_SCHEDULE '@on-demand' 1. Tinybird will then create and run a batch import job to ingest the data from Amazon S3 and create a new table that matches your table in DoubleCloud. You can monitor the job from the `datasources_ops_log` Service Data Source. ## Migrate from DoubleCloud to Tinybird using local exports¶ Depending on the size of your tables, you might be able to simply export your tables to a local file using `clickhouse-client` and ingest them into Tinybird directly. ### Export your tables from DoubleCloud using clickhouse-client¶ First, use `clickhouse-client` to export your tables into local files. Depending on the size of your data, you can choose to compress as necessary. Tinybird can ingest CSV (including Gzipped CSV), NDJSON, and Parquet files. ./clickhouse client --host your_doublecloud_host --port 9440 --secure --user your_doublecloud_user --password your_doublecloud_password --query "SELECT * FROM timeseriesdata" --format CSV > timeseriesdata.csv ### Import your files to Tinybird¶ You can drag and drop files into the Tinybird UI… ![Drag and drop a file into the Tinybird UI to create a file-based Data Source](/docs/img/migrate-from-doublecloud-6.png) ...or upload them using the Tinybird CLI: tb datasource generate timeseriesdata.csv tb push datasources/timeseriesdata.datasource tb datasource append timeseriesdata timeseriesdata.csv Note that Tinybird will automatically infer the appropriate schema from the supplied file, but you may need to change the column names, data types, table engine, and sorting key to match your table in DoubleCloud. ## Migration support¶ If your migration is more complex, involving many or very large tables, materialized views + populates, or other complex logic, please [contact us](https://www.tinybird.co/doublecloud) and we will assist with your migration. ## Tinybird Pricing vs DoubleCloud¶ Tinybird's Free plan is free, with no time limit or credit card required. The Free plan includes 10 GB of data storage (compressed) and 1,000 published API requests per day. Tinybird's paid plans are available with both infrastructure-based pricing and usage-based pricing. DoubleCloud customers will likely be more familiar with infrastructure-based pricing. For more information about infrastructure-based pricing and to get a quote based on your existing DoubleCloud cluster, [contact us](https://www.tinybird.co/doublecloud). If you are interested in usage-based pricing, you can learn more about [usage-based billing here](https://www.tinybird.co/docs/docs/get-started/plans/billing). ### ClickHouse Limits¶ Note that Tinybird takes a different approach to ClickHouse deployment than DoubleCloud. Rather than provide a full interface to a hosted ClickHouse cluster, Tinybird provides a serverless ClickHouse implementation and abstracts the database interface via our [APIs](https://www.tinybird.co/docs/docs/api-reference) , UI, and CLI, only exposing the SQL Editor within our [Pipes](https://www.tinybird.co/docs/docs/work-with-data/query/pipes) interface.
Additionally, not all ClickHouse SQL functions, data types, and table engines are supported out of the box. You can find a full list of [supported engines and settings here](https://www.tinybird.co/docs/docs/get-data-in/data-sources#supported-engines-settings) . If your use case requires engines or settings that aren't listed, please [contact us](https://www.tinybird.co/doublecloud). ## Useful resources¶ Migrating to a new tool, especially at speed, can be challenging. Here are some helpful resources to get started on Tinybird: - [ Read how Tinybird compares to ClickHouse (especially ClickHouse Cloud)](https://www.tinybird.co/blog-posts/tinybird-vs-clickhouse) . - [ Read how Tinybird compares to other Managed ClickHouse offerings](https://www.tinybird.co/blog-posts/managed-clickhouse-options) . - Join our[ Slack Community](https://www.tinybird.co/community) for help understanding Tinybird concepts. - [ Contact us](https://www.tinybird.co/doublecloud) for migration assistance. ## Next steps¶ If you'd like assistance with your migration, [contact us](https://www.tinybird.co/doublecloud). - Set up a free Tinybird account and build a working prototype:[ Sign up here](https://www.tinybird.co/signup) . - Run through a quick example with your free account: Tinybird[ quick start](https://www.tinybird.co/docs/docs/get-started/quick-start) . - Read the[ billing docs](https://www.tinybird.co/docs/docs/get-started/plans/billing) to understand plans and pricing on Tinybird. Tinybird is not affiliated with, associated with, or sponsored by ClickHouse, Inc. ClickHouse® is a registered trademark of ClickHouse, Inc. --- URL: https://www.tinybird.co/docs/get-data-in/migrate/migrate-from-postgres Last update: 2025-01-17T07:58:01.000Z Content: --- title: "Migrate from Postgres · Tinybird Docs" theme-color: "#171612" description: "In this guide, you'll learn how to migrate events from Postgres to Tinybird so that you can begin building performant, real-time analytics over your event data." --- # Migrate from Postgres¶ In this guide, you'll learn how to migrate events from Postgres to Tinybird so that you can begin building performant, real-time analytics over your event data. Need to create a Tinybird account? It's free! [Start here](https://www.tinybird.co/signup). ## Prerequisites¶ You'll need a [free Tinybird account](https://www.tinybird.co/signup) and a Workspace. ## At a high level¶ Postgres is an incredible general purpose database, and it can even be extended to support columnar functionality for analytics. Tinybird is a data platform for data and engineering teams to solve complex real-time, operational, and user-facing analytics use cases at any scale, with end-to-end latency in milliseconds for streaming ingest and high QPS workloads. It's a SQL-first analytics engine, purpose-built for the cloud, with real-time data ingest and full JOIN support. Native, managed ingest connectors make it easy to ingest data from a variety of sources. SQL queries can be published as production-grade, scalable REST APIs for public use or secured with JWTs. Tinybird is a managed platform that scales transparently, requiring no cluster operations, shard management, or worrying about replicas. See how Tinybird is used by industry-leading companies today in the [Customer Stories](https://www.tinybird.co/customer-stories) hub. ## Follow these steps to migrate from Postgres to Tinybird¶ Below you'll find an example walkthrough migrating 100M rows of events data from Postgres to Tinybird. 
You can apply the same workflow to your existing Postgres instance. If at any point you get stuck and would like assistance with your migration, contact Tinybird at [support@tinybird.co](mailto:support@tinybird.co) or in the [Slack Community](https://www.tinybird.co/docs/docs/community). ### The Postgres table¶ Suppose you have a table in Postgres that looks like this: postgres=# CREATE TABLE events ( id SERIAL PRIMARY KEY, timestamp TIMESTAMPTZ NOT NULL, user_id TEXT NOT NULL, session_id TEXT NOT NULL, action TEXT NOT NULL, version TEXT NOT NULL, payload TEXT NOT NULL ); The table contains 100 million rows totalling about 15GB of data: postgres=# SELECT pg_size_pretty(pg_relation_size('events')) AS size; size ------- 15 GB (1 row) The table stores website click events, including an unstructured JSON `payload` column. ### Setup¶ Within your Postgres, create a user with read-only permissions on the table (or tables) you need to export: postgres=# CREATE USER tb_read_user WITH PASSWORD ''; postgres=# GRANT CONNECT ON DATABASE test_db TO tb_read_user; postgres=# GRANT USAGE ON SCHEMA public TO tb_read_user; postgres=# GRANT SELECT ON TABLE events TO tb_read_user; #### Limits¶ To perform this migration, we'll be running a series of [Copy Jobs](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/copy-pipes) to incrementally migrate the events from Postgres to Tinybird. We break it up into chunks so as to remain under the limits of both Tinybird and Postgres. There are two limits to take into account: 1. [ Copy Pipe limits](https://www.tinybird.co/docs/docs/get-started/plans/limits#copy-pipe-limits) : Copy Pipes have a default max execution time of 20s for Build plans, 30s for Pro plans, and 30m for Enterprise plans. If you're on a Build or Pro plan and need to temporarily extend your limits to perform the migration, please reach out to us at [support@tinybird.co](mailto:support@tinybird.co) . 2. The max execution time of queries in Postgres. This is controlled by the `statement_timeout` setting. We recommend setting the value in Postgres equal or close to the max execution time of the Copy Pipe in Tinybird. For this example, we'll use three minutes: postgres=# ALTER ROLE tb_read_user SET statement_timeout = '180000'; -- 3 minutes #### Create a local Tinybird project¶ Install [Tinybird CLI](https://www.tinybird.co/docs/docs/cli/install) , then create a new Data Project: export TB_ADMIN_TOKEN= export TB_HOST=https://api.us-east.aws.tinybird.co #replace with your host tb auth --host $TB_HOST --token $TB_ADMIN_TOKEN tb init Create the target Data Source in Tinybird: touch datasources/events.datasource Define a schema that matches your Postgres schema, keeping in mind that Tinybird may use different data types. For our example: # datasources/events.datasource SCHEMA > `id` Int32, `timestamp` DateTime64(6), `user_id` String, `session_id` String, `action` String, `version` String, `payload` String ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYear(timestamp)" ENGINE_SORTING_KEY "timestamp, session_id, user_id" Push the Data Source to the Tinybird server: tb push datasources/events.datasource ### Backfilling your existing Postgres data¶ We're going to create a parameterized Copy Pipe to perform the initial backfill in chunks. We'll use a script to run the Copy Job on demand.
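Before running any backfill jobs, it can be worth a quick sanity check that the empty `events` Data Source pushed in the previous step exists in your Workspace. A small sketch using the CLI commands used elsewhere in this guide (plus `tb datasource ls`, assuming the authenticated Token can list Data Sources):

```bash
# List the Data Sources in the Workspace; events should appear in the output.
tb datasource ls

# The target Data Source should still be empty at this point.
tb sql "SELECT count() FROM events"
```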
#### Storing secrets in Tinybird¶ Start by adding two secrets to Tinybird using the [Environment Variables API](https://www.tinybird.co/docs/docs/api-reference/environment-variables-api) . This will prevent hard-coded credentials in your Copy Pipe. Create one for your Postgres username: curl \ -X POST "${TB_HOST}/v0/variables" \ -H "Authorization: Bearer ${TB_ADMIN_TOKEN}" \ -d "type=secret" \ -d "name=tb_read_user" \ -d "value=tb_read_user" And one for the password: curl \ -X POST "${TB_HOST}/v0/variables" \ -H "Authorization: Bearer ${TB_ADMIN_TOKEN}" \ -d "type=secret" \ -d "name=tb_read_password" \ -d "value=" #### Define the Copy Pipe¶ Create a new Pipe: touch pipes/backfill_postgres.pipe And paste the following code, changing the url/port, name, and table name of your Postgres based on your specific setup: NODE migrate SQL > % SELECT * FROM postgresql( 'https://your.postgres.url::port', 'your_postgres_instance_name', 'your_postgres_table name', {{tb_secret('tb_read_user')}}, {{tb_secret('tb_read_password')}}, 'public' ) WHERE timestamp > {{DateTime(from_date, '2020-01-01 00:00:00')}} --adjust based on your data AND timestamp <= {{DateTime(to_date, '2020-01-01 00:00:01')}} --use a small default range TYPE COPY TARGET_DATASOURCE events This uses the [PostgreSQL Table Function](https://www.tinybird.co/docs/docs/get-data-in/guides/postgresql) to select data from the remote Postgres table. It pushes the timestamp filters down to Postgres, incrementally querying your Postgres table and copying them into your `events` Data Source in Tinybird. Push this Pipe to the server: tb push pipes/backfill_postgres.pipe ### Backfill in one go¶ Depending on the size of your Postgres table, you may be able to perform the migration in a single Copy Job. For example, get the minimum timestamp from Postgres (and the current datetime): postgres=# SELECT min(timestamp) FROM events; min ------------------------ 2023-01-01 00:00:00+00 (1 row) ❯ date -u +"%Y-%m-%d %H:%M:%S" 2024-08-29 10:20:57 And run the Copy Job with those parameters: tb pipe copy run migrate_pg_to_events --param from_date="2023-01-01 00:00:00" --param to_date="2024-08-29 10:20:57" --wait --yes If it succeeds, you'll see something like this: ** Running migrate_pg_to_events ** Copy to 'events' job created: https://api.us-east.aws.tinybird.co/v0/jobs/4dd482f9-168b-44f7-a4c9-d1b64fc9665d ** Copying data [████████████████████████████████████] 100% ** Data copied to 'events' And you'll be able to query the resulting Data Source: tb sql "select count() from events" ------------- | count() | ------------- | 100000000 | ------------- tb sql "select count() as c, action from events group by action order by c asc" --stats ** Query took 0.228730096 seconds ** Rows read: 100,000,000 ** Bytes read: 1.48 GB ----------------------- | c | action | ----------------------- | 19996881 | logout | | 19997421 | signup | | 20000982 | purchase | | 20001649 | view | | 20003067 | click | ----------------------- Note that Copy operations in Tinybird are atomic, so a bulk backfill will either succeed or fail completely with some error. 
For instance, if the `statement_timeout` in Postgres isn't large enough to export the table with a single query, you'll get an error like this: ** Copy to 'copy_migrate_events_from_pg' job created: https://api.us-east.aws.tinybird.co/v0/jobs/ec58749a-f4c3-4302-9236-f8036f0cb67b ** Copying data Error: ** Failed creating copy job: ** Error while running job: There was a problem while copying data: [Error] Query cancelled due to statement timeout in postgres. Make sure you use a user with a proper statement timeout to run this type of query. In this case, you can try to increase the `statement_timeout` or do the backfill in chunks. As a reference, copying 100M rows from Postgres to Tinybird takes about 150s if Postgres and Tinybird are in the same cloud and region. The Tinybird PostgreSQL Table Function internally uses a PostgreSQL `COPY TO` statement. You can tweak other Postgres settings if necessary, but it's usually not needed; if in doubt, refer to your Postgres provider or admin. ### Backfilling in chunks¶ If you find that you're hitting the limits of either your Postgres or Tinybird's Copy Pipes, you can backfill in chunks. First of all, make sure your Postgres table is indexed by the column you are filtering on, in this case `timestamp`: postgres=# CREATE INDEX idx_events_timestamp ON events (timestamp); postgres=# VACUUM ANALYZE events; And make sure a query like the one sent from Tinybird will use the index (see the Index Scan below): postgres=# explain select * from events where timestamp > '2024-01-01 00:00:00' and timestamp <= '2024-01-02 00:00:00'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------ Index Scan using idx_events_timestamp on events (cost=0.57..607150.89 rows=151690 width=115) Index Cond: (("timestamp" > '2024-01-01 00:00:00+00'::timestamp with time zone) AND ("timestamp" <= '2024-01-02 00:00:00+00'::timestamp with time zone)) JIT: Functions: 2 Options: Inlining true, Optimization true, Expressions true, Deforming true (5 rows) Then run multiple Copy jobs, adjusting the amount of data copied to stay within your Postgres statement timeout and Tinybird max execution time. This is a trial-and-error process that depends on the granularity of your data. For example, here's a migration script that first tries a full backfill, and if it fails uses daily chunks: #!/bin/bash HOST="YOUR_TB_HOST" TOKEN="YOUR_TB_TOKEN" PIPE_NAME="backfill_postgres" FROM_DATE="2023-01-01 00:00:00" TO_DATE="2024-08-31 00:00:00" LOG_FILE="pipe_copy.log" run_command() { local from_date="$1" local to_date="$2" echo "Copying from $from_date to $to_date" | tee -a $LOG_FILE if output=$(tb --host $HOST --token $TOKEN pipe copy run $PIPE_NAME --param from_date="$from_date" --param to_date="$to_date" --wait --yes 2>&1); then echo "Success $from_date - $to_date" | tee -a $LOG_FILE return 0 else echo "Error $from_date - $to_date" | tee -a $LOG_FILE echo "Error detail: $output" | tee -a $LOG_FILE return 1 fi } iterate_chunks() { local from_date="$1" local to_date="$2" local current_from="$from_date" local next_to="" while [[ "$(date -d "$current_from" +"%s")" -lt "$(date -d "$to_date" +"%s")" ]]; do # End of current day (23:59:59) next_to=$(date -d "$current_from +1 day -1 second" +"%Y-%m-%d")" 23:59:59" # Adjust next_to if it's bigger than to_date if [[ "$(date -d "$next_to" +"%s")" -ge "$(date -d "$to_date" +"%s")" ]]; then next_to="$to_date" fi # Create copy job for one single day if !
run_command "$current_from" "$next_to"; then echo "Error processing $current_from to $next_to" return 1 fi # Go to next day (starting at 00:00:00) current_from=$(date -d "$(date -d "$current_from" +'%Y-%m-%d') +1 day $(date -d "$current_from" +'%H:%M:%S')" +'%Y-%m-%d %H:%M:%S') done } # Step 1: Try full backfill echo "Running full backfill..." | tee -a $LOG_FILE if ! run_command "$FROM_DATE" "$TO_DATE"; then echo "Full backfill failed, iterating in daily chunks..." | tee -a $LOG_FILE iterate_chunks "$FROM_DATE" "$TO_DATE" fi echo "Process completed." | tee -a $LOG_FILE Using either a full backfill or backfilling in chunks, you can successfully migrate your data from Postgres to Tinybird. ### Syncing new events from Postgres to Tinybird¶ The next step is keeping your Tinybird Data Source in sync with events in your Postgres as new events arrive. The steps below will show you how to use Tinybird's PostgreSQL Table Function and scheduled Copy Jobs to continually sync data from Postgres to Tinybird, however, you should consider sending future events Tinybird directly using either the [Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) or another streaming Data Source connector, as this will be more resource efficient (and more real-time). #### Create the incremental Copy Pipe¶ Create another Copy Pipe to perform the incremental syncs: touch pipes/sync_events_from_pg.pipe Paste in this code, again updating your Postgres details as well as the desired schedule to sync. Note the Copy limits apply here. NODE sync_from_pg SQL > % SELECT * FROM postgresql( 'https://your.postgres.url::port', 'your_postgres_instance_name', 'your_postgres_table name', {{tb_secret('tb_read_user')}}, {{tb_secret('tb_read_password')}}, 'public' ) WHERE timestamp > (SELECT max(timestamp) FROM events) TYPE COPY TARGET_DATASOURCE events COPY_SCHEDULE */5 * * * * Push this to the Tinybird server: tb push pipes/sync_events_from_pg.pipe It's important to first complete the backfill operation before pushing the sync Pipe. The sync Pipe uses the latest timestamp in the Tinybird copy to perform a filtered select from Postgres. Failure to backfill will result in a full scan of your Postgres table on your configured schedule. Once you've pushed this Pipe, Tinybird will sync with your Postgres updates based on the schedule you set. ## Next steps¶ If you'd like assistance with your migration, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). - Set up a free Tinybird account and build a working prototype:[ Sign up here](https://www.tinybird.co/signup) . - Run through a quick example with your free account: Tinybird[ quick start](https://www.tinybird.co/docs/docs/get-started/quick-start) . - Read the[ billing docs](https://www.tinybird.co/docs/docs/get-started/plans/billing) to understand plans and pricing on Tinybird. --- URL: https://www.tinybird.co/docs/get-data-in/migrate/migrate-from-rockset Last update: 2025-01-17T07:58:01.000Z Content: --- title: "Migrate from Rockset · Tinybird Docs" theme-color: "#171612" description: "In this guide, you'll learn how to migrate from Rockset to Tinybird, and the overview of how to quickly & safely recreate your setup." --- # Migrate from Rockset¶ In this guide, you'll learn how to migrate from Rockset to Tinybird, and the overview of how to quickly & safely recreate your setup. 
Rockset will [no longer be active](https://docs.rockset.com/documentation/docs/faq) after September 30th, 2024. This guide explains the parallels between Rockset and Tinybird features, and how to migrate to using Tinybird. Wondering how to create an account? It's free! [Start here](https://www.tinybird.co/signup). ## Prerequisites¶ You don't need an active Tinybird Workspace to read through this guide, but it's good idea to understand the foundational concepts and how Tinybird integrates with your team. If you're new to Tinybird, read the [team integration guide](https://www.tinybird.co/docs/docs/get-started/administration/team-integration-governance). ## At a high level¶ Tinybird is a great alternative to Rockset's analytical capabilities. Tinybird is a data platform for data and engineering teams to solve complex real-time, operational, and user-facing analytics use cases at any scale, with end-to-end latency in milliseconds for streaming ingest and high QPS workloads. It's a SQL-first analytics engine, purpose-built for the cloud, with real-time data ingest and full JOIN support. Native, managed ingest connectors make it easy to ingest data from a variety of sources. SQL queries can be published as production-grade, scalable REST APIs for public use or secured with JWTs. Tinybird is a managed platform that scales transparently, requiring no cluster operations, shard management or worrying about replicas. See how Tinybird is used by industry-leading companies today in the [Customer Stories](https://www.tinybird.co/customer-stories) hub. ## Concepts¶ A lot of concepts are the same between Rockset and Tinybird, and there are a handful of others that have a 1:1 mapping. In Tinybird: - Data Source: Where data is ingested and stored. - Pipe: How data is transformed. - Workspace: How data projects are organized, containing Data Sources and Pipes. - Shared Data Source: A Data Source shared between Workspaces. - Roles: Each Workspace has Admin, Guest, and Viewer roles. - Organizations: Contain all Workspaces and members. ### Key concept comparison¶ #### Data Sources¶ Super similar. Rockset and Tinybird both support ingesting data from many types of data sources. You ingest into Tinybird and create a Tinybird **Data Source** that you then have control over - you can iterate the schema, monitor your ingestion, and more. See the [Data Sources docs](https://www.tinybird.co/docs/docs/get-data-in/data-sources). #### Workspaces¶ Again, very similar. In Rockset, Workspaces contain resources like Collections, Aliases, Views, and Query Lambdas. In Tinybird, **Workspaces** serve the same purpose (holding resources), and you can also share Data Sources between *multiple* Workspaces. Enterprise users monitor and manage Workspaces using the [Organizations feature](https://www.tinybird.co/docs/docs/get-started/administration/organizations) . See the [Workspace docs](https://www.tinybird.co/docs/docs/get-started/administration/workspaces#what-is-a-workspace). #### Ingest Transformations¶ These are analogous to Tinybird's **Pipes** . It's where you transform your data. The difference is that Rockset does this on initial load (on raw data), whereas Tinybird lets you create and manage a Data Source first, then transform it however you need. See the [Pipes docs](https://www.tinybird.co/docs/docs/work-with-data/query/pipes). #### Views¶ Similar to Tinybird's **nodes** - the modular, chainable "bricks" of SQL queries that compose a Pipe. 
Like Views, nodes can reference resources like other nodes, Pipes, Data Sources, and more. See the [Pipes > Nodes docs](https://www.tinybird.co/docs/docs/work-with-data/query/pipes#nodes). #### Rollups¶ The Tinybird equivalent of rollups is **Materialized Views** . Materialized Views give you a way to pre-aggregate and pre-filter large Data Sources incrementally, adding simple logic using SQL to produce a more relevant Data Source with significantly fewer rows. Put simply, Materialized Views shift computational load from query time to ingestion time, so your API Endpoints stay fast. See the [Materialized Views docs](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/materialized-views). #### Query Lambdas¶ The Tinybird equivalent of Query Lambdas is **API Endpoints** . You can publish the result of any SQL query in your Tinybird Workspace as an HTTP API Endpoint. See the [API Endpoint docs](https://www.tinybird.co/docs/docs/publish/api-endpoints). ### Schemaless ingestion¶ You can do schemaless/variable schema event ingestion on Tinybird by storing the whole JSON in a column. Use the following schema in your Data Source definition and use [JSONExtract functions](https://www.tinybird.co/docs/docs/sql-reference/functions/json-functions#jsonextract-functions) to parse the result afterwards. ##### schemaless.datasource SCHEMA > `root` String `json:$` ENGINE "MergeTree" If your data has some common fields, be sure to extract them and add them to the sorting key. It's definitely possible to do schemaless, but having a defined schema is a great idea. Tinybird provides you with an easy way to manage your schema [using .datasource schema files](https://www.tinybird.co/docs/docs/get-data-in#create-your-schema). Read the docs on using the [JSONPath syntax in Tinybird](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-ndjson-data#jsonpaths-and-the-root-object) for more information. ## Ingest data and build a POC¶ Tinybird allows you to ingest your data from a variety of sources, then create Tinybird Data Sources in your Workspace that can be queried, published, materialized, and more. Just like Rockset, Tinybird supports ingestion from: - Data streams (Kafka, Kinesis). - OLTP databases (DynamoDB, MongoDB, MySQL, PostgreSQL). - Data lakes (S3, GCS). A popular option is connecting DynamoDB to Tinybird. Follow [the guide here](https://www.tinybird.co/docs/docs/get-data-in/connectors/dynamodb) or pick another source from the side nav under "Ingest". Materialized Views give you a way to pre-aggregate and pre-filter large Data Sources incrementally, adding simple logic using SQL to produce a more relevant Data Source with significantly fewer rows. Put simply, Materialized Views shift computational load from query time to ingestion time, so your API Endpoints stay fast. ## Useful resources¶ Migrating to a new tool, especially at speed, can be challenging. Here are some helpful resources to get started on Tinybird: - Set up a[ DynamoDB Data Source](https://www.tinybird.co/docs/docs/get-data-in/connectors/dynamodb) to start streaming data today. - Read the blog post[ "Migrating from Rockset? See how Tinybird features compare"](https://www.tinybird.co/blog-posts/migrating-from-rockset-feature-comparison) . - Read the blog post[ "A practical guide to real-time CDC with MongoDB"](https://www.tinybird.co/blog-posts/mongodb-cdc) . 
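To make the schemaless pattern described above concrete, here is a minimal sketch (not from the original docs) that sends a raw JSON event to a Data Source defined with the single `root` column and then extracts fields at query time with `JSONExtractString`. The Data Source name `schemaless` and the `<TOKEN>` placeholder are assumptions.

```bash
# Send an arbitrary JSON event; the whole payload is stored in the root column.
curl \
  -X POST "https://api.tinybird.co/v0/events?name=schemaless" \
  -H "Authorization: Bearer <TOKEN>" \
  -d '{"event": "page_view", "path": "/pricing", "user": {"id": "u_42"}}'

# Parse fields out of the stored JSON at query time.
curl -s -G "https://api.tinybird.co/v0/sql" \
  -H "Authorization: Bearer <TOKEN>" \
  --data-urlencode "q=
    SELECT
        JSONExtractString(root, 'event') AS event,
        JSONExtractString(root, 'path') AS path
    FROM schemaless
    LIMIT 10
    FORMAT JSON"
```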
## Billing and limits¶ Read the [billing docs](https://www.tinybird.co/docs/docs/get-started/plans/billing) to understand how Tinybird charges for different data operations. Remember, [UI usage is free](https://www.tinybird.co/docs/docs/get-started/plans/billing#exceptions) (Pipes, Playgrounds, Time Series - anywhere you can hit a "Run" button) as is anything on a [Free plan](https://www.tinybird.co/docs/docs/get-started/plans) so get started today for free and iterate ***fast***. Check the [limits page](https://www.tinybird.co/docs/docs/get-started/plans/limits) for limits on ingestion, queries, API Endpoints, and more. ## Next steps¶ If you'd like assistance with your migration, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). - Set up a free Tinybird account and build a working prototype:[ Sign up here](https://www.tinybird.co/signup) . - Run through a quick example with your free account: Tinybird[ quick start](https://www.tinybird.co/docs/docs/get-started/quick-start) . - Read the[ billing docs](https://www.tinybird.co/docs/docs/get-started/plans/billing) to understand plans and pricing on Tinybird. --- URL: https://www.tinybird.co/docs/get-started Content: --- title: "Get started · Tinybird Docs" theme-color: "#171612" description: "Get started with Tinybird, the data platform for building real-time data applications." --- # Get started¶ The following resources will help you get started with Tinybird: - Follow the Quick start guide. See[ Quick start](https://www.tinybird.co/docs/docs/get-started/quick-start) . - Learn about the different plans and pricing options. See[ Plans](https://www.tinybird.co/docs/docs/get-started/plans) . - Discover all the integrations available in Tinybird. See[ Integrations](https://www.tinybird.co/docs/docs/get-started/integrations) . - Familiarize yourself with how organizations and workspaces work in Tinybird. See[ Administration](https://www.tinybird.co/docs/docs/get-started/administration) . --- URL: https://www.tinybird.co/docs/get-started/administration Content: --- title: "Administration · Tinybird Docs" theme-color: "#171612" description: "Manage your organization and workspace settings, create tokens, invite users, and more." --- # Administration¶ Effective administration of your Tinybird account involves managing several key components: - [ Organizations](https://www.tinybird.co/docs/docs/get-started/administration/organizations) : Create and manage organizations to group workspaces and team members. - [ Workspaces](https://www.tinybird.co/docs/docs/get-started/administration/workspaces) : Set up isolated environments for your data projects. - [ Auth Tokens](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens) : Generate and control access tokens for secure API authentication. - [ Governance](https://www.tinybird.co/docs/docs/get-started/administration/team-integration-governance) : Implement best practices for team collaboration and security --- URL: https://www.tinybird.co/docs/get-started/administration/auth-tokens Last update: 2025-01-20T11:43:08.000Z Content: --- title: "Tokens · Tinybird Docs" theme-color: "#171612" description: "Tokens allow you to secure your Tinybird resources, providing fine-grained access control to your users. Learn all about Tokens here!" --- # Tokens¶ Tokens protect access to your Tinybird resources. 
Any operations to manage your Tinybird resources using the CLI or REST API require a valid Token with the necessary permissions. Access to the APIs you publish in Tinybird are also protected with Tokens. Tokens can have different scopes. This means you can limit which operations a specific Token can do. You can create Tokens that are, for example, only able to do admin operations on Tinybird resources, or only have `READ` permission for a specific Data Source. Tinybird represents tokens using the icon. ## Token types¶ There are two types of Tokens: - Static[ Tokens](https://www.tinybird.co/docs/about:blank#tokens) : Used when performing operations on your account, like importing data, creating Data Sources, or publishing APIs using the CLI or REST API. - [ JWT Tokens](https://www.tinybird.co/docs/about:blank#json-web-tokens-jwts) : Used when publishing an API that exposes your data to an application. ## Tokens¶ Tinybird Tokens, also known as static Tokens, are permanent and long-term. They're stored inside Tinybird and don't have an expiration date or time. They're useful for backend-to-backend integrations, where you call Tinybird as another service. ### Token scopes¶ When a Token is created, you can give it a set of zero or more scopes that define which tables can be accessed by that Token, and which methods can be used to access them. A `READ` Token can be augmented with an SQL filter. A filter allows you to further restrict what data a Token grants access to. Using a filter, you can also [implement row-level security](https://www.tinybird.co/blog-posts/row-level-security-in-tinybird) on a `READ` -scoped Token. The following scopes are available: | Value | Description | | --- | --- | | `DATASOURCES:CREATE` | Enables your Token to create and append data to Data Sources. | | `DATASOURCES:APPEND:datasource_name` | Allows your Token to append data to the defined Data Sources. | | `DATASOURCES:DROP:datasource_name` | Allows your Token to delete the specified Data Sources. | | `DATASOURCES:READ:datasource_name` | Gives your Token read permissions for the specified Data Sources. Also gives read for the quarantine Data Source. | | `DATASOURCES:READ:datasource_name:sql_filter` | Gives your Token read permissions for the specified table with the `sql_filter` applied. | | `PIPES:CREATE` | Allows your Token to create new Pipes and manipulate existing ones. | | `PIPES:DROP:pipe_name` | Allows your Token to delete the specified Pipe. | | `PIPES:READ:pipe_name` | Gives your Token read permissions for the specified Pipe. | | `PIPES:READ:pipe_name:sql_filter` | Gives your Token read permissions for the specified Pipe with the `sql_filter` applied. | | `TOKENS` | Gives your Token the capacity of managing Tokens. | | `ADMIN` | All permissions are granted. Do not use this Token except in really specific cases. | | `ORG_DATASOURCES:READ` | Gives your Token the capacity of reading[ organization service datasources](https://www.tinybird.co/docs/docs/monitoring/service-datasources#organization-service-data-sources) , without using an org admin-level token. | When adding the `DATASOURCES:READ` scope to a Token, it automatically gives read permissions to the [quarantine Data Source](https://www.tinybird.co/docs/docs/get-data-in/data-sources#the-quarantine-data-source) associated with it. Applying Tokens with filters to queries that use the FINAL clause isn't supported. If you need to apply auth filters to deduplications, use an alternative strategy. 
See [deduplication strategies](https://www.tinybird.co/docs/docs/work-with-data/strategies/deduplication-strategies#different-alternatives-based-on-your-requirements). ### Default Workspace Tokens¶ All Workspaces are created with a set of basic Tokens that you can add to by creating additional Tokens: - `admin token` for that Workspace, used for signing JWTs. - `admin token` for that Workspace that belongs specifically to your user account for CLI usage. - `create datasource token` for creating Data Sources in that Workspace. - `user token` for creating new Workspaces or deleting ones where you are an admin. Some Tokens are created automatically by Tinybird during certain operations like scheduled copies, and [can be updated](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/copy-pipes#change-copy-pipe-token-reference). ### User Token¶ Your User Token is specific to your user account. It's a permanent Token that allows you to perform operations that aren't limited to a single Workspace, such as creating new Workspaces. You can only obtain your User Token from your Workspace by going to **Tokens** and retrieving the `user token`. ### Create a Token¶ In the Tinybird UI, navigate to **Tokens** > **Plus (+) icon**. Rename the new Token and update its scopes using the previous table as a guide. ## JSON Web Tokens (JWTs) BETA¶ JWTs are currently in public beta. They aren't feature-complete and may change in the future. If you have any feedback or suggestions, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). JWTs are signed tokens that allow you to securely authorize and share data between your application and Tinybird. Unlike static Tokens, JWTs are not stored in Tinybird. They're created by you, inside your application, and signed with a shared secret between your application and Tinybird. Tinybird validates the signature of the JWT, using the shared secret, to ensure it's authentic. ### When to use JWTs¶ The primary purpose of JWTs is to allow your app to call Tinybird API Endpoints from the frontend without proxying through your backend. If you are building an application where a frontend component needs data from a Tinybird API Endpoint, you can use JWTs to authorize the request directly from the frontend. The typical pattern looks like this: 1. A user starts a session in your application. 2. The frontend requests a JWT from your backend. 3. Your backend generates a new JWT, signed with the Tinybird shared secret, and returns it to the frontend. 4. The frontend uses the JWT to call the Tinybird API Endpoints directly. ### JWT payload¶ The payload of a JWT is a JSON object that contains the following fields: | Key | Example Value | Required | Description | | --- | --- | --- | --- | | workspace_id | workspaces_id | Yes | The UUID of your Tinybird Workspace, found in the Workspace list. See[ Workspace ID](https://www.tinybird.co/docs/docs/get-started/administration/workspaces#workspace-id) . | | name | frontend_jwt | Yes | Used to identify the token in the `tinybird.pipe_stats_rt` table, useful for analytics. Doesn't need to be unique. | | exp | 123123123123 | Yes | The Unix timestamp (UTC) showing the expiry date & time. After a token has expired, Tinybird returns a 403 HTTP status code.
| | scopes | [{"type": "PIPES:READ", "resource": "requests_per_day", "fixed_params": {"org_id": "testing"}}] | Yes | Used to pass data to Tinybird, including the Tinybird scope, resources and fixed parameters. | | scopes.type | PIPES:READ | Yes | The type of scope, for example `READ` . See[ JWT scopes](https://www.tinybird.co/docs/about:blank#jwt-scopes) for supported scopes. | | scopes.resource | t_b9427fe2bcd543d1a8923d18c094e8c1 or top_airlines | Yes | The ID or name of the Pipe that the scope applies to, like which API Endpoint the token can access. | | scopes.fixed_params | {"org_id": "testing"} | No | Pass arbitrary fixed values to the API Endpoint. These values can be accessed by Pipe templates to supply dynamic values at query time. | | limits | {"rps": 10} | No | You can limit the number of requests per second the JWT can perform. See[ JWT rate limit](https://www.tinybird.co/docs/about:blank#rate-limits-for-jwt-tokens) . | Check out the [JWT example](https://www.tinybird.co/docs/about:blank#jwt-example) to see what a complete payload looks like. ### JWT algorithm¶ Tinybird always uses HS256 as the algorithm for JWTs and doesn't read the `alg` field in the JWT header. You can skip the `alg` field in the header. ### JWT scopes¶ | Value | Description | | --- | --- | | `PIPES:READ:pipe_name` | Gives your Token read permissions for the specified Pipe | ### JWT expiration¶ JWTs can have an expiration time that gives each Token a finite lifespan. Setting the `exp` field in the JWT payload is mandatory, and not setting it results in a 403 HTTP status code from Tinybird when requesting the API Endpoint. Tinybird validates that a JWT hasn't expired before allowing access to the API Endpoint. If a Token has expired, Tinybird returns a 403 HTTP status code. ### JWT fixed parameters¶ Fixed parameters allow you to pass arbitrary values to the API Endpoint. These values can be accessed by Pipe templates to supply dynamic values at query time. For example, consider the following fixed parameter: ##### Example fixed parameters { "fixed_params": { "org_id": "testing" } } This passes a parameter called `org_id` with the value `testing` to the API Endpoint. You can then use this value in your SQL queries: ##### Example SQL query SELECT fieldA, fieldB FROM my_pipe WHERE org_id = '{{ String(org_id) }}' This is particularly useful when you want to pass dynamic values to an API Endpoint that are set by your backend and must be safe from user tampering. A good example is multi-tenant applications that require row-level security, where you need to filter data based on a user or tenant ID. The value `org_id` is always the one specified in the `fixed_params` . Even if you specify a new value in the URL when requesting the endpoint, Tinybird always uses the one specified in the JWT. You can use JWT fixed parameters in combination with Pipe [dynamic parameters](https://www.tinybird.co/docs/docs/work-with-data/query/query-parameters). 
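Before the full example in the next section, here is a minimal sketch of a backend issuing per-tenant JWTs with fixed parameters, using PyJWT (the library shown later on this page). The function name, environment variables, and TTL are assumptions for illustration; the `PIPES:READ` scope, the `requests_per_day` Pipe, and the `org_id` fixed parameter mirror the examples above.

```python
# Sketch only: issue a per-tenant JWT whose fixed_params pin org_id server-side,
# so the API Endpoint filter can't be tampered with from the client.
# Names and env vars are illustrative. Requires PyJWT (pip install pyjwt).
import datetime
import os
import jwt

TINYBIRD_SIGNING_KEY = os.environ["TINYBIRD_SIGNING_KEY"]  # the Workspace admin token
WORKSPACE_ID = os.environ["TINYBIRD_WORKSPACE_ID"]

def jwt_for_tenant(org_id: str, ttl_hours: int = 3) -> str:
    now = datetime.datetime.now(datetime.timezone.utc)
    payload = {
        "workspace_id": WORKSPACE_ID,
        # A per-tenant name keeps usage (and optional rate limits) tracked per tenant.
        "name": f"frontend_jwt_{org_id}",
        "exp": now + datetime.timedelta(hours=ttl_hours),
        "scopes": [
            {
                "type": "PIPES:READ",
                "resource": "requests_per_day",  # the example Pipe used on this page
                "fixed_params": {"org_id": org_id},
            }
        ],
    }
    return jwt.encode(payload, TINYBIRD_SIGNING_KEY, algorithm="HS256")

# Example: hand this token to the frontend for the "testing" tenant.
token = jwt_for_tenant("testing")
```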
### JWT example¶ Consider the following payload with all [required and optional fields](https://www.tinybird.co/docs/about:blank#jwt-payload):

##### Example payload

```json
{
  "workspace_id": "workspaces_id",
  "name": "frontend_jwt",
  "exp": 123123123123,
  "scopes": [
    {
      "type": "PIPES:READ",
      "resource": "requests_per_day",
      "fixed_params": {
        "org_id": "testing"
      }
    }
  ],
  "limits": {
    "rps": 10
  }
}
```

Use the Admin Token from your Workspace to sign the payload, for example:

##### Example Workspace Admin Token

```
p.eyJ1IjogIjA1ZDhiYmI0LTdlYjctNDAzZS05NGEyLWM0MzFhNDBkMWFjZSIsICJpZCI6ICI3NzUxMDUzMC0xZjE4LTRkNzMtOTNmNS0zM2MxM2NjMDUxNTUiLCAiaG9zdCI6ICJldV9zaGFyZWQifQ.Xzh4Qjz0FMRDXFuFIWPI-3DWEC6y-RFBfm_wE3_Qp2M
```

With the payload and Admin Token, the signed JWT would look like this:

##### Example JWT

```
eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ3b3Jrc3BhY2VfaWQiOiIzMTA0OGI3Ni01MmU4LTQ5N2ItOTBhNC0wYzZhNTUxMzkyMGQiLCJuYW1lIjoiZnJvbnRlbmRfand0IiwiZXhwIjoxMjMxMjMxMjMxMjMsInNjb3BlcyI6W3sidHlwZSI6IlBJUEVTOlJFQUQiLCJyZXNvdXJjZSI6ImVhNDdmZDlkLWJjNDgtNDIwZC1hNmY2LTk1NDgxZmJiM2Y3YyIsImZpeGVkX3BhcmFtcyI6eyJvcmdfaWQiOiJ0ZXN0aW5nIn19XSwiaWF0IjoxNzE3MDYzNzQwfQ.t-9BRLI6MrhOAuvt1mBSTBTU7TOdJFunBjr78TuqpVg
```

### JWT limitations¶ The following limitations apply to JWTs: - You can't refresh JWTs individually from inside Tinybird as they aren't stored in Tinybird. You must do this from your application, or you can globally invalidate all JWTs by refreshing your Admin Token. - If you refresh your Admin Token, all the tokens are invalidated. - If your token expires or is invalidated, you get a 403 HTTP status code from Tinybird when requesting the API Endpoint. ### Create a JWT in production¶ There is wide support for creating JWTs in many programming languages and frameworks. Any library that supports JWTs should work with Tinybird. A common library to use with Python is [PyJWT](https://github.com/jpadilla/pyjwt/tree/master). Common libraries for JavaScript are [jsonwebtoken](https://github.com/auth0/node-jsonwebtoken#readme) and [jose](https://github.com/panva/jose). - JavaScript (Next.js) - Python

##### Create a JWT in Python using pyjwt

```python
import jwt
import datetime
import os

TINYBIRD_SIGNING_KEY = os.getenv('TINYBIRD_SIGNING_KEY')

def generate_jwt():
    expiration_time = datetime.datetime.utcnow() + datetime.timedelta(hours=3)
    payload = {
        "workspace_id": "workspaces_id",
        "name": "frontend_jwt",
        "exp": expiration_time,
        "scopes": [
            {
                "type": "PIPES:READ",
                "resource": "requests_per_day",
                "fixed_params": {
                    "org_id": "testing"
                }
            }
        ]
    }
    return jwt.encode(payload, TINYBIRD_SIGNING_KEY, algorithm='HS256')
```

### Create a JWT using the CLI or the API¶ If for any reason you don't want to generate a JWT on your own, Tinybird provides an API and a CLI utility to create JWTs. - API - CLI

##### Create a JWT with the Tinybird CLI

```
tb token create jwt my_jwt --ttl 1h --scope PIPES:READ --resource my_pipe --filters "column_name=value"
```

### Error handling¶ There are many reasons why a request might return a `403` status code. When a `403` is received, check the following: 1. Confirm the JWT is valid and hasn't expired. The expiration time is in the `exp` field in the JWT's payload. 2. The generated JWTs can only read Tinybird API Endpoints. Confirm you're not trying to use the JWT to access other APIs. 3. Confirm the JWT has a scope to read the endpoint you are trying to read. Check the payload of the JWT at [https://jwt.io/](https://jwt.io/). 4.
If you generated the JWT outside of Tinybird, without using the API or the CLI, make sure you are using the** Workspace** `admin token` , not your personal one. ### Rate limits for JWTs¶ Check the [limits page](https://www.tinybird.co/docs/docs/get-started/plans/limits) for limits on ingestion, queries, API Endpoints, and more. When you specify a `limits.rps` field in the payload of the JWT, Tinybird uses the name specified in the payload of the JWT to track the number of requests being done. If the number of requests goes beyond the limit, Tinybird starts rejecting new requests and returns an "HTTP 429 Too Many Requests" error. See [limits docs](https://www.tinybird.co/docs/docs/get-started/plans/limits) for more information. The following example shows the tracking of all requests done by `frontend_jwt` . Once you reach 10 requests per second, Tinybird would start rejecting requests: ##### Example payload with global rate limit { "workspace_id": "workspaces_id", "name": "frontend_jwt", "exp": 123123123123, "scopes": [ { "type": "PIPES:READ", "resource": "requests_per_day", "fixed_params": { "org_id": "testing" } } ], "limits": { "rps": 10 } } If `rps <= 0` , Tinybird ignores the limit and assumes there is no limit. As the `name` field doesn't have to be unique, all the tokens generated using the `name=frontend_jwt` would be under the same umbrella. This can be useful if you want to have a global limit in one of your apps or components. If you want to limit for each specific user, you can generate a JWT using the following payload. In this case, you would specify a unique name so the limits only apply to each user: ##### Example of a payload with isolated rate limit { "workspace_id": "workspaces_id", "name": "frontend_jwt_user_", "exp": 123123123123, "scopes": [ { "type": "PIPES:READ", "resource": "requests_per_day", "fixed_params": { "org_id": "testing" } } ], "limits": { "rps": 10 } } ## Monitor Token usage¶ You can monitor Token usage using Tinybird's Service Data Sources. See ["Monitor API Performance"](https://www.tinybird.co/docs/docs/monitoring/analyze-endpoints-performance#example-4-monitor-usage-of-tokens). ## Next steps¶ - Follow a walkthrough guide:[ How to consume APIs in a Next.js frontend with JWTs](https://www.tinybird.co/docs/docs/publish/api-endpoints/guides/consume-apis-nextjs) . - Read about the Tinybird[ Tokens API](https://www.tinybird.co/docs/docs/api-reference/token-api) . - Understand[ Branches](https://www.tinybird.co/docs/docs/work-with-data/organize-your-work/branches) in Tinybird. --- URL: https://www.tinybird.co/docs/get-started/administration/organizations Last update: 2024-12-13T10:17:28.000Z Content: --- title: "Organizations · Tinybird Docs" theme-color: "#171612" description: "Tinybird Organizations provide enterprise customers with a single pane of glass to monitor usage across multiple Workspaces." --- # Organizations¶ The Tinybird organizations feature is only available to Enterprise or Dedicated plan customers. Organizations provide a centralized way of managing Workspaces and members in a region. From the Organizations section you can monitor resource usage and check your current plan's usage and billing if you're on an Enterprise or Dedicated plan. See [Plans](https://www.tinybird.co/docs/docs/get-started/plans). The Organizations screen consists of the following areas: - Billing - Overview - Workspaces - Members - Monitoring Tinybird represents organizations using the icon. 
## Access the organizations section¶ To access the Organizations section, log in as an administrator and select your Workspace name, then select your organization name. ## Usage overview¶ The **Usage** page shows details about your platform usage against your billing plan commitment followed by a detailed breakdown of your consumption. Only billable Workspaces are in this view. Find non-billable Workspaces in the **Workspaces** tab. ### Processed data¶ The first metric shows an aggregated summary of your processed data. This is aggregated across all billable Workspaces included in your plan. Processed data is cumulative over the plan's billing period. ### Storage¶ The second metric shows an aggregated summary of your current storage. This is aggregated across all billable Workspaces included in your plan. Storage is the maximum storage used in the past day. ### Contract¶ The third metric shows the details of your current contract, including the plan type and start/end dates of your plan period. If your plan includes usage limits, for example commitments on an Enterprise plan, your commitment details are also shown here. For both the **Processed data** and **Storage** metrics, the summary covers the current billing period. For **Enterprise** plans this covers the term of your current contract. For monthly plans, it's the current month. After the summary, the page shows a breakdown of Processed data and Storage per Workspace and Data Source. The calculation of these metrics is the same as previously explained for the summary section, but on an individual basis. ## Workspaces¶ This page displays details of all your Workspaces, their consumption, and whether they're billable or not. Using the date range selector at the top of the page, you can adjust the time of the data displayed in the table. The table shows the following information: - ** Workspace name** - ** Processed data** : Processed data is cumulative over the selected time range. - ** Storage** : Storage is the maximum storage used on the last day of the selected time range. - ** Plan type** : Billable or free. Usage in a billable Workspace counts towards your billing plan. New Workspaces that are created by a user with an email domain linked to (or matching) an Organization are automatically added to that Organization. The new Workspace then automatically shows up here in your Organization's Consumption metrics and listed Workspaces. To delete a Workspace, select the checkbox of a Workspace name, followed by the **Delete** button. You don't need to be a user in that Workspace to delete it. ## Members¶ **Members** shows details of your Organization members, the Workspaces they belong to, and their roles. User roles: - ** Admins** can do everything in the Workspace. - ** Guests** can do most things, but they can't delete Workspaces, invite or remove users, or share Data Sources across Workspaces. - ** Viewers** can't edit anything in the main Workspace[ Branch](https://www.tinybird.co/docs/docs/work-with-data/organize-your-work/branches) , but they can use[ Playgrounds](https://www.tinybird.co/docs/docs/work-with-data/query#use-the-playground) to query the data, as well as create or edit Branches. The table shows the following information: - Email - Workspaces and roles To view the detail of a member's Workspaces and roles, select the arrow next to the Workspace count. A menu shows all the Workspaces that user is part of, plus their role in each Workspace. 
To change a user's role or remove them from a Workspace, hover over the Workspace name and follow the arrow. Select a new role from **Admin**, **Guest** , or **Viewer** , or remove them from the Workspace. You don't need to be a user in that Workspace to make changes to its users. As mentioned, you can also make a user an organization admin from this page. To remove a user from the organization, select **Remove member** in the menu. You can see if there are Workspaces where that user is the only admin and if the Token associated to the email has had activity in the last 7 days. ### Add an organization admin¶ To add another user as an organization administrator, follow these steps: 1. Navigate to the** Your organization** page. 2. Go to the** Members** section. 3. Locate the user you want to make an administrator. 4. Select** Organization Admin** next to their name. This grants administrator access to the selected users. ## Monitoring endpoints¶ To monitor the usage of your Organization use the [Organization Service Data Sources](https://www.tinybird.co/docs/docs/monitoring/service-datasources#organization-service-data-sources). The endpoints page shows details about the APIs that allow you to export, or integrate external tools with, your usage data. There are two APIs available: Processed data and Storage. ### Processed data ¶ The first API shows a daily aggregated summary of your processed data per Workspaces. ### Storage ¶ The last API shows a daily aggregated summary of your current storage per Workspaces. You can select the time by editing the parameters in the URL. **Processed data** | Field | Type | Description | | --- | --- | --- | | day | DateTime | Day of the record | | workspace_id | String | ID of the Workspace. | | read_bytes | UInt64 | Bytes read in the Workspace that day | | written_bytes | UInt64 | Bytes written in the Workspace that day | **Storage** | Field | Type | Description | | --- | --- | --- | | day | DateTime | Day of the record | | workspace_id | String | ID of the Workspace. | | bytes | UInt64 | Maximum Bytes stored in the Workspace that day | | bytes_quarantine | UInt64 | Maximum Bytes stored in the Workspace quarantine that day | ## Dedicated infrastructure monitoring¶ The following features are in public beta and may change without notice. If you have feedback or suggestions, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). If your organization is on an infrastructure commitment plan, Tinybird offers two ways of monitoring the state of your dedicated clusters: using the `organization.metrics_logs` service Data Source, or through the Prometheus endpoint `/v0/metrics` , which you can integrate with the observability platform of your choice. ### Billing dashboard¶ You can track your credits usage from the **Billing** section under **Your organization** . The dashboard shows your cumulative credits usage and estimated trend against the total, and warns you if you're about to run out of credits. For more details, you can access your customer portal using the direct link. <-figure-> ![image](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fcredits-usage.png&w=3840&q=75) ### Cluster load chart¶ You can check the current load of your clusters using the chart under **Your organization**, **Usage** . Select a cluster in the menu to see all its hosts, then select a time. Each line represents the CPU usage of the host. 
When you select a host, the dotted line represents the total amount of CPUs available to the cluster. <-figure-> ![image](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fcluster-load-chart.png&w=3840&q=75) --- URL: https://www.tinybird.co/docs/get-started/administration/team-integration-governance Last update: 2025-01-07T15:48:10.000Z Content: --- title: "Team integration and data governance · Tinybird Docs" theme-color: "#171612" description: "Learn how different teams work with Tinybird, and how Tinybird supports your team to manage data." --- # Team integration and data governance¶ Tinybird supports a wide range of industries and products. Customers organize themselves and their businesses in different ways, but there are overarching principles you can adopt and adapt. Knowing how to integrate your team with Tinybird is important to get the most value out of the platform. ## Before you start¶ You need to be familiar with the following [core Tinybird concepts](https://www.tinybird.co/docs/docs/get-started/quick-start/core-concepts): - Data Source: Where data is ingested and stored. - Pipe: How data is transformed. - Workspace: How data projects are organized, containing Data Sources and Pipes. - Shared Data Source: A Data Source shared between Workspaces. - Roles: Each Workspace has Admin, Guest, and Viewer roles. - Organizations: Contain all Workspaces and members. ## Roles and responsibilities¶ Tinybird allows you to share Data Sources across Workspaces. This means you can create Workspaces that map your organization, and not have to duplicate the Data Sources. In general, most Tinybird users have an ingestion Workspace where the owners of the data ingest the data, clean the data, and prepare it for onward consumption. They then share these Data Sources with other internal teams using the [Sharing Data Sources](https://www.tinybird.co/docs/docs/get-started/administration/workspaces#sharing-data-sources-between-workspaces) feature. You can have as many ingestion Workspaces as you need: bigger organizations group their Workspaces by domain, some organizations group them by team. For instance, in a bank, you might find different teams managing their own data and therefore several ingestion Workspaces where data is curated and exposed to other teams. 
In this case, each team maps to a domain: <-figure-> ![Diagram showing each ingestion team mapping to a specific domain](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fguide-team-integration-governance-team-1.png&w=3840&q=75) However, in other companies where data is centralized and managed by the data platform, you might find a single ingestion Workspace where all the data is ingested and shared with other onward Workspaces where specific domain users build their own use cases: <-figure-> ![Diagram showing each ingestion team mapping to a specific domain](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fguide-team-integration-governance-team-2.png&w=3840&q=75) Some organizations rely on a hybrid solution, where data is provided by the data platform but each domain group also ingest their own data: <-figure-> ![Diagram showing each ingestion team mapping to a specific domain](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fguide-team-integration-governance-team-3.png&w=3840&q=75) Whatever your approach, it's an established pattern to have an ingestion or data-platform Workspace or team who own ingestion and data preparation, and share the desired Data Sources with other teams. These downstream, domain-specific teams then create the Pipe logic specific to their own area, usually in a Workspace specifically for that domain. That way, the responsibilities reflect a manageable, clear separation of concerns. ## Enforcing data governance¶ Tinybird supports your data governance efforts in the following ways: - ** Availability** is assured by the platform's uptime[ SLA](https://www.tinybird.co/terms-and-conditions) . Tinybird wants all your teams to be able to access all the data they need, any time they need. You can monitor availability using[ monitoring tools](https://www.tinybird.co/docs/about:blank#monitoring) , which includes monitoring ingestion, API Endpoints, and quarantined rows. Tinybird offers a straightforward way to reingest quarantined rows and maintain Materialized Views to automatically reingest data. - ** Control** over data access is managed through a single Organization page in Tinybird. You can enforce the principle of least privilege by assigning different roles to Workspace members, and easily check data quality and consumption using Tinybird's Service Data Sources. Tinybird also supports schema evolution and you can[ keep multiple schema versions running](https://www.tinybird.co/docs/docs/get-data-in/data-operations/iterate-a-data-source) at the same time so consumers can adjust at their own pace. - ** Usability** is maximized by having ingestion Workspaces. They allow you to share cleaned, curated data, with specific and adjusted schemas giving consumers precisely what they need. Workspace members have the flexibility to create as many Workspaces as they need, and[ use the Playground feature](https://www.tinybird.co/docs/docs/work-with-data/query#use-the-playground) to sandbox new ideas. - ** Consistency** : Data owners have responsibility over what they want to share with others. You can monitor which Workspace is ingesting data. - ** Data integrity and quality** , especially at scale and at speed, is essential. Just like availability, it's a use case for leveraging Tinybird's monitoring capabilities. See[ Additional ecosystem tools](https://www.tinybird.co/docs/about:blank#additional-ecosystem-tools) . Ingestion teams can build Pipes to monitor everything about their inbound data and create alerts. 
Alerts can be technical or business-related. - ** Data security** : This information is available at the top-level Organizations page and also in individual Workspaces. ## Additional ecosystem tools¶ Check the [limits page](https://www.tinybird.co/docs/docs/get-started/plans/limits) for limits on ingestion, queries, API Endpoints, and more. Tinybird is built around the idea of data that changes or grows continuously, and provides the following tools as part of Tinybird. These tools help you get insights on and monitor your Workspaces, data, and resources. ### Operations log¶ The [Operations log](https://www.tinybird.co/docs/docs/monitoring/health-checks#operations-log) shows information on each individual Data Source, including its size, the number of rows, the number of rows in the quarantine Data Source (if any), and when it was last updated. The Operations log contains details of the events for the Data Source, which are displayed as the results of the query. Use it to see every single call made to the system (API call, ingestion, jobs). This is helpful if you're concerned about one specific Data Source and need to investigate. ### Monitoring¶ You can use the [Organizations](https://www.tinybird.co/docs/docs/get-started/administration/organizations) feature for managing Workspaces and Members, and monitoring their entire consolidated Tinybird consumption in one place. For example, you can track costs and usage for each individual Workspace. ### Testing¶ To ensure that all production load is efficient and accurate, all Tinybird API Endpoints that you create in your Workspaces can be tested before going to production. You can do this by [using version control](https://www.tinybird.co/docs/docs/work-with-data/organize-your-work). ### Alerts and health checks¶ To ensure everything is working as expected once you're in production, any team can create alerts and health checks on top of Tinybird's [Service Data Sources](https://www.tinybird.co/docs/docs/monitoring/service-datasources). ## Next steps¶ - Watch this video to understand how Factorial sets domain boundaries and[ organizes their teams](https://youtu.be/8rctUKRXcdw?t=574) . - Build something fast and fun - follow the[ quick start tutorial](https://www.tinybird.co/docs/docs/get-started/quick-start) . --- URL: https://www.tinybird.co/docs/get-started/administration/workspaces Last update: 2025-01-20T11:43:08.000Z Content: --- title: "Workspaces · Tinybird Docs" theme-color: "#171612" description: "Workspaces are containers for all of your Tinybird resources. Learn all about Workspaces here!" --- # Workspaces¶ A Workspace is a set of Tinybird resources, like Data Sources, Pipes, nodes, API Endpoints, and Tokens. Workspaces are always created inside organizations. See [Organizations](https://www.tinybird.co/docs/docs/get-started/administration/organizations). You can use Workspaces to manage separate projects, use cases, and dev, staging, or production environments in Tinybird. Each Workspace has administrators and members who can view and edit resources. Tinybird represents Workspaces using the icon. ## Create a Workspace¶ To create a new Workspace, select the name of an existing Workspace. In the menu, select **Create Workspace (+)**. Complete the dialog with the details of your new Workspace, and select "Create Workspace". Workspaces must have unique names within a region. 
### Create a Workspace using the CLI¶ To create a new Workspace using the CLI, use the following command: tb workspace create You can use this command interactively or by providing the required inputs with flags. To use it interactively, run the command without any flags. For example, `tb workspace create`. Supply [your user Token](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens#your-user-token) by pasting it into the prompt. Next, supply a name for your Workspace. Workspaces must have unique names within a region. Workspace name [new_workspace_9479]: internal_creating_new_workspaces_example When Tinybird creates the Workspace, a message similar to the following appears: ** Workspace 'internal_creating_new_workspaces_example' has been created If you are using the CLI in an automated system, you can instead pass each value using flags. For example: tb workspace create --user_token --starter-kit 1 internal_creating_new_workspaces_example ## Workspace ID¶ The Workspace ID is a unique identifier for each Workspace. You can copy the Workspace ID from the list of Workspaces in the UI, or by selecting the More actions (⋯) icon and selecting **Copy ID** in the Workspace settings. To find the Workspace ID using the Tinybird CLI, run `tb workspace current` from the CLI: tb workspace current ** Current workspace: -------------------------------------------------------------------------------------------- | name | id | role | plan | current | -------------------------------------------------------------------------------------------- | tinybird_web_analytics | xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | guest | Custom | True | -------------------------------------------------------------------------------------------- ## Delete a Workspace¶ Deleting a Workspace deletes all resources within the Workspace, including Data Sources, ingested data, Pipes, and published API Endpoints. Deleted Workspaces can't be recovered. To delete a Workspace, select the **Settings** icon. In the dialog that appears, select **Advanced Settings** and then select **Delete Workspace** . Confirm the deletion. ### Delete a Workspace using the CLI¶ To delete a Workspace using the CLI, use the following command: tb workspace delete Provide the name of the Workspace and [your user Token](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens#your-user-token) . For example: tb workspace delete my_workspace --user_token ## Manage Workspace members¶ You can invite as many members to a Workspace as you want. A member can belong to multiple Workspaces. Members always have a role assigned. You can modify the role of a user at any time. Tinybird has the following member roles for Workspaces: | Role | Manage resources | Manage users | Access to billing information | Create a branch | | --- | --- | --- | --- | --- | | `Admin` | Yes | Yes | Yes | Yes | | `Guest` | Yes | No | No | Yes | | `Viewer` | No | No | No | Yes | ### Manage Workspace members in the UI¶ In the top right corner of the Tinybird UI, select the Cog icon. In the modal, navigate to "Members" to review any members already part of your Workspace. Add a new member by entering their email address and confirming their role from the dropdown options. You can invite multiple users at the same time by adding multiple email addresses separated by a comma. The users you invite will get an email notifying them that they have been invited. If they don't already have a Tinybird account, they will be prompted to create one to accept your invite. 
Invited users appear in the user management modal and by default have the **Guest** role. If the user loses their invite link, you can resend it here too, or copy the link to your clipboard. You can also remove members from here using the "..." menu and selecting "Remove". ### Adding Workspace users in the CLI¶ To add new users, use the following command: tb workspace members add Supply your [user Token](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens#your-user-token) using the `--user_token` flag. Add the email address of the user you want to invite as the final argument to the command. tb workspace members add --user_token my_new_team_member@example.com A successful invite returns the following output: ** User my_new_team_member@example.com added to workspace 'internal_getting_started_guide' ## Share Data Sources between Workspaces¶ Sometimes you might want to share resources between different teams or projects. For example, you might have a Data Source of events data that multiple teams want to use to build features with. Rather than duplicating this data, you can share it between multiple Workspaces. To share a Data Source, follow these steps: 1. Open the Data Source that you want to share. 2. When you hover over the Data Source, select the** More actions (⋯)** icon. 3. In the menu, select** Share** . 4. In the** Search** box, type the name of the Workspace that you want to share the Data Source with. You can only share Data Sources with Workspaces that you are a member of. 5. Select the Workspace. The Workspace appears in the** Shared** section. 6. Select** Done** to close the dialog. You can't share Data Sources between Workspaces in different regions. ### Share Data Sources using the CLI¶ To share a Data Source, use the following command: tb datasource share Supply [your user Token](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens#your-user-token) using the `--user_token` flag. Then, supply the Data Source name and target Workspace name as the following arguments: tb datasource share --user_token shopping_data my_second_workspace The Data Source that you want to share must exist in the Workspace that your Tinybird CLI is authenticated against. To check which Workspace your CLI is currently authenticated with, use: tb auth info You can also run `tb push --user_token ` to push a Data Source with a `SHARED_WITH` parameter to share it with another Workspace. ## Regions¶ A Workspace belongs to one region. The following table lists the available regions and their corresponding API base URLs: | Region | Provider | Provider region | API base URL | | --- | --- | --- | --- | | Europe | GCP | europe-west3 | [ https://api.tinybird.co](https://api.tinybird.co/) | | US East | GCP | us-east4 | [ https://api.us-east.tinybird.co](https://api.us-east.tinybird.co/) | | Europe | AWS | eu-central-1 | [ https://api.eu-central-1.aws.tinybird.co](https://api.eu-central-1.aws.tinybird.co/) | | US East | AWS | us-east-1 | [ https://api.us-east.aws.tinybird.co](https://api.us-east.aws.tinybird.co/) | | US West | AWS | us-west-2 | [ https://api.us-west-2.aws.tinybird.co](https://api.us-west-2.aws.tinybird.co/) | Additional regions for GCP, AWS, and Azure are available for Enterprise customers. Tinybird documentation uses `https://api.tinybird.co` as the default example API base URL. If you aren't using the Europe GCP region, replace the URL with the API base URL for your region. 
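As a small convenience, the region table above can be captured in code so every API call targets the base URL that matches your Workspace. This is only a sketch: the mapping mirrors the table on this page, and the helper name and defaults are illustrative.

```python
# Sketch only: map each Tinybird region to its API base URL (mirroring the table
# above) and build request URLs against the region of your Workspace.
TINYBIRD_BASE_URLS = {
    ("GCP", "europe-west3"): "https://api.tinybird.co",
    ("GCP", "us-east4"): "https://api.us-east.tinybird.co",
    ("AWS", "eu-central-1"): "https://api.eu-central-1.aws.tinybird.co",
    ("AWS", "us-east-1"): "https://api.us-east.aws.tinybird.co",
    ("AWS", "us-west-2"): "https://api.us-west-2.aws.tinybird.co",
}

def api_url(path: str, provider: str = "GCP", region: str = "europe-west3") -> str:
    """Build a full API URL, for example api_url("/v0/sql") in the Europe GCP region."""
    return f"{TINYBIRD_BASE_URLS[(provider, region)]}{path}"
```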
## Single Sign-On (SSO)¶ Tinybird provides email and OAuth providers for logging in to the platform. If you have a requirement for SSO integration, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). ## Secure cloud connections¶ Tinybird supports TLS across all ingest connectors, providing encryption on the wire for incoming data. If you have a requirement for secure cloud connections, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). --- URL: https://www.tinybird.co/docs/get-started/architecture Last update: 2025-01-20T11:43:08.000Z Content: --- title: "Architecture · Tinybird Docs" theme-color: "#171612" description: "Frequently asked questions about Tinybird architecture" --- # Architecture¶ Tinybird is the analytical backend for your applications: it consumes data from any source and exposes it through API Endpoints. Tinybird can sit parallel to the Data Warehouse or in front of it. The Data Warehouse allows you to explore use cases like BI and data science, while Tinybird unlocks action use cases like operational applications, embedded analytics, and user-facing analytics. Read the guide ["Team integration and data governance"](https://www.tinybird.co/docs/docs/get-started/administration/team-integration-governance) to learn more about implementing Tinybird in your existing team. ## Is data stored in Tinybird?¶ The data you ingest into Tinybird is stored in a high-performance, columnar OLAP database. Depending on the use case, you can add a TTL to control how long data is stored. ## Cloud environment vs on-premises¶ Tinybird is a managed SaaS solution. Tinybird doesn't provide "Bring Your Own Cloud" (BYOC) or on-premises deployments. If you are interested in a BYOC deployment, join the [Tinybird BYOC Waitlist](https://faster.tinybird.co/byoc-waitlist). ## Tinybird Local container¶ You can run Tinybird locally for testing and development purposes using the [Tinybird Local container](https://www.tinybird.co/docs/docs/cli/local-container). ## Next steps¶ - Explore[ Tinybird's Customer Stories](https://www.tinybird.co/customer-stories) and see what people have built on Tinybird. - Start building using the[ quick start](https://www.tinybird.co/docs/docs/get-started/quick-start) . - Read the guide[ "Team integration and data governance"](https://www.tinybird.co/docs/docs/get-started/administration/team-integration-governance) to learn more about implementing Tinybird in your existing team. --- URL: https://www.tinybird.co/docs/get-started/compliance Last update: 2024-12-12T09:38:12.000Z Content: --- title: "Compliance and certifications · Tinybird Docs" theme-color: "#171612" description: "Tinybird is committed to the highest data security and safety. See what compliance certifications are available." --- # Compliance and certifications¶ Data security and privacy are paramount in today's digital landscape. Tinybird's commitment to protecting your sensitive information is backed by the following compliance certifications, which ensure that we meet rigorous industry standards for data security, privacy, and operational excellence.
## SOC 2 Type II¶ Tinybird has obtained a SOC 2 Type II certification, in accordance with attestation standards established by the American Institute of Certified Public Accountants (AICPA), that are relevant to security, availability, processing integrity, confidentiality, and privacy for Tinybird's real-time platform for user-facing analytics. Compliance is monitored continually—with reports published annually—to confirm the robustness of Tinybird's data security. This independent assessment provides Tinybird users with assurance that their sensitive information is being handled responsibly and securely. ## HIPAA¶ Tinybird supports its customers' Health Insurance Portability and Accountability Act (HIPAA) compliance efforts by offering Business Associate Agreements (BAAs). Additionally, Tinybird's offering allows customers to process their data constituting personal health information (PHI) in AWS, Azure, or Google Cloud—entities which themselves have entered into BAAs with Tinybird. ## Trust center¶ To learn more about Tinybird security controls and certifications, visit the [Tinybird Trust Center](https://trust.tinybird.co/). --- URL: https://www.tinybird.co/docs/get-started/integrations Last update: 2025-01-07T15:48:10.000Z Content: --- title: "Integrations · Tinybird Docs" theme-color: "#171612" description: "Connect Tinybird to your database, data warehouse, streaming platform, devtools, and other applications." --- # Integrations¶ You can integrate Tinybird with various data sources, data sinks, and devtools to support your use case. - ** Native Integrations** are built and maintained by Tinybird, and integrated into the Tinybird product. - ** Guided Integrations** aren't built and maintained by Tinybird, but utilize native Tinybird APIs and/or functionality in the external tool. ## List of integrations¶ Native ## Amazon DynamoDB Use the DynamoDB Connector to ingest historical and change stream data from Amazon... [Docs](https://www.tinybird.co/docs/get-data-in/connectors/dynamodb) Guided ## Amazon Kinesis Learn how to send data from AWS Kinesis to Tinybird. [Guide](https://www.tinybird.co/docs/get-data-in/guides/ingest-from-aws-kinesis) Native ## Amazon S3 Use the S3 Connector to ingest files from your Amazon S3 buckets into Tinybird. [Source Docs](https://www.tinybird.co/docs/get-data-in/connectors/s3), [Sink Docs](https://www.tinybird.co/docs/publish/sinks/s3-sink) Guided ## Amazon SNS SNS is a popular pub/sub messaging system for AWS users. Here's how to use SNS to send... [Guide](https://www.tinybird.co/blog-posts/use-aws-sns-to-send-data-to-tinybird) Native ## Apache Kafka Use the Kafka Connector to ingest data streams from your Kafka cluster into Tinybird. [Source Docs](https://www.tinybird.co/docs/get-data-in/connectors/kafka), [Sink Docs](https://www.tinybird.co/docs/publish/sinks/kafka-sink) Guided ## Auth0 Log Streams Learn how to send Auth0 Logs Streams to Tinybird. [Guide](https://www.tinybird.co/docs/get-data-in/guides/ingest-auth0-logs) Guided ## Clerk With Clerk you can easily manage user auth. By integrating Clerk with Tinybird, you can... [Guide](https://www.tinybird.co/docs/get-data-in/guides/ingest-from-clerk) Native ## Confluent Cloud Connect Tinybird to your Confluent Cloud cluster, select a topic, and Tinybird... [Docs](https://www.tinybird.co/docs/get-data-in/connectors/confluent) Guided ## Dub Learn how to connect Dub webhooks to Tinybird. 
[Guide](https://www.tinybird.co/docs/get-data-in/guides/ingest-from-dub) Guided ## Estuary Learn how to use Estuary to push data streams to Tinybird. [Guide](https://www.tinybird.co/docs/get-data-in/guides/ingest-with-estuary) Guided ## GitHub Learn how to connect GitHub Webhooks to Tinybird. [Guide](https://www.tinybird.co/docs/get-data-in/guides/ingest-from-github) Guided ## GitLab Learn how to connect GitLab Webhooks to Tinybird. [Guide](https://www.tinybird.co/docs/get-data-in/guides/ingest-from-gitlab) Native ## Google BigQuery Use the BigQuery Connector to load data from BigQuery into Tinybird. [Docs](https://www.tinybird.co/docs/get-data-in/connectors/bigquery) Guided ## Google Cloud Storage Learn how to automatically synchronize all the CSV files in a Google GCS bucket to a... [Guide](https://www.tinybird.co/docs/get-data-in/guides/ingest-from-google-gcs) Guided ## Google Pub/Sub Learn how to send data from Google Pub/Sub to Tinybird. [Guide](https://www.tinybird.co/docs/get-data-in/guides/ingest-from-google-pubsub) Guided ## Grafana Learn how to create Grafana Dashboards and Alerts consuming Tinybird API Endpoints. [Guide](https://www.tinybird.co/docs/publish/api-endpoints/guides/consume-api-endpoints-in-grafana) Guided ## Knock Learn how to connect Knock outbound webhooks to Tinybird. [Guide](https://www.tinybird.co/docs/get-data-in/guides/ingest-from-knock) Guided ## Mailgun Learn how to connect Mailgun webhooks to Tinybird. [Guide](https://www.tinybird.co/docs/get-data-in/guides/ingest-from-mailgun) Guided ## MongoDB Learn how to ingest data into Tinybird from MongoDB [Guide](https://www.tinybird.co/docs/get-data-in/guides/ingest-from-mongodb) Guided ## MySQL A step-by-step guide to setting up Change Data Capture (CDC) with MySQL, Confluent Cloud,... [Guide](https://www.tinybird.co/blog-posts/mysql-cdc) Guided ## Orb Events Learn how to configure a Orb webhook to send events to Tinybird. [Guide](https://www.tinybird.co/docs/get-data-in/guides/ingest-from-orb) Guided ## PagerDuty Learn how to connect PagerDuty Webhooks to Tinybird. [Guide](https://www.tinybird.co/docs/get-data-in/guides/ingest-from-pagerduty) Native ## PostgreSQL The Tinybird postgresql() table function allows you to read data from your existing... [Docs](https://www.tinybird.co/docs/get-data-in/guides/postgresql) Native ## Prometheus Learn how to consume API Endpoints in Prometheus format. [Guide](https://www.tinybird.co/docs/publish/api-endpoints/guides/consume-api-endpoints-in-prometheus-format) Guided ## Python logs Learn how to send Python logs to Tinybird. [Guide](https://www.tinybird.co/docs/get-data-in/guides/python-sdk) Native ## Redpanda The Redpanda Connector allows you to ingest data from your existing Redpanda cluster and... [Docs](https://www.tinybird.co/docs/get-data-in/connectors/redpanda) Guided ## Resend With Resend you can send and receive emails programmatically. By integrating Resend with... [Guide](https://www.tinybird.co/docs/get-data-in/guides/ingest-from-resend) Guided ## Rudderstack Learn two different methods to send events from RudderStack to Tinybird. [Guide](https://www.tinybird.co/docs/get-data-in/guides/ingest-from-rudderstack) Guided ## Sentry Learn how to connect Sentry Webhooks to Tinybird. [Guide](https://www.tinybird.co/docs/get-data-in/guides/ingest-from-sentry) Native ## Snowflake The Snowflake Connector is fully managed and requires no additional tooling. 
[Docs](https://www.tinybird.co/docs/get-data-in/connectors/snowflake) Guided ## Stripe Learn how to connect Stripe webhooks to Tinybird. [Guide](https://www.tinybird.co/docs/get-data-in/guides/ingest-from-stripe) Guided ## Trigger.dev Learn how to reliably trigger Tinybird jobs with Trigger.dev. [Guide](https://www.tinybird.co/docs/publish/api-endpoints/guides/reliable-scheduling-with-trigger) Guided ## Vercel Integration This integration will allow you to link your Tinybird Workspaces with your Vercel... [Guide](https://www.tinybird.co/docs/work-with-data/organize-your-work/integrating-vercel) Guided ## Vercel Log Drains Learn how to connect Vercel Log Drains to Tinybird. [Guide](https://www.tinybird.co/docs/get-data-in/guides/ingest-vercel-logdrains) Guided ## Vercel Webhooks Learn how to send Vercel events to Tinybird. [Guide](https://www.tinybird.co/docs/get-data-in/guides/ingest-from-vercel) --- URL: https://www.tinybird.co/docs/get-started/plans Content: --- title: "Plans · Tinybird Docs" theme-color: "#171612" description: "The three Tinybird plans explained in one place. Get building today!" --- # Tinybird plans¶ Tinybird has the following plan options: Build, Professional, and Enterprise. You can [upgrade your plan](https://www.tinybird.co/docs/about:blank#upgrade-your-plan) at any time. ## Build¶ The Build plan is free. It provides you with a full-featured, production-grade instance of the Tinybird platform, including all managed ingest connectors, real time querying, and managed API Endpoints. There is no time limit to the Build plan, meaning you can develop using this plan for as long as you want. There are no limits on the number of team seats, Data Sources, or API Endpoints. Support is available through the [Community Slack](https://www.tinybird.co/docs/docs/community) , which is monitored by the Tinybird team. Build plan usage limits: - ** Up to 10 GB of compressed data storage.** This is the total amount of compressed data you're storing, including Data Sources and Materialized Views. - ** Up to 1,000 requests per day to your API Endpoints.** This limit applies to the[ API Endpoints](https://www.tinybird.co/docs/docs/publish/api-endpoints) that you publish from your SQL queries, and queries executed using the[ Query API](https://www.tinybird.co/docs/docs/api-reference/query-api) . The limit doesn't apply to the[ Tinybird REST API](https://www.tinybird.co/docs/docs/api-reference) or[ Events API](https://www.tinybird.co/docs/docs/api-reference/events-api) . The Build plan is suited for development and experimentation. Many Professional and Enterprise customers use Build plan Workspaces to develop and test new use cases before deploying to their production billed Workspaces. See the [billing docs](https://www.tinybird.co/docs/docs/get-started/plans/billing) for more information. ## Professional¶ The Professional plan is a usage-based plan that scales with you as you grow. When your application is ready for production, you can upgrade any Workspace on the Build plan to the Professional plan. The Professional plan includes all the Tinybird product features as the Build plan, and removes the usage limits for data storage, processed bytes, and API Endpoint requests. This means that you can store as much data, handle as many API requests, and process as much data as you need with no artificial limits. 
In addition to the [Community Slack](https://www.tinybird.co/docs/docs/community) , Professional customers can also contact the Tinybird support team through email at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co). The Professional plan requires a valid payment method, such as a credit card. Billing is as follows: - ** Data storage is billed at US$0.34 per GB** , with no limit on the amount of data storage. - ** Processed data is billed at US$0.07 per GB** , with no limit on the amount of processed data. - ** Transferred data is billed at US$0.01 - $0.10 per GB** , depending on cloud provider or region, with no limit on the amount of transferred data. See the [billing docs](https://www.tinybird.co/docs/docs/get-started/plans/billing) for more information. ### Upgrade your plan¶ As you approach the usage limits of the Build plan, you might receive emails and see dashboard banners about upgrading. As a Workspace admin: 1. View your usage indicators, like monthly processed and stored data, by selecting the cog icon in the navigation pane and selecting the** Usage** tab. 2. Select the** Upgrade to pro** button to enter your card details and upgrade your plan to Professional. The following screenshot shows to access the **Usage** tab and the location of the **Upgrade to pro** button: <-figure-> ![image](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fplans-usage-and-upgrade.png&w=3840&q=75) ## Enterprise¶ As the scale of your Tinybird storage and processing grows, you can customize an Enterprise plan to meet your needs. Enterprise plans can include volume discounts, service-level agreements (SLA), dedicated infrastructure, and a direct Slack-connect support channel. If you're interested in discussing the Enterprise plan, [contact the Tinybird Sales team](https://www.tinybird.co/contact-us) for more information. ## Dedicated¶ The Dedicated plan provides you with a Tinybird cluster with at least two database servers. On a Dedicated plan, you're the only customer on your cluster. Your queries and outputs are more performant as a result. ### Understand billing¶ Dedicated plans are billed every month according to the amount of credits you've used. Credits are a way of tracking your usage of Tinybird's infrastructure and features. The following table shows how Tinybird calculates credits usage for each resource: | Resource | Explanation | | --- | --- | | Clusters | Cluster size, tracked every 15 minutes. Cluster size changes are detected automatically and billed accordingly. | | Storage | Compressed disk storage of all your data. Calculated daily, in terabytes, using the maximum value of the day. | | Data transfer | When using[ Sinks](https://www.tinybird.co/docs/docs/api-reference/sink-pipes-api) , usage is billed depending on the destination, which can be the same cloud provider and region as your Tinybird cluster, or a different one. | | Support | Premier or Enterprise monthly support fee. | | Private Link | Billed monthly. | ### Rate limiter¶ In Dedicated plans, the rate limiter monitors the status of the cluster and limits the number of concurrent requests to prevent the cluster from crashing due to insufficient memory. This allows the cluster to continue working, albeit with a rate limit. The rate limiter activates when the following situation occurs: - When total memory usage in a host in the cluster is over 70% in clusters with less than 64GB of memory per host and 80% in the rest. 
- The percentage of 408 Request Timeout and 500 Internal Server Error responses due to memory limits for a Pipe endpoint exceeds 10% of the total requests. If both conditions are met, the maximum number of concurrent requests to the Pipe endpoint is limited proportionally to the percentage of errors. Workspace administrators receive an email indicating the affected Pipe endpoints and the concurrency limit. The rate limiter deactivates after 5 minutes and activates again if the previously described conditions repeat. For example, if a Pipe endpoint is receiving 10 requests per second and 5 failed during a high memory usage scenario due to a timeout or memory error, the number of concurrent queries is limited to half, that is, 5 concurrent requests for that specific Pipe endpoint. While the rate limiter is active, endpoints return a 429 HTTP status code. You can retry those requests using a backoff mechanism. For example, you can space requests one second apart. ### Track invoices¶ In Dedicated plans, invoices are issued upon credits purchase, which can happen when signing the contract or when purchasing additional credits. You can check your invoices from the customer portal. ### Monitor usage¶ You can monitor credits usage, including remaining credits, cluster usage, and current commitment through your organization's dashboard. See [Dedicated infrastructure monitoring](https://www.tinybird.co/docs/docs/get-started/administration/organizations#dedicated-infrastructure-monitoring) . You can also check usage using the monthly usage receipts. ## Next steps¶ - Explore[ Tinybird's Customer Stories](https://www.tinybird.co/customer-stories) and see what people have built on Tinybird. - Start building now using the[ quick start](https://www.tinybird.co/docs/docs/get-started/quick-start) . - Read the[ billing docs](https://www.tinybird.co/docs/docs/get-started/plans/billing) to understand which data operations count towards your bill, and how to optimize your usage. --- URL: https://www.tinybird.co/docs/get-started/plans/billing Last update: 2025-01-07T15:48:10.000Z Content: --- title: "Billing, plans, and pricing · Tinybird Docs" theme-color: "#171612" description: "Information about billing, what it's based on, as well as Tinybird pricing plans." --- # Billing¶ Tinybird billing is based on the pricing of different data operations, such as storage, processing, and transfer. If you are on a Professional plan, read on to learn how billing works. If you're an Enterprise customer, contact us at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) to reduce unit prices as part of volume discounts and Enterprise plan commitments. See [Tinybird plans](https://www.tinybird.co/docs/docs/get-started/plans). ## At a glance¶ - Data storage: US$0.34 per GB - Data processing (read or write): US$0.07 per GB - Data transfer (outbound): US$0.01 to US$0.10 per GB To see the full breakdown for each individual operation, skip to [billing breakdown](https://www.tinybird.co/docs/about:blank#billing-breakdown). ## Data storage¶ Data storage refers to the disk storage of all the data you keep in Tinybird. Data storage is priced at **US$0.34 per GB** , regardless of the region. Data storage is usually the smallest part of your Tinybird bill. Your [Data Sources](https://www.tinybird.co/docs/docs/get-data-in/data-sources) use the largest percentage of storage. 
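To see what you're currently storing, you can run an ad hoc query against the `datasources_storage` Service Data Source through the Query API. The following is a minimal sketch: `TB_TOKEN` is assumed to be a Token with read access to Service Data Sources, and the `datasource_name` and `bytes` columns shown here are illustrative, so check the [Service Data Sources](https://www.tinybird.co/docs/docs/monitoring/service-datasources) reference for the exact schema.

##### Check compressed storage per Data Source (sketch)

```bash
# Sketch: list compressed storage per Data Source (column names illustrative).
# Multiply the result by the US$0.34 per GB rate to estimate the storage part of your bill.
curl -G "https://api.tinybird.co/v0/sql" \
  -H "Authorization: Bearer $TB_TOKEN" \
  --data-urlencode "q=SELECT datasource_name, max(bytes) AS stored_bytes FROM tinybird.datasources_storage GROUP BY datasource_name"
```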
### Compression¶ Data storage pricing is based on the volume of storage used after compression, calculated on the last day of every month. The exact rate of compression varies depending on your data. You can expect a compression factor of between 3x and 10x. For example, with a compression factor of 3.5x, if you import 100 GB of uncompressed data, that translates to approximately 28.6 GB compressed. In that case, your bill would be based on the final 28.6 GB of stored data. ### Version control¶ If your Workspace uses the Tinybird Git integration, only data storage associated with the production Workspace, and not Branches, is included when determining the storage bill. Remove historical data to lower your storage bill. You can configure a [time-to-live (TTL)](https://www.tinybird.co/docs/docs/get-data-in/data-sources#setting-data-source-ttl) on any Data Source, which deletes data older than a given time. This gives you control over how much data is retained in a Data Source. A common pattern is to ingest raw data, materialize it, and clear out the raw data with a TTL to reduce storage. ## Data processing¶ Data processing is split into write and read activities. All processed data is priced at **US$0.07 per GB**. ### Write activities¶ You write data whenever you ingest data into Tinybird. When you create, append, delete, or replace data in a [Data Source](https://www.tinybird.co/docs/docs/get-data-in/data-sources) , or write data to a Materialized View, you are writing data. ### Read activities¶ You read data when you run queries against your Data Sources to generate responses to API Endpoint requests. You also read data when you make requests to the [Query API](https://www.tinybird.co/docs/docs/api-reference/query-api) . The only exception is when you're manually running a query. See [Exceptions](https://www.tinybird.co/docs/about:blank#exceptions) for more information. Read activities also include the amount of data fetched to generate API Endpoint responses. For example, if 10 MB of data is processed to generate one API Endpoint response, you would be billed for 10 MB. If the same API Endpoint is called 10 times, that would be 10 x 10 MB, and you would be billed for 100 MB of processed data in total. Even if there are no rows in a response, you could be billed for it, so create your queries with care. For example, if you read 1 billion rows but the query returns no rows because of the endpoint filters, you have still read 1 billion rows. Failed (4xx) requests to Copy Pipes, API Endpoints, and the Query API, such as those caused by timeouts or memory usage errors, are also billed. You can check these errors using the `pipe_stats_rt`, `pipe_stats` and `datasources_ops_log` [Service Data Sources](https://www.tinybird.co/docs/docs/monitoring/service-datasources#tinybird-pipe-stats-rt). ### Materialized Views¶ [Materialized Views](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/materialized-views) involve both read and write operations, plus data storage operations. Whenever you add new data to a Materialized View, you are writing to it. However, there is no charge when you first create and populate a Materialized View. Only incremental updates are billed. Because Materialized Views typically process and store only a fraction of the data that you ingest into Tinybird, the cost of Materialized Views is usually minimal. ### Compression¶ Your data processing bill might be impacted by compression. 
Depending on the operation being performed, data is handled in different ways and it isn't always possible to predict exact levels of read or written bytes in advance for all customers. The best option is to query the [Service Data Sources](https://www.tinybird.co/docs/docs/monitoring/service-datasources) and analyze your results. ### Version control¶ If your Workspace uses the Tinybird Git integration feature, only data processing associated with the production Workspace, and not Branches, is included when determining the amount of processed data. Typically, data processing is the largest percentage of your Tinybird bill. This is why, as you scale, you should [optimize your queries](https://www.tinybird.co/docs/docs/work-with-data/query/sql-best-practices) , understand [Materialized Views](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/materialized-views) , and [analyze the performance of your API Endpoints](https://www.tinybird.co/docs/docs/monitoring/analyze-endpoints-performance). Tinybird works with customers on a daily basis to help optimize their queries and reduce data processing, sometimes reducing their processed data by over 10x. If you need support to optimize your use case, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or through the [Community Slack](https://www.tinybird.co/docs/docs/community). ## Data transfer¶ Currently, the only service to incur data transfer costs is Tinybird [AWS S3 Sink](https://www.tinybird.co/docs/docs/publish/sinks/s3-sink) . If you're not using this Sink, you aren't charged any data transfer costs. Tinybird S3 Sink incurs both data transfer and data processing (read) costs. See [AWS S3 Sink Billing](https://www.tinybird.co/docs/docs/publish/sinks/s3-sink#billing). Data transfer depends on your environment. There are two possible scenarios: - Destination bucket is in the same cloud provider and region as your Tinybird Workspace: US$0.01 per GB - Destination bucket is in a different cloud provider or region as your Tinybird Workspace: US$0.10 per GB ## Exceptions¶ The following operations are free and don't count towards billing: - Anything on a Build plan. - Any operation that doesn't involve processing, storing, or transferring data: - API calls to the Tokens, Jobs, or Analyze Endpoints. - Management operations over resources like Sinks or Pipes (create, update, delete, get details), or Data Sources (create, get details; update & delete incur cost). - Populating a Materialized View with historical data (only inserting new data into an existing MV is billed). - Manual query executions made inside the UI (Pipes, Time Series, Playground). Anywhere you can press the "Run" button, that's free. - Queries to Service Data Sources. - Any time data is deleted as a result of TTL operations. ## Monitor your usage¶ Users on any plan can monitor their usage. To see an at-a-glance overview, select the cog icon in the navigation and select the **Usage** tab: <-figure-> ![image](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fbilling-plans-usage.png&w=3840&q=75) You can also check your usage by querying the data available in the [Service Data Sources](https://www.tinybird.co/docs/docs/monitoring/service-datasources) . These Data Sources contain all the internal data about your Tinybird usage, and you can query them using Pipes like any other Data Source. 
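As a quick illustration, the following sketch uses the Query API to see which API Endpoints processed the most data over the last day by querying the `pipe_stats_rt` Service Data Source. `TB_TOKEN` and the column names (`pipe_name`, `read_bytes`, `start_datetime`) are assumptions to adapt to your Workspace; the exact schema is documented in the Service Data Sources reference.

##### Processed data per endpoint (sketch)

```bash
# Sketch: sum the bytes read per Pipe endpoint over the last 24 hours.
curl -G "https://api.tinybird.co/v0/sql" \
  -H "Authorization: Bearer $TB_TOKEN" \
  --data-urlencode "q=SELECT pipe_name, sum(read_bytes) AS processed_bytes FROM tinybird.pipe_stats_rt WHERE start_datetime > now() - INTERVAL 1 DAY GROUP BY pipe_name ORDER BY processed_bytes DESC"
```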
Because you can query them like any other Data Source, you can publish the results as an API Endpoint, and [build charts in Grafana](https://www.tinybird.co/docs/docs/publish/api-endpoints/guides/consume-api-endpoints-in-grafana), [export to DataDog](https://www.tinybird.co/blog-posts/how-to-monitor-tinybird-using-datadog-with-vector-dev) , and more. Queries made to Service Data Sources are free of charge and don't count towards your usage. However, calls to API Endpoints that use Service Data Sources do count towards API rate limits. Users on any plan can use the strategies outlined in the ["Monitor your ingestion"](https://www.tinybird.co/docs/docs/monitoring/monitor-data-ingestion) guide. If you're an Enterprise customer, check your [Consumption overview in the Organizations UI](https://www.tinybird.co/docs/docs/get-started/administration/organizations#consumption-overview). ## Reduce your bill¶ You reduce your overall Tinybird bill by reducing your stored, processed, and transferred data. | Type of data | How to reduce | | --- | --- | | Stored data | To reduce stored data, pick the right sorting keys based on your queries, and use[ Materialized Views](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/materialized-views) to process data on ingestion. | | Processed data | To reduce processed data, use Materialized Views and implement a[ TTL on raw data](https://www.tinybird.co/docs/docs/get-data-in/data-sources#setting-data-source-ttl) . | | Transferred data | To reduce** transferred data** costs, make sure you're transferring data in the same cloud region. | See the [Optimization guide](https://www.tinybird.co/docs/docs/work-with-data/optimization) to learn how to optimize your projects and queries and reduce your bill. ## Billing breakdown¶ The following tables provide details on each operation, grouped by main user action. ### Data ingestion¶ | Service | Operation | Processing fee | Description | | --- | --- | --- | --- | | Data Sources API | Write | US$0.07 per GB | Low frequency: Append data to an existing Data Source (imports, backfilling, and so on). | | Events API | Write | US$0.07 per GB | High frequency: Insert events in real-time (individual or batched). | | Connectors | Write | US$0.07 per GB | Any connector that ingests data into Tinybird (Kafka, S3, GCS, BigQuery, and so on). | ### Data manipulation¶ | Service | Operation | Processing fee | Description | | --- | --- | --- | --- | | Pipes API | Read | US$0.07 per GB | Interactions with Pipes to retrieve data from Tinybird generate read operations. | | Query API | Read | US$0.07 per GB | Interactions with the Query API to retrieve data from Tinybird. | | Materialized Views (Populate) | Read/Write | Free | Executed as soon as you create the MV to populate it. Tinybird doesn't charge any processing fee. Data is written into a new or existing Data Source. | | Materialized Views (Append) | Read/Write | US$0.07 per GB | New data is read from an origin Data Source, filtered, and written to a destination Data Source. | | Copy Pipes | Read/Write | US$0.07 per GB | On-demand or scheduled operations. Data is read from the Data Source, filtered, and written to a destination Data Source. | | Replace | Read/Write | US$0.07 per GB | Replacing data entirely or selectively. | | Delete data | Read/Write | US$0.07 per GB | Selective data delete from a Data Source. | | Delete an entire Data Source | Read/Write | US$0.07 per GB | Delete all the data inside a Data Source. | | Truncate | Write | US$0.07 per GB | Delete all the data from a Data Source. 
| | Time-to-live (TTL) operations | Write | Free | Anytime Tinybird deletes data as a result of a TTL. | | BI Connector | Read | US$0.07 per GB | Data read from Tinybird using the BI connector. | ### Data transfer¶ | Service | Operation | Processing fee | Data transfer fee | Description | | --- | --- | --- | --- | --- | | S3 Sink | Read/Transfer (no write fees) | US$0.07 per GB | Same region: US$0.01 per GB. Different region: US$0.10 per GB | Data is read, filtered, and then transferred to the destination bucket. This is an on-demand or scheduled operation. Data transfer fees apply. | ## Next steps¶ - [ Sign up for a free Tinybird account and follow the quick start](https://www.tinybird.co/docs/docs/get-started/quick-start) . - Get the most from your Workspace, for free: Learn more about[ using the Playground and Time Series](https://www.tinybird.co/docs/docs/work-with-data/query#use-the-playground) . - Explore different[ Tinybird plans](https://www.tinybird.co/docs/docs/get-started/plans) and find the right one for you. --- URL: https://www.tinybird.co/docs/get-started/plans/limits Last update: 2025-01-20T09:54:04.000Z Content: --- title: "Limits · Tinybird Docs" theme-color: "#171612" description: "Tinybird has limits on certain operations and processes to ensure the highest performance." --- # Limits¶ Tinybird has limits on certain operations and processes to ensure the highest performance. ## Workspace limits¶ | Description | Limit | | --- | --- | | Number of Workspaces | Default 90 (soft limit; ask to increase) | | Number of seats | Default 90 (soft limit; ask to increase) | | Number of branches (including `main` ) | Default 4 (soft limit; ask to increase) | | Number of Data Sources | Default 100 (soft limit; ask to increase) | | Number of Tokens | 100,000 (If you need more you should take a look at[ JWT tokens](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens#json-web-tokens-jwts) ) | | Number of secrets | 100 | | Queries per second | Default 20 (Soft limit. Contact Tinybird Support to increase it.) | See [Rate limits for JWTs](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens#json-web-tokens-jwts) for more detail specifically on JWT limits. ## Ingestion limits¶ | Description | Limit | | --- | --- | | Data Source max columns | 500 | | Full body upload | 8MB | | Multipart upload - CSV and NDJSON | 500MB | | Multipart upload - Parquet | 50MB | | Max file size - Parquet - Build plan | 1GB | | Max file size - Parquet - Pro and Enterprise plan | 5GB | | Max file size (uncompressed) - Build plan | 10GB | | Max file size (uncompressed) - Pro and Enterprise plan | 32GB | | Kafka topics | Default 5 (soft limit; ask to increase) | | Max parts created at once - NDJSON/Parquet jobs and Events API | 12 | ### Ingestion limits (API)¶ Tinybird throttles requests based on the capacity. So if your queries are using 100% resources you might not be able to run more queries until the running ones finish. | Description | Limit and time window | | --- | --- | | Request size - Events API | 10MB | | Response size | 100MB | | Create Data Source from schema | 25 times per minute | | Create Data Source from file or URL* | 5 times per minute | | Append data to Data Source* | 5 times per minute | | Append data to Data Source using v0/events | 1,000 times per second | | Replace data in a Data Source* | 5 times per minute | - The quota is shared at Workspaces level when creating, appending data, or replacing data. 
For example, you can't do 5 requests of each type per minute, for a total of 15 requests. You can do at most a grand total of 5 requests of those types combined. The number of rows in append requests doesn't impact the ingestion limit; each request counts as a single ingestion. If you exceed your rate limit, your request will be throttled and you will receive *HTTP 429 Too Many Requests* response codes from the API. Each response contains a set of HTTP headers with your current rate limit status. | Header Name | Description | | --- | --- | | `X-RateLimit-Limit` | The maximum number of requests you're permitted to make in the current limit window. | | `X-RateLimit-Remaining` | The number of requests remaining in the current rate limit window. | | `X-RateLimit-Reset` | The time, in seconds, until the current rate limit window resets. | | `Retry-After` | The time to wait before making another request. Only present on 429 responses. | ### BigQuery Connector limits¶ The import jobs run in a pool, with capacity for up to 2 concurrent jobs. If more scheduled jobs overlap, they're queued. | Description | Limit and time window | | --- | --- | | Maximum frequency for the scheduled jobs | 5 minutes | | Maximum rows per append or replace | 50 million rows. Exports that exceed this number of rows are truncated to this amount | You can't pause a Data Source with an ongoing import. You must wait for the import to finish before pausing the Data Source. ### DynamoDB Connector limits¶ | Description | Limit and time window | | --- | --- | | Storage | 500 GB | | Throughput | 250 Write Capacity Units (WCU), equivalent to 250 writes of at most 1 KB per second | ### Snowflake Connector limits¶ The import jobs run in a pool, with capacity for up to 2 concurrent jobs. If more scheduled jobs overlap, they're queued. | Description | Limit and time window | | --- | --- | | Maximum frequency for the scheduled jobs | 5 minutes | | Maximum rows per append or replace | 50 million rows. Exports that exceed this number of rows are truncated to this amount | You can't pause a Data Source with an ongoing import. You must wait for the import to finish before pausing the Data Source. ## Query limits¶ | Description | Limit | | --- | --- | | SQL length | 8KB | | Result length | 100 MB | | Query execution time | 10 seconds | If you exceed your rate limit, your request will be throttled and you will receive *HTTP 429 Too Many Requests* response codes from the API. Each response contains a set of HTTP headers with your current rate limit status. | Header Name | Description | | --- | --- | | `X-RateLimit-Limit` | The maximum number of requests you're permitted to make in the current limit window. | | `X-RateLimit-Remaining` | The number of requests remaining in the current rate limit window. | | `X-RateLimit-Reset` | The time, in seconds, until the current rate limit window resets. | | `Retry-After` | The time to wait before making another request. Only present on 429 responses. | ### Query timeouts¶ If query execution time exceeds the default limit of 10 seconds, an error message appears. Long execution times hint at issues that need to be fixed in the query or the Data Source schema. To avoid query timeouts, optimize your queries to remove inefficiencies and common mistakes. See [Optimizations](https://www.tinybird.co/docs/docs/work-with-data/optimization) for advice on how to detect and solve issues in your queries that might cause timeouts. If you still need to increase the timeout limit, contact support. 
See [Get help](https://www.tinybird.co/docs/docs/get-started/plans/support#get-help). Only paid accounts can raise the timeout limit. ## Publishing limits¶ ### Materialized Views limits¶ No numerical limits, certain operations are [inadvisable when using Materialized Views](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/materialized-views#limitations). ### Sink limits¶ Sink Pipes have the following limits, depending on your billing plan: | Plan | Sink Pipes per Workspace | Execution time | Frequency | Memory usage per query | Active jobs (running or queued) | | --- | --- | --- | --- | --- | --- | | Pro | 3 | 30s | Up to every 10 min | 10 GB | 3 | | Enterprise | 10 | 300s | Up to every minute | 10 GB | 6 | ### Copy Pipe limits¶ Copy Pipes have the following limits, depending on your billing plan: | Plan | Copy Pipes per Workspace | Execution time | Frequency | Active jobs (running or queued) | | --- | --- | --- | --- | --- | | Build | 1 | 20s | Once an hour | 1 | | Pro | 3 | 30s | Up to every 10 minutes | 3 | | Enterprise | 10 | 50% of the scheduling period, 30 minutes max | Up to every minute | 6 | ## Delete limits¶ Delete jobs have the following limits, depending on your billing plan: | Plan | Active delete jobs per Workspace | | --- | --- | | Build | 1 | | Pro | 3 | | Enterprise | 6 | ## Next steps¶ - Understand how Tinybird[ plans and billing work](https://www.tinybird.co/docs/docs/get-started/plans/billing) . - Explore popular use cases for user-facing analytics (like dashboards) in Tinybird's[ Use Case Hub](https://www.tinybird.co/docs/docs/use-cases) . --- URL: https://www.tinybird.co/docs/get-started/plans/support Last update: 2024-12-18T21:03:56.000Z Content: --- title: "Support · Tinybird Docs" theme-color: "#171612" description: "Tinybird is here to help. Learn about our support options." --- # Support¶ Tinybird provides support through different channels depending on your plan. See [Plans](https://www.tinybird.co/docs/docs/get-started/plans). Read on to learn more about support options and common solutions. ## Channels¶ Tinybird provides support through the following channels depending on your plan: | Plan | Channels | | --- | --- | | Free | Support for the Free plan is available through the[ Community Slack](https://www.tinybird.co/docs/docs/community) , which is monitored by the Tinybird team. | | Dev | In addition to the Community Slack, priority support is provided through[ support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) . | | Enterprise | In addition to priority email support, Enterprise customers can request a dedicated Slack channel for direct support. | ## Integrated troubleshooting¶ Tinybird tries to give you direct feedback and notifications if it spots anything going wrong. Use Tinybird's [Service Data Sources](https://www.tinybird.co/docs/docs/monitoring/service-datasources) to get more details on what's going on in your data and queries. ## Recover deleted items¶ Tinybird creates and backs up daily snapshots and retains them for 7 days. ## Copy or move data¶ To copy data between Data Sources, use [Copy Pipes](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/copy-pipes). ## Recover data from quarantine¶ The quickest way to recover rows from quarantine is to fix the cause of the errors and then reingest the data. See [Recover data from quarantine](https://www.tinybird.co/docs/docs/get-data-in/data-operations/recover-from-quarantine#recovering-rows-from-quarantine). 
You can also use the [Service Data Sources](https://www.tinybird.co/docs/docs/monitoring/service-datasources) , like `datasources_ops_log`. ## Get help¶ If you haven't been able to solve the issue, or it looks like there is a problem on Tinybird's side, get in touch. You can always contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). If you have an Enterprise account with Tinybird, contact us using your shared Slack channel. --- URL: https://www.tinybird.co/docs/get-started/quick-start Content: --- title: "Get started with Tinybird · Tinybird Docs" theme-color: "#171612" description: "Get started with Tinybird as quickly as possible. Ingest, query, and publish data in minutes." --- # Get started with Tinybird¶ With Tinybird, you can ingest data from anywhere, query and transform it using SQL, and publish it as high-concurrency, low-latency REST API endpoints. Read on to learn how to create a Workspace, ingest data, create a query, publish an API, and confirm your setup works properly using the Tinybird user interface. 1 ## Create your Tinybird account¶ [Create a Tinybird account](https://www.tinybird.co/signup) . It's free and no credit card is required. See [Tinybird pricing plans](https://www.tinybird.co/docs/docs/get-started/plans/billing) for more information. [Sign up for Tinybird](https://www.tinybird.co/signup) 2 ## Select your cloud provider and region¶ When logging in to Tinybird, select the cloud provider and region you want to work in. ![Select your region](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fquickstartui-region-select.png&w=3840&q=75) 3 ## Create your Workspace¶ A [Workspace](https://www.tinybird.co/docs/docs/get-started/administration/workspaces) is an area that contains a set of Tinybird resources, including Data Sources, Pipes, nodes, API Endpoints, and Tokens. Create a Workspace named `customer_rewards` . The name must be unique. ![Create a Workspace](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fquickstartui-create-workspace.png&w=3840&q=75) 4 ## Download and ingest sample data¶ Download the following sample data from a fictitious online coffee shop: [Download data file](https://www.tinybird.co/docs/docs/assets/sample-data-files/orders.ndjson) Select **File Upload** and follow the instructions to load the file. ![Upload a file with data](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fquickstartui-file-upload.png&w=3840&q=75) Select **Create Data Source** to automatically create the `orders` [Data Source](https://www.tinybird.co/docs/docs/get-data-in/data-sources). ![Create a Data Source](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fquickstartui-create-data-source.png&w=3840&q=75) 5 ## Query data using a Pipe¶ You can create [Pipes](https://www.tinybird.co/docs/docs/work-with-data/query/pipes) to query your data using SQL. To create a Pipe, select **Pipes** and then **Create Pipe**. Name your Pipe `rewards` and add the following SQL: select count() from orders Select the node name and change it to `rewards_count`. ![Create a Pipe](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fquickstartui-create-pipe.png&w=3840&q=75) Select **Run** to preview the result of your Pipe. 6 ## Publish your query as an API¶ You can turn any Pipe into a high-concurrency, low-latency API Endpoint. 
Select **Create API Endpoint** and then select the `rewards_count` node in the menu. ![Create an API Endpoint](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fquickstartui-create-api.png&w=3840&q=75) 7 ## Call your API¶ You can test your API endpoint using a curl command. Go to the **Output** section of the API page and select the **cURL** tab. Copy the curl command into a Terminal window and run it. ![Test your API Endpoint](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fquickstartui-test-api.png&w=3840&q=75) Congratulations! You have created your first API Endpoint in Tinybird. ## Next steps¶ - Check the[ Tinybird CLI Quick start](https://www.tinybird.co/docs/docs/cli/quick-start) . - Try a template to get started quickly. See[ Templates](https://www.tinybird.co/templates) . - Learn more about[ User-Facing Analytics](https://www.tinybird.co/docs/docs/use-cases) in the Use Case Hub. - Learn about[ Tinybird Charts](https://www.tinybird.co/docs/docs/publish/charts) and build beautiful visualizations for your API endpoints. --- URL: https://www.tinybird.co/docs/get-started/quick-start/core-concepts Last update: 2025-01-20T11:43:08.000Z Content: --- title: "Core concepts · Tinybird Docs" theme-color: "#171612" description: "Find Tinybird-related core terms and their definitions." --- # Core concepts¶ Familiarize yourself with Tinybird's core concepts and terminology to get a better understanding of how Tinybird works and how you can make the most of its features. ## Workspaces¶ Workspaces help you organize and collaborate on your Tinybird projects. You can have more than one Workspace. A Workspace contains the project resources, data, and state. You can share resources, such as Pipes or Data Sources, between Workspaces. You can also invite users to your Workspaces and define their role and permissions. A typical usage of Workspaces is to provide a team or project with a space to work in. See [Workspaces](https://www.tinybird.co/docs/docs/get-started/administration/workspaces) for more information. ## Data Sources¶ Data Sources are how you ingest and store data in Tinybird. All your data lives inside a Data Source, and you write SQL queries against Data Sources. You can bulk upload or stream data into a Data Source, and they support several different incoming data formats, such as CSV, JSON, and Parquet. See [Data Sources](https://www.tinybird.co/docs/docs/get-data-in/data-sources) for more information. ## Pipes¶ Pipes are how you write SQL logic in Tinybird. Pipes are a collection of one or more SQL queries chained together and compiled into a single query. Pipes let you break larger queries down into smaller queries that are easier to read. You can publish Pipes as API Endpoints, copy them, and create Materialized Views. See [Pipes](https://www.tinybird.co/docs/docs/work-with-data/query/pipes) for more information. ## Nodes¶ A node is a single SQL `SELECT` statement that selects data from a Data Source or another node or API Endpoint. Nodes live within [Pipes](https://www.tinybird.co/docs/docs/work-with-data/query/pipes). ## API Endpoints¶ You can build your SQL logic inside a Pipe and then publish the result of your query as an HTTP API Endpoint. See [API Endpoints](https://www.tinybird.co/docs/docs/publish/api-endpoints) for more information. ## Charts¶ Charts visualize your data. You can create and publish Charts in Tinybird from your published API Endpoints. See [Charts](https://www.tinybird.co/docs/docs/publish/charts) for more information. 
## Tokens¶ Tokens authorize requests. Tokens can be static for back-end integrations, or custom JWTs for front-end applications. See [Tokens](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens) for more information. ## Branches¶ Branches let you create a copy of your Workspace where you can make changes, run tests, and develop new features. You can then merge the changes back into the original Workspace. See [Branches](https://www.tinybird.co/docs/docs/work-with-data/organize-your-work/branches) for more information. ## CLI¶ Use the Tinybird command line interface (CLI) to interact with Tinybird from the terminal. You can install it on your local machine or embed it into your CI/CD pipelines. See [Tinybird CLI](https://www.tinybird.co/docs/docs/cli/quick-start) for more information. ## Next steps¶ - Understand Tinybird's[ underlying architecture](https://www.tinybird.co/docs/docs/get-started/architecture) . - Check out the[ Tinybird Quick Start](https://www.tinybird.co/docs/docs/get-started/quick-start) . --- URL: https://www.tinybird.co/docs/get-started/quick-start/leaderboard Last update: 2025-01-20T11:43:08.000Z Content: --- title: "Build a real-time game leaderboard · Tinybird Docs" theme-color: "#171612" description: "Learn how to build a real-time leaderboard using Tinybird." --- # Build a real-time game leaderboard¶ Read on to learn how to build a real-time leaderboard using Tinybird. A leaderboard is a visual representation that ranks items by one or more attributes. For gaming use cases, commonly displayed attributes include total points scored, high game scores, and number of games played. Leaderboards are used for far more than games. For example, app developers use leaderboards to display miles biked, donations raised, documentation pages visited most often, and countless other examples - basically, anywhere there is some user attribute that can be ranked to compare results. This tutorial is a great starting point for building your own leaderboard. [GitHub Repository](https://github.com/tinybirdco/demo-user-facing-leaderboard) <-figure-> ![A fast, fun leaderboard built on Tinybird](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fleaderboard-tutorial-2.png&w=3840&q=75) The tutorial consists of the following steps: 1. Generate a mock game event stream that mimics a high-intensity Flappybird global tournament. 2. Post these mock events to your Tinybird Workspace using the Events API. 3. Transform (rank) this data using Tinybird Pipes and SQL. 4. Optimize your data handling with a Materialized View. 5. Publish the results as a Tinybird API Endpoint. 6. Generate a leaderboard that makes calls to your API Endpoint securely and directly from the browser. Each time a leaderboard request is made, up-to-the-second results are returned for the leaderboard app to render. When embedded in the game, a `leaderboard` API Endpoint is requested when a game ends. The app makes requests on a specified interval and has a button for ad-hoc requests. Game events have one of three `type` values: - `score` - Generated when a point is scored. - `game_over` - Generated when a game ends. - `purchase` - Generated when a 'make-the-game-easier' coupon is redeemed. 
Each event object has the following JSON structure: ##### Example JSON event object { "session_id": "1f2c8bcf-8a5b-4eb1-90bf-8726e63d81b7", "name": "Marley", "timestamp": "2024-06-20T19:06:15.373Z", "type": "game_over", "event": "Mockingbird" } Here's how it all fits together: <-figure-> ![Events and process of the leaderboard](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fleaderboard-tutorial-1.png&w=3840&q=75) ## Prerequisites¶ To complete this tutorial, you need the following: 1. A[ free Tinybird account](https://www.tinybird.co/signup) 2. An empty Tinybird Workspace 3. Node.js 20.11 or higher 4. Python 3.8 or higher 1 ## Create a Tinybird Workspace¶ Go to ( [app.tinybird.co](https://app.tinybird.co/) ) and create an empty Tinybird Workspace called `tiny_leaderboard` in your preferred region. 2 ## Create a Data Source for events¶ You can create a Data Source based on a schema that you define or rely on the [Mockingbird](https://mockingbird.tinybird.co/docs) tool used to stream mock data to create the Data Source for you. While the Mockingbird method is faster, building your own Data Source gives you more control and introduces some fundamental concepts along the way. In the Tinybird UI, add a new Data Source and use the `Write schema` option. In the schema editor, use the [following schema](https://github.com/tinybirdco/demo-user-facing-leaderboard/blob/main/tinybird/datasources/game_events.datasource): ##### Data Source schema SCHEMA > `name` String `json:$.name`, `session_id` String `json:$.session_id`, `timestamp` DateTime64(3) `json:$.timestamp`, `type` LowCardinality(String) `json:$.type`, `event` String `json:$.event` ENGINE "MergeTree" ENGINE_PARTITION_KEY "toYear(timestamp)" ENGINE_SORTING_KEY "event, name, timestamp" Name the Data Source `game_events` and select **Create Data Source**. This schema definition shows how the incoming JSON events are parsed and assigned to each of schema fields. The definition also defines [database table ‘engine’ details](https://www.tinybird.co/docs/docs/get-data-in/data-sources#supported-engines-settings) . Tinybird projects are made of Data Source and Pipe definition files like this example, and they can be managed like any other code project using Git. 3 ## Create a mock data stream¶ In a real-life scenario, you'd stream your game events into the `game_events` Data Source. For this tutorial, you use [Mockingbird](https://mockingbird.tinybird.co/docs) , an open source mock data stream generator, to stream mock events instead. Mockingbird generates a JSON payload based on a predefined schema and posts it to the [Tinybird Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) , which then writes the data to your Data Source. Use [this Mockingbird link](https://mockingbird.tinybird.co/?host=eu_gcp&datasource=game_events&eps=10&withLimit=on&generator=Tinybird&endpoint=eu_gcp&limit=-1&generatorName=Tinybird&template=Flappybird&schema=Preset) to generate fake data for the `game_events` Data Source. Using the link provides a preconfigured schema. Enter your Workspace admin Token and select the Host region that matches your Workspace region. Select **Save** , then scroll down and select **Start Generating**. Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace. In the Tinybird UI, confirm that the `game_events` Data Source is successfully receiving data. 
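You can also append a single test event yourself with the [Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api), which is what Mockingbird calls under the hood. A minimal sketch, assuming `TB_TOKEN` holds a Token with append permissions and your Workspace is in the `api.tinybird.co` region:

##### Append a test event with the Events API (sketch)

```bash
# Sketch: post one event to the game_events Data Source via the Events API.
# Adjust the host to your region and replace TB_TOKEN with your own Token.
curl -X POST "https://api.tinybird.co/v0/events?name=game_events" \
  -H "Authorization: Bearer $TB_TOKEN" \
  -d '{"session_id":"1f2c8bcf-8a5b-4eb1-90bf-8726e63d81b7","name":"Marley","timestamp":"2024-06-20T19:06:15.373Z","type":"score","event":"Mockingbird"}'
```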
Leaderboards typically leverage a concise data schema with just a user/item name, the ranked attribute, and a timestamp. This tutorial is based on this schema: - `name` String - `session_id` String - `timestamp` DateTime64(3) - `type` LowCardinality(String) - `event` String Ranking algorithms can be based on a single score, time-based metrics, or weighted combinations of factors. 4 ## Transform and publish your data¶ Your Data Source is collecting events, so now it's time to create some [Pipes](https://www.tinybird.co/docs/docs/work-with-data/query/pipes) . Pipes are made up of chained, reusable SQL [nodes](https://www.tinybird.co/docs/docs/work-with-data/query/pipes#nodes) and form the logic that ranks the results. Start by creating a `leaderboard` Pipe with two nodes. The first node returns all 'score' events. The second node takes those results and counts these events by player and session (which defines a single game), and returns the top 10 results. In the Tinybird UI, create a new Pipe called `leaderboard` . Paste in the following SQL and rename the first node `get_all_scores`: ##### get\_all\_scores Node % SELECT name AS player_id, timestamp, session_id, event FROM game_events WHERE type = 'score' AND event == {{ String(event_param, 'Mockingbird', description="Event to filter on") }} This query returns all events where the type is `score`. This node creates a query parameter named `event_param` using the [Tinybird templating syntax](https://www.tinybird.co/docs/docs/work-with-data/query/query-parameters) . This instance of Flappybird uses an `event` attribute to organize players, games, and events into separate groups. As shown previously, incoming Mockingbird game events have an `"event": "Mockingbird"` attribute. Select **Run** and add a new node underneath, called `endpoint` . Paste in: ##### endpoint Node SELECT player_id, session_id, event, count() AS score FROM get_all_scores GROUP BY player_id, session_id, event ORDER BY score DESC LIMIT 10 Select **Run** , then select **Create API Endpoint** . Your data is now ranked, published, and available to consume. 5 ## Optimize with Materialized Views¶ Before you run the frontend for your leaderboard, there are a few optimizations to make. Even with small datasets, it's a great habit to get into. [Materialized Views](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/materialized-views) are updated as data is ingested, and create intermediate states that are merged with already-processed data. The Materialized View (MV) continuously re-evaluates queries as new events are inserted, reducing both latency and processed-data-per-query. In this case, the MV you create precalculates the top scores, and merges those with recently-received events. This significantly improves query performance by reducing the amount of data that needs to be processed for each leaderboard request. To create a new Materialized View, begin by adding a new Pipe and call it `user_stats_mv` . Then paste the following SQL into the first Node: SELECT event, name AS player_id, session_id, countIfState(type = 'score') AS scores, countIfState(type = 'game_over') AS games, countIfState(type = 'purchase') AS purchases, minState(timestamp) AS start_ts, maxState(timestamp) AS end_ts FROM game_events GROUP BY event, player_id, session_id This query relies on the `countIfState` function, which includes the `-State` operator to maintain intermediate states containing recent data. 
When triggered by a `-Merge` operator, these intermediate states are combined with the precalculated data. The `countIfState` function is used to maintain counts of each type of game event. Name this node `populate_mv` , then [publish it as a Materialized View](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/materialized-views) . Name your Materialized View `user_stats`. You now have a new Data Source called `user_stats` , which is a Materialized View that is continuously updated with the latest game events. The `-State` modifier that maintains intermediate states as new data arrives is paired with a `-Merge` modifier in Pipes that pull from the `user_stats` Data Source. 6 ## Update leaderboard Pipe¶ Now that `user_stats` is available, rebuild the `leaderboard` Pipe to take advantage of this more efficient Data Source. This step helps prepare your leaderboard feature to handle massive amounts of game events while serving requests to thousands of users. The updated leaderboard Pipe consists of three nodes: - `rank_games` - Applies the countMerge(scores) function to get the current total from the user_stats Data Source. - `last_game` - Retrieves the score from the player's most recent game and determines the player's rank. - `endpoint` - Combines the results of these two nodes and ranks by score. The `last_game` node introduces the user-facing aspect of the leaderboard. This node retrieves a specific user's data and blends it into the leaderboard results. To get started, update the `leaderboard` Pipe to use the `user_stats` Materialized View. Return to the `leaderboard` Pipe and un-publish it. Now, change the name of the first node to `rank_games` and update the SQL to: ##### rank\_games Node % SELECT ROW_NUMBER() OVER (ORDER BY total_score DESC, t) AS rank, player_id, session_id, countMerge(scores) AS total_score, maxMerge(end_ts) AS t FROM user_stats GROUP BY player_id, session_id ORDER BY rank A few things to notice: 1. The `rank_games` node now uses the `user_stats` Materialized View instead of the `game_events` Data Source. 2. The use of the `countMerge(scores)` function. The `-Merge` operator triggers the MV-based `user_stats` Data Source to combine any intermediate states with the pre-calculated data and return the results. 3. The use of the `ROW_NUMBER()` window function that returns a ranking of top scores. These rankings are based on the merged scores (aliased as `total_scores` ) retrieved from the `user_stats` Data Source. Next, change the name of the second node to `last_game` and update the SQL to: ##### last\_game Node % SELECT argMax(rank, t) AS rank, player_id, argMax(session_id, t) AS session_id, argMax(total_score, t) AS total_score FROM rank_games WHERE player_id = {{ String(player_id, 'Jim', description="Player to filter on", required=True) }} GROUP BY player_id This query returns the highest rank of a specified player and introduces a `player_id` query parameter. To combine these results, add a new node called `endpoint` and paste the following SQL: ##### endpoint Node SELECT * FROM ( SELECT rank, player_id, session_id, total_score FROM rank_games WHERE (player_id, session_id) NOT IN (SELECT player_id, session_id FROM last_game) LIMIT 10 UNION ALL SELECT rank, player_id, session_id, total_score FROM last_game ) ORDER BY rank ASC This query applies the `UNION ALL` statement to combine the two result sets. The selected attribute data types must match to be combined. This completes the `leaderboard` Pipe. Publish it as an API Endpoint. 
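Once published, you can call the endpoint directly and pass the query parameters defined in the Pipe's template. A minimal sketch, assuming a Token with read access in `TB_TOKEN` and the default `api.tinybird.co` region:

##### Call the leaderboard endpoint (sketch)

```bash
# Sketch: request the top 10 plus the specified player's most recent game.
# player_id and event_param are the query parameters defined in the Pipe above.
curl "https://api.tinybird.co/v0/pipes/leaderboard.json?player_id=Marley&event_param=Mockingbird&token=$TB_TOKEN"
```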
Now that the final version of the 'leaderboard' Endpoint has been published, create one last Pipe in the UI. This one gets the overall stats for the leaderboard, the number of players and completed games. Name the Pipe `get_stats` and create a single node named `endpoint`: ##### endpoint node in the get\_stats Pipe WITH player_count AS ( SELECT COUNT(DISTINCT player_id) AS players FROM user_stats ), game_count AS ( SELECT COUNT(*) AS games FROM game_events WHERE type == 'game_over' ) SELECT players, games FROM player_count, game_count Publish this node as an API Endpoint. You're ready to get it all running! 7 ## Run your app¶ Clone the `demo-user-facing-leaderboard` repo locally. Install the app dependencies by running this command from the `app` dir of the cloned repo: npm install ### Add your Tinybird settings as environment variables¶ Create a new `.env.local` file: touch .env.local Copy your Tinybird Admin Token, Workspace UUID, obtained from **Workspace** > **Settings** > **Advanced settings** >** `...`** , and API host URL from your Tinybird Workspace into the new `.env.local`: TINYBIRD_SIGNING_TOKEN="YOUR SIGNING TOKEN" # Use your Admin Token as the signing token TINYBIRD_WORKSPACE="YOUR WORKSPACE ID" # The UUID of your Workspace NEXT_PUBLIC_TINYBIRD_HOST="YOUR TINYBIRD API HOST e.g. https://api.tinybird.co" # Your regional API host ### Run your app¶ Run your app locally and navigate to `http://localhost:3000`: npm run dev You now have an optimized gaming leaderboard ingesting real-time data! Have a think about how you'd adapt or extend it for your own use case. ## Next steps¶ - Read the in-depth blog post on[ building a real-time leaderboard](https://www.tinybird.co/blog-posts/building-real-time-leaderboards-with-tinybird) . - Understand today's real-time analytics landscape with[ Tinybird's definitive guide](https://www.tinybird.co/blog-posts/real-time-analytics-a-definitive-guide) . - Learn how to implement[ multi-tenant security](https://www.tinybird.co/blog-posts/multi-tenant-saas-options) in your user-facing analytics. --- URL: https://www.tinybird.co/docs/get-started/quick-start/tinybird-101-tutorial Last update: 2025-01-20T11:43:08.000Z Content: --- title: "Tinybird 101 Tutorial · Tinybird Docs" theme-color: "#171612" description: "Tinybird provides you with an easy way to ingest and query large amounts of data with low-latency, and instantly create API Endpoints to consume those queries. This makes it extremely easy to build fast and scalable applications that query your data; no backend needed!" --- # Tinybird 101¶ Tinybird provides you with a simple way to ingest and query large amounts of data with low latency, and instantly create API Endpoints to consume those queries. This means you can easily build fast and scalable applications that query your data. ## Example use case: ecommerce¶ This walkthrough demonstrates how to build an API Endpoint that returns the top 10 most searched products in an ecommerce website. It follows the process of "ingest > query > publish". 1. First, you ingest a set of ecommerce events based on user actions, such as viewing an item, adding items to their cart, or going through the checkout. This data is available as a CSV file with 50 million rows. 2. Next, you write queries to filter, aggregate, and transform the data into the top 10 list. 3. Finally, you publish that top 10 result as an HTTP Tinybird API Endpoint. 
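By the end of the walkthrough, you'll be able to fetch that top 10 list with a plain HTTP request, along the lines of this sketch (the Pipe name `top_10_searched_products` comes from the steps below; the Token is one you copy from your own Workspace):

##### Example of the finished endpoint call (sketch)

```bash
# Sketch: call the API Endpoint you'll publish at the end of this walkthrough.
curl "https://api.tinybird.co/v0/pipes/top_10_searched_products.json?token=$TB_TOKEN"
```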
## Your first Workspace¶ After [creating your account](https://www.tinybird.co/signup) , select a region, and name your Workspace. You can call the Workspace whatever you want. Leave the template menu blank. 1 ## Create a Data Source¶ Tinybird can import data from many different sources. Start with a CSV file that Tinybird has posted online for you. In your Workspace, find the **Data Sources** section and select the **+** icon to add a new Data Source. In the dialog that opens, select the **Remote URL** connector. Make sure that `csv` is selected, then paste the following URL into the text box: https://storage.googleapis.com/tinybird-assets/datasets/guides/events_50M_1.csv Select **Add** and give the Data Source a name and description. Tinybird also shows you a preview of the schema and data. Change the name to something more descriptive, for example `shopping_data`. ### Start the data import¶ After setting the name of your first Data Source, select **Create Data Source** to start importing the data. You've ingested your first data. Now you can move on to creating your first Pipe. 2 ## Create a Pipe¶ In Tinybird, SQL queries are written inside Pipes. One Pipe can be made up of many individual SQL queries called nodes. Each node is a single SQL SELECT statement. A node can query the output of another node in the same Pipe. This means that you can break large queries down into multiple smaller, more modular queries and chain them together. Add a new Pipe by selecting the **+** icon next to the Pipes category. This adds a new Pipe with an auto-generated default name. Select the name and description to change it. Call this Pipe `top_10_searched_products`. ### Filter the data¶ At the top of your new Pipe is the first node, which is prepopulated with a simple SELECT over the data in your Data Source. Before you start modifying the query in the node, select **Run** . Selecting **Run** executes the query in the node, and shows a preview of the query result. You can execute any node in your Pipe to see the result. In this Pipe, you want to create a list of the top 10 most searched products. If you take a look at the data, you might notice an `event` column, which describes what kind of event happened. This column has various values, including `view`, `search` , and `buy` . You are only interested in rows where the `event` is `search` , so modify the query to filter the rows. Replace the node SQL with the following query: SELECT * FROM shopping_data WHERE event == 'search' Select **Run** again. The node is now applying a filter to the data, so you only see the rows of interest. Call this node `search_events`. ### Aggregate the data¶ Next, you want to work out how many times each individual product has been searched for. To do this, you need to count and aggregate by the product ID. To keep your queries simpler, create a second node to do this aggregation. Use the following query for the next node: SELECT product_id, count() as total FROM search_events GROUP BY product_id ORDER BY total DESC Select **Run** again to see the results of the query. Call this node `aggregate_by_product_id`. 3 ## Transform the result¶ Finally, create the last node, which you publish as a Tinybird API Endpoint, limiting the results to the top 10 products. Create a third node and use the following query: SELECT product_id, total FROM aggregate_by_product_id LIMIT 10 Follow the common convention to name this node `endpoint` . Select **Run** to preview the results. 
4 ## Publish and use your API Endpoint¶ Tinybird API Endpoints are published directly from selected nodes. API Endpoints come with an extensive feature set, including support for dynamic query parameters and auto-generated docs complete with code samples. To publish a node as an API Endpoint, select **Create API Endpoint**, then confirm by selecting **Create API Endpoint** again. ### Test the API Endpoint¶ On your API overview page, scroll down to the **Sample usage** section, and copy the HTTP URL from the snippet box. Open this URL in a new tab in your browser. Hitting the API Endpoint triggers your Pipe to execute, and you get a JSON-formatted response with the results. 5 ## Build Charts showing your data¶ On your API overview page, select **Create Chart**. You can also build Charts to embed in your own application. See the [Charts documentation](https://www.tinybird.co/docs/docs/publish/charts) for more. 6 ## Celebrate¶ Congrats! You have finished creating your first API Endpoint in Tinybird! You have imported 50 million events, built a variety of queries with latencies measured in milliseconds, and stood up an API Endpoint that can serve thousands of concurrent requests. ## Next steps¶ Tinybird provides built-in connectors to easily ingest data from Kafka, Confluent Cloud, BigQuery, Amazon S3, and Snowflake. If you want to stream data over HTTP, you can send data directly to Tinybird's [Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) with no additional infrastructure. --- URL: https://www.tinybird.co/docs/get-started/quick-start/user-facing-web-analytics Last update: 2025-01-20T11:43:08.000Z Content: --- title: "Build a user-facing web analytics dashboard · Tinybird Docs" theme-color: "#171612" description: "Learn how to build a user-facing web analytics dashboard using Tinybird for real-time, user-facing analytics." --- # Build a user-facing web analytics dashboard¶ Read on to learn how to build a user-facing web analytics dashboard. Use Tinybird to capture web clickstream events, process the data in real-time, and expose metrics as APIs. Then, deploy a Next.js app to visualize your metrics. [GitHub Repository](https://github.com/tinybirdco/demo-user-facing-web-analytics) The guide is divided into the following steps: 1. Stream unstructured event data to Tinybird with the[ Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) . 2. Parse those events with a global SQL node that you can reuse in all your subsequent[ Tinybird Pipes](https://www.tinybird.co/docs/docs/work-with-data/query/pipes) . 3. Build performant queries to calculate user-facing analytics metrics. 4. Optimize query performance with[ Materialized Views](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/materialized-views) . 5. Publish your metrics as[ API Endpoints](https://www.tinybird.co/docs/docs/publish/api-endpoints) and integrate them into a user-facing Next.js app. ## Prerequisites¶ To complete this tutorial, you need the following: 1. A[ free Tinybird account](https://www.tinybird.co/signup) 2. An empty Tinybird Workspace 3. Node.js 20.11 or higher 4. Python 3.8 or higher This tutorial includes a [Next.js](https://nextjs.org/) app for frontend visualization. For more information about how the Next.js app is designed and deployed, read the [repository README](https://github.com/tinybirdco/demo-user-facing-web-analytics/tree/main/app/README.md). The steps in this tutorial are completed using the Tinybird Command Line Interface (CLI). 
If you're not familiar with it, [read the CLI docs](https://www.tinybird.co/docs/docs/cli/install) . You can copy and paste every code snippet and command in this tutorial. 1 ## Create a Tinybird Data Source to store your events¶ First, create a Tinybird Data Source to store your web clickstream events. Create a new directory called `tinybird` in your project folder and install the Tinybird CLI: ##### Install the Tinybird CLI mkdir tinybird cd tinybird python -m venv .venv source .venv/bin/activate pip install tinybird-cli Copy the user admin Token and authenticate the CLI: ##### Authenticate the Tinybird CLI tb auth --token Initialize an empty Tinybird project and navigate to the `/datasources` directory, then create a new file called `analytics_events.datasource`: ##### Create a Data Source tb init cd datasources touch analytics_events.datasource Open the file in your preferred code editor and paste the following contents: ##### analytics\_events.datasource DESCRIPTION > Analytics events landing data source SCHEMA > `timestamp` DateTime `json:$.timestamp`, `session_id` String `json:$.session_id`, `action` LowCardinality(String) `json:$.action`, `version` LowCardinality(String) `json:$.version`, `payload` String `json:$.payload` ENGINE MergeTree ENGINE_PARTITION_KEY toYYYYMM(timestamp) ENGINE_SORTING_KEY timestamp ENGINE_TTL timestamp + toIntervalDay(60) If you pass a non-existent Data Source name to the Events API, Tinybird automatically creates a new Data Source of that name with an [inferred schema](https://www.tinybird.co/docs/docs/get-data-in#get-started) . By creating the Data Source ahead of time in this file, you have more control over the schema definition, including column types and sorting keys. For more information about creating Tinybird Data Sources, see [Data Sources](https://www.tinybird.co/docs/docs/get-data-in/data-sources). In the `/tinybird` directory, save and push the file to Tinybird: ##### Push the Data Source to Tinybird cd .. tb push datasources/analytics_events.datasource Confirm that you have a new Data Source: tb datasource ls You should see `analytics_events` in the result. Congrats, you have a Tinybird Data Source! 2 ## Stream mock data to your Data Source¶ This tutorial uses [Mockingbird](https://mockingbird.tinybird.co/docs) , an open source mock data stream generator, to stream mock web clickstream events to your Data Source. Mockingbird generates a JSON payload based on a predefined schema and posts it to the [Tinybird Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) , which then writes the data to your Data Source. You can explore the [Mockingbird web UI](https://mockingbird.tinybird.co/) , or follow the steps to complete the same actions using the Mockingbird CLI. In a separate terminal window, install the Mockingbird CLI: ##### Install Mockingbird npm install -g @tinybirdco/mockingbird-cli Run the following command to stream 50,000 mock web clickstream events to your `analytics_events` Data Source at 50 events per second through the Events API. This command uses the predefined Web Analytics template schema to generate mock web clickstream events. Copy your User Admin Token to the clipboard with `tb token copy dashboard` , and use it in the following command. 
Change the `endpoint` argument depending on your Workspace region if required: ##### Stream to Tinybird with a template mockingbird-cli tinybird --template "Web Analytics template" --eps 50 --limit 50000 --datasource analytics_events --token --endpoint gcp_europe_west3 Confirm that events are written to the `analytics_events` Data Source by running the following command a few times: tb sql 'select count() from analytics_events' You should see the count incrementing up by 50 every second or so. Congratulations, you're ready to start processing your events data! 3 ## Parse the raw JSON events¶ The `analytics_events` Data Source has a `payload` column which stores a string of JSON data. To begin building your analytics metrics, you need to parse this JSON data using a Tinybird [Pipe](https://www.tinybird.co/docs/docs/work-with-data/query/pipes). When you're dealing with unstructured data that's likely to change in the future, retain the unstructured data as a JSON string in a single column. This gives you flexibility to change your upstream producers without breaking ingestion. You can then parse and materialize this data downstream. Navigate to the `/pipes` directory and create a new file called `analytics_hits.pipe`: ##### Create a Pipe touch analytics_hits.pipe Open the file and paste the following contents: ##### analytics\_hits.pipe DESCRIPTION > Parsed `page_hit` events, implementing `browser` and `device` detection logic. TOKEN "dashboard" READ NODE parsed_hits DESCRIPTION > Parse raw page_hit events SQL > SELECT timestamp, action, version, coalesce(session_id, '0') as session_id, JSONExtractString(payload, 'locale') as locale, JSONExtractString(payload, 'location') as location, JSONExtractString(payload, 'referrer') as referrer, JSONExtractString(payload, 'pathname') as pathname, JSONExtractString(payload, 'href') as href, lower(JSONExtractString(payload, 'user-agent')) as user_agent FROM analytics_events where action = 'page_hit' NODE endpoint SQL > SELECT timestamp, action, version, session_id, location, referrer, pathname, href, case when match(user_agent, 'wget|ahrefsbot|curl|urllib|bitdiscovery|\+https://|googlebot') then 'bot' when match(user_agent, 'android') then 'mobile-android' when match(user_agent, 'ipad|iphone|ipod') then 'mobile-ios' else 'desktop' END as device, case when match(user_agent, 'firefox') then 'firefox' when match(user_agent, 'chrome|crios') then 'chrome' when match(user_agent, 'opera') then 'opera' when match(user_agent, 'msie|trident') then 'ie' when match(user_agent, 'iphone|ipad|safari') then 'safari' else 'Unknown' END as browser FROM parsed_hits This Pipe contains two [nodes](https://www.tinybird.co/docs/docs/work-with-data/query/pipes#nodes) . The first node, called `parsed_hits` , extracts relevant information from the JSON `payload` using the `JSONExtractString()` function and filters to only include `page_hit` actions. The second node, called `endpoint` , selects from the `parsed_hits` node and further parses the `user_agent` to get the `device` and `browser` for each event. Additionally, this code gives the Pipe a description, and creates a [Token](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens) called `dashboard` with `READ` scope for this Pipe. 
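If you want to preview what the parsed fields will look like before building on them, you can run an ad-hoc query against the raw events from the CLI. This is a minimal sketch, assuming the Mockingbird data from the previous step is still flowing; it reuses the same `JSONExtractString()` calls as the `parsed_hits` node:

##### Preview the parsed payload

```bash
# Preview how the raw payload will be parsed (first 5 page_hit events)
tb sql "
  SELECT
    timestamp,
    JSONExtractString(payload, 'pathname') AS pathname,
    JSONExtractString(payload, 'referrer') AS referrer,
    lower(JSONExtractString(payload, 'user-agent')) AS user_agent
  FROM analytics_events
  WHERE action = 'page_hit'
  LIMIT 5
"
```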
Navigate back up to the `/tinybird` directory and push the Pipe to Tinybird: ##### Push the Pipe to Tinybird tb push pipes/analytics_hits.pipe When you push a Pipe file, Tinybird automatically publishes the last node as an API Endpoint unless you specify the Pipe as something else, so it's best practice to call your final node "endpoint". You can unpublish an API Endpoint at any time using `tb pipe unpublish `. You now have a public REST API that returns the results of the `analytics_hits` Pipe. Get the `dashboard` Token you just created with `tb token copy dashboard` and test your API with the command: curl "https://api.tinybird.co/v0/pipes/analytics_hits.json?token=" You should see a JSON response that looks something like this: ##### Example API response { "meta": [ { "name": "timestamp", "type": "DateTime" }, { "name": "action", "type": "LowCardinality(String)" }, { "name": "version", "type": "LowCardinality(String)" }, { "name": "session_id", "type": "String" }, { "name": "location", "type": "String" }, { "name": "referrer", "type": "String" }, { "name": "pathname", "type": "String" }, { "name": "href", "type": "String" }, { "name": "device", "type": "String" }, { "name": "browser", "type": "String" } ], "data": [ { "timestamp": "2024-04-24 18:24:21", "action": "page_hit", "version": "1", "session_id": "713355c6-6b98-4c7a-82a9-e19a7ace81fe", "location": "", "referrer": "https:\/\/www.kike.io", "pathname": "\/blog-posts\/data-market-whitebox-replaces-4-data-stack-tools-with-tinybird", "href": "https:\/\/www.tinybird.co\/blog-posts\/data-market-whitebox-replaces-4-data-stack-tools-with-tinybird", "device": "bot", "browser": "chrome" }, ... ], "rows": 150, "statistics": { "elapsed": 0.006203411, "rows_read": 150, "bytes_read": 53609 } } 4 ## Calculate aggregates for pageviews, sessions, and sources¶ Next, create three [Materialized Views](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/materialized-views) to store aggregates for the following: 1. pageviews 2. sessions 3. sources Later on, you query from the Materialized Views that you're creating here. From the `/datasources` directory in the Tinybird project, create three new Data Source files: touch analytics_pages_mv.datasource analytics_sessions_mv.datasource analytics_sources_mv.datasource Open the `analytics_pages_mv.datasource` file and paste in the following contents: ##### analytics\_pages\_mv.datasource SCHEMA > `date` Date, `device` String, `browser` String, `location` String, `pathname` String, `visits` AggregateFunction(uniq, String), `hits` AggregateFunction(count) ENGINE AggregatingMergeTree ENGINE_PARTITION_KEY toYYYYMM(date) ENGINE_SORTING_KEY date, device, browser, location, pathname Do the same for `analytics_sessions_mv.datasource` and `analytics_sources_mv.datasource` , copying the code from the [GitHub repository](https://github.com/tinybirdco/demo-user-facing-web-analytics/tree/main/tinybird/datasources) for this tutorial. Next, create three Pipes that calculate the aggregates and store the data in the Materialized View Data Sources you've created.
From the `/pipes` directory, create three new Pipe files: touch analytics_pages.pipe analytics_sessions.pipe analytics_sources.pipe Open `analytics_pages.pipe` and paste the following: ##### analytics\_pages.pipe NODE analytics_pages_1 DESCRIPTION > Aggregate by pathname and calculate session and hits SQL > SELECT toDate(timestamp) AS date, device, browser, location, pathname, uniqState(session_id) AS visits, countState() AS hits FROM analytics_hits GROUP BY date, device, browser, location, pathname TYPE MATERIALIZED DATASOURCE analytics_pages_mv This code calculates aggregates for page views, and designates the Pipe as a Materialized View with `analytics_pages_mv` as the target Data Source. Do this for the remaining two Pipes, copying the code from the [GitHub repository](https://github.com/tinybirdco/demo-user-facing-web-analytics/tree/main/tinybird/pipes). Back in the `/tinybird` directory, push these new Pipes and Data Sources to Tinybird. This populates the Materialized Views with your Mockingbird data: ##### Push to Tinybird tb push pipes --push-deps --populate Now, as new events arrive in the `analytics_events` Data Source, these Pipes process the data and update the aggregate states in your Materialized Views. 5 ## Generate session count trend for the last 30 minutes¶ The first Pipe you create, called `trend` , calculates the number of sessions over the last 30 minutes, grouped by 1-minute intervals. From the `/pipes` directory, create a file called `trend.pipe`: ##### Create trend.pipe touch trend.pipe Open this file and paste the following: ##### trend.pipe DESCRIPTION > Visits trend over time for the last 30 minutes, filling in the blanks. TOKEN "dashboard" READ NODE timeseries DESCRIPTION > Generate a timeseries for the last 30 minutes, so we can fill empty data points SQL > with (now() - interval 30 minute) as start select addMinutes(toStartOfMinute(start), number) as t from (select arrayJoin(range(1, 31)) as number) NODE hits DESCRIPTION > Get last 30 minutes metrics grouped by minute SQL > select toStartOfMinute(timestamp) as t, uniq(session_id) as visits from analytics_hits where timestamp >= (now() - interval 30 minute) group by toStartOfMinute(timestamp) order by toStartOfMinute(timestamp) NODE endpoint DESCRIPTION > Join and generate timeseries with metrics for the last 30 minutes SQL > select a.t, b.visits from timeseries a left join hits b on a.t = b.t order by a.t This Pipe contains three nodes: 1. The first node, called `timeseries` , generates a simple result set with 1-minute intervals for the last 30 minutes. 2. The second node, called `hits` , calculates total sessions over the last 30 minutes, grouped by 1-minute intervals. 3. The third node, called `endpoint` , performs a left join between the first two nodes, retaining all of the 1-minute intervals from the `timeseries` node. 6 ## Calculate the top pages visited¶ Next, create a Pipe called `top_pages` to calculate a sorted list of the top pages visited over a specified time range. This Pipe queries the `analytics_pages_mv` Data Source you created in the prior steps, and it uses Tinybird's templating language to define [query parameters](https://www.tinybird.co/docs/docs/work-with-data/query/query-parameters) that you can use to dynamically select a time range and implement pagination in the response.
From the `/pipes` directory, create the `top_pages.pipe` file: ##### Create top\_pages.pipe touch top_pages.pipe Open the file and paste the following: DESCRIPTION > Most visited pages for a given period. Accepts `date_from` and `date_to` date filter. Defaults to last 7 days. Also `skip` and `limit` parameters for pagination. TOKEN "dashboard" READ NODE endpoint DESCRIPTION > Group by pagepath and calculate hits and visits SQL > % select pathname, uniqMerge(visits) as visits, countMerge(hits) as hits from analytics_pages_mv where {% if defined(date_from) %} date >= {{ Date(date_from, description="Starting day for filtering a date range", required=False) }} {% else %} date >= timestampAdd(today(), interval -7 day) {% end %} {% if defined(date_to) %} and date <= {{ Date(date_to, description="Finishing day for filtering a date range", required=False) }} {% else %} and date <= today() {% end %} group by pathname order by visits desc limit {{ Int32(skip, 0) }},{{ Int32(limit, 50) }} Note the use of the `-Merge` modifiers at the end of the aggregate functions. These modifiers perform a final merge on the aggregate states in the Materialized View. [Read this Guide](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/materialized-views/best-practices) for more details. 7 ## Create the remaining API Endpoints¶ In the [GitHub repository](https://github.com/tinybirdco/demo-user-facing-web-analytics/tree/main/tinybird/pipes) , you can find five additional Pipe files that calculate various other user-facing metrics: - `kpis.pipe` - `top_browsers.pipe` - `top_devices.pipe` - `top_locations.pipe` - `top_sources.pipe` Create those files in your `/pipes` directory: touch kpis.pipe top_browsers.pipe top_devices.pipe top_locations.pipe top_sources.pipe And copy the file contents from the GitHub examples into your files. Finally, in the `/tinybird` directory, push all these new Pipes to Tinybird: tb push pipes You now have seven API Endpoints that you can integrate into your Next.js app to provide data to your dashboard components. 8 ## Deploy the Next.js app¶ You can deploy the accompanying Next.js app to Vercel by clicking this button: [Deploy with Vercel](https://vercel.com/new/clone?repository-url=https%253A%252F%252Fgithub.com%252Ftinybirdco%252Fdemo-user-facing-web-analytics%252Ftree%252Fmain%252Fapp&env=NEXT_PUBLIC_TINYBIRD_AUTH_TOKEN,NEXT_PUBLIC_TINYBIRD_HOST,NEXT_PUBLIC_BASE_URL&envDescription=Tinybird%2520configuration&project-name=user-facing-web-analytics&repository-name=user-facing-web-analytics) First, select the Git provider where you can clone the Git repository: ![Select Git provider](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Ftutorial-user-facing-web-analytics-deploy-1.png&w=3840&q=75) Next, set the following environment variables: - `NEXT_PUBLIC_TINYBIRD_AUTH_TOKEN` : your[ Tinybird Token](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens) - `NEXT_PUBLIC_TINYBIRD_HOST` : your Tinybird Region (e.g. `https://api.tinybird.co` ) - `NEXT_PUBLIC_BASE_URL` : The URL where you will publish your app (e.g. `https://my-analytics.com` ) ![Set env variables](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Ftutorial-user-facing-web-analytics-deploy-2.png&w=3840&q=75) Select **Deploy** and you're done. Explore your dashboard and have a think about how you'd like to adapt or extend it in the future.
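If the dashboard looks empty, a quick way to confirm that the published endpoints return data is to call one of them directly. The following sketch assumes the `dashboard` Token created earlier is exported as `TB_DASHBOARD_TOKEN` (an assumed variable name) and that your Workspace is in the `api.tinybird.co` region; `top_pages` accepts the `date_from`, `date_to`, and `limit` parameters defined above:

##### Call the top_pages API Endpoint

```bash
# Request the 10 most visited pages for April 2024 from the published endpoint.
# TB_DASHBOARD_TOKEN is an assumed environment variable holding the "dashboard" Token.
curl \
  -H "Authorization: Bearer $TB_DASHBOARD_TOKEN" \
  "https://api.tinybird.co/v0/pipes/top_pages.json?date_from=2024-04-01&date_to=2024-04-30&limit=10"
```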
## Next steps¶ - Understand today's real-time analytics landscape with[ Tinybird's definitive guide](https://www.tinybird.co/blog-posts/real-time-analytics-a-definitive-guide) . - Learn how to implement[ multi-tenant security](https://www.tinybird.co/blog-posts/multi-tenant-saas-options) in your user-facing analytics. --- URL: https://www.tinybird.co/docs/get-started/quick-start/vector-search-recommendation Last update: 2025-01-20T11:43:08.000Z Content: --- title: "Build a content recommendation API using vector search · Tinybird Docs" theme-color: "#171612" description: "Learn how to compute embeddings in Python and use vector search SQL functions in Tinybird to build a content recommendation API." --- # Build a content recommendation API using vector search¶ Read on to learn how to calculate vector embeddings using HuggingFace models and use Tinybird to perform vector search to find similar content based on vector distances. [GitHub Repository](https://github.com/tinybirdco/demo_vector_search_recommendation/tree/main) ![Tinybird blog related posts uses vector search recommendation algorithm.](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Ftutorial-vector-search-recommendation-1.png&w=3840&q=75) In this tutorial, you learn how to: 1. Use Python to fetch content from an RSS feed. 2. Calculate vector embeddings on long form content (blog posts) using SentenceTransformers in Python. 3. Post vector embeddings to a Tinybird Data Source using the Tinybird Events API. 4. Write a dynamic SQL query to calculate the closest content matches to a given blog post based on vector distances. 5. Publish your query as an API and integrate it into a frontend application. ## Prerequisites¶ To complete this tutorial, you need the following: 1. A[ free Tinybird account](https://www.tinybird.co/signup) 2. An empty Tinybird Workspace 3. Python 3.8 or higher This tutorial doesn't include a frontend. An example snippet is provided to show how you can integrate the published API into a React frontend. 1 ## Setup¶ Clone the `demo_vector_search_recommendation` repo. Authenticate the Tinybird CLI using your user admin token from your Tinybird Workspace: cd tinybird tb auth --token $USER_ADMIN_TOKEN 2 ## Fetch content and calculate embeddings¶ This tutorial uses the [Tinybird Blog RSS feed](https://www.tinybird.co/blog-posts/rss.xml) to fetch blog posts. You can use any `rss.xml` feed to fetch blog posts and calculate embeddings from their content. You can fetch and parse the RSS feed using the `feedparser` library in Python, get a list of posts, and then fetch each post and parse the content with the `BeautifulSoup` library. Once you've fetched each post, you can calculate an embedding using the HuggingFace `sentence_transformers` library.
This demo uses the `all-MiniLM-L6-v2` model, which maps sentences and paragraphs to a 384 dimensional dense vector space: from bs4 import BeautifulSoup from sentence_transformers import SentenceTransformer import datetime import feedparser import requests import json timestamp = datetime.datetime.now().isoformat() url = "https://www.tinybird.co/blog-posts/rss.xml" # Update to your preferred RSS feed feed = feedparser.parse(url) model = SentenceTransformer("all-MiniLM-L6-v2") posts = [] for entry in feed.entries: doc = BeautifulSoup(requests.get(entry.link).content, features="html.parser") if (content := doc.find(id="content")): embedding = model.encode([content.get_text()]) posts.append(json.dumps({ "timestamp": timestamp, "title": entry.title, "url": entry.link, "embedding": embedding.mean(axis=0).tolist() })) 3 ## Post content metadata and embeddings to Tinybird¶ After calculating the embeddings, you can push them along with the content metadata to Tinybird using the [Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api). First, set up some environment variables for your Tinybird host and a token with `DATASOURCES:WRITE` scope: export TB_HOST=your_tinybird_host export TB_APPEND_TOKEN=your_append_token Next, set up a Tinybird Data Source to receive your data. In the `tinybird/datasources` folder of the repository, find a `posts.datasource` file that looks like this: SCHEMA > `timestamp` DateTime `json:$.timestamp`, `title` String `json:$.title`, `url` String `json:$.url`, `embedding` Array(Float32) `json:$.embedding[:]` ENGINE ReplacingMergeTree ENGINE_PARTITION_KEY "" ENGINE_SORTING_KEY title, url ENGINE_VER timestamp This Data Source receives the updated post metadata and calculated embeddings, and deduplicates them based on the most recent data retrieval. The `ReplacingMergeTree` is used to deduplicate, relying on the `ENGINE_VER` setting, which in this case is set to the `timestamp` column. This tells the engine that the versioning of each entry is based on the `timestamp` column, and only the entry with the latest timestamp is kept in the Data Source. The Data Source has the `title` column as its primary sorting key, because you filter by title to retrieve the embedding for the current post. Having `title` as the primary sorting key makes that filter more performant. Push this Data Source to Tinybird: cd tinybird tb push datasources/posts.datasource Then, you can use a Python script to push the post metadata and embeddings to the Data Source using the Events API: import os import requests TB_APPEND_TOKEN=os.getenv("TB_APPEND_TOKEN") TB_HOST=os.getenv("TB_HOST") def send_posts(posts): params = { "name": "posts", "token": TB_APPEND_TOKEN } data = "\n".join(posts) # ndjson r = requests.post(f"{TB_HOST}/v0/events", params=params, data=data) print(r.status_code) send_posts(posts) To keep embeddings up to date, you should retrieve new content on a schedule and push it to Tinybird. In the repository, you can find a GitHub Action called [tinybird_recommendations.yml](https://github.com/tinybirdco/demo_vector_search_recommendation/blob/main/.github/workflows/tinybird_recommendations.yml) that fetches new content from the Tinybird blog every 12 hours and pushes it to Tinybird. The Tinybird Data Source in this project uses a ReplacingMergeTree to deduplicate blog post metadata and embeddings as new data arrives.
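Note that `ReplacingMergeTree` deduplicates in the background when parts are merged, not at insert time, so duplicate rows can linger for a while. A quick way to check the deduplicated view of the `posts` Data Source is to query it with the `FINAL` modifier, the same modifier the recommendation Pipe uses in the next step. A minimal sketch from the CLI:

##### Check deduplication in the posts Data Source

```bash
# Compare the raw row count with the deduplicated count (one row per title/url after FINAL)
tb sql "SELECT count() AS raw_rows FROM posts"
tb sql "SELECT count() AS deduplicated_rows FROM posts FINAL"
```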
4 ## Calculate distances in SQL using Tinybird Pipes¶ If you've completed the previous steps, you should have a `posts` Data Source in your Tinybird Workspace containing the last fetched timestamp, title, URL, and embedding for each blog post fetched from your RSS feed. You can verify that you have data from the Tinybird CLI with: tb sql 'SELECT * FROM posts' This tutorial includes a single-node SQL Pipe to calculate the vector distance of each post to a specific post supplied as a query parameter. The Pipe config is contained in the `similar_posts.pipe` file in the `tinybird/pipes` folder, and the SQL is copied in the following snippet for reference and explanation. % WITH ( SELECT embedding FROM ( SELECT 0 AS id, embedding FROM posts WHERE title = {{ String(title) }} ORDER BY timestamp DESC LIMIT 1 UNION ALL SELECT 999 AS id, arrayWithConstant(384, 0.0) embedding ) ORDER BY id LIMIT 1 ) AS post_embedding SELECT title, url, L2Distance(embedding, post_embedding) similarity FROM posts FINAL WHERE title <> {{ String(title) }} ORDER BY similarity ASC LIMIT 10 This query first fetches the embedding of the requested post, and returns an array of 0s in the event an embedding can't be fetched. It then calculates the Euclidean vector distance between each additional post and the specified post using the `L2Distance()` function, sorts them by ascending distance, and limits to the top 10 results. You can push this Pipe to your Tinybird server with: cd tinybird tb push pipes/similar_posts.pipe When you push it, Tinybird automatically publishes it as a scalable, dynamic REST API Endpoint that accepts a `title` query parameter. You can test your API Endpoint with a cURL command. First, create an environment variable with a token that has `PIPES:READ` scope for your Pipe. You can get this token from your Workspace UI or in the CLI with `tb token` [commands](https://www.tinybird.co/docs/docs/cli/command-ref#tb-token). export TB_READ_TOKEN=your_read_token Then request your endpoint: curl --compressed -G -H "Authorization: Bearer $TB_READ_TOKEN" --data-urlencode "title=Some blog post title" https://api.tinybird.co/v0/pipes/similar_posts.json A JSON object appears containing the 10 most similar posts to the post whose title you supplied in the request. 5 ## Integrate into the frontend¶ Integrating your vector search API into the frontend is relatively straightforward. Here's an example implementation: export async function getRelatedPosts(title: string) { const recommendationsUrl = `${host}/v0/pipes/similar_posts.json?token=${token}&title=${title}`; const recommendationsResponse = await fetch(recommendationsUrl).then( function (response) { return response.json(); } ); if (!recommendationsResponse.data) return; return Promise.all( recommendationsResponse.data.map(async ({ url }) => { const slug = url.split("/").pop(); return await getPost(slug); }) ).then((data) => data.filter(Boolean)); } 6 ## See it in action¶ You can see how this looks by checking out any blog post in the [Tinybird Blog](https://www.tinybird.co/blog) . At the bottom of each post, you can find a Related Posts section that's powered by a real Tinybird API. ## Next steps¶ - Read more about[ vector search](https://www.tinybird.co/docs/docs/use-cases/vector-search) and[ content recommendation](https://www.tinybird.co/docs/docs/use-cases/content-recommendation) use cases. - Join the[ Tinybird Slack Community](https://www.tinybird.co/community) for additional support.
--- URL: https://www.tinybird.co/docs/index Last update: 2024-12-30T16:52:42.000Z Content: --- title: "Overview of Tinybird · Tinybird Docs" theme-color: "#171612" description: "Tinybird is data infrastructure for software teams. Ship analytics features in days, not months. Tinybird abstracts data ingestion, storage, compute, and API development into a single workflow. Fewer dependencies means faster development." --- # Welcome to Tinybird¶ Tinybird is data infrastructure for software teams, giving you the tooling and infra you need to ship analytics features in your application while minimizing external dependencies. Use Tinybird to capture, store, query, and serve analytics data in your application. ## Create an account¶ Tinybird has a time-unlimited free tier with limits generous enough for most proofs of concept (and some production apps!), so you can start building today and scale at your own pace. [Create a free account](https://www.tinybird.co/signup) using a Google, Microsoft, or GitHub account (or your email address). After you create your account, pick the cloud region that works best for you, then create a [Workspace](https://www.tinybird.co/docs/docs/get-started/administration/workspaces). ## Try out your Workspace¶ Follow the [Quick start](https://www.tinybird.co/docs/docs/get-started/quick-start) to ship your first analytics in a few minutes. Learn the basics of ingesting data into Tinybird, writing SQL, publishing APIs, and integrating them in your application, from empty Workspace to working prototype. ## Use a template¶ Use one of the existing templates to get started quickly. See the [Templates](https://www.tinybird.co/templates) page for more information. ## Watch the videos¶ Watch the following videos to get familiar with Tinybird's user interface and CLI. - [ The Tinybird basics in 3 minutes (UI)](https://www.youtube.com/watch?v=cvay_LW685w) - [ Get started with the CLI](https://www.youtube.com/watch?v=OOEe84ly7Cs) - [ Ingest data from a file (UI vs CLI)](https://www.youtube.com/watch?v=1R0G1EolSEM) ## Next steps¶ - Browse the[ Use Case Hub](https://www.tinybird.co/docs/docs/use-cases) and see how to boost your project with real-time, user-facing analytics. - Learn the Tinybird[ core concepts](https://www.tinybird.co/docs/docs/get-started/quick-start/core-concepts) . - Start building using the[ quick start](https://www.tinybird.co/docs/docs/get-started/quick-start) . - Read how to build a[ user-facing web analytics dashboard](https://www.tinybird.co/docs/docs/get-started/quick-start/user-facing-web-analytics) . --- URL: https://www.tinybird.co/docs/monitoring Content: --- title: "Monitoring · Tinybird Docs" theme-color: "#171612" description: "Learn how to monitor your Tinybird data platform." --- # Monitoring¶ Tinybird is built around the idea of data that changes or grows continuously. Use the built-in Tinybird tools to monitor your data ingestion and API Endpoint processes. 
- [ Analyze endpoint performance](https://www.tinybird.co/docs/docs/monitoring/analyze-endpoints-performance) - [ Health checks](https://www.tinybird.co/docs/docs/monitoring/health-checks) - [ Measure endpoint latency](https://www.tinybird.co/docs/docs/monitoring/latency) - [ Monitor ingestion](https://www.tinybird.co/docs/docs/monitoring/monitor-data-ingestion) - [ Monitor Workspace jobs](https://www.tinybird.co/docs/docs/monitoring/jobs) - [ Service Data Sources](https://www.tinybird.co/docs/docs/monitoring/service-datasources) --- URL: https://www.tinybird.co/docs/monitoring/analyze-endpoints-performance Last update: 2024-12-18T15:47:50.000Z Content: --- title: "Analyze the performance of your API Endpoints · Tinybird Docs" theme-color: "#171612" description: "Learn more about how to measure the performance of your API Endpoints." --- # Analyze the performance of your API Endpoints¶ You can use the `pipe_stats` and `pipe_stats_rt` Service Data Sources to analyze the performance of your API Endpoints. Read on to see several practical examples that show what you can do with these Data Sources. ## Knowing what to optimize¶ Before you optimize, you need to know what to optimize. The `pipe_stats` and `pipe_stats_rt` Service Data Sources let you see how your API Endpoints are performing, so you can find causes of overhead and improve performance. These Service Data Sources provide performance data and consumption data for every single request. You can also filter and sort results by Tokens to see who is accessing your API Endpoints and how often. The difference between `pipe_stats_rt` and `pipe_stats` is that `pipe_stats` provides aggregate stats, like average request duration and total read bytes, per day, whereas `pipe_stats_rt` offers the same information but without aggregation. Every single request is stored in `pipe_stats_rt` . The examples in this guide use `pipe_stats_rt` , but you can use the same logic with `pipe_stats` if you need more than 7 days of lookback. ## Before you start¶ You need a high-level understanding of Tinybird's [Service Data Sources](https://www.tinybird.co/docs/docs/monitoring/service-datasources). ### Understand the core stats¶ This guide focuses on the following fields in the `pipe_stats_rt` Service Data Source: - `pipe_name` (String): Pipe name as returned in Pipes API. - `duration` (Float): The duration in seconds of each specific request. - `read_bytes` (UInt64): How much data was scanned for this particular request. - `read_rows` (UInt64): How many rows were scanned. - `token_name` (String): The name of the Token used in a particular request. - `status_code` (Int32): The HTTP status code returned for this particular request. You can find the full schema for `pipe_stats_rt` in the [API docs](https://www.tinybird.co/docs/docs/monitoring/service-datasources#tinybird-pipe-stats-rt). The value of `pipe_name` is "query_api" when the request is made through the Query API. The following section covers how to monitor query performance when using the Query API. ### Use the Query API with metadata parameters¶ If you are using the [Query API](https://www.tinybird.co/docs/docs/api-reference/query-api) to run queries in Tinybird, you can still track query performance using the `pipe_stats_rt` Service Data Source. Add metadata related to the query as request parameters, as well as any existing parameters used in your query. For example, when running a query against the Query API you can leverage a parameter called `app_name` to track all queries from the "explorer" application.
Here's an example using `curl`: ##### Using the metadata parameters with the Query API curl -X POST \ -H "Authorization: Bearer " \ --data "% SELECT * FROM events LIMIT {{Int8(my_limit, 10)}}" \ "https://api.tinybird.co/v0/sql?my_limit=10&app_name=explorer" When you run the following queries, use the `parameters` attribute to access those queries where `app_name` equals "explorer": ##### Simple Parameterized Query SELECT * FROM tinybird.pipe_stats_rt WHERE parameters['app_name'] = 'explorer' ## Detect errors in your API Endpoints¶ If you want to monitor the number of errors per Endpoint over the last hour, you can run the following query: ##### Errors in the last hour SELECT pipe_name, status_code, count() as error_count FROM tinybird.pipe_stats_rt WHERE status_code >= 400 AND start_datetime > now() - INTERVAL 1 HOUR GROUP BY pipe_name, status_code ORDER BY status_code desc If you have errors, the query would return something like: Pipe_a | 404 | 127 Pipe_b | 403 | 32 With one query, you can see in real time if your API Endpoints are experiencing errors, and investigate further if so. ## Analyze the performance of API Endpoints over time¶ You can also use `pipe_stats_rt` to track how long API calls take using the `duration` field, and see how that changes over time. API performance is directly related to how much data you are reading per request, so if your API Endpoint is dynamic, request duration varies. For instance, it might be receiving start and end date parameters that alter how long a period is being read. ##### API Endpoint performance over time SELECT toStartOfMinute(start_datetime) t, pipe_name, avg(duration) avg_duration, quantile(.95)(duration) p95_duration, count() requests FROM tinybird.pipe_stats_rt WHERE start_datetime >= {{DateTime(start_date_time, '2022-05-01 00:00:00', description="Start date time")}} AND start_datetime < {{DateTime(end_date_time, '2022-05-25 00:00:00', description="End date time")}} GROUP BY t, pipe_name ORDER BY t desc, pipe_name ## Find the endpoints that process the most data¶ You might want to find Endpoints that repeatedly scan large amounts of data. They are your best candidates for optimization to reduce time and spend. Here's an example of using `pipe_stats_rt` to find the API Endpoints that have processed the most data as a percentage of all processed data in the last 24 hours: ##### Most processed data last 24 hours WITH ( SELECT sum(read_bytes) FROM tinybird.pipe_stats_rt WHERE start_datetime >= now() - INTERVAL 24 HOUR ) as total, sum(read_bytes) as processed_byte SELECT pipe_id, quantile(0.9)(duration) as p90, formatReadableSize(processed_byte) AS processed_formatted, processed_byte*100/total as percentage FROM tinybird.pipe_stats_rt WHERE start_datetime >= now() - INTERVAL 24 HOUR GROUP BY pipe_id ORDER BY percentage DESC ### Include consumption of the Query API¶ If you use Tinybird's Query API to query your Data Sources directly, you probably want to include in your analysis which queries are consuming the most. Whenever you use the Query API, the field `pipe_name` contains the value `query_api` . The actual query is included as part of the `q` parameter in the `url` field. You can modify the query in the previous section to extract the SQL query that's processing the data.
##### Using the Query API WITH ( SELECT sum(read_bytes) FROM tinybird.pipe_stats_rt WHERE start_datetime >= now() - INTERVAL 24 HOUR ) as total, sum(read_bytes) as processed_byte SELECT if(pipe_name = 'query_api', normalizeQuery(extractURLParameter(decodeURLComponent(url), 'q')),pipe_name) as pipe_name, quantile(0.9)(duration) as p90, formatReadableSize(processed_byte) AS processed_formatted, processed_byte*100/total as percentage FROM tinybird.pipe_stats_rt WHERE start_datetime >= now() - INTERVAL 24 HOUR GROUP BY pipe_name ORDER BY percentage DESC ## Monitor usage of Tokens¶ If you use your API Endpoint with different Tokens, for example when allowing different customers to check their own data, you can track and control which Tokens are being used to access these endpoints. The following example shows, for the last 24 hours, the number and size of requests per Token: ##### Token usage last 24 hours SELECT count() requests, formatReadableSize(sum(read_bytes)) as total_read_bytes, token_name FROM tinybird.pipe_stats_rt WHERE start_datetime >= now() - INTERVAL 24 HOUR GROUP BY token_name ORDER BY requests DESC To get this information, use the Token name ( `token_name` column) or ID ( `token` column). Check the [limits page](https://www.tinybird.co/docs/docs/get-started/plans/limits) for limits on ingestion, queries, API Endpoints, and more. ## Next steps¶ - Want to optimize further? Read[ Monitor your ingestion](https://www.tinybird.co/docs/docs/monitoring/monitor-data-ingestion) . - Want to use Prometheus or Datadog?[ Consume API Endpoints in Prometheus format](https://www.tinybird.co/docs/docs/publish/api-endpoints/guides/consume-api-endpoints-in-prometheus-format) - Learn how to[ monitor jobs in your Workspace](https://www.tinybird.co/docs/docs/monitoring/jobs) . - Monitor the[ latency of your API Endpoints](https://www.tinybird.co/docs/docs/monitoring/latency) . - Learn how to[ build Charts of your data](https://www.tinybird.co/docs/docs/publish/charts) . --- URL: https://www.tinybird.co/docs/monitoring/health-checks Last update: 2024-12-18T15:47:50.000Z Content: --- title: "Health checks · Tinybird Docs" theme-color: "#171612" description: "Use the built-in Tinybird tools to monitor your data ingestion and API Endpoint processes." --- # Check the health of your Data Sources¶ After you have fixed all the possible errors in your source files, matched the Data Source schema to your needs, and done on-the-fly transformations, you can start ingesting data periodically. Knowing the status of your ingestion processes helps you to keep your data clean and consistent. ## Data Sources log¶ From the **Data Sources log** in your Workspace overview, you can check whether there are new rows in quarantine, if jobs are failing, or if there is any other problem. ## Operations Log¶ Select a Data Source to see the size of the Data Source, the number of rows, the number of rows in the [quarantine Data Source](https://www.tinybird.co/docs/docs/get-data-in/data-operations/recover-from-quarantine) , and when it was last updated. The Operations log contains details of the events for the Data Source, which appear as the results of a query. ## Service Data Sources for continuous monitoring¶ [Service Data Sources](https://www.tinybird.co/docs/docs/monitoring/service-datasources) can help you with ingestion health checks. You can use them like any other Data Source in your Workspace, which means you can create API Endpoints to monitor your ingestion processes.
Querying the 'tinybird.datasources_ops_log' directly, you can, for example, list your ingestion processes during the last week: ##### LISTING INGESTIONS IN THE LAST 7 DAYS SELECT * FROM tinybird.datasources_ops_log WHERE toDate(timestamp) > now() - INTERVAL 7 DAY ORDER BY timestamp DESC This query calculates the percentage of quarantined rows for a given period of time: ##### CALCULATE % OF ROWS THAT WENT TO QUARANTINE SELECT countIf(result != 'ok') / countIf(result == 'ok') * 100 percentage_failed, sum(rows_quarantine) / sum(rows) * 100 quarantined_rows FROM tinybird.datasources_ops_log The following query monitors the average duration of your periodic ingestion processes for a given Data Source: ##### CALCULATING AVERAGE INGEST DURATION SELECT avg(elapsed_time) avg_duration FROM tinybird.datasources_ops_log WHERE datasource_id = 't_8417d5126ed84802aa0addce7d1664f2' If you want to configure or build an external service that monitors these metrics, you need to create an API Endpoint and raise an alert when a threshold is passed. When you receive an alert, you can check the quarantine Data Source or the Operations log to see what's going on and fix your source files or ingestion processes. ## Monitoring API Endpoints¶ You can use the 'pipe_stats' and 'pipe_stats_rt' [Service Data Sources](https://www.tinybird.co/docs/docs/monitoring/service-datasources) to monitor the performance of your API Endpoints. Every request to a Pipe is logged to 'tinybird.pipe_stats_rt' and kept in this Data Source for the last week. The following example API Endpoint aggregates the statistics for each hour for the selected Pipe. ##### PIPE\_STATS\_RT\_BY\_HR SELECT toStartOfHour(start_datetime) as hour, count() as view_count, round(avg(duration), 2) as avg_time, arrayElement(quantiles(0.50)(duration),1) as quantile_50, arrayElement(quantiles(0.95)(duration),1) as quantile_95, arrayElement(quantiles(0.99)(duration),1) as quantile_99 FROM tinybird.pipe_stats_rt WHERE pipe_id = 'PIPE_ID' GROUP BY hour ORDER BY hour 'pipe_stats' contains statistics about your Pipe Endpoints' API calls aggregated per day using intermediate states. ##### PIPE\_STATS\_BY\_DATE SELECT date, sum(view_count) view_count, sum(error_count) error_count, avgMerge(avg_duration_state) avg_time, quantilesTimingMerge(0.9, 0.95, 0.99)(quantile_timing_state) quantiles_timing_in_millis_array FROM tinybird.pipe_stats WHERE pipe_id = 'PIPE_ID' GROUP BY date ORDER BY date You can use these API Endpoints to trigger alerts whenever statistics pass predefined thresholds. [Export API endpoint statistics in Prometheus format](https://www.tinybird.co/docs/docs/publish/api-endpoints/guides/consume-api-endpoints-in-prometheus-format) to integrate with your monitoring and alerting tools. To see how you can monitor Pipe and Data Source health in a dashboard, see [Operational Analytics in Real Time with Tinybird and Retool](https://www.tinybird.co/blog-posts/service-data-sources-and-retool). --- URL: https://www.tinybird.co/docs/monitoring/jobs Last update: 2024-12-17T18:51:47.000Z Content: --- title: "Monitor jobs in your Workspace · Tinybird Docs" theme-color: "#171612" description: "Many of the operations you can run in your Workspace are executed using jobs. jobs_log provides you with an overview of all your jobs." --- # Monitor jobs in your Workspace¶ Many operations in your Tinybird Workspace, like Imports, Copy Jobs, Sinks, and Populates, are executed as background jobs within the platform.
This approach ensures that the system can handle a large volume of requests efficiently without causing timeouts or delays in your workflow. Monitoring and managing jobs, for example querying job statuses, types, and execution details, is essential for maintaining a healthy Workspace. The two mechanisms for generic job monitoring are the [Jobs API](https://www.tinybird.co/docs/docs/api-reference/jobs-api) and the `jobs_log` Data Source. You can also track more specific things using dedicated Service Data Sources, such as `datasources_ops_log` for imports, replaces, or copies, `sinks_ops_log` for sink operations, or `organization.jobs_log` for tracking jobs across [Organizations](https://www.tinybird.co/docs/docs/get-started/administration/organizations) . See [Service Data Sources docs](https://www.tinybird.co/docs/docs/monitoring/service-datasources). The Jobs API and the `jobs_log` return identical information about job execution. However, the Jobs API has some limitations: it reports only on a single Workspace and returns at most 100 records from the last 48 hours. If you want to monitor jobs outside these parameters, use the `jobs_log` Data Source. ## Track a specific job¶ The most basic use case is to track a specific job. You can do this using the Jobs API or SQL queries. ### Jobs API ¶ The Jobs API is a convenient way to programmatically check the status of a job. By sending a GET request, you can retrieve detailed information about a specific job. This method is particularly useful for integration into scripts or applications. curl \ -X GET "https://$TB_HOST/v0/jobs/{job_id}" \ -H "Authorization: Bearer $TOKEN" Replace `{job_id}` with the actual job ID. Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace. ### SQL queries ¶ Alternatively, you can use SQL to query the `jobs_log` Data Source from directly within a Tinybird Pipe. This method is ideal for users who are comfortable with SQL and prefer to run queries directly against the data, and then expose the results with an endpoint or perform any other actions with them. SELECT * FROM tinybird.jobs_log WHERE job_id='{job_id}' Replace `{job_id}` with the desired job ID. This query retrieves all columns for the specified job, providing comprehensive details about its execution. ## Track specific job types¶ Tracking jobs by type lets you monitor and analyze all jobs of a certain category, such as all `copy` jobs. This can help you understand the performance and status of specific job types across your entire Workspace. ### Jobs API ¶ You can fetch all jobs of a specific type by making a GET request against the Jobs API: curl \ -X GET "https://$TB_HOST/v0/jobs?kind=copy" \ -H "Authorization: Bearer $TOKEN" Replace `copy` with the type of job you want to track. Make sure you have set your Tinybird host ( `$TB_HOST` ) and authorization token ( `$TOKEN` ) correctly. ### SQL queries ¶ Alternatively, you can run an SQL query to fetch all jobs of a specific type from the `jobs_log` Data Source: SELECT * FROM tinybird.jobs_log WHERE job_type='copy' Replace `copy` with the desired job type. This query retrieves all columns for jobs of the specified type. ## Track ongoing jobs¶ To keep track of jobs that are currently running, you can query the status of jobs in progress. This helps in monitoring the real-time workload and managing system performance.
### Jobs API ¶ By making an HTTP GET request to the Jobs API, you can fetch all jobs that are currently in the `working` status: curl \ -X GET "https://$TB_HOST/v0/jobs?status=working" \ -H "Authorization: Bearer $TOKEN" This call retrieves jobs that are actively running. Ensure you have set your Tinybird host ( `$TB_HOST` ) and authorization token ( `$TOKEN` ) correctly. ### SQL queries ¶ You can also use an SQL query to fetch currently running jobs from the `jobs_log` Data Source: SELECT * FROM tinybird.jobs_log WHERE status='working' This query retrieves all columns for jobs with the status `working` , allowing you to monitor ongoing operations. ## Track errored jobs¶ Tracking errored jobs is crucial for identifying and resolving issues that may arise during job execution. The Jobs API or SQL queries to `jobs_log` help you monitor jobs that errored during execution. ### Jobs API ¶ You can use the Jobs API to fetch details of jobs that have ended in error. Use the following `curl` command to retrieve all jobs that have a status of `error`: curl \ -X GET "https://$TB_HOST/v0/jobs?status=error" \ -H "Authorization: Bearer $TOKEN" This call fetches a list of jobs that are currently in an errored state, providing details that can be used for further analysis or debugging. Make sure you've set your Tinybird host ( `$TB_HOST` ) and authorization token ( `$TOKEN` ) correctly. ### SQL queries¶ Alternatively, you can use SQL to query the `jobs_log` Data Source directly. Use the following SQL query to fetch job IDs, job types, and error messages for jobs that have encountered errors in the past day: SELECT job_id, job_type, error FROM tinybird.jobs_log WHERE status='error' AND created_at > now() - INTERVAL 1 DAY ### Track success rate¶ Extrapolating from errored jobs, you can also use `jobs_log` to calculate the success rate of your Workspace jobs: SELECT job_type, pipe_id, countIf(status='done') AS job_success, countIf(status='error') AS job_error, job_success / (job_success + job_error) as success_rate FROM tinybird.jobs_log WHERE created_at > now() - INTERVAL 1 DAY GROUP BY job_type, pipe_id ## Get job execution metadata¶ In the `jobs_log` Data Source, there is a property called `job_metadata` that contains metadata related to job executions. This includes the execution type (manual or scheduled) for Copy and Sink jobs, the count of quarantined rows for append operations, and many other properties. You can extract and analyze this metadata using JSON functions within SQL queries. This allows you to gain valuable information about job executions directly from the `jobs_log` Data Source. The following SQL query is an example of how to extract specific metadata fields from the `job_metadata` property, such as the import mode and counts of quarantined rows and invalid lines, and how to aggregate this data for analysis: SELECT job_type, JSONExtractString(job_metadata, 'mode') AS import_mode, sum(simpleJSONExtractUInt(job_metadata, 'quarantine_rows')) AS quarantine_rows, sum(simpleJSONExtractUInt(job_metadata, 'invalid_lines')) AS invalid_lines FROM tinybird.jobs_log WHERE job_type='import' AND created_at >= toStartOfDay(now()) GROUP BY job_type, import_mode There are many other use cases you can put together with the properties in `job_metadata` ; see below. ## Advanced use cases¶ Beyond basic tracking, you can leverage the `jobs_log` Data Source for more advanced use cases, such as gathering statistics and performance metrics.
This can help you optimize job scheduling and resource allocation. ### Get queue status¶ The following SQL query returns the number of jobs that are waiting to be executed, the number of jobs that are in progress, and how many of them are done already: SELECT job_type, countIf(status='waiting') AS jobs_in_queue, countIf(status='working') AS jobs_in_progress, countIf(status='done') AS jobs_succeeded, countIf(status='error') AS jobs_errored FROM tinybird.jobs_log WHERE created_at > now() - INTERVAL 1 DAY GROUP BY job_type ### Run time statistics grouped by type of job¶ The following SQL query calculates the maximum, minimum, median, and p95 running time (in seconds) grouped by type of job over the past day. This helps in understanding the efficiency of different job types: SELECT job_type, max(date_diff('s', started_at, updated_at)) as max_run_time_in_secs, min(date_diff('s', started_at, updated_at)) as min_run_time_in_secs, median(date_diff('s', started_at, updated_at)) as median_run_time_in_secs, quantile(0.95)(date_diff('s', started_at, updated_at)) as p95_run_time_in_secs FROM tinybird.jobs_log WHERE created_at > now() - INTERVAL 1 DAY GROUP BY job_type ### Statistics on queue time by type of job¶ The following SQL query calculates the maximum, minimum, median, and p95 queue time (in seconds) grouped by type of job over the past day. This can help in identifying bottlenecks in job scheduling: SELECT job_type, max(date_diff('s', created_at, started_at)) as max_queue_time_in_secs, min(date_diff('s', created_at, started_at)) as min_queue_time_in_secs, median(date_diff('s', created_at, started_at)) as median_queue_time_in_secs, quantile(0.95)(date_diff('s', created_at, started_at)) as p95_queue_time_in_secs FROM tinybird.jobs_log WHERE created_at > now() - INTERVAL 1 DAY GROUP BY job_type ### Get statistics on job completion rate¶ The following SQL query calculates the success rate by type of job (e.g., copy) and Pipe over the past day. This can help you to assess the reliability and efficiency of your workflows by measuring the completion rate of the jobs, and find potential issues and areas for improvement: SELECT job_type, pipe_id, countIf(status='done') AS job_success, countIf(status='error') AS job_error, job_success / (job_success + job_error) as success_rate FROM tinybird.jobs_log WHERE created_at > now() - INTERVAL 1 DAY GROUP BY job_type, pipe_id ### Statistics on the amount of manual vs. scheduled run jobs¶ The following SQL query calculates the split between manually triggered and scheduled jobs. Understanding the distribution of manually executed jobs versus scheduled jobs can reveal on-demand jobs that were run for specific reasons: SELECT job_type, countIf(JSONExtractString(job_metadata, 'execution_type')='manual') AS job_manual, countIf(JSONExtractString(job_metadata, 'execution_type')='scheduled') AS job_scheduled FROM tinybird.jobs_log WHERE job_type='copy' AND created_at > now() - INTERVAL 1 DAY GROUP BY job_type ## Next steps¶ - Read up on the `jobs_log` Service Data Source specification . - Learn how to[ monitor your Workspace ingestion](https://www.tinybird.co/docs/docs/monitoring/monitor-data-ingestion) . --- URL: https://www.tinybird.co/docs/monitoring/latency Last update: 2024-12-17T18:51:47.000Z Content: --- title: "Measure API Endpoint latency · Tinybird Docs" theme-color: "#171612" description: "Latency is an essential metric to monitor in real-time applications. Learn how to measure and monitor the latency of your API Endpoints in Tinybird."
--- # Measure API Endpoint latency¶ Latency is the time it takes for a request to travel from the client to the server and back; the time it takes for a request to be sent and received. Latency is usually measured in seconds or milliseconds (ms). The lower the latency, the faster the response time. Latency is an essential metric to monitor in real-time applications. Read on to learn how latency is measured in Tinybird, and how to monitor and visualize the latency of your [API Endpoints](https://www.tinybird.co/docs/docs/publish/api-endpoints) when data is being retrieved. ## How latency is measured¶ When measuring latency in an end-to-end application, you consider data ingestion, data transformation, and data retrieval. In Tinybird, latency is measured as the time it takes for a request to be sent, and the response to be sent back to the client. When calling an API Endpoint, you can check this metric defined as `elapsed` in the `statistics` object of the response: ##### Statistics object within an example Tinybird API Endpoint call { "meta": [ ... ], "data": [ ... ], "rows": 10, "statistics": { "elapsed": 0.001706275, "rows_read": 10, "bytes_read": 180 } } ## Monitor latency¶ To monitor the latency of your API Endpoints, use the `pipe_stats_rt` and `pipe_stats` [Service Data Sources](https://www.tinybird.co/docs/docs/monitoring/service-datasources): - `pipe_stats_rt` consists of the real-time statistics of your API Endpoints, and has a `duration` field that encapsulates the latency time in seconds. - `pipe_stats` contains the **aggregated** statistics of your API Endpoints by date, and presents an `avg_duration_state` field, which is the average duration of the API Endpoint by day in seconds. Because the `avg_duration_state` field is an intermediate state, you'd need to merge it when querying the Data Source using something like `avgMerge` (see the sketch at the end of this page). For details on building Pipes and Endpoints that monitor the performance of your API Endpoints using the `pipe_stats_rt` and `pipe_stats` Data Sources, follow the [API Endpoint performance guide](https://www.tinybird.co/docs/docs/monitoring/analyze-endpoints-performance#example-2-analyzing-the-performance-of-api-endpoints-over-time). ## Visualize latency¶ Tinybird has built-in tools to help you visualize the latency of your API Endpoints: Time Series live internally in your Workspace, while Charts give you the option to embed them in an external application. ### Time series¶ In your Workspace, you can create [a Time series](https://www.tinybird.co/docs/docs/work-with-data/query#time-series) to visualize the latency of your API Endpoints over time. Point to `pipe_stats_rt` and select `duration` and `start_datetime` , or point to `pipe_stats` and select `avgMerge(avg_duration_state)` and `date`. ### Charts¶ If you want to expose your latency metrics in your own application, you can use a Tinybird-generated [Chart](https://www.tinybird.co/docs/docs/publish/charts) to expose the results of an Endpoint that queries the `pipe_stats_rt` or `pipe_stats` Data Source. Then, you can embed the Chart into your application by using the `iframe` code. ## Next steps¶ - Optimize even further by[ monitoring your ingestion](https://www.tinybird.co/docs/docs/monitoring/monitor-data-ingestion) . - Read this blog on[ Monitoring global API latency](https://www.tinybird.co/blog-posts/dev-qa-global-api-latency-chronark) .
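As referenced above, here is a minimal sketch of the merge step for the intermediate `avg_duration_state` field in `pipe_stats`, run through `tb sql`; `'PIPE_ID'` is a placeholder for your own Pipe ID:

##### Daily average latency from pipe_stats

```bash
# Daily average request duration (in seconds) for one endpoint,
# finalizing the intermediate aggregate state with avgMerge
tb sql "
  SELECT
    date,
    avgMerge(avg_duration_state) AS avg_duration_s
  FROM tinybird.pipe_stats
  WHERE pipe_id = 'PIPE_ID'
  GROUP BY date
  ORDER BY date DESC
"
```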
--- URL: https://www.tinybird.co/docs/monitoring/monitor-data-ingestion Last update: 2025-01-17T07:58:01.000Z Content: --- title: "Monitor ingestion · Tinybird Docs" theme-color: "#171612" description: "In this guide you'll learn more about how to monitor your data source ingestion in Tinybird." --- # Monitor your ingestion¶ In this guide, you can learn the basics of how to monitor your Data Source ingestion. By being aware of your ingestion pipeline and leveraging Tinybird's features, you can monitor for any issues with the [Data Flow Graph](https://www.tinybird.co/docs/docs/work-with-data/query#data-flow). Remember: Every Tinybird use case is slightly different. This guide provides guidelines and an example scenario. If you have questions or want to explore more complicated ingestion monitoring scenarios, for instance looking for outliers by using the z-score or other anomaly detection processes, contact Tinybird at [support@tinybird.co](mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). ## Before you start¶ You don't need an active Workspace to follow this guide, only an awareness of the [core Tinybird concepts](https://www.tinybird.co/docs/docs/get-started/quick-start/core-concepts). ## Key takeaways¶ 1. Understand and visualize your data pipeline. 2. Leverage the Tinybird platform and tools. 3. Be proactive: Build alerts. ### Understand your data pipeline and flow¶ The first step to monitoring your ingestion to Tinybird is to understand what you're monitoring at a high level. When stakeholders complain about outdated data, you and your data engineers start investigating and checking the data pipelines upstream until you find the problem. Understanding how data flows through those pipelines from the origin to the end is essential, and you should always know what your data flow looks like. ### Use built-in tools¶ Tinybird provides several tools to help you: - The[ Data Flow Graph](https://www.tinybird.co/docs/docs/work-with-data/query#data-flow) is Tinybird's data lineage diagram. It visualizes how data flows within your project. It shows all the levels of dependencies, so you can see how all your Pipes, Data Sources, and Materialized Views connect. - [ Service Data Sources](https://www.tinybird.co/docs/docs/monitoring/service-datasources) are logs that allow you to keep track of almost everything happening data-wise within your system. - Use[ Time Series](https://www.tinybird.co/docs/docs/work-with-data/query#time-series) in combination with Service Data Sources to visualize data ingestion trends and issues over time. ### Build alerts¶ You can create a personalized alert system by integrating third-party services with Pipes and API Endpoints that query key Service Data Sources. ## Identify high data volume resources¶ You can run a query to find the resources that handle the most data by using the `datasources_ops_log` Service Data Source.
For example: WITH ( SELECT sum(read_bytes + written_bytes) FROM tinybird.datasources_ops_log WHERE event_type NOT IN ('populateview', 'populateview-queued') -- AND timestamp >= now() - INTERVAL 24 HOUR -- optional time filter ) AS total, sum(read_bytes) AS processed_read, sum(written_bytes) AS processed_written SELECT datasource_name, pipe_name AS materializing_pipe, formatReadableSize(processed_read) AS read, formatReadableSize(processed_written) AS written, formatReadableSize(processed_read + processed_written) AS total_processed, (processed_read + processed_written) / total AS percentage FROM tinybird.datasources_ops_log WHERE event_type NOT IN ('populateview', 'populateview-queued') -- AND timestamp >= now() - INTERVAL 24 HOUR -- optional time filter GROUP BY datasource_name, materializing_pipe ORDER BY percentage DESC ## Example scenario¶ In this example, a user with a passion for ornithology has built a Workspace called `bird_spotter` . They're using it to analyze the number of birds they spot in their garden and when out on hikes. It uses Tinybird's high frequency ingestion (Events API) and an updated legacy table in BigQuery, so the Data Sources are as follows: 1. `bird_records` : A dataset containing bird viewings describing the time and bird details, which the[ Events API](https://www.tinybird.co/docs/docs/get-data-in/ingest-apis/events-api) populates every day. 2. `birds_by_hour_and_country_from_copy` : An aggregated dataset of the bird views per hour and country, which a[ Copy Pipe](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/copy-pipes) populates every hour. 3. `tiny_bird_records` : A dataset with a list of tiny birds (for example, hummingbirds), which Tinybird's[ BigQuery Connector](https://www.tinybird.co/docs/docs/get-data-in/connectors/bigquery) replaces every day. The three Data Sources rely on three different methods of ingestion: appending data using the high frequency API, aggregating and copying, and syncing from BigQuery. To make sure that each of these processes is happening at the scheduled time, and without errors, the user needs to implement some monitoring. ### Monitoring ingestion and spotting errors¶ The user in the example can filter the **Service Data Source** called `datasources_ops_log` by Data Source and ingestion method. By building a quick **Time Series** , they can immediately see the "shape" of their ingestion. The user can then build a robust system for monitoring. Instead of only focusing on the ingestion method, they can create 3 different Pipes that have specific logic, and expose each Pipe as a queryable Endpoint. Each Endpoint aggregates key information about each ingestion method, and counts and flags errors.
#### Endpoint 1: Check append-hfi operations in bird\_records¶ SELECT toDate(timestamp) as date, sum(if(result = 'error', 1, 0)) as error_count, count() as append_count, if(append_count > 0, 1, 0) as append_flag FROM tinybird.datasources_ops_log WHERE datasource_name = 'bird_records' AND event_type = 'append-hfi' GROUP BY date ORDER BY date DESC #### Endpoint 2: Check copy operations in birds\_by\_hour\_and\_country\_from\_copy¶ SELECT toDate(timestamp) as date, sum(if(result = 'error', 1, 0)) as error_count, count() as copy_count, if(copy_count >= 24, 1, 0) as copy_flag FROM tinybird.datasources_ops_log WHERE datasource_name = 'birds_by_hour_and_country_from_copy' AND event_type = 'copy' GROUP BY date ORDER BY date DESC #### Endpoint 3: Check replace operations in tiny\_bird\_records¶ SELECT toDate(timestamp) as date, sum(if(result = 'error', 1, 0)) as error_count, count() as replace_count, if(replace_count > 0, 1, 0) as replace_flag FROM tinybird.datasources_ops_log WHERE datasource_name = 'tiny_bird_records' AND event_type = 'replace' GROUP BY date ORDER BY date DESC ### Using the output¶ Because the Pipes expose API Endpoints, they can be consumed by any third-party app to build real-time alerts. The preferred way to integrate Tinybird data with your monitoring and alerting tools is by [exporting Pipe API Endpoints in Prometheus format](https://www.tinybird.co/docs/docs/publish/api-endpoints/guides/consume-api-endpoints-in-prometheus-format) , as it provides seamless compatibility with tools like Prometheus, Grafana or Datadog. Check the [limits page](https://www.tinybird.co/docs/docs/get-started/plans/limits) for limits on ingestion, queries, API Endpoints, and more. ### Example GitHub Actions implementation¶ In the `bird_spotter` example repo , you can see the `scripts` and `workflows` that the user has built: - `ingest.py` and `monitor.py` are Python scripts that run daily. The first ingests data in this case from a sample csv and the second checks if the append, copy, and sync operations have happened and are error-free. Because this guide is an example scenario, there's a function that randomly chooses not to ingest, so there's always an error present. - `ingest.yml` and `monitor.yml` are yaml files that schedule those daily runs. The output of a daily check would look something like this: INFO:__main__:Alert! Ingestion operation missing. Last ingestion date isn't today: 2024-04-16 INFO:__main__:Last copy_count count is equal to 9. All fine! INFO:__main__:Last replace_count count is equal to 1. All fine! INFO:__main__:Alerts summary: INFO:__main__:Append error count: 1 INFO:__main__:Copy error count: 0 INFO:__main__:Replace error count: 0 In this instance, the ingestion script has randomly failed to append new data, and triggers an alert that the user can action. In contrast, copy operations and replace counts have run as expected: 9 copies and 1 BigQuery sync occurred since 00:00. ## Example scenario: Detect out-of-sync Data Sources¶ Some Tinybird Connectors like BigQuery or Snowflake use async jobs to keep your Data Sources up to date. These jobs produce records with the result sent to the `datasources_ops_log` Service Data Source, both for successful and failed runs. The following example configures a new Tinybird Endpoint that reports Data Sources that are out of sync. It's then possible to leverage that data in your monitoring tool of choice, such as Grafana, Datadog, UptimeRobot, and others. 
### Endpoint: Get out of sync Data Sources using datasources\_ops\_log¶ To get the Data Sources that haven't been successfully updated in the last hour, check their sync jobs results in the `datasources_ops_log`: select datasource_id, argMax(datasource_name, timestamp) as datasource_name, max(case when result = 'ok' then timestamp end) as last_successful_sync from tinybird.datasources_ops_log where arrayExists(x -> x in ('bigquery','snowflake'), Options.Values) and toDate(timestamp) >= today() - interval 30 days and result = 'ok' group by datasource_id having max(event_type = 'delete') = false and last_successful_sync < now() - interval 1 hour ## Next steps¶ - Read the in-depth docs on Tinybird's[ Service Data Sources](https://www.tinybird.co/docs/docs/monitoring/service-datasources) . - Want to use Prometheus or Datadog?[ Consume API Endpoints in Prometheus format](https://www.tinybird.co/docs/docs/publish/api-endpoints/guides/consume-api-endpoints-in-prometheus-format) . - Learn how to[ Optimize your data project](https://www.tinybird.co/docs/docs/work-with-data/optimization) . - Learn about the difference between log* analytics* and log* analysis* in the blog[ "Log Analytics: how to identify trends and correlations that Log Analysis tools can't"](https://www.tinybird.co/blog-posts/log-analytics-how-to-identify-trends-and-correlations-that-log-analysis-tools-cannot) . --- URL: https://www.tinybird.co/docs/monitoring/service-datasources Last update: 2025-01-09T09:53:21.000Z Content: --- title: "Service Data Sources · Tinybird Docs" theme-color: "#171612" description: "In addition to the Data Sources you upload, Tinybird provides other "Service Data Sources" that allow you to inspect what's going on in your account." --- # Service Data Sources¶ Tinybird provides Service Data Sources that you can use to inspect what's going on in your Tinybird account, diagnose issues, monitor usage, and so on. For example, you can get real time stats about API calls or a log of every operation over your Data Sources. This is similar to using system tables in a database, although Service Data Sources contain information about the usage of the service itself. Queries made to Service Data Sources are free of charge and don't count towards your usage. However, calls to API Endpoints that use Service Data Sources do count towards API rate limits. See [Billing](https://www.tinybird.co/docs/docs/get-started/plans/billing). ## Considerations¶ - You can't use Service Data Sources in Materialized View queries. - Pass dynamic[ query parameters](https://www.tinybird.co/docs/docs/work-with-data/query/query-parameters#leverage-dynamic-parameters) to API Endpoints to then query Service Data Sources. - You can only query Organization-level Service Data Sources if you're an administrator. See[ Consumption overview](https://www.tinybird.co/docs/docs/get-started/administration/organizations#consumption-overview) . ## Service Data Sources¶ The following Service Data Sources are available. ### tinybird.pipe\_stats\_rt¶ Contains information about all requests made to your [API Endpoints](https://www.tinybird.co/docs/docs/publish/api-endpoints) in real time. This Data Source has a TTL of 7 days. If you need to query data older than 7 days you must use the aggregated by day data available at [tinybird.pipe_stats](https://www.tinybird.co/docs/about:blank#tinybird-pipe-stats). | Field | Type | Description | | --- | --- | --- | | `start_datetime` | `DateTime` | API call start date and time. 
| | `pipe_id` | `String` | Pipe Id as returned in our[ Pipes API](https://www.tinybird.co/docs/docs/api-reference/pipe-api) ( `query_api` in case it's a Query API request). | | `pipe_name` | `String` | Pipe name as returned in our[ Pipes API](https://www.tinybird.co/docs/docs/api-reference/pipe-api) ( `query_api` in case it's a Query API request). | | `duration` | `Float` | API call duration in seconds. | | `read_bytes` | `UInt64` | API call read data in bytes. | | `read_rows` | `UInt64` | API call rows read. | | `result_rows` | `UInt64` | Rows returned by the API call. | | `url` | `String` | URL ( `token` param is removed for security reasons). | | `error` | `UInt8` | `1` if query returned error, else `0` . | | `request_id` | `String` | API call identifier returned in `x-request-id` header. Format is ULID string. | | `token` | `String` | API call token identifier used. | | `token_name` | `String` | API call token name used. | | `status_code` | `Int32` | API call returned status code. | | `method` | `String` | API call method POST or GET. | | `parameters` | `Map(String, String)` | API call parameters used. | | `release` | `String` | Semantic version of the release (deprecated). | | `user_agent` | `Nullable(String)` | User Agent HTTP header from the request. | | `resource_tags` | `Array(String)` | Tags associated with the Pipe when the request was made. | | `cpu_time` | `Float` | CPU time used by the query in seconds. | ### tinybird.pipe\_stats¶ Aggregates the request stats in [tinybird.pipe_stats_rt](https://www.tinybird.co/docs/about:blank#tinybird-pipe-stats-rt) by day. | Field | Type | Description | | --- | --- | --- | | `date` | `Date` | Request date and time. | | `pipe_id` | `String` | Pipe Id as returned in our[ Pipes API](https://www.tinybird.co/docs/docs/api-reference/pipe-api) . | | `pipe_name` | `String` | Name of the Pipe. | | `view_count` | `UInt64` | Request count. | | `error_count` | `UInt64` | Number of requests with error. | | `avg_duration_state` | `AggregateFunction(avg, Float32)` | Average duration state in seconds (see[ Querying _state columns](https://www.tinybird.co/docs/about:blank#querying-state-columns) ). | | `quantile_timing_state` | `AggregateFunction(quantilesTiming(0.9, 0.95, 0.99), Float64)` | 0.9, 0.95 and 0.99 quantiles state. Time in milliseconds (see[ Querying _state columns](https://www.tinybird.co/docs/about:blank#querying-state-columns) ). | | `read_bytes_sum` | `UInt64` | Total bytes read. | | `read_rows_sum` | `UInt64` | Total rows read. | | `resource_tags` | `Array(String)` | All the tags associated with the resource when the aggregated requests were made. | ### tinybird.bi\_stats\_rt¶ Contains information about all requests to your [BI Connector interface](https://www.tinybird.co/docs/docs/work-with-data/query/bi-connector) in real time. This Data Source has a TTL of 7 days. If you need to query data older than 7 days you must use the aggregated by day data available at [tinybird.bi_stats](https://www.tinybird.co/docs/about:blank#tinybird-bi-stats). | Field | Type | Description | | --- | --- | --- | | `start_datetime` | `DateTime` | Query start timestamp. | | `query` | `String` | Executed query. | | `query_normalized` | `String` | Normalized executed query. This is the pattern of the query, without literals. Useful to analyze usage patterns. | | `error_code` | `Int32` | Error code, if any. `0` on normal execution. | | `error` | `String` | Error description, if any. Empty otherwise. | | `duration` | `UInt64` | Query duration in milliseconds. 
| | `read_rows` | `UInt64` | Read rows. | | `read_bytes` | `UInt64` | Read bytes. | | `result_rows` | `UInt64` | Total rows returned. | | `result_bytes` | `UInt64` | Total bytes returned. |

### tinybird.bi\_stats¶

Aggregates the stats in [tinybird.bi_stats_rt](https://www.tinybird.co/docs/about:blank#tinybird-bi-stats-rt) by day. | Field | Type | Description | | --- | --- | --- | | `date` | `Date` | Stats date. | | `database` | `String` | Database identifier. | | `query_normalized` | `String` | Normalized executed query. This is the pattern of the query, without literals. Useful to analyze usage patterns. | | `view_count` | `UInt64` | Requests count. | | `error_count` | `UInt64` | Error count. | | `avg_duration_state` | `AggregateFunction(avg, Float32)` | Average duration state in milliseconds (see[ Querying _state columns](https://www.tinybird.co/docs/about:blank#querying-state-columns) ). | | `quantile_timing_state` | `AggregateFunction(quantilesTiming(0.9, 0.95, 0.99), Float64)` | 0.9, 0.95 and 0.99 quantiles state. Time in milliseconds (see[ Querying _state columns](https://www.tinybird.co/docs/about:blank#querying-state-columns) ). | | `read_bytes_sum` | `UInt64` | Total bytes read. | | `read_rows_sum` | `UInt64` | Total rows read. | | `avg_result_rows_state` | `AggregateFunction(avg, Float32)` | Average rows returned state (see[ Querying _state columns](https://www.tinybird.co/docs/about:blank#querying-state-columns) ). | | `avg_result_bytes_state` | `AggregateFunction(avg, Float32)` | Average bytes returned state (see[ Querying _state columns](https://www.tinybird.co/docs/about:blank#querying-state-columns) ). |

### tinybird.block\_log¶

This Data Source contains details about how Tinybird ingests data into your Data Sources. You can use this Service Data Source to spot problematic parts of your data. | Field | Type | Description | | --- | --- | --- | | `timestamp` | `DateTime` | Date and time of the block ingestion. | | `import_id` | `String` | Id of the import operation. | | `job_id` | `Nullable(String)` | Id of the job that ingested the block of data, if it was ingested by URL. In this case, `import_id` and `job_id` must have the same value. | | `request_id` | `String` | Id of the request that performed the operation. In this case, `import_id` and `request_id` must have the same value. Format is ULID string. | | `source` | `String` | Either the URL or the `stream` or `body` keywords. | | `block_id` | `String` | Block identifier. You can cross this with the `blocks_ids` column from the[ tinybird.datasources_ops_log](https://www.tinybird.co/docs/about:blank#tinybird-datasources-ops-log) Service Data Source. | | `status` | `String` | `done` | `error` . | | `datasource_id` | `String` | Data Source consistent id. | | `datasource_name` | `String` | Data Source name when the block was ingested. | | `start_offset` | `Nullable(Int64)` | If the ingestion was split, the starting byte of this block. | | `end_offset` | `Nullable(Int64)` | If split, the ending byte of the block. | | `rows` | `Nullable(Int32)` | How many rows it ingested. | | `parser` | `Nullable(String)` | Whether the native block parser was used or Tinybird fell back to row-by-row parsing. | | `quarantine_lines` | `Nullable(UInt32)` | If any, how many rows went into the quarantine Data Source. | | `empty_lines` | `Nullable(UInt32)` | If any, how many empty lines were skipped. | | `bytes` | `Nullable(UInt32)` | How many bytes the block had. | | `processing_time` | `Nullable(Float32)` | How long it took in seconds.
| | `processing_error` | `Nullable(String)` | Detailed message in case of error. | When Tinybird ingests data from a URL, it splits the download in several requests, resulting in different ingestion blocks. The same happens when the data upload happens with a multipart request. ### tinybird.datasources\_ops\_log¶ Contains all operations performed to your Data Sources. Tinybird tracks the following operations: | Event | Description | | --- | --- | | `create` | A Data Source is created. | | `sync-dynamodb` | Initial synchronization from a DynamoDB table when using the[ DynamoDB Connector](https://www.tinybird.co/docs/docs/get-data-in/connectors/dynamodb) | | `append` | Append operation. | | `append-hfi` | Append operation using the[ High-frequency Ingestion API](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-the-events-api) . | | `append-kafka` | Append operation using the[ Kafka Connector](https://www.tinybird.co/docs/docs/get-data-in/connectors/kafka) . | | `append-dynamodb` | Append operation using the[ DynamoDB Connector](https://www.tinybird.co/docs/docs/get-data-in/connectors/dynamodb) | | `replace` | A replace operation took place in the Data Source. | | `delete` | A delete operation took place in the Data Source. | | `truncate` | A truncate operation took place in the Data Source. | | `rename` | The Data Source was renamed. | | `populateview-queued` | A populate operation was queued for execution. | | `populateview` | A finished populate operation (up to 8 hours after it started). | | `copy` | A copy operation took place in the Data Source. | | `alter` | An alter operation took place in the Data Source. | Materializations are logged with same `event_type` and `operation_id` as the operation that triggers them. You can track the materialization Pipe with `pipe_id` and `pipe_name`. Tinybird logs all operations with the following information in this Data Source: | Field | Type | Description | | --- | --- | --- | | `timestamp` | `DateTime` | Date and time when the operation started. | | `event_type` | `String` | Operation being logged. | | `operation_id` | `String` | Groups rows affected by the same operation. Useful for checking materializations triggered by an append operation. | | `datasource_id` | `String` | Id of your Data Source. The Data Source id is consistent after renaming operations. You should use the id when you want to track name changes. | | `datasource_name` | `String` | Name of your Data Source when the operation happened. | | `result` | `String` | `ok` | `error` | | `elapsed_time` | `Float32` | How much time the operation took in seconds. | | `error` | `Nullable(String)` | Detailed error message if the result was error. | | `import_id` | `Nullable(String)` | Id of the import operation, if data has been ingested using one of the following operations: `create` , `append` or `replace` | | `job_id` | `Nullable(String)` | Id of the job that performed the operation, if any. If data has been ingested, `import_id` and `job_id` must have the same value. | | `request_id` | `String` | Id of the request that performed the operation. If data has been ingested, `import_id` and `request_id` must have the same value. Format is ULID string. | | `rows` | `Nullable(UInt64)` | How many rows the operations affected. This depends on `event_type` : for the `append` event, how many rows got inserted; for `delete` or `truncate` events, how many rows the Data Source had; for `replace` , how many rows the Data Source has after the operation. 
| | `rows_quarantine` | `Nullable(UInt64)` | How many rows went into the quarantine Data Source, if any. | | `blocks_ids` | `Array(String)` | List of block ids used for the operation. See the[ tinybird.block_log](https://www.tinybird.co/docs/about:blank#tinybird-block-log) Service Data Source for more details. | | `options` | `Nested(Names String, Values String)` | Tinybird stores key-value pairs with extra information for some operations. For the `replace` event, Tinybird uses the `rows_before_replace` key to track how many rows the Data Source had before the replacement happened, and the `replace_condition` key shows what condition was used. For `append` and `replace` events, Tinybird stores the data `source` , for example the URL, or the body/stream keywords. For the `rename` event, it stores `old_name` and `new_name` . For `populateview` , it stores the whole populate `job` metadata as a JSON string. For `alter` events, Tinybird stores `operations` , and dependent pipes as `dependencies` if they exist. | | `read_bytes` | `UInt64` | Read bytes in the operation. | | `read_rows` | `UInt64` | Read rows in the operation. | | `written_rows` | `UInt64` | Written rows in the operation. | | `written_bytes` | `UInt64` | Written bytes in the operation. | | `written_rows_quarantine` | `UInt64` | Quarantined rows in the operation. | | `written_bytes_quarantine` | `UInt64` | Quarantined bytes in the operation. | | `pipe_id` | `String` | If present, materialization Pipe id as returned in our[ Pipes API](https://www.tinybird.co/docs/docs/api-reference/pipe-api) . | | `pipe_name` | `String` | If present, materialization Pipe name as returned in our[ Pipes API](https://www.tinybird.co/docs/docs/api-reference/pipe-api) . | | `release` | `String` | Semantic version of the release (deprecated). | | `resource_tags` | `Array(String)` | Tags associated with the Pipe when the request was made. | | `cpu_time` | `Float32` | CPU time used by the operation in seconds. |

### tinybird.datasource\_ops\_stats¶

Data from `datasources_ops_log` , aggregated by day. | Field | Type | Description | | --- | --- | --- | | `event_date` | `Date` | Date of the event. | | `workspace_id` | `String` | Unique identifier for the Workspace. | | `event_type` | `String` | Operation being logged. | | `pipe_id` | `String` | Identifier of the Pipe. | | `pipe_name` | `String` | Name of the Pipe. | | `error_count` | `UInt64` | Number of requests with an error. | | `executions` | `UInt64` | Number of executions. | | `avg_elapsed_time_state` | `Float32` | Average elapsed time state in seconds (see[ Querying _state columns](https://www.tinybird.co/docs/about:blank#querying-state-columns) ). | | `quantiles_state` | `Float32` | 0.9, 0.95 and 0.99 quantiles state. Time in milliseconds (see[ Querying _state columns](https://www.tinybird.co/docs/about:blank#querying-state-columns) ). | | `read_bytes` | `UInt64` | Read bytes in the operation. | | `read_rows` | `UInt64` | Read rows in the operation. | | `written_rows` | `UInt64` | Written rows in the operation. | | `written_bytes` | `UInt64` | Written bytes in the operation. | | `written_rows_quarantine` | `UInt64` | Quarantined rows in the operation. | | `written_bytes_quarantine` | `UInt64` | Quarantined bytes in the operation. | | `resource_tags` | `Array(String)` | Tags associated with the Pipe when the request was made. |

### tinybird.endpoint\_errors¶

Provides the errors of your published API Endpoints over the last 30 days. Tinybird logs all errors with additional information in this Data Source.
| Field | Type | Description | | --- | --- | --- | | `start_datetime` | `DateTime` | Date and time when the API call started. | | `request_id` | `String` | The id of the request that performed the operation. Format is ULID string. | | `pipe_id` | `String` | If present, Pipe id as returned in our[ Pipes API](https://www.tinybird.co/docs/docs/api-reference/pipe-api) . | | `pipe_name` | `String` | If present, Pipe name as returned in our[ Pipes API](https://www.tinybird.co/docs/docs/api-reference/pipe-api) . | | `params` | `Nullable(String)` | URL query params included in the request. | | `url` | `Nullable(String)` | URL pathname. | | `status_code` | `Nullable(Int32)` | HTTP error code. | | `error` | `Nullable(String)` | Error message. | | `resource_tags` | `Array(String)` | Tags associated with the Pipe when the request was made. | ### tinybird.kafka\_ops\_log¶ Contains all operations performed to your Kafka Data Sources during the last 30 days. | Field | Type | Description | | --- | --- | --- | | `timestamp` | `DateTime` | Date and time when the operation took place. | | `datasource_id` | `String` | Id of your Data Source. The Data Source id is consistent after renaming operations. You should use the id when you want to track name changes. | | `topic` | `String` | Kafka topic. | | `partition` | `Int16` | Partition number, or `-1` for all partitions. | | `msg_type` | `String` | 'info' for regular messages, 'warning' for issues related to the user's Kafka cluster, deserialization or Materialized Views, and 'error' for other issues. | | `lag` | `Int64` | Number of messages behind for the partition. This is the difference between the high-water mark and the last commit offset. | | `processed_messages` | `Int32` | Messages processed for a topic and partition. | | `processed_bytes` | `Int32` | Amount of bytes processed. | | `committed_messages` | `Int32` | Messages ingested for a topic and partition. | | `msg` | `String` | Information in the case of warnings or errors. Empty otherwise. | ### tinybird.datasources\_storage¶ Contains stats about your Data Sources storage. Tinybird logs maximum values per hour, the same as when it calculates storage consumption. | Field | Type | Description | | --- | --- | --- | | `datasource_id` | `String` | Id of your Data Source. The Data Source id is consistent after renaming operations. You should use the id when you want to track name changes. | | `datasource_name` | `String` | Name of your Data Source. | | `timestamp` | `DateTime` | When storage was tracked. By hour. | | `bytes` | `UInt64` | Max number of bytes the Data Source has, not including quarantine. | | `rows` | `UInt64` | Max number of rows the Data Source has, not including quarantine. | | `bytes_quarantine` | `UInt64` | Max number of bytes the Data Source has in quarantine. | | `rows_quarantine` | `UInt64` | Max number of rows the Data Source has in quarantine. | ### tinybird.releases\_log (deprecated)¶ Contains operations performed to your releases. Tinybird tracks the following operations: | Event | Description | | --- | --- | | `init` | First Release is created on Git sync. | | `override` | Release commit is overridden. `tb init --override-commit {{commit}}` . | | `deploy` | Resources from a commit are deployed to a Release. | | `preview` | Release status is changed to preview. | | `promote` | Release status is changed to live. | | `post` | Resources from a commit are deployed to the live Release. | | `rollback` | Rollback is done a previous Release is now live. | | `delete` | Release is deleted. 
| Tinybird logs all operations with additional information in this Data Source. | Field | Type | Description | | --- | --- | --- | | `timestamp` | `DateTime64` | Date and time when the operation took place. | | `event_type` | `String` | Operation being logged. | | `semver` | `String` | Semantic version that identifies the Release. | | `commit` | `String` | Git commit SHA related to the operation. | | `token` | `String` | API call token identifier used. | | `token_name` | `String` | API call token name used. | | `result` | `String` | `ok` | `error` | | `error` | `String` | Detailed error message. |

### tinybird.sinks\_ops\_log¶

Contains all operations performed to your Sink Pipes. | Field | Type | Description | | --- | --- | --- | | `timestamp` | `DateTime64` | Date and time when the operation took place. | | `service` | `LowCardinality(String)` | Type of Sink (GCS, S3, and so on). | | `pipe_id` | `String` | The ID of the Sink Pipe. | | `pipe_name` | `String` | The name of the Sink Pipe. | | `token_name` | `String` | Token name used. | | `result` | `LowCardinality(String)` | `ok` | `error` | | `error` | `Nullable(String)` | Detailed error message. | | `elapsed_time` | `Float64` | The duration of the operation in seconds. | | `job_id` | `Nullable(String)` | ID of the job that performed the operation, if any. | | `read_rows` | `UInt64` | Read rows in the Sink operation. | | `written_rows` | `UInt64` | Written rows in the Sink operation. | | `read_bytes` | `UInt64` | Read bytes in the operation. | | `written_bytes` | `UInt64` | Written bytes in the operation. | | `output` | `Array(String)` | The outputs of the operation. In the case of writing to a bucket, the names of the written files. | | `parameters` | `Map(String, String)` | The parameters used. Useful to debug the parameter query values. | | `options` | `Map(String, String)` | Extra information. You can access the values with `options['key']` , where key is one of: file_template, file_format, file_compression, bucket_path, execution_type. | | `cpu_time` | `Float64` | The CPU time used by the Sink operation in seconds. |

### tinybird.data\_transfer¶

Stats of data transferred per hour by a Workspace. | Field | Type | Description | | --- | --- | --- | | `timestamp` | `DateTime` | Date and time when the transferred data is tracked, by hour. | | `event` | `LowCardinality(String)` | Type of operation that generated the data (for example, `sink` ). | | `origin_provider` | `LowCardinality(String)` | Provider data was transferred from. | | `origin_region` | `LowCardinality(String)` | Region data was transferred from. | | `destination_provider` | `LowCardinality(String)` | Provider data was transferred to. | | `destination_region` | `LowCardinality(String)` | Region data was transferred to. | | `kind` | `LowCardinality(String)` | `intra` | `inter` , depending on whether the data moves within or outside the region. |

### tinybird.jobs\_log¶

Contains all job executions performed in your Workspace. Tinybird logs all jobs with extra information in this Data Source: | Field | Type | Description | | --- | --- | --- | | `job_id` | `String` | Unique identifier for the job. | | `job_type` | `LowCardinality(String)` | Type of job execution. `delete_data` , `import` , `populateview` , `query` , `copy` , `copy_from_main` , `copy_from_branch` , `data_branch` , `deploy_branch` , `regression_tests` , `sink` , `sink_from_branch` . | | `workspace_id` | `String` | Unique identifier for the Workspace. | | `pipe_id` | `String` | Unique identifier for the Pipe. | | `pipe_name` | `String` | Name of the Pipe. | | `created_at` | `DateTime` | Timestamp when the job was created.
| | `updated_at` | `DateTime` | Timestamp when the job was last updated. | | `started_at` | `DateTime` | Timestamp when the job execution started. | | `status` | `LowCardinality(String)` | Current status of the job. `waiting` , `working` , `done` , `error` , `cancelled` . | | `error` | `Nullable(String)` | Detailed error message if the result was error. | | `job_metadata` | `JSON String` | Additional metadata related to the job execution. | Learn more about how to track background jobs execution in the [Jobs monitoring guide](https://www.tinybird.co/docs/docs/monitoring/jobs). ## Use resource\_tags to better track usage¶ You can use tags that you've added to your resources, like Pipes or Data Sources, to analyze usage and cost attribution across your organization. For example, you can add tags for projects, environments, or versions and compare usage in later queries to Service Data Sources such as [tinybird.datasources_ops_stats](https://www.tinybird.co/docs/about:blank#tinybird-datasources-ops-stats) , which aggregates operations data by day. The following Service Data Sources support `resource_tags`: - `pipe_stats_rt` - `pipe_stats` - `endpoint_errors` - `organization.pipe_stats_rt` - `organization.pipe_stats` - `datasources_ops_log` - `datasources_ops_stats` - `organization.datasources_ops_log` - `organization.datasources_ops_stats` To add tags to resources, see [Organizing resources in Workspaces](https://www.tinybird.co/docs/docs/work-with-data/organize-your-work/organizing-resources). ## Query \_state columns¶ Several of the Service Data Sources include columns suffixed with `_state` . This suffix identifies columns with values that are in an intermediate aggregated state. When reading these columns, merge the intermediate states to get the final value. To merge intermediate states, wrap the column in the original aggregation function and apply the `-Merge` combinator. For example, to finalize the value of the `avg_duration_state` column, you use the `avgMerge` function: ##### finalize the value for the avg\_duration\_state column SELECT date, avgMerge(avg_duration_state) avg_time, quantilesTimingMerge(0.9, 0.95, 0.99)(quantile_timing_state) quantiles_timing_in_ms_array FROM tinybird.pipe_stats where pipe_id = 'PIPE_ID' group by date See [Combinators](https://www.tinybird.co/docs/docs/sql-reference/functions/aggregate-functions#aggregate-function-combinators) to learn more about the `-Merge` combinator. ## Organization Service Data Sources¶ The following is a complete list of available Organization Service Data Sources: | Field | Description | | --- | --- | | `organization.workspaces` | Lists all Organization Workspaces and related information, including name, IDs, databases, plan, when it was created, and whether it has been soft-deleted. | | `organization.processed_data` | Information related to all processed data per day per workspace. | | `organization.datasources_storage` | Equivalent to tinybird.datasources_storage but with data for all Organization Workspaces. | | `organization.pipe_stats` | Equivalent to tinybird.pipe_stats but with data for all Organization Workspaces. | | `organization.pipe_stats_rt` | Equivalent to tinybird.pipe_stats_rt but with data for all Organization Workspaces. | | `organization.datasources_ops_log` | Equivalent to tinybird.datasources_ops_log but with data for all Organization Workspaces. | | `organization.data_transfer` | Equivalent to tinybird.data_transfer but with data for all Organization Workspaces. 
| | `organization.jobs_log` | Equivalent to tinybird.jobs_log but with data for all Organization Workspaces. | | `organization.sinks_ops_log` | Equivalent to tinybird.sinks_ops_log but with data for all Organization Workspaces. | | `organization.bi_stats` | Equivalent to tinybird.bi_stats but with data for all Organization Workspaces. | | `organization.bi_stats_rt` | Equivalent to tinybird.bi_stats_rt but with data for all Organization Workspaces. | | `organization.endpoint_errors` | Equivalent to tinybird.endpoint_errors but with data for all Organization Workspaces. | To query Organization Service Data Sources, go to any Workspace that belongs to the Organization and use the previous as regular Service Data Source from the Playground or within Pipes. Use the admin `Token of an Organization Admin` . You can also copy your admin Token and make queries using your preferred method, like `tb sql`. ### metrics\_logs Service Data Source¶ The `metrics_logs` Service Data Source is available in all the organization's workspaces. As with the rest of Organization Service Data Sources, it's only available to Organization administrators. New records for each of the metrics monitored are added every minute with the following schema: | Field | Type | Description | | --- | --- | --- | | timestamp | DateTime | Timestamp of the metric | | cluster | LowCardinality(String) | Name of the cluster | | host | LowCardinality(String) | Name of the host | | metric | LowCardinality(String) | Name of the metric | | value | String | Value of the metric | | description | LowCardinality(String) | Description of the metric | | organization_id | String | ID of your organization | The available metrics are the following: | Metric | Description | | --- | --- | | MemoryTracking | Total amount of memory, in bytes, allocated by the server. | | OSMemoryTotal | The total amount of memory on the host system, in bytes. | | InstanceType | Instance type of the host. | | Query | Number of executing queries. | | NumberCPU | Number of CPUs. | | LoadAverage1 | The whole system load, averaged with exponential smoothing over 1 minute. The load represents the number of threads across all the processes (the scheduling entities of the OS kernel), that are currently running by CPU or waiting for IO, or ready to run but not being scheduled at this point of time. This number includes all the processes, not only the server. The number can be greater than the number of CPU cores, if the system is overloaded, and many processes are ready to run but waiting for CPU or IO. | | LoadAverage15 | The whole system load, averaged with exponential smoothing over 15 minutes. The load represents the number of threads across all the processes (the scheduling entities of the OS kernel), that are currently running by CPU or waiting for IO, or ready to run but not being scheduled at this point of time. This number includes all the processes, not only the server. The number can be greater than the number of CPU cores, if the system is overloaded, and many processes are ready to run but waiting for CPU or IO. | | CPUUsage | The ratio of time the CPU core was running OS kernel (system) code or userspace code. This is a system-wide metric, it includes all the processes on the host machine, not just the server. This includes also the time when the CPU was under-utilized due to the reasons internal to the CPU (memory loads, pipeline stalls, branch mispredictions, running another SMT core). 
| --- URL: https://www.tinybird.co/docs/publish Content: --- title: "Publish data · Tinybird Docs" theme-color: "#171612" description: "Overview of publishing data using Tinybird" --- # Publish your data¶ Whatever you need from your data, you can achieve it using Tinybird. Publish it as a queryable [API Endpoint](https://www.tinybird.co/docs/docs/publish/api-endpoints) , a [Materialized View](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/materialized-views) , or an advanced type of Tinybird Pipe (either a [Copy Pipe](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/copy-pipes) or a [Sink Pipe](https://www.tinybird.co/docs/docs/publish/sinks/s3-sink#sink-pipes) ). If you're new to Tinybird and looking to learn a simple flow of ingest data > query it > publish an API Endpoint, check out our [quick start](https://www.tinybird.co/docs/docs/get-started/quick-start)! --- URL: https://www.tinybird.co/docs/publish/api-endpoints Last update: 2024-10-16T16:43:40.000Z Content: --- title: "API Endpoints · Tinybird Docs" theme-color: "#171612" description: "API Endpoints make it easy to use the results of your queries in applications." --- # API Endpoints¶ Tinybird can turn any Pipe into an API Endpoint that you can query. For example, you can ingest your data, build SQL logic inside a Pipe, and then publish the result of your query as an HTTP API Endpoint. You can then create interactive [Charts](https://www.tinybird.co/docs/docs/publish/charts) of your data. API Endpoints make it easy to use the results of your queries in applications. Any app that can run an HTTP GET can use Tinybird API Endpoints. Tinybird represents API Endpoints using the icon. ## Create an API Endpoint¶ To create an API Endpoint, you first need a [Pipe](https://www.tinybird.co/docs/docs/work-with-data/query/pipes) . You can publish any of the queries in your Pipes as an API Endpoint. ### Using the UI¶ First, [create a Pipe](https://www.tinybird.co/docs/docs/work-with-data/query/pipes#creating-pipes-in-the-ui) in the UI. In the Pipe, select **Create API Endpoint** , then select the node that you want to publish. You can export a CSV file with the extracted data by selecting **Export CSV**. ### Using the CLI¶ First, [create a Pipe](https://www.tinybird.co/docs/docs/work-with-data/query/pipes#creating-pipes-in-the-cli) using the Tinybird CLI. Use the following command to publish an API Endpoint from the CLI. This automatically selects the final node in the Pipe. tb pipe publish PIPE_NAME_OR_ID If you want to manually select a different node to publish, supply the node name as the final command argument: tb pipe publish PIPE_NAME_OR_ID NODE_NAME ## Secure your API Endpoints¶ Access to the APIs you publish in Tinybird are also protected with Tokens. You can limit which operations a specific Token can do through scopes. For example, you can create Tokens that are only able to do admin operations on Tinybird resources, or only have `READ` permission for a specific Data Source. See [Tokens](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens) to understand how they work and see what types are available. ## API gateways¶ API gateways allow you to cloak or rebrand Tinybird API Endpoints while meeting additional security and compliance requirements. 
When you publish an [API Endpoint](https://www.tinybird.co/docs/docs/publish/api-endpoints) in Tinybird, it's available through `api.tinybird.co` or the API Gateway URL that corresponds to your [Workspace](https://www.tinybird.co/docs/docs/get-started/administration/workspaces) region. See [API Endpoint URLs](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) . API Endpoints are secured using [Tokens](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens) that are managed inside your Tinybird Workspace. Sometimes you might want to put the Tinybird API Endpoints behind an API Gateway. For example: - To present a unified brand experience to your users. - To avoid exposing Tokens and the underlying technology. - To comply with regulations around data privacy and security. - To add Tinybird to an existing API architecture.

### Alternative approaches¶

You can meet the requirements satisfied by an API gateway through other methods. - Use[ JSON Web Tokens (JWTs)](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens#json-web-tokens-jwts) to have your application call Tinybird API Endpoints from the frontend without proxying through your backend. - Appropriately scope the Token used inside your application. Exposing a read-only Token has limited security concerns because it can't be used to modify data. You can invalidate the Token at any time. - Use row-level security to ensure that a Token only provides access to the appropriate data.

### Amazon API Gateway¶

The steps to create a reverse proxy using Amazon API Gateway are as follows: 1. Access the API Gateway console. 2. Select **Create API**, then **HTTP API**. 3. Select **Add Integration** and then select **HTTP**. 4. Configure the integration with the method **GET** and the full URL to your Tinybird API Endpoint with its Token. For example, `https://api.tinybird.co/v0/pipes/top-10-products.json?token=p.eyJ1Ijog...` 5. Set a name for the API and select **Next**. 6. On the **Routes** page, set the method to **GET** and configure the desired **Resource path**. For example, **/top-10-products**. 7. Go through the rest of the steps to create the API. You can find more information about applying a custom domain name in the [Amazon API Gateway documentation](https://docs.aws.amazon.com/apigateway/latest/developerguide/http-api-custom-domain-names.html).

### Google Cloud Apigee¶

The steps to create a reverse proxy using Apigee are as follows: 1. Access the Apigee console. 2. Add a new **Reverse Proxy**. 3. Add your **Base path**. For example, **/top-10-products**. 4. Add the **Target**. For example, `https://api.tinybird.co/v0/pipes/top-10-products.json?token=p.eyJ1Ijog...` 5. Select **Pass through** for security. 6. Select an environment to deploy the API to. 7. Deploy, and test the API. You can find more information about applying a custom domain name in the [Apigee documentation](https://cloud.google.com/apigee/docs/api-platform/publish/portal/custom-domain).

### Grafbase Edge Gateway¶

To create a new Grafbase Edge Gateway using the Grafbase CLI, follow these steps. Inside a new directory, run: npx grafbase init --template openapi-tinybird In Tinybird, open your API Endpoint page. Select **Create Chart**, **Share this API Endpoint** , then select **OpenAPI 3.0** . Copy the link that appears, including the full Token. Create an `.env` file using the following template and enter the required details.
# TINYBIRD_API_URL is the URL for your published API Endpoint TINYBIRD_API_URL= # TINYBIRD_API_TOKEN is the Token with READ access to the API Endpoint TINYBIRD_API_TOKEN= # TINYBIRD_API_SCHEMA is the OpenAPI 3.0 spec URL copied from the API Endpoint docs page TINYBIRD_API_SCHEMA= You can now run the Grafbase Edge Gateway locally: npx grafbase dev Open the local Pathfinder at `http://127.0.0.1:4000` to test your Edge Gateway. Here is an example GraphQL query: query Tinybird { tinybird { topPages { data { action payload } rows } } } Make sure to replace `topPages` with the name of your API Endpoint. ### NGINX¶ The following is an example NGINX configuration file that handles a `GET` request and make the request to Tinybird on your behalf. The Token is only accessed server-side and never exposed to the user. worker_processes 1; events { worker_connections 1024; } http { server { listen 8080; server_name localhost; location /top-10-products { proxy_pass https://api.tinybird.co/v0/pipes/top-10-products.json?token=p.eyJ1Ijog...; } } } ## Query API and API Endpoints¶ The [Query API](https://www.tinybird.co/docs/docs/api-reference/query-api) is similar to running SQL statements against a normal database instead of using it as your backend, which is useful for ad-hoc queries. Publish API Endpoints instead of using the Query API in the following situations: - You want to build and maintain all the logic in Tinybird and call the API Endpoint to fetch the result. - You want to use incremental nodes in your Pipes to simplify the development and maintenance of your queries. - You need support for query parameters and more complex logic using the Tinybird[ templating language](https://www.tinybird.co/docs/docs/cli/advanced-templates) . - You need to incorporate changes from your query, which means little downstream impact. You can monitor performance of individual API Endpoints using [pipe_stats_rt](https://www.tinybird.co/docs/docs/monitoring/service-datasources#tinybird-pipe-stats-rt) and [pipe_stats](https://www.tinybird.co/docs/docs/monitoring/service-datasources#tinybird-pipe-stats) , uncovering optimization opportunities. All requests to the Query API are grouped together, making it more difficult to monitor performance of a specific query. ## Errors and retries¶ API Endpoints return standard HTTP success or error codes. For errors, the response also includes extra information about what went wrong, encoded in the response as JSON. ### Error codes¶ API Endpoints might return the following HTTP error codes: | Code | Description | | --- | --- | | 400 | Bad request. A `HTTP400` can be returned in several scenarios and typically represents a malformed request such as errors in your SQL queries or missing query parameters. | | 403 | Forbidden. The auth Token doesn't have the correct scopes. | | 404 | Not found. This usually occurs when the name of the API Endpoint is wrong or hasn't been published. | | 405 | HTTP Method not allowed. Requests to API Endpoints must use the `GET` method. | | 408 | Request timeout. This occurs when the query takes too long to complete by default this is 10 seconds. | | 414 | Request-URI Too Large. Not all APIs have the same limit but it's usually 2KB for GET requests. Reduce the URI length or use a POST request to avoid the limit. | | 429 | Too many requests. Usually occurs when an API Endpoint is hitting into rate limits. | | 499 | Connection closed. 
This occurs if the client closes the connection after 1 second, if this is unexpected increase the connection timeout on your end. | | 500 | Internal Server Error. Usually an unexpected transient service error. | Errors when running a query are usually reported as 400 Bad request or 500 Internal Server Error, depending on whether the error can be fixed by the caller or not. In those cases the API response has an additional HTTP header, `X-DB-Exception-Code` where you can check the internal database error, reported as a stringified number. For a full list of internal database errors, see [List of API Endpoint database errors](https://www.tinybird.co/docs/list-of-errors). ### Retries¶ When implementing an API Gateway, make sure to handle potential errors and implement retry strategies where appropriate. Implement automatic retries for the following errors: - HTTP 429: Too many requests - HTTP 500: Internal Server Error Follow an exponential backoff when retrying requests that produce the previous errors. See [Exponential backoff](https://en.wikipedia.org/wiki/Exponential_backoff) in Wikipedia. ### Token limitations with API Gateways¶ When using an API Gateway or proxy between your application and Tinybird, your proxy uses a Token to authenticate requests to Tinybird. Treat the Token as a secret and don't expose it to the client. Use a service such as Unkey to add multi-tenant API keys, rate limiting, and token usage analytics to your app at scale. ## Free plan limits¶ The Free plan is the free tier of Tinybird. See ["Tinybird plans"](https://www.tinybird.co/docs/docs/get-started/plans) ). The Free plan has the following limits on the amount of API requests per day that you can make against published API Endpoints. | Description | Limit and time window | | --- | --- | | API Endpoint | 1,000 requests per day | | Data Sources storage | 10 gigabytes in total | These limits don't apply to paid plans. To learn more about how Tinybird bills for different data operations, see [Billing](https://www.tinybird.co/docs/docs/get-started/plans/billing). ## Next steps¶ - Read more about how to use Tokens in the[ Tokens docs](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens) . - Read the guide:[ "Consume APIs in a Next.js frontend with JWTs"](https://www.tinybird.co/docs/docs/publish/api-endpoints/guides/consume-apis-nextjs) . - Understand[ Tokens](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens) . --- URL: https://www.tinybird.co/docs/publish/api-endpoints/guides Content: --- title: "API Endpoints guides · Tinybird Docs" theme-color: "#171612" description: "Guides for using Tinybird API Endpoints." --- # API Endpoints guides¶ Tinybird API Endpoints make it easy to use your data in applications. These guides show you how to integrate API Endpoints into different tools and frameworks: - Build interactive frontends by consuming APIs in Next.js with JWT authentication. - Explore and visualize data by using API Endpoints in Jupyter notebooks. - Create monitoring dashboards by integrating with Grafana. - Export metrics in Prometheus format for observability tools. Each guide provides step-by-step instructions and code examples to help you get started quickly. The guides cover common integration patterns like authentication, data filtering, and visualization. 
The following guides are available: --- URL: https://www.tinybird.co/docs/publish/api-endpoints/guides/advanced-dynamic-endpoints-functions Last update: 2025-01-20T11:43:08.000Z Content: --- title: "Advanced template functions for dynamic API Endpoint · Tinybird Docs" theme-color: "#171612" description: "Learn more about creating dynamic API Endpoint using advanced templates." --- # Advanced template functions for dynamic API Endpoint¶ The [Template functions section](https://www.tinybird.co/docs/docs/cli/template-functions) of the [Advanced templates docs](https://www.tinybird.co/docs/docs/cli/advanced-templates) explains functions that help you create more advanced dynamic templates. On this page, you'll learn about how these templates can be used to create dynamic API Endpoint with Tinybird. ## Prerequisites¶ Make sure you're familiar with [template functions](https://www.tinybird.co/docs/docs/cli/template-functions) and [query parameters](https://www.tinybird.co/docs/docs/work-with-data/query/query-parameters). ## Example data¶ This guide uses the eCommerce events data enriched with products. The data looks like this: ##### Events and products data SELECT *, price, city, day FROM events_mat ANY LEFT JOIN products_join_sku ON product_id = sku 17.84MB, 131.07k x 12 ( 9.29ms ) ## Tips and tricks¶ When the complexity of Pipes and API Endpoints grows, developing them and knowing what's going-on to debug problems can become challenging. Here are some useful tricks for using Tinybird's product: ### WHERE 1=1¶ When you filter by different criteria, given by dynamic parameters that can be omitted, you'll need a `WHERE` clause. But if none of the parameters are present, you'll need to add a `WHERE` statement with a dummy condition (like `1=1` ) that's always true, and then add the other filter statements dynamically if the parameters are defined, like you do in the [defined](https://www.tinybird.co/docs/about:blank#defined) example of this guide. ### Use the set function¶ The [set](https://www.tinybird.co/docs/docs/cli/advanced-templates#variables-vs-parameters) function present in the previous snippet lets you set the value of a parameter in a Node, so that you can check the output of a query depending on the value of the parameters it takes. Otherwise, you'd have to publish an API Endpoint and make requests to it with different parameters. Using `set` , you don't have to exit the Tinybird UI while creating an API Endpoint and the whole process is faster, without needing to go back and forth between your browser or IDE and Postman or cURL. Another example of its usage: ##### Using set to try out different parameter values % {% set select_cols = 'date,user_id,event,city' %} SELECT {{columns(select_cols)}} FROM events_mat 1.19MB, 24.58k x 4 ( 3.52ms ) You can use more than one `set` statement. Put each one on a separate line at the beginning of a Node. `set` is also a way to set defaults for parameters. If you used `set` statements to test your API Endpoint while developing, remember to remove them before publishing your code, because if not, the `set` function overrides any incoming parameter. ### Default argument¶ Another way to set default values for parameters is using the `default` argument that most Tinybird template functions accept. 
The previous code could be rewritten as follows: ##### Using the default argument % SELECT {{columns(select_cols, 'date,user_id,event,city')}} FROM events_mat 1.19MB, 24.58k x 4 ( 5.02ms ) Keep in mind that defining the same parameter in more than one place in your code in different ways can lead to inconsistent behavior. Here's a solution to avoid that: ### Using WITH statements to avoid duplicating code¶ If you plan to use the same dynamic parameters more than once in a node of a Pipe, define them in one place to avoid duplicating code. This also makes it clearer which parameters will appear in the Node. You can do this with one or more statements at the beginning of a Node, using the `WITH` clause. The WITH clause supports CTEs. These are preprocessed before executing the query, and can only return one row. This is different to other databases such as Postgres. For example: ##### DRY with the with clause % {% set terms='orchid' %} WITH {{split_to_array(terms, '1,2,3')}} AS needles SELECT *, joinGet(products_join_sku, 'color', product_id) color, joinGet(products_join_sku, 'title', product_id) title FROM events WHERE multiMatchAny(lower(color), needles) OR multiMatchAny(lower(title), needles) 5.53MB, 49.15k x 7 ( 12.84ms ) ### Documenting your API Endpoints¶ Tinybird creates auto-generated documentation for all your published API Endpoints, taking the information from the dynamic parameters found in the Pipe. It's best practice to set default values and descriptions for every parameter in one place (also because some functions don't accept a description, for example). This is typically done in the final Node, with `WITH` statements at the beginning. See how to do it in the [last section](https://www.tinybird.co/docs/about:blank#putting-it-all-together) of this guide. ### Hidden parameters¶ If you use some functions like `enumerate_with_last` in the example [below](https://www.tinybird.co/docs/about:blank#enumerate-with-last) , you'll end up with some variables (called `x`, `last` in that code snippet) that Tinybird will interpret as if they were parameters that you can set, and they will appear in the auto-generated documentation page. To avoid that, add a leading underscore to their name, renaming `x` to `_x` and `last` to `_last`. ### Debugging any query¶ Tinybird has an experimental feature that lets you see how the actual SQL code that will be run on Tinybird for any published API Endpoint looks, interpolating the query string parameters that you pass in the request URL. If you have a complex query and you'd like to know what is the SQL that will be run, [contact Tinybird](https://www.tinybird.co/docs/mailto:support@tinybird.co) to get access to this feature to debug a query. ## Advanced functions¶ Most of these functions also appear in the [Advanced templates](https://www.tinybird.co/docs/docs/cli/advanced-templates?#template-functions) section of the docs. The following are practical examples of their usage so that it's easier for you to understand how to use them. ### defined¶ The `defined` function lets you check if a query string parameter exists in the request URL or not. Imagine you want to filter events with a price within a minimum or a maximum price, set by two dynamic parameters that could be omitted. 
A way to define the API Endpoint would be like this: ##### filter by price % {% set min_price=20 %} {% set max_price=50 %} SELECT *, price FROM events_mat WHERE 1 = 1 {% if defined(min_price) %} AND price >= {{Float32(min_price)}} {% end %} {% if defined(max_price) %} AND price <= {{Float32(max_price)}} {% end %} 3.34MB, 24.58k x 8 ( 7.60ms ) To see the effect of having a parameter not defined, use `set` to set its value to `None` like this: ##### filter by price, price not defined % {% set min_price=None %} {% set max_price=None %} SELECT *, price FROM events_mat WHERE 1 = 1 {% if defined(min_price) %} AND price >= {{Float32(min_price)}} {% end %} {% if defined(max_price) %} AND price <= {{Float32(max_price)}} {% end %} 1.12MB, 8.19k x 8 ( 7.29ms ) It's also possible to provide smart defaults to avoid needing to use the `defined` function at all: ##### filter by price with default values % SELECT *, price FROM events_mat_cols WHERE price >= {{Float32(min_price, 0)}} AND price <= {{Float32(max_price, 999999999)}} ### Array(variable\_name, 'type', \[default\])¶ Transforms a comma-separated list of values into a Tuple. You can provide a default value for it or not: % SELECT {{Array(code, 'UInt32', default='13412,1234123,4123')}} AS codes_1, {{Array(code, 'UInt32', '13412,1234123,4123')}} AS codes_2, {{Array(code, 'UInt32')}} AS codes_3 To filter events whose type belongs to the ones provided in a dynamic parameter, separated by commas, you'd define the API Endpoint like this: ##### Filter by list of elements % SELECT * FROM events WHERE event IN {{Array(event_types, 'String', default='buy,view')}} 2.76MB, 24.58k x 5 ( 5.46ms ) And then the URL of the API Endpoint would be something like `{% user("apiHost") %}/v0/pipes/your_pipe_name.json?event_types=buy,view` ### sql\_and¶ `sql_and` lets you create a filter with `AND` operators and several expressions dynamically, taking into account if the dynamic parameters in a template it are present in the request URL. It's not possible to use Tinybird functions inside the `{{ }}` brackets in templates. `sql_and` can only be used with the `{column_name}__{operand}` syntax. This function does the same as what you saw in the previous query: filtering a column by the values that are present in a tuple generated by `Array(...)` if `operand` is `in` , are greater than (with the `gt` operand), or less than (with the `lt` operand). Let's see an example to make it clearer: - Endpoint template code - Generated SQL ##### SQL\_AND AND COLUMN\_\_IN % SELECT *, joinGet(products_join_sku, 'section_id', product_id) section_id FROM events WHERE {{sql_and(event__in=Array(event_types, 'String', default='buy,view'), section_id__in=Array(sections, 'Int16', default='1,2'))}} 9.22MB, 81.92k x 6 ( 12.41ms ) You don't have to provide default values. If you set the `defined` argument of `Array` to `False` , when that parameter isn't provided, no SQL expression will be generated. You can see this in the next code snippet: - Endpoint template code - Generated SQL ##### defined=False % SELECT *, joinGet(products_join_sku, 'section_id', product_id) section_id FROM events WHERE {{sql_and(event__in=Array(event_types, 'String', default='buy,view'), section_id__in=Array(sections, 'Int16', defined=False))}} 3.69MB, 32.77k x 6 ( 6.84ms ) ### split\_to\_array(name, \[default\])¶ This works similarly to `Array` , but it returns an Array of Strings (instead of a tuple). You'll have to cast the result to the type you want after. 
As you can see here too, they behave in a similar way: ##### array and split\_to\_array % SELECT {{Array(code, 'UInt32', default='1,2,3')}}, {{split_to_array(code, '1,2,3')}}, arrayMap(x->toInt32(x), {{split_to_array(code, '1,2,3')}}), 1 in {{Array(code, 'UInt32', default='1,2,3')}}, '1' in {{split_to_array(code, '1,2,3')}} 1.00B, 1.00 x 5 ( 2.90ms ) One thing that you'll want to keep in mind is that you can't pass non-constant values (arrays, for example) to operations that require them. For example, this would fail: ##### using a non-constant expression where one is required % SELECT 1 IN arrayMap(x->toInt32(x), {{split_to_array(code, '1,2,3')}}) [Error] Element of set in IN, VALUES, or LIMIT, or aggregate function parameter, or a table function argument is not a constant expression (result column not found): arrayMap(lambda(tuple(x), toInt32(x)), ['1', '2', '3']): While processing 1 IN arrayMap(x -> toInt32(x), ['1', '2', '3']). (BAD_ARGUMENTS) If you find an error like this, you should use a Tuple instead (remember that `{{Array(...)}}` returns a tuple). This will work: ##### Use a tuple instead % SELECT 1 IN {{Array(code, 'Int32', default='1,2,3')}} 1.00B, 1.00 x 1 ( 0.94ms ) `split_to_array` is often used with [enumerate_with_last](https://www.tinybird.co/docs/about:blank#enumerate-with-last). ### column and columns¶ They let you select one or several columns from a Data Source or Pipe, given their name. You can also provide a default value. ##### columns % SELECT {{columns(cols, 'date,user_id,event')}} FROM events 3.81MB, 122.88k x 3 ( 4.52ms ) ##### column % SELECT date, {{column(user, 'user_id')}} FROM events 1.57MB, 130.82k x 2 ( 2.71ms ) ### enumerate\_with\_last¶ Creates an iterable array, returning a Boolean value that allows checking if the current element is the last element in the array. Its most common usage is to select several columns, or compute some function over them. See an example of `columns` and `enumerate_with_last` here: - Endpoint template code - Generated SQL ##### enumerate\_with\_last \+ columns % SELECT {% if defined(group_by) %} {{columns(group_by)}}, {% end %} sum(price) AS revenue, {% for last, x in enumerate_with_last(split_to_array(count_unique_vals_columns, 'section_id,city')) %} uniq({{symbol(x)}}) as {{symbol(x)}} {% if not last %},{% end %} {% end %} FROM events_enriched {% if defined(group_by) %} GROUP BY {{columns(group_by)}} ORDER BY {{columns(group_by)}} {% end %} If you use the `defined` function around a parameter it doesn't make sense to give it a default value because if it's not provided, that line will never be run. ### error and custom\_error¶ They let you return customized error responses. 
With `error` you can customize the error message: ##### error % {% if not defined(event_types) %} {{error('You need to provide a value for event_types')}} {% end %} SELECT *, joinGet(products_join_sku, 'section_id', product_id) section_id FROM events WHERE event IN {{Array(event_types, 'String')}} ##### error response using error {"error": "You need to provide a value for event_types"} And with `custom_error` you can also customize the response code: ##### custom\_error % {% if not defined(event_types) %} {{custom_error({'error': 'You need to provide a value for event_types', 'code': 400})}} {% end %} SELECT *, joinGet(products_join_sku, 'section_id', product_id) section_id FROM events WHERE event IN {{Array(event_types, 'String')}} ##### error response using custom\_error {"error": "You need to provide a value for event_types", "code": 400} **Note:** `error` and `custom_error` have to be placed at the start of a node or they won't work. The order should be: 1. `set` lines, to give some parameter a default value (optional) 2. Parameter validation functions: `error` and `custom_error` definitions 3. The SQL query itself ## Putting it all together¶ You've created a Pipe where you use most of these advanced techniques to filter ecommerce events. You can see its live documentation page [here](https://app.tinybird.co/gcp/europe-west3/endpoints/t_e06de80c854d45298d566b93f50840d9?token=p.eyJ1IjogIjdmOTIwMmMzLWM1ZjctNDU4Ni1hZDUxLTdmYzUzNTRlMTk5YSIsICJpZCI6ICI0NDI5OWRkZi1lY2JmLTRkZGItYmM5MS1mMWNmZjNlMjdiNDgifQ.tZ5aOMy9Vp2L2R5qCZpiwysHp9v6bnQBW9aApl1Z3F8) and play with it on Swagger [here.](https://app.tinybird.co/gcp/europe-west3/openapi?url=https%253A%252F%252Fapi.tinybird.co%252Fv0%252Fpipes%252Fopenapi.json%253Ftoken%253Dp.eyJ1IjogIjdmOTIwMmMzLWM1ZjctNDU4Ni1hZDUxLTdmYzUzNTRlMTk5YSIsICJpZCI6ICI0NDI5OWRkZi1lY2JmLTRkZGItYmM5MS1mMWNmZjNlMjdiNDgifQ.tZ5aOMy9Vp2L2R5qCZpiwysHp9v6bnQBW9aApl1Z3F8) This is its code: ##### advanced\_dynamic\_endpoints.pipe NODE events_enriched SQL > SELECT *, price, city, day FROM events_mat_cols ANY LEFT JOIN products_join_sku ON product_id = sku NODE filter_by_price SQL > % SELECT * FROM events_enriched WHERE 1 = 1 {% if defined(min_price) %} AND price >= {{Float32(min_price)}} {% end %} {% if defined(max_price) %} AND price <= {{Float32(max_price)}} {% end %} NODE filter_by_event_type_and_section_id SQL > % SELECT * FROM filter_by_price {% if defined(event_types) or defined(section_ids) %} ... WHERE {{sql_and(event__in=Array(event_types, 'String', defined=False, enum=['remove_item_from_cart','view','search','buy','add_item_to_cart']), section_id__in=Array(section_ids, 'Int32', defined=False))}} {% end %} NODE filter_by_title_or_color SQL > % SELECT * FROM filter_by_event_type_and_section_id {% if defined(search_terms) %} WHERE multiMatchAny(lower(color), {{split_to_array(search_terms)}}) OR multiMatchAny(lower(title), {{split_to_array(search_terms)}}) {% end %} NODE group_by_or_not SQL > % SELECT {% if defined(group_by) %} {{columns(group_by)}}, sum(price) AS revenue, {% for _last, _x in enumerate_with_last(split_to_array(count_unique_vals_columns)) %} uniq({{symbol(_x)}}) as {{symbol(_x)}} {% if not _last %},{% end %} {% end %} {% else %} * {% end %} FROM filter_by_title_or_color {% if defined(group_by) %} GROUP BY {{columns(group_by)}} ORDER BY {{columns(group_by)}} {% end %} NODE pagination SQL > % WITH {{Array(group_by, 'String', '', description='Comma-separated name of columns. If defined, group by and order the results by these columns. 
The sum of revenue will be returned')}}, {{Array(count_unique_vals_columns, 'String', '', description='Comma-separated name of columns. If both group_by and count_unique_vals_columns are defined, the number of unique values in the columns given in count_unique_vals_columns will be returned as well')}}, {{Array(search_terms, 'String', '', description='Comma-separated list of search terms present in the color or title of products')}}, {{Array(event_types, 'String', '', description="Comma-separated list of event name types", enum=['remove_item_from_cart','view','search','buy','add_item_to_cart'])}}, {{Array(section_ids, 'String', '', description="Comma-separated list of section IDs. The minimum value for an ID is 0 and the max is 50.")}} SELECT * FROM group_by_or_not LIMIT {{Int32(page_size, 100)}} OFFSET {{Int32(page, 0) * Int32(page_size, 100)}} To replicate it in your account, copy the previous code to a new file called `advanced_dynamic_endpoints.pipe` locally and run `tb push pipes/advanced_dynamic_endpoints.pipe` with our [CLI](https://www.tinybird.co/docs/docs/cli/install) to push it to your Tinybird account. --- URL: https://www.tinybird.co/docs/publish/api-endpoints/guides/consume-api-endpoints-in-grafana Last update: 2025-01-17T07:58:01.000Z Content: --- title: "Consume API Endpoints in Grafana · Tinybird Docs" theme-color: "#171612" description: "Grafana is an awesome open source analytics & monitoring tool. In this guide, you'll learn how to create dashboards consuming Tinybird API Endpoints." --- # Consume API Endpoints in Grafana¶ [Grafana](https://grafana.com/grafana/) is an awesome open source analytics & monitoring tool. In this guide, you'll learn how to create Grafana Dashboards and Alerts consuming Tinybird API Endpoints. ## Prerequisites¶ This guide assumes you have a Tinybird Workspace with an active Data Source, Pipes, and at least one API Endpoint. You'll also need a Grafana account. ## 1. Install the Infinity plugin¶ Follow the steps in [Grafana Infinity plugin installation](https://grafana.com/grafana/plugins/yesoreyeram-infinity-datasource/?tab=installation). ## 2. Create a Grafana data source¶ Create a new Data Source using the Infinity plugin: Connections > Data sources > Add new data source > Infinity. <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fgrafana-ds-infinity.png&w=3840&q=75) Edit the name and complete the basic setup. **Authentication**: choose Bearer Token. Pick a Token with access to all the needed endpoints, or create different data sources per Token. Feel free to also add a restrictive list of allowed hosts. <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fgrafana-ds-auth.png&w=3840&q=75) That's basically it, but it's good practice to add a **Health check** with an endpoint that the Token has access to, so you can verify connectivity is OK. ## 3. Configure the Query in Grafana¶ Create a new Dashboard, edit the suggested Panel, and use the Data Source you just created. For this example, you'll consume the endpoint shown in this picture: a time series of sensor temperature and humidity readings. <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fgrafana-endpoint-sample.png&w=3840&q=75) So, the configuration in the Query editor is: - Type: JSON - Parser: Backend (needed for alerting and JSON parsing) - Source: URL - Format: Table - Method: GET - URL: `https://api.eu-central-1.aws.tinybird.co/v0/pipes/api_readings.json` - Parsing options & Result fields > Rows/root: `data` . This is needed because the Tinybird response includes metadata, data, statistics, and so on.
- Parsing options & Result fields > Columns: add the needed fields, select types, and adjust time formats. <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fgrafana-query.png&w=3840&q=75) Using the Backend parser is important for Alerts to work and for compatibility with the root and field selectors, as mentioned in the [plugin docs](https://grafana.com/docs/plugins/yesoreyeram-infinity-datasource/latest/query/backend/#root-selector--field-selector). For time formats, you can use the plugin options. By default it uses *Default ISO*, so you can simply add `formatDateTime(t,'%FT%TZ') as t` to your Tinybird API Endpoint and skip configuring the option in Grafana. Save the dashboard and you should now be able to see a chart. <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fgrafana-dashboard.png&w=3840&q=75) ## 4. Using time ranges¶ When you have millions of rows, it's better to filter time ranges in Tinybird than to retrieve all the data and filter it later when showing the chart in Grafana. You'll get faster responses and use resources more efficiently. Edit the Tinybird Pipe to accept `start_ts` and `end_ts` parameters. % SELECT formatDateTime(timestamp,'%FT%TZ') t, temperature, humidity FROM readings {% if defined(start_ts) and defined(end_ts) %} WHERE timestamp BETWEEN parseDateTimeBestEffort({{String(start_ts)}}) AND parseDateTimeBestEffort({{String(end_ts)}}) {% end %} ORDER BY t ASC In the Query editor, next to the URL, click on Headers, Request params and fill in the URL Query Params. Use Grafana's global variables [$__from and $__to](https://grafana.com/docs/grafana/v9.0/variables/variable-types/global-variables/#__from-and-__to), defined by the time range selector at the top right of the dashboard. <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fgrafana-time-ranges.png&w=3840&q=75) As mentioned, filtering makes dashboards more efficient: here you can see how the scan size decreases when filters are applied. <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fgrafana-stats-filter.png&w=3840&q=75) If you want to test these parameters outside Grafana, see the sketch after the Alerts section. ## 5. Dashboard variables¶ You can also define dashboard variables and use them in the query: <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fgrafana-dashboard-variable.png&w=3840&q=75) That will make it interactive: <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fgrafana-variables.gif&w=3840&q=75) Note that you have to edit the Pipe: % SELECT formatDateTime(timestamp,'%FT%TZ') t, {% if defined(magnitude) and magnitude != 'all' %} {{column(magnitude)}} {% else %} temperature, humidity {% end %} FROM readings {% if defined(start_ts) and defined(end_ts) %} WHERE timestamp BETWEEN parseDateTimeBestEffort({{String(start_ts)}}) AND parseDateTimeBestEffort({{String(end_ts)}}) {% end %} ORDER BY t ASC ## 6. Alerts¶ The Infinity plugin supports alerting, so you can create your own [rules](https://grafana.com/docs/grafana/latest/alerting/alerting-rules/). In this example, the alert triggers when the temperature goes outside a defined range. <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fgrafana-alert-definition.png&w=3840&q=75) This is what you'll see in the dashboard. <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fgrafana-alert-trigger.png&w=3840&q=75) Be sure to correctly set up [notifications](https://grafana.com/docs/grafana/latest/alerting/configure-notifications/) if needed.
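If you want to sanity-check the parameterized API Endpoint outside Grafana before wiring up the dashboard, the following is a minimal sketch (not part of this guide's resources). It assumes Node 18+ (for the global `fetch`), a read Token in a `TINYBIRD_TOKEN` environment variable, and the `api_readings` endpoint and eu-central-1 host used above. It converts epoch milliseconds, which is what Grafana's `$__from` and `$__to` contain, into ISO strings that `parseDateTimeBestEffort` understands.

##### Test start_ts and end_ts outside Grafana (sketch)

```typescript
// Minimal sketch: call the time-range-filtered endpoint the way Grafana would.
// Assumes TINYBIRD_TOKEN holds a Token with read access to api_readings.
const HOST = "https://api.eu-central-1.aws.tinybird.co";
const TOKEN = process.env.TINYBIRD_TOKEN ?? "";

async function checkTimeRange(fromMs: number, toMs: number): Promise<void> {
  // Grafana's $__from/$__to are epoch milliseconds; ISO 8601 strings are one
  // format that parseDateTimeBestEffort() in the Pipe can parse.
  const params = new URLSearchParams({
    start_ts: new Date(fromMs).toISOString(),
    end_ts: new Date(toMs).toISOString(),
    token: TOKEN, // or send it as an Authorization: Bearer header instead
  });
  const res = await fetch(`${HOST}/v0/pipes/api_readings.json?${params}`);
  const body = await res.json();
  // Grafana reads the rows under `data`; `statistics` shows the scan size.
  console.log(body.data?.length, "rows, read", body.statistics?.bytes_read, "bytes");
}

// Last hour, roughly what the default dashboard time range selector sends.
checkTimeRange(Date.now() - 60 * 60 * 1000, Date.now());
```

A smaller `bytes_read` for a narrower range confirms the filter is being applied inside Tinybird rather than in Grafana.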
## Note¶ Note: a previous version of this guide referred to the [JSON API plugin](https://grafana.com/grafana/plugins/marcusolsson-json-datasource/) but it was migrated to using [Infinity plugin](https://grafana.com/grafana/plugins/yesoreyeram-infinity-datasource/) since it's now the [default supported](https://grafana.com/blog/2024/02/05/infinity-plugin-for-grafana-grafana-labs-will-now-maintain-the-versatile-data-source-plugin/) one. --- URL: https://www.tinybird.co/docs/publish/api-endpoints/guides/consume-api-endpoints-in-prometheus-format Last update: 2024-12-18T15:47:50.000Z Content: --- title: "Consume API Endpoints in Prometheus format · Tinybird Docs" theme-color: "#171612" description: "Export Pipe endpoints in Prometheus format to integrate Tinybird data into your monitoring stack." --- # Consume API Endpoints in Prometheus format¶ Prometheus is a powerful open-source monitoring and alerting toolkit widely used for metrics collection and visualization. You can export Pipe endpoints in Prometheus format to integrate your Tinybird data into your monitoring stack. ## Prerequisites¶ This guide assumes you have a Tinybird Workspace with an active Data Source, Pipes, and at least one API Endpoint. ## Structure data for Prometheus¶ To export the Pipe output in Prometheus format, data must conform to the following structure: ### Mandatory columns¶ - `name (String)` : The name of the metric. - `value (Number)` : The numeric value for the metric. ### Optional columns¶ - `help (String)` : A description of the metric. - `timestamp (Number)` : A Unix timestamp for the metric. - `type (String)` : Defines the metric type ( `counter` , `gauge` , `histogram` , `summary` , `untyped` , or empty). - `labels (Map(String, String))` : A set of key-value pairs providing metric dimensions. ### Example¶ Here’s an example of a Tinybird Pipe query that outputs two metrics, `http_request_count` and `http_request_duration_seconds` , in the same query. Both metrics include labels for `method` and `status_code`. SELECT -- Metric 1: http_request_count 'http_request_count' AS name, toFloat64(count(*)) AS value, 'Total number of HTTP requests' AS help, 'counter' AS type, map('method', method, 'status_code', status_code) AS labels FROM http_requests GROUP BY method, status_code UNION ALL SELECT -- Metric 2: http_request_duration_seconds 'http_request_duration_seconds' AS name, avg(request_time) AS value, 'Average HTTP request duration in seconds' AS help, 'gauge' AS type, map('method', method, 'status_code', status_code) AS labels FROM http_requests GROUP BY method, status_code ORDER BY name ## Export Tinybird Pipe endpoint in Prometheus format¶ Export Pipe data in Prometheus format by appending .prometheus to your API endpoint URI. 
For example: https://api.tinybird.co/v0/pipes/your_pipe_name.prometheus The following is an example Prometheus output: # HELP http_request_count Total number of HTTP requests # TYPE http_request_count counter http_request_count{method="PUT",status_code="203"} 1 http_request_count{method="PATCH",status_code="203"} 1 http_request_count{method="DELETE",status_code="201"} 4 http_request_count{method="POST",status_code="203"} 1 http_request_count{method="OPTIONS",status_code="203"} 1 http_request_count{method="PATCH",status_code="204"} 1 http_request_count{method="PUT",status_code="204"} 1 http_request_count{method="HEAD",status_code="203"} 1 http_request_count{method="GET",status_code="201"} 4 http_request_count{method="POST",status_code="204"} 1 http_request_count{method="GET",status_code="203"} 1 http_request_count{method="POST",status_code="201"} 4 http_request_count{method="DELETE",status_code="204"} 1 http_request_count{method="OPTIONS",status_code="201"} 4 http_request_count{method="GET",status_code="204"} 1 http_request_count{method="PATCH",status_code="201"} 4 http_request_count{method="PUT",status_code="201"} 4 http_request_count{method="DELETE",status_code="203"} 1 http_request_count{method="HEAD",status_code="201"} 4 # HELP http_request_duration_seconds Average HTTP request duration in seconds # TYPE http_request_duration_seconds gauge http_request_duration_seconds{method="GET",status_code="200"} 75.01 http_request_duration_seconds{method="DELETE",status_code="201"} 11.01 http_request_duration_seconds{method="POST",status_code="202"} 102.00999999999999 http_request_duration_seconds{method="HEAD",status_code="204"} 169.01 http_request_duration_seconds{method="PATCH",status_code="204"} 169.01 http_request_duration_seconds{method="PUT",status_code="204"} 169.01 http_request_duration_seconds{method="HEAD",status_code="202"} 102.00999999999999 http_request_duration_seconds{method="OPTIONS",status_code="202"} 102.00999999999999 http_request_duration_seconds{method="DELETE",status_code="200"} 75.01 http_request_duration_seconds{method="OPTIONS",status_code="204"} 169.01 http_request_duration_seconds{method="GET",status_code="201"} 11.01 http_request_duration_seconds{method="PATCH",status_code="202"} 102.00999999999999 http_request_duration_seconds{method="PUT",status_code="202"} 102.00999999999999 http_request_duration_seconds{method="POST",status_code="204"} 169.01 http_request_duration_seconds{method="DELETE",status_code="202"} 102.00999999999999 http_request_duration_seconds{method="PUT",status_code="200"} 75.01 http_request_duration_seconds{method="POST",status_code="200"} 75.01 http_request_duration_seconds{method="PATCH",status_code="200"} 75.01 http_request_duration_seconds{method="POST",status_code="201"} 11.01 http_request_duration_seconds{method="DELETE",status_code="204"} 169.01 http_request_duration_seconds{method="OPTIONS",status_code="201"} 11.01 http_request_duration_seconds{method="GET",status_code="204"} 169.01 http_request_duration_seconds{method="PATCH",status_code="201"} 11.01 http_request_duration_seconds{method="PUT",status_code="201"} 11.01 http_request_duration_seconds{method="GET",status_code="202"} 102.00999999999999 http_request_duration_seconds{method="HEAD",status_code="201"} 11.01 ## Integrate endpoints in Prometheus-compatible tools¶ Now that you’ve structured and exported your Pipe data in Prometheus format, you can integrate it into monitoring and observability tools. 
Prometheus is widely supported by various visualization and alerting platforms, making it easy to use your Tinybird data with tools like Grafana, Datadog, and more. See the documentation for these tools to integrate them with the Prometheus endpoints from Tinybird. ## Monitoring your Tinybird Organization with Grafana and Datadog¶ Check the [Tinybird Organization metrics](https://github.com/tinybirdco/tinybird-org-metrics-exporter) repository for a working example of how to consume the Prometheus endpoints in Grafana and Datadog to monitor your Tinybird Organization. --- URL: https://www.tinybird.co/docs/publish/api-endpoints/guides/consume-apis-in-a-notebook Last update: 2025-01-20T11:43:08.000Z Content: --- title: "Consume APIs in a Notebook · Tinybird Docs" theme-color: "#171612" description: "Notebooks are a great resource for exploring data and generating plots. In this guide, you'll learn how to consume Tinybird APIs in a notebook." --- # Consume APIs in a notebook¶ Notebooks are a great resource for exploring data and generating plots. In this guide, you'll learn how to consume Tinybird APIs in a notebook. ## Prerequisites¶ This [Colab notebook](https://github.com/tinybirdco/examples/blob/master/notebook/consume_from_apis.ipynb) uses a Data Source of updates to Wikipedia to show how to consume data from queries. There are two options: using the [Query API](https://www.tinybird.co/docs/docs/api-reference/query-api), and using API Endpoints through the [Pipes API](https://www.tinybird.co/docs/docs/api-reference/pipe-api) with parameters. The full code for every example in this guide can be found in the notebook. This guide assumes some familiarity with Python. ## Setup¶ Follow the setup steps in the [notebook file](https://github.com/tinybirdco/examples/blob/master/notebook/consume_from_apis.ipynb) and use the linked CSV file of Wikipedia updates to create a new Data Source in your Workspace. For less than 100 MB of data, you can fetch all the data in one call. For more than 100 MB of data, you need to fetch it sequentially, with no more than 100 MB per API call. The solution is to get batches using Data Source sorting keys. Selecting the data by columns used in the sorting key keeps it fast. In this example, the Data Source is sorted on the `timestamp` column, so you can use batches of a fixed amount of time. In general, time is a good way to batch. The functions `fetch_table_streaming_query` and `fetch_table_streaming_endpoint` in the notebook work as generators. They should always be used in a `for` loop or as the input for another generator. You should process each batch as it arrives and discard unwanted fetched data. Only fetch the data you need for the processing. The idea here isn't to recreate a Data Source in the notebook, but to process each batch as it arrives and write less data to your DataFrame. ## Fetch data with the Query API¶ This guide uses the [requests library for Python](https://pypi.org/project/requests/). The SQL query pulls in an hour less of data than the full Data Source. A DataFrame is created from the text part of the response.
##### DataFrame from the query API table_name = 'wiki' host = 'api.tinybird.co' format = 'CSVWithNames' time_column = 'toDateTime(timestamp)' date_end = 'toDateTime(1644754546)' s = requests.Session() s.headers['Authorization'] = f'Bearer {token}' URL = f'https://{host}/v0/sql' sql = f'select * from {table_name} where {time_column} <= {date_end}' params = {'q': sql + f" FORMAT {format}"} r = s.get(f"{URL}?{urlencode(params)}") df = pd.read_csv(StringIO(r.text)) Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace. ## Fetch data from an API Endpoint & parameters¶ This Endpoint node in the Pipe `endpoint_wiki` selects from the Data Source within a range of dates, using the parameters for `date_start` and `date_end`. ##### Endpoint wiki % SELECT * FROM wiki WHERE timestamp BETWEEN toInt64(toDateTime({{String(date_start, '2022-02-13 10:30:00')}})) AND toInt64(toDateTime({{String(date_end, '2022-02-13 11:00:00')}})) These parameters are passed in the call to the API Endpoint to select only the data within the range. A DataFrame is created from the text part of the response. ##### Dataframe from API Endpoint host = 'api.tinybird.co' api_endpoint = 'endpoint_wiki' format = 'csv' date_start = '2022-02-13 10:30:00' date_end = '2022-02-13 11:30:00' s = requests.Session() s.headers['Authorization'] = f'Bearer {token}' URL = f'https://{host}/v0/pipes/{api_endpoint}.{format}' params = {'date_start': date_start, 'date_end': date_end } r = s.get(f"{URL}?{urlencode(params)}") df = pd.read_csv(StringIO(r.text)) Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace. ## Fetch batches of data using the Query API¶ The function `fetch_table_streaming_query` in the notebook accepts more complex queries than a date range. Here you choose what you filter and sort by. This example reads in batches of 5 minutes to create a small DataFrame, which should then be processed, with the results of the processing appended to the final DataFrame. <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fconsume-apis-in-a-notebook-1.png&w=3840&q=75) <-figcaption-> 5-minute batches of data using the index ##### DataFrames from batches returned by the Query API tinybird_stream = fetch_table_streaming_query(token, 'wiki', 60*5, 1644747337, 1644758146, sorting='timestamp', filters="type IN ['edit','new']", time_column="timestamp", host='api.tinybird.co') df_all=pd.DataFrame() for x in tinybird_stream: df_batch = pd.read_csv(StringIO(x)) # TO DO: process batch and discard fetched data df_proc=process_dataframe(df_batch) df_all = df_all.append(df_proc) # Careful: appending dfs means keeping a lot of data in memory ## Fetch batches of data from an API Endpoint & parameters¶ The function `fetch_table_streaming_endpoint` in the notebook sends a call to the API with parameters for the `batch size`, `start` and `end` dates, and, optionally, filters on the `bot` and `server_name` columns. This example reads in batches of 5 minutes to create a small DataFrame, which should then be processed, with the results of the processing appended to the final DataFrame. ‍The API Endpoint `wiki_stream_example` first selects data for the range of dates, then for the batch, and then applies the filters on column values. 
##### API Endpoint wiki\_stream\_example % SELECT * from wiki --DATE RANGE WHERE timestamp BETWEEN toUInt64(toDateTime({{String(date_start, '2022-02-13 10:30:00', description="start")}})) AND toUInt64(toDateTime({{String(date_end, '2022-02-13 10:35:00', description="end")}})) --BATCH BEGIN AND timestamp BETWEEN toUInt64(toDateTime({{String(date_start, '2022-02-13 10:30:00', description="start")}}) + interval {{Int16(batch_no, 1, description="batch number")}} * {{Int16(batch_size, 10, description="size of the batch")}} second) --BATCH END AND toUInt64(toDateTime({{String(date_start, '2022-02-13 10:30:00', description="start")}}) + interval ({{Int16(batch_no, 1, description="batch number")}} + 1) * {{Int16(batch_size, 10, description="size of the batch")}} second) --FILTERS {% if defined(bot) %} AND bot = {{String(bot, description="is a bot")}} {% end %} {% if defined(server_name) %} AND server_name = {{String(server_name, description="server")}} {% end %} These parameters are passed in the call to the API Endpoint to select only the data for the batch. A DataFrame is created from the text part of the response. ##### DataFrames from batches from the API Endpoint tinybird_stream = fetch_table_streaming_endpoint(token, 'csv', 60*5, '2022-02-13 10:15:00', '2022-02-13 13:15:00', bot = False, server_name='en.wikipedia.org' ) df_all=pd.DataFrame() for x in tinybird_stream: df_batch = pd.read_csv(StringIO(x)) # TO DO: process batch and discard fetched data df_proc=process_dataframe(df_batch) df_all = df_all.append(df_proc) # Careful: appending dfs means keeping a lot of data in memory ## Next steps¶ - Explore more use cases for Tinybird in the[ Use Case Hub](https://www.tinybird.co/docs/docs/use-cases) . - Looking for other ways to integrate? Try[ consume APIs in a Next.js frontend](https://www.tinybird.co/docs/docs/publish/api-endpoints/guides/consume-apis-nextjs) . --- URL: https://www.tinybird.co/docs/publish/api-endpoints/guides/consume-apis-nextjs Last update: 2025-01-17T07:58:01.000Z Content: --- title: "Consume APIs in a Next.js frontend with JWTs · Tinybird Docs" theme-color: "#171612" description: "In this guide, you'll learn how to generate self-signed JWTs from your backend, and call Tinybird APIs directly from your frontend, using Next.js." --- # Consume APIs in a Next.js frontend with JWTs¶ In this guide, you'll learn how to generate self-signed JWTs from your backend, and call Tinybird APIs directly from your frontend, using Next.js. JWTs are signed tokens that allow you to securely authorize and share data between your application and Tinybird. If you want to read more about JWTs, check out the [JWT.io](https://jwt.io/) website. You can view the [live demo](https://guide-nextjs-jwt-auth.vercel.app/) or browse the [GitHub repo (guide-nextjs-jwt-auth)](https://github.com/tinybirdco/guide-nextjs-jwt-auth). ## Prerequisites¶ This guide assumes that you have a Tinybird account, and you are familiar with creating a Tinybird Workspace and pushing resources to it. Make sure you understand the concept of Tinybird's [Static Tokens](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens#what-should-i-use-tokens-for). You'll need a working familiarity with JWTs, JavaScript, and Next.js. ## Run the demo¶ These steps cover running the GitHub demo locally. [Skip to the next section](https://www.tinybird.co/docs/about:blank#understand-the-code) for a breakdown of the code. ### 1. 
Clone the GitHub repo¶ Clone the [GitHub repo (guide-nextjs-jwt-auth)](https://github.com/tinybirdco/guide-nextjs-jwt-auth) to your local machine. ### 2. Push Tinybird resources¶ The repo includes two sample Tinybird resources: - `events.datasource` : The Data Source for incoming events. - `top_airlines.pipe` : An API Endpoint giving a list of top 10 airlines by booking volume. Configure the [Tinybird CLI](https://www.tinybird.co/docs/docs/cli/install) and `tb push` the resources to your Workspace. Alternatively, you can drag and drop the files onto the UI to upload them. ### 3. Generate some fake data¶ Use [Mockingbird](https://tbrd.co/mockingbird-nextjs-jwt-demo) to generate fake data for the `events` Data Source. Using this link ^ provides a pre-configured schema, but you will need to enter your Workspace admin Token and Host. When configured, scroll down and select `Start Generating!`. In the Tinybird UI, confirm that the `events` Data Source is successfully receiving data. ### 4. Install dependencies¶ Navigate to the cloned repo and install the dependencies with `npm install`. ### 5. Configure .env¶ First create a new file `.env.local` cp .env.example .env.local Copy your [Tinybird host](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) and admin Token (used as the `TINYBIRD_SIGNING_TOKEN` ) to the `.env.local` file: TINYBIRD_SIGNING_TOKEN="TINYBIRD_SIGNING_TOKEN>" # Use your Admin Token as the signing Token TINYBIRD_WORKSPACE="YOUR_WORKSPACE_ID" # The UUID of your Workspace NEXT_PUBLIC_TINYBIRD_HOST="YOUR_TINYBIRD_API_REGION e.g. https://api.tinybird.co" # Your regional API host Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace. ### Run the demo app¶ Run it locally: npm run dev Then open `localhost:3000` with your browser. ## Understand the code¶ This section breaks down the key parts of code from the example. ### .env¶ The `.env` file contains the environment variables used in the application. ##### .env file TINYBIRD_SIGNING_TOKEN="YOUR SIGNING TOKEN" TINYBIRD_WORKSPACE="YOUR WORKSPACE ID" NEXT_PUBLIC_TINYBIRD_HOST="YOUR API HOST e.g. https://api.tinybird.co" #### TINYBIRD\_SIGNING\_TOKEN¶ `TINYBIRD_SIGNING_TOKEN` is the token used to sign JWTs. **You must use your admin Token** . It is a shared secret between your application and Tinybird. Your application uses this Token to sign JWTs, and Tinybird uses it to verify the JWTs. It should be kept secret, as exposing it could allow unauthorized access to your Tinybird resources. It is best practice to store this in an environment variable instead of hardcoding it in your application. #### TINYBIRD\_WORKSPACE¶ `TINYBIRD_WORKSPACE` is the ID of your Workspace. It is used to identify the Workspace that the JWT is generated for. The Workspace ID is included inside the JWT payload. Workspace IDs are UUIDs and can be found using the CLI `tb workspace current` command or from the Tinybird UI. #### NEXT\_PUBLIC\_TINYBIRD\_HOST¶ `NEXT_PUBLIC_TINYBIRD_HOST` is the base URL of the Tinybird API. It is used to construct the URL for the Tinybird API Endpoints. You must use the correct URL for [your Tinybird region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) . The `NEXT_PUBLIC_` prefix is required for Next.js to expose the variable to the client side. ### token.ts¶ The `token.ts` file contains the logic to generate and sign JWTs. It uses the `jsonwebtoken` library to create the Token. 
##### token.ts "use server"; import jwt from "jsonwebtoken"; const TINYBIRD_SIGNING_TOKEN = process.env.TINYBIRD_SIGNING_TOKEN ?? ""; const WORKSPACE_ID = process.env.TINYBIRD_WORKSPACE ?? ""; const PIPE_ID = "top_airlines"; export async function generateJWT() { const next10minutes = new Date(); next10minutes.setTime(next10minutes.getTime() + 1000 * 60 * 10); const payload = { workspace_id: WORKSPACE_ID, name: "my_demo_jwt", exp: Math.floor(next10minutes.getTime() / 1000), scopes: [ { type: "PIPES:READ", resource: PIPE_ID, }, ], }; return jwt.sign(payload, TINYBIRD_SIGNING_TOKEN, {noTimestamp: true}); } This code runs on the backend to generate JWTs without exposing secrets to the user. It pulls in the `TINYBIRD_SIGNING_TOKEN` and `WORKSPACE_ID` from the environment variables. As this example only exposes a single API Endpoint ( `top_airlines.pipe` ), the `PIPE_ID` is hardcoded to its deployed ID. If you had multiple API Endpoints, you would need to create an item in the `scopes` array for each one. The `generateJWT` function handles creation of the JWT. A JWT has various [required fields](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens#jwt-payload). The `exp` field sets the expiration time of the JWT in the form a UTC timestamp. In this case, it's set to 10 minutes in the future. You can adjust this value to suit your needs. The `name` field is a human-readable name for the JWT. This value is only used for logging. The `scopes` field defines what the JWT can access. This is an array, which allows you create one JWT that can access multiple API Endpoints. In this case, you only have one API Endpoint. Under `scopes` , the `type` field is always `PIPES:READ` for reading data from a Pipe. The `resource` field is the ID or name of the Pipe you want to access. If required, you can also add `fixed_parameters` here to supply parameters to the API Endpoint. Finally, the payload is signed using the `jsonwebtoken` library and the `TINYBIRD_SIGNING_TOKEN`. ### useFetch.tsx¶ The `useFetch.tsx` file contains a custom React hook that fetches data from the Tinybird API using a JWT. It also handles refreshing the token if it expires. ##### useFetch.tsx import { generateJWT } from "@/server/token"; import { useState } from "react"; export function useFetcher() { const [token, setToken] = useState(""); const refreshToken = async () => { const newToken = await generateJWT(); setToken(newToken); return newToken; }; return async (url: string) => { let currentToken = token; if (!currentToken) { currentToken = await refreshToken(); } const response = await fetch(url + "?token=" + currentToken); if (response.status === 200) { return response.json(); } if (response.status === 403) { const newToken = await refreshToken(); return fetch(url + "?token=" + newToken).then((res) => res.json()); } }; } This code runs on the client side and is used to fetch data from the Tinybird API. It uses the `generateJWT` function from the `token.ts` file to get a JWT. The JWT is stored in the `token` state. Most importantly, it uses the standard `fetch` API to make requests to the Tinybird API. The JWT is passed as a `token` query parameter in the URL. If the request returns a `403` status code, the hook then calls `refreshToken` to get a new JWT and retries the request. However, note that this is a simple implementation and there are other reasons why a request might fail with a `403` status code (e.g., the JWT is invalid, the API Endpoint has been removed, etc.). 
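If you need one JWT that covers several API Endpoints, or want to pin query parameters server side, you extend the `scopes` array. The following is a minimal sketch, not taken from the demo repo: `bookings_per_user` is a hypothetical second Pipe, and the shape of `fixed_parameters` (assumed here to be a map of parameter names to values) should be checked against the JWT payload reference linked above.

##### Multi-endpoint JWT payload (sketch)

```typescript
// Sketch only: a payload granting read access to two Pipes, with a parameter
// pinned on the second one so the client can't override it.
import jwt from "jsonwebtoken";

const payload = {
  workspace_id: process.env.TINYBIRD_WORKSPACE ?? "",
  name: "multi_endpoint_jwt",
  exp: Math.floor(Date.now() / 1000) + 10 * 60, // expires in 10 minutes
  scopes: [
    { type: "PIPES:READ", resource: "top_airlines" },
    {
      type: "PIPES:READ",
      resource: "bookings_per_user", // hypothetical second API Endpoint
      fixed_parameters: { user_id: "692851" }, // assumed shape: name -> value
    },
  ],
};

export const multiEndpointJWT = jwt.sign(
  payload,
  process.env.TINYBIRD_SIGNING_TOKEN ?? "",
  { noTimestamp: true }
);
```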
### page.tsx¶ The `page.tsx` file contains the main logic for the Next.js page. It is responsible for initiating the call to the Tinybird API Endpoints and rendering the data into a chart. ##### page.tsx "use client"; import { BarChart, Card, Subtitle, Text, Title } from "@tremor/react"; import useSWR from "swr"; import { getEndpointUrl } from "@/utils"; import { useFetcher } from "@/hooks/useFetch"; const REFRESH_INTERVAL_IN_MILLISECONDS = 5000; // five seconds export default function Dashboard() { const endpointUrl = getEndpointUrl(); const fetcher = useFetcher(); let top_airline, latency, errorMessage; const { data } = useSWR(endpointUrl, fetcher, { refreshInterval: REFRESH_INTERVAL_IN_MILLISECONDS, onError: (error) => (errorMessage = error), }); if (!data) return; if (data?.error) { errorMessage = data.error; return; } top_airline = data.data; latency = data.statistics?.elapsed; return ( Top airlines by bookings Ranked from highest to lowest {top_airline && ( )} {latency && Latency: {latency * 1000} ms} {errorMessage && ( Oops, something happens: {errorMessage} Check your console for more information )}
); } It uses [SWR](https://swr.vercel.app/) and the `useFetcher` hook from [useFetch.tsx](https://www.tinybird.co/docs/about:blank#usefetch-tsx) to fetch data from the Tinybird API. When the API Endpoint returns data, it's rendered as a bar chart using the `BarChart` component from the `@tremor/react` library. ## Next steps¶ - Read the [blog post on JWTs](https://www.tinybird.co/blog-posts/jwt-api-endpoints-public-beta). - Explore more use cases that use this approach, like [building a real-time, user-facing dashboard](https://www.tinybird.co/docs/docs/use-cases/user-facing-dashboards). --- URL: https://www.tinybird.co/docs/publish/api-endpoints/guides/reliable-scheduling-with-trigger Last update: 2025-01-08T08:08:08.000Z Content: --- title: "Reliable scheduling with Trigger.dev · Tinybird Docs" theme-color: "#171612" description: "Learn how to create complex, reliable scheduling with Trigger.dev" --- # Reliable scheduling with Trigger.dev¶ [Trigger.dev](https://trigger.dev/) is an open source background job platform. With Trigger.dev you can easily create, schedule, and manage background jobs using code. Read on to learn how to create complex, reliable scheduling with Trigger.dev. ## Before you start¶ Before you start, ensure: - You have a [Trigger.dev account](https://trigger.dev/). - You have a [Tinybird Workspace](https://www.tinybird.co/). ## Create your first trigger task¶ The [tinybird-trigger-tasks package](https://www.npmjs.com/package/@sdairs/tinybird-trigger-tasks) implements tasks for Tinybird Copy Pipes and the Query API. You can [find the source code in the @sdairs/tinybird-trigger repo](https://github.com/sdairs/tinybird-trigger). 1. Create a working directory, and run `npx trigger.dev@latest init` to connect the project to Trigger.dev. 2. Inside the `trigger` directory, install the npm package with `npm install @sdairs/tinybird-trigger-tasks`. 3. Create a new file called `myTask.ts` and add the following code: import { task } from "@trigger.dev/sdk/v3"; import { tinybirdCopyTask } from "@sdairs/tinybird-trigger-tasks"; export const exampleExecutor = task({ id: "example-executor", run: async (payload, { ctx }) => { console.log("Example executor task is running"); // Run a copy job const copyResult = await tinybirdCopyTask.triggerAndWait({ pipeId: "<your_pipe_name>" }); console.log(copyResult); }, }); 4. Go to your Tinybird Workspace, and create a new Pipe. Use the following SQL: SELECT number + 1 AS value FROM numbers(100) 5. Name the Pipe `my_copy`, then select `Create Copy` from the actions menu. Follow the prompts to create the Copy Pipe. 6. Update `myTask.ts`, replacing `"<your_pipe_name>"` with the name of your Pipe, `my_copy` in this case. 7. Create a `.env` file in your directory root. 8. Go to your Tinybird Workspace and copy the Admin Token, then add it to the `.env` file as follows: TINYBIRD_TOKEN=p.eyJ... 9. Run `npx trigger.dev@latest dev` to push the task to Trigger.dev. 10. Go to your Trigger.dev dashboard, and perform a test run to trigger the task and the Copy Pipe. 11. Go to your Tinybird Workspace and check the Copy Pipe results.
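The test run above triggers the task manually. To actually put it on a schedule, you can wrap the same Copy task in a scheduled task. The following is a minimal sketch assuming the declarative `schedules.task` API of the v3 SDK and the same `@sdairs/tinybird-trigger-tasks` setup as in `myTask.ts`; the task id and cron expression are examples, adjust them to your needs.

##### Scheduled Copy task (sketch)

```typescript
// Sketch: run the Copy Pipe every hour instead of triggering it by hand.
import { schedules } from "@trigger.dev/sdk/v3";
import { tinybirdCopyTask } from "@sdairs/tinybird-trigger-tasks";

export const scheduledCopy = schedules.task({
  id: "scheduled-copy",
  cron: "0 * * * *", // every hour, on the hour
  run: async (payload) => {
    // Trigger the Copy Pipe and wait for the copy job to finish.
    const copyResult = await tinybirdCopyTask.triggerAndWait({ pipeId: "my_copy" });
    console.log(copyResult);
  },
});
```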
## See also¶ - [ Trigger.dev quick start](https://trigger.dev/docs/quick-start) - [ tinybird-trigger repo](https://github.com/sdairs/tinybird-trigger) - [ YouTube: Using Trigger.dev with Tinybird for code-first background job execution](https://www.youtube.com/watch?v=0TcQfcMrGNw) --- URL: https://www.tinybird.co/docs/publish/api-endpoints/guides/serverless-analytics-api Last update: 2025-01-17T07:58:01.000Z Content: --- title: "Handling data privacy in serverless analytics APIs · Tinybird Docs" theme-color: "#171612" description: "Creating an Analytics Dashboard where each user is able to access only certain parts of the data is really easy with Tinybird. You don't need to build anything specific from scratch. Tinybird is able to provide dynamic API Endpoints, including specific security requirements per-user." --- # Handling data privacy in serverless analytics APIs¶ Creating an analytics dashboard where each user is able to access only certain parts of the data is really easy with Tinybird. You don't need to build anything specific from scratch. Tinybird provides dynamic API Endpoints, including specific security requirements per-user. ## The serverless approach to real-time analytics¶ Let's assume you have just two components - the simplest possible stack: - ** A frontend application:** Code that runs in the browser. - ** A backend application:** Code that runs in the server and manages both the user authentication and the authorization. Very probably, the backend will also expose an API from where the frontend fetches the information needed. This guide covers the different workflows that will handle each user operation with the right permissions, by integrating your backend with [Static Tokens](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens#what-should-i-use-tokens-for) in a very simple way. ## Create Tokens on user sign-up¶ The only thing you need (to ensure that your users have the right permissions on your data) is a created Tinybird [Static Token](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens#what-should-i-use-tokens-for) every time you create a new user in your backend. ##### Creating a Token with filter scope TOKEN= curl -H "Authorization: Bearer $TOKEN" \ -d "name=user_692851_token" \ -d "scope=PIPES:READ:ecommerce_example" \ -d "scope=DATASOURCES:READ:events:user_id=692851" \ https://api.tinybird.co/v0/tokens/ Use a Token with the right scope. Replace `` with a Token whose [scope](https://www.tinybird.co/docs/docs/api-reference/token-api) is `TOKENS` or `ADMIN`. This Token will let a given user query their own transactions stored in an `events` Data Source and exposed in an `ecommerce_example` API Endpoint. Some other noteworthy things you can see here: - You can give a `name` to every Token you create. In this case, the name contains the `user_id` , so that it's easier to see what Token is assigned to each user. - You can assign as many scopes to each Token as you want, and `DATASOURCES:READ:datasource_name` and `PIPES:READ:pipe_name` can take an optional SQL filter (like this example does) to restrict the rows that queries authenticated with the Token will have access to. 
If everything runs successfully, your call will return JSON containing a Token with the specified scopes: ##### Creating a Token with filter scope: Response { "token": "p.eyJ1IjogImI2Yjc1MDExLWNkNGYtNGM5Ny1hMzQxLThhNDY0ZDUxMWYzNSIsICJpZCI6ICI0YTYzZDExZC0zNjg2LTQwN2EtOWY2My0wMzU2ZGE2NmU5YzQifQ.2QP1BRN6fNfgS8EMxqkbfKasDUD1tqzQoJXBafa5dWs", "scopes": [ { "type": "PIPES:READ", "resource": "ecommerce_example", "filter": "" }, { "type": "DATASOURCES:READ", "resource": "events", "filter": "user_id=692851" } ], "name": "user_692851_token" } All the Tokens you create are also visible in your [Workspace > Tokens page](https://app.tinybird.co/tokens) in the UI, where you can create, update and delete them. ## Modify Tokens when user permissions are changed¶ Imagine one of your users is removed from a group, which makes them lose some permissions on the data they can consume. Once that is reflected in your backend, you can [update the user admin Token](https://www.tinybird.co/docs/docs/api-reference/token-api#put--v0-tokens-(.+)) accordingly as follows: ##### Modify an existing Token TOKEN= USER_TOKEN= curl -X PUT \ -H "Authorization: Bearer $TOKEN" \ -d "name=user_692851_token" \ -d "scope=PIPES:READ:ecommerce_example" \ -d "scope=DATASOURCES:READ:events:user_id=692851 and event in ('buy', 'add_item_to_cart')" \ https://api.tinybird.co/v0/tokens/$USER_TOKEN Pass the Token you previously created as a path parameter. Replace `` by the value of `token` from the previous response, or [copy it from the UI](https://app.tinybird.co/tokens). In this example you'd be restricting the SQL filter of the `DATASOURCES:READ:events` scope to restrict the type of events the user will be able to read from the `events` Data Source. This is the response you'd see from the API: ##### Modify an existing Token: Response { "token": "p.eyJ1IjogImI2Yjc1MDExLWNkNGYtNGM5Ny1hMzQxLThhNDY0ZDUxMWYzNSIsICJpZCI6ICI0YTYzZDExZC0zNjg2LTQwN2EtOWY2My0wMzU2ZGE2NmU5YzQifQ.2QP1BRN6fNfgS8EMxqkbfKasDUD1tqzQoJXBafa5dWs", "scopes": [ { "type": "PIPES:READ", "resource": "ecommerce_example", "filter": "" }, { "type": "DATASOURCES:READ", "resource": "events", "filter": "user_id=692851 and event in ('buy', 'add_item_to_cart')" } ], "name": "user_692851_token" } ## Delete Tokens after user deletion¶ Whenever a user is removed from your system, you should also [remove the Token from Tinybird](https://www.tinybird.co/docs/docs/api-reference/token-api#delete--v0-tokens-(.+)) . That will make things easier for you in the future. ##### Remove a token TOKEN= USER_TOKEN= curl -X DELETE \ -H "Authorization: Bearer $TOKEN" \ https://api.tinybird.co/v0/tokens/$USER_TOKEN If the Token is successfully deleted, this request will respond with no content and a 204 status code. ## Refresh Tokens¶ It's a good practice to change Tokens from time to time, so you can automate this in your backend as well. Refreshing a Token requires executing this request for every one of your users: ##### Refresh a token TOKEN= USER_TOKEN= curl -X POST \ -H "Authorization: Bearer $TOKEN" \ https://api.tinybird.co/v0/tokens/$USER_TOKEN/refresh ## Next steps¶ - Learn more about the[ Tokens API](https://www.tinybird.co/docs/docs/api-reference/token-api) . - Understand the concept of[ Static Tokens](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens#what-should-i-use-tokens-for) . 
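As a recap of the workflow in this guide, here's a minimal sketch of the Token lifecycle calls from a TypeScript backend. It assumes Node 18+ (global `fetch`) and an admin Token with the `TOKENS` or `ADMIN` scope stored in a `TINYBIRD_ADMIN_TOKEN` environment variable; the update (`PUT`) call shown earlier follows the same pattern as creation.

##### Token lifecycle from a backend (sketch)

```typescript
// Minimal sketch of the Token lifecycle using the Tokens API, mirroring the
// curl examples in this guide. TINYBIRD_ADMIN_TOKEN must have the TOKENS or
// ADMIN scope.
const HOST = "https://api.tinybird.co";
const ADMIN_TOKEN = process.env.TINYBIRD_ADMIN_TOKEN ?? "";

async function tokensApi(path: string, method: string, body?: URLSearchParams) {
  const res = await fetch(`${HOST}/v0/tokens/${path}`, {
    method,
    headers: { Authorization: `Bearer ${ADMIN_TOKEN}` },
    body, // URLSearchParams is sent as application/x-www-form-urlencoded
  });
  return res.status === 204 ? null : res.json();
}

// On user sign-up: create a Token filtered to the user's own rows.
export function createUserToken(userId: number) {
  const params = new URLSearchParams();
  params.append("name", `user_${userId}_token`);
  params.append("scope", "PIPES:READ:ecommerce_example");
  params.append("scope", `DATASOURCES:READ:events:user_id=${userId}`);
  return tokensApi("", "POST", params);
}

// On user deletion: remove the Token (expects a 204 with no content).
export function deleteUserToken(userToken: string) {
  return tokensApi(userToken, "DELETE");
}

// Periodically, or on demand: rotate the Token.
export function refreshUserToken(userToken: string) {
  return tokensApi(`${userToken}/refresh`, "POST");
}
```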
--- URL: https://www.tinybird.co/docs/publish/api-endpoints/guides/share-endpoint-documentation Last update: 2025-01-17T07:58:01.000Z Content: --- title: "Share API Endpoints documentation · Tinybird Docs" theme-color: "#171612" description: "In this guide you'll learn how to share your Tinybird API Endpoint documentation with development teams." --- # Share Tinybird API Endpoint documentation¶ In this guide, you'll learn how to share your Tinybird API Endpoint documentation with development teams. ## The Tinybird API Endpoint page¶ When you publish an API Endpoint, Tinybird generates a documentation page for you that is ready to share and OpenAPI-compatible (v3.0). It contains your API Endpoint description, information about the dynamic parameters you can use when querying this Endpoint, and code snippets for quickly integrating your API in 3rd party applications. To share your published API Endpoint, navigate to the "Create Chart" button (top right of the UI) > "Share this API Endpoint" modal: ## Use Static Tokens to define API Endpoint subsets¶ Tinybird authentication is based on [Tokens](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens) which contain different scopes for specific resources. For example, a Token lets you read from one or many API Endpoints, or get write permissions for a particular Data Source. If you take a closer look at the URLs generated for sharing a public API Endpoint page, you'll see that after the Endpoint ID, it includes a Token parameter. This means that this page is only accessible if the Token provided in the URL has read permissions for it: https://api.tinybird.co/endpoint/t_bdcad2252e794c6573e21e7e?token= For security, Tinybird automatically generates a read-only Token when sharing a public API Endpoint page for the first time. If you don't explicitly use it, your Admin Token won't ever get exposed. ### The API Endpoints list page¶ Tinybird also allows you to render the API Endpoints information for a given Token. https://app.tinybird.co///endpoints?token= Enter the URL above (with your Token and the provider and region where the API Endpoint is published) into the browser, and it'll return a list that shows all API Endpoints that this Token can read from. <-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fsharing-endpoints-documentation-with-development-teams-2.png&w=3840&q=75) <-figcaption-> The API Endpoints list page is extremely useful for sharing your API Endpoint documentation with development teams When integrating your API Endpoint in your applications it's highly recommend that you manage dedicated Tokens. The easiest way is creating a Token for every application environment, so that you can also track the different requests to your API Endpoints by application, and choose which API Endpoints are accessible for them. Once you do that, you can share auto-generated documentation with ease, without compromising your data privacy and security. API Endpoint docs pages include a read Token by default. In the "Share this API Endpoint" modal, you can also see public URLs for every Token with read permissions for your Pipe. ## Browse your docs in Swagger¶ As mentioned above, all Tinybird's documentation is compatible with OpenAPI 3.0 and accessible via API. A quick way of generating documentation in Swagger is navigating to the "Create Chart" button > "Share this API Endpoint" modal > "OpenAPI 3.0" tab, copying the "Shareable link" URL, and using it in your preferred Swagger installation. 
<-figure-> ![](/docs/_next/image?url=%2Fdocs%2Fimg%2Fsharing-endpoints-documentation-with-development-teams-3.png&w=3840&q=75) <-figcaption-> You can generate as many URLs as you need by using different Tokens If you use a Token with permissions for more than one API Endpoint, the Swagger documentation will contain information about all the API Endpoints at once. ## Next steps¶ - You've got Endpoints, now make them pretty: Use[ Tinybird Charts](https://www.tinybird.co/docs/docs/publish/charts) . - Learn how to[ monitor and analyze your API performance](https://www.tinybird.co/docs/docs/monitoring/analyze-endpoints-performance) . --- URL: https://www.tinybird.co/docs/publish/api-endpoints/list-of-errors Last update: 2024-11-05T10:29:52.000Z Content: --- title: "List of API Endpoint database errors · Tinybird Docs" theme-color: "#171612" description: "The following list contains all internal database errors that an API Endpoint might return, and their numbers." --- # List of internal database errors¶ API Endpoint responses have an additional HTTP header, `X-DB-Exception-Code` , where you can check the internal database error, reported as a stringified number. The following list contains all internal database errors and their numbers: - `UNSUPPORTED_METHOD = "1"` - `UNSUPPORTED_PARAMETER = "2"` - `UNEXPECTED_END_OF_FILE = "3"` - `EXPECTED_END_OF_FILE = "4"` - `CANNOT_PARSE_TEXT = "6"` - `INCORRECT_NUMBER_OF_COLUMNS = "7"` - `THERE_IS_NO_COLUMN = "8"` - `SIZES_OF_COLUMNS_DOESNT_MATCH = "9"` - `NOT_FOUND_COLUMN_IN_BLOCK = "10"` - `POSITION_OUT_OF_BOUND = "11"` - `PARAMETER_OUT_OF_BOUND = "12"` - `SIZES_OF_COLUMNS_IN_TUPLE_DOESNT_MATCH = "13"` - `DUPLICATE_COLUMN = "15"` - `NO_SUCH_COLUMN_IN_TABLE = "16"` - `DELIMITER_IN_STRING_LITERAL_DOESNT_MATCH = "17"` - `CANNOT_INSERT_ELEMENT_INTO_CONSTANT_COLUMN = "18"` - `SIZE_OF_FIXED_STRING_DOESNT_MATCH = "19"` - `NUMBER_OF_COLUMNS_DOESNT_MATCH = "20"` - `CANNOT_READ_ALL_DATA_FROM_TAB_SEPARATED_INPUT = "21"` - `CANNOT_PARSE_ALL_VALUE_FROM_TAB_SEPARATED_INPUT = "22"` - `CANNOT_READ_FROM_ISTREAM = "23"` - `CANNOT_WRITE_TO_OSTREAM = "24"` - `CANNOT_PARSE_ESCAPE_SEQUENCE = "25"` - `CANNOT_PARSE_QUOTED_STRING = "26"` - `CANNOT_PARSE_INPUT_ASSERTION_FAILED = "27"` - `CANNOT_PRINT_FLOAT_OR_DOUBLE_NUMBER = "28"` - `CANNOT_PRINT_INTEGER = "29"` - `CANNOT_READ_SIZE_OF_COMPRESSED_CHUNK = "30"` - `CANNOT_READ_COMPRESSED_CHUNK = "31"` - `ATTEMPT_TO_READ_AFTER_EOF = "32"` - `CANNOT_READ_ALL_DATA = "33"` - `TOO_MANY_ARGUMENTS_FOR_FUNCTION = "34"` - `TOO_FEW_ARGUMENTS_FOR_FUNCTION = "35"` - `BAD_ARGUMENTS = "36"` - `UNKNOWN_ELEMENT_IN_AST = "37"` - `CANNOT_PARSE_DATE = "38"` - `TOO_LARGE_SIZE_COMPRESSED = "39"` - `CHECKSUM_DOESNT_MATCH = "40"` - `CANNOT_PARSE_DATETIME = "41"` - `NUMBER_OF_ARGUMENTS_DOESNT_MATCH = "42"` - `ILLEGAL_TYPE_OF_ARGUMENT = "43"` - `ILLEGAL_COLUMN = "44"` - `ILLEGAL_NUMBER_OF_RESULT_COLUMNS = "45"` - `UNKNOWN_FUNCTION = "46"` - `UNKNOWN_IDENTIFIER = "47"` - `NOT_IMPLEMENTED = "48"` - `LOGICAL_ERROR = "49"` - `UNKNOWN_TYPE = "50"` - `EMPTY_LIST_OF_COLUMNS_QUERIED = "51"` - `COLUMN_QUERIED_MORE_THAN_ONCE = "52"` - `TYPE_MISMATCH = "53"` - `STORAGE_DOESNT_ALLOW_PARAMETERS = "54"` - `STORAGE_REQUIRES_PARAMETER = "55"` - `UNKNOWN_STORAGE = "56"` - `TABLE_ALREADY_EXISTS = "57"` - `TABLE_METADATA_ALREADY_EXISTS = "58"` - `ILLEGAL_TYPE_OF_COLUMN_FOR_FILTER = "59"` - `UNKNOWN_TABLE = "60"` - `ONLY_FILTER_COLUMN_IN_BLOCK = "61"` - `SYNTAX_ERROR = "62"` - `UNKNOWN_AGGREGATE_FUNCTION = "63"` - `CANNOT_READ_AGGREGATE_FUNCTION_FROM_TEXT = "64"` - 
`CANNOT_WRITE_AGGREGATE_FUNCTION_AS_TEXT = "65"` - `NOT_A_COLUMN = "66"` - `ILLEGAL_KEY_OF_AGGREGATION = "67"` - `CANNOT_GET_SIZE_OF_FIELD = "68"` - `ARGUMENT_OUT_OF_BOUND = "69"` - `CANNOT_CONVERT_TYPE = "70"` - `CANNOT_WRITE_AFTER_END_OF_BUFFER = "71"` - `CANNOT_PARSE_NUMBER = "72"` - `UNKNOWN_FORMAT = "73"` - `CANNOT_READ_FROM_FILE_DESCRIPTOR = "74"` - `CANNOT_WRITE_TO_FILE_DESCRIPTOR = "75"` - `CANNOT_OPEN_FILE = "76"` - `CANNOT_CLOSE_FILE = "77"` - `UNKNOWN_TYPE_OF_QUERY = "78"` - `INCORRECT_FILE_NAME = "79"` - `INCORRECT_QUERY = "80"` - `UNKNOWN_DATABASE = "81"` - `DATABASE_ALREADY_EXISTS = "82"` - `DIRECTORY_DOESNT_EXIST = "83"` - `DIRECTORY_ALREADY_EXISTS = "84"` - `FORMAT_IS_NOT_SUITABLE_FOR_INPUT = "85"` - `RECEIVED_ERROR_FROM_REMOTE_IO_SERVER = "86"` - `CANNOT_SEEK_THROUGH_FILE = "87"` - `CANNOT_TRUNCATE_FILE = "88"` - `UNKNOWN_COMPRESSION_METHOD = "89"` - `EMPTY_LIST_OF_COLUMNS_PASSED = "90"` - `SIZES_OF_MARKS_FILES_ARE_INCONSISTENT = "91"` - `EMPTY_DATA_PASSED = "92"` - `UNKNOWN_AGGREGATED_DATA_VARIANT = "93"` - `CANNOT_MERGE_DIFFERENT_AGGREGATED_DATA_VARIANTS = "94"` - `CANNOT_READ_FROM_SOCKET = "95"` - `CANNOT_WRITE_TO_SOCKET = "96"` - `CANNOT_READ_ALL_DATA_FROM_CHUNKED_INPUT = "97"` - `CANNOT_WRITE_TO_EMPTY_BLOCK_OUTPUT_STREAM = "98"` - `UNKNOWN_PACKET_FROM_CLIENT = "99"` - `UNKNOWN_PACKET_FROM_SERVER = "100"` - `UNEXPECTED_PACKET_FROM_CLIENT = "101"` - `UNEXPECTED_PACKET_FROM_SERVER = "102"` - `RECEIVED_DATA_FOR_WRONG_QUERY_ID = "103"` - `TOO_SMALL_BUFFER_SIZE = "104"` - `CANNOT_READ_HISTORY = "105"` - `CANNOT_APPEND_HISTORY = "106"` - `FILE_DOESNT_EXIST = "107"` - `NO_DATA_TO_INSERT = "108"` - `CANNOT_BLOCK_SIGNAL = "109"` - `CANNOT_UNBLOCK_SIGNAL = "110"` - `CANNOT_MANIPULATE_SIGSET = "111"` - `CANNOT_WAIT_FOR_SIGNAL = "112"` - `THERE_IS_NO_SESSION = "113"` - `CANNOT_CLOCK_GETTIME = "114"` - `UNKNOWN_SETTING = "115"` - `THERE_IS_NO_DEFAULT_VALUE = "116"` - `INCORRECT_DATA = "117"` - `ENGINE_REQUIRED = "119"` - `CANNOT_INSERT_VALUE_OF_DIFFERENT_SIZE_INTO_TUPLE = "120"` - `UNSUPPORTED_JOIN_KEYS = "121"` - `INCOMPATIBLE_COLUMNS = "122"` - `UNKNOWN_TYPE_OF_AST_NODE = "123"` - `INCORRECT_ELEMENT_OF_SET = "124"` - `INCORRECT_RESULT_OF_SCALAR_SUBQUERY = "125"` - `CANNOT_GET_RETURN_TYPE = "126"` - `ILLEGAL_INDEX = "127"` - `TOO_LARGE_ARRAY_SIZE = "128"` - `FUNCTION_IS_SPECIAL = "129"` - `CANNOT_READ_ARRAY_FROM_TEXT = "130"` - `TOO_LARGE_STRING_SIZE = "131"` - `AGGREGATE_FUNCTION_DOESNT_ALLOW_PARAMETERS = "133"` - `PARAMETERS_TO_AGGREGATE_FUNCTIONS_MUST_BE_LITERALS = "134"` - `ZERO_ARRAY_OR_TUPLE_INDEX = "135"` - `UNKNOWN_ELEMENT_IN_CONFIG = "137"` - `EXCESSIVE_ELEMENT_IN_CONFIG = "138"` - `NO_ELEMENTS_IN_CONFIG = "139"` - `ALL_REQUESTED_COLUMNS_ARE_MISSING = "140"` - `SAMPLING_NOT_SUPPORTED = "141"` - `NOT_FOUND_NODE = "142"` - `FOUND_MORE_THAN_ONE_NODE = "143"` - `FIRST_DATE_IS_BIGGER_THAN_LAST_DATE = "144"` - `UNKNOWN_OVERFLOW_MODE = "145"` - `QUERY_SECTION_DOESNT_MAKE_SENSE = "146"` - `NOT_FOUND_FUNCTION_ELEMENT_FOR_AGGREGATE = "147"` - `NOT_FOUND_RELATION_ELEMENT_FOR_CONDITION = "148"` - `NOT_FOUND_RHS_ELEMENT_FOR_CONDITION = "149"` - `EMPTY_LIST_OF_ATTRIBUTES_PASSED = "150"` - `INDEX_OF_COLUMN_IN_SORT_CLAUSE_IS_OUT_OF_RANGE = "151"` - `UNKNOWN_DIRECTION_OF_SORTING = "152"` - `ILLEGAL_DIVISION = "153"` - `AGGREGATE_FUNCTION_NOT_APPLICABLE = "154"` - `UNKNOWN_RELATION = "155"` - `DICTIONARIES_WAS_NOT_LOADED = "156"` - `ILLEGAL_OVERFLOW_MODE = "157"` - `TOO_MANY_ROWS = "158"` - `TIMEOUT_EXCEEDED = "159"` - `TOO_SLOW = "160"` - `TOO_MANY_COLUMNS = "161"` - `TOO_DEEP_SUBQUERIES 
= "162"` - `TOO_DEEP_PIPELINE = "163"` - `READONLY = "164"` - `TOO_MANY_TEMPORARY_COLUMNS = "165"` - `TOO_MANY_TEMPORARY_NON_CONST_COLUMNS = "166"` - `TOO_DEEP_AST = "167"` - `TOO_BIG_AST = "168"` - `BAD_TYPE_OF_FIELD = "169"` - `BAD_GET = "170"` - `CANNOT_CREATE_DIRECTORY = "172"` - `CANNOT_ALLOCATE_MEMORY = "173"` - `CYCLIC_ALIASES = "174"` - `CHUNK_NOT_FOUND = "176"` - `DUPLICATE_CHUNK_NAME = "177"` - `MULTIPLE_ALIASES_FOR_EXPRESSION = "178"` - `MULTIPLE_EXPRESSIONS_FOR_ALIAS = "179"` - `THERE_IS_NO_PROFILE = "180"` - `ILLEGAL_FINAL = "181"` - `ILLEGAL_PREWHERE = "182"` - `UNEXPECTED_EXPRESSION = "183"` - `ILLEGAL_AGGREGATION = "184"` - `UNSUPPORTED_MYISAM_BLOCK_TYPE = "185"` - `UNSUPPORTED_COLLATION_LOCALE = "186"` - `COLLATION_COMPARISON_FAILED = "187"` - `UNKNOWN_ACTION = "188"` - `TABLE_MUST_NOT_BE_CREATED_MANUALLY = "189"` - `SIZES_OF_ARRAYS_DOESNT_MATCH = "190"` - `SET_SIZE_LIMIT_EXCEEDED = "191"` - `UNKNOWN_USER = "192"` - `WRONG_PASSWORD = "193"` - `REQUIRED_PASSWORD = "194"` - `IP_ADDRESS_NOT_ALLOWED = "195"` - `UNKNOWN_ADDRESS_PATTERN_TYPE = "196"` - `SERVER_REVISION_IS_TOO_OLD = "197"` - `DNS_ERROR = "198"` - `UNKNOWN_QUOTA = "199"` - `QUOTA_DOESNT_ALLOW_KEYS = "200"` - `QUOTA_EXCEEDED = "201"` - `TOO_MANY_SIMULTANEOUS_QUERIES = "202"` - `NO_FREE_CONNECTION = "203"` - `CANNOT_FSYNC = "204"` - `NESTED_TYPE_TOO_DEEP = "205"` - `ALIAS_REQUIRED = "206"` - `AMBIGUOUS_IDENTIFIER = "207"` - `EMPTY_NESTED_TABLE = "208"` - `SOCKET_TIMEOUT = "209"` - `NETWORK_ERROR = "210"` - `EMPTY_QUERY = "211"` - `UNKNOWN_LOAD_BALANCING = "212"` - `UNKNOWN_TOTALS_MODE = "213"` - `CANNOT_STATVFS = "214"` - `NOT_AN_AGGREGATE = "215"` - `QUERY_WITH_SAME_ID_IS_ALREADY_RUNNING = "216"` - `CLIENT_HAS_CONNECTED_TO_WRONG_PORT = "217"` - `TABLE_IS_DROPPED = "218"` - `DATABASE_NOT_EMPTY = "219"` - `DUPLICATE_INTERSERVER_IO_ENDPOINT = "220"` - `NO_SUCH_INTERSERVER_IO_ENDPOINT = "221"` - `ADDING_REPLICA_TO_NON_EMPTY_TABLE = "222"` - `UNEXPECTED_AST_STRUCTURE = "223"` - `REPLICA_IS_ALREADY_ACTIVE = "224"` - `NO_ZOOKEEPER = "225"` - `NO_FILE_IN_DATA_PART = "226"` - `UNEXPECTED_FILE_IN_DATA_PART = "227"` - `BAD_SIZE_OF_FILE_IN_DATA_PART = "228"` - `QUERY_IS_TOO_LARGE = "229"` - `NOT_FOUND_EXPECTED_DATA_PART = "230"` - `TOO_MANY_UNEXPECTED_DATA_PARTS = "231"` - `NO_SUCH_DATA_PART = "232"` - `BAD_DATA_PART_NAME = "233"` - `NO_REPLICA_HAS_PART = "234"` - `DUPLICATE_DATA_PART = "235"` - `ABORTED = "236"` - `NO_REPLICA_NAME_GIVEN = "237"` - `FORMAT_VERSION_TOO_OLD = "238"` - `CANNOT_MUNMAP = "239"` - `CANNOT_MREMAP = "240"` - `MEMORY_LIMIT_EXCEEDED = "241"` - `TABLE_IS_READ_ONLY = "242"` - `NOT_ENOUGH_SPACE = "243"` - `UNEXPECTED_ZOOKEEPER_ERROR = "244"` - `CORRUPTED_DATA = "246"` - `INCORRECT_MARK = "247"` - `INVALID_PARTITION_VALUE = "248"` - `NOT_ENOUGH_BLOCK_NUMBERS = "250"` - `NO_SUCH_REPLICA = "251"` - `TOO_MANY_PARTS = "252"` - `REPLICA_IS_ALREADY_EXIST = "253"` - `NO_ACTIVE_REPLICAS = "254"` - `TOO_MANY_RETRIES_TO_FETCH_PARTS = "255"` - `PARTITION_ALREADY_EXISTS = "256"` - `PARTITION_DOESNT_EXIST = "257"` - `UNION_ALL_RESULT_STRUCTURES_MISMATCH = "258"` - `CLIENT_OUTPUT_FORMAT_SPECIFIED = "260"` - `UNKNOWN_BLOCK_INFO_FIELD = "261"` - `BAD_COLLATION = "262"` - `CANNOT_COMPILE_CODE = "263"` - `INCOMPATIBLE_TYPE_OF_JOIN = "264"` - `NO_AVAILABLE_REPLICA = "265"` - `MISMATCH_REPLICAS_DATA_SOURCES = "266"` - `STORAGE_DOESNT_SUPPORT_PARALLEL_REPLICAS = "267"` - `CPUID_ERROR = "268"` - `INFINITE_LOOP = "269"` - `CANNOT_COMPRESS = "270"` - `CANNOT_DECOMPRESS = "271"` - `CANNOT_IO_SUBMIT = "272"` - `CANNOT_IO_GETEVENTS = 
"273"` - `AIO_READ_ERROR = "274"` - `AIO_WRITE_ERROR = "275"` - `INDEX_NOT_USED = "277"` - `ALL_CONNECTION_TRIES_FAILED = "279"` - `NO_AVAILABLE_DATA = "280"` - `DICTIONARY_IS_EMPTY = "281"` - `INCORRECT_INDEX = "282"` - `UNKNOWN_DISTRIBUTED_PRODUCT_MODE = "283"` - `WRONG_GLOBAL_SUBQUERY = "284"` - `TOO_FEW_LIVE_REPLICAS = "285"` - `UNSATISFIED_QUORUM_FOR_PREVIOUS_WRITE = "286"` - `UNKNOWN_FORMAT_VERSION = "287"` - `DISTRIBUTED_IN_JOIN_SUBQUERY_DENIED = "288"` - `REPLICA_IS_NOT_IN_QUORUM = "289"` - `LIMIT_EXCEEDED = "290"` - `DATABASE_ACCESS_DENIED = "291"` - `MONGODB_CANNOT_AUTHENTICATE = "293"` - `INVALID_BLOCK_EXTRA_INFO = "294"` - `RECEIVED_EMPTY_DATA = "295"` - `NO_REMOTE_SHARD_FOUND = "296"` - `SHARD_HAS_NO_CONNECTIONS = "297"` - `CANNOT_PIPE = "298"` - `CANNOT_FORK = "299"` - `CANNOT_DLSYM = "300"` - `CANNOT_CREATE_CHILD_PROCESS = "301"` - `CHILD_WAS_NOT_EXITED_NORMALLY = "302"` - `CANNOT_SELECT = "303"` - `CANNOT_WAITPID = "304"` - `TABLE_WAS_NOT_DROPPED = "305"` - `TOO_DEEP_RECURSION = "306"` - `TOO_MANY_BYTES = "307"` - `UNEXPECTED_NODE_IN_ZOOKEEPER = "308"` - `FUNCTION_CANNOT_HAVE_PARAMETERS = "309"` - `INVALID_SHARD_WEIGHT = "317"` - `INVALID_CONFIG_PARAMETER = "318"` - `UNKNOWN_STATUS_OF_INSERT = "319"` - `VALUE_IS_OUT_OF_RANGE_OF_DATA_TYPE = "321"` - `BARRIER_TIMEOUT = "335"` - `UNKNOWN_DATABASE_ENGINE = "336"` - `DDL_GUARD_IS_ACTIVE = "337"` - `UNFINISHED = "341"` - `METADATA_MISMATCH = "342"` - `SUPPORT_IS_DISABLED = "344"` - `TABLE_DIFFERS_TOO_MUCH = "345"` - `CANNOT_CONVERT_CHARSET = "346"` - `CANNOT_LOAD_CONFIG = "347"` - `CANNOT_INSERT_NULL_IN_ORDINARY_COLUMN = "349"` - `INCOMPATIBLE_SOURCE_TABLES = "350"` - `AMBIGUOUS_TABLE_NAME = "351"` - `AMBIGUOUS_COLUMN_NAME = "352"` - `INDEX_OF_POSITIONAL_ARGUMENT_IS_OUT_OF_RANGE = "353"` - `ZLIB_INFLATE_FAILED = "354"` - `ZLIB_DEFLATE_FAILED = "355"` - `BAD_LAMBDA = "356"` - `RESERVED_IDENTIFIER_NAME = "357"` - `INTO_OUTFILE_NOT_ALLOWED = "358"` - `TABLE_SIZE_EXCEEDS_MAX_DROP_SIZE_LIMIT = "359"` - `CANNOT_CREATE_CHARSET_CONVERTER = "360"` - `SEEK_POSITION_OUT_OF_BOUND = "361"` - `CURRENT_WRITE_BUFFER_IS_EXHAUSTED = "362"` - `CANNOT_CREATE_IO_BUFFER = "363"` - `RECEIVED_ERROR_TOO_MANY_REQUESTS = "364"` - `SIZES_OF_NESTED_COLUMNS_ARE_INCONSISTENT = "366"` - `TOO_MANY_FETCHES = "367"` - `ALL_REPLICAS_ARE_STALE = "369"` - `DATA_TYPE_CANNOT_BE_USED_IN_TABLES = "370"` - `INCONSISTENT_CLUSTER_DEFINITION = "371"` - `SESSION_NOT_FOUND = "372"` - `SESSION_IS_LOCKED = "373"` - `INVALID_SESSION_TIMEOUT = "374"` - `CANNOT_DLOPEN = "375"` - `CANNOT_PARSE_UUID = "376"` - `ILLEGAL_SYNTAX_FOR_DATA_TYPE = "377"` - `DATA_TYPE_CANNOT_HAVE_ARGUMENTS = "378"` - `UNKNOWN_STATUS_OF_DISTRIBUTED_DDL_TASK = "379"` - `CANNOT_KILL = "380"` - `HTTP_LENGTH_REQUIRED = "381"` - `CANNOT_LOAD_CATBOOST_MODEL = "382"` - `CANNOT_APPLY_CATBOOST_MODEL = "383"` - `PART_IS_TEMPORARILY_LOCKED = "384"` - `MULTIPLE_STREAMS_REQUIRED = "385"` - `NO_COMMON_TYPE = "386"` - `DICTIONARY_ALREADY_EXISTS = "387"` - `CANNOT_ASSIGN_OPTIMIZE = "388"` - `INSERT_WAS_DEDUPLICATED = "389"` - `CANNOT_GET_CREATE_TABLE_QUERY = "390"` - `EXTERNAL_LIBRARY_ERROR = "391"` - `QUERY_IS_PROHIBITED = "392"` - `THERE_IS_NO_QUERY = "393"` - `QUERY_WAS_CANCELLED = "394"` - `FUNCTION_THROW_IF_VALUE_IS_NON_ZERO = "395"` - `TOO_MANY_ROWS_OR_BYTES = "396"` - `QUERY_IS_NOT_SUPPORTED_IN_MATERIALIZED_VIEW = "397"` - `UNKNOWN_MUTATION_COMMAND = "398"` - `FORMAT_IS_NOT_SUITABLE_FOR_OUTPUT = "399"` - `CANNOT_STAT = "400"` - `FEATURE_IS_NOT_ENABLED_AT_BUILD_TIME = "401"` - `CANNOT_IOSETUP = "402"` - 
`INVALID_JOIN_ON_EXPRESSION = "403"` - `BAD_ODBC_CONNECTION_STRING = "404"` - `PARTITION_SIZE_EXCEEDS_MAX_DROP_SIZE_LIMIT = "405"` - `TOP_AND_LIMIT_TOGETHER = "406"` - `DECIMAL_OVERFLOW = "407"` - `BAD_REQUEST_PARAMETER = "408"` - `EXTERNAL_EXECUTABLE_NOT_FOUND = "409"` - `EXTERNAL_SERVER_IS_NOT_RESPONDING = "410"` - `PTHREAD_ERROR = "411"` - `NETLINK_ERROR = "412"` - `CANNOT_SET_SIGNAL_HANDLER = "413"` - `ALL_REPLICAS_LOST = "415"` - `REPLICA_STATUS_CHANGED = "416"` - `EXPECTED_ALL_OR_ANY = "417"` - `UNKNOWN_JOIN = "418"` - `MULTIPLE_ASSIGNMENTS_TO_COLUMN = "419"` - `CANNOT_UPDATE_COLUMN = "420"` - `CANNOT_ADD_DIFFERENT_AGGREGATE_STATES = "421"` - `UNSUPPORTED_URI_SCHEME = "422"` - `CANNOT_GETTIMEOFDAY = "423"` - `CANNOT_LINK = "424"` - `SYSTEM_ERROR = "425"` - `CANNOT_COMPILE_REGEXP = "427"` - `UNKNOWN_LOG_LEVEL = "428"` - `FAILED_TO_GETPWUID = "429"` - `MISMATCHING_USERS_FOR_PROCESS_AND_DATA = "430"` - `ILLEGAL_SYNTAX_FOR_CODEC_TYPE = "431"` - `UNKNOWN_CODEC = "432"` - `ILLEGAL_CODEC_PARAMETER = "433"` - `CANNOT_PARSE_PROTOBUF_SCHEMA = "434"` - `NO_COLUMN_SERIALIZED_TO_REQUIRED_PROTOBUF_FIELD = "435"` - `PROTOBUF_BAD_CAST = "436"` - `PROTOBUF_FIELD_NOT_REPEATED = "437"` - `DATA_TYPE_CANNOT_BE_PROMOTED = "438"` - `CANNOT_SCHEDULE_TASK = "439"` - `INVALID_LIMIT_EXPRESSION = "440"` - `CANNOT_PARSE_DOMAIN_VALUE_FROM_STRING = "441"` - `BAD_DATABASE_FOR_TEMPORARY_TABLE = "442"` - `NO_COLUMNS_SERIALIZED_TO_PROTOBUF_FIELDS = "443"` - `UNKNOWN_PROTOBUF_FORMAT = "444"` - `CANNOT_MPROTECT = "445"` - `FUNCTION_NOT_ALLOWED = "446"` - `HYPERSCAN_CANNOT_SCAN_TEXT = "447"` - `BROTLI_READ_FAILED = "448"` - `BROTLI_WRITE_FAILED = "449"` - `BAD_TTL_EXPRESSION = "450"` - `BAD_TTL_FILE = "451"` - `SETTING_CONSTRAINT_VIOLATION = "452"` - `MYSQL_CLIENT_INSUFFICIENT_CAPABILITIES = "453"` - `OPENSSL_ERROR = "454"` - `SUSPICIOUS_TYPE_FOR_LOW_CARDINALITY = "455"` - `UNKNOWN_QUERY_PARAMETER = "456"` - `BAD_QUERY_PARAMETER = "457"` - `CANNOT_UNLINK = "458"` - `CANNOT_SET_THREAD_PRIORITY = "459"` - `CANNOT_CREATE_TIMER = "460"` - `CANNOT_SET_TIMER_PERIOD = "461"` - `CANNOT_DELETE_TIMER = "462"` - `CANNOT_FCNTL = "463"` - `CANNOT_PARSE_ELF = "464"` - `CANNOT_PARSE_DWARF = "465"` - `INSECURE_PATH = "466"` - `CANNOT_PARSE_BOOL = "467"` - `CANNOT_PTHREAD_ATTR = "468"` - `VIOLATED_CONSTRAINT = "469"` - `QUERY_IS_NOT_SUPPORTED_IN_LIVE_VIEW = "470"` - `INVALID_SETTING_VALUE = "471"` - `READONLY_SETTING = "472"` - `DEADLOCK_AVOIDED = "473"` - `INVALID_TEMPLATE_FORMAT = "474"` - `INVALID_WITH_FILL_EXPRESSION = "475"` - `WITH_TIES_WITHOUT_ORDER_BY = "476"` - `INVALID_USAGE_OF_INPUT = "477"` - `UNKNOWN_POLICY = "478"` - `UNKNOWN_DISK = "479"` - `UNKNOWN_PROTOCOL = "480"` - `PATH_ACCESS_DENIED = "481"` - `DICTIONARY_ACCESS_DENIED = "482"` - `TOO_MANY_REDIRECTS = "483"` - `INTERNAL_REDIS_ERROR = "484"` - `SCALAR_ALREADY_EXISTS = "485"` - `CANNOT_GET_CREATE_DICTIONARY_QUERY = "487"` - `UNKNOWN_DICTIONARY = "488"` - `INCORRECT_DICTIONARY_DEFINITION = "489"` - `CANNOT_FORMAT_DATETIME = "490"` - `UNACCEPTABLE_URL = "491"` - `ACCESS_ENTITY_NOT_FOUND = "492"` - `ACCESS_ENTITY_ALREADY_EXISTS = "493"` - `ACCESS_ENTITY_FOUND_DUPLICATES = "494"` - `ACCESS_STORAGE_READONLY = "495"` - `QUOTA_REQUIRES_CLIENT_KEY = "496"` - `ACCESS_DENIED = "497"` - `LIMIT_BY_WITH_TIES_IS_NOT_SUPPORTED = "498"` - S3_ERROR = "499" - `AZURE_BLOB_STORAGE_ERROR = "500"` - `CANNOT_CREATE_DATABASE = "501"` - `CANNOT_SIGQUEUE = "502"` - `AGGREGATE_FUNCTION_THROW = "503"` - `FILE_ALREADY_EXISTS = "504"` - `CANNOT_DELETE_DIRECTORY = "505"` - `UNEXPECTED_ERROR_CODE = 
"506"` - `UNABLE_TO_SKIP_UNUSED_SHARDS = "507"` - `UNKNOWN_ACCESS_TYPE = "508"` - `INVALID_GRANT = "509"` - `CACHE_DICTIONARY_UPDATE_FAIL = "510"` - `UNKNOWN_ROLE = "511"` - `SET_NON_GRANTED_ROLE = "512"` - `UNKNOWN_PART_TYPE = "513"` - `ACCESS_STORAGE_FOR_INSERTION_NOT_FOUND = "514"` - `INCORRECT_ACCESS_ENTITY_DEFINITION = "515"` - `AUTHENTICATION_FAILED = "516"` - `CANNOT_ASSIGN_ALTER = "517"` - `CANNOT_COMMIT_OFFSET = "518"` - `NO_REMOTE_SHARD_AVAILABLE = "519"` - `CANNOT_DETACH_DICTIONARY_AS_TABLE = "520"` - `ATOMIC_RENAME_FAIL = "521"` - `UNKNOWN_ROW_POLICY = "523"` - `ALTER_OF_COLUMN_IS_FORBIDDEN = "524"` - `INCORRECT_DISK_INDEX = "525"` - `NO_SUITABLE_FUNCTION_IMPLEMENTATION = "527"` - `CASSANDRA_INTERNAL_ERROR = "528"` - `NOT_A_LEADER = "529"` - `CANNOT_CONNECT_RABBITMQ = "530"` - `CANNOT_FSTAT = "531"` - `LDAP_ERROR = "532"` - `INCONSISTENT_RESERVATIONS = "533"` - `NO_RESERVATIONS_PROVIDED = "534"` - `UNKNOWN_RAID_TYPE = "535"` - `CANNOT_RESTORE_FROM_FIELD_DUMP = "536"` - `ILLEGAL_MYSQL_VARIABLE = "537"` - `MYSQL_SYNTAX_ERROR = "538"` - `CANNOT_BIND_RABBITMQ_EXCHANGE = "539"` - `CANNOT_DECLARE_RABBITMQ_EXCHANGE = "540"` - `CANNOT_CREATE_RABBITMQ_QUEUE_BINDING = "541"` - `CANNOT_REMOVE_RABBITMQ_EXCHANGE = "542"` - `UNKNOWN_MYSQL_DATATYPES_SUPPORT_LEVEL = "543"` - `ROW_AND_ROWS_TOGETHER = "544"` - `FIRST_AND_NEXT_TOGETHER = "545"` - `NO_ROW_DELIMITER = "546"` - `INVALID_RAID_TYPE = "547"` - `UNKNOWN_VOLUME = "548"` - `DATA_TYPE_CANNOT_BE_USED_IN_KEY = "549"` - `CONDITIONAL_TREE_PARENT_NOT_FOUND = "550"` - `ILLEGAL_PROJECTION_MANIPULATOR = "551"` - `UNRECOGNIZED_ARGUMENTS = "552"` - `LZMA_STREAM_ENCODER_FAILED = "553"` - `LZMA_STREAM_DECODER_FAILED = "554"` - `ROCKSDB_ERROR = "555"` - `SYNC_MYSQL_USER_ACCESS_ERROR = "556"` - `UNKNOWN_UNION = "557"` - `EXPECTED_ALL_OR_DISTINCT = "558"` - `INVALID_GRPC_QUERY_INFO = "559"` - `ZSTD_ENCODER_FAILED = "560"` - `ZSTD_DECODER_FAILED = "561"` - `TLD_LIST_NOT_FOUND = "562"` - `CANNOT_READ_MAP_FROM_TEXT = "563"` - `INTERSERVER_SCHEME_DOESNT_MATCH = "564"` - `TOO_MANY_PARTITIONS = "565"` - `CANNOT_RMDIR = "566"` - `DUPLICATED_PART_UUIDS = "567"` - `RAFT_ERROR = "568"` - `MULTIPLE_COLUMNS_SERIALIZED_TO_SAME_PROTOBUF_FIELD = "569"` - `DATA_TYPE_INCOMPATIBLE_WITH_PROTOBUF_FIELD = "570"` - `DATABASE_REPLICATION_FAILED = "571"` - `TOO_MANY_QUERY_PLAN_OPTIMIZATIONS = "572"` - `EPOLL_ERROR = "573"` - `DISTRIBUTED_TOO_MANY_PENDING_BYTES = "574"` - `UNKNOWN_SNAPSHOT = "575"` - `KERBEROS_ERROR = "576"` - `INVALID_SHARD_ID = "577"` - `INVALID_FORMAT_INSERT_QUERY_WITH_DATA = "578"` - `INCORRECT_PART_TYPE = "579"` - `CANNOT_SET_ROUNDING_MODE = "580"` - `TOO_LARGE_DISTRIBUTED_DEPTH = "581"` - `NO_SUCH_PROJECTION_IN_TABLE = "582"` - `ILLEGAL_PROJECTION = "583"` - `PROJECTION_NOT_USED = "584"` - `CANNOT_PARSE_YAML = "585"` - `CANNOT_CREATE_FILE = "586"` - `CONCURRENT_ACCESS_NOT_SUPPORTED = "587"` - `DISTRIBUTED_BROKEN_BATCH_INFO = "588"` - `DISTRIBUTED_BROKEN_BATCH_FILES = "589"` - `CANNOT_SYSCONF = "590"` - `SQLITE_ENGINE_ERROR = "591"` - `DATA_ENCRYPTION_ERROR = "592"` - `ZERO_COPY_REPLICATION_ERROR = "593"` - BZIP2_STREAM_DECODER_FAILED = "594" - BZIP2_STREAM_ENCODER_FAILED = "595" - `INTERSECT_OR_EXCEPT_RESULT_STRUCTURES_MISMATCH = "596"` - `NO_SUCH_ERROR_CODE = "597"` - `BACKUP_ALREADY_EXISTS = "598"` - `BACKUP_NOT_FOUND = "599"` - `BACKUP_VERSION_NOT_SUPPORTED = "600"` - `BACKUP_DAMAGED = "601"` - `NO_BASE_BACKUP = "602"` - `WRONG_BASE_BACKUP = "603"` - `BACKUP_ENTRY_ALREADY_EXISTS = "604"` - `BACKUP_ENTRY_NOT_FOUND = "605"` - `BACKUP_IS_EMPTY = "606"` - 
`CANNOT_RESTORE_DATABASE = "607"` - `CANNOT_RESTORE_TABLE = "608"` - `FUNCTION_ALREADY_EXISTS = "609"` - `CANNOT_DROP_FUNCTION = "610"` - `CANNOT_CREATE_RECURSIVE_FUNCTION = "611"` - `OBJECT_ALREADY_STORED_ON_DISK = "612"` - `OBJECT_WAS_NOT_STORED_ON_DISK = "613"` - `POSTGRESQL_CONNECTION_FAILURE = "614"` - `CANNOT_ADVISE = "615"` - `UNKNOWN_READ_METHOD = "616"` - LZ4_ENCODER_FAILED = "617" - LZ4_DECODER_FAILED = "618" - `POSTGRESQL_REPLICATION_INTERNAL_ERROR = "619"` - `QUERY_NOT_ALLOWED = "620"` - `CANNOT_NORMALIZE_STRING = "621"` - `CANNOT_PARSE_CAPN_PROTO_SCHEMA = "622"` - `CAPN_PROTO_BAD_CAST = "623"` - `BAD_FILE_TYPE = "624"` - `IO_SETUP_ERROR = "625"` - `CANNOT_SKIP_UNKNOWN_FIELD = "626"` - `BACKUP_ENGINE_NOT_FOUND = "627"` - `OFFSET_FETCH_WITHOUT_ORDER_BY = "628"` - `HTTP_RANGE_NOT_SATISFIABLE = "629"` - `HAVE_DEPENDENT_OBJECTS = "630"` - `UNKNOWN_FILE_SIZE = "631"` - `UNEXPECTED_DATA_AFTER_PARSED_VALUE = "632"` - `QUERY_IS_NOT_SUPPORTED_IN_WINDOW_VIEW = "633"` - `MONGODB_ERROR = "634"` - `CANNOT_POLL = "635"` - `CANNOT_EXTRACT_TABLE_STRUCTURE = "636"` - `INVALID_TABLE_OVERRIDE = "637"` - `SNAPPY_UNCOMPRESS_FAILED = "638"` - `SNAPPY_COMPRESS_FAILED = "639"` - `NO_HIVEMETASTORE = "640"` - `CANNOT_APPEND_TO_FILE = "641"` - `CANNOT_PACK_ARCHIVE = "642"` - `CANNOT_UNPACK_ARCHIVE = "643"` - `REMOTE_FS_OBJECT_CACHE_ERROR = "644"` - `NUMBER_OF_DIMENSIONS_MISMATCHED = "645"` - `CANNOT_BACKUP_DATABASE = "646"` - `CANNOT_BACKUP_TABLE = "647"` - `WRONG_DDL_RENAMING_SETTINGS = "648"` - `INVALID_TRANSACTION = "649"` - `SERIALIZATION_ERROR = "650"` - `CAPN_PROTO_BAD_TYPE = "651"` - `ONLY_NULLS_WHILE_READING_SCHEMA = "652"` - `CANNOT_PARSE_BACKUP_SETTINGS = "653"` - `WRONG_BACKUP_SETTINGS = "654"` - `FAILED_TO_SYNC_BACKUP_OR_RESTORE = "655"` - `MEILISEARCH_EXCEPTION = "656"` - `UNSUPPORTED_MEILISEARCH_TYPE = "657"` - `MEILISEARCH_MISSING_SOME_COLUMNS = "658"` - `UNKNOWN_STATUS_OF_TRANSACTION = "659"` - `HDFS_ERROR = "660"` - `CANNOT_SEND_SIGNAL = "661"` - `FS_METADATA_ERROR = "662"` - `INCONSISTENT_METADATA_FOR_BACKUP = "663"` - `ACCESS_STORAGE_DOESNT_ALLOW_BACKUP = "664"` - `CANNOT_CONNECT_NATS = "665"` - `NOT_INITIALIZED = "667"` - `INVALID_STATE = "668"` - `NAMED_COLLECTION_DOESNT_EXIST = "669"` - `NAMED_COLLECTION_ALREADY_EXISTS = "670"` - `NAMED_COLLECTION_IS_IMMUTABLE = "671"` - `INVALID_SCHEDULER_NODE = "672"` - `RESOURCE_ACCESS_DENIED = "673"` - `RESOURCE_NOT_FOUND = "674"` - CANNOT_PARSE_IPV4 = "675" - CANNOT_PARSE_IPV6 = "676" - `THREAD_WAS_CANCELED = "677"` - `IO_URING_INIT_FAILED = "678"` - `IO_URING_SUBMIT_ERROR = "679"` - `MIXED_ACCESS_PARAMETER_TYPES = "690"` - `UNKNOWN_ELEMENT_OF_ENUM = "691"` - `TOO_MANY_MUTATIONS = "692"` - `AWS_ERROR = "693"` - `ASYNC_LOAD_CYCLE = "694"` - `ASYNC_LOAD_FAILED = "695"` - `ASYNC_LOAD_CANCELED = "696"` - `CANNOT_RESTORE_TO_NONENCRYPTED_DISK = "697"` - `INVALID_REDIS_STORAGE_TYPE = "698"` - `INVALID_REDIS_TABLE_STRUCTURE = "699"` - `USER_SESSION_LIMIT_EXCEEDED = "700"` - `CLUSTER_DOESNT_EXIST = "701"` - `CLIENT_INFO_DOES_NOT_MATCH = "702"` - `INVALID_IDENTIFIER = "703"` - `QUERY_CACHE_USED_WITH_NONDETERMINISTIC_FUNCTIONS = "704"` - `TABLE_NOT_EMPTY = "705"` - `LIBSSH_ERROR = "706"` - `GCP_ERROR = "707"` - `ILLEGAL_STATISTICS = "708"` - `CANNOT_GET_REPLICATED_DATABASE_SNAPSHOT = "709"` - `FAULT_INJECTED = "710"` - `FILECACHE_ACCESS_DENIED = "711"` - `TOO_MANY_MATERIALIZED_VIEWS = "712"` - `BROKEN_PROJECTION = "713"` - `UNEXPECTED_CLUSTER = "714"` - `CANNOT_DETECT_FORMAT = "715"` - `CANNOT_FORGET_PARTITION = "716"` - `EXPERIMENTAL_FEATURE_ERROR = 
"717"` - `TOO_SLOW_PARSING = "718"` - `QUERY_CACHE_USED_WITH_SYSTEM_TABLE = "719"` - `USER_EXPIRED = "720"` - `DEPRECATED_FUNCTION = "721"` - `ASYNC_LOAD_WAIT_FAILED = "722"` - `PARQUET_EXCEPTION = "723"` - `TOO_MANY_TABLES = "724"` - `TOO_MANY_DATABASES = "725"` - `DISTRIBUTED_CACHE_ERROR = "900"` - `CANNOT_USE_DISTRIBUTED_CACHE = "901"` - `KEEPER_EXCEPTION = "999"` - `POCO_EXCEPTION = "1000"` - `STD_EXCEPTION = "1001"` - `UNKNOWN_EXCEPTION = "1002"` --- URL: https://www.tinybird.co/docs/publish/charts Last update: 2024-12-13T10:17:28.000Z Content: --- title: "Charts · Tinybird Docs" theme-color: "#171612" description: "Create beautiful, fast charts of your Tinybird data." --- # Charts¶ Charts are a great way to visualize your data. You can create and publish easy, fast Charts in Tinybird from any of your published API Endpoints. <-figure-> ![Example Tinybird dashboard showing multiple chart types](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fcharts-dashboard.png&w=3840&q=75) <-figcaption-> Example Tinybird Charts dashboard Check out the [live demo](https://guide-tinybird-charts.vercel.app/) to see an example of Charts in action. ## Overview¶ When you publish an API Endpoint, you often want to visualize the data in a more user-friendly way. Charts are a great way to do this. Tinybird provides three options: - No-code: A fast, UI-based flow for creating Charts that live in your Tinybird Workspace UI (great for internal reference use, smaller projects, and getting started). - Low code: Using the Tinybird UI to create Charts and generate an iframe, which you can then embed in your own application. - Code-strong: Using the `@tinybirdco/charts` npm React library to build out exactly what you need, in your own application, using React components. Fully customizable and secured with JWTs. You can either generate initial Chart data in the Tinybird UI, or start using the library directly. Instead of coding your own charts and dashboards from scratch, use Tinybird's pre-built Chart components. You won't have to implement the frontend and backend architecture, or any security middleware. Use the library components and JWTs, manage the token exchange flow, and interact directly with any of your published Tinybird API Endpoints. To create a Chart, you need to have a published API Endpoint. Learn how to [publish an API Endpoint here](https://www.tinybird.co/docs/docs/publish/api-endpoints). ## The Tinybird Charts library BETA¶ All options are built on the Tinybird Charts library ( `@tinybirdco/charts` ) , a modular way to build fast visualizations of your data, which leverages [Apache ECharts](https://echarts.apache.org/en/index.html) . You can use Tinybird Charts with any level of customization, and also use your Tinybird data with any third party library. ### Components¶ The library provides the following components: - `AreaChart` - `BarChart` - `BarList` - `DonutChart` - `LineChart` - `PieChart` - `Table` All components share the same API, making it easy to switch between different types of charts. The Tinybird Charts library is currently in public beta. If you have any feedback or suggestions, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). ## Create a Chart in the UI (no-code)¶ 1. In your Workspace, navigate to the Overview page of one of your API Endpoints. 2. Select the "Create Chart" button (top right). 3. 
Configure your Chart by selecting the name, the type of Chart, and the fields you want to visualize. Under "Data", select the index and category. 4. Once you're happy with your Chart, select "Save". <-figure-> ![Example Tinybird pie chart, showing configuration options](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fcharts-create-chart.png&w=3840&q=75) <-figcaption-> Example Tinybird pie Chart, showing configuration options Your Chart now lives in the API Endpoint Overview page. ## Create a Chart using an iframe (low code)¶ Tinybird users frequently want to take the data from their Tinybird API Endpoints, create charts, and embed them in their own dashboard application. A low-overhead option is to take the generated iframe and drop it into your application: 1. Create a Chart using the process described above. 2. In the API Endpoint Overview page, scroll down to the "Charts" tab and select your Chart. 3. Select the `<>` tab to access the code snippets. 4. Copy and paste the ready-to-use iframe code into your application. <-figure-> ![GIF showing a user creating a Pie Chart in the Tinybird UI and generating the iframe code](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fcharts-create-pie-chart.gif&w=3840&q=75) <-figcaption-> Creating a Pie Chart in the Tinybird UI and generating the iframe code ## Create a Chart using the React library (code-strong)¶ This option gives you the most customization flexibility. You'll need to be familiar with frontend development and styling. To create a Chart component and use the library, you can either create a Chart in the UI first, or use the library directly. Using the library directly means there will not be a Chart created in your Workspace, and no generated snippet, so skip to #2. ### 1. View your Chart code¶ 1. In your Workspace, navigate to the API Endpoint Overview page. 2. Scroll down to the "Charts" tab and select one of your Charts. 3. Select the `<>` tab to access the code snippets. 4. You now have the code for a ready-to-use React component. ### 2. Install the library¶ Install the `@tinybirdco/charts` library locally in your project: npm install @tinybirdco/charts ### 3. Create a JWT¶ Calls need to be secured with a token. To learn more about the token exchange, see [Understanding the token exchange](https://www.tinybird.co/docs/docs/publish/charts/guides/charts-using-iframes-and-jwt-tokens#understanding-the-token-exchange). In React code snippets, the Chart components are authenticated using the `token` prop, where you paste your [Tinybird Token](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens). You can limit how often you or your users can fetch Tinybird APIs on a per-endpoint or per-user basis. See [Rate limits for JWTs](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens#rate-limits-for-jwts). #### Create a JWT¶ There is wide support for creating JWTs in many programming languages and frameworks. See the [Tinybird JWT docs for popular options](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens#create-a-jwt-in-production). ### 4. Embed the Chart into your application¶ Copy and paste the Chart snippet (either generated by Tinybird, or constructed by you [using the same configuration](https://www.npmjs.com/package/@tinybirdco/charts#usage) ) into your application. For example: ##### Example Line Chart component code snippet import React from 'react' import { LineChart } from '@tinybirdco/charts' function MyLineChart() { return ( ) } ### 5. 
Fetch data¶ The most common approach for fetching and rendering data is to directly use a single Chart component (or group of individual components) by passing the required props. These are included by default within each generated code snippet, so for most use cases, you should be able to simply copy, paste, and have the Chart you want. The library offers [many additional props](https://www.npmjs.com/package/@tinybirdco/charts?activeTab=readme#api) for further customization, including many that focus specifically on fetching data. See [6. Customization](https://www.tinybird.co/docs/docs/publish/charts#6-customization) for more. #### Alternative approaches and integrations¶ Depending on your needs, you have additional options: - Wrapping components within `` to share styles, query configuration, and custom loading and error states[ among several Chart components](https://www.npmjs.com/package/@tinybirdco/charts?activeTab=readme#reusing-styles-and-query-config-using-the-chartprovider) . - Adding your own fetcher to the `ChartProvider` (or to a specific Chart component) using the `fetcher` prop . This can be useful to add custom headers or dealing with JWT tokens. - Using the `useQuery` hook to[ fetch data and pass it directly to the component](https://www.npmjs.com/package/@tinybirdco/charts?activeTab=readme#using-the-hook) . It works with any custom component, or with any third-party library like[ Tremor](https://www.tremor.so/) or[ shadcn](https://ui.shadcn.com/) . <-figure-> ![GIF showing how to use the library with Tinybird Charts & shadcn in the UI](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fcharts-tb-sd.gif&w=3840&q=75) <-figcaption-> Using the library with Tinybird Charts & shadcn ### 6. Customization¶ Tinybird supports customization: - ** Standard customization** : Use the properties provided by the[ @tinybirdco/charts library](https://www.npmjs.com/package/@tinybirdco/charts?activeTab=readme#api) . - ** Advanced customization** : Send a specific parameter to[ customize anything within a Chart](https://www.npmjs.com/package/@tinybirdco/charts?activeTab=readme#extra-personalization-using-echarts-options) , aligning with the[ ECharts specification](https://echarts.apache.org/handbook/en/get-started/) . ### 7. Filtering¶ Filtering is possible by using the endpoint parameters. Use the `params` data-fetching property to pass your parameters to a chart component, `` , or the `useQuery` hook. See the [example snippet](https://www.tinybird.co/docs/about:blank#4-embed-the-chart-into-your-application) for how `params` and filter are used. ### 8. Advanced configuration (optional)¶ #### Polling¶ Control the frequency that your chart polls for new data by setting the `refreshInterval` prop (interval in milliseconds). The npm library offers [a range of additional component props](https://www.npmjs.com/package/@tinybirdco/charts) specifically for data fetching, so be sure to review them and use a combination to build the perfect chart. Use of this feature may significantly increase your billing costs. The lower the refreshInterval prop (so the more frequently you're polling for fresh data), the more requests you're making to your Tinybird API Endpoints. Read [the billing docs](https://www.tinybird.co/docs/docs/get-started/plans/billing) and understand the pricing of different operations. #### Global vs local settings¶ Each chart can have its own settings, or settings can be shared across a group of Chart components by wrapping them within ``. 
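To tie these pieces together, here is a minimal sketch of a standalone Chart component that combines the props covered in this section: `token` for authentication, `params` for filtering through endpoint parameters, and `refreshInterval` for polling. The endpoint URL, column names, and parameter values are placeholders rather than values from a real Workspace, and the exact prop set should be confirmed against the `@tinybirdco/charts` README; the same settings can also be shared across several components through the `ChartProvider` wrapper instead of being repeated per Chart.

```tsx
'use client'

import { LineChart } from '@tinybirdco/charts'

// All identifiers below (pipe name, columns, params, env var) are illustrative placeholders.
export function SalesOverTime() {
  return (
    <LineChart
      endpoint="https://api.tinybird.co/v0/pipes/sales_over_time.json" // published API Endpoint
      token={process.env.NEXT_PUBLIC_TINYBIRD_READ_TOKEN!}             // read Token or JWT
      index="day"                  // column used for the x-axis
      categories={['total_sales']} // columns plotted as series
      params={{ country: 'ES' }}   // forwarded as query parameters to the endpoint (filtering)
      refreshInterval={30000}      // poll for fresh data every 30 seconds
    />
  )
}
```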
#### States¶ Chart components can be one of a range of states: - Success - Error - Loading ## Example use cases¶ Two examples showing different ways to generate JWTs, set up a local project, and implement a Chart: 1. Guide:[ Consume APIs in a Next.js frontend with JWTs](https://www.tinybird.co/docs/docs/publish/api-endpoints/guides/consume-apis-nextjs) . 2. Guide:[ Build charts with iframes and JWTs](https://www.tinybird.co/docs/docs/publish/charts/guides/charts-using-iframes-and-jwt-tokens) . Interested in Charts but don't have any data to use? Run the demo in the [Add Tinybird Charts to a Next.js frontend](https://www.tinybird.co/docs/docs/publish/charts/guides/add-charts-to-nextjs) guide, which uses a bootstrapped Next.js app and fake data. ## Troubleshooting¶ ### Handle token errors¶ See the information on [JWT error handling](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens#error-handling). ### Refresh token¶ See the information on [JWT refreshing and limitations](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens#jwt-limitations). ## Next steps¶ - Check out the[ Tinybird Charts library](https://www.npmjs.com/package/@tinybirdco/charts) . - Understand how Tinybird[ uses Tokens](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens) . --- URL: https://www.tinybird.co/docs/publish/charts/guides Content: --- title: "Charts guides · Tinybird Docs" theme-color: "#171612" description: "Guides for using Tinybird Charts." --- # Charts guides¶ Learn how to create and integrate Tinybird Charts into your applications. These guides cover different approaches, from no-code solutions using the Tinybird UI to code-strong implementations with the React library. Tinybird Charts make it easy to visualize your data from published API Endpoints. You can: - Create charts directly in the Tinybird UI with no coding required. - Embed charts in your applications using iframes. - Build custom chart implementations using the `@tinybirdco/charts` React library. The following guides are available: --- URL: https://www.tinybird.co/docs/publish/charts/guides/add-charts-to-nextjs Last update: 2025-01-20T11:43:08.000Z Content: --- title: "Add Tinybird Charts to a Next.js frontend · Tinybird Docs" theme-color: "#171612" description: "Tinybird Charts make it easy to create interactive charts. In this guide, we'll show you how to add Tinybird Charts to a Next.js frontend." --- # Add Tinybird Charts to a Next.js frontend¶ In this guide, you'll learn how to generate create Tinybird Charts from the UI, and add them to your Next.js frontend. Tinybird Charts make it easy to visualize your data and create interactive charts. You can create a chart from the UI, and then embed it in your frontend application. This guide will show you how to add Tinybird Charts to a Next.js frontend. You can view the [live demo](https://guide-tinybird-charts.vercel.app/) or browse the [GitHub repo (guide-tinybird-charts)](https://github.com/tinybirdco/guide-tinybird-charts). <-figure-> ![Tinybird charts demo](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fcharts-demo.png&w=3840&q=75) ## Prerequisites¶ This guide assumes that you have a Tinybird account, and you are familiar with creating a Tinybird Workspace and pushing resources to it. You'll need a working familiarity with JavaScript and Next.js. ## Run the demo¶ These steps cover running the GitHub demo locally. 
[Skip to the next section](https://www.tinybird.co/docs/about:blank#build-from-scratch) to build the demo from scratch. ### 1. Clone the GitHub repo¶ Clone the [GitHub repo (guide-tinybird-charts)](https://github.com/tinybirdco/guide-tinybird-charts) to your local machine. ### 2. Push Tinybird resources¶ The repo contains a `tinybird` folder which includes sample Tinybird resources: - `events.datasource` : The Data Source for incoming events. - `airline_market_share.pipe` : An API Endpoint giving a count of bookings per airline. - `bookings_over_time.pipe` : An API Endpoint giving a time series of booking volume over time. - `bookings_over_time_by_airline.pipe` : An API Endpoint giving a time series of booking volume over time with an `airline` filter. - `meal_choice_distribution.pipe` : An API Endpoint giving a count of meal choices across bookings. - `top_airlines.pipe` : An API Endpoint giving a list of the top airlines by booking volume. Make a new Tinybird Workspace in the region of your choice. Then, configure the [Tinybird CLI](https://www.tinybird.co/docs/docs/cli/install) (install and authenticate) and `tb push` the resources to your Workspace. Alternatively, you can drag and drop the files onto the UI to upload them. ### 3. Generate some fake data¶ Use [Mockingbird](https://tbrd.co/mockingbird-tinybird-charts-guide) to generate fake data for the `events` Data Source. Using this link ^ provides a pre-configured schema, and you'll just need to enter your Workspace Admin Token and select the Host region that matches your Workspace. When configured, select `Save` , then scroll down and select `Start Generating!`. In the Tinybird UI, confirm that the `events` Data Source is successfully receiving data. ### 4. Install dependencies¶ In the cloned repo, navigate to `/app` and install the dependencies with `npm install`. ### 5. Configure .env¶ First create a new file `.env.local` ##### Create the .env.local file in /app cp .env.example .env.local From the Tinybird UI, copy the read Token for the Charts (if you deployed the resources from this repo, it will be called `CHART_READ_TOKEN` ). Paste the Token into the `.env.local` file in your directory: ##### In the .env.local file NEXT_PUBLIC_TINYBIRD_STATIC_READ_TOKEN="STATIC READ TOKEN" ### Run the demo app¶ Run it locally: npm run dev Then open `localhost:3000` with your browser. ## Build from scratch¶ This section will take you from a fresh Tinybird Workspace to a Next.js app with a Tinybird Chart. ### 1. Set up a Workspace¶ Create a new Workspace. This guide uses the `EU GCP` region, but you can use any region. Save [this .datasource file](https://github.com/tinybirdco/guide-tinybird-charts/blob/main/tinybird/datasources/events.datasource) locally, and upload it to the Tinybird UI - you can either drag and drop, or use **Create new (+)** to add a new Data Source. You now have a Workspace with an `events` Data Source and specified schema. Time to generate some data to fill the Data Source! ### 2. Generate some fake data¶ Use [Mockingbird](https://tbrd.co/mockingbird-tinybird-charts-guide) to generate fake data for the `events` Data Source. Using this link ^ provides a pre-configured schema, and you'll just need to enter your Workspace Admin Token and select the Host region that matches your Workspace. When configured, select `Save` , then scroll down and select `Start Generating!`. In the Tinybird UI, confirm that the `events` Data Source is successfully receiving data. ### 3. 
Create and publish an API Endpoint¶ In the Tinybird UI, select the `events` Data Source and then select `Create Pipe` in the top right. In the new Pipe, change the name to `top_airlines`. In the first SQL Node, paste the following SQL: SELECT airline, count() as bookings FROM events GROUP BY airline ORDER BY bookings DESC LIMIT 5 Name this node `endpoint` and select `Run`. Now, publish the Pipe by selecting `Create API Endpoint` and selecting the `endpoint` Node. Congratulations! You have a published API Endpoint. ### 4. Create a Chart¶ Publishing the API Endpoint takes you to the API Endpoint overview page. Scroll down to the `Output` section and select the `Charts` tab. Select `Create Chart`. On the `General` tab, set the name to `Top Airlines`, then choose `Bar List` as the Chart type. On the `Data` tab, choose the `airline` column for the `Index` and check the `bookings` box for the `Categories`. Select Save. ### 5. View the Chart component code¶ After saving your Chart, you'll be returned to the API Endpoint overview page and you'll see your Chart in the `Output` section. To view the component code for the Chart, select the code symbol ( `<>` ) above it. You'll see the command to install the `@tinybirdco/charts` library as well as the React component code. ### 6. Create a new Next.js app¶ On your local machine, create a new working directory and navigate to it. For this example, it's called `myapp`. ##### Make a new working directory for your Next.js frontend app mkdir myapp cd myapp In the `myapp` dir, create a new Next.js app with the following command: ##### Initialize a new Next.js app npx create-next-app@latest Some prompts to configure the app appear. Use the following settings: ##### Example new Next.js app settings ✔ What is your project named? … tinybird-demo ✔ Would you like to use TypeScript? … No / [Yes] ✔ Would you like to use ESLint? … No / [Yes] ✔ Would you like to use Tailwind CSS? … No / [Yes] ✔ Would you like to use `src/` directory? … No / [Yes] ✔ Would you like to use App Router? (recommended) … No / [Yes] ✔ Would you like to customize the default import alias (@/*)? … [No] / Yes After the app is created, navigate to the app directory (this will be the same as the project name you entered, in this example, `tinybird-demo` ). cd tinybird-demo ### 7. Add the Tinybird Charts library¶ Add the Tinybird Charts library to your project: npm install @tinybirdco/charts ### 8. Add the Chart component¶ Create a new subfolder and file `src/app/components/Chart.tsx` . This will contain the component code for the Chart. [Copy the component from the Tinybird UI](https://www.tinybird.co/docs/about:blank#5-view-the-chart-component-code) and paste it here. It should look like this: ##### Example Chart.tsx code copied from Tinybird UI Chart 'use client' import { BarList } from '@tinybirdco/charts' export function TopAirlines() { return ( ) } Save the file. ### 9. Add the Chart to your page¶ In your `src/app/page.tsx` file, delete the default contents so you have an empty file. Then, import the `TopAirlines` component and add it to the page: ##### src/app/page.tsx import { TopAirlines } from "./components/Chart"; export default function Home() { return (
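// A minimal body for this component (assumed, not copied from the repo):
// render the imported Chart inside a simple wrapper element.
<main>
  <TopAirlines />
</main>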
) } ### 10. Run the app¶ Run the app with `npm run dev` and open `localhost:3000` in your browser. ### 11. You're done\!¶ You've successfully added a Tinybird Chart to a Next.js frontend. Your Next.js frontend should now show a single bar line Chart. See the [live demo](https://guide-tinybird-charts.vercel.app/) and browse the [GitHub repo (guide-tinybird-charts)](https://github.com/tinybirdco/guide-tinybird-charts) for inspiration on how to combine more Chart components to make a full dashboard. ## Next steps¶ - Interested in dashboards? Explore Tinybird's many applications in the[ Use Case Hub](https://www.tinybird.co/docs/docs/use-cases) . --- URL: https://www.tinybird.co/docs/publish/charts/guides/analytics-with-confluent Last update: 2025-01-20T11:43:08.000Z Content: --- title: "Build user-facing analytics apps with Confluent · Tinybird Docs" theme-color: "#171612" description: "Learn how to build a user-facing web analytics application with Confluent and Tinybird." --- # Build a user-facing web analytics application with Confluent and Tinybird¶ In this guide you'll learn how to take data from Kafka and build a user-facing web analytics dashboard using Confluent and Tinybird. [GitHub Repository](https://github.com/tinybirdco/demo_confluent_charts/tree/main) <-figure-> ![Tinybird Charts showing e-commerce events](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Ftutorial-confluent-chart-1.png&w=3840&q=75) In this tutorial, you will learn how to: 1. Connect Tinybird to a Kafka topic. 2. Build and publish Tinybird API Endpoints using SQL. 3. Create 2 Charts without having to code from scratch. ## Prerequisites¶ To complete this tutorial, you'll need: 1. A[ free Tinybird account](https://www.tinybird.co/signup) 2. An empty Tinybird Workspace 3. A Confluent account 4. Node.js >=20.11 This tutorial includes a [Next.js](https://nextjs.org/) app for frontend visualization, but you don't need working familiarity with TypeScript - just copy & paste the code snippets. ## 1. Setup¶ Clone the `demo_confluent_charts` repo. ## 2. Create your data¶ ### Option 1: Use your own existing data¶ In Confluent, create a Kafka topic with simulated e-commerce events data. Check [this file](https://github.com/tinybirdco/demo_confluent_charts/blob/main/tinybird/datasources/ecomm_events.datasource) for the schema outline to follow. ### Option 2: Mock the data¶ Use Tinybird's [Mockingbird](https://mockingbird.tinybird.co/docs) , an open source mock data stream generator, to stream mock web events instead. In the repo, navigate to `/datagen` and run `npm i` to install the dependencies. Create an `.env` and replace the default Confluent variables: cp .env.example .env Run the mock generator script: node mockConfluent.js ## 3. Connect Confluent <> Tinybird¶ In your Tinybird Workspace, create a new [Data Source](https://www.tinybird.co/docs/docs/get-data-in/data-sources) using the native [Confluent connector](https://www.tinybird.co/docs/docs/get-data-in/connectors/confluent) . Paste in the bootstrap server, rename the connection to `tb_confluent` , then paste in your API key and secret. Select "Next". Search for and select your topic, and select "Next". Ingest from the earliest offset, then under "Advanced settings" > "Sorting key" select `timestamp`. Rename the Data Source to `ecomm_events` and select "Create". Your Data Source is now ready, and you've connected Confluent to Tinybird! You now have something that's like a database table and a Kafka consumer ***combined*** . Neat. ## 4. 
Transform your data¶ ### Query your data stream¶ Your data should now be streaming in, so let's do something with it. In Tinybird, you can transform data using straightforward SQL in chained nodes that form a Pipe. Create a new [Pipe](https://www.tinybird.co/docs/docs/work-with-data/query/pipes) and rename it `sales_trend` . In the first node space, paste the following SQL: SELECT timestamp, sales FROM ecomm_events WHERE timestamp >= now() - interval 7 day This gets the timestamp and sales from just the last 7 days. Run the query and rename the node `filter_data`. In the second node space, paste the following: SELECT toDate(timestamp) AS ts, sum(sales) AS total_sales from filter_data GROUP BY ts ORDER BY ts This casts the timestamp to a date as `ts` , and sums up the sales - meaning you can get a trend of sales by day. Run the query and rename the node `endpoint`. ### Publish your transformed data¶ In the top right of the screen, select "Create API Endpoint" and select the `endpoint` Node. Congratulations! It's published and ready to be consumed. ## 5. Create a Tinybird Chart¶ In the top right of the screen, select "Create Chart". Rename the chart "Sales Trend" and select and Area Chart. Under the "Data" tab, select `ts` as the index and `total_sales` as the category. You should see a Chart magically appear! In the top right of the screen, select "Save". ## 6. Run an app locally¶ View the component code for the Chart by selecting the code symbol ( `<>` ) above it. Copy this code and paste into a new file in the `components` folder called `SalesTrend.tsx`. In `page.tsx` , replace `

Chart 1` with your new Chart component (for example, `<SalesTrend />` ). Save and view in the browser with `npm run dev` . You should see your Chart! ### Create a second Pipe --> Chart¶ Create a second Pipe in Tinybird called `utm_sales`: SELECT utm_source, sum(sales) AS total_sales FROM ecomm_events WHERE timestamp >= now() - interval 7 day GROUP BY utm_source ORDER BY total_sales DESC This gets sales by UTM source over the last 7 days. Run the query and rename the node `endpoint`... Then, you guessed it! Publish it as an Endpoint, create a Chart, and get the code. This time, create a donut Chart called "UTM Sales" with `utm_source` as the index and `total_sales` as the category. Check the "Legend" box and play around with the colors to create clear differentiators. Create a new component file called `UTMSales.tsx` and import it in `page.tsx` , replacing Chart 2. You did it! <-figure-> ![Tinybird Charts showing e-commerce events](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Ftutorial-confluent-chart-1.png&w=3840&q=75) ## Next steps¶ - Read more about[ Tinybird Charts](https://www.tinybird.co/docs/docs/publish/charts) . - Use Charts internally to[ monitor latency](https://www.tinybird.co/docs/docs/monitoring/latency#how-to-visualize-latency) in your own Workspace. --- URL: https://www.tinybird.co/docs/publish/charts/guides/bigquery-dashboard Last update: 2025-01-20T11:43:08.000Z Content: --- title: "Build user-facing dashboard with BigQuery · Tinybird Docs" theme-color: "#171612" description: "Learn how to build a user-facing dashboard using Tinybird and BigQuery." --- # Build a user-facing analytics dashboard with BigQuery and Tinybird¶ In this guide you'll learn how to take data from BigQuery and build a user-facing analytics dashboard using Tinybird, Next.js, and Tremor components. You'll end up with a dashboard and enough familiarity with Tremor to adjust the frontend & data visualization for your own projects in the future. Google BigQuery is a serverless data warehouse, offering powerful online analytical processing (OLAP) computations over large data sets with a familiar SQL interface. Since its launch in 2010, it's been widely adopted by Google Cloud users to handle long-running analytics queries to support strategic decision-making through business intelligence (BI) visualizations. Sometimes, however, you want to extend the functionality of your BigQuery data beyond business intelligence: for instance, real-time data visualizations that can be integrated into user-facing applications. As outlined in [the Tinybird blog post on BigQuery dashboard options](https://www.tinybird.co/blog-posts/bigquery-real-time-dashboard) , you can build Looker Studio dashboards over BigQuery data, but they'll struggle to support user-facing applications that require high concurrency, fresh data, and low-latency API responses. Tinybird is the smart option for fast, real-time use cases. Let's get building! [GitHub Repository](https://github.com/tinybirdco/bigquery-dashboard) <-figure-> ![Analytics dashboard build with BigQuery data, Tinybird Endpoints, and Tremor components in a Next.js app](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Ftutorial-bigquery-dashboard.png&w=3840&q=75) Imagine you're a ***huge*** baseball fan. You want to build a real-time dashboard that aggregates up-to-the-moment accurate baseball stats from teams around the world, and gives you the scoop on all your favorite players. This tutorial explains how to build a really nice-looking prototype version. In this tutorial, you'll learn how to: 1. 
Ingest your existing BigQuery data into Tinybird. 2. Process and transform that data with accessible SQL. 3. Publish the transformations as real-time APIs. 4. Use Tremor components in a Next.js app to build a clean, responsive, real-time dashboard that consumes those API Endpoints. ## Prerequisites¶ To complete this tutorial, you'll need: 1. A[ free Tinybird account](https://www.tinybird.co/signup) 2. A BigQuery account 3. Node.js >=20.11 This tutorial includes a [Next.js](https://nextjs.org/) app and [Tremor](https://www.tremor.so/components) components for frontend visualization, but you don't need working familiarity with TypeScript or JavaScript - just copy & paste the code snippets. ## 1. Create a Tinybird Workspace¶ Navigate to the Tinybird web UI ( [app.tinybird.co](https://app.tinybird.co/) ) and create an empty Tinybird Workspace (no starter kit) called `bigquery_dashboard` in your preferred region. ## 2. Connect your BigQuery dataset to Tinybird¶ To get your BigQuery data into Tinybird, you'll use the [Tinybird BigQuery Connector](https://www.tinybird.co/docs/docs/get-data-in/connectors/bigquery). Download [this sample dataset](https://github.com/tinybirdco/bigquery-dashboard/blob/main/baseball_stats.csv) that contains 20,000 rows of fake baseball stats. Upload it to your BigQuery project as a new CSV dataset. Next, follow the [steps in the documentation](https://www.tinybird.co/docs/docs/get-data-in/connectors/bigquery#load-a-bigquery-table-in-the-ui) to authorize Tinybird to view your BigQuery tables, select the table you want to sync, and set a sync schedule. Call the Data Source `baseball_game_stats`. Tinybird will copy the contents of your BigQuery table into a Tinybird Data Source and ensure the Data Source stays in sync with your BigQuery table. Tinybird can sync BigQuery tables as often as every 5 minutes. If you need fresher data in your real-time dashboards, consider sending data to Tinybird via alternative sources such as [Apache Kafka](https://www.tinybird.co/docs/docs/get-data-in/connectors/kafka), [Confluent Cloud](https://www.tinybird.co/docs/docs/get-data-in/connectors/confluent), [Google Pub/Sub](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-google-pubsub) , or Tinybird's native [HTTP streaming endpoint](https://www.tinybird.co/docs/docs/get-data-in/guides/ingest-from-the-events-api). ## 3. Create some Pipes¶ In Tinybird, a [Pipe](https://www.tinybird.co/docs/docs/work-with-data/query/pipes) is a transformation definition comprised of a series of SQL statements. You can build metrics through a series of short, composable [nodes](https://www.tinybird.co/docs/docs/work-with-data/query/pipes#nodes) of SQL. Think of Pipes as a way to build SQL queries without always needing to write common table expressions or subqueries, as these can be split out into reusable, independent nodes. For example, here's a simple single-Node Pipe definition that calculates the season batting average for each player: ##### player\_batting\_percentages.pipe SELECT player_name AS "Player Name", sum(stat_hits)/sum(stat_at_bats) AS "Batting Percentage" FROM baseball_game_stats GROUP BY "Player Name" ORDER BY "Batting Percentage" DESC Create your first Pipe from your newly-created BigQuery Data Source by selecting “Create Pipe” in the top right corner of the Tinybird UI. Paste in the SQL above and run the query. Rename the Pipe `player_batting_percentages`. 
Naming your Pipe something descriptive is important, as the Pipe name will be used as the URL slug for your API Endpoint later on. ## 4. Extend Pipes with Query Parameters¶ Every good dashboard is interactive. You can make your Tinybird queries interactive using Tinybird's templating language to [generate query parameters](https://www.tinybird.co/docs/docs/work-with-data/query/query-parameters) . In Tinybird, you add query parameters using `{{(,}}` , defining the data type of the parameter, its name, and an optional default value. For example, you can extend the SQL query in the previous step to dynamically change the number of results returned from the Pipe, by using a `limit` parameter and a default value of 10: ##### player\_batting\_percentages.pipe plus query parameters SELECT player_name AS "Player Name", sum(stat_hits)/sum(stat_at_bats) AS "Batting Percentage" FROM baseball_game_stats GROUP BY "Player Name" ORDER BY "Batting Percentage" DESC LIMIT {{UInt16(limit, 10, description="The number of results to display")}} Replace the SQL in your Pipe with this code snippet. Run the query and rename the node `endpoint`. The `%` character at the start of a Tinybird SQL query shows there's a [dynamic query parameter](https://www.tinybird.co/docs/docs/work-with-data/query/query-parameters#define-dynamic-parameters) coming up. ## 5. Publish your Pipes as APIs¶ The magic of Tinybird is that you can instantly publish your Pipes as fully-documented, scalable REST APIs instantly. From the Pipe definition in the Tinybird UI, select “Create API Endpoint” in the top right corner, select the `endpoint` Node. Congratulations! You just ingested BigQuery data, transformed it, and published it as a Tinybird API Endpoint! ### Create additional Pipes¶ Create these additional 5 Pipes (they can also be found in the [project repository](https://github.com/tinybirdco/bigquery-dashboard/tree/main/data-project/pipes) ). Rename them as they are titled in each snippet, and call each node `endpoint` . Read through the SQL to get a sense of what each query does, then run and publish each one as its own API Endpoint: ##### batting\_percentage\_over\_time % SELECT game_date AS "Game Date", sum(stat_hits)/sum(stat_at_bats) AS "Batting Percentage" FROM baseball_game_stats WHERE player_team = {{String(team_name, 'BOS', required=True)}} GROUP BY "Game Date" ORDER BY "Game Date" ASC ##### most\_hits\_by\_type % SELECT player_name AS name, sum({{ column(hit_type, 'stat_hits') }}) AS value FROM baseball_game_stats GROUP BY name ORDER BY value DESC LIMIT 7 ##### opponent\_batting\_percentages % SELECT game_opponent AS "Team", sum(stat_hits) / sum(stat_at_bats) AS "Opponent Batting Percentage" FROM baseball_game_stats GROUP BY "Team" ORDER BY "Opponent Batting Percentage" ASC LIMIT {{ UInt16(limit, 10) }} ##### player\_batting\_percentages % SELECT player_name AS "Player Name", sum(stat_hits)/sum(stat_at_bats) AS "Batting Percentage" FROM baseball_game_stats GROUP BY "Player Name" ORDER BY "Batting Percentage" DESC LIMIT {{UInt16(limit, 10)}} ##### team\_batting\_percentages % SELECT player_team AS "Team", sum(stat_hits) / sum(stat_at_bats) AS "Batting Percentage" FROM baseball_game_stats GROUP BY "Team" ORDER BY "Batting Percentage" DESC LIMIT {{ UInt16(limit, 10) }} 1 Data Source, 6 Pipes: Perfect. Onto the next step. ## 6. 
Create a Next.js app¶ This tutorial uses Next.js, but you can visualize Tinybird APIs just about anywhere, for example with an app-building tool like [Retool](https://www.tinybird.co/blog-posts/service-data-sources-and-retool) or a monitoring platform like [Grafana](https://www.tinybird.co/blog-posts/tinybird-grafana-plugin-launch). In your terminal, create a project folder and inside it create your Next.js app, using all the default options: ##### Create a Next app mkdir bigquery-tinybird-dashboard cd bigquery-tinybird-dashboard npx create-next-app Tinybird APIs are accessible via [Tokens](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens) . In order to run your dashboard locally, you'll need to create a `.env.local` file at the root of your new project: ##### Create .env.local at root of my-app touch .env.local And include the following: ##### Set up environment variables NEXT_PUBLIC_TINYBIRD_HOST="YOUR TINYBIRD API HOST" # Your regional API host e.g. https://api.tinybird.co NEXT_PUBLIC_TINYBIRD_TOKEN="YOUR SIGNING TOKEN" # Use your Admin Token as the signing token Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace. ## 7. Define your APIs in code¶ To support the dashboard components you're about to build, it's a great idea to create a helper file that contains all your Tinybird API references. In the project repo, that's called `tinybird.js` and it looks like this: ##### tinybird.js helper file const playerBattingPercentagesURL = (host, token, limit) => `https://${host}/v0/pipes/player_batting_percentages.json?limit=${limit}&token=${token}` const teamBattingPercentagesURL = (host, token, limit) => `https://${host}/v0/pipes/team_batting_percentages.json?limit=${limit}&token=${token}` const opponentBattingPercentagesURL = (host, token, limit) => `https://${host}/v0/pipes/opponent_batting_percentages.json?limit=${limit}&token=${token}` const battingPercentageOverTimeURL = (host, token, team_name) => `https://${host}/v0/pipes/batting_percentage_over_time.json?team_name=${team_name}&token=${token}` const hitsByTypeURL = (host, token, hit_type) => `https://${host}/v0/pipes/most_hits_by_type.json?hit_type=${hit_type}&token=${token}` const fetchTinybirdUrl = async (fetchUrl, setData, setLatency) => { const data = await fetch(fetchUrl) const jsonData = await data.json(); setData(jsonData.data); setLatency(jsonData.statistics.elapsed) } export { fetchTinybirdUrl, playerBattingPercentagesURL, teamBattingPercentagesURL, opponentBattingPercentagesURL, battingPercentageOverTimeURL, hitsByTypeURL } Inside `/src/app` , create a new subfolder called `/services` and paste the snippet into a new `tinybird.js` helper file. ## 8. Build your dashboard components¶ This tutorial uses the [Tremor React library](https://tremor.so/) because it provides a clean UI out of the box with very little code. You could easily use [ECharts](https://echarts.apache.org/en/index.html) or something similar if you prefer. ### Add Tremor to your Next.js app¶ You're going to use Tremor to create a simple bar chart that displays the signature count for each organization. Tremor provides stylish React chart components that you can deploy easily and customize as needed. Inside your app folder, install Tremor with the CLI: ##### Install Tremor npx @tremor/cli@latest init Select Next as your framework and allow Tremor to overwrite your existing `tailwind.config.js`. 
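Before creating the component files, it helps to know the response shape the `tinybird.js` helpers consume: a published Pipe's `.json` endpoint returns a `data` array of rows plus a `statistics` object, and the components read `data` and `statistics.elapsed` (query time in seconds). The following throwaway script is a sketch for inspecting one endpoint; the file name, import path, host, and token values are placeholders.

```ts
// check-endpoint.ts (hypothetical file): print what one Tinybird endpoint returns.
import { playerBattingPercentagesURL } from './src/app/services/tinybird.js'

const host = 'api.tinybird.co'     // your regional API host
const token = '<YOUR READ TOKEN>'  // a Token with read access to the Pipe

const url = playerBattingPercentagesURL(host, token, 5)
const res = await fetch(url)
const json = await res.json()

console.log(json.data)                // rows, e.g. [{ "Player Name": "...", "Batting Percentage": 0.31 }, ...]
console.log(json.statistics.elapsed)  // query time in seconds; the dashboard shows it multiplied by 1000 as ms
```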
### Create dashboard component files¶ Your final dashboard contains 3 Bar Charts, 1 Area Chart, and 1 Bar List. You'll use Tremor Cards to display these components, and each one will have an interactive input. In addition, you'll show the API response latency underneath the Chart (just so you can show off about how “real-timey” the dashboard is). Here's the code for the Player Batting Percentages component ( `playerBattingPercentages.js` ). It sets up the file, defines the `limit` parameter, then renders the Chart components:

"use client";

import { Card, Title, Subtitle, BarChart, Text, NumberInput, Flex } from '@tremor/react'; // Tremor components
import React, { useState, useEffect } from 'react';
import { fetchTinybirdUrl, playerBattingPercentagesURL } from '../services/tinybird.js' // Tinybird API

// utilize useState/useEffect to get data from Tinybird APIs on change
const PlayerBattingPercentages = ({host, token}) => {
  const [player_batting_percentages, setData] = useState([{
    "Player Name": "",
    "Batting Percentage": 0,
  }]);
  // set latency from the API response
  const [latency, setLatency] = useState(0);
  // set limit parameter when the component input is changed
  const [limit, setLimit] = useState(10);
  // format the numbers on the component
  const valueFormatter = (number) => `${new Intl.NumberFormat("us").format(number).toString()}`;
  // set the Tinybird API URL with query parameters
  let player_batting_percentages_url = playerBattingPercentagesURL(host, token, limit)

  useEffect(() => {
    fetchTinybirdUrl(player_batting_percentages_url, setData, setLatency)
  }, [player_batting_percentages_url]);

  // build the Tremor component
  return (
    <Card>
      <Title>Player Batting Percentages</Title>
      <Subtitle>All Players</Subtitle>
      <Flex>
        <Text># of Results</Text>
        <NumberInput value={limit} onValueChange={(value) => setLimit(value)} />
      </Flex>
      {/* Build the bar chart with the data received from the Tinybird API */}
      <BarChart
        data={player_batting_percentages}
        index="Player Name"
        categories={["Batting Percentage"]}
        valueFormatter={valueFormatter}
      />
      <Text>Latency: {latency*1000} ms</Text> {/* Add the latency metric */}
    </Card>
  );
};

export default PlayerBattingPercentages;

In the project repo, you'll find the 5 dashboard components you need, inside the `src/app/components` directory. Each one renders a dashboard component to display the data received by one of the Tinybird APIs. It's time to build them out. For this tutorial, just recreate the same files in your app, pasting in the JavaScript (or downloading the files and dropping them into your app directory). When building your own dashboard in the future, use this as a template and build to fit your needs! ## 9. Compile components into a dashboard¶ Final step! Update your `page.tsx` file to render a nicely-organized dashboard with your 5 components: Replace the contents of `page.tsx` with [this file](https://github.com/tinybirdco/bigquery-dashboard/blob/main/src/app/page.js). The logic in this page gets your Tinybird Token from your local environment variables to be able to access the Tinybird APIs, then renders the 5 components you just built in a Tremor [Grid](https://blocks.tremor.so/blocks/grid-lists). To visualize your dashboard, run it locally with `npm run dev` and open http://localhost:3000. You'll see your complete real-time dashboard! <-figure-> ![Analytics dashboard built with BigQuery data, Tinybird Endpoints, and Tremor components in a Next.js app](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Ftutorial-bigquery-dashboard.png&w=3840&q=75) Notice the latencies in each dashboard component. This is the Tinybird API request latency. This isn't using any sort of cache or query optimization; each request is directly querying the 20,000 rows in the table and returning a response. As you interact with the dashboard and change inputs, the APIs respond. In this case, that's happening in just a few milliseconds. Now ***that's*** a fast dashboard. ### Optional: Expand your dashboard¶ You've got the basics: An active Workspace and Data Source, knowledge of how to build Pipes, and access to the [Tremor docs](https://www.tremor.so/docs/getting-started/installation) . Build out some more Pipes, API Endpoints, and visualizations! You can also spend some time [optimizing your data project](https://www.tinybird.co/docs/docs/work-with-data/query/sql-best-practices) for faster responses and minimal data processing using fine-tuned indexes, [Materialized Views](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/materialized-views) , and more. ## Next steps¶ - Investigate the[ GitHub repository for this project](https://github.com/tinybirdco/bigquery-dashboard) in more depth. - Understand today's real-time analytics landscape with[ Tinybird's definitive guide](https://www.tinybird.co/blog-posts/real-time-analytics-a-definitive-guide) . - Learn how to implement[ multi-tenant security](https://www.tinybird.co/blog-posts/multi-tenant-saas-options) in your user-facing analytics. --- URL: https://www.tinybird.co/docs/publish/charts/guides/charts-using-iframes-and-jwt-tokens Last update: 2025-01-17T07:58:01.000Z Content: --- title: "Build charts with iframes and JWTs. · Tinybird Docs" theme-color: "#171612" description: "Tinybird Charts make it easy to create interactive charts. In this guide, you'll learn how to build them using iframes and JWTs."
--- # Build charts with iframes and JWTs¶ In this guide, you'll learn how to build Tinybird Charts using inline frames (iframes) and secure them using JSON Web Tokens (JWTs). [Tinybird Charts](https://www.tinybird.co/docs/docs/publish/charts) make it easy to visualize your data and create interactive charts. As soon as you've published an API Endpoint, you can create a Chart from the Tinybird UI, then immediately embed it in your frontend application. Check out the [live demo](https://guide-tinybird-charts.vercel.app/) to see an example of Charts in action. [JWTs](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens#json-web-tokens-jwts) are signed tokens that allow you to securely authorize and share data between your application and Tinybird. ## Prerequisites¶ This guide assumes that you have a Tinybird Workspace with active data and one or more published API Endpoints. You'll need a basic familiarity with JavaScript and Python. ## 1. Create a Chart¶ [Create a new Chart](https://www.tinybird.co/docs/docs/publish/charts) based on any of your API Endpoints. ## 2. View the Chart component code¶ After saving your Chart, you're on the API Endpoint "Overview" page. Your Chart should be visible in the "Output" section. To view the component code for the Chart, select the code symbol ( `<>` ) above it. Select the dropdown and select the iframe example, instead of the default React one. Now select the JWT so the Static Token shown in the iframe's URL is replaced by a `` placeholder. <-figure-> ![Get the iframe code](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-iframes-jwts__get-iframe-code.png&w=3840&q=75) ## 3. Insert your iframe into a new page¶ Now that you have your Chart code, create a new `index.html` file to paste the code into: ##### index.html Tinybird Charts In the next step, you'll generate a JWT to replace the `` placeholder. ## 4. Create a JWT¶ ### Understanding the token exchange¶ <-figure-> ![Generate a new token](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fguides-iframes-jwts__token-exchange.png&w=3840&q=75) For each user session (or any other approach you want to follow), your frontend application sends a request for a JWT to your backend. The token can be a new one or an existing one. Your backend self-signs the token and returns it. From that point onwards, you can use this token for any Chart or API call made to Tinybird, directly from your frontend application. JWTs support TTL and include multi-tenancy capabilities, which makes them safe to use without creating any complex middleware. ### Create a JWT endpoint¶ Create a new endpoint that your frontend will use to retrieve your token.
Remember to set your `TINYBIRD_SIGNING_KEY`: ##### Generate a new token from flask import Flask, jsonify, render_template import jwt import datetime import os app = Flask(__name__) # Get your Tinybird admin Token and use it as the signing key; read it from an environment variable instead of hardcoding it TINYBIRD_SIGNING_KEY = os.environ["TINYBIRD_SIGNING_KEY"] # Generate Token function for a specific pipe_id def generate_jwt(): expiration_time = datetime.datetime.utcnow() + datetime.timedelta(hours=48) workspace_id = "1f484a32-6966-4f63-9312-aadad64d3e12" token_name = "charts_token" pipe_id = "t_b9427fe2bcd543d1a8923d18c094e8c1" payload = { "workspace_id": workspace_id, "name": token_name, "exp": expiration_time, "scopes": [ { "type": "PIPES:READ", "resource": pipe_id }, ], } return jwt.encode(payload, TINYBIRD_SIGNING_KEY, algorithm='HS256') @app.route('/generate-token', methods=['GET']) def get_token(): token = generate_jwt() return jsonify({"token": token}) if __name__ == '__main__': app.run(host='0.0.0.0', port=5151)
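Before wiring the token into your iframe, it can help to confirm what the backend is actually signing. The following is a minimal sketch (not part of the guide's code) that fetches a token from the Flask app above and decodes it with the same signing key; it assumes you're running the app locally on port 5151 as shown, that `TINYBIRD_SIGNING_KEY` is set in your environment, and that the third-party `requests` and PyJWT libraries are installed.

```python
# Minimal sketch: fetch a token from the /generate-token endpoint above and
# inspect its claims. Assumptions: the Flask app is running locally on port
# 5151, TINYBIRD_SIGNING_KEY is set in the environment, and the third-party
# `requests` and PyJWT libraries are installed.
import os
import jwt        # PyJWT
import requests

signing_key = os.environ["TINYBIRD_SIGNING_KEY"]
token = requests.get("http://localhost:5151/generate-token").json()["token"]

# Decoding with the signing key also verifies the signature.
claims = jwt.decode(token, signing_key, algorithms=["HS256"])

print(claims["workspace_id"])  # the Workspace the token is scoped to
print(claims["scopes"])        # e.g. [{"type": "PIPES:READ", "resource": "<pipe_id>"}]
print(claims["exp"])           # expiry timestamp (the TTL mentioned above)
```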
## 5. Use the JWT in your iframe¶ Edit your `index.html` file using JavaScript to retrieve a JWT from your API Endpoint, and include this token in your iframe. ##### Update the index.html file Tinybird Charts ## Next steps¶ - Learn more about[ JSON Web Tokens (JWTs)](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens#json-web-tokens-jwts) - Learn more about[ Tinybird Charts](https://www.tinybird.co/docs/docs/publish/charts) - [ Consume APIs in a Next.js frontend using JWTs](https://www.tinybird.co/docs/docs/publish/api-endpoints/guides/consume-apis-nextjs) - [ Add Tinybird Charts to a Next.js frontend](https://www.tinybird.co/docs/docs/publish/charts/guides/add-charts-to-nextjs) --- URL: https://www.tinybird.co/docs/publish/charts/guides/real-time-dashboard Last update: 2025-01-20T11:43:08.000Z Content: --- title: "Build a real-time dashboard with Tremor & Next.js · Tinybird Docs" theme-color: "#171612" description: "Learn how to build a user-facing web analytics dashboard using Tinybird, Tremor, and Next.js." --- # Build a real-time dashboard¶ In this guide you'll learn how to build a real-time analytics dashboard from scratch, for free, using just 3 tools: Tinybird, Tremor, and Next.js. You'll end up with a dashboard and enough familiarity with Tremor to adjust the frontend & data visualization for your own projects in the future. [GitHub Repository](https://github.com/tinybirdco/demo-user-facing-saas-dashboard-signatures) Imagine you're a [DocuSign](https://www.docusign.com/) competitor. You're building a SaaS to disrupt the document signature space, and as a part of that, you want to give your users a real-time data analytics dashboard so they can monitor how, when, where, and what is happening with their documents in real time. In this tutorial, you'll learn how to: 1. Use Tinybird to capture events (like a document being sent, signed, or received) using the Tinybird Events API. 2. Process them with SQL. 3. Publish the transformations as real-time APIs. 4. Use Tremor components in a Next.js app to build a clean, responsive, real-time dashboard. Here's how it all fits together: <-figure-> ![Diagram showing the data flow from Tinybird --> Next.js --> Tremor](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Ftutorial-real-time-dashboard-data-flow.png&w=3840&q=75) ## Prerequisites¶ To complete this tutorial, you'll need the following: 1. A[ free Tinybird account](https://www.tinybird.co/signup) 2. Node.js >=18 3. Python >=3.8 4. Working familiarity with JavaScript This tutorial uses both the Tinybird web UI and the Tinybird CLI. If you're not familiar with the Tinybird CLI, [read the CLI docs](https://www.tinybird.co/docs/docs/cli/install) or just give it a go! You can copy and paste every code snippet and command in this tutorial - each step is clearly explained. ## 1. Create a Tinybird Workspace¶ Navigate to the Tinybird web UI ( [app.tinybird.co](https://app.tinybird.co/) ) and create an empty Tinybird Workspace (no starter kit) called `signatures_dashboard` in your preferred region. ## 2. Create the folder structure¶ In your terminal, create a folder called `tinybird-signatures-dashboard` . This folder is going to contain all your code. Inside it, create a bunch of folders to keep things organized: ##### Create the folder structure mkdir tinybird-signatures-dashboard && cd tinybird-signatures-dashboard mkdir datagen datagen/utils app tinybird The final structure will be: ##### Folder structure └── tinybird-signatures-dashboard ├── app ├── datagen │ └── utils └── tinybird ## 3. Install the Tinybird CLI¶ The Tinybird CLI is a command-line tool that allows you to interact with Tinybird's API. You will use it to create and manage the data project resources that underpin your real-time dashboard. Run the following commands to prepare the virtual environment, install the CLI, and authenticate (the `-i` flag is for "interactive"): ##### Install the Tinybird CLI python -m venv .venv source .venv/bin/activate pip install tinybird-cli tb auth -i Choose the region that matches your Workspace region (if you're not sure which region you chose, don't worry: In the Tinybird UI, select the name of the Workspace (top left) and the region is shown under your email address). You'll then be prompted for your [user admin Token](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens) , which lives in the Tinybird UI under "Tokens". Paste it into the CLI and press enter. You're now authenticated to your Workspace from the CLI, and your auth details are saved in a `.tinyb` file in the current working directory. Your user admin Token has full read/write privileges for your Workspace. Don't share it or publish it in your application. You can find more detailed info about Static Tokens [in the Tokens docs](https://www.tinybird.co/docs/docs/get-started/administration/auth-tokens). Ensure that the `.tinyb` file and the `.venv` folder aren't publicly exposed by creating a `.gitignore` file and adding it: ##### Housekeeping: Hide your Token! touch .gitignore echo ".tinyb" >> .gitignore echo ".venv" >> .gitignore ## 4. Create a mock data stream¶ Now download the [mockDataGenerator.js](https://github.com/tinybirdco/demo-user-facing-saas-dashboard-signatures/blob/main/datagen/mockDataGenerator.js) file and place it in the `datagen` folder.
##### Mock data generator cd datagen curl -O https://raw.githubusercontent.com/tinybirdco/demo-user-facing-saas-dashboard-signatures/refs/heads/main/datagen/mockDataGenerator.js ### What this file does¶ The `mockDataGenerator.js` script generates mock user accounts, with fields like `account_id`, `organization`, `phone` , and various certification statuses related to the account's means of identification: ##### Create fake account data const generateAccountPayload = () => { const status = ["active", "inactive", "pending"]; const id = faker.number.int({ min: 10000, max: 99999 }); account_id_list.push(id); return { account_id: id, organization: faker.company.name(), status: status[faker.number.int({ min: 0, max: 2 })], role: faker.person.jobTitle(), certified_SMS: faker.datatype.boolean(), phone: faker.phone.number(), email: faker.internet.email(), person: faker.person.fullName(), certified_email: faker.datatype.boolean(), photo_id_certified: faker.datatype.boolean(), created_on: (faker.date.between({ from: '2020-01-01', to: '2023-12-31' })).toISOString().substring(0, 10), timestamp: Date.now(), } } In addition, the code generates mock data events about the document signature process, with variable status values such as `in_queue`, `signing`, `expired`, `error` , and more: const generateSignaturePayload = (account_id, status, signatureType, signature_id, since, until, created_on) => { return { signature_id, account_id, status, signatureType, since: since.toISOString().substring(0, 10), until: until.toISOString().substring(0, 10), created_on: created_on.toISOString().substring(0, 10), timestamp: Date.now(), uuid: faker.string.uuid(), } } Lastly, the generator creates and sends a final status for the signature using weighted values: const finalStatus = faker.helpers.weightedArrayElement([ { weight: 7.5, value: 'completed' }, { weight: 1, value: 'expired' }, { weight: 0.5, value: 'canceled' }, { weight: 0.5, value: 'declined' }, { weight: 0.5, value: 'error' }, ]) // 7.5/10 chance of being completed, 1/10 chance of being expired, 0.5/10 chance of being canceled, declined or error ### Download the helper functions¶ This script also utilizes a couple of helper functions to access your Tinybird Token and send the data to Tinybird with an HTTP request using the Tinybird Events API. These helper functions are located in the `tinybird.js` file in the repo. [Download that file](https://github.com/tinybirdco/demo-user-facing-saas-dashboard-signatures/blob/main/datagen/utils/tinybird.js) and add it to the `datagen/utils` directory. ##### Helper functions cd datagen/utils curl -O https://raw.githubusercontent.com/tinybirdco/demo-user-facing-saas-dashboard-signatures/refs/heads/main/datagen/utils/tinybird.js The Tinybird Events API is useful for two reasons: 1. It allows for the flexible and efficient ingestion of data, representing various stages of signatures, directly into the Tinybird platform without needing complex streaming infrastructure. 2. It allows you to stream events directly from your application, instead of relying on batch ETLs or change data capture, which require events to first be logged in a transactional database and can add lag to the data pipeline.
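If you're curious about what those helper functions boil down to, here's a minimal sketch (not the tutorial's JavaScript helpers) of sending one mock account event straight to the Tinybird Events API from Python. The `api.tinybird.co` host, the `accounts` Data Source name, the example field values, and the third-party `requests` library are assumptions for illustration; use a Token with append permissions.

```python
# Minimal sketch: send one mock account event to the Tinybird Events API.
# Assumptions: the api.tinybird.co region, a TINYBIRD_TOKEN env var holding a
# Token that can append to the `accounts` Data Source, illustrative field
# values, and the third-party `requests` library.
import json
import os
import time

import requests

event = {
    "account_id": 12345,
    "organization": "ACME",
    "status": "active",
    "certified_SMS": True,
    "timestamp": int(time.time() * 1000),
}

resp = requests.post(
    "https://api.tinybird.co/v0/events",
    params={"name": "accounts"},  # target Data Source
    headers={"Authorization": f"Bearer {os.environ['TINYBIRD_TOKEN']}"},
    data=json.dumps(event),       # one NDJSON row per line
)
print(resp.status_code, resp.text)
```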
### Install the Faker library¶ Run this command: ##### Install Faker cd datagen npm init --yes npm install @faker-js/faker To run this file and start sending mock data to Tinybird, you need to create a custom script in the generated `package.json` file inside the `datagen` folder. Open up that file and add the following to the scripts: ##### Add seed npm script "seed": "node ./mockDataGenerator.js" Note that since your code is using ES modules, you'll need to add `"type": "module"` to the `package.json` file to be able to run the script and access the modules. For more information on why, [read this helpful post](https://www.codeconcisely.com/posts/nextjs-esm/). Your package.json should now look something like this: ##### package.json { "name": "datagen", "version": "1.0.0", "description": "", "main": "index.js", "type": "module", "scripts": { "seed": "node ./mockDataGenerator.js" }, "dependencies": { "@faker-js/faker": "^8.4.1" }, "license": "ISC", "author": "" } Okay: You're ready to start sending mock data to Tinybird. Open up a new terminal tab or window in this local project directory and, in the `datagen` folder, run: ##### Generate mock data! npm run seed Congratulations! You should see the seed output in your terminal. Let this run in the background so you have some data for the next steps. Return to your original terminal tab or window and move onto the next steps. ### Verify your mock data stream¶ To verify that the data is flowing properly into Tinybird, inspect the Tinybird Data Sources. In the Tinybird UI, navigate to the `signatures` and `accounts` [Data Sources](https://www.tinybird.co/docs/docs/get-data-in/data-sources) to confirm that the data has been received. The latest records should be visible. You can also confirm using the CLI, by running a SQL command on your Data Source: tb sql "select count() from signatures" If you run this a few times, and your mock data stream is still running, you'll see this number increase. Neat. This project uses mock data streams to simulate data generated by a hypothetical document signatures app. If you have your own app that's generating data, you don't need to do this! You can just add the helper functions to your codebase and call them to send data directly from your app to Tinybird. ## 5. Build dashboard metrics with SQL¶ You now have a Data Source: Events streaming into Tinybird, which ensures your real-time dashboard has access to fresh data. The next step is to build real-time metrics using [Tinybird Pipes](https://www.tinybird.co/docs/docs/work-with-data/query/pipes). A Pipe is a set of chained, composable nodes of SQL that process, transform, and enrich data in your Data Sources. Create a new Pipe in the Tinybird UI by selecting the + icon in the left-hand nav bar and selecting "Pipe". Rename your new Pipe `ranking_of_top_organizations_creating_signatures`. Next, time to make your first Node! Remove the placeholder text from the Node, and paste the following SQL in: % SELECT account_id, {% if defined(completed) %} countIf(status = 'completed') total {% else %} count() total {% end %} FROM signatures WHERE fromUnixTimestamp64Milli(timestamp) BETWEEN {{ Date( date_from, '2023-01-01', description="Initial date", required=True, ) }} AND {{ Date( date_to, '2024-01-01', description="End date", required=True ) }} GROUP BY account_id HAVING total > 0 ORDER BY total DESC Key points to understand in this snippet: 1. As well as standard SQL, it uses the Tinybird[ templating language and query parameters](https://www.tinybird.co/docs/docs/work-with-data/query/query-parameters) - you can tell when query params are used, because the `%` symbol appears at the top of the query.
This makes the query *dynamic*, so instead of hardcoding the date range, the user can now select a range and have the results refresh in real time. 2. It has an `if defined` statement. In this case, if a boolean parameter called `completed` is passed, the Pipe calculates the number of completed signatures. Otherwise, it calculates all signatures. Select "Run" to run and save this Node, then rename it `retrieve_signatures` . Below this Node, create a second one. Remove the placeholder text and paste the following SQL in: ##### Second Node SELECT organization, sum(total) AS org_total FROM retrieve_signatures LEFT JOIN accounts ON accounts.account_id = retrieve_signatures.account_id GROUP BY organization ORDER BY org_total DESC LIMIT {{Int8(limit, 10, description="The number of accounts to retrieve", required=False)}} Name this node `endpoint` and select "Run" to save it. You now have a 2-Node Pipe that gets the top `` number of organizations by signatures within a date range, either completed or total depending on whether a completed query parameter is passed or not. ## 6. Publish metrics as APIs¶ You're now ready to build a low-latency, high-concurrency REST API Endpoint from your Pipe - with just 2 clicks! Select the "Create API Endpoint" button at top right, then select the `endpoint` Node. You'll be greeted with an API page that contains a usage monitoring chart, parameter documentation, and sample usage. In addition, the API has been secured through an automatically-generated, read-only Token. ### Test your API¶ Copy the HTTP API Endpoint from the "Sample usage" box and paste it directly into a new browser tab to see the response. In the URL, you can manually adjust the `date_from` and `date_to` parameters and see the different responses. You can also adjust the `limit` parameter, which controls how many rows are returned. If you request the data in a JSON format (the default behavior), you'll also receive some metadata about the response, including statistics about the query latency: ##### Example Tinybird API statistics "statistics": { "elapsed": 0.001110996, "rows_read": 4738, "bytes_read": 101594 } You'll notice that the API response in this example took barely 1 millisecond (which is... pretty fast) so your dashboards are in good hands when it comes to being ultra responsive. When building out your own projects in the future, use this metadata [and Tinybird's other tools](https://www.tinybird.co/docs/docs/monitoring/health-checks) to monitor and optimize your dashboard query performance.
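If you prefer testing from a script rather than a browser tab, the following is a minimal sketch (not part of the tutorial's code) that calls the published Endpoint with its `date_from`, `date_to`, and `limit` parameters and prints the latency reported in the `statistics` block. The `api.tinybird.co` host, the `TINYBIRD_READ_TOKEN` environment variable, and the third-party `requests` library are assumptions; use your regional host and the read Token generated for the Endpoint.

```python
# Minimal sketch: call the published Endpoint with its query parameters and
# print the latency Tinybird reports. Assumptions: the api.tinybird.co region,
# a TINYBIRD_READ_TOKEN env var holding the Endpoint's read Token, and the
# third-party `requests` library.
import os
import requests

resp = requests.get(
    "https://api.tinybird.co/v0/pipes/ranking_of_top_organizations_creating_signatures.json",
    params={
        "token": os.environ["TINYBIRD_READ_TOKEN"],
        "date_from": "2023-01-01",
        "date_to": "2024-01-01",
        "limit": 5,
        # "completed": "true",  # uncomment to count only completed signatures
    },
)
resp.raise_for_status()
body = resp.json()

for row in body["data"]:
    print(row["organization"], row["org_total"])

print("elapsed (s):", body["statistics"]["elapsed"])
```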
### Optional: Pull the Tinybird resources into your local directory¶ At this point, you've created a bunch of Tinybird resources: A Workspace, a Data Source, Pipes, and an API Endpoint. You can pull these resources down locally, so that you can manage this project with Git. In your terminal, start by pulling the Tinybird data project: ##### In the root directory tb pull --auto You'll see a confirmation that 3 resources ( `signatures.datasource`, `accounts.datasource` , and `ranking_of_top_organizations_creating_signatures.pipe` ) were written into two subfolders, `datasources` and `pipes` , which were created by using the `--auto` flag. Move them into the `tinybird` directory: ##### Move to /tinybird directory mv datasources pipes tinybird/ As you add additional resources in the Tinybird UI, use the `tb pull --auto` command to pull files from Tinybird. You can then add them to your Git commits and push them to your remote repository. If you create data project resources locally using the CLI, you can push them to the Tinybird server with `tb push` . For more information on managing Tinybird data projects in the CLI, check out [this CLI overview](https://www.tinybird.co/docs/docs/cli/quick-start). ## 7. Create a real-time dashboard¶ Now that you have a low-latency API with real-time dashboard metrics, you're ready to create the visualization layer using Next.js and Tremor. These two tools provide a scalable and responsive interface that integrates with Tinybird's APIs to display data dynamically. Plus, they look great. ## Initialize your Next.js project¶ In your terminal, move into the `app` folder you created earlier and create your Next.js app with this command. In this tutorial you'll use plain JavaScript files and Tailwind CSS: ##### Create a Next app cd app npx create-next-app . --js --tailwind --eslint --src-dir --app --import-alias "@/*" ### Add Tremor to your Next.js app¶ You're going to use Tremor to create a simple bar chart that displays the signature count for each organization. Tremor provides stylish React chart components that you can deploy easily and customize as needed. Install Tremor with the CLI: ##### Install Tremor npx @tremor/cli@latest init Select Next as your framework and allow Tremor to overwrite your existing `tailwind.config.js`. ### Add SWR to your Next.js app¶ You're going to use [SWR](https://swr.vercel.app/) to handle the API Endpoint data and refresh it every 5 seconds. SWR is a great React library to avoid dealing with data caching and revalidating complexity on your own. Plus, you can define what refresh policy you want to follow. Take a look at [its docs](https://swr.vercel.app/docs/revalidation) to learn about the different revalidation strategies. ##### Install SWR npm i swr ### Set up environment variables¶ Next, you need to add your Tinybird host and user admin Token as environment variables so you can run the project locally. Create a `.env.local` file in the root of your Next.js app (the `app` folder) and add the following: ##### Set up environment variables NEXT_PUBLIC_TINYBIRD_HOST="YOUR TINYBIRD API HOST" # Your regional API host e.g. https://api.tinybird.co NEXT_PUBLIC_TINYBIRD_TOKEN="YOUR SIGNING TOKEN" # Use your Admin Token as the signing token Replace the Tinybird API hostname or region with the [API region](https://www.tinybird.co/docs/docs/api-reference#regions-and-endpoints) that matches your Workspace. ### Set up your page.js¶ Next.js created a `page.js` as part of the bootstrap process. Open it in your preferred code editor and clear the contents. Paste in the snippets in order from the following sections, understanding what each one does: ### Import UI libraries¶ To build your dashboard component, you will need to import various UI elements and functionalities from the libraries provided at the beginning of your file. Note the use of the `"use client";` directive to render the components on the client side. For more details on this, check out the [Next.js docs](https://nextjs.org/docs/app/building-your-application/rendering#network-boundary).
##### Start building page.js "use client"; import { BarChart, Card, Subtitle, Text, Title } from "@tremor/react"; import React from "react"; import useSWR from "swr"; ### Define constants¶ Inside your main component, define the constants required for this specific component: ##### Add environment variables and states // Get your Tinybird host and Token from the .env file const TINYBIRD_HOST = process.env.NEXT_PUBLIC_TINYBIRD_HOST; // The host URL for the Tinybird API const TINYBIRD_TOKEN = process.env.NEXT_PUBLIC_TINYBIRD_TOKEN; // The access Token for authentication with the Tinybird API const REFRESH_INTERVAL_IN_MILLISECONDS = 5000; // five seconds ### Connect your dashboard to your Tinybird API¶ You'll need to write a function to fetch data from Tinybird. Note that for the sake of brevity, this snippet hardcodes the dates and uses the default limit in the Tinybird API. You could set up a Tremor datepicker and/or number input if you wanted to dynamically update the dashboard components from within the UI. ##### Define query parameters and Tinybird fetch function export default function Dashboard() { // Define date range for the query const today = new Date(); // Get today's date const dateFrom = new Date(new Date(today).setMonth(today.getMonth() - 1)); // Set the query's dateFrom to one month before today const dateTo = new Date(new Date(today).setMonth(today.getMonth() + 1)); // Set the query's dateTo to one month from today // Format for passing as a query parameter const dateFromFormatted = dateFrom.toISOString().substring(0, 10); const dateToFormatted = dateTo.toISOString().substring(0, 10); // Constructing the URL for fetching data, including host, token, and date range const endpointUrl = new URL( "/v0/pipes/ranking_of_top_organizations_creating_signatures.json", TINYBIRD_HOST ); endpointUrl.searchParams.set("token", TINYBIRD_TOKEN); endpointUrl.searchParams.set("date_from", dateFromFormatted); endpointUrl.searchParams.set("date_to", dateToFormatted); // Initializes variables for storing data let ranking_of_top_organizations_creating_signatures, latency, errorMessage; try { // Function to fetch data from Tinybird URL and parse JSON response const fetcher = (url) => fetch(url).then((r) => r.json()); // Using SWR hook to handle state and refresh result every five seconds const { data, error } = useSWR(endpointUrl.toString(), fetcher, { refreshInterval: REFRESH_INTERVAL_IN_MILLISECONDS, }); if (error) { errorMessage = error; return; } if (!data) return; if (data?.error) { errorMessage = data.error; return; } ranking_of_top_organizations_creating_signatures = data.data; // Setting the state with the fetched data latency = data.statistics?.elapsed; // Setting the state with the query latency from Tinybird } catch (e) { console.error(e); errorMessage = e; } ### Render the Component¶ Finally, include the rendering code to display the "Ranking of the top organizations creating signatures" in the component's return statement: ##### Render the dashboard component

  return (
    <Card>
      <Title>Top Organizations Creating Signatures</Title>
      <Subtitle>Ranked from highest to lowest</Subtitle>
      {ranking_of_top_organizations_creating_signatures && (
        <BarChart
          data={ranking_of_top_organizations_creating_signatures}
          index="organization"
          categories={["org_total"]}
        />
      )}
      {latency && <Text>Latency: {latency * 1000} ms</Text>}
      {errorMessage && (
        <div>
          <Text>Oops, something happened: {errorMessage}</Text>
          <Text>Check your console for more information</Text>
        </div>
      )}
    </Card>
  );
}

### View your dashboard!¶ It's time! Run `npm run dev` and navigate to `http://localhost:3000/` in your browser. You should see something like this: <-figure-> ![Diagram showing the data flow from Tinybird --> Next.js --> Tremor](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Ftutorial-real-time-dashboard-data-flow.png&w=3840&q=75) Congratulations! You've created a real-time dashboard component using Tinybird, Tremor, and Next.js. You'll notice the dashboard is rendering very quickly by taking a peek at the latency number below the component. In this example case, Tinybird returned the data for the dashboard in a little over 40 milliseconds aggregating over about a million rows. Not too bad for a relatively un-optimized query! ### Optional: Expand your dashboard¶ You've got the basics: An active Workspace and Data Source, knowledge of how to build Pipes, and access to the [Tremor docs](https://www.tremor.so/docs/getting-started/installation) . Build out some more Pipes, API Endpoints, and visualizations! <-figure-> ![Dashboard showing more visualizations](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Ftutorial-real-time-dashboard-further-examples.png&w=3840&q=75) You can also spend some time [optimizing your data project](https://www.tinybird.co/docs/docs/work-with-data/query/sql-best-practices) for faster responses and minimal data processing using fine-tuned indexes, [Materialized Views](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/materialized-views) , and more. ## Next steps¶ - Investigate the[ GitHub repository for this project](https://github.com/tinybirdco/demo-user-facing-saas-dashboard-signatures) in more depth. - Understand today's real-time analytics landscape with[ Tinybird's definitive guide](https://www.tinybird.co/blog-posts/real-time-analytics-a-definitive-guide) . - Learn how to implement[ multi-tenant security](https://www.tinybird.co/blog-posts/multi-tenant-saas-options) in your user-facing analytics. --- URL: https://www.tinybird.co/docs/publish/sinks Content: --- title: "Sinks · Tinybird Docs" theme-color: "#171612" description: "Sinks are the destinations for your data. They are the places where you can store your data after it has been transformed." --- # Sinks¶ Tinybird Sinks allow you to export data from your Tinybird Workspace to external systems on a scheduled or on-demand basis. Sinks are built on top of [Pipes](https://www.tinybird.co/docs/docs/work-with-data/query/pipes) and provide a fully managed way to push data to various destinations. ## Available Sinks¶ Tinybird supports the following Sink destinations: - [ Kafka Sink](https://www.tinybird.co/docs/publish/sinks/kafka-sink) - [ S3 Sink](https://www.tinybird.co/docs/publish/sinks/s3-sink) ## Key features¶ - Fully Managed: Sinks require no additional tooling or infrastructure management. - Scheduled or On-Demand: Run exports on a defined schedule using cron expressions or trigger them manually when needed. - Query Parameters: Support for parameterized queries allows flexible data filtering and transformation. - Observability: Monitor Sink operations and data transfer through Service Data Sources. ## Common use cases¶ Sinks enable various data integration scenarios: - Regular data exports to clients or partner systems. - Feeding data lakes and data warehouses. - Real-time data synchronization with external systems. - Event-driven architectures and data pipelines.
--- URL: https://www.tinybird.co/docs/publish/sinks/kafka-sink Last update: 2025-01-07T15:48:10.000Z Content: --- title: "Kafka Sink · Tinybird Docs" theme-color: "#171612" description: "Push events to Kafka on a batch-based schedule using Tinybird's fully managed Kafka Sink Connector." --- # Kafka Sink¶ Kafka Sinks are currently in private beta. If you have any feedback or suggestions, contact Tinybird at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community). Tinybird's Kafka Sink allows you to push the results of a query to a Kafka topic. Queries can be executed on a defined schedule or on-demand. Common uses for the Kafka Sink include: - Pushing events to Kafka as part of an event-driven architecture. - Exporting data to other systems that consume data from Kafka. - Hydrating a data lake or data warehouse with real-time data. Tinybird represents Sinks using the icon. ## Prerequisites¶ To use the Kafka Sink, you need to have a Kafka cluster that Tinybird can reach via the internet, or via private networking for Enterprise customers. ## Configure using the UI¶ ### 1. Create a Pipe and promote it to Sink Pipe¶ In the Tinybird UI, create a Pipe and write the query that produces the result you want to export. In the top right "Create API Endpoint" menu, select "Create Sink". In the modal, choose the destination (Kafka). ### 2. Choose the scheduling options¶ You can configure your Sink to run using a cron expression, so it runs automatically when needed. ### 3. Configure destination topic¶ Enter the Kafka topic where events are going to be pushed. ### 4. Preview and create¶ The final step is to check and confirm that the preview matches what you expect. Congratulations! You've created your first Sink. ## Configure using the CLI¶ ### 1. Create the Kafka Connection¶ Run the `tb connection create kafka` command, and follow the instructions. ### 2. Create Kafka Sink Pipe¶ To create a Sink Pipe, create a regular .pipe and filter the data you want to export to your topic in the SQL section as in any other Pipe. Then, specify the Pipe as a sink type and add the needed configuration. Your Pipe should have the following structure: NODE node_0 SQL > SELECT * FROM events WHERE time >= toStartOfMinute(now()) - interval 30 minute TYPE sink EXPORT_SERVICE kafka EXPORT_CONNECTION_NAME "test_kafka" EXPORT_KAFKA_TOPIC "test_kafka_topic" EXPORT_SCHEDULE "*/5 * * * *" **Pipe parameters** For this step, you will need to configure the following [Pipe parameters](https://www.tinybird.co/docs/docs/cli/datafiles/pipe-files#sink-pipe): | Key | Type | Description | | --- | --- | --- | | EXPORT_CONNECTION_NAME | string | Required. The connection name to the destination service. This is the connection created in Step 1. | | EXPORT_KAFKA_TOPIC | string | Required. The desired topic for the export data. | | EXPORT_SCHEDULE | string | A crontab expression that sets the frequency of the Sink operation or the @on-demand string. | Once ready, push the datafile to your Workspace using `tb push` (or `tb deploy` if you are using [version control](https://www.tinybird.co/docs/docs/work-with-data/organize-your-work) ) to create the Sink Pipe.
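To check what a scheduled run actually pushed, you can consume a few messages from the destination topic. This is a minimal sketch using the third-party `kafka-python` package, not part of Tinybird's tooling; the broker address is a placeholder and `test_kafka_topic` is just the topic name from the example above.

```python
# Minimal sketch: read a few messages from the Sink's destination topic.
# Assumptions: the third-party kafka-python package (pip install kafka-python),
# a reachable broker at the placeholder address below, and the
# "test_kafka_topic" topic from the example .pipe file.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "test_kafka_topic",
    bootstrap_servers="broker.example.com:9092",  # placeholder; use your cluster
    auto_offset_reset="earliest",
    consumer_timeout_ms=10_000,  # stop iterating after 10s without messages
)

for message in consumer:
    print(message.offset, message.value.decode("utf-8"))

consumer.close()
```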
## Scheduling¶ The schedule applied doesn't guarantee that the underlying job executes immediately at the configured time. The job is placed into a job queue when the configured time elapses. It is possible that, if the queue is busy, the job could be delayed and executed after the scheduled time. To reduce the chances of a busy queue affecting your Sink Pipe execution schedule, distribute the jobs over a wider period of time rather than grouping them close together. For Enterprise customers, these settings can be customized. Reach out to your Customer Success team or email us at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co). ## Query parameters¶ You can add [query parameters](https://www.tinybird.co/docs/docs/work-with-data/query/query-parameters) to your Sink, the same way you do in API Endpoints or Copy Pipes. For scheduled executions, the default values for the parameters will be used when the Sink runs. ## Iterating a Kafka Sink (Coming soon)¶ Iterating features for Kafka Sinks aren't yet supported in the beta. They are documented here for future reference. Sinks can be iterated using [version control](https://www.tinybird.co/docs/docs/work-with-data/organize-your-work) , similar to other resources in your project. When you create a Branch, resources are cloned from the main Branch. However, there are two considerations for Kafka Sinks to understand: **1. Schedules** When you create a Branch with an existing Kafka Sink, the resource will be cloned into the new Branch. However, **it will not be scheduled** . This prevents Branches from running exports unintentionally and consuming resources, as it's common that development Branches don't need to export to external systems. If you want these queries to run in a Branch, you must recreate the Kafka Sink in the new Branch. **2. Connections** Connections aren't cloned when you create a Branch. You need to create a new Kafka connection in the new Branch for the Kafka Sink. ## Observability¶ Kafka Sink operations are logged in the [tinybird.sinks_ops_log](https://www.tinybird.co/docs/docs/monitoring/service-datasources#tinybird-sinks-ops-log) Service Data Source. ## Limits & quotas¶ Check the [limits page](https://www.tinybird.co/docs/docs/get-started/plans/limits) for limits on ingestion, queries, API Endpoints, and more. ## Billing¶ Any Processed Data incurred by a Kafka Sink is charged at the standard rate for your account. The Processed Data is already included in your plan, and counts towards your commitment. If you're on an Enterprise plan, view your plan and commitment on the [Organizations](https://www.tinybird.co/docs/docs/get-started/administration/organizations) tab in the UI. ## Next steps¶ - Get familiar with the[ Service Data Source](https://www.tinybird.co/docs/docs/monitoring/service-datasources) and see what's going on in your account - Deep dive on Tinybird's[ Pipes concept](https://www.tinybird.co/docs/docs/work-with-data/query/pipes) --- URL: https://www.tinybird.co/docs/publish/sinks/s3-sink Last update: 2025-01-07T15:48:10.000Z Content: --- title: "S3 Sink · Tinybird Docs" theme-color: "#171612" description: "Offload data to S3 on a batch-based schedule using Tinybird's fully managed S3 Sink Connector." --- # S3 Sink¶ Tinybird's S3 Sink allows you to offload data to Amazon S3, either on a pre-defined schedule or on demand. It's good for a variety of different scenarios where Amazon S3 is the common ground, for example: - You're building a platform on top of Tinybird, and need to send data extracts to your clients on a regular basis. - You want to export new records to Amazon S3 every day, so you can load them into Snowflake to run ML recommendation jobs. - You need to share the data you have in Tinybird with other systems in your organization, in bulk.
Tinybird represents Sinks using the icon. The Tinybird S3 Sink feature is available for Professional and Enterprise plans (see ["Tinybird plans"](https://www.tinybird.co/docs/docs/get-started/plans) ). If you are on a Build plan but want to access this feature, you can upgrade to Professional directly from your account Settings, or contact us at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co). ## About Tinybird's S3 Sink¶ ### How it works¶ Tinybird's S3 Sink is fully managed and requires no additional tooling. You create a new connection to an Amazon S3 bucket, then choose a Pipe whose result gets written to Amazon S3. Tinybird provides you with complete observability and control over the executions, resulting files, their size, data transfer, and more. ### Sink Pipes¶ The Sink connector is built on Tinybird's Sink Pipes, an extension of the [Pipes](https://www.tinybird.co/docs/docs/work-with-data/query/pipes) concept, similar to [Copy Pipes](https://www.tinybird.co/docs/docs/work-with-data/process-and-copy/copy-pipes) . Sink Pipes allow you to capture the result of a Pipe at a moment in time, and store the output. Currently, Amazon S3 is the only service Tinybird's Sink Pipes support. Sink Pipes can be run on a schedule, or executed on demand. ### Supported regions¶ The Tinybird S3 Sink feature only supports exporting data to the following AWS regions: - `us-east-*` - `us-west-*` - `eu-central-*` - `eu-west-*` - `eu-south-*` - `eu-north-*` ### Prerequisites¶ To use the Tinybird S3 Sink feature, you should be familiar with Amazon S3 buckets and have the necessary permissions to set up a new policy and role in AWS. ### Static or dynamic¶ When configuring an S3 Sink, you can specify a file template for the exported files. This template can be static or dynamic. If you use a static filename in the file template, it means you're not including any dynamic elements like timestamps or column values. The behavior when using a static filename depends on the write strategy you choose. There are two options: 1. `new` (default): If you use this strategy with a static filename, each execution of the sink adds a new file with a suffix to avoid overwriting existing files. 2. `truncate` : With this strategy, if you use a static filename, each execution replaces the existing file with the same name. You can set the write strategy through CLI configuration. You can also override these settings for individual executions of the sink pipe. This means you can potentially change the file template or write strategy for a specific run. While static filenames are possible, use some form of dynamic naming or partitioning to avoid potential conflicts or data loss, especially when dealing with recurring exports. ### Scheduling considerations¶ The schedule applied to a [Sink Pipe](https://www.tinybird.co/docs/publish/sinks/s3-sink#sink-pipes) doesn't guarantee that the underlying job executes immediately at the configured time. The job is placed into a job queue when the configured time elapses. It is possible that, if the queue is busy, the job could be delayed and executed after the scheduled time. To reduce the chances of a busy queue affecting your Sink Pipe execution schedule, distribute the jobs over a wider period of time rather than grouping them close together. For Enterprise customers, these settings can be customized. Reach out to your Customer Success team or email us at [support@tinybird.co](https://www.tinybird.co/docs/mailto:support@tinybird.co).
### Query parameters¶ You can add [query parameters](https://www.tinybird.co/docs/docs/work-with-data/query/query-parameters) to your Sink Pipes, the same way you do in API Endpoints or Copy Pipes. - For on-demand executions, you can set parameters when you trigger the Sink Pipe to whatever values you wish. - For scheduled executions, the default values for the parameters will be used when the Sink Pipe runs. ## Set up¶ The setup process involves configuring both Tinybird and AWS: 1. Create your Pipe and promote it to Sink Pipe 2. Create the AWS S3 connection 3. Choose the scheduling options 4. Configure destination path and file names 5. Preview and trigger your new Sink Pipe ### Using the UI¶ #### 1. Create a Pipe and promote it to Sink Pipe¶ In the Tinybird UI, create a Pipe and write the query that produces the result you want to export. In the top right "Create API Endpoint" menu, select "Create Sink". In the modal, choose the destination (Amazon S3), and enter the bucket name and region. Follow the step-by-step process on the modal to guide you through the AWS setup steps, or use the docs below. #### 2. Create the AWS S3 Connection¶ ##### 2.1. Create the S3 access policy First, create an IAM policy that grants the IAM role permissions to write to S3. Open the AWS console and navigate to IAM > Policies, then select “Create Policy”: <-figure-> ![image](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fs3-sink-step-1-access-policy.png&w=3840&q=75) On the “Specify Permissions” screen, select “JSON” and paste the policy generated in the UI by clicking on the Copy icon. It'll look something like this: { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:GetObject", "s3:PutObject", "s3:PutObjectAcl" ], "Resource": "arn:aws:s3:::<your-bucket-name>/*" }, { "Effect": "Allow", "Action": [ "s3:GetBucketLocation", "s3:ListBucket" ], "Resource": "arn:aws:s3:::<your-bucket-name>" } ] } Select “Next”, add a memorable name in the following dialog box (you'll need it later!), and select “Create Policy”. ##### 2.2. Create the IAM role In the AWS console, navigate to IAM > Roles and select “Create Role”: <-figure-> ![image](https://www.tinybird.co/docs/docs/_next/image?url=%2Fdocs%2Fimg%2Fs3-sink-step-3-roles.png&w=3840&q=75) On the “Select trusted entity” dialog box, select the “Custom trust policy” option. Copy the trust policy JSON generated in the Tinybird UI modal and paste it in. Select "Next". On the “Add permissions” screen, find the policy for S3 access you just created and tick the checkbox to the left of it. Select "Next" and give it a meaningful name and description. Confirm that the trusted entities and permissions granted are the expected ones, and select "Create Role". You'll need the role's ARN (Amazon Resource Name) in order to create the connection in the next step. To save you having to come back and look for it, go to IAM > Roles and use the search box to find the role you just created. Select it to open more role details, including the role's ARN. Copy it down somewhere you can find it easily again. It'll look something like `arn:aws:iam::111111111111:role/my-awesome-role`. Return to Tinybird's UI and enter the role ARN and Connection name in the modal. The Connection to AWS S3 is now created in Tinybird, and can be reused in multiple Sinks. #### 3. Choose the scheduling options¶ You can configure your Sink to run "on demand" (meaning you'll need to manually trigger it) or using a cron expression, so it runs automatically when needed. #### 4.
Configure destination path and file names¶ Enter the bucket URI where files are generated and the file name template. When generating multiple files, the Sink creates them using this template. You have multiple ways to configure this, see the [File template](https://www.tinybird.co/docs/publish/sinks/s3-sink#file-template) section. #### 5. Preview and create¶ The final step is to check and confirm that the preview matches what you expect. Congratulations! You've created your first Sink. Trigger it manually using the "Run Sink now" option in the top right menu, or wait for the next scheduled execution. When triggering a Sink Pipe you have the option of overriding several of its settings, like format or compression. Refer to the [Sink Pipes API spec](https://www.tinybird.co/docs/docs/api-reference/sink-pipes-api) for the full list of parameters. Once the Sink Pipe is triggered, it creates a standard Tinybird job that can be followed via the `v0/jobs` API.
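After a run completes, you can sanity-check the exported objects directly in the bucket. The following is a minimal sketch (not part of Tinybird's tooling) using the third-party `boto3` library; the bucket and prefix names are placeholders, and it assumes AWS credentials with read access to that bucket are configured locally.

```python
# Minimal sketch: list the files a Sink Pipe execution wrote to the bucket.
# Assumptions: the third-party boto3 library, locally configured AWS
# credentials with read access, and placeholder bucket/prefix names.
import boto3

s3 = boto3.client("s3")

response = s3.list_objects_v2(Bucket="tinybird-sinks", Prefix="daily_prices")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"], obj["LastModified"])
```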
### Using the CLI¶ #### 1. Create the AWS S3 Connection¶ To create a connection for an S3 Sink Pipe you need to use a CLI version equal to or higher than 3.5.0. To start: 1. Run the `tb connection create s3_iamrole` command. 2. Copy the suggested policy and replace the two bucket placeholders with your bucket name. 3. Log into your AWS Console. 4. Create a new policy in AWS IAM > Policies using the copied text. 5. Create a new role in AWS IAM > Roles and attach the policy you just created, as described in the UI steps above. In the next step, you'll need the role's ARN (Amazon Resource Name) to create the connection. Go to IAM > Roles and use the search box to find the role you just created. Select it to open more role details, including the role's ARN. Copy it and paste it into the CLI when requested. It'll look something like `arn:aws:iam::111111111111:role/my-awesome-role`. Then, you will need to type the region where the bucket is located and choose a name to identify your connection within Tinybird. Once you have completed all these inputs, Tinybird will check access to the bucket and create the connection with the connection name you selected. #### 2. Create S3 Sink Pipe¶ To create a Sink Pipe, create a regular .pipe and filter the data you want to export to your bucket in the SQL section as in any other Pipe. Then, specify the Pipe as a sink type and add the needed configuration. Your Pipe should have the following structure: NODE node_0 SQL > SELECT * FROM events WHERE time >= toStartOfMinute(now()) - interval 30 minute TYPE sink EXPORT_SERVICE s3_iamrole EXPORT_CONNECTION_NAME "test_s3" EXPORT_BUCKET_URI "s3://tinybird-sinks" EXPORT_FILE_TEMPLATE "daily_prices" # Supports partitioning EXPORT_SCHEDULE "*/5 * * * *" # Optional EXPORT_FORMAT "csv" EXPORT_COMPRESSION "gz" # Optional EXPORT_WRITE_STRATEGY "truncate" **Sink Pipe parameters** See the [Sink Pipe parameter docs](https://www.tinybird.co/docs/docs/cli/datafiles/pipe-files#sink-pipe) for more information. For this step, your details will be: | Key | Type | Description | | --- | --- | --- | | EXPORT_CONNECTION_NAME | string | Required. The connection name to the destination service. This is the connection created in Step 1. | | EXPORT_BUCKET_URI | string | Required. The path to the destination bucket. Example: `s3://tinybird-export` | | EXPORT_FILE_TEMPLATE | string | Required. The target file name. Can use parameters to dynamically name and partition the files. See File partitioning section below. Example: `daily_prices_{customer_id}` | | EXPORT_FORMAT | string | Optional. The output format of the file. Values: CSV, NDJSON, Parquet. Default value: CSV | | EXPORT_COMPRESSION | string | Optional. Accepted values: `none` , `gz` for gzip, `br` for brotli, `xz` for LZMA, `zst` for zstd. Default: `none` | | EXPORT_SCHEDULE | string | A crontab expression that sets the frequency of the Sink operation or the @on-demand string. | | EXPORT_WRITE_STRATEGY | string | Optional. The write mode, defining whether files with the same name are replaced ( `truncate` ) or not ( `new` ). Accepted values: `new` , `truncate` . Default: `new` | When ready, push the datafile to your Workspace using `tb push` or `tb deploy` to create the Sink Pipe. ## File template¶ The export process allows you to partition the result into different files, so you can organize your data and get smaller files. The partitioning is defined in the file template and is based on the values of columns of the result set. ### Partition by column¶ Add a template variable like `{COLUMN_NAME}` to the filename. For instance, consider the following query schema and result for an export: | customer_id | invoice_id | amount | | --- | --- | --- | | ACME | INV20230608 | 23.45 | | ACME | 12345INV | 12.3 | | GLOBEX | INV-ABC-789 | 35.34 | | OSCORP | INVOICE2023-06-08 | 57 | | ACME | INV-XYZ-98765 | 23.16 | | OSCORP | INV210608-001 | 62.23 | | GLOBEX | 987INV654 | 36.23 | With the given file template `invoice_summary_{customer_id}.csv` you'd get 3 files: `invoice_summary_ACME.csv` | customer_id | invoice_id | amount | | --- | --- | --- | | ACME | INV20230608 | 23.45 | | ACME | 12345INV | 12.3 | | ACME | INV-XYZ-98765 | 23.16 | `invoice_summary_OSCORP.csv` | customer_id | invoice_id | amount | | --- | --- | --- | | OSCORP | INVOICE2023-06-08 | 57 | | OSCORP | INV210608-001 | 62.23 | `invoice_summary_GLOBEX.csv` | customer_id | invoice_id | amount | | --- | --- | --- | | GLOBEX | INV-ABC-789 | 35.34 | | GLOBEX | 987INV654 | 36.23 | ### Values format¶ In the case of DateTime columns, it can be dangerous to partition just by the column. Why? Because you could end up with one file per distinct value, and a DateTime column can have a different value every second. In an hour, that's potentially 3600 files. To help partition in a sensible way, you can add a format string to the column name using the following placeholders: | Placeholder | Description | Example | | --- | --- | --- | | %Y | Year | 2023 | | %m | Month as an integer number (01-12) | 06 | | %d | Day of the month, zero-padded (01-31) | 07 | | %H | Hour in 24h format (00-23) | 14 | | %i | Minute (00-59) | 45 | For instance, for a result like this: | timestamp | invoice_id | amount | | --- | --- | --- | | 2023-07-07 09:07:05 | INV20230608 | 23.45 | | 2023-07-07 09:07:01 | 12345INV | 12.3 | | 2023-07-07 09:06:45 | INV-ABC-789 | 35.34 | | 2023-07-07 09:05:35 | INVOICE2023-06-08 | 57 | | 2023-07-06 23:14:05 | INV-XYZ-98765 | 23.16 | | 2023-07-06 23:14:02 | INV210608-001 | 62.23 | | 2023-07-06 23:10:55 | 987INV654 | 36.23 | Note that all 7 events have different times in the `timestamp` column. Using a file template like `invoices_{timestamp}` would create 7 different files. If you were interested in writing one file per hour, you could use a file template like `invoices_{timestamp, '%Y%m%d-%H'}` .
You'd then get only two files for that dataset: `invoices_20230707-09.csv` | timestamp | invoice_id | amount | | --- | --- | --- | | 2023-07-07 09:07:05 | INV20230608 | 23.45 | | 2023-07-07 09:07:01 | 12345INV | 12.3 | | 2023-07-07 09:06:45 | INV-ABC-789 | 35.34 | | 2023-07-07 09:05:35 | INVOICE2023-06-08 | 57 | `invoices_20230706-23.csv` | timestamp | invoice_id | amount | | --- | --- | --- | | 2023-07-06 23:14:05 | INV-XYZ-98765 | 23.16 | | 2023-07-06 23:14:02 | INV210608-001 | 62.23 | | 2023-07-06 23:10:55 | 987INV654 | 36.23 | ### By number of files¶ You also have the option to write the result into X files. Instead of using a column name, use an integer between brackets. Example: `invoice_summary.{8}.csv` This is convenient to reduce the file size of the result, especially when the files are meant to be consumed by other services, like Snowflake, where uploading big files is discouraged. The results are written in random order: the final result rows are spread across the X files, but you can't rely on which rows end up in which file or in what order. There is a maximum of 16 files. ### Combining different partitions¶ It's possible to add more than one partitioning parameter in the file template. This is useful, for instance, when you do a daily dump of data, but want to export one file per hour. Setting the file template as `invoices/dt={timestamp, '%Y-%m-%d'}/H{timestamp, '%H'}.csv` would create the following file structure in different days and executions: Invoices ├── dt=2023-07-07 │ └── H23.csv │ └── H22.csv │ └── H21.csv │ └── ... ├── dt=2023-07-06 │ └── H23.csv │ └── H22.csv You can also mix column names and number of files. For instance, setting the file template as `invoices/{customer_id}/dump_{4}.csv` would create the following file structure in different days and executions: Invoices ├── ACME │ └── dump_0.csv │ └── dump_1.csv │ └── dump_2.csv │ └── dump_3.csv ├── OSCORP │ └── dump_0.csv │ └── dump_1.csv │ └── dump_2.csv │ └── dump_3.csv Be careful with excessive partitioning. Take into consideration that the write process will create as many files as combinations of the values of the partitioning columns for a given result set.
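To build intuition for how a file template fans rows out into files, here's a small illustrative sketch (not Tinybird's implementation) that applies a `{column, 'format'}`-style template to the invoice rows from the examples above and shows which file each row would land in.

```python
# Illustrative sketch only (not Tinybird's implementation): group result rows
# into file names the way a template like "invoices_{timestamp, '%Y%m%d-%H'}"
# partitions an export. Rows are taken from the examples above.
from collections import defaultdict
from datetime import datetime

rows = [
    {"timestamp": "2023-07-07 09:07:05", "invoice_id": "INV20230608", "amount": 23.45},
    {"timestamp": "2023-07-07 09:06:45", "invoice_id": "INV-ABC-789", "amount": 35.34},
    {"timestamp": "2023-07-06 23:14:05", "invoice_id": "INV-XYZ-98765", "amount": 23.16},
]

def render(column, fmt, row):
    """Render one partition key, e.g. ("timestamp", "%Y%m%d-%H") -> "20230707-09"."""
    value = row[column]
    if fmt:
        value = datetime.strptime(value, "%Y-%m-%d %H:%M:%S").strftime(fmt)
    return str(value)

files = defaultdict(list)
for row in rows:
    files[f"invoices_{render('timestamp', '%Y%m%d-%H', row)}.csv"].append(row)

for name, contents in files.items():
    print(name, "->", len(contents), "rows")
# invoices_20230707-09.csv -> 2 rows
# invoices_20230706-23.csv -> 1 rows
```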
For an example of how to create a Sink Pipe to S3 with version control, see [this repository](https://github.com/tinybirdco/use-case-examples/tree/main/create_pipe_sink).

## Observability¶

Sink Pipe operations are logged in the [tinybird.sinks_ops_log](https://www.tinybird.co/docs/docs/monitoring/service-datasources#tinybird-sinks-ops-log) Service Data Source. Data Transfer incurred by Sink Pipes is tracked in the [tinybird.data_transfer](https://www.tinybird.co/docs/docs/monitoring/service-datasources#tinybird-data-transfer) Service Data Source.

## Limits & quotas¶

Check the [limits page](https://www.tinybird.co/docs/docs/get-started/plans/limits) for limits on ingestion, queries, API Endpoints, and more.

## Billing¶

Tinybird uses two metrics for billing Sink Pipes: Processed Data and Data Transfer. A Sink Pipe executes the Pipe's query (Processed Data) and writes the result into a bucket (Data Transfer). If the resulting files are compressed, Tinybird accounts for the compressed size.

### Processed Data¶

Any Processed Data incurred by a Sink Pipe is charged at the standard rate for your account. The Processed Data is already included in your plan and counts towards your commitment. If you're on an Enterprise plan, view your plan and commitment on the [Organizations](https://www.tinybird.co/docs/docs/get-started/administration/organizations) tab in the UI.

### Data Transfer¶

Data Transfer depends on your environment. There are two scenarios:

- The destination bucket is in the **same** cloud provider and region as your Tinybird Workspace: $0.01 / GB
- The destination bucket is in a **different** cloud provider or region than your Tinybird Workspace: $0.10 / GB

### Enterprise customers¶

Tinybird includes 50 GB for free for every Enterprise customer, so you can test the feature and validate your use case. After that, Tinybird can set up a meeting to understand your use case and adjust your contract to accommodate the necessary Data Transfer.

## Next steps¶

- Get familiar with the [Service Data Sources](https://www.tinybird.co/docs/docs/monitoring/service-datasources) and see what's going on in your account.
- Deep dive into Tinybird's [Pipes concept](https://www.tinybird.co/docs/docs/work-with-data/query/pipes).

--- URL: https://www.tinybird.co/docs/sql-reference Last update: 2025-01-08T13:04:07.000Z Content: --- title: "SQL reference · Tinybird Docs" theme-color: "#171612" description: "SQL reference for Tinybird" ---

# SQL reference¶

Tinybird supports the following statements, data types, and functions in queries.

## SQL statements¶

The only statement you can use in Tinybird's queries is `SELECT`. The SQL clauses for `SELECT` are fully supported. All other SQL statements are handled by Tinybird's features.
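For illustration, a typical node query is a single `SELECT` combining the usual clauses; the Data Source and column names below are hypothetical:

```sql
SELECT
    customer_id,
    count() AS purchases,
    sum(amount) AS total_amount
FROM events                                  -- hypothetical Data Source
WHERE timestamp >= now() - INTERVAL 7 DAY
GROUP BY customer_id
ORDER BY total_amount DESC
LIMIT 10
```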
## Data types¶

Tinybird supports a variety of data types to store and process different kinds of information efficiently. Data types define the kind of values that can be stored in a column and determine how those values can be used in queries and operations. See [Data types](https://www.tinybird.co/docs/docs/sql-reference/data-types).

The following data types are supported at ingest:

- `Int8` , `Int16` , `Int32` , `Int64` , `Int128` , `Int256`
- `UInt8` , `UInt16` , `UInt32` , `UInt64` , `UInt128` , `UInt256`
- `Float32` , `Float64`
- `Decimal` , `Decimal(P, S)` , `Decimal32(S)` , `Decimal64(S)` , `Decimal128(S)` , `Decimal256(S)`
- `String`
- `FixedString(N)`
- `UUID`
- `Date` , `Date32`
- `DateTime([TZ])` , `DateTime64(P, [TZ])`
- `Bool`
- `Array(T)`
- `Map(K, V)`
- `LowCardinality`
- `Nullable`
- `JSON`

If you are ingesting using the NDJSON format and want to store `Decimal` values containing 15 or more digits, send the values as strings instead of numbers to avoid precision issues. In the following example, the first value has a high chance of losing accuracy during ingestion, while the second one is stored correctly:

```
{"decimal_value": 1234567890.123456789}   # Last digits might change during ingestion
{"decimal_value": "1234567890.123456789"} # Will be stored correctly
```

The `JSON` data type is in private beta. If you are interested in using this type, contact Tinybird at [support@tinybird.co](mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community).

## Table engines¶

Table engines are a crucial component of Tinybird's Data Sources, defining how data is stored, indexed, and accessed. Each table engine is optimized for specific use cases, such as handling large volumes of data, providing high-speed read and write operations, or supporting complex queries and transactions.

Tinybird supports a variety of table engines, including:

- [ MergeTree](https://www.tinybird.co/docs/docs/sql-reference/engines/mergetree) : A general-purpose engine for storing and querying large datasets.
- [ AggregatingMergeTree](https://www.tinybird.co/docs/docs/sql-reference/engines/aggregatingmergetree) : Suitable for aggregating data and reducing storage volume.
- [ ReplacingMergeTree](https://www.tinybird.co/docs/docs/sql-reference/engines/replacingmergetree) : Ideal for deduplicating rows and removing duplicate entries.
- [ SummingMergeTree](https://www.tinybird.co/docs/docs/sql-reference/engines/summingmergetree) : Optimized for summarizing rows and reducing storage volume.
- [ CollapsingMergeTree](https://www.tinybird.co/docs/docs/sql-reference/engines/collapsingmergetree) : Designed for collapsing rows and deleting old object states in the background.
- [ VersionedCollapsingMergeTree](https://www.tinybird.co/docs/docs/sql-reference/engines/versionedcollapsingmergetree) : Allows for collapsing rows and deleting old object states in the background, with support for versioning.
- [ Null](https://www.tinybird.co/docs/docs/sql-reference/engines/null) : A special engine for not storing values.

Choosing the right table engine for your Data Source is essential for optimal performance, data integrity, and query efficiency.
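To see how data types and table engines fit together, here is a minimal sketch of a `.datasource` datafile; the schema, JSONPaths, and engine settings are hypothetical and only illustrate the syntax:

```
DESCRIPTION >
    Hypothetical events Data Source illustrating column types and engine settings.

SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `customer_id` LowCardinality(String) `json:$.customer_id`,
    `invoice_id` String `json:$.invoice_id`,
    `amount` Float64 `json:$.amount`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "customer_id, timestamp"
```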
## Functions¶

Tinybird provides a comprehensive set of built-in functions to help you transform and analyze your data effectively. These functions can be broadly categorized into:

- Aggregate functions: Perform calculations across rows and return a single value, like `count()` , `sum()` , `avg()` .
- String functions: Manipulate and analyze text data with operations like substring, concatenation, and pattern matching.
- Date and time functions: Work with temporal data through date arithmetic, formatting, and time window operations.
- Mathematical functions: Handle numerical computations and transformations.
- Type conversion functions: Convert between different data types safely.
- Array functions: Operate on array columns with filtering, mapping, and reduction operations.
- Conditional functions: Implement if-then-else logic and case statements.
- Window functions: Perform calculations across a set of rows related to the current row.

See [Functions](https://www.tinybird.co/docs/docs/sql-reference/functions).

### Private beta¶

Tinybird supports the following table functions upon request:

- `mysql`
- `url`

## Settings¶

Tinybird supports the following settings:

- `aggregate_functions_null_for_empty`
- `join_use_nulls`
- `group_by_use_nulls`
- `join_algorithm`
- `date_time_output_format`

You can use the settings by adding a `SETTINGS` clause to the final node of your Pipe. For example:

```sql
SELECT id, country_id, name as country_name
FROM events e
LEFT JOIN country c ON e.country_id = c.id
SETTINGS join_use_nulls = 1
```

--- URL: https://www.tinybird.co/docs/sql-reference/data-types Content: --- title: "Data types · Tinybird Docs" theme-color: "#171612" description: "Data types supported by Tinybird" ---

# Data types¶

Data types define how values are stored and processed in a database. They determine what kind of data can be stored in a column (like numbers, text, dates, etc.), how much storage space the data will use, and what operations can be performed on the values. Choosing the right data type is important for both data integrity and query performance.

Each column in a table must have a specified data type that matches the kind of data you plan to store. For example, you would use numeric types for calculations, string types for text, and date/time types for temporal data.

The following data types are supported at ingest:

- `Int8` , `Int16` , `Int32` , `Int64` , `Int128` , `Int256`
- `UInt8` , `UInt16` , `UInt32` , `UInt64` , `UInt128` , `UInt256`
- `Float32` , `Float64`
- `Decimal` , `Decimal(P, S)` , `Decimal32(S)` , `Decimal64(S)` , `Decimal128(S)` , `Decimal256(S)`
- `String`
- `FixedString(N)`
- `UUID`
- `Date` , `Date32`
- `DateTime([TZ])` , `DateTime64(P, [TZ])`
- `Bool`
- `Array(T)`
- `Map(K, V)`
- `LowCardinality`
- `Nullable`
- `JSON`

If you are ingesting using the NDJSON format and want to store `Decimal` values containing 15 or more digits, send the values as strings instead of numbers to avoid precision issues. In the following example, the first value has a high chance of losing accuracy during ingestion, while the second one is stored correctly:

```
{"decimal_value": 1234567890.123456789}   # Last digits might change during ingestion
{"decimal_value": "1234567890.123456789"} # Will be stored correctly
```

The `JSON` data type is in private beta. If you are interested in using this type, contact Tinybird at [support@tinybird.co](mailto:support@tinybird.co) or in the [Community Slack](https://www.tinybird.co/docs/docs/community).

--- URL: https://www.tinybird.co/docs/sql-reference/data-types/aggregatefunction Last update: 2025-01-07T15:48:10.000Z Content: --- title: "AggregateFunction · Tinybird Docs" theme-color: "#171612" description: "Documentation for the AggregateFunction data type." ---

# AggregateFunction¶

This data type isn't supported at ingest. It is only supported at query time and to create Copy Data Sources or Materialized View Data Sources.
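As an illustration of how `AggregateFunction` columns are typically used, the following sketch pairs a `-State` combinator (to populate the column, for example from a Materialized View query) with a `-Merge` combinator (to read it back); in a Pipe, each query would live in its own node. The Data Source, view, and column names are hypothetical:

```sql
-- Materializing query: stores partial aggregation states
-- in an AggregateFunction(avg, Float64) column
SELECT
    toDate(timestamp) AS day,
    avgState(amount) AS avg_amount_state
FROM events                        -- hypothetical Data Source
GROUP BY day

-- Reading query: merges the stored states into final values
SELECT
    day,
    avgMerge(avg_amount_state) AS avg_amount
FROM daily_invoice_stats           -- hypothetical Materialized View Data Source
GROUP BY day
```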