Replacing and deleting data in your Tinybird Data Sources

Intermediate
Data deletion operations are common in transactional databases where your operational data lives, but often due to a data quality process in your operational database you will also need to update or delete your analytical data in Tinybird.

Whether some of the applications ingesting your operational data were buggy or there was a transient error operating in the production database or a change in some regulation, you might need the capability to delete unneeded data or replace outdated data, which affects your analysis in Tinybird.

While realtime analytical databases are optimized for SELECTs and INSERTs, we keep fully supporting other operations needed in data quality management processes. We do that by hiding the complexity of data replication, partitions management or mutations rewriting, so you just have to worry about your data engineering flows and not the internals of realtime analytical databases. This guide will show you how to selectively delete or update data in Tinybird using the Rest API.

Delete data selectively

Data conditional deletion works by firing a POST request to the delete API endpoint providing the name of one of your Data Sources in Tinybird and a {% code-line %}delete_condition{% code-line-end %} parameter, which is an SQL expression filter. This operation is not (yet) supported either in the User Interface or in the CLI.

Imagine you have a Data Source called {% code-line %}events{% code-line-end %} (you can always follow the ingesting data guide to see how to create one) and that you want to remove all the transactions for November 2019. You’d send a POST request like this:

{% tip-box title="Use a token with the right scope" %}Replace {% code-line %}<your_token>{% code-line-end %} by a token whose scope is {% code-line %}DATASOURCES:CREATE{% code-line-end %} or {% code-line %}ADMIN{% code-line-end %}{% tip-box-end %}

Once you make the request, you will see that the {% code-line %}POST{% code-line-end %} request to the delete API endpoint is asynchronous. It returns a Job response, indicating an ID for the job, the status of the job, the {% code-line %}delete_condition{% code-line-end %} and some other metadata. Although the delete operation runs asynchronously (hence the job response), the operation waits synchronously for all the mutations to be rewritten and data replicas to be deleted.

You can periodically poll the {% code-line %}job_url{% code-line-end %} with the given ID to check the status of the deletion process. When it’s {% code-line %}done{% code-line-end %} it means the data matching the SQL expression filter has been removed and all your Pipes and API endpoints will continue running with the remaining data in the Data Source.

Truncate a Data Source

Sometimes you just want to delete all data contained in a Data Source. Most of the time starting from zero. You can do so from within the User Interface – from the Data Source modal window options dropdown – and the APIs. Using the API, the truncate endpoint will delete all rows in a Data Source and can be done as follows:

{% tip-box title="Use a token with the right scope" %}Replace {% code-line %}<your_token>{% code-line-end %} by a token whose scope is {% code-line %}DATASOURCES:CREATE{% code-line-end %} or {% code-line %}ADMIN{% code-line-end %}{% tip-box-end %}

If you wish, you can also Truncate a Data Source directly from the UI:

Deleting selectively is only available via API, but truncating it to delete all of its data can be done via the UI.

Replace data selectively

The ability to update data is often not the top priority when designing analytical databases but there are always scenarios where you need to update or replace your analytical data. For example, you might have reconciliation processes over your transactions that affect your original data or maybe your ingestion process was simply faulty and ingested inaccurate data for a period of time.

In Tinybird, it's possible to specify a condition under which only a part of the data is replaced during the ingestion process. For instance, let’s say you want to reingest a CSV with the data for November 2019 and update your Data Source accordingly. In order to update the data, you just need to pass the {% code-line %}replace_condition{% code-line-end %} parameter with the {% code-line %}toDate(date) >= '2019-11-01' and toDate(date) <= '2019-11-30'{% code-line-end %} condition.

{% tip-box title="Use a token with the right scope" %}Replace {% code-line %}<your_token>{% code-line-end %} by a token whose scope is {% code-line %}DATASOURCES:CREATE{% code-line-end %} or {% code-line %}ADMIN{% code-line-end %}{% tip-box-end %}

The response you'd get after making the previous API call looks like this:

If you have several connected materialized views then selective replaces are done in cascade. For example, if Data Source A materializes data in a cascade to Data Source B and from there to Data Source C, then when you replace data in Data Source A, Data Sources B and C will automatically be updated accordingly. All three Data Sources need to have compatible partition keys since replaces are done by partition.

The command {% code-line %}tb dependencies --datasource the_data_source --check-for-partial-replace{% code-line-end %} returns the dependencies that would be recalculated, both Data Sources and Materialized Views, and raises an error if any of the dependencies have incompatible partition keys.

As in the case of the selective deletion, selective replacement also runs as an asynchronous request, so we recommend you check the status of the job periodically. You can easily see the status of the job going to the {% code-line %}job_url{% code-line-end %} returned in the previous response

{% tip-box title="ENSURE YOU HAVE THE RIGHT PERMISSIONS" %}Please note that the provided Auth token must have the DATASOURCES:CREATE scope.{% tip-box-end %}

Replace a Data Source completely

To replace a complete Data Source, you'd make an API call similar to the previous one, without providing a {% code-line %}replace_condition{% code-line-end %}:

{% tip-box title="Use a token with the right scope" %}Replace {% code-line %}<your_token>{% code-line-end %} by a token whose scope is {% code-line %}DATASOURCES:CREATE{% code-line-end %} or {% code-line %}ADMIN{% code-line-end %}{% tip-box-end %}

The request above is replacing a Data Source with the data found in a given URL pointing to a CSV file but you can also do it through our User Interface as follows:

Replacing a Data Source completely can also be done through the User Interface

{% tip-box title="SCHEMAS MUST BE THE SAME" %}When replacing data (either selectively or a whole Data Source) you need to ensure your Data Schema is the same as in the original Data Source. Rows not containing the same schema will go to quarantine.{% tip-box-end %}

ON THIS GUIDE