Copy Pipes

Copy Pipes are an extension of Tinybird's Pipes. Copy Pipes allow you to capture the result of a Pipe at a moment in time, and write the result into a target Data Source. They can be run on a schedule, or executed on demand.

Copy Pipes are great for use cases like:

  • Event-sourced snapshots, such as change data capture (CDC)
  • Copying data into another Data Source in Tinybird for experimentation
  • De-duplication with snapshots

Copy Pipes should not be confused with Materialized Views. Materialized Views continuously re-evaluate a query as new events are inserted, while Copy Pipes create a single snapshot at a given point in time.

Performance

A Copy Pipe executes the Pipe's query on each run to export the result. This means that the size of the copy operation is tied to the size of your data and the complexity of the Pipe's query. As Copy Pipes can run frequently, we strongly recommend that you follow our best practices for faster SQL to optimize your queries.

In particular, you should pay close attention to the filters used in your queries. Queries in Copy Pipes should always have a time window filter that aligns with the execution schedule. For example, a Copy Pipe that runs once a day typically filters for the previous day's data.
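
As an illustration of aligning the filter with the schedule, the following Python sketch (not Tinybird code) computes the window that a daily Copy Pipe would filter on, assuming each run should cover the previous full UTC day. The function name previous_day_window is hypothetical; in the Pipe itself you would express the same window in SQL, for example with toStartOfDay(now()).

```python
from datetime import datetime, timedelta, timezone


def previous_day_window(run_time: datetime) -> tuple[datetime, datetime]:
    """Return the half-open [start, end) window covering the UTC day before run_time."""
    # Copy Pipe schedules run in UTC, so normalize the run time to UTC first.
    run_utc = run_time.astimezone(timezone.utc)
    # Start of the current UTC day, the same idea as toStartOfDay(now()) at run time.
    today_start = run_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    # The window is the whole previous day: [yesterday 00:00, today 00:00).
    return today_start - timedelta(days=1), today_start


# A daily run at 00:05 UTC on 2024-03-15 covers all of 2024-03-14.
start, end = previous_day_window(datetime(2024, 3, 15, 0, 5, tzinfo=timezone.utc))
print(start.isoformat(), end.isoformat())
```

Using a half-open [start, end) window means consecutive daily runs neither overlap nor leave gaps, so a row timestamped exactly at midnight is copied only once.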

The appropriate time filter varies depending on your use case. If you need help optimizing your Copy Pipes, reach out in our Slack community or email us at support@tinybird.co.

Additionally, apply filters as early as possible in the query, ideally before complex operations such as joins or aggregations, to minimize the amount of data processed.

Configuring Copy Pipes in the CLI

To create a Copy Pipe from the CLI, you need to create a .pipe file. This file follows the same format as any other .pipe file, including defining Nodes that contain your SQL queries. In this file, define the queries that will filter and transform the data as needed. The final result of all queries should be the result that you want to write into a Data Source.

You must define which Node contains the final result. To do this, include the following parameters at the end of the Node:

TYPE COPY
TARGET_DATASOURCE datasource_name
COPY_SCHEDULE --(optional) a cron expression or @on-demand. If not defined, it defaults to @on-demand.

A Pipe can have only one copy Node, and a Copy Pipe cannot have any other outputs, such as Materialized Views or API Endpoints.
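
To make the single-output rule concrete, here is a minimal Python sketch of a pre-push check. The helper check_copy_pipe is hypothetical and is not part of the Tinybird CLI; it assumes every Node output is declared with an unindented TYPE line and only scans for those lines rather than parsing the full .pipe format.

```python
import re


def check_copy_pipe(pipe_text: str) -> None:
    """Raise ValueError unless the .pipe text has exactly one copy Node and no other outputs."""
    # Collect the value of every unindented "TYPE <value>" line, case-insensitively.
    types = [
        m.group(1).lower()
        for m in re.finditer(r"^TYPE[ \t]+(\w+)[ \t]*$", pipe_text, re.MULTILINE | re.IGNORECASE)
    ]
    copy_nodes = types.count("copy")
    other_outputs = [t for t in types if t != "copy"]
    if copy_nodes != 1:
        raise ValueError(f"expected exactly one copy Node, found {copy_nodes}")
    if other_outputs:
        raise ValueError(f"a Copy Pipe cannot have other outputs: {other_outputs}")
    # The copy Node must also name the Data Source it writes into.
    if not re.search(r"^TARGET_DATASOURCE[ \t]+\S+", pipe_text, re.MULTILINE):
        raise ValueError("missing TARGET_DATASOURCE")


example = "NODE daily_sales\nSQL >\n    SELECT 1\n\nTYPE COPY\nTARGET_DATASOURCE sales_hour_copy\n"
check_copy_pipe(example)  # passes silently
```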

Copy Pipes can either be scheduled or executed on demand. This is configured using the COPY_SCHEDULE setting. To schedule a Copy Pipe, set COPY_SCHEDULE to a cron expression. To make a Copy Pipe on-demand, set COPY_SCHEDULE to @on-demand.

Note that all schedules are executed in the UTC timezone. If you are configuring a schedule that runs at a specific time, remember to convert the desired time from your local timezone to UTC.
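
As a sketch of the timezone conversion, the following Python helper (hypothetical, not a Tinybird utility) turns a local wall-clock time into a daily cron expression in UTC. It assumes you know your zone's UTC offset at the time of year you care about.

```python
from datetime import timedelta


def daily_cron_in_utc(local_hour: int, local_minute: int, utc_offset: timedelta) -> str:
    """Build a daily cron expression ("minute hour * * *") that fires at the given local time.

    utc_offset is the local zone's offset from UTC, e.g. timedelta(hours=2) for UTC+2.
    """
    # Express the local time as minutes after midnight, then subtract the offset,
    # wrapping around midnight with the modulo.
    local_minutes = local_hour * 60 + local_minute
    offset_minutes = int(utc_offset.total_seconds() // 60)
    utc_minutes = (local_minutes - offset_minutes) % (24 * 60)
    hour, minute = divmod(utc_minutes, 60)
    return f"{minute} {hour} * * *"


print(daily_cron_in_utc(9, 0, timedelta(hours=2)))               # 0 7 * * *
print(daily_cron_in_utc(1, 30, timedelta(hours=5, minutes=30)))  # 0 20 * * *
```

Because the resulting cron expression is fixed in UTC, the local run time shifts by an hour when your zone enters or leaves daylight saving time. The sketch only handles daily schedules; if your expression also restricts the day of the week, a conversion that crosses midnight moves the day as well.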

Here is an example of a Copy Pipe that is scheduled every hour and writes the results of a query into the sales_hour_copy Data Source:

NODE daily_sales
SQL >
    %
    SELECT toStartOfDay(starting_date) day, country, sum(sales) as total_sales
    FROM teams
    WHERE
    day BETWEEN toStartOfDay(now()) - interval 1 day AND toStartOfDay(now())
    AND country = {{ String(country, 'US') }}
    GROUP BY day, country

TYPE COPY
TARGET_DATASOURCE sales_hour_copy
COPY_SCHEDULE 0 * * * *

Before pushing the Copy Pipe to your Workspace, make sure that the target Data Source already exists and has a schema that matches the output of the query result. Data Sources will not be created automatically when a Copy Pipe runs.
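
A minimal Python sketch of the schema check described above, assuming you have listed the query's output columns and the target Data Source's columns yourself as name-to-type dictionaries. schema_mismatches is a hypothetical helper, the column types shown for the daily_sales query are illustrative, and the strict comparison is an assumption; Tinybird's own compatibility rules may differ.

```python
def schema_mismatches(query_columns: dict[str, str], datasource_columns: dict[str, str]) -> list[str]:
    """List differences between a query's output columns and a target Data Source schema."""
    problems = []
    for name, col_type in query_columns.items():
        expected = datasource_columns.get(name)
        if expected is None:
            problems.append(f"{name}: not in the target Data Source")
        elif expected != col_type:
            problems.append(f"{name}: query returns {col_type}, Data Source expects {expected}")
    # Columns the Data Source defines but the query never produces (sorted for stable output).
    for name in sorted(datasource_columns.keys() - query_columns.keys()):
        problems.append(f"{name}: missing from the query output")
    return problems


# Illustrative types for the daily_sales query and a sales_hour_copy schema.
query_output = {"day": "DateTime", "country": "String", "total_sales": "Int64"}
sales_hour_copy = {"day": "Date", "country": "String", "total_sales": "Int64"}
print(schema_mismatches(query_output, sales_hour_copy))
# ['day: query returns DateTime, Data Source expects Date']
```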

If you push the target Data Source and the Copy Pipe at the same time, be sure to use the --push-deps option in the CLI.

Executing Copy Pipes in the CLI

Copy Pipes can either be scheduled or executed on demand.

When a Copy Pipe is pushed with a schedule, it runs automatically according to the schedule you defined. To pause the scheduler, run tb pipe copy pause [pipe_name]; to resume it, run tb pipe copy resume [pipe_name].

Note that you cannot customize the values of dynamic parameters on a scheduled Copy Pipe. Any parameters will use their default values.

When a Copy Pipe is pushed without a schedule, using the @on-demand directive, you can run tb pipe copy run [pipe_name] to trigger the Copy Pipe as needed. You can pass parameter values to the Copy Pipe by using the param flag, e.g., --param key=value.
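
If you trigger on-demand Copy Pipes from a script, building the command as an argument list avoids shell-quoting problems. The sketch below only assembles the tb pipe copy run invocation described above; the pipe name my_copy_pipe is hypothetical, and passing one --param flag per parameter is an assumption based on the key=value form shown.

```python
import shlex
from typing import Optional


def copy_run_command(pipe_name: str, params: Optional[dict[str, str]] = None) -> list[str]:
    """Build the argv for `tb pipe copy run`, with one --param key=value per parameter."""
    cmd = ["tb", "pipe", "copy", "run", pipe_name]
    for key, value in (params or {}).items():
        cmd += ["--param", f"{key}={value}"]
    return cmd


cmd = copy_run_command("my_copy_pipe", {"country": "ES"})
print(shlex.join(cmd))  # tb pipe copy run my_copy_pipe --param country=ES
# To trigger the copy for real (requires an installed, authenticated tb CLI):
#   import subprocess; subprocess.run(cmd, check=True)
```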

You can run tb job ls to see any running jobs, as well as any jobs that have finished during the last 48 hours.

If you remove a Copy Pipe from your Workspace, the schedule will automatically stop and no more copies will be executed.

Configuring Copy Pipes in the UI

To create a Copy Pipe from the UI, follow the process to create a standard Pipe. After writing your queries, select the Node that contains the final result and click the actions button to the left of the Node (see Mark 1 below). Then click Create Copy Job (see Mark 2 below).

A dialog window opens.

To configure the frequency, use the Frequency drop-down menu to choose whether the Copy Pipe runs on a schedule defined by a cron expression or on demand (see Mark 1 below). If you select cron expression, enter the expression in the text box (see Mark 2 below). If you are unfamiliar with cron expressions, tools like crontab.guru can help. Click Next to continue (see Mark 3 below).
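
For readers new to cron, this simplified Python sketch shows how the five fields (minute, hour, day of month, month, day of week) select run times, by listing the next matching minutes in UTC. It is not Tinybird's scheduler: it supports only *, numbers, ranges, lists, and steps on * or ranges, and it treats any day field other than a bare * as restricted.

```python
from datetime import datetime, timedelta


def expand_field(field: str, low: int, high: int) -> set[int]:
    """Expand one cron field such as '*', '*/15', '1-5', '0,30' or '8-18/2'."""
    values: set[int] = set()
    for part in field.split(","):
        rng, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            start_text, end_text = rng.split("-")
            start, end = int(start_text), int(end_text)
        elif step_text:
            # Forms like '5/15' are not handled by this sketch.
            raise ValueError(f"unsupported field: {part!r}")
        else:
            start = end = int(rng)
        if not low <= start <= end <= high or step < 1:
            raise ValueError(f"invalid field: {part!r}")
        values.update(range(start, end + 1, step))
    return values


def next_runs(expr: str, after: datetime, count: int = 3) -> list[datetime]:
    """Return the next `count` minutes (naive datetimes read as UTC) matching a cron expression."""
    minute, hour, dom, month, dow = expr.split()
    minutes = expand_field(minute, 0, 59)
    hours = expand_field(hour, 0, 23)
    days = expand_field(dom, 1, 31)
    months = expand_field(month, 1, 12)
    weekdays = {d % 7 for d in expand_field(dow, 0, 7)}  # 0 and 7 both mean Sunday

    runs: list[datetime] = []
    t = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    limit = after + timedelta(days=366)  # stop if the expression never matches
    while len(runs) < count and t <= limit:
        dom_ok = t.day in days
        dow_ok = (t.weekday() + 1) % 7 in weekdays  # Python: Monday=0; cron: Sunday=0
        # Classic cron: if both day fields are restricted, either one may match.
        day_ok = (dom_ok or dow_ok) if dom != "*" and dow != "*" else (dom_ok and dow_ok)
        if t.minute in minutes and t.hour in hours and t.month in months and day_ok:
            runs.append(t)
        t += timedelta(minutes=1)
    return runs


for run in next_runs("30 9 * * 1-5", datetime(2024, 3, 15, 10, 0)):
    print(run)  # weekdays at 09:30 UTC, starting Monday 2024-03-18
```

The search simply steps minute by minute, which is slow but easy to follow; it is meant for checking what an expression means, not for production scheduling.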

Note that all schedules are executed in the UTC timezone. If you are configuring a schedule that runs at a specific time, remember to convert the desired time from your local timezone to UTC.

On-demand

If you selected on-demand as the frequency for the Copy Pipe, you can now customize the values of the Pipe's parameters, if it has any. Configurable parameters are listed on the left-hand side, with text boxes to set their values (see Mark 1 below). On the right-hand side, you will see a preview of the results (see Mark 2 below). Click Next to continue (see Mark 3 below).

Finally, use the radio buttons to choose whether the Copy Pipe should write results into a new or an existing Data Source (see Mark 1 below). If you choose an existing Data Source, select it from the drop-down list of your Data Sources (see Mark 2 below). Note that only Data Sources with a compatible schema are shown in the drop-down list. If you choose to create a new Data Source, you will be guided through creating it. Click Next to continue (see Mark 3 below).

If you chose to create a new Data Source, you will now be taken through the standard Create Data Source wizard.

Scheduled

If you selected cron expression as the frequency for the Copy Pipe, you will be shown a preview of the result. You cannot configure parameter values for a scheduled Copy Pipe. Review the results and click Next to continue (see Mark 1 below).

Finally, use the radio buttons to choose whether the Copy Pipe should write results into a new or an existing Data Source (see Mark 1 below). If you choose an existing Data Source, select it from the drop-down list of your Data Sources (see Mark 2 below). Note that only Data Sources with a compatible schema are shown in the drop-down list. If you choose to create a new Data Source, you will be guided through creating it. Click Next to continue (see Mark 3 below).

If you chose to create a new Data Source, you will now be taken through the standard Create Data Source wizard.

Executing Copy Pipes in the UI

To execute a Copy Pipe in the UI, navigate to the Pipe, and click on the Copying button in the top right corner (see Mark 1 below). From the options, select Run copy now (see Mark 2 below).

Note that you cannot customize the values of dynamic parameters on a scheduled Copy Pipe. Any parameters will use their default values.

Iterating a Copy Pipe

Copy Pipes can be iterated using version control just like any other resource in your Data Project. However, you need to understand how connections work in Branches and deployments in order to select the appropriate strategy for your desired changes.

By default, Branches do not execute recurrent jobs on creation, such as the scheduled jobs of Copy Pipes; those jobs continue executing in your Release as usual.

To iterate a Copy Pipe, create a new one (or recreate the existing one) with the desired configuration. The new Copy Pipe also starts executing in the Branch, without affecting the unchanged production resource. This means you can test your changes without mixing test output with your production exports.

This example explains how to change a Copy Pipe's time granularity, including an extra step to backfill the old data.
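
The linked example is not reproduced here, but one common way to backfill old data, assumed for this sketch rather than taken from it, is to split the historical range into small windows and run an on-demand Copy Pipe once per window. The Python sketch below generates those windows and prints the corresponding commands; my_backfill_pipe, start_date, and end_date are hypothetical names.

```python
from datetime import date, timedelta


def backfill_windows(start: date, end: date, days_per_run: int = 1) -> list[tuple[date, date]]:
    """Split [start, end) into consecutive half-open windows of at most days_per_run days."""
    if days_per_run < 1:
        raise ValueError("days_per_run must be at least 1")
    windows = []
    current = start
    while current < end:
        window_end = min(current + timedelta(days=days_per_run), end)
        windows.append((current, window_end))
        current = window_end
    return windows


for window_start, window_end in backfill_windows(date(2024, 3, 1), date(2024, 3, 4)):
    # start_date / end_date are hypothetical parameters of an on-demand Copy Pipe.
    print(f"tb pipe copy run my_backfill_pipe --param start_date={window_start} --param end_date={window_end}")
```

Keeping each window small follows the same reasoning as the Performance section: every run only has to process a bounded slice of data.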

Monitoring

Tinybird provides a high-level metrics page for each Copy Pipe in the UI, and exposes low-level observability data through the Service Data Sources.

You can view high-level status and statistics about your Copy Pipes in the Tinybird UI from the Copy Pipe's details page. To access the details page, navigate to the Pipe, and click on the View Copy Job button in the top right corner (see Mark 1 below).

The details page shows summaries of the Copy Pipe's current status and configuration, as well as charts showing the performance of previous executions.

You can also monitor your Copy Pipes using the datasources_ops_log Service Data Source. This Data Source contains data about all of your operations in Tinybird. Logs that relate to Copy Pipes can be identified by a value of copy in the event_type column.

For example, the following query aggregates the Processed Data from Copy Pipes, for the current month, for a given Data Source name.

SELECT toStartOfMonth(timestamp) month, sum(read_bytes + written_bytes) processed_data
FROM tinybird.datasources_ops_log
WHERE datasource_name = '{YOUR_DATASOURCE_NAME}'
    AND event_type = 'copy'
    AND timestamp >= toStartOfMonth(now())
GROUP BY month

Using this Data Source, you can also write queries to determine average job duration, number of errors, error messages, and more.
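
As a sketch of that kind of analysis, the Python function below summarizes rows that you have already fetched from datasources_ops_log. It assumes each row is a dictionary with event_type, result ('ok' or 'error'), elapsed_time (in seconds), and error keys; check the column names against the Service Data Source before relying on them. The sample rows are made up.

```python
from statistics import mean


def summarize_copy_jobs(rows: list[dict]) -> dict:
    """Summarize Copy Pipe operations: job count, average duration, errors, error messages."""
    copies = [r for r in rows if r["event_type"] == "copy"]
    errors = [r for r in copies if r["result"] == "error"]
    return {
        "jobs": len(copies),
        "avg_duration_s": mean(r["elapsed_time"] for r in copies) if copies else 0.0,
        "errors": len(errors),
        "error_messages": sorted({r["error"] for r in errors if r["error"]}),
    }


# Made-up sample rows; the non-copy row is ignored.
sample_rows = [
    {"event_type": "copy", "result": "ok", "elapsed_time": 2.0, "error": ""},
    {"event_type": "copy", "result": "ok", "elapsed_time": 4.0, "error": ""},
    {"event_type": "copy", "result": "error", "elapsed_time": 3.0, "error": "Timeout"},
    {"event_type": "append", "result": "ok", "elapsed_time": 9.0, "error": ""},
]
print(summarize_copy_jobs(sample_rows))
```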

Billing

Processed Data and Storage are the two metrics that Tinybird uses for billing. A Copy Pipe executes the Pipe's query (Processed Data) and writes the result into a Data Source (Storage).

Any processed data and storage incurred by a Copy Pipe is charged at the standard rate for your billing plan.
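
Because every run re-executes the query, Processed Data grows with both the query size and the schedule frequency. This rough Python estimate multiplies the bytes read and written per run (the same read_bytes + written_bytes measure used in the Monitoring query) by the number of runs in a month. The function name and per-run figures are hypothetical; use values from your own datasources_ops_log.

```python
def estimated_monthly_processed_bytes(read_bytes_per_run: int, written_bytes_per_run: int,
                                      interval_minutes: int, days_in_month: int = 30) -> int:
    """Rough Processed Data estimate: (bytes read + bytes written) per run, times runs per month."""
    runs_per_month = days_in_month * 24 * 60 // interval_minutes
    return runs_per_month * (read_bytes_per_run + written_bytes_per_run)


# Hypothetical figures: each hourly run reads 2 GB and writes 10 MB (720 runs in 30 days).
hourly = estimated_monthly_processed_bytes(2_000_000_000, 10_000_000, interval_minutes=60)
# The same query scheduled every 10 minutes runs six times as often.
every_10_min = estimated_monthly_processed_bytes(2_000_000_000, 10_000_000, interval_minutes=10)
print(hourly, every_10_min)
```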

See the Monitoring section for guidance on monitoring your usage of Copy Pipes.

Limits

Copy Pipes have the following limits, depending on your billing plan:

Copy Pipe limits

Plan | Copy Pipes per Workspace | Execution time | Frequency | Active jobs (running or queued)
Build | 1 | 20 seconds | Once an hour | 1
Pro | 3 | 30 seconds | Up to every 10 minutes | 3
Enterprise | 10 | 50% of the scheduling period, 30 minutes max | Up to every minute | 6
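
To check a planned setup against these limits before deploying, a small Python sketch could encode the table as data. The limit_violations helper is hypothetical, it ignores execution time and queued jobs, and Enterprise values can be customized, so treat the numbers as defaults.

```python
# Copy Pipe limits per plan, from the limits table (Enterprise values are customizable defaults).
PLAN_LIMITS = {
    "build": {"copy_pipes": 1, "min_interval_minutes": 60},
    "pro": {"copy_pipes": 3, "min_interval_minutes": 10},
    "enterprise": {"copy_pipes": 10, "min_interval_minutes": 1},
}


def limit_violations(plan: str, copy_pipe_count: int, interval_minutes: int) -> list[str]:
    """List which of the plan's Copy Pipe limits a planned setup would exceed."""
    limits = PLAN_LIMITS[plan.lower()]
    problems = []
    if copy_pipe_count > limits["copy_pipes"]:
        problems.append(f"{copy_pipe_count} Copy Pipes exceeds the limit of {limits['copy_pipes']}")
    if interval_minutes < limits["min_interval_minutes"]:
        problems.append(
            f"running every {interval_minutes} min is more frequent than every "
            f"{limits['min_interval_minutes']} min"
        )
    return problems


print(limit_violations("Pro", copy_pipe_count=2, interval_minutes=5))
```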

Build and Professional

The schedule applied to a Copy Pipe does not guarantee that the underlying job executes immediately at the configured time. The job is placed into a job queue when the configured time elapses. It is possible that, if the queue is busy, the job could be delayed and executed some time after the scheduled time.

Enterprise

A maximum execution time of 50% of the scheduling period, 30 minutes max, means that if the Copy Pipe is scheduled to run every minute, the operation can take up to 30 seconds. If it is scheduled to run every 5 minutes, the job can last up to 2m30s, and so forth. This is to prevent overlapping jobs, which can impact results.
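
The Enterprise rule can be written as a one-line Python function; enterprise_max_execution is a hypothetical name used only for illustration.

```python
from datetime import timedelta


def enterprise_max_execution(schedule_period: timedelta) -> timedelta:
    """Maximum run time on Enterprise: half the scheduling period, capped at 30 minutes."""
    return min(schedule_period / 2, timedelta(minutes=30))


# Every minute -> 30 s; every 5 minutes -> 2m30s; hourly or slower -> the 30-minute cap.
for period in (timedelta(minutes=1), timedelta(minutes=5), timedelta(hours=1)):
    print(period, "->", enterprise_max_execution(period))
```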

The schedule applied to a Copy Pipe does not guarantee that the job executes immediately at the configured time. When the configured time elapses, the job is placed into a job queue. It is possible that, if the queue is busy, the job could be delayed and executed some time after the scheduled time.

To reduce the chances of a busy queue affecting your Copy Pipe execution schedule, we recommend distributing jobs over a wider period of time rather than grouping them close together.
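
One way to spread jobs out, sketched in Python under the assumption that all jobs run hourly, is to give each job a different minute offset instead of scheduling them all at minute 0. staggered_hourly_crons is a hypothetical helper.

```python
def staggered_hourly_crons(job_count: int) -> list[str]:
    """Spread job_count hourly jobs evenly across the hour instead of all at minute 0."""
    if not 1 <= job_count <= 60:
        raise ValueError("job_count must be between 1 and 60")
    spacing = 60 // job_count
    return [f"{i * spacing} * * * *" for i in range(job_count)]


print(staggered_hourly_crons(4))  # ['0 * * * *', '15 * * * *', '30 * * * *', '45 * * * *']
```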

For Enterprise customers, these settings can be customized. Reach out to your Customer Success team directly, or email us at support@tinybird.co.