Jun 15, 2023

A new way to create intermediate Data Sources in Tinybird

We are pleased to announce the general availability release of Copy Pipes, a new way to generate snapshots of SQL results and store them in a Tinybird Data Source.

It’s a well-worn and well-loved path: Publishing a high-concurrency, low-latency data API in Tinybird. The API is a powerful and standardized way to share transformations created in Tinybird Pipes as a fully-fledged and documented data product. You write the SQL, publish the query as a dynamic API, and share it with other engineering stakeholders who are able to build data-intensive applications without requiring expensive and slow direct connections to data sources or needing to support an API development framework on top of those data sources.

But sometimes, when preparing data for other stakeholders, especially those working within Tinybird, you’re not looking to publish an API. Rather, you want to turn your queries and data transformations into an intermediary table that you and others within your organization can use to build new data products of their own. For those who follow ELT practices (e.g., using dbt the data warehouse), this pattern should feel familiar.

Sometimes you don't want to publish an API. Sometimes you want to create an intermediate table you can query over, a la dbt in a data warehouse.

This is what Copy Pipes allow you to do in Tinybird. A few months ago, we released the Copy Pipes API, a new API that enables you to sink the results of a Tinybird Pipe into a new Data Source. We released this API in beta based on feedback and needs from a few power users, but since its release, it has been more widely adopted than we initially expected.

In fact, Data Sources originating from Copy Pipes jobs have accounted for roughly 10% of written data into Tinybird since we launched the beta, which is pretty significant considering the staggering amount of data that Tinybird customers have already ingested this year.

Copy Pipes jobs have accounted for over 10% of written data in Tinybird since we launched the beta a few months ago.

Clearly, it has been quite valuable, and we’ve already seen customers accomplishing a wide variety of use cases with it, including:

  • Consolidation of CDC changes
  • Adding real-time capabilities to their OLTP stack
  • Building data snapshots like exact retail inventories
  • Creating lambda architectures, using Copy Pipes to write the historical data, and combining the results later with the latest real-time data.
Copy Pipes are very useful for things like CDC and consolidated inventory snapshots.

Today, we’re excited to announce the official GA launch of Copy Pipes, elevating it to a first-class feature and out of beta. Now, you can create, trigger, pause, and resume copy jobs from a Tinybird Pipe to a Tinybird Data Source from within both the UI and the Tinybird CLI.

Today, we announce the official GA launch of Copy Pipes. Now, you can create, trigger, pause, and resume copy jobs from a Tinybird Pipe to a Tinybird Data Source from within both the UI and the Tinybird CLI. 

Read on for more information about why Copy Pipes are so important and how to use them to achieve these powerful use cases.

If you’re new to Tinybird, you can sign up for free here. And if you have any questions or feedback about this release or any Tinybird features, please join our Slack community! It’s the best place to ask questions and get answers about how to use Tinybird, what’s new, and when we launch new things!

When to use Copy Pipes

If you’ve used Tinybird for some time, you’ve likely created a Data Source from a Materialized View. Materialized Views allow you to shift complex aggregations from query time to ingest time and sink them into a new Data Source that your endpoints can query. This results in much faster endpoint performance, which subsequently enables real-time scenarios such as real-time personalization, user-facing analytics, anomaly detection, and much more.

But we noticed that Materialized Views were often applied inappropriately; people wanted to create periodic “snapshots” of results from a Pipe and were using Materialized Views. While this can technically work, it’s like using a semi-truck to take your child to school. It gets the job done, but you’ll hold up traffic, break a few rear-view mirrors, and embarrass your kid in the process.

Materialized Views are a powerful way to improve performance by storing intermediate query states in a Data Source, but they are not ideal for certain use cases, like those that require consolidation of events over time

Copy Pipes exist so that you can more easily sink the results of a Pipe into a Data Source for use cases where the materialization process isn't ideal. You can use it for things like tracking state changes over time or change data capture (CDC), deduplicating records, or even just testing new Data Source schema configurations like sorting keys, data types, and column sets.

Deduplication is another common strategy for Copy Pipes.

How to create Copy Pipes in the UI & CLI

Previously, Copy Pipes were only available as an API. Now we’ve wrapped that API in your favorite interfaces. Here’s how to use it in each.

Create a Copy Pipe in the UI

To create a Copy Pipe in the Tinybird UI, you start with a Pipe. In this Pipe, you’ll have SQL to transform data from another Data Source, perhaps in multiple nodes. On the node whose result set you want to copy, click the ... icon to the left of the Node name, and select “Create Copy Job”.

You can turn any Pipe into a Copy Pipe by setting up a Copy Job.

A Copy Pipe can be triggered manually or set to run on a schedule. In the next step, you’ll select when you want it to be triggered. You can choose “On Demand” to only run the copy job when triggered or use a cron expression to set a scheduled interval.

Copy Pipes jobs can be set to trigger on demand or on a schedule.

If you have chosen to create an “On Demand” Copy Pipe, you can set the default values to be applied in the event that the Pipe contains parameters.

If you set up an On Demand Copy Pipe and your Pipe has dynamic parameters, you can specify the default parameters to use when the job is triggered.

Finally, you can select the destination, either a new Data Source or an existing one.

When creating a Copy Pipe job, you can create a new Data Source or sink your Pipe results into an existing Data Source with a matching schema.

If you select “Create a new Data Source”, you’ll go through the familiar steps of creating a new Tinybird Data Source. Tinybird will guess the schema, which you can edit before submitting the copy job.

Creating a new Data Source for a Copy Pipe is just like creating any other Data Source in Tinybird

Once you create the Data Source, you can track the status and critical metrics of the copy jobs by clicking the “View Copy job” button in the top right-hand corner of the Pipe UI.

You can monitor Copy Pipes jobs directly in the Tinybird UI.

If you’ve created an “On Demand” Copy Pipe, you can trigger a copy job in the top right corner by clicking “Run copy now”.

You can manage your Copy Pipes jobs on the Copy Pipe page.

Creating a Copy Pipe in the CLI

When using the CLI, you code your Pipes with .pipe files. As with Materialized Views, Copy Pipes are defined by appending the following code to the end of the .pipe file:

For example, a complete .pipe file set up as a Copy Pipe might look like this:

Once you’ve saved the file, you can push it with:

tb push %pip_filepath%

If you’ve created an “On Demand” Copy Pipe, you can trigger it in the CLI with:

tb pipe copy run %pipe_name%

and you can supply values for parameters with the --param flag.

If you want to monitor your copy jobs in the CLI, use tb job ls.

Additional resources for Copy Pipes

Looking for more information on Copy Pipes? You can watch the screencast below:

Also, check out the documentation, or come find us in the Slack community to ask questions and get answers.

To get started with Tinybird, sign up for a free account here, create a Workspace, and start building. You’ll be publishing APIs - or copying Pipes - in minutes.

Do you like this post?

Related posts

Simplifying event sourcing with scheduled data snapshots in Tinybird
Iterate your real-time data pipelines with Git
More Data, More Apps: Improving data ingestion in Tinybird
Iterating terabyte-sized ClickHouse tables in production
To the limits of SQL... and beyond
Why iterating real-time data pipelines is so hard
Designing and implementing a weather data API
Why we just released a huge upgrade to our VS Code Extension
Automating customer usage alerts with Tinybird and Make
Tinybird connects with Confluent for real-time streaming analytics at scale
Jul 18, 2023

Build fast data products, faster.

Try Tinybird and bring your data sources together and enable engineers to build with data in minutes. No credit card required, free to get started.
Need more? Contact sales for Enterprise support.