How to work with Data Projects¶
Before diving into how to work with Data Projects, take into account these five rules of engagement for data teams:
The Data Project is the source of truth: Use a Git provider to host your Data Project files and control versions.
One Workspace => One Data Project => One repository. You can still host several Data Projects in the same repository and deploy the same Data Project to multiple Workspaces (production, staging, etc.).
Protect your main Environment and use the Git workflow and Pull Requests to make changes.
Run CI/CD pipelines to validate changes before deploying.
Use the Playground or testing Environments with Data Branching for exploration, fine tuning, prototyping, etc.
Anatomy of a Data Project¶
Data Projects are a set of plain-text files (Datafiles) with a simple syntax and a folder structure.
A file with the .datasource extension represents a Tinybird Data Source, and the simplest version contains the schema definition, like this:
SCHEMA >
    timestamp DateTime,
    product String,
    user_id String,
    action String,
    json String
A file with the .pipe extension represents a Tinybird Pipe, and the simplest version contains a Node and a SQL query, like this:
NODE only_buy_events
SQL >
    SELECT
        toDate(timestamp) date,
        product,
        JSONExtractFloat(json, 'price') AS price
    FROM events
    WHERE action = 'buy'
Both are Datafiles, one represents how data is stored and the other how it is processed (and published as an API Endpoint).
Every other Tinybird resource has a Datafile representation. You can check the whole Datafile reference in the CLI documentation.
You can create and scaffold a Tinybird Data Project with the tb init CLI command, which creates the following folder structure:
tb init
** - /datasources created
** - /datasources/fixtures created
** - /endpoints created
** - /pipes created
** - /tests created
** - /scripts created
** - /deploy created
** '.tinyenv' created
** 'scripts/exec_test.sh' created
Alternatively, to download all resources from an existing Workspace to a local Data Project, use the tb pull --auto --force command.
datasources: Where you put your .datasource files.
datasources/fixtures: CSV or NDJSON files that are pushed when using the --fixtures flag from the CLI. Each fixture must share its name with a .datasource file.
endpoints: You can use this folder to create a logical separation between non-Endpoint Pipes and Endpoint Pipes, though it isn't necessary. By default, all .pipe files are placed in the pipes/ directory when pulled from Tinybird.
pipes: Where you put your .pipe files.
tests: Where you put data quality and fixture tests.
scripts: Useful scripts for common operations like data migrations, fixture tests, etc.
deploy: Custom deployment shell scripts.
.tinyenv: Global variables for the Data Project, for instance VERSION.
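For reference, a minimal .tinyenv might contain just a version variable that your CI/CD scripts can read (the variable name VERSION comes from the list above; the value here is only an illustrative example):

```
VERSION=0.0.0
```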
Migrating from the UI to the Data Project model¶
At Tinybird we put a lot of effort into providing a UI that streamlines ingesting data and creating pipelines to prototype and test your solutions.
We keep improving the experience so you can do more from the UI, but once your project is in production you might want to move from using only the UI to relying more on the CLI and the (Git-synced) Data Project model.
Here are two benefits of combining the CLI and the Data Project model:
Using version control to track changes the same way you do with your code.
Creating scripts and automations more easily.
If you’re accustomed to working with Tinybird through the UI, this section provides guidance on migrating from the UI to a Data Project.
Step 1. Install the CLI¶
Follow this quick start guide to install and configure the CLI.
Step 2. Sync the remote resources to files¶
When using the UI, you log in to a Workspace and create resources like Data Sources, exploration Pipes, API Endpoints, etc. These resources have a Datafile counterpart you can download to set up a Data Project.
To get the Data Project from a given Workspace, download the files:
# Create a directory
mkdir data_project && cd data_project
# Initialize a Tinybird Data Project
tb init
# Authenticate in the region (US or EU) where your Workspace is
tb auth -i
# Download the Datafiles
tb pull --auto
Keep reading to learn more about how to organize your Data Projects, connect them to Git and start working with them as if they were code.
How to organize your Data Projects¶
The same Data Project can be deployed in different Workspaces using different data (production, staging, etc.). However, you should not have more than one Data Project per Workspace.
Organize your Data Project depending on your team and project structures. You can easily share Data Sources from one Data Project to another.
The most common ways to organize Data Projects are:
One Data Project deployed in one Workspace.
One Data Project deployed in multiple Workspaces, for instance, pre-production (or staging) and production Workspaces.
One Data Project containing Data Sources that are shared with other Workspaces for building use cases over them. Each Data Project can be deployed to one or more Workspaces.
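When organizing around shared Data Sources, the sharing itself can be done from the CLI. A hedged sketch (the Data Source and Workspace names are placeholders; check tb datasource share --help for the exact syntax in your CLI version):

```shell
# Share a Data Source from the current Workspace with another Workspace
tb datasource share <data_source_name> <destination_workspace_name>
```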

From prototyping to production-ready¶
The UI is perfect for prototyping, solo-projects, demos, instant feedback on queries, fine tuning, etc. However, once a Workspace is in production, we recommend that you instead make updates directly to the Datafiles and use Git to iterate your Data Project reliably.
Protect the main Environment: This keeps Workspace members from editing Pipes directly from the UI in the main Environment.
Connect the Workspace to Git: The Workspace maintains a reference to the active Git commit. The Datafiles in the Data Project’s Git repository are the single source of truth.
Standardize your Git workflow and Continuous Integration pipeline: Build your CI pipeline and enforce best practices like testing and code reviewing.
Define your release process: Build your CD pipeline depending on how your Data Project is organized. For instance, you can choose to deploy first to a pre-production Workspace, and then to a production Workspace.

We provide some templates for CI/CD pipelines over Data Projects. Read the CI/CD guide to learn more about CI/CD with Tinybird Data Projects.
When to use API vs UI vs CLI¶
You don’t need to use Tinybird’s API to work with Data Projects.
We recommend that you start with the UI until you become familiar with Tinybird's main concepts. The UI is ideal for testing and debugging queries, receiving instant feedback with data, prototyping new ideas, fine-tuning, and debugging performance, among other tasks. Use the UI for Workspace administration (managing members, permissions, quotas, etc.). Even when your main Environment is protected, you can still use the Playground or Environments to test and prototype within the UI.
Use the CLI to establish the link between the Data Project and Git, and to manage the state of a Workspace. The CLI is the glue between the Workspace and the Data Project in Git, and it is heavily used in CI/CD pipelines.
Iterating Data Projects¶
You might be accustomed to working directly in the UI within your production Workspace, but we no longer recommend this. While the UI experience is great for exploring and prototyping new use cases, or even for deploying hotfixes, using the UI on an unprotected main Environment will cause some problems for you and your team:
The main Environment won’t match the Data Project files, eliminating a single source of truth.
You won’t know who made changes, when, why, or how. With Git and a proper workflow you get this for free.
Changes won’t be properly tested. You need to use proper CI/CD or Environments to debug and test changes.
Distributed collaboration and version control will be very difficult.
You won’t have a consistent development workflow or Data Product lifecycle.
You won’t have consistently documented steps necessary to build common use cases.
We are continually improving the experience of iterating Data Projects in Tinybird. Right now, these are the current recommended guidelines:
Playground: Use the Playground for one-off queries on production data and for prototyping new API Endpoints. You can download your Playground queries as a .pipe file and then integrate them into your Data Project and Git workflow.
Environments: Use Environments in your CI/CD workflow to test changes without affecting your main Environment. You can also create Environments on demand to review new use cases, test new ideas with data, or prototype and assess more complex changes. When creating an Environment, you can choose a Data Branching strategy to copy some data from the main Environment for testing, or to copy only schemas and append data from fixtures later on.
Environments vs Workspaces: Environments are meant to be an ephemeral snapshot of the resources in a specific Workspace, allowing you to test changes made within that Workspace. Workspaces can contain more than one Environment, and are meant for integrating and deploying changes on your production or pre-production data products.
Versioning your Pipes: As a rule of thumb, use the VERSION flag for breaking changes to Pipes.
Iterating Data Sources: Common strategies to iterate Data Sources and Materialized Views.
Integrate, test and deploy Data Projects¶
Once you have prototyped a new use case, either in the UI or by working directly with Datafiles in the Data Project, you want to integrate those changes. This is how to do so:
Optionally use the Playground or a new Environment to prototype.
Pull or update the Datafiles and consolidate them in your Data Project.
Push to a Git branch and create a Pull Request.
Implement tests over your API Endpoints and data.
Merge the branch.
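The steps above can be sketched as a command sequence (the branch name and commit message are placeholders, and the tests themselves would run in your CI pipeline against the Pull Request):

```shell
# Pull or update the Datafiles and consolidate them in your Data Project
tb pull --auto

# Push to a Git branch and create a Pull Request
git checkout -b <feature-branch>
git add datasources/ pipes/
git commit -m "<describe the change>"
git push origin <feature-branch>
```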