Iterate with staging and production Workspaces

A Data Project can be deployed to different Workspaces using different data (production, staging, etc.).

Organize your Data Project by team and project structure, and avoid mixing multiple Data Projects in one Workspace. You can easily share Data Sources between Data Projects.
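
For example, you can share a Data Source from the current Workspace with another one straight from the CLI. A minimal sketch, using the Workspace names from this guide; check tb datasource share --help for the exact arguments in your CLI version:

Share a Data Source with another Workspace
tb datasource share events pro_acme --user_token <token>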

The most common ways to organize Data Projects are:

  • One Data Project deployed in one Workspace.

  • One Data Project deployed to multiple Workspaces, such as pre-production and production Workspaces. This allows you to test changes in a pre-production environment before deploying them to production.

  • One Data Project that shares Data Sources with other Data Projects in other Workspaces. This allows you to reuse Data Sources across multiple projects.

In this section we are going to explore how to use staging (or pre-production) and production Workspaces and how to integrate them with CI/CD pipelines.

Migrating from prefixes

Before Workspaces were introduced in Tinybird, the CLI provided a --prefix flag that allowed you to create staging, production, or development resources in the same “workspace”.

If you are used to using the --prefix flag, we recommend switching to a multi-Workspace model for the following reasons:

  • You can create multiple isolated Workspaces for testing, staging, or production, depending on your needs.

  • Multiple Workspaces provide better out-of-the-box security, by limiting access to production Workspaces and/or sensitive data.

  • Multiple Workspaces follow modern best practices and are easier to iterate on using the CLI and standard tools like Git.

Convinced? Here's how to migrate from prefixes to Workspaces.

Previously, to deploy production and staging resources to the same Workspace you would run commands like these from the CLI:

Deploying with --prefix
tb push datasources/events.datasource --prefix staging
tb push datasources/events.datasource --prefix pro

These commands would create two resources in the same Workspace: staging__events and pro__events. You'd then load different data into each Data Source to simulate your production and staging environments.

How does this work with Workspaces?

  • Create production and staging Workspaces

  • Switch among them from the CLI

  • Push resources to any of them

That way you have fully isolated staging and production environments.

Create workspaces
tb workspace create staging_acme --user_token <token>
tb workspace create pro_acme --user_token <token>

Once created, you can switch between Workspaces and push resources to them.

When working with multiple Workspaces, you can check which one you're currently authenticated against with tb workspace current, or print the current Workspace in your shell prompt.
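
For example (a quick sanity check; the exact output format may vary by CLI version):

Check the current Workspace
tb workspace current
tb workspace ls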

To push resources to the staging Workspace:

use staging
tb workspace use staging_acme
tb push --push-deps

To push resources to the production Workspace:

use pro
tb workspace use pro_acme
tb push --push-deps
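
If you deploy to both Workspaces frequently, you can script the two pushes. A minimal shell sketch using the commands above and the Workspace names created earlier:

Push to both Workspaces
for ws in staging_acme pro_acme; do
  tb workspace use "$ws"
  tb push --push-deps
done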

Read on to learn how to integrate both Workspaces in a Continuous Integration and Deployment (CI/CD) pipeline.

CI/CD with staging and production Workspaces

In this guide, we will go through setting up a CI/CD workflow with staging and production Workspaces. This setup includes:

  • A staging Workspace with staging data. This Workspace is used to iterate and integrate changes to your Data Project before they are deployed to the production Workspace.

  • A production Workspace with production data. This is the Workspace that is used by actual users.

  • A CI/CD workflow that runs CI against the staging Workspace (and optionally against the production one as well), supports manual deployments to the staging Workspace for integration purposes, and deploys to the production Workspace on merge.

This setup allows you to test changes in a staging environment before deploying them to production. It also provides better security by limiting access to the production Workspace.

[Image: CI/CD flow across staging and production Workspaces (staging-prod.png)]

Guide preparation

You can follow along using the ecommerce_data_project.

Download the project by running:

Git clone the project
git clone https://github.com/tinybirdco/ecommerce_data_project
cd ecommerce_data_project

Then, create the staging and production Workspaces and authenticate using your user admin token (admin user@domain.com). If you don’t know how to authenticate or use the CLI, check out the CLI Quick Start.
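
If the CLI isn't authenticated yet, do so with your token first (a sketch; pass --host as well if your Workspace lives in a region other than the default):

Authenticate the CLI
tb auth --token <your_user_admin_token>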

Create workspaces
tb workspace create staging_acme --user_token <token>
tb workspace create pro_acme --user_token <token>

Push the Data Project to the production Workspace:

Recreating the project
tb workspace use pro_acme
tb push --push-deps --fixtures

** Processing ./datasources/events.datasource
** Processing ./datasources/top_products_view.datasource
** Processing ./datasources/products.datasource
** Processing ./datasources/current_events.datasource
** Processing ./pipes/events_current_date_pipe.pipe
** Processing ./pipes/top_product_per_day.pipe
** Processing ./endpoints/top_products.pipe
** Processing ./endpoints/sales.pipe
** Processing ./endpoints/top_products_params.pipe
** Processing ./endpoints/top_products_agg.pipe
** Building dependencies
** Running products_join_by_id
** 'products_join_by_id' created
** Running current_events
** 'current_events' created
** Running events
** 'events' created
** Running products
** 'products' created
** Running top_products_view
** 'top_products_view' created
** Running products_join_by_id_pipe
** Materialized pipe 'products_join_by_id_pipe' using the Data Source 'products_join_by_id'
** 'products_join_by_id_pipe' created
** Running top_product_per_day
** Materialized pipe 'top_product_per_day' using the Data Source 'top_products_view'
** 'top_product_per_day' created
** Running events_current_date_pipe
** Materialized pipe 'events_current_date_pipe' using the Data Source 'current_events'
** 'events_current_date_pipe' created
** Running sales
** => Test endpoint at https://api.tinybird.co/v0/pipes/sales.json
** 'sales' created
** Running top_products_agg
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products_agg.json
** 'top_products_agg' created
** Running top_products_params
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products_params.json
** 'top_products_params' created
** Running top_products
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products.json
** 'top_products' created
** Pushing fixtures
** Warning: datasources/fixtures/products_join_by_id.ndjson file not found
** Warning: datasources/fixtures/current_events.ndjson file not found
** Checking ./datasources/events.datasource (appending 544.0 b)
**  OK
** Checking ./datasources/products.datasource (appending 134.0 b)
**  OK
** Warning: datasources/fixtures/top_products_view.ndjson file not found

Finally, push the Data Project to the staging Workspace:

Recreating the project
tb workspace use staging_acme
tb push --push-deps --fixtures

Once you have the Data Project deployed to both Workspaces, make sure you connect them to Git and push the CI/CD pipelines to the Git repository.
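
A minimal sketch of that step, assuming the tb init --git flow from the working with Git guide, run once per Workspace from the repository root:

Connect both Workspaces to Git
tb workspace use staging_acme
tb init --git
tb workspace use pro_acme
tb init --git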

Configuring CI/CD

As you may have learned from the working with Git guide, we provide CI/CD jobs and pipelines in this Git repository.

With those jobs in mind, you can include your staging Workspace in your CI/CD pipeline in many ways. A common pattern consists of running CD in the staging Workspace before landing changes in the production Workspace.

You can add a new job that can be run manually, like this (in the case of GitHub Actions):

Staging CD pipeline
name: Tinybird - Staging CD Workflow
on:
  workflow_dispatch:
jobs:
  staging_cd: # deploy changes to the staging workspace
    uses: tinybirdco/ci/.github/workflows/cd.yml@v1.1.4
    with:
      tb_deploy: false
      data_project_dir: .
    secrets:
      admin_token: ${{ secrets.ST_ADMIN_TOKEN }} # set the (admin user@domain.com) token from the staging workspace in a new secret
      tb_host: https://api.us-east.aws.tinybird.co

This job will run the deployment pipeline defined using the user admin Auth Token (admin user@domain.com) from the staging Workspace. That way you can test out deployments and data migrations in a non-production Workspace first.
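
For the production side you can reuse the same workflow, triggered on merge to main instead of manually. A sketch; the PROD_ADMIN_TOKEN secret name is illustrative and should hold the user admin token from the production Workspace:

Production CD pipeline
name: Tinybird - Production CD Workflow
on:
  push:
    branches:
      - main
jobs:
  production_cd: # deploy changes to the production workspace on merge
    uses: tinybirdco/ci/.github/workflows/cd.yml@v1.1.4
    with:
      tb_deploy: false
      data_project_dir: .
    secrets:
      admin_token: ${{ secrets.PROD_ADMIN_TOKEN }} # user admin token from the production workspace
      tb_host: https://api.us-east.aws.tinybird.co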

In case of errors in the CD pipeline, or when the staging Workspace is left in an inconsistent state, you can just drop the Workspace and recreate it from the Data Project.
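
For example (a sketch; the delete command asks for confirmation, and flag names may vary with your CLI version):

Recreate the staging Workspace
tb workspace delete staging_acme --user_token <token>
tb workspace create staging_acme --user_token <token>
tb workspace use staging_acme
tb push --push-deps --fixtures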

What’s the difference between a staging Workspace and an Environment?

At this point you might be asking: when should I use a staging Workspace, and when should I use an Environment?

Environments are meant to be ephemeral, either to test out changes in your data flow without affecting the main one, or to automate tasks such as CI/CD pipelines.

The usual workflow of an Environment is (see the sketch after this list):

  • Create it either manually or through an automated CI/CD pipeline

  • Deploy some changes

  • Test them out

  • Destroy the Environment
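
A hypothetical CLI sketch of that lifecycle. The env subcommands here are assumptions based on the Environments feature and may be named differently in your CLI version (newer versions renamed Environments to Branches, with tb branch commands):

Environment lifecycle (hypothetical)
tb env create ci_test       # assumed subcommand: create an ephemeral Environment
tb push --push-deps         # deploy your changes to it
tb pipe data top_products   # smoke-test an endpoint against the Environment
tb env rm ci_test           # assumed subcommand: destroy the Environment when done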

Workspaces, on the other hand, are meant to be permanent. Your production Workspace receives ingestion and requests to API Endpoints from your production Data Product.

Staging Workspaces are optional and can serve several purposes:

  • You don't want to test with your production data, so you keep a separate, well-known subset of data in staging. This is good for anonymization, helps with testing, and can provide deterministic outputs.

  • You want to integrate development versions of the Data Project before landing them in the production Workspace.

  • You want to test out a complex deployment or data migration before landing it in the production Workspace.

Environments can be used as a quick and easy way to test changes to your data pipelines without affecting your production Workspace. However, it’s important to consider the pros and cons of each approach to choose the best option for your specific needs.