Continuous Integration and Deployment (CI/CD)¶
Once you connect your data project and Workspace through Git, you need to implement a Continuous Integration (CI) and Deployment (CD) workflow.
This page covers how CI and CD work using a walkthrough example. CI/CD pipelines require the use of:
- Datafiles
- CLI commands
- Tinybird Branches
How Continuous Integration works¶
As you expand and iterate on your data projects, you'll want to continuously validate your API Endpoints. In the same way that you write integration and acceptance tests for source code in a software project, you can write automated tests for your API Endpoints to be run on each Pull/Merge Request.
Continuous Integration helps with:
- Linting: Syntax and formatting checks on datafiles.
- Correctness: Making sure you can push your changes to a Tinybird Workspace.
- Quality: Running fixture tests and/or data quality tests to validate the changes in the Pull Request.
- Regression: Running automatic regression tests to validate Endpoint performance (both in processed bytes and time spent) and data quality.
The following section uses the CI template, GitHub Actions, and the Tinybird CLI to demonstrate how to test your API Endpoints on any new commit to a Pull Request.
Set these optional environment variables to adapt your CI/CD workflow: TB_VERSION_WARNING=0 (don't print the CLI version warning message if there's a new version available) and TB_SKIP_REGRESSION=0 (skip regression tests).
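For example, in a shell step (or via the workflow's env block), a minimal sketch using the variables above:

```bash
# Optional CI/CD tuning for the Tinybird CLI
export TB_VERSION_WARNING=0   # don't print the CLI version warning
export TB_SKIP_REGRESSION=0   # skip regression tests
```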
Building the CI/CD pipeline¶
This section demonstrates how to automate CI/CD pipelines using GitHub as the provider with a GitHub Action, but you can use any suitable platform. The examples below use Tinybird's CI and CD templates in this repository.
You can use those templates directly, but we strongly recommend treating them as a guide and building your own pipelines based on them. That way you can adapt them to your data project's needs and integrate them with the CI/CD workflow you use for the rest of your toolset.
All steps are based on Tinybird CLI commands so you can fully reproduce the pipeline locally.
Remember to add a new secret with the Workspace admin Token to the repository's Settings so you can run the required commands from the CLI.
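For example, with the GitHub CLI (a sketch; the secret name matches the templates used below, and the token value comes from your Workspace):

```bash
# Store the Workspace admin Token as a repository secret named TB_ADMIN_TOKEN
gh secret set TB_ADMIN_TOKEN --body "<your Workspace admin Token>"
```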
1. Trigger the CI workflow¶
Run the CI workflow on each commit to a Pull Request, when labelling, or with other kinds of triggers:
```yaml
on:
  workflow_dispatch:
  pull_request:
    branches:
      - main
    types: [opened, reopened, labeled, unlabeled, synchronize, closed]
```
Key points: The CI workflow is triggered when a Pull Request is opened, reopened, synchronized, or its labels are updated, and the base branch must be main. When the Pull Request is closed, the Tinybird Branch created for CI is deleted.
2. Configure the CI job¶
```yaml
ci: # ci using Branches from Workspace 'web_analytics_starter_kit'
  uses: tinybirdco/ci/.github/workflows/ci.yml@main
  with:
    data_project_dir: .
  secrets:
    tb_admin_token: ${{ secrets.TB_ADMIN_TOKEN }} # set Workspace admin Token in GitHub secrets
    tb_host: https://api.tinybird.co
```
If your data project directory is not in the root of the Git repository, change the data_project_dir variable.
About secrets:
- tb_host: The URL of the region you want to use. By default, this is populated with the region of the Workspace you had on tb init --git.
- tb_admin_token: The Workspace admin Token. This grants all the permissions for a specific Workspace. You can find more information in the Tokens docs or in the Tokens section of the Tinybird UI.
The CI Workflow¶
A potential CI workflow could run the following steps:
- Configuration: set up dependencies and install the Tinybird CLI to run the required commands.
- Check the data project syntax and the authentication.
- Create a new ephemeral CI Tinybird Branch.
- Push the changes to the Branch.
- Run tests in the Branch.
- Delete the Branch.
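Because every step is based on CLI commands, you can sketch a rough local equivalent of the same flow; the Branch name below is illustrative, and tb deploy assumes a Git-connected project:

```bash
# Rough local equivalent of the CI pipeline
tb check                                                               # datafile syntax
tb --host https://api.tinybird.co --token $TB_ADMIN_TOKEN auth info    # authentication
tb branch create tmp_ci_local_test --last-partition --wait             # ephemeral Branch with recent data
tb deploy                                                              # or: tb push --only-changes --force --yes
# ...run your tests here (see step 5)...
tb branch rm tmp_ci_local_test --yes                                   # clean up
```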
0. Workflow configuration¶
```yaml
defaults:
  run:
    working-directory: ${{ inputs.data_project_dir }}
if: ${{ github.event.action != 'closed' }}
steps:
  - uses: actions/checkout@master
    with:
      fetch-depth: 300
      ref: ${{ github.event.pull_request.head.sha }}
  - uses: actions/setup-python@v5
    with:
      python-version: "3.11"
      architecture: "x64"
      cache: 'pip'
  - name: Validate input
    run: |
      [[ "${{ secrets.tb_admin_token }}" ]] || { echo "Go to the tokens section in your Workspace, copy the 'admin token' and set TB_ADMIN_TOKEN as a Secret in your Git repository"; exit 1; }
  - name: Set environment variables
    run: |
      _ENV_FLAGS="${ENV_FLAGS:=--last-partition --wait}"
      _NORMALIZED_BRANCH_NAME=$(echo $DATA_PROJECT_DIR | rev | cut -d "/" -f 1 | rev | tr '.-' '_')
      GIT_BRANCH=${GITHUB_HEAD_REF}
      echo "GIT_BRANCH=$GIT_BRANCH" >> $GITHUB_ENV
      echo "_ENV_FLAGS=$_ENV_FLAGS" >> $GITHUB_ENV
      echo "_NORMALIZED_BRANCH_NAME=$_NORMALIZED_BRANCH_NAME" >> $GITHUB_ENV
```
Key points: This sets the default working-directory to the data_project_dir variable, checks out the repository at the Pull Request head commit, validates that the TB_ADMIN_TOKEN secret is set, and installs Python 3.11.
1. Install the Tinybird CLI¶
```yaml
- name: Install Tinybird CLI
  run: |
    if [ -f "requirements.txt" ]; then
      pip install -r requirements.txt
    else
      pip install tinybird-cli
    fi
- name: Tinybird version
  run: tb --version
```
The Tinybird CLI is required to interact with your Workspace, create a test Branch, and run the tests. You can use a requirements.txt file to pin a tinybird-cli version and avoid automatically installing the latest one.
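If you go the pinning route, one way to generate that requirements.txt is to freeze the CLI version you have already validated locally; a sketch:

```bash
# Capture the currently installed tinybird-cli version into requirements.txt
pip freeze | grep '^tinybird-cli==' > requirements.txt
cat requirements.txt   # e.g. tinybird-cli==<pinned version>
```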
Note that you can run this workflow locally if you have a local data project and the CLI authenticated to your Tinybird Workspace.
2. Check the data project syntax and the authentication¶
```yaml
- name: Check all the datafiles syntax
  run: tb check
- name: Check auth
  run: tb --host ${{ secrets.tb_host }} --token ${{ secrets.tb_admin_token }} auth info
```
Check the datafiles syntax and the Tinybird authentication.
3. Create a new Tinybird Branch to deploy changes and run the tests¶
A Branch is an isolated copy of the resources in your Workspace at a specific point in time. It's designed to be temporary and disposable so that you can develop and test changes before deploying them to your Workspace.
A Branch is created on each CI job run. In this example, github.event.pull_request.number is used as a unique identifier for the Tinybird Branch name, so multiple tests can run in parallel. If a Branch with this name was created before, it is removed and recreated.
Branches are created using the tb branch create command. Once the Pull Request with your changes has been merged, the Tinybird Branch is deleted.
```yaml
- name: Try to delete previous Branch
  run: |
    output=$(tb --host ${{ secrets.tb_host }} --token ${{ secrets.tb_admin_token }} branch ls)
    BRANCH_NAME="tmp_ci_${_NORMALIZED_BRANCH_NAME}_${{ github.event.pull_request.number }}"
    # Check if the branch name exists in the output
    if echo "$output" | grep -q "\b$BRANCH_NAME\b"; then
      tb \
        --host ${{ secrets.tb_host }} \
        --token ${{ secrets.tb_admin_token }} \
        branch rm $BRANCH_NAME \
        --yes
    else
      echo "Skipping clean up: The Branch '$BRANCH_NAME' does not exist."
    fi
- name: Create new test Branch with data
  run: |
    tb \
      --host ${{ secrets.tb_host }} \
      --token ${{ secrets.tb_admin_token }} \
      branch create tmp_ci_${_NORMALIZED_BRANCH_NAME}_${{ github.event.pull_request.number }} \
      ${_ENV_FLAGS}
```
Use the _ENV_FLAGS variable to configure which data to attach to the Branch. Use the --last-partition --wait flags to attach the most recently ingested data in the Workspace; this way, you can run the tests using the same data as in production. Alternatively, leave it empty and use fixtures.
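For example, if you prefer fixtures over production data, you could create the Branch without the data flags and append fixture files to it; the Branch name, Data Source, and fixture path below are hypothetical:

```bash
# Create an empty test Branch (no production data attached)
tb branch create tmp_ci_fixtures_test

# Load fixture data into the Data Sources your tests need
tb datasource append my_datasource datasources/fixtures/my_datasource.csv   # hypothetical fixture
```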
4. Deploy changes to the Tinybird Branch¶
You can push the changes in your current Pull Request to the previously created test Branch in one of two ways:
Automatic deployment
If you connected your data project and Workspace through Git, use tb deploy. This command determines the datafiles to deploy from the result of git diff between the latest commit deployed to the Workspace and the HEAD commit of the current git branch.
If you did not connect your data project and Workspace through Git, use tb push --only-changes --force --yes (a combined example follows the flag list). This command determines the datafiles to deploy from the result of tb diff between the local changes in the git branch and the remote changes in the Tinybird Branch.
- --only-changes: Deploys the changed datafiles and their dependencies.
- --force: Overrides any existing Pipe.
- --yes: Confirms any alter to a Data Source.
- --no-check: Optional; skips the regression tests when overwriting a Pipe Endpoint.
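Putting the flags together, a typical invocation might look like this (run from the data project directory against the test Branch):

```bash
# Deploy only the datafiles changed in this Pull Request to the test Branch
tb push --only-changes --force --yes

# Optionally skip regression tests when overwriting a Pipe Endpoint
tb push --only-changes --force --yes --no-check
```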
Custom deploy command
Alternatively, for more complex changes, you can decide how to deploy the changes to the test Branch. This is convenient if, in addition to deploying the datafiles, you want to automate some other data operation, such as running a copy Pipe or truncating a Data Source.
For this to work, place an executable shell script in deploy/$DEPLOYMENT_ID/deploy.sh with the CLI commands to push the changes. $DEPLOYMENT_ID should be a global variable, unique to the current Pull Request being deployed.
```yaml
- name: Deploy changes to the test Branch
  run: |
    DEPLOY_FILE=./deploy/${DEPLOYMENT_ID}/deploy.sh
    if [ ! -f "$DEPLOY_FILE" ]; then
      echo "$DEPLOY_FILE not found, running default tb deploy command"
      tb deploy
    fi
- name: Custom deployment to the test Branch
  run: |
    DEPLOY_FILE=./deploy/${DEPLOYMENT_ID}/deploy.sh
    if [ -f "$DEPLOY_FILE" ]; then
      echo "$DEPLOY_FILE found"
      if ! [ -x "$DEPLOY_FILE" ]; then
        echo "Error: You do not have permission to execute '$DEPLOY_FILE'. Run:"
        echo "> chmod +x $DEPLOY_FILE"
        echo "and commit your changes"
        exit 1
      else
        $DEPLOY_FILE
      fi
    fi
```
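For reference, a custom deploy/$DEPLOYMENT_ID/deploy.sh could look like the following sketch; the Pipe and Data Source names are hypothetical, and the exact commands depend on what your Pull Request needs:

```bash
#!/usr/bin/env bash
# Hypothetical custom deployment script for a single Pull Request
set -euo pipefail

# Push only the resources this Pull Request changes
tb push pipes/top_pages.pipe --force

# Extra data operations the default deploy command would not run
tb datasource truncate page_views_staging --yes
tb pipe populate top_pages_mv --wait
```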
5. Run the tests¶
You can now run your test suite! This is an optional step but recommended if you want to make sure everything works as expected.
Tinybird provides three types of tests out of the box, but you can include any test your deployment pipeline needs at this step:
- Data fixture tests: These test specific business logic based on fixture data (see datasources/fixtures).
- Data quality tests: These test specific data scenarios.
- Regression tests: These test that requests to your API Endpoints still work as expected. For these tests to work, you must attach production data (--last-partition) when creating the test Branch.
To learn more about testing Tinybird data projects, refer to the Implementing test strategies docs.
```yaml
- name: Get regression labels
  id: regression_labels
  uses: SamirMarin/get-labels-action@v0
  with:
    github_token: ${{ secrets.GITHUB_TOKEN }}
    label_key: regression
- name: Run Pipe regression tests
  run: |
    source .tinyenv
    echo ${{ steps.regression_labels.outputs.labels }}
    REGRESSION_LABELS=$(echo "${{ steps.regression_labels.outputs.labels }}" | awk -F, '{for (i=1; i<=NF; i++) if ($i ~ /^--/) print $i}' ORS=',' | sed 's/,$//')
    echo ${REGRESSION_LABELS}
    CONFIG_FILE=./tests/regression.yaml
    BASE_CMD="tb branch regression-tests"
    LABELS_CMD="$(echo ${REGRESSION_LABELS} | tr , ' ')"
    if [ -f ${CONFIG_FILE} ]; then
      echo "Config file found: ${CONFIG_FILE}"
      ${BASE_CMD} -f ${CONFIG_FILE} --wait ${LABELS_CMD}
    else
      echo "Config file not found at '${CONFIG_FILE}', running with default values"
      ${BASE_CMD} coverage --wait ${LABELS_CMD}
    fi
- name: Append fixtures
  run: |
    if [ -f ./scripts/append_fixtures.sh ]; then
      echo "append_fixtures script found"
      ./scripts/append_fixtures.sh
    fi
- name: Run fixture tests
  run: |
    if [ -f ./scripts/exec_test.sh ]; then
      ./scripts/exec_test.sh
    fi
- name: Run data quality tests
  run: |
    tb test run -v -c 4
```
You can find the reference append_fixtures and exec_test scripts in this repository.
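If you want to reproduce the default test run locally (assuming your CLI session is pointing at the test Branch), the commands mirror the workflow above:

```bash
# Regression tests using the default coverage strategy
tb branch regression-tests coverage --wait

# Data quality tests (same command as the workflow above)
tb test run -v -c 4
```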
6. Delete the Branch¶
By default, Branches are not deleted until they have been merged into the main Workspace. The following step runs after the tests:
```yaml
- name: Try to delete previous Branch
  run: |
    output=$(tb --host ${{ secrets.tb_host }} --token ${{ secrets.tb_admin_token }} branch ls)
    BRANCH_NAME="tmp_ci_${_NORMALIZED_BRANCH_NAME}_${{ github.event.pull_request.number }}"
    # Check if the branch name exists in the output
    if echo "$output" | grep -q "\b$BRANCH_NAME\b"; then
      tb \
        --host ${{ secrets.tb_host }} \
        --token ${{ secrets.tb_admin_token }} \
        branch rm $BRANCH_NAME \
        --yes
    else
      echo "Skipping clean up: The Branch '$BRANCH_NAME' does not exist."
    fi
```
You can have up to 3 simultaneous Branches per Workspace at any time. Contact us at support@tinybird.co if you need to increase this limit.
How Continuous Deployment works¶
Once a Pull Request passes CI and has been reviewed and approved by a peer, it's time to merge it to your main Git branch.
Ideally, changes should be automatically deployed to the Workspace; this is called Continuous Deployment (or Continuous Delivery).
While efficient, this workflow comes with several challenges, most of them related to handling the current state of your Tinybird Workspace. For instance:
- As opposed to deploying a stateless application, deployments to a Workspace are incremental: they build on the resources already in the Workspace.
- Similar state-handling issues arise with resources or operations that are created or run programmatically, such as populate operations or permission handling.
- Deployments are performed in the same Workspace; you need to be aware of this and implement a policy to avoid collisions or regressions when different Pull Requests deploy at the same time.
Because deployments rely on Git commits to push resources, your branches must not be out-of-date when merging. Use your Git provider to control branch freshness.
The CD workflow explained below is a guide that covers many of the most common use cases. However, complex deployments will occasionally require additional knowledge and expertise from the team deploying the change.
Continuous Deployment helps with:
- Correctness: Ensuring you can push your changes to a Tinybird Workspace.
- Deployment: Deploying the changes to the Workspace automatically.
- Data Operations: Centralizing data operations required after resources have been pushed to the Workspace.
The following section uses the generated CD template, GitHub Actions, and the Tinybird CLI to explain how to deploy Pull Request changes after merging.
Configure the CD job¶
CD workflow
```yaml
name: Tinybird - CD Workflow
on:
  workflow_dispatch:
  push:
    branches:
      - main
jobs:
  cd: # deploy changes to Workspace 'web analytics starter kit'
    uses: tinybirdco/ci/.github/workflows/cd.yml@main
    with:
      data_project_dir: .
    secrets:
      tb_admin_token: ${{ secrets.TB_ADMIN_TOKEN }} # set Workspace admin Token in GitHub secrets
      tb_host: https://api.tinybird.co
```
This workflow deploys on merge to main to the Workspace defined by the TB_ADMIN_TOKEN secret set in the GitHub repository's Settings.
If your data project directory is not in the root of the Git repository, you can change the data_project_dir variable.
About secrets:
- tb_host: The URL of the region you want to use. By default, this is populated with the region of the Workspace you had on tb init --git.
- tb_admin_token: The Workspace admin Token. This grants all the permissions for a specific Workspace. You can find more information in the Tokens docs or in the Tokens section of the Tinybird UI.
Let's review the generated CD workflow:
The CD workflow¶
The CD pipeline should deploy the changes to the main Workspace in the same way they were deployed to a Tinybird Branch in CI. We recommend running CD on merging a Pull Request, to keep your Workspace in sync with the HEAD commit of the git repository's main branch.
The CD workflow performs the following steps:
- Configuration
- Install the Tinybird CLI
- Check authentication
- Push changes
- Post-deployment
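As in CI, each step maps to CLI commands, so a rough local equivalent (running against the main Workspace rather than a Branch) looks like this:

```bash
# Check you are authenticated against the intended Workspace and region
tb --host https://api.tinybird.co --token $TB_ADMIN_TOKEN auth info

# Deploy the merged changes
tb deploy   # or: tb push --only-changes --force if the project isn't Git-connected
```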
0. Workflow configuration¶
Same as the CI workflow.
1. Install the Tinybird CLI and check authentication¶
The Tinybird CLI is required to interact with your Workspace.
You can run this workflow locally by having a local data project and the CLI authenticated to your Tinybird Workspace.
This step is similar, but not identical, to step 1 of the CI workflow.
```yaml
- name: Install Tinybird CLI
  run: |
    if [ -f "requirements.txt" ]; then
      pip install -r requirements.txt
    else
      pip install tinybird-cli
    fi
- name: Tinybird version
  run: tb --version
- name: Check auth
  run: tb --host ${{ secrets.tb_host }} --token ${{ secrets.tb_admin_token }} auth info
```
2. Deploy changes¶
Use the exact same strategy that you used in CI.
If you deployed automatically through Git, use tb deploy; otherwise, use tb push --only-changes --force. If you did a custom deployment for this specific Pull Request, make sure the exact same script runs in CD.
```yaml
- name: Deploy changes to the main Workspace
  run: |
    DEPLOY_FILE=./deploy/${DEPLOYMENT_ID}/deploy.sh
    if [ ! -f "$DEPLOY_FILE" ]; then
      echo "$DEPLOY_FILE not found, running default tb push command"
      tb deploy
    fi
- name: Custom deployment to the main Workspace
  run: |
    DEPLOY_FILE=./deploy/${DEPLOYMENT_ID}/deploy.sh
    if [ -f "$DEPLOY_FILE" ]; then
      echo "$DEPLOY_FILE found"
      if ! [ -x "$DEPLOY_FILE" ]; then
        echo "Error: You do not have permission to execute '$DEPLOY_FILE'. Run:"
        echo "> chmod +x $DEPLOY_FILE"
        echo "and commit your changes"
        exit 1
      else
        $DEPLOY_FILE
      fi
    fi
```