Implementing test strategies¶
Intermediate
In the Versioning your Pipes guide, you learned how to use versions as part of the usual development workflow for your API Endpoints.
In this guide you’ll learn about different strategies for testing your data project.
Guide preparation¶
You can follow along using the ecommerce_data_project.
Download the project by running:
git clone https://github.com/tinybirdco/ecommerce_data_project
cd ecommerce_data_project
Then, create a new workspace and authenticate using your Auth Token. If you don’t know how to authenticate or use the CLI, check out the CLI Quick Start.
tb auth -i
** List of available regions:
[1] us-east (https://ui.us-east.tinybird.co)
[2] eu (https://ui.tinybird.co)
[0] Cancel
Use region [1]: 2
Copy the admin token from https://ui.tinybird.co/tokens and paste it here :
Finally, push the data project to Tinybird:
tb push --push-deps --fixtures
** Processing ./datasources/events.datasource
** Processing ./datasources/top_products_view.datasource
** Processing ./datasources/products.datasource
** Processing ./datasources/current_events.datasource
** Processing ./pipes/events_current_date_pipe.pipe
** Processing ./pipes/top_product_per_day.pipe
** Processing ./endpoints/top_products.pipe
** Processing ./endpoints/sales.pipe
** Processing ./endpoints/top_products_params.pipe
** Processing ./endpoints/top_products_agg.pipe
** Building dependencies
** Running products_join_by_id
** 'products_join_by_id' created
** Running current_events
** 'current_events' created
** Running events
** 'events' created
** Running products
** 'products' created
** Running top_products_view
** 'top_products_view' created
** Running products_join_by_id_pipe
** Materialized pipe 'products_join_by_id_pipe' using the Data Source 'products_join_by_id'
** 'products_join_by_id_pipe' created
** Running top_product_per_day
** Materialized pipe 'top_product_per_day' using the Data Source 'top_products_view'
** 'top_product_per_day' created
** Running events_current_date_pipe
** Materialized pipe 'events_current_date_pipe' using the Data Source 'current_events'
** 'events_current_date_pipe' created
** Running sales
** => Test endpoint at https://api.tinybird.co/v0/pipes/sales.json
** 'sales' created
** Running top_products_agg
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products_agg.json
** 'top_products_agg' created
** Running top_products_params
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products_params.json
** 'top_products_params' created
** Running top_products
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products.json
** 'top_products' created
** Pushing fixtures
** Warning: datasources/fixtures/products_join_by_id.ndjson file not found
** Warning: datasources/fixtures/current_events.ndjson file not found
** Checking ./datasources/events.datasource (appending 544.0 b)
** OK
** Checking ./datasources/products.datasource (appending 134.0 b)
** OK
** Warning: datasources/fixtures/top_products_view.ndjson file not found
Regression tests¶
When one of your API Endpoints is integrated into a production environment (a web or mobile application, a dashboard, etc.), you want to make sure that changes to the Pipe don't alter the output of the endpoint.
In other words, you want the same version of an API Endpoint to return the same data for the same requests.
The CLI provides you with automatic regression tests any time you try to push the same version of a Pipe. Let’s see it with an example:
Imagine you have this version of the top_products Pipe:
NODE endpoint
DESCRIPTION >
returns top 10 products for the last week
SQL >
select
date,
topKMerge(10)(top_10) as top_10
from top_product_per_day
where date > today() - interval 7 day
group by date
Now you want to parameterize the date filter, like this:
NODE endpoint
DESCRIPTION >
returns top 10 products for the last week
SQL >
%
select
date,
topKMerge(10)(top_10) as top_10
from top_product_per_day
where date > today() - interval {{Int(day, 7)}} day
group by date
The new param day has a default value of 7. That means that, by default, the behaviour of the endpoint should be the same.
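The {{Int(day, 7)}} template works like a typed parameter with a default value. As an analogy, in plain Python (this is not Tinybird's actual template engine, just an illustration of the default-parameter behaviour):

```python
def render_query(day: int = 7) -> str:
    """Analogy for {{Int(day, 7)}}: a typed parameter with a default."""
    return (
        "select date, topKMerge(10)(top_10) as top_10 "
        "from top_product_per_day "
        f"where date > today() - interval {int(day)} day "
        "group by date"
    )

# With no parameter supplied, the SQL matches the original fixed 7-day version
default_sql = render_query()
custom_sql = render_query(day=30)
```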
To illustrate the example, send a couple of requests to the API Endpoint:
curl https://api.tinybird.co/v0/pipes/top_products.json?token={TOKEN}
Now, try to override the endpoint:
tb push endpoints/top_products.pipe --force
** Processing endpoints/top_products.pipe
** Building dependencies
** Running top_products
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products__checker.json
current https://api.tinybird.co/v0/pipes/top_products.json?&pipe_checker=true
new https://api.tinybird.co/v0/pipes/top_products__checker.json?&pipe_checker=true ... ok
==== Test Metrics ====
------------------------------------------------------------------------
| Test Run | Test Passed | Test Failed | % Test Passed | % Test Failed |
------------------------------------------------------------------------
| 1 | 1 | 0 | 100.0 | 0.0 |
------------------------------------------------------------------------
==== Response Time Metrics ====
----------------------------------------------
| Timing Metric (s) | Current | New |
----------------------------------------------
| min response time | 0.255429 | 0.254966 |
| max response time | 0.255429 | 0.254966 |
| mean response time | 0.255429 | 0.254966 |
| median response time | 0.255429 | 0.254966 |
| p90 response time | 0.255429 | 0.254966 |
| min read bytes | 4.11 KB | 4.11 KB |
| max read bytes | 4.11 KB | 4.11 KB |
| mean read bytes | 4.11 KB | 4.11 KB |
| median read bytes | 4.11 KB | 4.11 KB |
| p90 read bytes | 4.11 KB | 4.11 KB |
----------------------------------------------
** 'top_products' created
** Not pushing fixtures
The CLI tests all combinations of parameters by running at least one request for each combination and comparing the results of the new and old versions of the Pipe.
The regression test also displays statistics for the new vs. the old Pipe, so you can detect whether the new endpoint brings any improvement or degradation in performance.
If you only want to validate the requests that contain one specific parameter, you can filter the requests using --match <PARAMETER_NAME>.
As a test, change the default date range to the last day:
NODE endpoint
DESCRIPTION >
returns top 10 products for the last week
SQL >
%
select
date,
topKMerge(10)(top_10) as top_10
from top_product_per_day
where date > today() - interval {{Int(day, 1)}} day
group by date
And try to override it:
tb push endpoints/top_products.pipe --force
** Processing endpoints/top_products.pipe
** Building dependencies
** Running top_products
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products__checker.json
current https://api.tinybird.co/v0/pipes/top_products.json?&pipe_checker=true
new https://api.tinybird.co/v0/pipes/top_products__checker.json?&pipe_checker=true ... FAIL
==== Test FAILED ====
current https://api.tinybird.co/v0/pipes/top_products.json?&pipe_checker=true
new https://api.tinybird.co/v0/pipes/top_products__checker.json?&pipe_checker=true
** check error: 1 != 0 : Number of elements does not match
=====================
Error:
** Failed running endpoints/top_products.pipe: Invalid results, you can bypass checks by running push with the --no-check flag
Since the default period changed, the response to the default request changed, so the Pipe is not overridden. The CLI has prevented a possible regression.
If you are sure the new response is correct, and don't consider this change a regression, you can force the change like this:
tb push endpoints/top_products.pipe --force --no-check
** Processing endpoints/top_products.pipe
** Building dependencies
** Running top_products
** 'top_products' created
** Not pushing fixtures
In this case, the regression tests won’t be executed. Of course, you do this at your own risk!
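Under the hood, the failed check shown earlier boils down to comparing the two responses: first the number of rows, then the rows themselves. A simplified sketch of that comparison in Python (this is not the CLI's actual implementation, just the idea):

```python
def compare_responses(current: dict, new: dict) -> list:
    """Compare two endpoint JSON payloads the way a regression check
    would: same number of rows, then same data row by row."""
    errors = []
    cur_rows, new_rows = current.get("data", []), new.get("data", [])
    if len(cur_rows) != len(new_rows):
        # Mirrors the "Number of elements does not match" failure above
        errors.append(
            f"{len(cur_rows)} != {len(new_rows)} : Number of elements does not match"
        )
        return errors
    for i, (a, b) in enumerate(zip(cur_rows, new_rows)):
        if a != b:
            errors.append(f"Row {i} differs: {a} != {b}")
    return errors

# Identical payloads pass; a dropped row fails like the example above
ok = compare_responses({"data": [{"date": "2020-04-24"}]},
                       {"data": [{"date": "2020-04-24"}]})
bad = compare_responses({"data": [{"date": "2020-04-24"}]},
                        {"data": []})
```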
How the regression tests work¶
When you run tb pipe regression-test to check the changes in a Pipe against the existing one, or tb push endpoints/ -f to override the existing one, the CLI runs regression tests to validate that you are not breaking backward compatibility without realizing it.
The regression test functionality is powered by tinybird.pipe_stats_rt, one of the service Data Sources available to you out of the box. You can find more information about these service Data Sources here.
In this case, a query is run against tinybird.pipe_stats_rt to gather all the combinations of parameters used in an API Endpoint. This guarantees that every possible combination is validated at least once.
SELECT
## With this function we extract all the parameters used in each request
extractURLParameterNames(assumeNotNull(url)) as params,
## According to the option `--sample-by-params`, we run one query (or more) for each combination of parameters
groupArraySample({sample_by_params if sample_by_params > 0 else 1})(url) as endpoint_url
FROM tinybird.pipe_stats_rt
WHERE
pipe_name = '{pipe_name}'
## According to the option `--match`, we filter only the requests that contain that parameter
## This is especially useful when you want to validate a new parameter you are introducing, or you have optimized the endpoint for that specific case
{ " AND " + " AND ".join([f"has(params, '{match}')" for match in matches]) if matches and len(matches) > 0 else ''}
GROUP BY params
FORMAT JSON
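To see what that query gathers, the parameter extraction and grouping can be reproduced with Python's standard library. A sketch with made-up request URLs (the real data, of course, comes from tinybird.pipe_stats_rt):

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit

def extract_url_parameter_names(url: str) -> tuple:
    """Rough Python equivalent of ClickHouse's extractURLParameterNames."""
    return tuple(name for name, _ in parse_qsl(urlsplit(url).query))

# Group sample requests by their combination of parameters,
# like the GROUP BY params in the query above
requests = [
    "https://api.tinybird.co/v0/pipes/top_products.json?token=t1",
    "https://api.tinybird.co/v0/pipes/top_products.json?token=t2&day=1",
    "https://api.tinybird.co/v0/pipes/top_products.json?token=t3&day=30",
]
by_params = defaultdict(list)
for url in requests:
    by_params[extract_url_parameter_names(url)].append(url)
# Two combinations found: ('token',) and ('token', 'day'), so the
# checker would replay at least one request per combination
```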
If you have an endpoint with millions of requests per day, the CLI can fall back to a plain list of requests:
WITH
## With this function we extract all the parameters used in each request
extractURLParameterNames(assumeNotNull(url)) as params
SELECT url
FROM tinybird.pipe_stats_rt
WHERE
pipe_name = '{pipe_name}'
## According to the option `--match`, we filter only the requests that contain that parameter
## This is especially useful when you want to validate a new parameter you are introducing, or you have optimized the endpoint for that specific case
{ " AND " + " AND ".join([f"has(params, '{match}')" for match in matches]) if matches and len(matches) > 0 else ''}
## According to the option `--limit`, 100 by default
LIMIT {limit}
FORMAT JSON
Continuous integration¶
Regression tests are great for double-checking that your endpoints are correct when you override them in your Tinybird account.
While developing, you will most likely want to validate your endpoints as well. In the same way that you write integration and acceptance tests for your source code, you can write tests for your endpoints to be run on each commit.
The following section will use GitHub Actions and the Tinybird CLI to illustrate how you can test your endpoints on any new commit to a pull request.
Configure the GitHub Action¶
Take a look at the GitHub Actions repo for the ecommerce_data project. On each push to a branch, it runs the following GitHub Action:
name: CI Workflow

on:
  workflow_dispatch:
  push:

env:
  ## We need to define these two secrets in the repository
  ADMIN_TOKEN: ${{ secrets.ADMIN_TOKEN }}
  USER_TOKEN: ${{ secrets.USER_TOKEN }}
  ## We use the EU region
  TB_HOST: https://api.tinybird.co

jobs:
  run_validation_tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@master
      - uses: actions/setup-python@v3
        with:
          python-version: "3.8"
          architecture: "x64"

      ## We install the CLI
      - name: Install Tinybird CLI
        run: pip install tinybird-cli

      ## We display the version. In this case we are using the latest one
      - name: Tinybird version
        run: tb --version

      ## We check that all the files have a correct syntax
      - name: Check all the files
        run: tb check

      ## We define the WORKSPACE_NAME environment variable with the id of the job
      - name: Define workspace name
        run: echo "WORKSPACE_NAME=tmp_${GITHUB_RUN_ID}" >> $GITHUB_ENV

      ## We create a temporary workspace where we are going to push the project
      - name: Create new workspace
        run: |
          tb \
            --host $TB_HOST \
            --token $ADMIN_TOKEN \
            workspace create ${{ env.WORKSPACE_NAME }} \
            --user_token $USER_TOKEN

      ## We get the admin token of the new workspace we have just created
      - name: Get admin token from new workspace
        run: echo "ADMIN_TOKEN=$(curl -s "${TB_HOST}/v0/user/workspaces?token=${USER_TOKEN}" | jq -rc '.workspaces[] | select(.role=="admin" and .name=="${{ env.WORKSPACE_NAME }}").token')" >> $GITHUB_ENV

      ## We create all the resources (Data Sources, Pipes, endpoints, ...) and populate the Materialized Views
      - name: Create all the resources
        run: |
          tb \
            --host $TB_HOST \
            --token ${{ env.ADMIN_TOKEN }} \
            push \
            --push-deps \
            --populate \
            --wait \
            --fixtures

      ## We execute the exec_test script that compares the expected results with the actual results
      - name: Running tests
        run: ./scripts/exec_test.sh
        env:
          TB_TOKEN: ${{ env.ADMIN_TOKEN }}

      ## We remove the temporary workspace even if the tests failed
      - name: Cleanup
        if: ${{ always() }}
        run: |
          tb \
            --host $TB_HOST \
            --token $ADMIN_TOKEN \
            workspace delete ${{ env.WORKSPACE_NAME }} \
            --user_token $USER_TOKEN \
            --yes
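The densest step in the workflow is the jq one-liner that extracts the new workspace's admin token from the /v0/user/workspaces response. The same selection expressed in Python, assuming the response has the shape the jq filter implies (treat the payload layout as an assumption):

```python
from typing import Optional

def pick_admin_token(workspaces_response: dict, workspace_name: str) -> Optional[str]:
    """Mirror of the jq filter:
    .workspaces[] | select(.role=="admin" and .name==NAME).token"""
    for ws in workspaces_response.get("workspaces", []):
        if ws.get("role") == "admin" and ws.get("name") == workspace_name:
            return ws.get("token")
    return None

# Example payload shaped the way the jq filter expects (values are made up)
payload = {"workspaces": [
    {"name": "tmp_12345", "role": "admin", "token": "p.abc"},
    {"name": "other", "role": "guest", "token": "p.def"},
]}
```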
Requirements¶
For this GitHub Action you need to configure two repository secrets (ADMIN_TOKEN and USER_TOKEN) plus the TB_HOST variable:
TB_HOST: The URL of the region you want to use: https://ui.tinybird.co or https://ui.us-east.tinybird.co
ADMIN_TOKEN: The Admin Token is the Auth Token that grants all the permissions for a specific workspace. You can find more information here.
USER_TOKEN: The User Token is the Auth Token that identifies the user and allows creating/deleting workspaces. You can find more information here.
Our recommendation is to create a service account for this purpose, so these tokens live in an isolated account.
Configure the continuous integration tests¶
The GitHub Action runs a set of tests, each configured with two files.
Let's see an example for the top_products API Endpoint with the date_start and date_end parameters.
The top_products.test file is as follows:
tb --token $TB_TOKEN pipe data top_products --date_start 2020-04-24 --date_end 2020-04-24 --format CSV
It calls the top_products Pipe, filtering by one specific day, and returns the data in CSV format.
The top_products.test.result file contains the expected result for the previous API Endpoint call:
"date","top_10"
"2020-04-24","['sku_0001','sku_0002','sku_0003','sku_0004']"
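The guide doesn't reproduce exec_test.sh itself, but its core idea is simple: run the command in each .test file and diff the output against the matching .test.result file. A Python sketch of that loop (the file layout and naming convention are assumptions based on the example above):

```python
import subprocess
from pathlib import Path

def run_test(test_file: Path) -> bool:
    """Run the command stored in a .test file and compare its output
    with the expected .test.result file next to it."""
    expected = Path(str(test_file) + ".result").read_text()
    command = test_file.read_text().strip()
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    # Trailing-whitespace differences shouldn't fail the test
    return result.stdout.strip() == expected.strip()

# A runner would iterate over every *.test file in the tests directory
# and fail the CI job if any run_test() call returns False
```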
With this approach, you can have the tests for your data project integrated into your development process. All you have to do anytime you create a new branch, besides making the proper changes in your .datasource and .pipe files, is update your test files accordingly.
