Implementing test strategies
Intermediate
In the versioning your pipes guide, you learned how to use versions as part of the usual development workflow for your API Endpoints.
In this guide you’ll learn about different strategies for testing your data project.
Guide preparation
You can follow along using the ecommerce_data_project.
Download the project by running:
git clone https://github.com/tinybirdco/ecommerce_data_project
cd ecommerce_data_project
Then, create a new workspace and authenticate using your Auth Token. If you don’t know how to authenticate or use the CLI, check out the CLI Quick Start.
tb auth -i
** List of available regions:
[1] us-east (https://ui.us-east.tinybird.co)
[2] eu (https://ui.tinybird.co)
[0] Cancel
Use region [1]: 2
Copy the admin token from https://ui.tinybird.co/tokens and paste it here :
Finally, push the data project to Tinybird:
tb push --push-deps --fixtures
** Processing ./datasources/events.datasource
** Processing ./datasources/top_products_view.datasource
** Processing ./datasources/products.datasource
** Processing ./datasources/current_events.datasource
** Processing ./pipes/events_current_date_pipe.pipe
** Processing ./pipes/top_product_per_day.pipe
** Processing ./endpoints/top_products.pipe
** Processing ./endpoints/sales.pipe
** Processing ./endpoints/top_products_params.pipe
** Processing ./endpoints/top_products_agg.pipe
** Building dependencies
** Running products_join_by_id
** 'products_join_by_id' created
** Running current_events
** 'current_events' created
** Running events
** 'events' created
** Running products
** 'products' created
** Running top_products_view
** 'top_products_view' created
** Running products_join_by_id_pipe
** Materialized pipe 'products_join_by_id_pipe' using the Data Source 'products_join_by_id'
** 'products_join_by_id_pipe' created
** Running top_product_per_day
** Materialized pipe 'top_product_per_day' using the Data Source 'top_products_view'
** 'top_product_per_day' created
** Running events_current_date_pipe
** Materialized pipe 'events_current_date_pipe' using the Data Source 'current_events'
** 'events_current_date_pipe' created
** Running sales
** => Test endpoint at https://api.tinybird.co/v0/pipes/sales.json
** 'sales' created
** Running top_products_agg
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products_agg.json
** 'top_products_agg' created
** Running top_products_params
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products_params.json
** 'top_products_params' created
** Running top_products
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products.json
** 'top_products' created
** Pushing fixtures
** Warning: datasources/fixtures/products_join_by_id.ndjson file not found
** Warning: datasources/fixtures/current_events.ndjson file not found
** Checking ./datasources/events.datasource (appending 544.0 b)
** OK
** Checking ./datasources/products.datasource (appending 134.0 b)
** OK
** Warning: datasources/fixtures/top_products_view.ndjson file not found
Regression tests
When one of your API Endpoints is integrated into a production environment (a web or mobile application, a dashboard, etc.), you want to make sure that any change to the Pipe doesn’t change the output of the endpoint.
In other words, you want the same version of an API Endpoint to return the same data for the same requests.
The CLI provides you with automatic regression tests any time you try to push the same version of a Pipe. Let’s see it with an example:
Imagine you have this version of the top_products Pipe:
NODE endpoint
DESCRIPTION >
returns top 10 products for the last week
SQL >
select
date,
topKMerge(10)(top_10) as top_10
from top_product_per_day
where date > today() - interval 7 day
group by date
And you want to parameterize the date filter to this:
NODE endpoint
DESCRIPTION >
returns top 10 products for the last week
SQL >
%
select
date,
topKMerge(10)(top_10) as top_10
from top_product_per_day
where date > today() - interval {{Int(day, 7)}} day
group by date
The new param day has a default value of 7. That means that, by default, the behaviour of the endpoint should be the same.
To illustrate the example, send a couple of requests to the API Endpoint:
curl https://api.tinybird.co/v0/pipes/top_products.json?token={TOKEN}
Now, try to override the endpoint:
tb push endpoints/top_products.pipe --force
** Processing endpoints/top_products.pipe
** Building dependencies
** Running top_products
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products__checker.json
current https://api.tinybird.co/v0/pipes/top_products.json?&pipe_checker=true
new https://api.tinybird.co/v0/pipes/top_products__checker.json?&pipe_checker=true ... ok
==== Test Metrics ====
------------------------------------------------------------------------
| Test Run | Test Passed | Test Failed | % Test Passed | % Test Failed |
------------------------------------------------------------------------
| 1 | 1 | 0 | 100.0 | 0.0 |
------------------------------------------------------------------------
==== Response Time Metrics ====
----------------------------------------------
| Timing Metric (s) | Current | New |
----------------------------------------------
| min response time | 0.255429 | 0.254966 |
| max response time | 0.255429 | 0.254966 |
| mean response time | 0.255429 | 0.254966 |
| median response time | 0.255429 | 0.254966 |
| p90 response time | 0.255429 | 0.254966 |
| min read bytes | 4.11 KB | 4.11 KB |
| max read bytes | 4.11 KB | 4.11 KB |
| mean read bytes | 4.11 KB | 4.11 KB |
| median read bytes | 4.11 KB | 4.11 KB |
| p90 read bytes | 4.11 KB | 4.11 KB |
----------------------------------------------
** 'top_products' created
** Not pushing fixtures
The CLI tests all combinations of parameters by running at least one request for each combination, and comparing the results of the new and old version of the Pipe.
The regression test also displays statistics for the new vs. the old Pipe, so you can detect whether the new endpoint improves or degrades performance.
In case you want to validate only the requests that contain one specific parameter, you can filter them using --match <PARAMETER_NAME>.
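The pass/fail comparison behind these checks can be sketched as: fetch the same request from the current endpoint and from the checker endpoint, then compare row counts and row contents. Here is a minimal sketch in Python (the response bodies and the compare_responses helper are illustrative, not the CLI’s actual implementation):

```python
import json

def compare_responses(current_body: str, new_body: str) -> tuple:
    """Compare two JSON endpoint responses the way a regression
    check might: same number of rows, then identical row content."""
    current_rows = json.loads(current_body)["data"]
    new_rows = json.loads(new_body)["data"]
    if len(current_rows) != len(new_rows):
        return (False, f"{len(current_rows)} != {len(new_rows)} : "
                       "Number of elements does not match")
    if current_rows != new_rows:
        return (False, "Row content does not match")
    return (True, "ok")

# The 7-day default returns one row per day of the last week;
# the 1-day default returns a single row.
week = json.dumps({"data": [{"date": f"2020-04-{d:02d}"} for d in range(18, 25)]})
day = json.dumps({"data": [{"date": "2020-04-24"}]})
print(compare_responses(week, week))  # → (True, 'ok')
print(compare_responses(week, day))  # → (False, '7 != 1 : Number of elements does not match')
```

Note that the second comparison mirrors the `1 != 0 : Number of elements does not match` error shown later when the default period changes.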
As a test, change the default date range to the last day:
NODE endpoint
DESCRIPTION >
returns top 10 products for the last week
SQL >
%
select
date,
topKMerge(10)(top_10) as top_10
from top_product_per_day
where date > today() - interval {{Int(day, 1)}} day
group by date
And try to override it:
tb push endpoints/top_products.pipe --force
** Processing endpoints/top_products.pipe
** Building dependencies
** Running top_products
** => Test endpoint at https://api.tinybird.co/v0/pipes/top_products__checker.json
current https://api.tinybird.co/v0/pipes/top_products.json?&pipe_checker=true
new https://api.tinybird.co/v0/pipes/top_products__checker.json?&pipe_checker=true ... FAIL
==== Test FAILED ====
current https://api.tinybird.co/v0/pipes/top_products.json?&pipe_checker=true
new https://api.tinybird.co/v0/pipes/top_products__checker.json?&pipe_checker=true
** check error: 1 != 0 : Number of elements does not match
=====================
Error:
** Failed running endpoints/top_products.pipe: Invalid results, you can bypass checks by running push with the --no-check flag
Since the default period changed, the response to the default request changed, so the Pipe is not overridden. The CLI has prevented a possible regression.
If you are sure the new response is correct, and don’t consider this change a regression, you can force the change through like this:
tb push endpoints/top_products.pipe --force --no-check
** Processing endpoints/top_products.pipe
** Building dependencies
** Running top_products
** 'top_products' created
** Not pushing fixtures
In this case, the regression tests won’t be executed. Of course, you do this at your own risk!
How the regression tests work
When you run tb pipe regression-test to check changes to a Pipe against the existing one, or tb push endpoints/ -f to override it, the CLI runs regression tests to validate that you are not breaking backward compatibility without realizing it.
The regression test functionality is powered by tinybird.pipe_stats_rt, one of the service Data Sources available to you out of the box. You can find more information about these service Data Sources here.
In this case, a query is run against tinybird.pipe_stats_rt to gather all the combinations of parameters used in an API Endpoint. This way, every possible combination is validated at least once.
SELECT
## With this function we extract all the parameters used in each request
extractURLParameterNames(assumeNotNull(url)) as params,
## According to the option `--sample-by-params`, we run one or more queries for each combination of parameters
groupArraySample({sample_by_params if sample_by_params > 0 else 1})(url) as endpoint_url
FROM tinybird.pipe_stats_rt
WHERE
pipe_name = '{pipe_name}'
## According to the option `--match`, we filter only the requests that contain that parameter
## This is especially useful when you want to validate a new parameter you are introducing, or an endpoint you have optimized for that specific case
{ " AND " + " AND ".join([f"has(params, '{match}')" for match in matches]) if matches and len(matches) > 0 else ''}
GROUP BY params
FORMAT JSON
If you have an endpoint with millions of requests per day, the check falls back to a plain list of requests:
WITH
## With this function we extract all the parameters used in each request
extractURLParameterNames(assumeNotNull(url)) as params
SELECT url
FROM tinybird.pipe_stats_rt
WHERE
pipe_name = '{pipe_name}'
## According to the option `--match`, we filter only the requests that contain that parameter
## This is especially useful when you want to validate a new parameter you are introducing, or an endpoint you have optimized for that specific case
{ " AND " + " AND ".join([f"has(params, '{match}')" for match in matches]) if matches and len(matches) > 0 else ''}
## According to the option `--limit`, 100 by default
LIMIT {limit}
FORMAT JSON
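Outside of SQL, the gathering step above amounts to: group the logged request URLs by the set of parameter names they carry, apply the --match filter, and keep a sample per group. Here is a rough Python equivalent (extract_param_names stands in for ClickHouse’s extractURLParameterNames; the URLs are made up):

```python
import random
from collections import defaultdict
from urllib.parse import urlparse, parse_qsl

def extract_param_names(url: str) -> tuple:
    """Rough Python equivalent of ClickHouse's extractURLParameterNames."""
    return tuple(sorted(name for name, _ in parse_qsl(urlparse(url).query)))

def sample_urls_by_params(urls, sample_by_params=1, match=()):
    """Group request URLs by their set of parameter names and keep a
    sample of each group, skipping groups missing the --match parameters."""
    groups = defaultdict(list)
    for url in urls:
        params = extract_param_names(url)
        if all(m in params for m in match):
            groups[params].append(url)
    return {params: random.sample(g, min(sample_by_params, len(g)))
            for params, g in groups.items()}

logged = [
    "https://api.tinybird.co/v0/pipes/top_products.json?token=t",
    "https://api.tinybird.co/v0/pipes/top_products.json?token=t&day=14",
    "https://api.tinybird.co/v0/pipes/top_products.json?token=t&day=30",
]
# With match=("day",), only requests carrying the day parameter are kept.
print(sample_urls_by_params(logged, match=("day",)))
```

Each sampled URL would then be replayed against both versions of the Pipe, as in the regression run above.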
Data quality tests
Data quality tests cover scenarios that should never happen in your production data. For example, you can check that the data is not empty, or that the data is not duplicated.
Data quality tests are run with the tb test command. You can include as many YAML files as you need in the tests directory of your data project.
For instance, given the ecommerce_data_project, let’s say we want to validate that:
There are no negative sales.
There are products sold every day.
We’ll create a tests/default.yaml file as in this Pull Request:
- no_negative_sales:
max_bytes_read: null
max_time: null
sql: |
SELECT
date,
sumMerge(total_sales) total_sales
FROM top_products_view
GROUP by date
HAVING total_sales < 0
- products_by_date:
max_bytes_read: null
max_time: null
sql: |
SELECT count(), date
FROM top_products
GROUP BY date
HAVING count() < 0
Then run the tests with tb test run -v:
tb test run -v
----------------------------------------------------------------------
| file | test | status | elapsed |
----------------------------------------------------------------------
| ./tests/default.yaml | no_negative_sales | Pass | 0.001300466 ms |
| ./tests/default.yaml | products_by_date | Pass | 0.000197256 ms |
----------------------------------------------------------------------
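The rule behind those Pass results is simple: a data quality test passes when its SQL returns zero rows, i.e. when no offending records were found. Here is a minimal sketch of that evaluation loop in Python (run_quality_tests and the fake executor are illustrative, not how tb test is implemented):

```python
def run_quality_tests(tests, run_sql):
    """Evaluate data quality tests: a test passes when its SQL
    returns no rows (no offending records were found)."""
    results = {}
    for test in tests:
        for name, spec in test.items():
            rows = run_sql(spec["sql"])
            results[name] = "Pass" if len(rows) == 0 else "Fail"
    return results

# Stand-in executor: pretend no negative sales exist,
# but one check did find an offending row.
def fake_run_sql(sql):
    if "total_sales < 0" in sql:
        return []              # healthy: no negative sales found
    return [("2020-04-25",)]   # an offending row was found

tests = [
    {"no_negative_sales": {"sql": "SELECT ... HAVING total_sales < 0"}},
    {"products_by_date": {"sql": "SELECT ... HAVING count() < 0"}},
]
print(run_quality_tests(tests, fake_run_sql))
# → {'no_negative_sales': 'Pass', 'products_by_date': 'Fail'}
```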
Configure the Continuous Integration tests
Contact us at support@tinybird.co if you need to configure Continuous Integration for your data project.
Add this step to your CI workflow:
# Execute the exec_test script, which compares the expected results with the actual results
- name: Running tests
run: ./scripts/exec_test.sh
env:
TB_TOKEN: ${{ env.ADMIN_TOKEN }}
See a working example in this repository.
The GitHub Action will run a set of tests configured with two files. Let’s see an example for the top_products API Endpoint with the date_start and date_end parameters.
The top_products.test file is as follows:
tb --token $TB_TOKEN pipe data top_products --date_start 2020-04-24 --date_end 2020-04-24 --format CSV
It calls the top_products Pipe, filtering by one specific day, and returns the data in CSV format.
Its counterpart, top_products.test.result, contains the expected result for the previous API Endpoint:
"date","top_10"
"2020-04-24","['sku_0001','sku_0002','sku_0003','sku_0004']"
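The check exec_test.sh performs then boils down to: run the command in each .test file and compare its output with the matching .test.result file. Here is a minimal sketch of that comparison in Python (compare_result is a hypothetical helper, not part of the actual script):

```python
def compare_result(actual: str, expected: str) -> bool:
    """Pass when the endpoint output matches the stored expectation,
    ignoring trailing whitespace and a trailing newline."""
    normalize = lambda s: [line.rstrip() for line in s.strip().splitlines()]
    return normalize(actual) == normalize(expected)

expected = (
    '"date","top_10"\n'
    '"2020-04-24","[\'sku_0001\',\'sku_0002\',\'sku_0003\',\'sku_0004\']"\n'
)
print(compare_result(expected, expected))             # → True
print(compare_result('"date","top_10"\n', expected))  # → False
```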
With this approach, the tests for your data project are integrated into your development process. Anytime you create a new Environment, besides making the proper changes in your .datasource and .pipe files, all you have to do is update your test files accordingly.
