Advanced templates¶
This document shows an advanced usage of our datafile system when using the CLI. It’s also related to how to work with query parameters.
Reusing templates¶
When developing multiple use cases, it’s very common to want to reuse certain parts or steps of an analysis, such as data filters or similar table operations. We’re going to use the following repository for this purpose :
git clone https://github.com/tinybirdco/ecommerce_data_project_advanced.git
cd ecommerce_data_project_advanced
ecommerce_data_project/
datasources/
events.datasource
mv_top_per_day.datasource
products.datasource
fixtures/
events.csv
products.csv
endpoints/
sales.pipe
top_products_between_dates.pipe
top_products_last_week.pipe
includes/
only_buy_events.incl
top_products.incl
pipes/
top_product_per_day.pipe
First, let’s take a look at the sales.pipe
API Endpoint and the top_product_per_day.pipe
Pipe that materializes to a ‘mv_top_per_day’ Data Source. They both make use of the same Node: only_buy_events
:
NODE only_buy_events
SQL >
SELECT
toDate(timestamp) date,
product,
joinGet('products_join_by_id', 'color', product) as color,
JSONExtractFloat(json, 'price') as price
FROM events
where action = 'buy'
INCLUDE "./includes/only_buy_events.incl"
NODE endpoint
DESCRIPTION >
return sales for a product with color filter
SQL >
%
select date, sum(price) total_sales
from only_buy_events
where color in {{Array(colors, 'black')}}
group by date
INCLUDE "./includes/only_buy_events.incl"
NODE top_per_day
SQL >
SELECT date,
topKState(10)(product) top_10,
sumState(price) total_sales
from only_buy_events
group by date
TYPE materialized
DATASOURCE mv_top_per_day
ENGINE AggregatingMergeTree
ENGINE_SORTING_KEY date
Include variables¶
Using variables¶
It is possible to include variables in a Node template. The main reason to do that is to have a very similar Node or Nodes that can be reused with slight differences. For instance, in our example, we want to have two API Endpoints to display the 10 top products, each filtered by different date intervals:
NODE endpoint
DESCRIPTION >
returns top 10 products for the last week
SQL >
select
date,
topKMerge(10)(top_10) as top_10
from top_product_per_day
{% if '$DATE_FILTER' = 'last_week' %}
where date > today() - interval 7 day
{% else %}
where date between {{Date(start)}} and {{Date(end)}}
{% end %}
group by date
INCLUDE "./includes/top_products.incl" "DATE_FILTER=last_week"
INCLUDE "./includes/top_products.incl" "DATE_FILTER=between_dates"
As you can see, the variable DATE_FILTER
is sent to the top_products
include, where the variable content is retrieved using the $
prefix with the DATE_FILTER
reference.
It is also possible to assign an array of values to an include variable. To do this, the variable needs to be parsed properly using function templates, as explained in the following section.
Variables vs parameters¶
Please note the difference between variables and parameters. Parameters are indeed variables whose value can be changed by the user through the API Endpoint request parameters. Variables only live in the template and can be set when declaring the INCLUDE
or with the set
template syntax:
{% set my_var = 'default' %}
By default, variables will be interpreted as parameters. In order to prevent variables or private parameters from appearing in the auto-generated API Endpoint documentation, they need to start with _
. Example:
%
SELECT
date
FROM my_table
WHERE a > 10
{% if defined(_private_param) %}
and b = {{Int32(_private_param)}}
{% end %}
This is also needed when using variables in template functions.
Template functions¶
This is the list of the available functions that can be used in a template:
defined
defined(param)
: check if a variable is defined
%
SELECT
date
FROM my_table
{% if defined(param) %}
WHERE ...
{% end %}
column
column(name)
: get the column by its name from a variable
%
{% set var_1 = 'name' %}
SELECT
{{column(var_1)}}
FROM my_table
columns
columns(names)
: get columns by their name from a variable
%
{% set var_1 = 'name,age,address' %}
SELECT
{{columns(var_1)}}
FROM my_table
date_diff_in_seconds
date_diff_in_seconds(date_1, date_2, [date_format])
: gets the abs value of the difference in seconds between two datetimes.
date_format
is optional and defaults to '%Y-%m-%d %H:%M:%S'
, so you can pass DateTimes as YYYY-MM-DD hh:mm:ss
when calling the function:
date_diff_in_seconds('2022-12-19 18:42:22', '2022-12-19 19:42:34')
Other formats are supported and need to be explicitly passed, like
date_diff_in_seconds('2022-12-19T18:42:23.521Z', '2022-12-19T18:42:23.531Z', date_format='%Y-%m-%dT%H:%M:%S.%fz')
For questions regarding the format, check python strftime-and-strptime-format-codes. For a deep dive into timestamps, time zones and Tinybird, see this guide on best practices for working with timestamps.
%
SELECT
date, events
{% if date_diff_in_seconds(date_end, date_start, date_format="%Y-%m-%dT%H:%M:%Sz") < 3600 %}
FROM my_table_raw
{% else %}
FROM my_table_hourly_agg
{% end %}
WHERE date BETWEEN
parseDateTimeBestEffort({{String(date_start,'2023-01-11T12:24:04Z')}})
AND
parseDateTimeBestEffort({{String(date_end,'2023-01-11T12:24:05Z')}})
date_diff_in_minutes
Same behavior as date_diff_in_seconds
with returning the difference in minutes.
date_diff_in_hours
Same behavior as date_diff_in_seconds
with returning the difference in hours.
date_diff_in_days
date_diff_in_days(date_1, date_2, [date_format])
: gets the absolute value of the difference in seconds between two dates or datetimes.
%
SELECT
date
FROM my_table
{% if date_diff_in_days(date_end, date_start) < 7 %}
WHERE ...
{% end %}
date_format
is optional and defaults to '%Y-%m-%d
so you can pass DateTimes as YYYY-MM-DD
when calling the function.
As with date_diff_in_seconds
, date_diff_in_minutes
, and date_diff_in_hours
, other date_formats are supported.
split_to_array
split_to_array(arr, default, separator=',')
: splits comma separated values into an array
%
SELECT
arrayJoin(arrayMap(x -> toInt32(x), {{split_to_array(code, '')}})) as codes
FROM my_table
%
SELECT
{{split_to_array(String(param, 'hi, how are you|fine thanks'), separator='|')}}
enumerate_with_last
enumerate_with_last(arr, default)
: creates an iterable array, returning a boolean value that allows to check if the current element is the last element in the array. It can be used along with the split_to_array
function.
symbol
symbol(x, quote)
: get the value of a variable
%
SELECT
{% for _last, _x in enumerate_with_last(split_to_array(attr, 'amount')) %}
sum({{symbol(_x)}}) as {{symbol(_x)}}
{% if not _last %}, {% end %}
{% end %}
FROM my_table
sql_and
sql_and(<column>__<op>=<transform_type_function> [, ...] )
: creates a list of “WHERE” clauses, along with “AND” separated filters, that checks if a field (<column>) is or isn’t (<op>) in a list/tuple (<transform_type_function>). The parameter <column> is any column in the table, <op> is in
or not_in
and <transform_type_function> is any of the transform type functions (Array(param, 'Int8')
, String(param)
, etc.). If one parameter is not specified, then the filter is ignored.
%
SELECT *
FROM my_table
WHERE 1
{% if defined(param) or defined(param2_not_in) %}
AND {{sql_and(
param__in=Array(param, 'Int32', defined=False),
param2__not_in=Array(param2_not_in, 'String', defined=False))}}
{% end %}
If this is queried with param=1,2
and param2_not_in=ab,bc,cd
, then it translates to:
SELECT *
FROM my_table
WHERE 1
AND param IN [1,2]
AND param2 NOT IN ['ab','bc','cd']
If this is queried just with param=1,2
, but param2_not_in
is not specified, then it translates to:
SELECT *
FROM my_table
WHERE 1
AND param IN [1,2]
Transform types functions¶
Boolean(x)
DateTime64(x)
DateTime(x)
Date(x)
Float32(x)
Float64(x)
Int8(x)
Int16(x)
Int32(x)
Int64(x)
Int128(x)
Int256(x)
UInt8(x)
UInt16(x)
UInt32(x)
UInt64(x)
UInt128(x)
UInt256(x)
String(x)
Array(x)