Advanced templates¶
This document shows an advanced usage of our datafile system when using the CLI. It’s also related to how to work with query parameters.
Reusing templates¶
When developing several use cases, it’s very common to find the need of reusing certain parts or steps of the analysis, like same data filters or similar table operations. We’re going to use for this purpose the following repository:
git clone https://github.com/tinybirdco/ecommerce_data_project_advanced.git
cd ecommerce_data_project_advanced
ecommerce_data_project/
datasources/
events.datasource
mv_top_per_day.datasource
products.datasource
fixtures/
events.csv
products.csv
endpoints/
sales.pipe
top_products_between_dates.pipe
top_products_last_week.pipe
includes/
only_buy_events.incl
top_products.incl
pipes/
top_product_per_day.pipe
Let’s take a look first to the sales.pipe
endpoint and the top_product_per_day.pipe
pipe that materializes to a ‘mv_top_per_day’ datasource. They both make use of the same node: only_buy_events
:
NODE only_buy_events
SQL >
SELECT
toDate(timestamp) date,
product,
joinGet('products_join_by_id', 'color', product) as color,
JSONExtractFloat(json, 'price') as price
FROM events
where action = 'buy'
INCLUDE "./includes/only_buy_events.incl"
NODE endpoint
DESCRIPTION >
return sales for a product with color filter
SQL >
%
select date, sum(price) total_sales
from only_buy_events
where color in {{Array(colors, 'black')}}
group by date
INCLUDE "./includes/only_buy_events.incl"
NODE top_per_day
SQL >
SELECT date,
topKState(10)(product) top_10,
sumState(price) total_sales
from only_buy_events
group by date
TYPE materialized
DATASOURCE mv_top_per_day
ENGINE AggregatingMergeTree
ENGINE_SORTING_KEY date
Include variables¶
Using variables¶
However, we’ve the possibility of sending variables to an include. The main reason to do that is to have a very similar node or nodes that can be reused, but with slightly differences. For instance, in our example, we want to have two endpoints to display the 10 top products, but filtered by different date intervals:
NODE endpoint
DESCRIPTION >
returns top 10 products for the last week
SQL >
select
date,
topKMerge(10)(top_10) as top_10
from top_product_per_day
{% if '$DATE_FILTER' = 'last_week' %}
where date > today() - interval 7 day
{% else %}
where date between {{Date(start)}} and {{Date(end)}}
{% end %}
group by date
INCLUDE "./includes/top_products.incl" "DATE_FILTER=last_week"
INCLUDE "./includes/top_products.incl" "DATE_FILTER=between_dates"
As we can see, we’re sending the variable DATE_FILTER
to the top_products
include, where we check the variable content using $
as a prefix to retrieve the variable content.
It’s also possible to assign an array of values to an include variable, but then it’s needed to parse it properly from the template using function templates, as it’s explained in the following section.
Variables vs parameters¶
Please note the difference between variables and parameters. Parameters are indeed variables whose value can be changed by the user through the endpoint. Variables only live in the template and can be set when declaring the INCLUDE
or with the set
template util:
{% set my_var = 'default' %}
By default, variables will be interpreted as parameters. In order to avoid variables or private parameters to appear in the public endpoint page of your endpoint, they need to start with _
. Example:
%
SELECT
date
FROM my_table
WHERE a > 10
{% if defined(_private_param) %}
and b = {{Int32(_private_param)}}
{% end %}
This is also needed when using variables in template functions.
Template functions¶
This is the list of the available functions that can be used in a template:
defined
defined(param)
: check if a variable is defined
%
SELECT
date
FROM my_table
{% if defined(param) %}
WHERE ...
{% end %}
column
column(name)
: get the column by its name from a variable
%
{% set var_1 = 'name' %}
SELECT
{{column(var_1)}}
FROM my_table
columns
columns(names)
: get columns by their name from a variable
%
{% set var_1 = 'name,age,address' %}
SELECT
{{columns(var_1)}}
FROM my_table
date_diff_in_seconds
date_diff_in_seconds(date_1, date_2, [date_format])
: gets the abs value of the difference in seconds between two datetimes.
Date_format is optional and defaults to '%Y-%m-%d %H:%M:%S'
so you can pass DateTimes as YYYY-MM-DD hh:mm:ss
when calling the function:
date_diff_in_seconds('2022-12-19 18:42:22', '2022-12-19 19:42:34')
Other formats are supported but need to be explicitly passed, like
date_diff_in_seconds('2022-12-19T18:42:23.521Z', '2022-12-19T18:42:23.531Z', date_format='%Y-%m-%dT%H:%M:%S.%fz')
For doubts regarding the format, check python strftime-and-strptime-format-codes.
%
SELECT
date, events
{% if date_diff_in_seconds(date_end, date_start, date_format="%Y-%m-%dT%H:%M:%Sz") < 3600 %}
FROM my_table_raw
{% else %}
FROM my_table_hourly_agg
{% end %}
WHERE date BETWEEN
parseDateTimeBestEffort({{String(date_start,'2023-01-11T12:24:04Z')}})
AND
parseDateTimeBestEffort({{String(date_end,'2023-01-11T12:24:05Z')}})
date_diff_in_minutes
Same behavior as date_diff_in_seconds
returning the difference in minutes.
date_diff_in_hours
Same behavior as date_diff_in_seconds
returning the difference in hours.
date_diff_in_days
date_diff_in_days(date_1, date_2, [date_format])
: gets the abs value of the difference in seconds between two dates or datetimes.
%
SELECT
date
FROM my_table
{% if date_diff_in_days(date_end, date_start) < 7 %}
WHERE ...
{% end %}
Date_format is optional and defaults to '%Y-%m-%d
so you can pass DateTimes as YYYY-MM-DD
when calling the function.
As with date_diff_in_seconds, date_diff_in_minutes, and date_diff_in_hours other date_formats are supported.
split_to_array
split_to_array(arr, default, separator=',')
: splits comma separated values into an array
%
SELECT
arrayJoin(arrayMap(x -> toInt32(x), {{split_to_array(code, '')}})) as codes
FROM my_table
%
SELECT
{{split_to_array(String(param, 'hi, how are you|fine thanks'), separator='|')}}
enumerate_with_last
enumerate_with_last(arr, default)
: creates an iterable array, returning a boolean value that allows to check if the current element is the last element in the array. It can be used along with the split_to_array
function.
symbol
symbol(x, quote)
: get the value of a variable
%
SELECT
{% for _last, _x in enumerate_with_last(split_to_array(attr, 'amount')) %}
sum({{symbol(_x)}}) as {{symbol(_x)}}
{% if not _last %}, {% end %}
{% end %}
FROM my_table
sql_and
sql_and(<column>__<op>=<transform_type_function> [, ...] )
: creates a list of “WHERE” clause “AND” separated filters that check if a field (<column>) is/isn’t (<op>) in a list/tuple (<transform_type_function>). The parameter <column> is any column in the table, <op> is in
or not_in
and <transform_type_function> is any of the transform type functions (Array(param, 'Int8')
, String(param)
, etc.). If one parameter is not specified, then the filter is ignored.
%
SELECT *
FROM my_table
WHERE 1
{% if defined(param) or defined(param2_not_in) %}
AND {{sql_and(
param__in=Array(param, 'Int32', defined=False),
param2__not_in=Array(param2_not_in, 'String', defined=False))}}
{% end %}
If this is queried with param=1,2
and param2_not_in=ab,bc,cd
, then it translates to:
SELECT *
FROM my_table
WHERE 1
AND param IN [1,2]
AND param2 NOT IN ['ab','bc','cd']
If this is queried just with param=1,2
, but param2_not_in
is not specified, then it translates to:
SELECT *
FROM my_table
WHERE 1
AND param IN [1,2]
Transform types functions¶
Boolean(x)
DateTime64(x)
DateTime(x)
Date(x)
Float32(x)
Float64(x)
Int8(x)
Int16(x)
Int32(x)
Int64(x)
Int128(x)
Int256(x)
UInt8(x)
UInt16(x)
UInt32(x)
UInt64(x)
UInt128(x)
UInt256(x)
String(x)
Array(x)