Advanced templates

This document shows an advanced usage of our datafile system when using the CLI. It’s also related to how to work with query parameters.

Reusing templates

When developing multiple use cases, it’s very common to want to reuse certain parts or steps of an analysis, such as data filters or similar table operations. We’re going to use the following repository for this purpose :

Clone demo
git clone https://github.com/tinybirdco/ecommerce_data_project_advanced.git
cd ecommerce_data_project_advanced
File structure
ecommerce_data_project/
    datasources/
        events.datasource
        mv_top_per_day.datasource
        products.datasource
        fixtures/
            events.csv
            products.csv
    endpoints/
        sales.pipe
        top_products_between_dates.pipe
        top_products_last_week.pipe
    includes/
        only_buy_events.incl
        top_products.incl
    pipes/
        top_product_per_day.pipe

First, let’s take a look at the sales.pipe API Endpoint and the top_product_per_day.pipe Pipe that materializes to a ‘mv_top_per_day’ Data Source. They both make use of the same Node: only_buy_events:

includes/only_buy_events.incl
NODE only_buy_events
SQL >
    SELECT
        toDate(timestamp) date,
        product,
        joinGet('products_join_by_id', 'color', product) as color,
        JSONExtractFloat(json, 'price') as price
    FROM events
    where action = 'buy'
endpoints/sales.pipes
INCLUDE "./includes/only_buy_events.incl"

NODE endpoint
DESCRIPTION >
    return sales for a product with color filter
SQL >
    %
    select date, sum(price) total_sales
    from only_buy_events
    where color in {{Array(colors, 'black')}}
    group by date
pipes/top_product_per_day.pipe
INCLUDE "./includes/only_buy_events.incl"

NODE top_per_day
SQL >
  SELECT date,
          topKState(10)(product) top_10,
          sumState(price) total_sales
  from only_buy_events
  group by date

TYPE materialized
DATASOURCE mv_top_per_day
ENGINE AggregatingMergeTree
ENGINE_SORTING_KEY date

When using INCLUDE files to reuse logic in .datasource files, the extension of the file must be .datasource.incl. This is used by CLI commands as tb fmt to identify the type of file and apply the correct formatting.

Include variables

Using variables

It is possible to include variables in a Node template. The main reason to do that is to have a very similar Node or Nodes that can be reused with slight differences. For instance, in our example, we want to have two API Endpoints to display the 10 top products, each filtered by different date intervals:

includes/top_products.incl
NODE endpoint
DESCRIPTION >
    returns top 10 products for the last week
SQL >
    select
        date,
        topKMerge(10)(top_10) as top_10
    from top_product_per_day
    {% if '$DATE_FILTER' = 'last_week' %}
        where date > today() - interval 7 day
    {% else %}
        where date between {{Date(start)}} and {{Date(end)}}
    {% end %}
    group by date
endpoints/top_products_last_week.pipe
INCLUDE "./includes/top_products.incl" "DATE_FILTER=last_week"
endpoints/top_products_between_dates.pipe
INCLUDE "./includes/top_products.incl" "DATE_FILTER=between_dates"

As you can see, the variable DATE_FILTER is sent to the top_products include, where the variable content is retrieved using the $ prefix with the DATE_FILTER reference.

It is also possible to assign an array of values to an include variable. To do this, the variable needs to be parsed properly using function templates, as explained in the following section.

Variables vs parameters

Please note the difference between variables and parameters. Parameters are indeed variables whose value can be changed by the user through the API Endpoint request parameters. Variables only live in the template and can be set when declaring the INCLUDE or with the set template syntax:

Using ‘set’ to declare a variable
{% set my_var = 'default' %}

By default, variables will be interpreted as parameters. In order to prevent variables or private parameters from appearing in the auto-generated API Endpoint documentation, they need to start with _. Example:

Define private variables
%
SELECT
  date
FROM my_table
WHERE a > 10
{% if defined(_private_param) %}
  and b = {{Int32(_private_param)}}
{% end %}

This is also needed when using variables in template functions.

Template functions

This is the list of the available functions that can be used in a template:

defined

defined(param): check if a variable is defined

defined function
%
SELECT
  date
FROM my_table
{% if defined(param) %}
  WHERE ...
{% end %}

column

column(name): get the column by its name from a variable

column function
%
{% set var_1 = 'name' %}
SELECT
  {{column(var_1)}}
FROM my_table

columns

columns(names): get columns by their name from a variable

columns function
%
{% set var_1 = 'name,age,address' %}
SELECT
  {{columns(var_1)}}
FROM my_table

date_diff_in_seconds

date_diff_in_seconds(date_1, date_2, [date_format], [backup_date_format], [none_if_error]): gets the abs value of the difference in seconds between two datetimes.

date_format is optional and defaults to '%Y-%m-%d %H:%M:%S', so you can pass DateTimes as YYYY-MM-DD hh:mm:ss when calling the function:

date_diff_in_seconds('2022-12-19 18:42:22', '2022-12-19 19:42:34')

Other formats are supported and need to be explicitly passed, like

date_diff_in_seconds('2022-12-19T18:42:23.521Z', '2022-12-19T18:42:23.531Z', date_format='%Y-%m-%dT%H:%M:%S.%fz')

For questions regarding the format, check python strftime-and-strptime-format-codes. For a deep dive into timestamps, time zones and Tinybird, see this guide on best practices for working with timestamps.

date_diff_in_seconds function
%
SELECT
  date, events
{% if date_diff_in_seconds(date_end, date_start, date_format="%Y-%m-%dT%H:%M:%Sz") < 3600 %}
  FROM my_table_raw
{% else %}
  FROM my_table_hourly_agg
{% end %}
  WHERE date BETWEEN
    parseDateTimeBestEffort({{String(date_start,'2023-01-11T12:24:04Z')}})
    AND
    parseDateTimeBestEffort({{String(date_end,'2023-01-11T12:24:05Z')}})

backup_date_format is optional and it allows to specify a secondary format as a backup when the provided date does not match the primary format. This is useful when your default input format is a datetime (2022-12-19 18:42:22) but you receive a date (2022-12-19).

date_diff_in_seconds('2022-12-19 18:42:22', '2022-12-19', backup_date_format='%Y-%m-%d')

none_if_error is optional and defaults to False. If set to True, the function will return None if the provided date does not match any of the provided formats. This is useful to provide an alternate logic in case any of the dates are specified in a different format.

date_diff_in_seconds function using none_if_error
%
SELECT *
FROM employees
{% if date_diff_in_seconds(date_start, date_end, none_if_error=True) is None %}
WHERE starting_date BETWEEN
    now() - interval 4 year AND
    now()
{% else %}
WHERE starting_date BETWEEN
    parseDateTimeBestEffort({{String(date_start, '2023-12-01')}}) AND
    parseDateTimeBestEffort({{String(date_end, '2023-12-02')}})
{% end %}

date_diff_in_minutes

Same behavior as date_diff_in_seconds with returning the difference in minutes.

date_diff_in_hours

Same behavior as date_diff_in_seconds with returning the difference in hours.

date_diff_in_days

date_diff_in_days(date_1, date_2, [date_format]): gets the absolute value of the difference in seconds between two dates or datetimes.

date_diff_in_days function
%
SELECT
  date
FROM my_table
{% if date_diff_in_days(date_end, date_start) < 7 %}
  WHERE ...
{% end %}

date_format is optional and defaults to '%Y-%m-%d so you can pass DateTimes as YYYY-MM-DD when calling the function.

As with date_diff_in_seconds, date_diff_in_minutes, and date_diff_in_hours, other date_formats are supported.

split_to_array

split_to_array(arr, default, separator=','): splits comma separated values into an array

split_to_array function
%
SELECT
  arrayJoin(arrayMap(x -> toInt32(x), {{split_to_array(code, '')}})) as codes
FROM my_table
split_to_array with a custom separator function
%
SELECT
  {{split_to_array(String(param, 'hi, how are you|fine thanks'), separator='|')}}

enumerate_with_last

enumerate_with_last(arr, default): creates an iterable array, returning a boolean value that allows to check if the current element is the last element in the array. It can be used along with the split_to_array function.

symbol

symbol(x, quote): get the value of a variable

enumerate_with_last function
%
SELECT
    {% for _last, _x in enumerate_with_last(split_to_array(attr, 'amount')) %}
        sum({{symbol(_x)}}) as {{symbol(_x)}}
        {% if not _last %}, {% end %}
    {% end %}
FROM my_table

sql_and

sql_and(<column>__<op>=<transform_type_function> [, ...] ): creates a list of “WHERE” clauses, along with “AND” separated filters, that checks if a field (<column>) is or isn’t (<op>) in a list/tuple (<transform_type_function>). The parameter <column> is any column in the table, <op> is in or not_in and <transform_type_function> is any of the transform type functions (Array(param, 'Int8'), String(param), etc.). If one parameter is not specified, then the filter is ignored.

sql_and function
%
SELECT *
FROM my_table
WHERE 1
{% if defined(param) or defined(param2_not_in) %}
    AND {{sql_and(
        param__in=Array(param, 'Int32', defined=False),
        param2__not_in=Array(param2_not_in, 'String', defined=False))}}
{% end %}

If this is queried with param=1,2 and param2_not_in=ab,bc,cd, then it translates to:

sql_and function - generated sql
SELECT *
FROM my_table
WHERE 1
    AND param  IN [1,2]
    AND param2 NOT IN ['ab','bc','cd']

If this is queried just with param=1,2, but param2_not_in is not specified, then it translates to:

sql_and function - generated sql param missing
SELECT *
FROM my_table
WHERE 1
    AND param  IN [1,2]

Transform types functions

  • Boolean(x)

  • DateTime64(x)

  • DateTime(x)

  • Date(x)

  • Float32(x)

  • Float64(x)

  • Int8(x)

  • Int16(x)

  • Int32(x)

  • Int64(x)

  • Int128(x)

  • Int256(x)

  • UInt8(x)

  • UInt16(x)

  • UInt32(x)

  • UInt64(x)

  • UInt128(x)

  • UInt256(x)

  • String(x)

  • Array(x)