Using Dynamic Parameters for Changing Aggregation Types

Advanced

These days, it is not uncommon to have datasets with a per-second resolution for a few years' worth of data. While this poses some demands at the storage and the query layers, it also presents some challenges for the data consumers. Aggregating this data dynamically and on-the-fly is key for resolving the specific demands at scale.

In this guide, we will focus on creating an API endpoint which aggregates the data in different time-frames depending on the amount of data, so only rows needed are sent to the frontend, thereby gaining performance and speed.

When preparing the API endpoint we will focus on three things:

  • Keep the API endpoint interface extremely simple: "I want events data from this start date to this end date".
  • Make the API endpoint return enough data with adequate resolution for the selected date range.
  • Don't add logic to the frontend to do the aggregation of the returned data: you simply request the desired date range knowing that you will receive an amount of data that won't swamp your rendering pipeline.

The data

This example uses a Data Source called events with 100M rows, for a ~5-year timespan.

Building the Pipe

Here, you'll learn how to use our templating language to add more logic to your API endpoints.

Learning more about our templating language:

In addition to the main section of our docs on Tinybird's templating language to pass query parameters to endpoints, you can also learn about variables and the functions that are available within templates in this Advanced Templates section.

The endpoint created in the intro guides returns sales per day of an ecommerce store, and takes a start and end date. The problem of having a fixed period of one day to aggregate data is that the amount of data transferred will vary a lot, depending on the selected dates, as well as the work that the client will have to do on the frontend to render that data. Also, if the start and end dates are close, the grouping window will be too big and the backend won't return data with a high enough level of detail.

Fortunately, we can add conditional logic when defining API endpoints in Tinybird and, depending on the range of dates selected, the endpoint will return data grouped by one of these periods:

  • Weekly
  • Daily
  • Every 4 hours
  • Every 1 hour
  • Every 15 minutes

To do it, we will create a new Pipe named ecommerce_events_dynamic. In the first node, we will add the following code to 1) keep only the buy events, and 2) cast the price column to Float32 changing its name to buy_events_with_price

And then, we will add another transformation node containing the following query (don't worry, we will explain the query in detail a few lines below). We will name this node buy_events_dynamic_agg

The query above makes use of our templating language so:

  • We define a couple of Date parameters and add some default values to be able to test our Pipe while we build it ({{Date(start_date, '2018-01-01')}} and {{Date(end_date, '2018-12-31')}}).
  • We compute the number of days in the interval defined by start_date and end_date: days_interval = day_diff(...).
  • We use the days_interval variable to decide the best granularity for our data.

We know it could look a bit complicated at a first glance, but you will see it becomes easy really fast!

Publishing your API endpoint

Selecting the buy_events_dynamic_agg node when clicking in the publish button will make your API accessible right away. Once it's published you can directly test it using the snippets available in the API endpoint page or using our REST API. Just change the start_date and the end_date parameters to see how the aggregation window changes dynamically.

Use a token with the right scope:

Replace <your_token> by a token whose scope is READ or higher.

As you see, with Tinybird you can dynamically return different responses from your analytics API endpoints depending on the request's parameters. This can give you more granular control over the data you send to the client, either for performance or for privacy reasons.