---
title: "Adapt Postgres queries to Tinybird"
excerpt: "Your Postgres queries won't work the same in Tinybird. These adaptations ensure your migration actually improves performance."
authors: "Xoel Lopez"
categories: "The Data Base"
createdOn: "2025-05-22 00:00:00"
publishedOn: "2021-06-21 00:00:00"
updatedOn: "2025-05-22 00:00:00"
status: "published"
---

Postgres is great as an OLTP database, but if you need real-time queries on hundreds of millions of rows, it's not going to be fast enough. Tinybird is. Most queries from Postgres will look very similar on Tinybird. This guide explains how to adapt most Postgres queries to run on Tinybird.

This post is inspired by [Haki Benita's blog post](https://hakibenita.com/sql-for-data-analysis) on SQL for data analysis. We'll show you how to adapt most of the Postgres queries from Haki's post to run on Tinybird.

## Prerequisites

You don't need an active Tinybird Workspace to read through this post, but it's a good idea to read Haki's post first so you're familiar with the examples. In addition, a working knowledge of Tinybird and SQL is required.

## Common table expressions

Both the `WITH <expression> AS <identifier>` as well as the `WITH <identifier> AS <subquery expression>` syntaxes are supported.

```shell
WITH emails AS (
    SELECT 'ME@hakibenita.com' AS email
)
SELECT * FROM emails
;

WITH emails AS
    (
        SELECT 'ME@hakibenita.com' AS email
    )
SELECT *
FROM emails

Query id: 6b234b03-6dc4-4ddf-8454-b03d34b75b60

┌─email─────────────┐
│ ME@hakibenita.com │
└───────────────────┘

1 rows in set. Elapsed: 0.014 sec.
```

```shell {% title="A CHAINED CTE ON POSTGRES" %}
WITH emails AS (
    SELECT 'ME@hakibenita.com' AS email
),

normalized_emails AS (
    SELECT lower(email) AS email FROM emails
)

SELECT * FROM normalized_emails;

WITH
    emails AS
    (
        SELECT 'ME@hakibenita.com' AS email
    ),
    normalized_emails AS
    (
        SELECT lower(email) AS email
        FROM emails
    )
SELECT *
FROM normalized_emails

Query id: c511a113-1852-4a9f-90bf-99d33eba8254

┌─email─────────────┐
│ me@hakibenita.com │
└───────────────────┘

1 rows in set. Elapsed: 0.127 sec.
```

### On Tinybird

For now, Tinybird only supports the WITH \<expression\> AS \<identifier\> syntax. The previous queries need to be rewritten like this:

```sql {% title="CTES ON TINYBIRD"  %}
WITH (SELECT 'ME@hakibenita.com') AS email
SELECT email
```

There's a difference with CTEs on Postgres VS Tinybird. In Postgres, as the original post says, "CTEs are a great way to split a big query into smaller chunks, perform recursive queries and even to cache intermediate results". On Tinybird, CTEs can only return one row, so those intermediate results can't have multiple rows. For a similar result, on Tinybird you have to use subqueries.

A common pattern is returning a tuple of groupArrays in the CTE, so you can return more than one row in the form of arrays. Then consume the results in the main query with transform for instance or arrayJoin.

In Tinybird, Pipes act like notebooks where each node is a subquery and you can refer to the results of one node in another Node. It's great to see intermediate results and reduce the complexity of your queries. If you'd like to try it out, [sign up here](https://www.tinybird.co/signup?utm_source=blog).

## Generating data

As you see in the original article, [in Postgres](https://hakibenita.com/sql-for-data-analysis#generating-data) there are several ways to do it:

## Union all

This works the same in Tinybird as in Postgres:

```shell {% title="UNION ALL ALSO WORKS" %}
WITH dt AS (
   SELECT 1 AS id, 'haki' AS name
   UNION ALL
   SELECT 2, 'benita'
)
SELECT * FROM dt;

WITH dt AS (
    SELECT
        1 AS id,
        'haki' AS name
    UNION ALL
    SELECT
        2,
        'benita'
)
SELECT *
FROM dt

Query id: e755e5a5-5e5b-4e8a-a262-935f9946d45d

┌─id─┬─name───┐
│  2 │ benita │
└────┴────────┘
┌─id─┬─name─┐
│  1 │ haki │
└────┴──────┘

2 rows in set. Elapsed: 0.051 sec.
```

The VALUES keyword won't work on Tinybird to select data, only to insert it.

## Joining data

The join syntax from Postgres will work on Tinybird, but typically the kinds of analytical data that you'll store on Tinybird will be orders of magnitude bigger than what you'd store on Postgres, and this would make your joins slow. There are ways to make JOINs faster, check the [best practices for writing faster SQL queries](https://tinybird.co/docs/classic/work-with-data/query/sql-best-practices) or contact us for guidance.

## Unnest - arrayJoin

[arrayJoin](https://tinybird.co/docs/sql-reference/functions/array-functions#arrayjoin) is the Tinybird equivalent of unnest on Postgres. So this Postgres query:

```sql {% title="ON Tinybird, UNNEST EXPANDS AN ARRAY INTO ROWS" %}
WITH dt AS (
    SELECT unnest(array[1, 2]) AS n
)
SELECT * FROM dt;
```

Would be rewritten on Tinybird like this:

```sql {% title="ON Tinybird, ARRAYJOIN EXPANDS AN ARRAY INTO ROWS"  %}
SELECT arrayJoin([1, 2]) AS dt
```

## Generating series of data

### Generate_series

The generate_series doesn't exist on Tinybird, but with the numbers function a lot can be accomplished. This is its basic usage:

```sql {% title="NUMBERS PRODUCES ROWS"  %}
SELECT * FROM numbers(10)
```

A similar result can be obtained with the [range](https://tinybird.co/docs/sql-reference/functions/array-functions#range) function, that returns arrays. If you only provide an argument, it behaves like numbers. And with range you can also specify a start, end and step:

```sql {% title="RANGE PRODUCES ARRAYS"  %}
SELECT range(10), range(0, 10, 2)
```

This, combined with arrayJoin lets us do the same as generate_series:

```sql {% title="RANGE OF INTEGERS USING START, END AND STEP"  %}
SELECT arrayJoin(range(0, 10, 2)) AS number
```

## Generating time series

generate_series can produce results with other types different than integers, while range only outputs integers. But with some smart logic you can achieve the same results. For example, on Postgres you'd generate a with a datetime for each hour in a day this way, as in the original [post](https://hakibenita.com/sql-for-data-analysis#generating-data):

```sql {% title="THE GENERATE_SERIES FUNCTION OF POSTGRES" %}
WITH daterange AS (
    SELECT *
    FROM generate_series(
        '2021-01-01 UTC'::timestamptz, -- start
        '2021-01-02 UTC'::timestamptz, -- stop
        interval '1 hour'   -- step
    ) AS t(hh)
) SELECT * FROM daterange;

           hh
────────────────────────
 2021-01-01 00:00:00+00
 2021-01-01 01:00:00+00
 2021-01-01 02:00:00+00
 ...
 2021-01-01 22:00:00+00
 2021-01-01 23:00:00+00
 2021-01-02 00:00:00+00
```

### Generate a time series specifying the start date and the number of intervals

On Tinybird, you can achieve the same this way:

```sql {% title="TIME SERIES ON Tinybird GIVEN START DATE, NUMBER OF INTERVALS AND INTERVAL SIZE"  %}
WITH toDate('2021-01-01') as start
SELECT addHours(toDate(start), number) AS hh
FROM (
    SELECT arrayJoin(range(0, 24)) AS number
)
```

### Generate a time series specifying the start and end date and the step

Another way of doing the same thing:

```sql {% title="TIME SERIES ON Tinybird GIVEN THE START AND END DATE AND THE STEP SIZE"  %}
WITH toStartOfDay(toDate('2021-01-01')) AS start,
toStartOfDay(toDate('2021-01-02')) AS end
SELECT arrayJoin(arrayMap(x -> toDateTime(x), range(toUInt32(start), toUInt32(end), 3600))) as hh
```

### Generate a time series using timeSlots

Using the [timeSlots](https://tinybird.co/docs/sql-reference/functions/date-time-functions#timeslots) function, you can specify the start (DateTime), duration (seconds) and step (seconds) and it generates an array of DateTime values.

```sql {% title="TIME SERIES ON Tinybird USING THE TIMESLOTS FUNCTION"  %}
WITH toDateTime('2021-01-01 00:00:00') AS start
SELECT arrayJoin(timeSlots(start, toUInt32(24 * 3600), 3600)) AS hh
```

## Generating a random value

The [rand](https://tinybird.co/docs/sql-reference/functions/random-functions#rand) function in Tinybird is akin to random in Postgres, with the difference that rand returns a random UInt32 number between 0 and 4294967295. So to get random floats between 0 and 1 like random, you have to divide the result by 4294967295.

```sql {% title="GENERATING A RANDOM VALUE ON Tinybird WITH THE RAND FUNCTION"  %}
SELECT
    rand() random_int,
    random_int / 4294967295 random_float
```

To get more than one row, you'd simply do

```sql {% title="GENERATING SEVERAL RANDOM VALUES ON Tinybird"  %}
SELECT
    rand() random_int,
    random_int / 4294967295 random_float
FROM numbers(100)
```

## Generating random integers within a range

You would use the floor or ceil function (not round, for the reasons explained [here](https://hakibenita.com/sql-for-data-analysis#random)) in addition to the result of rand multiplied by the max of the range of integers you want to generate, like this:

```sql {% title="GENERATING SEVERAL RANDOM INTEGERS IN A GIVEN RANGE ON Tinybird"  %}
SELECT
    ceil(rand() / 4294967295 * 3) AS n
FROM
    numbers(10)
```

And here you can see that the distribution is uniform (this wouldn't happen if you had use round):

```sql {% title="THE DISTRIBUTION IS UNIFORM"  %}
SELECT
    ceil(rand() / 4294967295 * 3) AS n,
    count(*)
FROM
    numbers(10000)
GROUP BY n
ORDER BY n
```

## Sampling data from a list

This is how you'd take samples with replacement from a list in Postgres:

```sql {% title="RANDOM CHOICE IN POSTGRES" %}
SELECT
    (array['red', 'green', 'blue'])[ceil(random() * 3)] AS color
FROM
    generate_series(1, 5);
```

In Tinybird, this is how you'd do it:

```sql {% title="RANDOM CHOICE IN Tinybird"  %}
SELECT
    ['red', 'green', 'blue'][toInt32(ceil(rand() / 4294967295 * 3))] AS color
FROM
    numbers(5)
```

To get only one value, you'd remove the FROM numbers(5) part. Note that to define an array on Tinybird you can do it either calling array('red', 'green', 'blue') or with ['red', 'green', 'blue'] like in the code snippet.

## Sampling data from a table

Sorting data by rand() can be used to get a random sample, like here:

```sql {% title="SAMPLE DATA USING RAND"  %}
SELECT *
FROM events_mat
ORDER BY rand()
LIMIT 100
```

But this is slow, as a full scan of the table has to be run here.

A more efficient way to do it's using the SAMPLE clause. You can pass an integer to it (should be large enough, typically above 1000000):

```sql {% title="THE SAMPLE CLAUSE, PASSING AN INTEGER"  %}
SELECT * FROM events_mat SAMPLE 1000000
```

And you can also pass a float between 0 and 1, to indicate the fraction of the data that will be sampled.

```sql {% title="THE SAMPLE CLAUSE, PASSING A FLOAT"  %}
SELECT * FROM events_mat SAMPLE 0.01
```

## Descriptive statistics on a numeric series

Tinybird also comes with lots of statistical function, like Postgres does (see [this section](https://hakibenita.com/sql-for-data-analysis#describing-a-series) of the original post). The first query, written on Postgres this way:

```sql {% title="DESCRIPTIVE STATISTICS ON POSTGRES" %}
WITH s AS (
    SELECT * FROM (VALUES (1), (2), (3)) AS t(n)
)
SELECT
    count(*),
    avg(n),
    stddev(n),
    min(n),
    percentile_cont(array[0.25, 0.5, 0.75]) WITHIN GROUP (ORDER BY n),
    max(n)
FROM
    s;

count │   avg  │   stddev  │ min │ percentile_cont │ max
──────┼────────┼───────────┼─────┼─────────────────┼─────
    3 │ 2.0000 │ 1.0000000 │   1 │ {1.5,2,2.5}     │   3
```

Would be done on Tinybird like this:

```sql {% title="DESCRIPTIVE STATISTICS ON Tinybird"  %}
SELECT
    count(*),
    avg(n),
    stddevSamp(n),
    min(n),
    quantiles(0.25, 0.5, 0.75)(n),
    max(n)
FROM
    (SELECT arrayJoin([1,2,3]) AS n)
```

## Descriptive statistics on categorical series

Tinybird can also be used to get some statistics from discrete values. While on Postgres you'd do this:

```sql {% title="DESCRIPTIVE STATISTICS OF CATEGORICAL VALUES ON POSTGRES" %}
WITH s AS (SELECT unnest(array['a', 'a', 'b', 'c']) AS v)
SELECT
    count(*),
    count(DISTINCT V) AS unique,
    mode() WITHIN GROUP (ORDER BY V) AS top
FROM
    s;

 count │ unique │ top
───────┼────────┼─────
     4 │      3 │ a
```

On Tinybird you'd do:

```sql {% title="DESCRIPTIVE STATISTICS OF CATEGORICAL VALUES ON Tinybird"  %}
SELECT
    count(*) AS count,
    uniq(v) AS unique,
    topK(1)(v) top
FROM
    (SELECT arrayJoin(['a', 'b', 'c', 'd']) AS v)
```

[uniq](https://tinybird.co/docs/sql-reference/functions/aggregate-functions#uniq) will provide approximate results when your data is very big. If you need exact results you can use [uniqExact](https://tinybird.co/docs/sql-reference/functions/aggregate-functions#uniqexact), but be aware that uniq will generally be faster than uniqExact. Check out the [topK](https://tinybird.co/docs/sql-reference/functions/aggregate-functions#topk) docs as well if you're interested.

As a side note, if you have categorical columns, most likely you'll have [better performance](https://altinity.com/blog/2019/3/27/low-cardinality) and lower storage cost data types. The performance of using LowCardinality will be better than using the base data types even on columns with more than a few millions of different values.

## Subtotals and aggregations

The same operations done in [this section](https://hakibenita.com/sql-for-data-analysis#subtotals) of Haki's post can be done with Tinybird.

Given a table that contains this data:

```sql {% title="EMPLOYEES"  %}
SELECT * FROM employees
```

Finding the number of employees with each role is straightforward, same syntax as on Postgres:

```sql {% title="EMPLOYEES PER ROLE"  %}
SELECT department, role, count(*) count
FROM employees
GROUP BY department, role
```

### Using rollup and cube

The ROLLUP modifier is also available on Tinybird, although the syntax is slightly different than on Postgres. This query on Postgres:

```sql {% title="GROUP BY WITH ROLLUP ON POSTGRES" %}
SELECT department, role, COUNT(*)
FROM employees
GROUP BY ROLLUP(department, role);
```

would be written on Tinybird like this:

```sql {% title="GROUP BY WITH ROLLUP ON Tinybird"  %}
SELECT department, role, COUNT(*)
FROM employees
GROUP BY department, role WITH ROLLUP
```

It allows you to have more subtotals (but not all). To have all the subtotals for all the possible combinations of grouping keys, you need to use the CUBE modifier:

```sql {% title="GROUP BY WITH CUBE ON Tinybird"  %}
SELECT department, role, COUNT(*)
FROM employees
GROUP BY department, role WITH CUBE
```

## Pivot tables and conditional expressions

Pivot tables let you reshape data when you want typically a column with keys, a column with categories and a column with values, and you want to aggregate those values and use the categories column as columns of a new table.

On Postgres you could do it this way:

```sql {% title="PIVOT TABLE CREATED MANUALLY ON POSTGRES" %}
SELECT
    role,
    COUNT(*) FILTER (WHERE department = 'R&D') as "R&D",
    COUNT(*) FILTER (WHERE department = 'Sales') as "Sales"
FROM
    employees
GROUP BY
    role;

   role    │ R&D │ Sales
───────────┼─────┼───────
 Manager   │   1 │     1
 Developer │   2 │     2
```

On Tinybird, you could do the same this way:

```sql {% title="PIVOT TABLE CREATED MANUALLY ON POSTGRES"  %}
SELECT
    role,
    countIf(department = 'R&D') as "R&D",
    countIf(department = 'R&D') as "Sales"
FROM
    employees
GROUP BY
    role
```

## Running and Cumulative Aggregations

Aggregations over sliding windows are a common solution. This can be done with Window functions.

This also can be done with the [groupArrayMovingSum](https://tinybird.co/docs/sql-reference/functions/aggregate-functions#grouparraymovingsum) and [groupArrayMovingAvg](https://tinybird.co/docs/sql-reference/functions/aggregate-functions#grouparraymovingavg) functions, available in stable releases since a long time ago already. This is an example of its usage:

Given this dataset:

```sql {% title="GOOGLE TRENDS DATA FOR THE TERM 'AMAZON'"  %}
SELECT date, amazon as value FROM amazon_trends
```

You can compute a 7-day moving average of value like this:

```sql {% title="7-DAY MOVING AVERAGE"  %}
SELECT * FROM
    (SELECT
        groupArray(date) as date_arr,
        groupArray(value) as value_arr,
        groupArrayMovingAvg(7)(value) mov_avg
    FROM (SELECT date, amazon as value FROM amazon_trends))
ARRAY JOIN *
```

The periods parameter is optional. If you omit it, all the previous rows are used for the aggregation.

## Linear regression

Given this data

```sql {% title="DATA TO PERFORM LINEAR REGRESSION"  %}
SELECT
    arrayJoin([[1.2, 1], [2, 1.8], [3.1, 2.9]])[1] x,
    arrayJoin([[1.2, 1], [2, 1.8], [3.1, 2.9]])[2] y
```

On Postgres you can see in the [original post](https://hakibenita.com/sql-for-data-analysis#linear-regression) that you'd do linear regression like this:

```sql {% title="LINEAR REGRESSION ON POSTGRES" %}
WITH t AS (SELECT * FROM (VALUES
    (1.2, 1.0),
    (2.0, 1.8),
    (3.1, 2.9)
) AS t(x, y))
SELECT
    regr_slope(y, x) AS slope,
    regr_intercept(y, x) AS intercept,
    sqrt(regr_r2(y, x)) AS r
FROM
    t;

       slope        │      intercept       │ r
────────────────────┼──────────────────────┼───
 1.0000000000000002 │ -0.20000000000000048 │ 1
```

There's not a function on Tinybird like regr_r2 that gives you the R2 coefficient, but it wouldn't be hard to calculate it yourself as the formula is [simple](https://www.google.com/search?q=r2+formula).

## Filling null values

This part is called "[interpolation](https://hakibenita.com/sql-for-data-analysis#interpolation)" in Haki's post. Filling null values with Pandas is a one-liner. Consider this table

```sql {% title="A TABLE WITH A INT AND A STRING COLUMN, WITH SOME NULL VALUES"  %}
SELECT * FROM num_str
```

## Fill null values with a constant value

The way to replace all the null values by a constant value is using the coalesce function, that works in Tinybird the same way it does in Postgres, using [coalesce](https://tinybird.co/docs/sql-reference/functions/functions-for-nulls#coalesce):

```sql {% title="FILLING NULL VALUES WITH A CONSTANT"  %}
SELECT
    n,
    coalesce(v, 'X') AS v
FROM
    num_str
```

## Back and forward filling data

groupArray, as the other aggregate functions on Tinybird, skips null values. So the solution involves replacing them by another value (make sure that the new value doesn't appear in the column before). This is done with the ifNull function. Add some array magic in, and this is how you'd do it:

```sql {% title="BACK AND FORWARD FILLING VALUES"  %}
SELECT
  values.1 n,
  values.2 v,
  values.3 v_ffill,
  values.4 v_bfill
FROM
  (SELECT
    arrayJoin(
      arrayZip(
        groupArray(n) AS n,
        arrayMap(x -> x != 'wadus' ? x : null, groupArray(v_nulls_replaced)) AS v,
        arrayFill(x -> x != 'wadus', groupArray(v_nulls_replaced)) AS v_ffill,
        arrayReverseFill(x -> x != 'wadus', groupArray(v_nulls_replaced)) AS v_bfill
      )
    ) values
  FROM
     (SELECT
        *,
        ifNull(v, 'wadus') v_nulls_replaced
      FROM num_str
      ORDER BY n ASC)
  )
```

To understand what's going on here, import this Pipe with a step-by-step explanation and the results of the transformations that are taking place. Tinybird lets you run each subquery in a node of a notebook-like UI. This lets you build and debug complex queries in a cleaner way. If you'd like to use it, sign up [here](https://www.tinybird.co/signup).

## Filling gaps in time-series, reshaping indexes

Sometimes, you'll group a time-series by a Date or DateTime column, and it can happen that the intervals between rows aren't always the same because there were no values found for some dates or datetimes. In Pandas, the solution would be creating a new date_range index and then re-indexing the original Series/DataFrame with that index.

On Tinybird, the same can be accomplished with the WITH FILL modifier. Here's a simple example of it:

```sql {% title="FILLING GAPS IN TIME ON Tinybird"  %}
SELECT
    toDate((number * 2) * 86400) AS d1,
    'string_value' as string_col,
    toInt32(rand() / exp2(32) * 100) as n
FROM numbers(10)
ORDER BY
    d1 WITH FILL STEP 1
```

The `STEP 1` part isn't necessary here as it's the default, but know that you can set a different value than 1.

## Linear interpolation

Imagine you have a table like this, containing a time-series with some rows missing. You could fill those missing gaps with the WITH FILL expression previously shown, but that way you'd just get zeroes when there's a missing value, while the actual missing value is probably closer to the previous and the next values than to zero.

```sql {% title="TIME-SERIES WITH MISSING ROWS"  %}
SELECT *, bar(value, 0, 100, 20)
FROM trends_with_gaps
ORDER BY date WITH FILL STEP 1
```

Linear interpolation is the simplest way to fill those missing values. In it, missing values are replaced by the average of the previous and the next known values. On Postgres, Haki's post explains how to do it [here](https://hakibenita.com/sql-for-data-analysis#linear-interpolation).

On Tinybird, this can be done with arrays:

```sql {% title="INTERPOLATING VALUES"  %}
SELECT
    date,
    value,
    value_interpolated,
    bar(value_interpolated, 0, 100, 20) AS value_interpolated_bar
FROM
(
    SELECT
        groupArray(date) AS dt_arr,
        groupArray(value) AS value_arr,
        arrayFill(x -> ((x.1) > 0), arrayZip(value_arr, dt_arr)) AS value_lower,
        arrayReverseFill(x -> ((x.1) > 0), arrayZip(value_arr, dt_arr)) AS value_upper,
        arrayMap((l, u, v, dt) -> if(v > 0, v, (l.1) + ((((u.1) - (l.1)) / ((u.2) - (l.2))) * (dt - (l.2)))), value_lower, value_upper, value_arr, dt_arr) AS value_interpolated
    FROM (
            SELECT * FROM trends_with_gaps
            ORDER BY date WITH FILL STEP 1
        )
)
ARRAY JOIN
    dt_arr AS date,
    value_interpolated,
    value_arr AS value
```

For a step-by-step explanation of how this works, and to see how you could construct this query iteratively with a notebook-like interface on Tinybird, import this Pipe.

## Binning and histograms

The original post talks about custom binning, equal-width and equal-height binning. The way to do custom binning is very similar on Tinybird, which also supports CASE statements. Equal height binning could be achieved with the quantiles function, already described before, in the descriptive statistics section. The most interesting use-case of equal-width binning is creating histograms, which is very easy on Tinybird. It even comes with a histogram function, which receives a number of bins and the data, and returns a list of tuples containing the lower and upper bounds of each bucket, as well as its height:

```sql {% title="THE HISTOGRAM FUNCTION"  %}
SELECT histogram(10)(value) AS values FROM trends_with_gaps
```

## Conclusion

While Postgres and Tinybird share many SQL features, Tinybird is specifically optimized for real-time analytics on large datasets. The main differences are:

1. CTEs in Tinybird are more limited but can be replaced with Pipes
2. Tinybird provides specialized functions for time series analysis
3. Array operations use different syntax but achieve similar results
4. Statistical functions have different names but similar functionality

To get started with Tinybird, [sign up here](https://www.tinybird.co/signup) and try these examples with your own data.
