Working with Time

Easy

In this guide you'll learn how to work with Time in Tinybird

Following up on our previous guide Best practices for Timestamps, in this guide we will dive into the functions for working Time in ClickHouse.

We'll examine the main types in using, storing, filtering and querying data. We'll also examine functions that may behave in ways you don't expect when coming in from other databases, and finish with example queries and some good test data to play with.

In particular, we're going to pay attention to the right way of handling Timezones and DST when querying and aggregating, a very common activity in Tinybird.

Getting Practical

If you want to follow along with this guide, you can find all of the sample queries & data in this repo.

The repo contains several examples, which the rest of this Guide will walk through.

In our companion Workspace, this section is in the Pipe 'simple_types_and_transforms'

There are basically two main Types, with several sub-types, when dealing with Time in ClickHouse - the first for dates, and the second for specific time instants.

For dates, we will choose one type or another depending on the range of dates we will need to store.

For time instants, it will depend not only on the range, but also on the precision. The default records seconds, and you can also work with micro or nanoseconds.

Date and Time Types

Type Range of valuesParameters
Date[1970-01-01, 2149-06-06]
Date32[1900-01-01, 2299-12-31]
DateTime[1970-01-01 00:00:00, 2106-02-07 06:28:15]Timezone: More about that later.
DateTime64[1900-01-01 00:00:00, 2299-12-31 23:59:59.99999999]Precision: [0-9]. Usually: 3(miliseconds), 6(microseconds), 9(nanoseconds). Timezone: More about that later.

Note that this standard of appending -32 or -64 carries through to most other functions dealing with Date and DateTime, as we will soon see.

Also note that the main Clickhouse docs have an exhaustive listing of the different types and functions available - but it can be tricky to figure out which one you should use, and how you should use it, which is why we've put together this guide.

How can I transform my data into these types?

Date from a String

Various ways of transforming a String into a Date

SELECT
  '2023-04-05' AS date,
  toDate(date),
  toDate32(date),
  DATE(date),
  CAST(date, 'Date')
timetoDate(time)toDate32(time)DATE(time)CAST(time, 'Date')
2023-04-052023-04-052023-04-052023-04-052023-04-05

How to store a Date before 1970?

You can use the toDate32 function, which has a range of [1900-01-01, 2299-12-31]. If you try to use the normal toDate function, you'll get the boundary of the available range - this might trip up users coming from other databases who would expect to get an Error.

select toDate('1903-04-15'), toDate32('1903-04-16')
toDate('1903-04-15')toDate32('1903-04-16')
1970-01-011903-04-16

Parsing other formats into DateTime

You can use the parseDateTimeBestEffortOrNull function to parse a string into a DateTime, and if it fails for some reason, it will return a NULL value. This is your best default option, and there are several others which give different outcome behaviors, like parseDateTimeBestEffort (gives an error on a bad input), or parseDateTimeBestEffortOrZero (returns 0 on bad input). You can always check the main Clickhouse documentation for functions mapping to specific edge cases.

Also remember to use the -32 or -64 suffixes for the functions that return a DateTime, depending on the range of dates you need to store.

We know these to generally work with typical output strings experienced in the wild, such as those produced by most JavaScript libraries, Python logging, the default formats produced by AWS and GCP, and many others. But we also can't guarantee it'll work, so test your data! And if something doesn't work as expected, reach out in the Slack community.

How soon is Now()?

In our companion Workspace, this section is in a Pipe called 'what_is_now'

The function now gives a seconds-accuracy current timestamp. You can use the now64 function to get precision down to nanoseconds if necessary.

They will default to the timezone of the server, which is UTC in Tinybird. You can use the toDateTime function to convert to a different timezone, or toDate to convert to a date.

Be mindful of using toDate and ensuring you are picking the calendar day relative to the Timezone you are thinking about, otherwise you will end up with the UTC day instead by default. You might want to use the convenience functions today and yesterday instead.

SELECT 
now(),
toDate(now()),
toDateTime(now()),
toDateTime64(now(),3),
now64(),
toDateTime64(now64(),3),
toUnixTimestamp64Milli(toDateTime64(now64(),3)),
today(),
yesterday()
now()toDate(now())toDateTime(now())toDateTime64(now(),3)now64()toDateTime64(now64(),3)toUnixTimestamp64Milli(toDateTime64(now64(),3))today()yesterday()
2021-03-19 19:15:002021-03-192021-03-19 19:15:002021-03-19 19:15:00.0002021-03-19 19:15:00.4782021-03-19 19:15:00.47816161813002021-03-192021-03-18

Filtering by Date in ClickHouse

This section in our companion Workspace is in a Pipe called 'filter_by_date'

To filter by date in ClickHouse, you need to make sure that you are comparing dates of the same type on both sides of the condition.

If you are following along with our example Workspace, this code is in the 'filter_by_date' pipe. As you already know, we are dealing with timestamps stored in a column named timestamp_utc, and they look like:

2023-03-19 19:15:00
2023-03-19 19:17:23
2023-03-20 00:02:59

It is extremely common that you want to filter your data by a given day. It is quite simple, but there are many mistakes that anyone can make. Let's examine them step by step.

The first thing we could imagine is that ClickHouse is clever enough to understand what we want to do if we pass a String containing a date. As you can see, this query:

SELECT
    timestamp_utc
FROM sales_data_raw
WHERE timestamp_utc = '2023-03-19'

Produces no results. Well actually this makes perfect sense as we're comparing different types, a DateTime and a String.

What if we force our parameter to be a proper date? We can easily do it:

SELECT
    timestamp_utc
FROM sales_data_raw
WHERE timestamp_utc = toDate('2023-03-19')

Nope. We may realize that we are comparing a Date and a DateTime.

Ok, so what if I convert my Date into a DateTime?

SELECT
    timestamp_utc
FROM sales_data_raw
WHERE timestamp_utc = toDateTime('2023-03-19')

Well, no results. This is because the DateTime produced has a Time component of midnight, which is still just a single moment in time, not a range of time - which is exactly what a Date is - all the moments between the midnights on a given calendar day.

So what we really have to do is make sure we're comparing the same kind of dates in both sides. But it will be instructive to see the results of all the transformations so far:

SELECT
    timestamp_utc,
    toDate(timestamp_utc),
    toDate('2023-03-19'),
    toDateTime('2023-03-19')
FROM sales_data_raw
WHERE toDate(timestamp_utc) = toDate('2023-03-19')

You can see that:

timestamp_utctoDate(timestamp_utc)toDate('2023-03-19')toDateTime('2023-03-19')
2023-03-19 19:15:002023-03-192023-03-192023-03-19 00:00:00

So, we can see that in order to select the day we want, we need to convert both sides of the condition to Date. We do it in the WHERE clause in the example given.

We need to remember that, if filtering by date, we need a date in both sides of the condition.

What's a good Timestamp schema in ClickHouse

Here's a few tips for the Types you'll want to use when storing Timestamps and friends in ClickHouse.

For a start, it's a good idea to include the expected Timezone, or any modifications like normalization, in the timestamp column name. This will help you avoid confusion when you're working with data from different timezones, particulary when joining multiple data sources.

Next, your Timezone names are low cardinality strings, so you can use the LowCardinality(String) modifier to save space. Same goes for the offset, which is an Int32.

Finally, you can use the String type to store the local timestamp in a format that is easy to parse, but isn't going to be eagerly converted by ClickHouse unexpectedly. The length of the string can vary by the precision you need, but it's a good idea to keep it consistent across all your data sources. There is also the FixedString type, which is a fixed length string, but while it's more efficient to process it's also not universally supported by all ClickHouse drivers, so String is a better default.

Here's an example:

timestamp_utc [DateTime('UTC')], timezone [lowCardinalityString], offset [Int32], timestamp_local [String], <other data>

The nitty-gritty of ClickHouse DateTime functions

In our companion Workspace, this section is in a Pipe called 'NittyGritty_DateTime_Operations'

Here we use some more of our test data, specifically the recording of metadata about Store Hours in Tokyo, to see what happens with the various functions you would expect to use, and what the outcomes are.

Things that work as expected

SELECT 
-- these are simple
timezone,
store_open_time_utc,
store_open_time_local, 
-- This next one will be converted to the ClickHouse server's default timezone
parseDateTimeBestEffort(store_open_time_local) as parse_naive, 
-- This will be in the specified Timezone, but it must be a Constant - you can't look it up!
parseDateTimeBestEffort(store_open_time_local, 'Asia/Tokyo') as parse_tz_lookup, 
-- this next one stringify's the DateTime so you can pick it in UTC without TZ conversion
parseDateTimeBestEffort(substring(store_open_time_local, 1, 19)) as parse_notz, 
-- These next ones work, but you must get the offsets right for each DST change!
store_open_time_utc + store_open_timezone_offset as w_plus,
date_add(SECOND, store_open_timezone_offset, store_open_time_utc) as w_date_add,
addSeconds(store_open_time_utc, store_open_timezone_offset) as w_add_seconds,
--- This gives you nice UTC formatting as you'd expect
formatDateTime(store_open_time_utc, '%FT%TZ'),
-- Because BestEffort will convert your string to UTC, you're going to get UTC displayed again here
formatDateTime(parseDateTimeBestEffort(store_open_time_local), '%FT%T')
FROM store_hours_raw
where store = 'Tokyo'
Limit 1
timezonestore_open_time_utcstore_open_time_localparse_naiveparse_tz_lookupw_plusw_date_addw_add_secondsformatDateTime(store_open_time_utc, '%FT%TZ')formatDateTime(parseDateTimeBestEffort(store_open_time_local), '%FT%T')
Asia/Tokyo2023-03-20 00:00:002023-03-20T09:00:00+09:002023-03-20 00:00:002023-03-20 09:00:002023-03-20 09:00:002023-03-20 09:00:002023-03-20 09:00:002023-03-20T00:00:00Z2023-03-20T00:00:00

Things that might surprise you

Here we have a column with a time in UTC, and another column with the local timezone. ClickHouse has an opinion that a DateTime column must always be one specific Timezone.

It also has the opinion that you must always supply the Timezone as a String, and cannot lookup from another column on a per-row basis.

There's also a bug in older ClickHouse versions where toTimezone will silently give the wrong answer if you give it a lookup instead of a String, so use toDateTime instead.

SELECT 
-- toString(store_open_time_utc, timezone) as w_tostr_lookup,   << This causes an error
-- toDateTime(store_open_time_utc, timezone),    << This also causes an error
  toString(store_open_time_utc, 'Asia/Tokyo') as w_tostr_const,  -- This is correct.
  toTimezone(store_open_time_utc, timezone) as w_totz_lookup,  -- This silently gives the wrong answer due to the bug!
  toDateTime(store_open_time_utc, 'Asia/Tokyo') as todt_const,  -- This is the best method
  toTimezone(store_open_time_utc, 'Asia/Tokyo') as w_totz_const,  -- this works, but toTimezone can fail if misused, so better to use toDateTime
-- Because BestEffort will convert your string to UTC, you might think you'll get the local time
-- But as the ClickHouse column is UTC, that is what you're going to get
  formatDateTime(parseDateTimeBestEffort(store_open_time_local), '%FT%T'),
--   formatDateTime(store_open_time_utc, '%FT%T%z')  -- << Supported in ClickHouse 22.9
  toDate32('1903-04-15'),  -- This gives the expected date
  toDate('1903-04-15')  -- This silently hits the date boundary at 1970-01-01, so be careful

Understanding ClickHouse Timezone handling

ClickHouse DateTimes always have a Timezone. Here are the key points to note:

  • A DateTime is stored in ClickHouse as a Unix timestamp which represents the number of seconds in UTC since 1970-01-01.
  • ClickHouse stores a single timezone associated with a column and uses it to handle transformations during representation/export or other calculations.
  • ClickHouse enforces the single timezone per column rule in the behavior of native functions it provides for manipulating DateTime data.
  • ClickHouse assumes that any timezone offset or DST change is a multiple of 15 minutes. This is true for most modern timezones, but not for various historical corner cases.

Let's explore some areas to be aware of.

How a Timezone is selected for the Column

Here are some general rules to observe when selecting a timezone for a column:

  • If no timezone is specified, the column will represent the data in the Clickhouse server's timezone, which is UTC in Tinybird.
  • If a timezone is specified as a string, the column will represent the data as that timezone.
  • If multiple timezones are in a query that produces a column without already creating the column with a timezone, the first timezone in the query wins (e.g. a CASE statement to pick different timezones).
  • If the timezone is not represented as a constant (e.g. by lookup to another column or table), you should get an error message.

DateTime Operations without Timezone Conversion

ClickHouse provides some native operations that work with DateTime without handling timezone conversion. Mostly these are for adding or subtracting some unit of time. These operations behave exactly as you would expect.

It's worth noting that you, as the programmer, are responsible for correctly selecting the amount to add or subtract for a given timestamp. Many incorrect datasets are produced here by incorrect chaining of timezone translations or handling DST incorrectly.

DateTime to String with a Timezone

In older versions of ClickHouse, you cannot convert a DateTime to an ISO8601 String with timezone information. However, the %z operator was introduced in version 22.9, allowing you to use 'formatDateTime(timestamp_utc, '%FT%T%z')'.

How to Normalize your Data in ClickHouse

The Pipe for this one is 'sales_normalized' in the companion Workspace

The process of converting timestamps on the data so that they are all in the same timezone is called "time zone normalization" or "time zone conversion". This is a common practice in data analytics to ensure that data from multiple sources, which may be in different time zones, can be accurately compared and analyzed. By converting all timestamps to a common time zone, it is possible to compare data points from different sources and derive meaningful insights from the data.

Consider a sample dataset containing sales data from multiple stores which are each in different timezones, each store is open from 0900 to 1700 local time, and we're going to shift them all to UTC:

The simplest way to do this in ClickHouse is to chain standard conversion functions - first, convert from your canonical stored UTC Timestamp to the store-local timezone, then extract a String representation of that local time, then convert that back into a DateTime object. ClickHouse will by default present this in the server Timezone, so we have to know that this dataset has been normalized and name it appropriately.

In this example we are using a CASE statement to handle each store, and the result is presented as a Materialized View in Tinybird. This has the benefit of pre-calculating the time zone shifts, you could also pre-aggregate to some time interval like 5 or 15 mins, resulting in a highly efficient dataset for further analysis. In our schema we have a String identifying the store, an Int32 for the sale_amount, and a DateTime in UTC for timestamp_utc of the transaction.

SELECT store, sale_amount,
  CASE 
    WHEN store = 'New York' THEN toDateTime(toString(toTimezone(timestamp_utc, 'America/New_York')))
    WHEN store = 'Tokyo' THEN toDateTime(toString(toTimezone(timestamp_utc, 'Asia/Tokyo')))
    WHEN store = 'London' THEN toDateTime(toString(toTimezone(timestamp_utc,'Europe/London')))
    WHEN store = 'Madrid' THEN toDateTime(toString(toTimezone(timestamp_utc, 'Europe/Madrid')))
    WHEN store = 'Auckland' THEN toDateTime(toString(toTimezone(timestamp_utc, 'Pacific/Auckland')))
    WHEN store = 'Chatham' THEN toDateTime(toString(toTimezone(timestamp_utc, 'Pacific/Chatham')))
    else toDateTime('1970-01-01 00:00:00')
  END AS timestamp_normalized
FROM sales_data_raw

Note that the timezones used are all input as Strings, as ClickHouse requires this. We use an else statement so that ClickHouse doesn't mark the timestamp_normalized column as Nullable, which drastically impacts performance.

Good Timezone Test Data

"If you gaze long into an abyss, the abyss will gaze back into you." - Friedrich Nietzsche

One of the benefits of being a Serverless SaaS company is that we don't have to guess what kind of weird scenarios might need testing - we can just take note of what our Customers find challenging and then look to solve it.

In this case, we generally observed that Customers run into issues with time based aggregations, particularly when Timezone conversions are involved, which is what led to this guide.

We amalgamated their various cases and produced a test data set, which is comprised of a facts table listing generated transactions for a set of retail stores across the world, along with some additional columns to aid in checking processing correctness, and a dimension table giving details about the store hours and timezones, again with additional information for correctness checking.

In the companion Workspace, these are the 'sales_data_raw' and 'store_hours_raw' tables respectively. The Dataset is generated by this Notebook in the Repo, and pregenerated Fixtures from it are also in the Repo - these are what is presented in the Workspace.

The tricky use case

We also included one extra store as a Canary in what is deliberately the most complex case that we need to handle (remember the second Timestamp Commandment).

Let's say we have a store in the Chatham Islands, which typically has a +1345hr time zone. This store records the start of the new business day at 0900 each day instead of 0000 (midnight) due to some 'local regulations'. Let us also say that this store remains open 24hrs a day, and we are recording sales through a DST change.

The Chatham Islands DST changes back one hour at 0345 local time on April 2nd, so the April 2nd calendar day will have 25hrs elapsed, however due to the store business day changeover at 0900, this will actually occur during the April 1st store business day (open at 0900 Apr1, close at 0859 Apr2).

Confusing? Yes, exactly the point.

In order to more easily observe this behavior, our test data records a fixed price transaction at a fixed interval of every 144 seconds throughout the business day, so your tests can have patterns to observe. Why 144 seconds is a useful number here is left as an exercise for the reader.

Why is this a good test scenario

  • When viewed in UTC, the DST change is at 2015hrs on April 1st, and the day has 25hrs elapsed, which may confuse an inexperienced developer.
  • We also have a requirement that the business day windowing does not map to the calendar day (the 'local regulations' above), which mimics a typical unusual requirement in business analysis.
  • You can also see that the offset changes from +1345 to +1245 between Apr1 and Apr2 - this is what trips up naive addSeconds timezone conversions twice a year, as they would only be correct for part of the day if not carefully bounded.
  • In addition to this, the Chatham Islands timezone has a delightful offset in seconds which looks a lot like a typo when it changes from 49500 to 45900.
  • It is also a place where almost nobody has a store as the local population is only 800 people, so great for a Canary that can be filtered from real data.

This is what makes it an excellent timezone for test data.

Test Data Schema

Let's take a look at the Schema, and why the various fields are useful. Note that we have used simple Types here to make it easier to understand, but you could also use the FixedString and other more complex types we described earlier in your real use cases.

Store Hours:

`store` String `json:$.store` ,
`store_close_time_local` String `json:$.store_close_time_local` ,
`store_close_time_utc` DateTime `json:$.store_close_time_utc` ,
`store_close_timezone_offset` Int32 `json:$.store_close_timezone_offset` ,
`store_date` Date `json:$.store_date` ,
`store_open_seconds_elapsed` Int32 `json:$.store_open_seconds_elapsed` ,
`store_open_time_local` String `json:$.store_open_time_local` ,
`store_open_time_utc` DateTime `json:$.store_open_time_utc` ,
`store_open_timezone_offset` Int32 `json:$.store_open_timezone_offset` ,
`timezone` String `json:$.timezone` ,

Sales Data:

`store` String `json:$.store` ,
`sale_amount` Int32 `json:$.sale_amount` ,
`timestamp_local` String `json:$.timestamp_local` ,
`timestamp_utc` DateTime `json:$.timestamp_utc` ,
`timestamp_offset` Int32 `json:$.timestamp_offset`

Schema Notes

The tables are joined on the 'store' column, in this case a String for readability but likely to be a more efficient LowCardinalityString or an Int32 in a real case.

Local times are stored as Strings so that the underlying Database is never tempted to change them to some other time representation. Having a static value to check your computations against is very useful in test data.

In the Sales Data, note that we keep the String of the local timestamp in the facts table. In a large dataset this would bloat the table and probably not be necessary. Strictly speaking, the offset is also unnecessary as you should be able to recalculate what it was given the timestamp in UTC and the exact timezone of the event producing service. Practically speaking however, if you are going to need this information a lot for your analysis, you are simply trading off later compute time against storage.

Note that we also keep a lot of metadata about the store business day in the Store Hours table - this helps us to ensure that not only our analytic calculations are correct, but that our generated data is correct (have a look at the Test Data Validation below). This is a suprisingly common issue where test data is produced - it looks good to the known edge cases, but doesn't end up correct to the unknown edgecases - it's in exactly these scenarios that you want to know more about how it was produced so you can fix it.

Test Data Validation

In the Data Generator Notebook, we also inspect the generated data to ensure that it conforms to the changes expected over the difficult time window of 1st April to 3rd April in the Chatham Islands.

Here's the output - consider what we've discussed above about the changes over these few days, and then take a look at the timestamps and calculations to see how each plays out.

Day: 2023-04-01
opening time UTC: 2023-03-31T19:15:00+00:00
opening time local: 2023-04-01T09:00:00+13:45
timezone offset at store open: 49500
total hours elapsed during store business day: 25
total count of sales during business day: 625 and sales amount in cents: 31250
total hours elapsed during calendar day: 24
total count of sales during calendar day: 600 and sales amount in cents: 30000
closing time UTC: 2023-04-01T20:14:59+00:00
closing time local: 2023-04-02T08:59:59+12:45
timezone offset at store close: 45900

Day: 2023-04-02
opening time UTC: 2023-04-01T20:15:00+00:00
opening time local: 2023-04-02T09:00:00+12:45
timezone offset at store open: 45900
total hours elapsed during store business day: 24
total count of sales during business day: 600 and sales amount in cents: 30000
total hours elapsed during calendar day: 25
total count of sales during calendar day: 625 and sales amount in cents: 31250
closing time UTC: 2023-04-02T20:14:59+00:00
closing time local: 2023-04-03T08:59:59+12:45
timezone offset at store close: 45900

Day: 2023-04-03
opening time UTC: 2023-04-02T20:15:00+00:00
opening time local: 2023-04-03T09:00:00+12:45
timezone offset at store open: 45900
total hours elapsed during store business day: 24
total count of sales during business day: 600 and sales amount in cents: 30000
total hours elapsed during calendar day: 24
total count of sales during calendar day: 600 and sales amount in cents: 30000
closing time UTC: 2023-04-03T20:14:59+00:00
closing time local: 2023-04-04T08:59:59+12:45
timezone offset at store close: 45900

This test data generator has been through several iterations to capture the specifics of various scenarios our customers have raised with us, hopefully you will also find it helpful.

Querying correctly with Timezones and DST

Now that we have looked at the test data, let's work through understanding some examples of the correct way to query it, and where it can go wrong easily.

This section is in the companion Worspace as Pipe 'Aggregate_Querying'

Naively querying by UTC

If we naively query by UTC here, we get an answer that looks correct - sales start at 9am and stop at 5pm when London is on winter time, which matches UTC.

SELECT
  store,
  toStartOfFifteenMinutes(timestamp_utc) AS period_utc,
  sum(sale_amount) AS sale_amount,
  count() as sale_count
FROM sales_data_raw
where store = 'London' AND toDate(period_utc) = toDate('2023-03-24')
GROUP BY
    store,
    period_utc
ORDER BY period_utc ASC
storeperiod_utcsale_amountsale_count
London2023-03-24 09:00:00884187
London2023-03-24 09:15:00291593
London2023-03-24 09:30:00355093

The essential mistake here is blindly converting the UTC timestamp in 'period_utc' to a date, however as the timezone aligns with UTC, it has no impact on these results.

Naively querying a split timezone

However if we naively query Auckland applying a calendar day to the UTC periods, we get two split blocks at the start and end of day.

This is because you're actually getting the end of the day you want, and the start of the next day, because of the ~+13 UTC offset.

select * from sales_15min_utc_mv
where store = 'Auckland' AND toDate(period_utc) = toDate('2023-03-24')
ORDER BY period_utc ASC
storeperiod_utcsale_amountsale_count
Auckland2023-03-24 00:00:00715386
Auckland2023-03-24 00:15:00517955
Auckland2023-03-24 00:30:00671067

The same mistake is made as above, with the blind conversion of the UTC timestamp to a date, however in this case it's more obvious that it's wrong as very few timezones map 0900 local exactly to midnight.

Try the same query again with the Tokyo store, and see how the mistake is easily hidden.

Aggregating by Timestamp over a period

A convenience of modern Timezone and DST definitions is that they can all be expressed in 15 minute increments (in fact, underlying ClickHouse assumes this is true to drive processing efficiency), so if you aren't sure what aggregation period is best for you, start with 15mins and see how your data set grows, as you can query any larger period as groups of 15min periods.

To remind you again of our test data: We have a table containing sales data for a set of stores. Our data has columns for a store, a timestamp in UTC, and a sale_amount. In Tinybird, we can quickly and efficiently pre-calculate the 15min periods as a Materialized View with the following query

SELECT
    store,
    toStartOfFifteenMinutes(timestamp_utc) AS period_utc,
    sumSimpleState(sale_amount) AS sale_amount,
    countState() AS sale_count
FROM sales_data_raw
GROUP BY
    store,
    period_utc
ORDER BY period_utc ASC

Using this, we can then produce any larger period for a given window of that period, see the next example for how to break it into local days.

In our example Workspace, I've already created this Materialized View for you to play with.

Querying by Day and Timezone

Building on the previous example, this query produces a correct aggregate by the local timezone day for a given store, timezone, start_day and end_day (inclusive). We also feature use of Tinybird parameters, such as you would use in an API Endpoint.

Note that we're querying the tricky Chatham store here, and over the days with the DST change, so you can see the pattern of transactions emerge.

In this example, I have used Tinybird parameters for the likely user-submitted values of timezone, start_day and end_day, and we are selecting the data from our sales_15min_utc_mv Materialized View described above.

%
Select
  store,
  toDate(toTimezone(period_utc, {{String(timezone, 'Pacific/Chatham')}})) as period_local,
  sum(sale_amount) as sale_amount,
  countMerge(sale_count) as sale_count
from sales_15min_utc_mv
where
  store = {{String(store, 'Chatham')}}
  and period_local >= toDate({{String(start_day, '20230331')}})
  and period_local <= toDate({{String(end_day, '20230403')}})
group by store, period_local
order by period_local asc
storeperiod_localsale_amountsale_count
Chatham2023-03-3130000600
Chatham2023-04-0130000600
Chatham2023-04-0231250625
Chatham2023-04-0330000600

Note that we have been careful to convert the DateTime to the target timezone before reducing to the local calendar day using toDate, and we have been careful to ensure that our user input is converted to the same Date type for comparison, and we have renamed the column to reflect this change.

Also note that our timezone parameter is a String, which is required by ClickHouse. It's up to you to ensure you supply the right timezone string and Store filters here.

Because our Datasource in this case is a Materialized View, we are also careful to use the correct CountMerge function to get the final results of the incremental field.

Validating the results

We can validate the results by comparing them to the same query over the raw data, and we can see that the results are identical.

select 
  store,
  toDate(substring(timestamp_local, 1, 10)) as period_local,
  sum(sale_amount) as sale_amount,
  count() as sale_count
from sales_data_raw
where 
  store = 'Chatham'
  and toDate(substring(timestamp_local, 1, 10)) >= toDate('20230331')
  and toDate(substring(timestamp_local, 1, 10)) <= toDate('20230403')
group by store, period_local
order by period_local asc
storeperiod_localsale_amountsale_count
Chatham2023-03-3130000600
Chatham2023-04-0130000600
Chatham2023-04-0231250625
Chatham2023-04-0330000600

Querying with a lookup table

In our test data, the store day does not match the local calendar day - if you recall it was specified that it starts at 0900 local time. Fortunately, we can use a lookup table to map the store day to our UTC timestamps, and then use that to query the data. This is often a good idea if you have a lot of data, as it can be more efficient than using a function to calculate the mapping each time.

select 
  store,
  store_date,
  sum(sale_amount),
  countMerge(sale_count) as sale_count 
from sales_15min_utc_mv
join store_hours_raw using store
where period_utc >= store_hours_raw.store_open_time_utc
and period_utc < store_hours_raw.store_close_time_utc
and store_date >= toDate('2023-03-31') and store_date <= toDate('2023-04-03')
and store = 'Chatham'
group by store, store_date
order by store_date asc
storeperiod_localsale_amountsale_count
Chatham2023-03-3130000600
Chatham2023-04-0131250625
Chatham2023-04-0230000600
Chatham2023-04-0330000600

Note that the extra hour of business moves to the 1st of April, as that store day doesn't close until 0859 on the 2nd of April, which is after the DST change.

Using Materialized Views to pre-calculate the data

In several of these examples we've pointed to a Materialized View, which is a pre-calculated table that is updated incrementally as new data arrives.

These are also prepared for you in the companion Workspace, specifically for the UTC 15min rollup, and the normalized 15min rollup.

I hope you have the Time of your Life

So there you have it, a pile of examples to work with Time in Tinybird.

If there's more examples you'd like to see, or another tricky case that would be worth 'the treatment' from our Data Engineers, drop us a line in the Slack community!