DateTime columns can eat up storage space when you're working with billions of rows in ClickHouse. Converting dates to compact numeric formats like 202312 or 20231215 can cut your storage costs by 30-50% while making partition pruning and range queries faster.
This guide walks through ClickHouse's three numeric date conversion functions:
- toYYYYMM()
- toYYYYMMDD()
- toYYYYMMDDhhmmss()
It gives practical examples for ingestion patterns, performance optimizations, and building APIs that leverage these compact formats.
What are the toYYYYMM functions in ClickHouse?
ClickHouse gives you three built-in functions that convert dates and timestamps into compact integers: toYYYYMM(), toYYYYMMDD(), and toYYYYMMDDhhmmss(). Instead of storing full DateTime values, you get clean numeric formats that take up less space and query faster.
Think of it like compressing your date data into a more efficient format. Where you might have stored "2023-12-15 14:30:22", you can now store just "202312" for the month, "20231215" for the date, or "20231215143022" for the full timestamp.
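For example, here is the same timestamp converted at all three levels of precision:
-- One DateTime, three levels of granularity
SELECT
toYYYYMM(toDateTime('2023-12-15 14:30:22')) AS month_numeric,        -- 202312
toYYYYMMDD(toDateTime('2023-12-15 14:30:22')) AS date_numeric,        -- 20231215
toYYYYMMDDhhmmss(toDateTime('2023-12-15 14:30:22')) AS full_numeric;  -- 20231215143022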
toYYYYMM signature and return type
The toYYYYMM() function takes any Date or DateTime value and spits out a UInt32 number in YYYYMM format. December 2023 becomes 202312, January 2024 becomes 202401, and so on.
SELECT toYYYYMM(toDate('2023-12-15')) AS month_numeric;
-- Result: 202312
SELECT toYYYYMM(toDateTime('2023-12-15 14:30:22')) AS month_from_datetime;
-- Result: 202312
Notice how both Date and DateTime inputs give you the same result: the function strips away everything except year and month.
toYYYYMMDD signature and return type
Moving up in precision, toYYYYMMDD() captures year, month, and day as a UInt32 in YYYYMMDD format. This gives you date-level granularity without the storage overhead of full DateTime columns.
SELECT toYYYYMMDD(toDate('2023-12-15')) AS date_numeric;
-- Result: 20231215
SELECT toYYYYMMDD(now()) AS today_numeric;
-- Result depends on current date
toYYYYMMDDhhmmss signature and return type
The most precise option, toYYYYMMDDhhmmss(), returns a UInt64 in YYYYMMDDhhmmss format, capturing the complete timestamp down to the second. If you pass a plain Date instead of a DateTime, the time portion simply comes back as zeros.
SELECT toYYYYMMDDhhmmss(toDateTime('2023-12-15 14:30:45')) AS full_timestamp;
-- Result: 20231215143045
Here's the catch: you lose subsecond precision. If your original DateTime had milliseconds, they disappear in the conversion.
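For example, with a DateTime64 input (assuming your ClickHouse version accepts DateTime64 here, as recent releases do), the milliseconds are simply truncated:
-- Sub-second precision is dropped in the conversion
SELECT toYYYYMMDDhhmmss(toDateTime64('2023-12-15 14:30:45.789', 3)) AS truncated_ts;
-- Result: 20231215143045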
Why store dates as compact integers
Converting dates to numeric formats isn't just about saving space, though that's certainly part of it. The real benefits show up when you're working with large datasets where every byte and every millisecond matters.
Smaller on-disk footprint
A UInt32 takes up 4 bytes, the same as a plain DateTime, while a DateTime64 timestamp takes 8 bytes. When you're storing millions or billions of rows, every byte of width adds up fast. Just as important, coarse values like 202312 repeat across long runs of rows, so they compress far better than second-level or millisecond-level timestamps.
We've seen storage reductions of 30-50% when switching from DateTime to numeric date columns, especially after compression kicks in. Your backup files get smaller, data transfers go faster, and you use less memory during queries.
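You can measure the effect on your own tables; here is a sketch using ClickHouse's system.columns table (the events table name is illustrative):
-- Per-column on-disk footprint, compressed and uncompressed
SELECT
name,
type,
formatReadableSize(data_compressed_bytes) AS compressed,
formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.columns
WHERE database = currentDatabase() AND table = 'events'
ORDER BY data_compressed_bytes DESC;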
Faster partition pruning
ClickHouse can eliminate irrelevant partitions much faster when working with simple integer comparisons versus DateTime parsing. When you partition a table by toYYYYMM(date_column), ClickHouse just needs to check whether an integer falls within a range.
-- Fast numeric partition pruning
SELECT * FROM events WHERE month_partition >= 202301 AND month_partition <= 202312;
-- Slower DateTime-based filtering
SELECT * FROM events WHERE toYYYYMM(event_time) >= 202301 AND toYYYYMM(event_time) <= 202312;
The first query can skip entire partitions using simple math. The second query has to apply the conversion function to every row before it can filter.
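On recent ClickHouse versions you can verify the pruning yourself with EXPLAIN, which reports how many parts and granules survive the partition filter (using the month_partition column from the example above):
EXPLAIN indexes = 1
SELECT count() FROM events
WHERE month_partition BETWEEN 202301 AND 202312;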
Simplified sharding keys
Integer values work beautifully as distribution keys in distributed ClickHouse setups. You can create balanced shards using modulo operations on numeric date values, and cross-shard joins become more predictable because you're working with consistent data types.
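As a minimal sketch, assuming a cluster named my_cluster and a local table events_local that already has an event_time column, a distributed table keyed on the numeric date could look like this:
-- The Distributed engine takes the sharding key modulo the total shard weight
-- to decide which shard receives each row
CREATE TABLE events_distributed AS events_local
ENGINE = Distributed('my_cluster', 'default', 'events_local', toYYYYMMDD(event_time));
Keep in mind that keying on the day sends a whole day's rows to one shard, which keeps daily queries local but can skew load; combine the date with a hash of another column if you need finer spread.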
Basic syntax and quick examples
The conversion functions follow straightforward patterns, though each has its own quirks about input types and precision.
Convert DateTime column to UInt32
You can apply the conversion functions directly to columns in your tables or combine them with ClickHouse's built-in date functions like now() or today().
-- Convert current timestamp
SELECT
toYYYYMM(now()) AS current_month,
toYYYYMMDD(now()) AS current_date,
toYYYYMMDDhhmmss(now()) AS current_timestamp;
-- Convert table columns
SELECT
user_id,
toYYYYMM(created_at) AS signup_month,
toYYYYMMDD(created_at) AS signup_date
FROM users
LIMIT 5;
Filter by numeric period value
Once you have numeric date values, filtering becomes a matter of simple integer comparisons. No more wrestling with date parsing or timezone conversions in your WHERE clauses.
-- Filter by specific month
SELECT COUNT(*) FROM events
WHERE toYYYYMM(event_time) = 202312;
-- Filter by date range
SELECT * FROM orders
WHERE toYYYYMMDD(order_date) BETWEEN 20231201 AND 20231231;
-- Filter by hour range using full timestamp
SELECT * FROM logs
WHERE toYYYYMMDDhhmmss(log_time) >= 20231215140000
AND toYYYYMMDDhhmmss(log_time) < 20231215150000;
Converting back to DateTime safely
Going from numeric formats back to DateTime values requires more care because you lose information during the original conversion. You can't perfectly reconstruct what you started with.
Using toDateTime from UInt32
Older ClickHouse versions don't provide direct reverse functions (newer releases add dedicated helpers, shown below), but you can always reconstruct DateTime values using string manipulation and parsing functions. Either way, the catch is that you lose precision.
-- Convert YYYYMM back to first day of month
SELECT toDate(concat(toString(202312), '01')) AS first_day_of_month;
-- Result: 2023-12-01
-- Convert YYYYMMDD back to Date
SELECT toDate(toString(20231215)) AS reconstructed_date;
-- Result: 2023-12-15
When you convert from toYYYYMM() results, you can only get back to the first day of that month; the original day information is gone forever.
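If you're on a recent ClickHouse release, there are also direct reverse helpers (added around version 23.9; check the exact names and availability in your version's docs before relying on them):
-- Direct reverse conversions, available on newer ClickHouse versions
SELECT
YYYYMMDDToDate(20231215) AS date_from_numeric,                    -- 2023-12-15
YYYYMMDDhhmmssToDateTime(20231215143045) AS datetime_from_numeric; -- 2023-12-15 14:30:45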
Restoring time zone context
The toYYYYMMDDhhmmss() function preserves more information, making reconstruction more accurate. However, you still lose subsecond precision and any timezone context from the original DateTime.
-- Reconstruct DateTime from YYYYMMDDhhmmss
SELECT parseDateTimeBestEffort('20231215143045') AS reconstructed_datetime;
-- Result: 2023-12-15 14:30:45
The reconstructed DateTime uses your session's default timezone, which might not match the original value's timezone.
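If you know the timezone the data was originally recorded in, you can pass it explicitly so reconstruction doesn't depend on session settings (UTC here is just an example):
-- Pin the timezone during parsing instead of relying on the session default
SELECT parseDateTimeBestEffort('20231215143045', 'UTC') AS reconstructed_utc;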
Handling strings and time zones
Real-world data often comes as strings in various formats, and timezone handling becomes crucial when you're converting between different representations.
ClickHouse string to DateTime conversion
Before you can apply numeric conversion functions to string data, you need to parse the strings into proper DateTime values. ClickHouse's parseDateTimeBestEffort() function handles most common string formats automatically.
-- Parse string dates before numeric conversion
SELECT
raw_date,
toYYYYMM(parseDateTimeBestEffort(raw_date)) AS month_numeric
FROM (
SELECT '2023-12-15 14:30:00' AS raw_date
UNION ALL
SELECT 'Dec 15, 2023 2:30 PM'
UNION ALL
SELECT '15/12/2023'
);
Parsing with parseDateTimeBestEffort
The parseDateTimeBestEffort() function is remarkably flexible: it can handle ISO dates, American formats, European formats, and even natural language dates. For production code, consider using parseDateTimeBestEffortOrNull() to avoid errors on unparseable strings.
-- Complete workflow from string to numeric with error handling
WITH parsed_dates AS (
SELECT
user_input,
parseDateTimeBestEffortOrNull(user_input) AS parsed_dt
FROM user_date_inputs
)
SELECT
user_input,
toYYYYMM(parsed_dt) AS month_num,
toYYYYMMDD(parsed_dt) AS date_num
FROM parsed_dates
WHERE parsed_dt IS NOT NULL;
Performance benchmarks on disk and memory
The performance benefits of numeric date formats become obvious when you're working with large datasets, though the exact improvements depend on your specific query patterns and data distribution.
Compression ratio results
In practice, storing dates as UInt32 values typically achieves 20-40% better compression ratios compared to full timestamp columns. The improvement comes from more predictable bit patterns in sequential numeric values.
- DateTime64 columns: Require 8 bytes per value before compression (a plain DateTime requires 4)
- UInt32 date columns: Require 4 bytes per value before compression, with far fewer distinct values to encode
- Additional benefits: Better compression ratios, smaller backup files, faster network transfers
Query latency comparison
Numeric date filtering consistently outperforms DateTime-based operations, especially for range queries and partition pruning. The difference becomes more pronounced as your tables grow beyond 100 million rows.
-- Numeric filtering (faster approach)
SELECT COUNT(*) FROM large_table
WHERE date_partition BETWEEN 202301 AND 202312;
-- DateTime conversion filtering (slower approach)
SELECT COUNT(*) FROM large_table
WHERE toYYYYMM(timestamp_col) BETWEEN 202301 AND 202312;
The first query can use partition pruning and index lookups. The second query has to compute the conversion function for every row before filtering.
Ingestion patterns with materialized views
You can implement numeric date conversion during data ingestion to avoid runtime conversion overhead, though this approach requires planning your schema carefully.
One-time conversion during insert
The most efficient approach computes numeric date values during insertion using computed columns or explicit conversion in your INSERT statements.
-- Create table with computed numeric date columns
CREATE TABLE events (
event_id UInt64,
user_id UInt32,
event_time DateTime,
event_month UInt32 MATERIALIZED toYYYYMM(event_time),
event_date UInt32 MATERIALIZED toYYYYMMDD(event_time)
) ENGINE = MergeTree()
PARTITION BY event_month
ORDER BY (user_id, event_time);
With this setup, ClickHouse automatically computes the numeric date columns whenever you insert data, and you can use them for fast filtering without any conversion overhead.
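For instance, a minimal insert and filter against the events table above (the materialized columns never appear in the INSERT; ClickHouse fills them in for you):
-- Only the regular columns are inserted; event_month and event_date are computed automatically
INSERT INTO events (event_id, user_id, event_time)
VALUES (1, 42, '2023-12-15 14:30:22');
-- Filtering uses the precomputed integer, so no conversion runs at query time
SELECT count() FROM events WHERE event_month = 202312;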
On-the-fly conversion at query time
Converting during SELECT queries offers more flexibility but costs additional CPU cycles. This approach works well when you only occasionally need numeric date formats or when storage space isn't a primary concern.
The tradeoff is clear: pre-computed columns give you faster queries at the cost of additional storage and slightly slower inserts. Query-time conversion keeps your schema simpler but makes every query do extra work.
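For comparison, the query-time version of the same filter works against any table with a raw DateTime column, at the cost of computing the conversion while the query runs:
-- No schema changes needed; the conversion happens during query execution
SELECT count()
FROM events
WHERE toYYYYMM(event_time) = 202312;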
Building a log analytics API with Tinybird and numeric date partitioning
Tinybird's managed ClickHouse platform is perfect for log analytics where you need to efficiently filter by time periods. Here's how to build a complete log analytics pipeline using numeric date conversion for optimal performance.
Step 1: Create a data source for application logs
First, create a data source to store application logs, partitioned by the numeric month of each timestamp. Create an app_logs.datasource file:
SCHEMA >
`timestamp` DateTime64(3, 'UTC'),
`log_level` LowCardinality(String),
`service` LowCardinality(String),
`endpoint` String,
`method` LowCardinality(String),
`status_code` UInt16,
`response_time_ms` UInt32,
`user_id` String,
`ip_address` String,
`user_agent` String,
`error_message` String
ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp, service, log_level"
Deploy the data source:
tb --cloud deploy
Ingest sample log data using the Events API:
curl -X POST \
-H "Authorization: Bearer $TINYBIRD_TOKEN" \
-H "Content-Type: application/json" \
"https://api.tinybird.co/v0/events?name=app_logs" \
-d '[
{"timestamp": "2024-12-01 14:30:00.123", "log_level": "INFO", "service": "api-gateway", "endpoint": "/api/users", "method": "GET", "status_code": 200, "response_time_ms": 45, "user_id": "user_123", "ip_address": "192.168.1.100", "user_agent": "Mozilla/5.0", "error_message": ""},
{"timestamp": "2024-12-01 14:30:15.456", "log_level": "ERROR", "service": "user-service", "endpoint": "/api/users/profile", "method": "POST", "status_code": 500, "response_time_ms": 2500, "user_id": "user_456", "ip_address": "192.168.1.101", "user_agent": "curl/7.68.0", "error_message": "Database connection timeout"},
{"timestamp": "2024-12-01 14:31:00.789", "log_level": "WARN", "service": "payment-service", "endpoint": "/api/payments", "method": "POST", "status_code": 429, "response_time_ms": 100, "user_id": "user_789", "ip_address": "192.168.1.102", "user_agent": "PostmanRuntime/7.26.8", "error_message": "Rate limit exceeded"}
]'
Step 2: Create a materialized view for aggregated log metrics
Create a materialized view that pre-computes log analytics keyed by numeric date formats for efficient time-based partitioning. The averages, percentiles, and unique counts are stored as aggregate-function states so they can be merged correctly at query time. Create a log_metrics_hourly.datasource file:
SCHEMA >
`year_month` UInt32,
`year_month_day` UInt32,
`year_month_day_hour` UInt64,
`service` LowCardinality(String),
`log_level` LowCardinality(String),
`status_code_category` LowCardinality(String),
`total_requests` UInt64,
`error_count` UInt64,
`avg_response_time_ms` AggregateFunction(avg, UInt32),
`p95_response_time_ms` AggregateFunction(quantile(0.95), UInt32),
`unique_users` AggregateFunction(uniq, String),
`unique_ips` AggregateFunction(uniq, String)
ENGINE "SummingMergeTree"
ENGINE_PARTITION_KEY "year_month"
ENGINE_SORTING_KEY "year_month, year_month_day, year_month_day_hour, service, log_level, status_code_category"
Create a materialized view pipe log_aggregation_mv.pipe:
NODE log_hourly_aggregation
SQL >
SELECT
toYYYYMM(timestamp) AS year_month,
toYYYYMMDD(timestamp) AS year_month_day,
toYYYYMMDDhhmmss(toStartOfHour(timestamp)) AS year_month_day_hour,
service,
log_level,
-- Categorize status codes for better analysis
CASE
WHEN status_code BETWEEN 200 AND 299 THEN '2xx_success'
WHEN status_code BETWEEN 300 AND 399 THEN '3xx_redirect'
WHEN status_code BETWEEN 400 AND 499 THEN '4xx_client_error'
WHEN status_code BETWEEN 500 AND 599 THEN '5xx_server_error'
ELSE 'other'
END AS status_code_category,
count() AS total_requests,
countIf(log_level = 'ERROR') AS error_count,
avgState(response_time_ms) AS avg_response_time_ms,
quantileState(0.95)(response_time_ms) AS p95_response_time_ms,
uniqState(user_id) AS unique_users,
uniqState(ip_address) AS unique_ips
FROM app_logs
WHERE user_id != '' -- Filter out empty user IDs
GROUP BY
year_month,
year_month_day,
year_month_day_hour,
service,
log_level,
status_code_category
TYPE MATERIALIZED
DATASOURCE log_metrics_hourly
Deploy the materialized view:
tb --cloud deploy
Step 3: Create log analytics API endpoints
Create a pipe that queries the materialized view for fast log analytics. Create log_analytics_api.pipe:
TOKEN "log_analytics_read" READ
NODE hourly_log_metrics
SQL >
%
SELECT
year_month_day_hour,
service,
log_level,
status_code_category,
sum(total_requests) AS total_requests,
sum(error_count) AS error_count,
avgMerge(avg_response_time_ms) AS avg_response_time,
quantileMerge(0.95)(p95_response_time_ms) AS p95_response_time,
uniqMerge(unique_users) AS total_unique_users,
uniqMerge(unique_ips) AS total_unique_ips,
-- Calculate error rate
round((sum(error_count) * 100.0) / sum(total_requests), 2) AS error_rate_percent
FROM log_metrics_hourly
WHERE year_month_day >= {{Int32(start_date, 20241201)}}
AND year_month_day <= {{Int32(end_date, 20241231)}}
{% if defined(service_filter) %}
AND service = {{String(service_filter)}}
{% end %}
{% if defined(log_level_filter) %}
AND log_level = {{String(log_level_filter)}}
{% end %}
{% if defined(status_category_filter) %}
AND status_code_category = {{String(status_category_filter)}}
{% end %}
GROUP BY
year_month_day_hour,
service,
log_level,
status_code_category
ORDER BY year_month_day_hour DESC
{% if defined(limit) %}
LIMIT {{Int32(limit, 100)}}
{% end %}
TYPE ENDPOINT
Create a summary API for daily service health. Create service_health_summary.pipe:
TOKEN "service_health_read" READ
NODE daily_service_health
SQL >
%
SELECT
year_month_day,
service,
sum(total_requests) AS daily_requests,
sum(error_count) AS daily_errors,
round((sum(error_count) * 100.0) / sum(total_requests), 2) AS daily_error_rate,
avgMerge(avg_response_time_ms) AS avg_daily_response_time,
quantileMerge(0.95)(p95_response_time_ms) AS p95_daily_response_time,
uniqMerge(unique_users) AS daily_active_users
FROM log_metrics_hourly
WHERE year_month_day >= {{Int32(start_date, 20241201)}}
AND year_month_day <= {{Int32(end_date, 20241231)}}
{% if defined(service_filter) %}
AND service = {{String(service_filter)}}
{% end %}
GROUP BY year_month_day, service
ORDER BY year_month_day DESC, daily_requests DESC
{% if defined(limit) %}
LIMIT {{Int32(limit, 50)}}
{% end %}
TYPE ENDPOINT
Deploy the API endpoints:
tb --cloud deploy
Step 4: Query the log analytics APIs
Your log analytics APIs are now available with efficient numeric date filtering:
# Get hourly metrics for the last week
curl "https://api.tinybird.co/v0/pipes/log_analytics_api.json?start_date=20241201&end_date=20241207" \
-H "Authorization: Bearer $TINYBIRD_TOKEN"
# Analyze errors for a specific service
curl "https://api.tinybird.co/v0/pipes/log_analytics_api.json?start_date=20241201&end_date=20241207&service_filter=user-service&log_level_filter=ERROR" \
-H "Authorization: Bearer $TINYBIRD_TOKEN"
# Get daily service health summary
curl "https://api.tinybird.co/v0/pipes/service_health_summary.json?start_date=20241201&end_date=20241207" \
-H "Authorization: Bearer $TINYBIRD_TOKEN"
# Monitor 5xx server errors across all services
curl "https://api.tinybird.co/v0/pipes/log_analytics_api.json?start_date=20241201&end_date=20241207&status_category_filter=5xx_server_error" \
-H "Authorization: Bearer $TINYBIRD_TOKEN"
The APIs return efficient log analytics with numeric date-based partitioning:
{
"meta": [
{"name": "year_month_day_hour", "type": "UInt64"},
{"name": "service", "type": "String"},
{"name": "log_level", "type": "String"},
{"name": "status_code_category", "type": "String"},
{"name": "total_requests", "type": "UInt64"},
{"name": "error_count", "type": "UInt64"},
{"name": "avg_response_time", "type": "Float32"},
{"name": "p95_response_time", "type": "Float32"},
{"name": "total_unique_users", "type": "UInt64"},
{"name": "total_unique_ips", "type": "UInt64"},
{"name": "error_rate_percent", "type": "Float64"}
],
"data": [
{
"year_month_day_hour": 2024120114,
"service": "user-service",
"log_level": "ERROR",
"status_code_category": "5xx_server_error",
"total_requests": 1,
"error_count": 1,
"avg_response_time": 2500.0,
"p95_response_time": 2500.0,
"total_unique_users": 1,
"total_unique_ips": 1,
"error_rate_percent": 100.0
}
],
"rows": 1,
"statistics": {
"elapsed": 0.002,
"rows_read": 1,
"bytes_read": 128
}
}
This approach provides:
- Efficient time-based partitioning using toYYYYMM() for optimal query performance
- Pre-aggregated metrics via materialized views for sub-millisecond API responses
- Flexible log analysis with service, error level, and status code filtering
- Production monitoring insights including error rates, response times, and user activity
- Scalable architecture that handles millions of log events per second
Your log data becomes available as production-ready APIs optimized for time-series analysis and monitoring dashboards. Start building with Tinybird's free plan to create your own log analytics APIs.
Frequently asked questions about numeric date formats
Why does toYYYYMM return UInt32 instead of UInt16?
Because the values simply don't fit in a UInt16: its maximum is 65,535, and even a current YYYYMM value like 202312 is already about three times larger. UInt32 (maximum 4,294,967,295) comfortably holds both YYYYMM and YYYYMMDD values, while the longer YYYYMMDDhhmmss format needs UInt64.
In other words, 4 bytes is the smallest width that can represent these formats at all, so there's no storage penalty in the choice.
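You can confirm the return types directly:
-- toTypeName shows the exact type each function returns
SELECT
toTypeName(toYYYYMM(now())) AS yyyymm_type,                 -- UInt32
toTypeName(toYYYYMMDD(now())) AS yyyymmdd_type,             -- UInt32
toTypeName(toYYYYMMDDhhmmss(now())) AS yyyymmddhhmmss_type; -- UInt64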
How do null DateTime values behave with numeric conversion functions?
NULL DateTime inputs consistently return NULL numeric outputs across all three conversion functions. This maintains standard SQL NULL semantics and prevents unexpected errors in your queries.
SELECT
toYYYYMM(NULL) AS null_month, -- Returns NULL
toYYYYMMDD(NULL) AS null_date, -- Returns NULL
toYYYYMMDDhhmmss(NULL) AS null_time; -- Returns NULL
Can numeric date columns improve JOIN performance compared to DateTime?
Integer joins are typically faster than DateTime joins because numeric hash tables are more efficient and comparison operations are simpler. However, the performance difference is usually marginal unless you're joining very large tables where every millisecond counts.
The bigger advantage comes from using numeric dates as partition keys or in WHERE clauses rather than in JOIN conditions specifically.
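If you do join on numeric dates, the pattern is an ordinary integer equi-join; here's a sketch assuming a hypothetical date_dim table that stores its key as a UInt32 in YYYYMMDD form:
-- Both join keys are plain UInt32 values, so hashing and comparison stay cheap
SELECT
e.event_date,
d.is_holiday,
count() AS events
FROM events AS e
INNER JOIN date_dim AS d ON e.event_date = d.date_num
GROUP BY e.event_date, d.is_holiday;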
How do time zone changes affect numeric date conversion results?
The toYYYYMM family of functions uses your session's timezone setting when converting DateTime values, unless you pass an explicit timezone as the optional second argument. The same underlying instant can therefore produce different numeric results depending on which timezone does the rendering.
-- The same instant rendered in two timezones lands on different dates
SELECT
toYYYYMMDD(toDateTime('2024-01-01 05:30:00', 'UTC'), 'UTC') AS utc_date,            -- 20240101
toYYYYMMDD(toDateTime('2024-01-01 05:30:00', 'UTC'), 'US/Pacific') AS pacific_date; -- 20231231
Maintain consistent timezone settings across your application to avoid unexpected date shifts, especially around midnight boundaries.
Are numeric date formats compatible with standard BI tools?
Most BI tools can work with integer date columns, though you might need to format them for display purposes. Many tools allow custom formatting rules to convert 20231215 back to 2023-12-15 for user-friendly dashboards.
-- Format for BI tool display
SELECT
event_month,
concat(
substring(toString(event_month), 1, 4),
'-',
substring(toString(event_month), 5, 2)
) AS formatted_month
FROM events;
Next steps with Tinybird
Tinybird eliminates the operational overhead of running ClickHouse while providing access to key performance benefits, including support for numeric date handling. The managed platform handles scaling, backups, and monitoring, allowing users to focus on building analytics features rather than managing database infrastructure.
Get started with Tinybird and publish your first real-time API with optimized date filtering in minutes.