---
title: "How to convert dates to compact numeric formats in ClickHouse®"
excerpt: "Learn how to convert DateTime values to compact numeric formats using ClickHouse®'s toYYYYMM, toYYYYMMDD, and toYYYYMMDDhhmmss functions for faster queries and storage."
authors: "Cameron Archer"
categories: "AI Resources"
createdOn: "2025-09-30 17:17:42"
publishedOn: "2025-09-30 17:17:42"
updatedOn: "2025-09-30 17:17:42"
status: "published"
---

DateTime columns can eat up storage space when you're working with billions of rows in ClickHouse{% sup %}®{% /sup %}. Converting dates to compact numeric formats like 202312 or 20231215 can cut your storage costs by 30-50% while making partition pruning and range queries faster.

This guide walks through ClickHouse{% sup %}®{% /sup %}'s three numeric date conversion functions:

- `toYYYYMM()`
- `toYYYYMMDD()`
- `toYYYYMMDDhhmmss()`

It gives practical examples for ingestion patterns, performance optimizations, and building APIs that leverage these compact formats.

## What are the toYYYYMM functions in ClickHouse{% sup %}®{% /sup %}?

ClickHouse{% sup %}®{% /sup %} gives you three built-in functions that convert dates and timestamps into compact integers: `toYYYYMM()`, `toYYYYMMDD()`, and `toYYYYMMDDhhmmss()`. Instead of storing full DateTime values, you get clean numeric formats that take up less space and query faster.

Think of it like compressing your date data into a more efficient format. Where you might have stored "2023-12-15 14:30:22", you can now store just "202312" for the month, "20231215" for the date, or "20231215143022" for the full timestamp.

### toYYYYMM signature and return type

The `toYYYYMM()` function takes any Date or DateTime value and spits out a UInt32 number in YYYYMM format. December 2023 becomes 202312, January 2024 becomes 202401, and so on.

```sql
SELECT toYYYYMM(toDate('2023-12-15')) AS month_numeric;
-- Result: 202312

SELECT toYYYYMM(toDateTime('2023-12-15 14:30:22')) AS month_from_datetime;
-- Result: 202312
```

Notice how both Date and DateTime inputs give you the same result, the function strips away everything except year and month.

### toYYYYMMDD signature and return type

Moving up in precision, `toYYYYMMDD()` captures year, month, and day as a UInt32 in YYYYMMDD format. This gives you date-level granularity without the storage overhead of full DateTime columns.

```sql
SELECT toYYYYMMDD(toDate('2023-12-15')) AS date_numeric;
-- Result: 20231215

SELECT toYYYYMMDD(now()) AS today_numeric;
-- Result depends on current date
```

### toYYYYMMDDhhmmss signature and return type

The most precise option, `toYYYYMMDDhhmmss()`, only works with DateTime values (not Date values) and returns a UInt64 in YYYYMMDDhhmmss format. You get the complete timestamp down to the second.

```sql
SELECT toYYYYMMDDhhmmss(toDateTime('2023-12-15 14:30:45')) AS full_timestamp;
-- Result: 20231215143045
```

Here's the catch: you lose subsecond precision. If your original DateTime had milliseconds, they disappear in the conversion.

## Why store dates as compact integers

Converting dates to numeric formats isn't just about saving space, though that's certainly part of it. The real benefits show up when you're working with large datasets where every byte and every millisecond matters.

### Smaller on-disk footprint

A UInt32 takes up 4 bytes, while a DateTime takes 8 bytes. When you're storing millions or billions of rows, that difference adds up fast. Plus, sequential integers compress much better than DateTime values because they have more predictable bit patterns.

We've seen storage reductions of 30-50% when switching from DateTime to numeric date columns, especially after compression kicks in. Your backup files get smaller, data transfers go faster, and you use less memory during queries.

### Faster partition pruning

ClickHouse{% sup %}®{% /sup %} can eliminate irrelevant partitions much faster when working with simple integer comparisons versus DateTime parsing. When you partition a table by `toYYYYMM(date_column)`, ClickHouse{% sup %}®{% /sup %} just needs to check if an integer falls within a range.

```sql
-- Fast numeric partition pruning
SELECT * FROM events WHERE month_partition >= 202301 AND month_partition <= 202312;

-- Slower DateTime-based filtering
SELECT * FROM events WHERE toYYYYMM(event_time) >= 202301 AND toYYYYMM(event_time) <= 202312;
```

The first query can skip entire partitions using simple math. The second query has to apply the conversion function to every row before it can filter.

### Simplified sharding keys

Integer values work beautifully as distribution keys in distributed ClickHouse{% sup %}®{% /sup %} setups. You can create balanced shards using modulo operations on numeric date values, and cross-shard joins become more predictable because you're working with consistent data types.

## Basic syntax and quick examples

The conversion functions follow straightforward patterns, though each has its own quirks about input types and precision.

### Convert DateTime column to UInt32

You can apply the conversion functions directly to columns in your tables or combine them with ClickHouse{% sup %}®{% /sup %}'s built-in date functions like `now()` or `today()`.

```sql
-- Convert current timestamp
SELECT
    toYYYYMM(now()) AS current_month,
    toYYYYMMDD(now()) AS current_date,
    toYYYYMMDDhhmmss(now()) AS current_timestamp;

-- Convert table columns
SELECT
    user_id,
    toYYYYMM(created_at) AS signup_month,
    toYYYYMMDD(created_at) AS signup_date
FROM users
LIMIT 5;
```

### Filter by numeric period value

Once you have numeric date values, filtering becomes a matter of simple integer comparisons. No more wrestling with date parsing or timezone conversions in your WHERE clauses.

```sql
-- Filter by specific month
SELECT COUNT(*) FROM events
WHERE toYYYYMM(event_time) = 202312;

-- Filter by date range
SELECT * FROM orders
WHERE toYYYYMMDD(order_date) BETWEEN 20231201 AND 20231231;

-- Filter by hour range using full timestamp
SELECT * FROM logs
WHERE toYYYYMMDDhhmmss(log_time) >= 20231215140000
  AND toYYYYMMDDhhmmss(log_time) < 20231215150000;
```

## Converting back to DateTime safely

Going from numeric formats back to DateTime values requires more care because you lose information during the original conversion. You can't perfectly reconstruct what you started with.

### Using toDateTime from UInt32

ClickHouse{% sup %}®{% /sup %} doesn't provide direct reverse functions, but you can reconstruct DateTime values using string manipulation and parsing functions. The catch is that you lose precision.

```sql
-- Convert YYYYMM back to first day of month
SELECT toDate(concat(toString(202312), '01')) AS first_day_of_month;
-- Result: 2023-12-01

-- Convert YYYYMMDD back to Date
SELECT toDate(toString(20231215)) AS reconstructed_date;
-- Result: 2023-12-15
```

When you convert from `toYYYYMM()` results, you can only get back to the first day of that month, the original day information is gone forever.

### Restoring time zone context

The `toYYYYMMDDhhmmss()` function preserves more information, making reconstruction more accurate. However, you still lose subsecond precision and any timezone context from the original DateTime.

```sql
-- Reconstruct DateTime from YYYYMMDDhhmmss
SELECT parseDateTimeBestEffort('20231215143045') AS reconstructed_datetime;
-- Result: 2023-12-15 14:30:45
```

The reconstructed DateTime uses your session's default timezone, which might not match the original value's timezone.

## Handling strings and time zones

Real-world data often comes as strings in various formats, and timezone handling becomes crucial when you're converting between different representations.

### ClickHouse{% sup %}®{% /sup %} string to DateTime conversion

Before you can apply numeric conversion functions to string data, you need to parse the strings into proper DateTime values. ClickHouse{% sup %}®{% /sup %}'s `parseDateTimeBestEffort()` function handles most common string formats automatically.

```sql
-- Parse string dates before numeric conversion
SELECT
    raw_date,
    toYYYYMM(parseDateTimeBestEffort(raw_date)) AS month_numeric
FROM (
    SELECT '2023-12-15 14:30:00' AS raw_date
    UNION ALL
    SELECT 'Dec 15, 2023 2:30 PM'
    UNION ALL
    SELECT '15/12/2023'
);
```

### Parsing with parseDateTimeBestEffort

The `parseDateTimeBestEffort()` function is remarkably flexible, it can handle ISO dates, American formats, European formats, and even natural language dates. For production code, consider using `parseDateTimeBestEffortOrNull()` to avoid errors on unparseable strings.

```sql
-- Complete workflow from string to numeric with error handling
WITH parsed_dates AS (
    SELECT
        user_input,
        parseDateTimeBestEffortOrNull(user_input) AS parsed_dt
    FROM user_date_inputs
)
SELECT
    user_input,
    toYYYYMM(parsed_dt) AS month_num,
    toYYYYMMDD(parsed_dt) AS date_num
FROM parsed_dates
WHERE parsed_dt IS NOT NULL;
```

## Performance benchmarks on disk and memory

The performance benefits of numeric date formats become obvious when you're working with large datasets, though the exact improvements depend on your specific query patterns and data distribution.

### Compression ratio results

In practice, storing dates as UInt32 values typically achieves 20-40% better compression ratios compared to DateTime columns. The improvement comes from more predictable bit patterns in sequential numeric values.

- **DateTime columns**: Require 8 bytes per value before compression
- **UInt32 date columns**: Require 4 bytes per value before compression
- **Additional benefits**: Better compression ratios, smaller backup files, faster network transfers

### Query latency comparison

Numeric date filtering consistently outperforms DateTime-based operations, especially for range queries and partition pruning. The difference becomes more pronounced as your tables grow beyond 100 million rows.

```sql
-- Numeric filtering (faster approach)
SELECT COUNT(*) FROM large_table
WHERE date_partition BETWEEN 202301 AND 202312;

-- DateTime conversion filtering (slower approach)
SELECT COUNT(*) FROM large_table
WHERE toYYYYMM(timestamp_col) BETWEEN 202301 AND 202312;
```

The first query can use partition pruning and index lookups. The second query has to compute the conversion function for every row before filtering.

## Ingestion patterns with materialized views

You can implement numeric date conversion during data ingestion to avoid runtime conversion overhead, though this approach requires planning your schema carefully.

### One-time conversion during insert

The most efficient approach computes numeric date values during insertion using computed columns or explicit conversion in your INSERT statements.

```sql
-- Create table with computed numeric date columns
CREATE TABLE events (
    event_id UInt64,
    user_id UInt32,
    event_time DateTime,
    event_month UInt32 MATERIALIZED toYYYYMM(event_time),
    event_date UInt32 MATERIALIZED toYYYYMMDD(event_time)
) ENGINE = MergeTree()
PARTITION BY event_month
ORDER BY (user_id, event_time);
```

With this setup, ClickHouse{% sup %}®{% /sup %} automatically computes the numeric date columns whenever you insert data, and you can use them for fast filtering without any conversion overhead.

### On-the-fly conversion at query time

Converting during SELECT queries offers more flexibility but costs additional CPU cycles. This approach works well when you only occasionally need numeric date formats or when storage space isn't a primary concern.

The tradeoff is clear: pre-computed columns give you faster queries at the cost of additional storage and slightly slower inserts. Query-time conversion keeps your schema simpler but makes every query do extra work.

## Building a log analytics API with Tinybird and numeric date partitioning

[Tinybird's](https://www.tinybird.co) managed ClickHouse{% sup %}®{% /sup %} platform is perfect for log analytics where you need to efficiently filter by time periods. Here's how to build a complete log analytics pipeline using numeric date conversion for optimal performance.

### Step 1: Create a data source for application logs

First, create a data source to store application logs with both original timestamps and pre-computed numeric dates. Create an `app_logs.datasource` file:

```tinybird
SCHEMA >
    `timestamp` DateTime64(3, 'UTC'),
    `log_level` LowCardinality(String),
    `service` LowCardinality(String),
    `endpoint` String,
    `method` LowCardinality(String),
    `status_code` UInt16,
    `response_time_ms` UInt32,
    `user_id` String,
    `ip_address` String,
    `user_agent` String,
    `error_message` String

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp, service, log_level"
```

Deploy the data source:

```bash
tb --cloud deploy
```

Ingest sample log data using the [Events API](https://www.tinybird.co/docs/forward/get-data-in/events-api):

```bash
curl -X POST \
  -H "Authorization: Bearer $TINYBIRD_TOKEN" \
  -H "Content-Type: application/json" \
  "https://api.tinybird.co/v0/events?name=app_logs" \
  -d '[
    {"timestamp": "2024-12-01 14:30:00.123", "log_level": "INFO", "service": "api-gateway", "endpoint": "/api/users", "method": "GET", "status_code": 200, "response_time_ms": 45, "user_id": "user_123", "ip_address": "192.168.1.100", "user_agent": "Mozilla/5.0", "error_message": ""},
    {"timestamp": "2024-12-01 14:30:15.456", "log_level": "ERROR", "service": "user-service", "endpoint": "/api/users/profile", "method": "POST", "status_code": 500, "response_time_ms": 2500, "user_id": "user_456", "ip_address": "192.168.1.101", "user_agent": "curl/7.68.0", "error_message": "Database connection timeout"},
    {"timestamp": "2024-12-01 14:31:00.789", "log_level": "WARN", "service": "payment-service", "endpoint": "/api/payments", "method": "POST", "status_code": 429, "response_time_ms": 100, "user_id": "user_789", "ip_address": "192.168.1.102", "user_agent": "PostmanRuntime/7.26.8", "error_message": "Rate limit exceeded"}
  ]'
```

### Step 2: Create a materialized view for aggregated log metrics

Create a materialized view that pre-computes log analytics using numeric date formats for efficient time-based partitioning. Create a `log_metrics_hourly.datasource` file:

```tinybird
SCHEMA >
    `year_month` UInt32,
    `year_month_day` UInt32,
    `year_month_day_hour` UInt64,
    `service` LowCardinality(String),
    `log_level` LowCardinality(String),
    `status_code_category` LowCardinality(String),
    `total_requests` UInt64,
    `error_count` UInt64,
    `avg_response_time_ms` Float32,
    `p95_response_time_ms` Float32,
    `unique_users` UInt64,
    `unique_ips` UInt64

ENGINE "SummingMergeTree"
ENGINE_PARTITION_KEY "year_month"
ENGINE_SORTING_KEY "year_month_day_hour, service, log_level, status_code_category"
```

Create a materialized view pipe `log_aggregation_mv.pipe`:

```tinybird
NODE log_hourly_aggregation
SQL >
    SELECT
        toYYYYMM(timestamp) AS year_month,
        toYYYYMMDD(timestamp) AS year_month_day,
        toYYYYMMDDhhmmss(toStartOfHour(timestamp)) AS year_month_day_hour,
        service,
        log_level,
        -- Categorize status codes for better analysis
        CASE
            WHEN status_code BETWEEN 200 AND 299 THEN '2xx_success'
            WHEN status_code BETWEEN 300 AND 399 THEN '3xx_redirect'
            WHEN status_code BETWEEN 400 AND 499 THEN '4xx_client_error'
            WHEN status_code BETWEEN 500 AND 599 THEN '5xx_server_error'
            ELSE 'other'
        END AS status_code_category,
        count() AS total_requests,
        countIf(log_level = 'ERROR') AS error_count,
        avg(response_time_ms) AS avg_response_time_ms,
        quantile(0.95)(response_time_ms) AS p95_response_time_ms,
        uniq(user_id) AS unique_users,
        uniq(ip_address) AS unique_ips
    FROM app_logs
    WHERE user_id != ''  -- Filter out empty user IDs
    GROUP BY
        year_month,
        year_month_day,
        year_month_day_hour,
        service,
        log_level,
        status_code_category

TYPE MATERIALIZED
DATASOURCE log_metrics_hourly
```

Deploy the materialized view:

```bash
tb --cloud deploy
```

### Step 3: Create log analytics API endpoints

Create a pipe that queries the materialized view for fast log analytics. Create `log_analytics_api.pipe`:

```tinybird
TOKEN "log_analytics_read" READ

NODE hourly_log_metrics
SQL >
    %
    SELECT
        year_month_day_hour,
        service,
        log_level,
        status_code_category,
        sum(total_requests) AS total_requests,
        sum(error_count) AS error_count,
        avgMerge(avg_response_time_ms) AS avg_response_time,
        quantileMerge(0.95)(p95_response_time_ms) AS p95_response_time,
        sum(unique_users) AS total_unique_users,
        sum(unique_ips) AS total_unique_ips,
        -- Calculate error rate
        round((sum(error_count) * 100.0) / sum(total_requests), 2) AS error_rate_percent
    FROM log_metrics_hourly
    WHERE year_month_day >= {{Int32(start_date, 20241201)}}
      AND year_month_day <= {{Int32(end_date, 20241231)}}
    {\% if defined(service_filter) %}
      AND service = {{String(service_filter)}}
    {\% end %}
    {\% if defined(log_level_filter) %}
      AND log_level = {{String(log_level_filter)}}
    {\% end %}
    {\% if defined(status_category_filter) %}
      AND status_code_category = {{String(status_category_filter)}}
    {\% end %}
    GROUP BY
        year_month_day_hour,
        service,
        log_level,
        status_code_category
    ORDER BY year_month_day_hour DESC
    {\% if defined(limit) %}
    LIMIT {{Int32(limit, 100)}}
    {\% end %}

TYPE ENDPOINT
```

Create a summary API for daily service health. Create `service_health_summary.pipe`:

```tinybird
TOKEN "service_health_read" READ

NODE daily_service_health
SQL >
    %
    SELECT
        year_month_day,
        service,
        sum(total_requests) AS daily_requests,
        sum(error_count) AS daily_errors,
        round((sum(error_count) * 100.0) / sum(total_requests), 2) AS daily_error_rate,
        avgMerge(avg_response_time_ms) AS avg_daily_response_time,
        quantileMerge(0.95)(p95_response_time_ms) AS p95_daily_response_time,
        sum(unique_users) AS daily_active_users
    FROM log_metrics_hourly
    WHERE year_month_day >= {{Int32(start_date, 20241201)}}
      AND year_month_day <= {{Int32(end_date, 20241231)}}
    {\% if defined(service_filter) %}
      AND service = {{String(service_filter)}}
    {\% end %}
    GROUP BY year_month_day, service
    ORDER BY year_month_day DESC, daily_requests DESC
    {\% if defined(limit) %}
    LIMIT {{Int32(limit, 50)}}
    {\% end %}

TYPE ENDPOINT
```

Deploy the API endpoints:

```bash
tb --cloud deploy
```

### Step 4: Query the log analytics APIs

Your log analytics APIs are now available with efficient numeric date filtering:

```bash
# Get hourly metrics for the last week
curl "https://api.tinybird.co/v0/pipes/log_analytics_api.json?start_date=20241201&end_date=20241207" \
  -H "Authorization: Bearer $TINYBIRD_TOKEN"

# Analyze errors for a specific service
curl "https://api.tinybird.co/v0/pipes/log_analytics_api.json?start_date=20241201&end_date=20241207&service_filter=user-service&log_level_filter=ERROR" \
  -H "Authorization: Bearer $TINYBIRD_TOKEN"

# Get daily service health summary
curl "https://api.tinybird.co/v0/pipes/service_health_summary.json?start_date=20241201&end_date=20241207" \
  -H "Authorization: Bearer $TINYBIRD_TOKEN"

# Monitor 5xx server errors across all services
curl "https://api.tinybird.co/v0/pipes/log_analytics_api.json?start_date=20241201&end_date=20241207&status_category_filter=5xx_server_error" \
  -H "Authorization: Bearer $TINYBIRD_TOKEN"
```

The APIs return efficient log analytics with numeric date-based partitioning:

```json
{
    "meta": [
        {"name": "year_month_day_hour", "type": "UInt64"},
        {"name": "service", "type": "String"},
        {"name": "log_level", "type": "String"},
        {"name": "status_code_category", "type": "String"},
        {"name": "total_requests", "type": "UInt64"},
        {"name": "error_count", "type": "UInt64"},
        {"name": "avg_response_time", "type": "Float32"},
        {"name": "p95_response_time", "type": "Float32"},
        {"name": "total_unique_users", "type": "UInt64"},
        {"name": "total_unique_ips", "type": "UInt64"},
        {"name": "error_rate_percent", "type": "Float64"}
    ],
    "data": [
        {
            "year_month_day_hour": 2024120114,
            "service": "user-service",
            "log_level": "ERROR",
            "status_code_category": "5xx_server_error",
            "total_requests": 1,
            "error_count": 1,
            "avg_response_time": 2500.0,
            "p95_response_time": 2500.0,
            "total_unique_users": 1,
            "total_unique_ips": 1,
            "error_rate_percent": 100.0
        }
    ],
    "rows": 1,
    "statistics": {
        "elapsed": 0.002,
        "rows_read": 1,
        "bytes_read": 128
    }
}
```

This approach provides:

- **Efficient time-based partitioning** using `toYYYYMM()` for optimal query performance
- **Pre-aggregated metrics** via materialized views for sub-millisecond API responses
- **Flexible log analysis** with service, error level, and status code filtering
- **Production monitoring insights** including error rates, response times, and user activity
- **Scalable architecture** that handles millions of log events per second

Your log data becomes available as production-ready APIs optimized for time-series analysis and monitoring dashboards. [Start building with Tinybird's free plan](https://cloud.tinybird.co/signup) to create your own log analytics APIs.

## Additional resources

- [Date and Time Functions](https://clickhouse.com/docs/en/sql-reference/functions/date-time-functions)
- [Type Conversion Functions](https://clickhouse.com/docs/en/sql-reference/functions/type-conversion-functions)
- [Tinybird Logs Explorer Template](https://www.tinybird.co/templates/logs-explorer-template)

## Frequently asked questions about numeric date formats

### Why does toYYYYMM return UInt32 instead of UInt16?

UInt32 accommodates year values well beyond the UInt16 limit of 65,535. Even though current YYYYMM values like 202312 fit comfortably in UInt16 range, using UInt32 prevents issues as years progress beyond 6553 (which would create values exceeding UInt16 capacity).

It's a future-proofing decision that costs you an extra 2 bytes per value but eliminates potential overflow issues down the road.

### How do null DateTime values behave with numeric conversion functions?

NULL DateTime inputs consistently return NULL numeric outputs across all three conversion functions. This maintains standard SQL NULL semantics and prevents unexpected errors in your queries.

```sql
SELECT
    toYYYYMM(NULL) AS null_month,        -- Returns NULL
    toYYYYMMDD(NULL) AS null_date,       -- Returns NULL
    toYYYYMMDDhhmmss(NULL) AS null_time; -- Returns NULL
```

### Can numeric date columns improve JOIN performance compared to DateTime?

Integer joins are typically faster than DateTime joins because numeric hash tables are more efficient and comparison operations are simpler. However, the performance difference is usually marginal unless you're joining very large tables where every millisecond counts.

The bigger advantage comes from using numeric dates as partition keys or in WHERE clauses rather than in JOIN conditions specifically.

### How do time zone changes affect numeric date conversion results?

The `toYYYYMM` family of functions uses your session's timezone setting when converting DateTime values. If your session timezone changes between queries, the same DateTime value might produce different numeric results.

```sql
-- Same DateTime, different session timezones can yield different dates
SET timezone = 'UTC';
SELECT toYYYYMMDD(toDateTime('2023-12-31 23:30:00')) AS utc_date;

SET timezone = 'US/Pacific';
SELECT toYYYYMMDD(toDateTime('2023-12-31 23:30:00')) AS pacific_date;
```

Maintain consistent timezone settings across your application to avoid unexpected date shifts, especially around midnight boundaries.

### Are numeric date formats compatible with standard BI tools?

Most BI tools can work with integer date columns, though you might need to format them for display purposes. Many tools allow custom formatting rules to convert `20231215` back to `2023-12-15` for user-friendly dashboards.

```sql
-- Format for BI tool display
SELECT
    event_month,
    concat(
        substring(toString(event_month), 1, 4),
        '-',
        substring(toString(event_month), 5, 2)
    ) AS formatted_month
FROM events;
```

## Next steps with Tinybird

Tinybird eliminates the operational overhead of running ClickHouse{% sup %}®{% /sup %} while providing access to key performance benefits, including support for numeric date handling. The managed platform handles scaling, backups, and monitoring, allowing users to focus on building analytics features rather than managing database infrastructure.

[Get started with Tinybird](https://cloud.tinybird.co/signup) and publish your first real-time API with optimized date filtering in minutes.
