DateTime columns can eat up storage space when you're working with billions of rows in ClickHouse. Converting dates to compact numeric formats like 202312 or 20231215 can cut your storage costs by 30-50% while making partition pruning and range queries faster.
This guide walks through ClickHouse's three numeric date conversion functions:
- toYYYYMM()
- toYYYYMMDD()
- toYYYYMMDDhhmmss()
It gives practical examples for ingestion patterns, performance optimizations, and building APIs that leverage these compact formats.
What are the toYYYYMM functions in ClickHouse?
ClickHouse gives you three built-in functions that convert dates and timestamps into compact integers: toYYYYMM(), toYYYYMMDD(), and toYYYYMMDDhhmmss(). Instead of storing full DateTime values, you get clean numeric formats that take up less space and query faster.
Think of it like compressing your date data into a more efficient format. Where you might have stored "2023-12-15 14:30:22", you can now store just "202312" for the month, "20231215" for the date, or "20231215143022" for the full timestamp.
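For example, here is the same timestamp converted at all three levels of precision:
-- One DateTime, three levels of granularity
SELECT
toYYYYMM(toDateTime('2023-12-15 14:30:22')) AS month_numeric,        -- 202312
toYYYYMMDD(toDateTime('2023-12-15 14:30:22')) AS date_numeric,        -- 20231215
toYYYYMMDDhhmmss(toDateTime('2023-12-15 14:30:22')) AS full_numeric;  -- 20231215143022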
toYYYYMM signature and return type
The toYYYYMM() function takes any Date or DateTime value and spits out a UInt32 number in YYYYMM format. December 2023 becomes 202312, January 2024 becomes 202401, and so on.
SELECT toYYYYMM(toDate('2023-12-15')) AS month_numeric;
-- Result: 202312
SELECT toYYYYMM(toDateTime('2023-12-15 14:30:22')) AS month_from_datetime;
-- Result: 202312
Notice how both Date and DateTime inputs give you the same result: the function strips away everything except year and month.
toYYYYMMDD signature and return type
Moving up in precision, toYYYYMMDD() captures year, month, and day as a UInt32 in YYYYMMDD format. This gives you date-level granularity without the storage overhead of full DateTime columns.
SELECT toYYYYMMDD(toDate('2023-12-15')) AS date_numeric;
-- Result: 20231215
SELECT toYYYYMMDD(now()) AS today_numeric;
-- Result depends on current date
toYYYYMMDDhhmmss signature and return type
The most precise option, toYYYYMMDDhhmmss(), returns a UInt64 in YYYYMMDDhhmmss format, capturing the complete timestamp down to the second. If you pass a plain Date instead of a DateTime, the time portion simply comes back as zeros.
SELECT toYYYYMMDDhhmmss(toDateTime('2023-12-15 14:30:45')) AS full_timestamp;
-- Result: 20231215143045
Here's the catch: you lose subsecond precision. If your original DateTime had milliseconds, they disappear in the conversion.
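For example, with a DateTime64 input (assuming your ClickHouse version accepts DateTime64 here, as recent releases do), the milliseconds are simply truncated:
-- Sub-second precision is dropped in the conversion
SELECT toYYYYMMDDhhmmss(toDateTime64('2023-12-15 14:30:45.789', 3)) AS truncated_ts;
-- Result: 20231215143045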
Why store dates as compact integers
Converting dates to numeric formats isn't just about saving space, though that's certainly part of it. The real benefits show up when you're working with large datasets where every byte and every millisecond matters.
Smaller on-disk footprint
A UInt32 takes up 4 bytes, the same as a plain DateTime, while a DateTime64 timestamp takes 8 bytes. When you're storing millions or billions of rows, every byte of width adds up fast. Just as important, coarse values like 202312 repeat across long runs of rows, so they compress far better than second-level or millisecond-level timestamps.
We've seen storage reductions of 30-50% when switching from DateTime to numeric date columns, especially after compression kicks in. Your backup files get smaller, data transfers go faster, and you use less memory during queries.
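You can measure the effect on your own tables; here is a sketch using ClickHouse's system.columns table (the events table name is illustrative):
-- Per-column on-disk footprint, compressed and uncompressed
SELECT
name,
type,
formatReadableSize(data_compressed_bytes) AS compressed,
formatReadableSize(data_uncompressed_bytes) AS uncompressed
FROM system.columns
WHERE database = currentDatabase() AND table = 'events'
ORDER BY data_compressed_bytes DESC;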
Faster partition pruning
ClickHouse can eliminate irrelevant partitions much faster when working with simple integer comparisons versus DateTime parsing. When you partition a table by toYYYYMM(date_column), ClickHouse just needs to check whether an integer falls within a range.
-- Fast numeric partition pruning
SELECT * FROM events WHERE month_partition >= 202301 AND month_partition <= 202312;
-- Slower DateTime-based filtering
SELECT * FROM events WHERE toYYYYMM(event_time) >= 202301 AND toYYYYMM(event_time) <= 202312;
The first query can skip entire partitions using simple math. The second query has to apply the conversion function to every row before it can filter.
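On recent ClickHouse versions you can verify the pruning yourself with EXPLAIN, which reports how many parts and granules survive the partition filter (using the month_partition column from the example above):
EXPLAIN indexes = 1
SELECT count() FROM events
WHERE month_partition BETWEEN 202301 AND 202312;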
Simplified sharding keys
Integer values work beautifully as distribution keys in distributed ClickHouse setups. You can create balanced shards using modulo operations on numeric date values, and cross-shard joins become more predictable because you're working with consistent data types.
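As a minimal sketch, assuming a cluster named my_cluster and a local table events_local that already has an event_time column, a distributed table keyed on the numeric date could look like this:
-- The Distributed engine takes the sharding key modulo the total shard weight
-- to decide which shard receives each row
CREATE TABLE events_distributed AS events_local
ENGINE = Distributed('my_cluster', 'default', 'events_local', toYYYYMMDD(event_time));
Keep in mind that keying on the day sends a whole day's rows to one shard, which keeps daily queries local but can skew load; combine the date with a hash of another column if you need finer spread.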
Basic syntax and quick examples
The conversion functions follow straightforward patterns, though each has its own quirks about input types and precision.
Convert DateTime column to UInt32
You can apply the conversion functions directly to columns in your tables or combine them with ClickHouse's built-in date functions like now() or today().
-- Convert current timestamp
SELECT
toYYYYMM(now()) AS current_month,
toYYYYMMDD(now()) AS current_date,
toYYYYMMDDhhmmss(now()) AS current_timestamp;
-- Convert table columns
SELECT
user_id,
toYYYYMM(created_at) AS signup_month,
toYYYYMMDD(created_at) AS signup_date
FROM users
LIMIT 5;
Filter by numeric period value
Once you have numeric date values, filtering becomes a matter of simple integer comparisons. No more wrestling with date parsing or timezone conversions in your WHERE clauses.
-- Filter by specific month
SELECT COUNT(*) FROM events
WHERE toYYYYMM(event_time) = 202312;
-- Filter by date range
SELECT * FROM orders
WHERE toYYYYMMDD(order_date) BETWEEN 20231201 AND 20231231;
-- Filter by hour range using full timestamp
SELECT * FROM logs
WHERE toYYYYMMDDhhmmss(log_time) >= 20231215140000
AND toYYYYMMDDhhmmss(log_time) < 20231215150000;
Converting back to DateTime safely
Going from numeric formats back to DateTime values requires more care because you lose information during the original conversion. You can't perfectly reconstruct what you started with.
Using toDateTime from UInt32
Older ClickHouse versions don't provide direct reverse functions (newer releases add dedicated helpers, shown below), but you can always reconstruct DateTime values using string manipulation and parsing functions. Either way, the catch is that you lose precision.
-- Convert YYYYMM back to first day of month
SELECT toDate(concat(toString(202312), '01')) AS first_day_of_month;
-- Result: 2023-12-01
-- Convert YYYYMMDD back to Date
SELECT toDate(toString(20231215)) AS reconstructed_date;
-- Result: 2023-12-15
When you convert from toYYYYMM() results, you can only get back to the first day of that month; the original day information is gone forever.
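If you're on a recent ClickHouse release, there are also direct reverse helpers (added around version 23.9; check the exact names and availability in your version's docs before relying on them):
-- Direct reverse conversions, available on newer ClickHouse versions
SELECT
YYYYMMDDToDate(20231215) AS date_from_numeric,                    -- 2023-12-15
YYYYMMDDhhmmssToDateTime(20231215143045) AS datetime_from_numeric; -- 2023-12-15 14:30:45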
Restoring time zone context
The toYYYYMMDDhhmmss() function preserves more information, making reconstruction more accurate. However, you still lose subsecond precision and any timezone context from the original DateTime.
-- Reconstruct DateTime from YYYYMMDDhhmmss
SELECT parseDateTimeBestEffort('20231215143045') AS reconstructed_datetime;
-- Result: 2023-12-15 14:30:45
The reconstructed DateTime uses your session's default timezone, which might not match the original value's timezone.
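If you know the timezone the data was originally recorded in, you can pass it explicitly so reconstruction doesn't depend on session settings (UTC here is just an example):
-- Pin the timezone during parsing instead of relying on the session default
SELECT parseDateTimeBestEffort('20231215143045', 'UTC') AS reconstructed_utc;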
Handling strings and time zones
Real-world data often comes as strings in various formats, and timezone handling becomes crucial when you're converting between different representations.
ClickHouse string to DateTime conversion
Before you can apply numeric conversion functions to string data, you need to parse the strings into proper DateTime values. ClickHouse's parseDateTimeBestEffort() function handles most common string formats automatically.
-- Parse string dates before numeric conversion
SELECT
raw_date,
toYYYYMM(parseDateTimeBestEffort(raw_date)) AS month_numeric
FROM (
SELECT '2023-12-15 14:30:00' AS raw_date
UNION ALL
SELECT 'Dec 15, 2023 2:30 PM'
UNION ALL
SELECT '15/12/2023'
);
Parsing with parseDateTimeBestEffort
The parseDateTimeBestEffort() function is remarkably flexible: it can handle ISO dates, American formats, European formats, and even natural language dates. For production code, consider using parseDateTimeBestEffortOrNull() to avoid errors on unparseable strings.
-- Complete workflow from string to numeric with error handling
WITH parsed_dates AS (
SELECT
user_input,
parseDateTimeBestEffortOrNull(user_input) AS parsed_dt
FROM user_date_inputs
)
SELECT
user_input,
toYYYYMM(parsed_dt) AS month_num,
toYYYYMMDD(parsed_dt) AS date_num
FROM parsed_dates
WHERE parsed_dt IS NOT NULL;
Performance benchmarks on disk and memory
The performance benefits of numeric date formats become obvious when you're working with large datasets, though the exact improvements depend on your specific query patterns and data distribution.
Compression ratio results
In practice, storing dates as UInt32 values typically achieves 20-40% better compression ratios compared to full timestamp columns. The improvement comes from more predictable bit patterns in sequential numeric values.
- DateTime64 columns: Require 8 bytes per value before compression (a plain DateTime requires 4)
- UInt32 date columns: Require 4 bytes per value before compression, with far fewer distinct values to encode
- Additional benefits: Better compression ratios, smaller backup files, faster network transfers
Query latency comparison
Numeric date filtering consistently outperforms DateTime-based operations, especially for range queries and partition pruning. The difference becomes more pronounced as your tables grow beyond 100 million rows.
-- Numeric filtering (faster approach)
SELECT COUNT(*) FROM large_table
WHERE date_partition BETWEEN 202301 AND 202312;
-- DateTime conversion filtering (slower approach)
SELECT COUNT(*) FROM large_table
WHERE toYYYYMM(timestamp_col) BETWEEN 202301 AND 202312;
The first query can use partition pruning and index lookups. The second query has to compute the conversion function for every row before filtering.
Ingestion patterns with materialized views
You can implement numeric date conversion during data ingestion to avoid runtime conversion overhead, though this approach requires planning your schema carefully.
One-time conversion during insert
The most efficient approach computes numeric date values during insertion using computed columns or explicit conversion in your INSERT statements.
-- Create table with computed numeric date columns
CREATE TABLE events (
event_id UInt64,
user_id UInt32,
event_time DateTime,
event_month UInt32 MATERIALIZED toYYYYMM(event_time),
event_date UInt32 MATERIALIZED toYYYYMMDD(event_time)
) ENGINE = MergeTree()
PARTITION BY event_month
ORDER BY (user_id, event_time);
With this setup, ClickHouse automatically computes the numeric date columns whenever you insert data, and you can use them for fast filtering without any conversion overhead.
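For instance, a minimal insert and filter against the events table above (the materialized columns never appear in the INSERT; ClickHouse fills them in for you):
-- Only the regular columns are inserted; event_month and event_date are computed automatically
INSERT INTO events (event_id, user_id, event_time)
VALUES (1, 42, '2023-12-15 14:30:22');
-- Filtering uses the precomputed integer, so no conversion runs at query time
SELECT count() FROM events WHERE event_month = 202312;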
On-the-fly conversion at query time
Converting during SELECT queries offers more flexibility but costs additional CPU cycles. This approach works well when you only occasionally need numeric date formats or when storage space isn't a primary concern.
The tradeoff is clear: pre-computed columns give you faster queries at the cost of additional storage and slightly slower inserts. Query-time conversion keeps your schema simpler but makes every query do extra work.
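For comparison, the query-time version of the same filter works against any table with a raw DateTime column, at the cost of computing the conversion while the query runs:
-- No schema changes needed; the conversion happens during query execution
SELECT count()
FROM events
WHERE toYYYYMM(event_time) = 202312;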
Building a log analytics API with Tinybird and numeric date partitioning
Tinybird's managed ClickHouse platform is perfect for log analytics where you need to efficiently filter by time periods. Here's how to build a complete log analytics pipeline using numeric date conversion for optimal performance.
Step 1: Create a data source for application logs
First, create a data source to store application logs, partitioned by the numeric month of each timestamp. Create an app_logs.datasource file:
SCHEMA >
`timestamp` DateTime64(3, 'UTC'),
`log_level` LowCardinality(String),
`service` LowCardinality(String),
`endpoint` String,
`method` LowCardinality(String),
`status_code` UInt16,
`response_time_ms` UInt32,
`user_id` String,
`ip_address` String,
`user_agent` String,
`error_message` String
ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp, service, log_level"
Deploy the data source:
tb --cloud deploy
Ingest sample log data using the Events API:
curl -X POST \
-H "Authorization: Bearer $TINYBIRD_TOKEN" \
-H "Content-Type: application/json" \
"https://api.tinybird.co/v0/events?name=app_logs" \
-d '[
{"timestamp": "2024-12-01 14:30:00.123", "log_level": "INFO", "service": "api-gateway", "endpoint": "/api/users", "method": "GET", "status_code": 200, "response_time_ms": 45, "user_id": "user_123", "ip_address": "192.168.1.100", "user_agent": "Mozilla/5.0", "error_message": ""},
{"timestamp": "2024-12-01 14:30:15.456", "log_level": "ERROR", "service": "user-service", "endpoint": "/api/users/profile", "method": "POST", "status_code": 500, "response_time_ms": 2500, "user_id": "user_456", "ip_address": "192.168.1.101", "user_agent": "curl/7.68.0", "error_message": "Database connection timeout"},
{"timestamp": "2024-12-01 14:31:00.789", "log_level": "WARN", "service": "payment-service", "endpoint": "/api/payments", "method": "POST", "status_code": 429, "response_time_ms": 100, "user_id": "user_789", "ip_address": "192.168.1.102", "user_agent": "PostmanRuntime/7.26.8", "error_message": "Rate limit exceeded"}
]'
Step 2: Create a materialized view for aggregated log metrics
Create a materialized view that pre-computes log analytics keyed by numeric date formats for efficient time-based partitioning. The averages, percentiles, and unique counts are stored as aggregate-function states so they can be merged correctly at query time. Create a log_metrics_hourly.datasource file:
SCHEMA >
`year_month` UInt32,
`year_month_day` UInt32,
`year_month_day_hour` UInt64,
`service` LowCardinality(String),
`log_level` LowCardinality(String),
`status_code_category` LowCardinality(String),
`total_requests` UInt64,
`error_count` UInt64,
`avg_response_time_ms` AggregateFunction(avg, UInt32),
`p95_response_time_ms` AggregateFunction(quantile(0.95), UInt32),
`unique_users` AggregateFunction(uniq, String),
`unique_ips` AggregateFunction(uniq, String)
ENGINE "SummingMergeTree"
ENGINE_PARTITION_KEY "year_month"
ENGINE_SORTING_KEY "year_month, year_month_day, year_month_day_hour, service, log_level, status_code_category"
Create a materialized view pipe log_aggregation_mv.pipe:
NODE log_hourly_aggregation
SQL >
SELECT
toYYYYMM(timestamp) AS year_month,
toYYYYMMDD(timestamp) AS year_month_day,
toYYYYMMDDhhmmss(toStartOfHour(timestamp)) AS year_month_day_hour,
service,
log_level,
-- Categorize status codes for better analysis
CASE
WHEN status_code BETWEEN 200 AND 299 THEN '2xx_success'
WHEN status_code BETWEEN 300 AND 399 THEN '3xx_redirect'
WHEN status_code BETWEEN 400 AND 499 THEN '4xx_client_error'
WHEN status_code BETWEEN 500 AND 599 THEN '5xx_server_error'
ELSE 'other'
END AS status_code_category,
count() AS total_requests,
countIf(log_level = 'ERROR') AS error_count,
avgState(response_time_ms) AS avg_response_time_ms,
quantileState(0.95)(response_time_ms) AS p95_response_time_ms,
uniqState(user_id) AS unique_users,
uniqState(ip_address) AS unique_ips
FROM app_logs
WHERE user_id != '' -- Filter out empty user IDs
GROUP BY
year_month,
year_month_day,
year_month_day_hour,
service,
log_level,
status_code_category
TYPE MATERIALIZED
DATASOURCE log_metrics_hourly
Deploy the materialized view:
tb --cloud deploy
Step 3: Create log analytics API endpoints
Create a pipe that queries the materialized view for fast log analytics. Create log_analytics_api.pipe:
TOKEN "log_analytics_read" READ
NODE hourly_log_metrics
SQL >
%
SELECT
year_month_day_hour,
service,
log_level,
status_code_category,
sum(total_requests) AS total_requests,
sum(error_count) AS error_count,
avgMerge(avg_response_time_ms) AS avg_response_time,
quantileMerge(0.95)(p95_response_time_ms) AS p95_response_time,
uniqMerge(unique_users) AS total_unique_users,
uniqMerge(unique_ips) AS total_unique_ips,
-- Calculate error rate
round((sum(error_count) * 100.0) / sum(total_requests), 2) AS error_rate_percent
FROM log_metrics_hourly
WHERE year_month_day >= {{Int32(start_date, 20241201)}}
AND year_month_day <= {{Int32(end_date, 20241231)}}
{% if defined(service_filter) %}
AND service = {{String(service_filter)}}
{% end %}
{% if defined(log_level_filter) %}
AND log_level = {{String(log_level_filter)}}
{% end %}
{% if defined(status_category_filter) %}
AND status_code_category = {{String(status_category_filter)}}
{% end %}
GROUP BY
year_month_day_hour,
service,
log_level,
status_code_category
ORDER BY year_month_day_hour DESC
{% if defined(limit) %}
LIMIT {{Int32(limit, 100)}}
{% end %}
TYPE ENDPOINT
Create a summary API for daily service health. Create service_health_summary.pipe:
TOKEN "service_health_read" READ
NODE daily_service_health
SQL >
%
SELECT
year_month_day,
service,
sum(total_requests) AS daily_requests,
sum(error_count) AS daily_errors,
round((sum(error_count) * 100.0) / sum(total_requests), 2) AS daily_error_rate,
avgMerge(avg_response_time_ms) AS avg_daily_response_time,
quantileMerge(0.95)(p95_response_time_ms) AS p95_daily_response_time,
uniqMerge(unique_users) AS daily_active_users
FROM log_metrics_hourly
WHERE year_month_day >= {{Int32(start_date, 20241201)}}
AND year_month_day <= {{Int32(end_date, 20241231)}}
{% if defined(service_filter) %}
AND service = {{String(service_filter)}}
{% end %}
GROUP BY year_month_day, service
ORDER BY year_month_day DESC, daily_requests DESC
{% if defined(limit) %}
LIMIT {{Int32(limit, 50)}}
{% end %}
TYPE ENDPOINT
Deploy the API endpoints:
tb --cloud deploy
Step 4: Query the log analytics APIs
Your log analytics APIs are now available with efficient numeric date filtering:
# Get hourly metrics for the last week
curl "https://api.tinybird.co/v0/pipes/log_analytics_api.json?start_date=20241201&end_date=20241207" \
-H "Authorization: Bearer $TINYBIRD_TOKEN"
# Analyze errors for a specific service
curl "https://api.tinybird.co/v0/pipes/log_analytics_api.json?start_date=20241201&end_date=20241207&service_filter=user-service&log_level_filter=ERROR" \
-H "Authorization: Bearer $TINYBIRD_TOKEN"
# Get daily service health summary
curl "https://api.tinybird.co/v0/pipes/service_health_summary.json?start_date=20241201&end_date=20241207" \
-H "Authorization: Bearer $TINYBIRD_TOKEN"
# Monitor 5xx server errors across all services
curl "https://api.tinybird.co/v0/pipes/log_analytics_api.json?start_date=20241201&end_date=20241207&status_category_filter=5xx_server_error" \
-H "Authorization: Bearer $TINYBIRD_TOKEN"
The APIs return efficient log analytics with numeric date-based partitioning:
{
"meta": [
{"name": "year_month_day_hour", "type": "UInt64"},
{"name": "service", "type": "String"},
{"name": "log_level", "type": "String"},
{"name": "status_code_category", "type": "String"},
{"name": "total_requests", "type": "UInt64"},
{"name": "error_count", "type": "UInt64"},
{"name": "avg_response_time", "type": "Float32"},
{"name": "p95_response_time", "type": "Float32"},
{"name": "total_unique_users", "type": "UInt64"},
{"name": "total_unique_ips", "type": "UInt64"},
{"name": "error_rate_percent", "type": "Float64"}
],
"data": [
{
"year_month_day_hour": 2024120114,
"service": "user-service",
"log_level": "ERROR",
"status_code_category": "5xx_server_error",
"total_requests": 1,
"error_count": 1,
"avg_response_time": 2500.0,
"p95_response_time": 2500.0,
"total_unique_users": 1,
"total_unique_ips": 1,
"error_rate_percent": 100.0
}
],
"rows": 1,
"statistics": {
"elapsed": 0.002,
"rows_read": 1,
"bytes_read": 128
}
}
This approach provides:
- Efficient time-based partitioning using toYYYYMM() for optimal query performance
- Pre-aggregated metrics via materialized views for sub-millisecond API responses
- Flexible log analysis with service, error level, and status code filtering
- Production monitoring insights including error rates, response times, and user activity
- Scalable architecture that handles millions of log events per second
Your log data becomes available as production-ready APIs optimized for time-series analysis and monitoring dashboards. Start building with Tinybird's free plan to create your own log analytics APIs.
Frequently asked questions about numeric date formats
Why does toYYYYMM return UInt32 instead of UInt16?
Because the values simply don't fit in a UInt16: its maximum is 65,535, and even a current YYYYMM value like 202312 is already about three times larger. UInt32 (maximum 4,294,967,295) comfortably holds both YYYYMM and YYYYMMDD values, while the longer YYYYMMDDhhmmss format needs UInt64.
In other words, 4 bytes is the smallest width that can represent these formats at all, so there's no storage penalty in the choice.
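You can confirm the return types directly:
-- toTypeName shows the exact type each function returns
SELECT
toTypeName(toYYYYMM(now())) AS yyyymm_type,                 -- UInt32
toTypeName(toYYYYMMDD(now())) AS yyyymmdd_type,             -- UInt32
toTypeName(toYYYYMMDDhhmmss(now())) AS yyyymmddhhmmss_type; -- UInt64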
How do null DateTime values behave with numeric conversion functions?
NULL DateTime inputs consistently return NULL numeric outputs across all three conversion functions. This maintains standard SQL NULL semantics and prevents unexpected errors in your queries.
SELECT
toYYYYMM(NULL) AS null_month, -- Returns NULL
toYYYYMMDD(NULL) AS null_date, -- Returns NULL
toYYYYMMDDhhmmss(NULL) AS null_time; -- Returns NULL
Can numeric date columns improve JOIN performance compared to DateTime?
Integer joins are typically faster than DateTime joins because numeric hash tables are more efficient and comparison operations are simpler. However, the performance difference is usually marginal unless you're joining very large tables where every millisecond counts.
The bigger advantage comes from using numeric dates as partition keys or in WHERE clauses rather than in JOIN conditions specifically.
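If you do join on numeric dates, the pattern is an ordinary integer equi-join; here's a sketch assuming a hypothetical date_dim table that stores its key as a UInt32 in YYYYMMDD form:
-- Both join keys are plain UInt32 values, so hashing and comparison stay cheap
SELECT
e.event_date,
d.is_holiday,
count() AS events
FROM events AS e
INNER JOIN date_dim AS d ON e.event_date = d.date_num
GROUP BY e.event_date, d.is_holiday;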
How do time zone changes affect numeric date conversion results?
The toYYYYMM family of functions uses your session's timezone setting when converting DateTime values, unless you pass an explicit timezone as the optional second argument. The same underlying instant can therefore produce different numeric results depending on which timezone does the rendering.
-- The same instant rendered in two timezones lands on different dates
SELECT
toYYYYMMDD(toDateTime('2024-01-01 05:30:00', 'UTC'), 'UTC') AS utc_date,            -- 20240101
toYYYYMMDD(toDateTime('2024-01-01 05:30:00', 'UTC'), 'US/Pacific') AS pacific_date; -- 20231231
Maintain consistent timezone settings across your application to avoid unexpected date shifts, especially around midnight boundaries.
Are numeric date formats compatible with standard BI tools?
Most BI tools can work with integer date columns, though you might need to format them for display purposes. Many tools allow custom formatting rules to convert 20231215 back to 2023-12-15 for user-friendly dashboards.
-- Format for BI tool display
SELECT
event_month,
concat(
substring(toString(event_month), 1, 4),
'-',
substring(toString(event_month), 5, 2)
) AS formatted_month
FROM events;
Next steps with Tinybird
Tinybird eliminates the operational overhead of running ClickHouse while providing access to key performance benefits, including support for numeric date handling. The managed platform handles scaling, backups, and monitoring, allowing users to focus on building analytics features rather than managing database infrastructure.
Get started with Tinybird and publish your first real-time API with optimized date filtering in minutes.