Dynamic Partitioning in ClickHouse®: Step-by-Step Guide
Dynamic partitioning in ClickHouse simplifies data management by automatically organizing data into partitions based on a specified key, such as time or category. This approach eliminates manual partition management, making it ideal for handling large or unpredictable data streams. Here are the key takeaways:
What It Does: Automatically segments data into partitions using the PARTITION BY clause.
Why It Matters: Simplifies handling streaming data, supports efficient retention policies, and enables better storage management.
How It Works: ClickHouse creates partitions dynamically during data insertion, leveraging MinMax indexes for faster data filtering and partition pruning.
If you're managing streaming data or large datasets, dynamic partitioning can help improve efficiency, reduce manual effort, and streamline data storage and retention. Below, we’ll explore how to set it up, manage it, and optimize its performance.
Setting Up Dynamic Partitioning in ClickHouse®
Requirements Before You Start
Before diving into dynamic partitioning, make sure you're using a recent version of ClickHouse that supports the MergeTree engine family. You'll also need database permissions to create tables and insert data. A basic understanding of SQL and experience with CREATE TABLE statements will be helpful.
Get familiar with ClickHouse's date manipulation functions like toYYYYMM(), toDate(), and toWeek(). Dynamic partitioning is particularly useful when your queries often filter by time ranges or specific categories, enabling partition pruning to improve query performance.
Creating a Table with Dynamic Partitions
Dynamic partitioning relies on the PARTITION BY clause in your table definition. This clause uses a SQL expression to determine how data is split into partitions. ClickHouse evaluates this expression for each row and automatically creates the necessary partitions.
Here’s an example of setting up a table for transaction data with monthly partitioning:
CREATE TABLE transactions (
timestamp Date,
transaction_id String,
amount Float32,
transaction_type_id UInt32
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, transaction_type_id);
In this case, the toYYYYMM(timestamp) function generates partitions in a "YYYYMM" format, such as "202507" for July 2025. According to the ClickHouse documentation:
"In most cases you do not need a partition key, and in most other cases you do not need a partition key more granular than by months." [6]
For more complex use cases, you can use tuple expressions to partition by multiple criteria. For example, if you're tracking events and need to partition by event type and week:
PARTITION BY (EventType, toWeek(EventTime))
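As a fuller sketch of what that might look like (the table name and the columns beyond the fragment above are illustrative assumptions):

-- hypothetical events table partitioned by type and calendar week
CREATE TABLE events (
    EventTime DateTime,
    EventType LowCardinality(String),
    payload String
) ENGINE = MergeTree()
PARTITION BY (EventType, toWeek(EventTime))  -- one partition per (type, week) pair
ORDER BY (EventType, EventTime);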
Be cautious when choosing partition keys. They should have low cardinality to avoid creating too many partitions. As ClickHouse advises:
"You should never use too granular of partitioning. Don't partition your data by client identifiers or names. Instead, make a client identifier or name the first column in the ORDER BY expression." [6]
Once your table is configured, the next step is ensuring smooth data ingestion and management.
Adding and Managing Streaming Data
With your partitioned table ready, focus on efficiently managing incoming data. ClickHouse automatically creates partitions as new data is inserted.
For best performance, use batched inserts of 1,000 to 100,000 rows. If you're working with streaming workloads, enable asynchronous inserts by setting async_insert=1.
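As a minimal sketch, assuming the transactions table defined below, the setting can be passed directly on the statement (pairing it with wait_for_async_insert=1 keeps per-insert acknowledgments):

-- hypothetical streaming insert; the settings apply to this statement only
INSERT INTO transactions
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES ('2025-07-20', 'txn_010', 42.00, 3);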
Here’s an example of inserting data that automatically creates partitions:
INSERT INTO transactions VALUES
('2025-07-15', 'txn_001', 125.50, 1),
('2025-07-16', 'txn_002', 89.99, 2),
('2025-08-01', 'txn_003', 200.00, 1);
This insert creates two partitions: one for July 2025 (202507) and another for August 2025 (202508). The system handles partition creation automatically based on the toYYYYMM(timestamp) expression defined in the partition key.
For production environments, consider using optimal codecs to compress data efficiently based on your usage patterns. Regularly monitor your partition creation to ensure it aligns with your expectations. Use the EXPLAIN command to analyze how ClickHouse executes queries on your partitioned tables. This will help you confirm whether partition pruning is working as intended and identify potential performance issues [5].
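For instance, a sketch of checking pruning on the transactions table above (indexes = 1 includes partition and MinMax index details in the plan output):

-- verify that only the July 2025 partition is read
EXPLAIN indexes = 1
SELECT count()
FROM transactions
WHERE timestamp >= '2025-07-01' AND timestamp < '2025-08-01';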
With dynamic partitioning in place, managing and ingesting data becomes far more streamlined.
Managing Dynamic Partitions at Scale
Partition Merging and Optimization
ClickHouse® is designed to automatically merge smaller data parts within partitions, ensuring efficient query performance. However, it's important to understand how this process works to manage partitions effectively. According to the documentation:
"With partitioning enabled, ClickHouse only merges data parts within, but not across partitions." [3][1]
Dynamic partitioning simplifies storage management for streaming data by continuously consolidating smaller parts into larger ones. This reduces fragmentation and enhances query speed. However, keeping an eye on these automatic merges is critical to avoid performance bottlenecks.
In production environments, monitor part accumulation carefully. If too many parts remain unmerged, performance degrades and the system can raise errors. The infamous "Too many parts" error, for instance, occurs when parts accumulate faster than they can be merged, potentially halting data ingestion altogether.
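To catch this early, one sketch of a part-count check against the system.parts table:

-- partitions with the most active (unmerged) parts first
SELECT table, partition, count() AS active_parts
FROM system.parts
WHERE active AND database = currentDatabase()
GROUP BY table, partition
ORDER BY active_parts DESC
LIMIT 10;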
For manual intervention, use the OPTIMIZE TABLE command sparingly. While it might seem like a good idea to force merges during low-traffic times, avoid using OPTIMIZE FINAL unless absolutely necessary; it's extremely resource-intensive. Let the system handle merges automatically and only step in when you notice performance issues tied to excessive part buildup.
To maintain efficiency, ensure your partition key has fewer than 1,000 distinct values. Higher cardinality can overwhelm system resources and slow down merging operations. If your workload involves higher cardinality, rethink your partitioning strategy to group related data more effectively.
Using TTL for Partition Management
Once partitions are optimized, managing their lifecycle with Time-to-Live (TTL) settings becomes essential. TTL automates data retention, storage efficiency, and compliance, making it a perfect fit for streaming workloads. By integrating TTL, you can handle data cleanup seamlessly alongside the automated merging process.
During merges, TTL automatically deletes expired data. The system checks TTL rules based on a schedule, with the default merge_with_ttl_timeout setting triggering these events every 4 hours (14,400 seconds) [9].
For better efficiency, enable the ttl_only_drop_parts=1 setting. This allows ClickHouse to drop entire parts when all of their rows have expired, avoiding the need to process rows individually. As the documentation advises:
"We recommend always using the setting ttl_only_drop_parts=1. When this setting is enabled, ClickHouse drops a whole part when all rows in it are expired." [8]
Align TTL values with your partition intervals, such as daily or monthly partitions, so that entire partitions can be removed efficiently. This approach saves resources compared to row-by-row deletion.
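Here's a sketch of what that can look like, reusing the transactions schema from earlier with an assumed 12-month retention window:

CREATE TABLE transactions_with_ttl (
    timestamp Date,
    transaction_id String,
    amount Float32,
    transaction_type_id UInt32
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)           -- monthly partitions
ORDER BY (timestamp, transaction_type_id)
TTL timestamp + INTERVAL 12 MONTH          -- rows expire after a year
SETTINGS ttl_only_drop_parts = 1;          -- drop whole parts instead of rewriting rows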
TTL also integrates well with tiered storage setups, where data transitions between hot, warm, and cold storage based on its age. This helps balance costs while keeping recent data readily accessible. Configure storage volumes accordingly and set TTL rules to automate data migration as it ages.
If you need to force TTL evaluation outside of its normal schedule, use the ALTER TABLE my_table MATERIALIZE TTL command [8].
Partition Pruning for Faster Queries
After addressing storage optimization with merging and TTL, the next step is improving query performance through partition pruning. This feature allows ClickHouse to skip irrelevant data during query execution, significantly boosting performance when partitions are designed well. MinMax indexes on partition columns are key to this process.
ClickHouse builds MinMax indexes that store the minimum and maximum partition-key values for each partition as data is written. When you filter on your partition key, the query optimizer uses these indexes to skip partitions that don't contain relevant data [3].
To maximize these benefits, structure your WHERE clauses to align with your partition keys. For instance, if your table is partitioned by month, a query filtering on a specific date range will only scan the relevant partitions, speeding up execution.
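For instance, with the monthly-partitioned transactions table from earlier, a filter that lines up with whole months lets ClickHouse read only the 202507 partition:

-- the date range maps exactly onto the July 2025 partition
SELECT count(), sum(amount)
FROM transactions
WHERE timestamp >= '2025-07-01' AND timestamp < '2025-08-01';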
That said, partitioning is primarily a data management tool, not a catch-all solution for query optimization. As the documentation points out:
"Partitioning is primarily a data management technique and not a query optimization tool, and while it can improve performance in specific workloads, it should not be the first mechanism used to accelerate queries." [3]
Avoid running queries across all partitions whenever possible, as this can perform worse than querying a non-partitioned table. Instead, design queries to target specific partitions, using proper ORDER BY clauses to further enhance performance.
To ensure partition pruning is working as expected, use the EXPLAIN command to analyze your query execution plans. Check that ClickHouse is skipping irrelevant partitions, and adjust your partitioning strategy if the results aren't meeting your expectations.
Tinybird vs. Self-Managed ClickHouse®: Choosing the Right Solution
When deciding between Tinybird and self-managed ClickHouse®, the choice often boils down to balancing operational simplicity against the need for control. Below, we'll break down the key differences to help you make an informed decision.
Feature Comparison Table
| Feature | Tinybird | Self-Managed ClickHouse® |
| --- | --- | --- |
| Setup Time | Minutes with preconfigured clusters | Days to weeks for proper configuration |
| Partition Management | Automated with optimized defaults | Manual configuration and monitoring |
| Database Upgrades | Automatic with zero downtime | Requires manual planning and execution |
| API Generation | One-click REST APIs from SQL queries | Custom API development needed |
| Scaling | Automatic compute scaling | Manual resource provisioning |
| Monitoring & Observability | Built-in dashboards and alerts | Requires third-party tool integration |
| Multi-tenancy Support | Pre-optimized for SaaS architectures | Custom implementation needed |
| Cost Structure | Usage-based: $0.34/GB storage and $0.07/GB processed | Infrastructure plus development costs |
| Control Level | Higher abstraction with less direct control | Full access to ClickHouse® internals |
When to Choose Tinybird
Tinybird is ideal if you want to streamline development while maintaining top-tier performance for analytics. It simplifies partition management and allows you to focus on building features rather than worrying about infrastructure. For real-time analytics APIs with low latency and high concurrency, Tinybird is a strong contender. Its Events API, for instance, can handle high-throughput streaming with ease.
Cost-wise, Tinybird often benefits small to medium-scale operations. For example, 93% of non-enterprise customers on paid plans spend less than $100/month, and the median workspace costs under $10/month[10]. Tinybird also eliminates the need for integrating multiple tools by offering built-in services for data ingestion, API hosting, monitoring, and security. Its compliance with SOC2 Type II and HIPAA standards ensures a production-ready analytics system without extra development effort.
When to Choose Self-Managed ClickHouse®
Self-managed ClickHouse® is the way to go if you need complete control over your database configuration and have the expertise to make the most of its advanced features. This option works well for teams that want to customize partition keys, merge policies, or TTL configurations beyond what managed services typically offer. While infrastructure costs may seem lower upfront, don't overlook the additional time and resources required for setup, maintenance, and scaling.
This approach is especially suited for custom security requirements, specific data residency needs, or unique integrations that demand granular management. Internal analytics systems, where cost efficiency matters more than ease of use, also benefit from this setup. With full access to ClickHouse® internals, teams can fine-tune performance through advanced partition pruning, custom storage policies, and other optimizations.
Best Practices for Dynamic Partitioning in ClickHouse®
Building on the basics of setup and management, these best practices aim to fine-tune dynamic partitioning for better performance. Getting it right requires careful planning and ongoing monitoring. The way you handle partition keys, tracking, and maintenance will directly influence your system's efficiency and scalability. Let’s break it down.
Choosing the Right Partition Key
The partition key is the backbone of your partitioning strategy, and picking the right one is crucial. Partitioning is more about managing data than speeding up queries [3]. A good starting point is to aim for low cardinality - ideally, your partition key should have fewer than 100–1,000 unique values [3]. Likewise, the total number of partitions should stay in the range of dozens to hundreds, not thousands [4][11]. For optimal performance, a single SELECT query should only touch a few dozen partitions, and INSERT operations should target just one or a handful [4].
For time-based data, monthly partitioning is a solid default choice, especially for streaming scenarios. If your monthly data exceeds 300 GB, consider switching to weekly partitions [11].
For non-time-based data, think carefully about how your data is accessed. For example, if you have a large users table (over 10 GB) and queries often filter by user_id, you could use xxHash64(user_id) % 10 to evenly distribute data across 10 partitions [11]. For sequential numbers, dividing by a large constant can group records effectively [11].
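A sketch of such a users table (columns other than user_id are illustrative assumptions):

CREATE TABLE users (
    user_id UInt64,
    signup_date Date,
    country LowCardinality(String)
) ENGINE = MergeTree()
PARTITION BY xxHash64(user_id) % 10  -- spreads users evenly across 10 partitions
ORDER BY user_id;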
Once your partition key is set, the focus shifts to keeping everything running smoothly through monitoring and scaling.
Monitoring and Scaling Partitions
Monitoring is key to catching small issues before they balloon into big problems. ClickHouse® offers built-in tools for this. Use the system.parts table to track partition metrics, and query the _partition_value virtual column to see all unique partitions [1]. Keep an eye on partition sizes to ensure they remain within optimal limits.
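As a sketch against the transactions table, the first query below summarizes size and part counts per partition, and the second lists the distinct partition values via the virtual column:

-- size and active part count per partition
SELECT partition,
       formatReadableSize(sum(bytes_on_disk)) AS size_on_disk,
       count() AS active_parts
FROM system.parts
WHERE active AND table = 'transactions'
GROUP BY partition
ORDER BY partition;

-- all unique partition values for the table
SELECT DISTINCT _partition_value FROM transactions;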
MinMax indexes, which ClickHouse® creates automatically when partitioning is enabled, are another important tool. These indexes store the minimum and maximum values of the columns in your partition key and help with query optimization and partition pruning. Make sure to monitor their performance alongside partition growth [1].
As your data grows, you may need to scale horizontally by sharding. Plan your sharding strategy early, and use hash functions in your PARTITION BY clause to distribute data evenly [6]. Don't forget to configure replication in distributed setups to ensure data availability and support read scaling.
Pay attention to query patterns. If queries consistently hit too many partitions, it might be time to revisit your partition key. Similarly, if partitions grow beyond 300 GB, consider a key with higher cardinality [4]. Regularly evaluating these metrics will help you maintain efficiency as your data scales.
Updating Partitioning Strategies Over Time
Partitioning isn’t a "set-it-and-forget-it" solution. As your data grows or access patterns evolve, your strategy should adapt too. Regularly review and adjust your setup to keep it aligned with your current needs. Watch for signs like oversized partitions, queries that span too many partitions, or uneven data distribution.
When making schema changes across partitions, proceed carefully. In some cases, it's better to create a new table or use ALTER TABLE ... ATTACH PARTITION to move data [13][14]. This method can also be used to add data to existing tables [12].
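For example, a sketch of copying one month of the transactions data into a hypothetical replacement table with the same structure and partition key (MOVE PARTITION ... TO TABLE is the variant that also removes it from the source):

-- copies the July 2025 partition from transactions into transactions_v2
ALTER TABLE transactions_v2 ATTACH PARTITION 202507 FROM transactions;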
Whenever possible, automate partition management. For instance, you can use custom scripts to create partitions dynamically, especially for time-based or identifier-based partitioning [13]. Before rolling out any new scheme, test it on a subset of your data to avoid unexpected issues.
Finally, document everything. Keep records of why certain partitioning decisions were made, what results you expected, and any lessons learned. This helps with future updates and ensures your team can easily understand the current setup.
Summary and Implementation Checklist
Key Takeaways
Dynamic partitioning in ClickHouse® reshapes how streaming data is managed by dividing tables into logical groups based on criteria like time ranges or categories. When done right, this method enhances both data management and query performance.
Partitioning is primarily a tool for efficient data management. It allows independent handling of partitions, making tasks like data deletion, TTL policies, and tiered storage simpler. While better query performance is a bonus, it’s not the main goal.
The key to success lies in selecting the right partition key. Aim for low cardinality, roughly 100–1,000 distinct values [1], with dozens to hundreds of partitions rather than thousands [4]. For most streaming use cases, monthly partitioning works well, keeping partition sizes in the ideal range of 1–300 GB [4].
Monitoring and maintenance are just as important. Regular checks using system tables like system.parts help address issues early. Proper TTL configuration automates the data lifecycle, and as your data grows or access patterns shift, you may need to tweak your partitioning strategy.
Step-by-Step Implementation Checklist
Here’s a practical checklist to guide you through the setup and maintenance of dynamic partitioning in ClickHouse®. Following these steps will help you avoid common pitfalls like excessive part accumulation or slow queries.
Planning and Setup:
Choose a low-cardinality partition key (under 1,000 distinct values) that aligns with frequent query filters. Target partition sizes between 1–300 GB [1][4].
Design your table schema using the PARTITION BY clause to define the partitioning logic [2].
Ensure the partition key supports your data lifecycle and retention requirements.
Implementation Steps:
Create your table with the PARTITION BY clause, using your chosen partitioning expression [1].
Configure TTL settings, including ttl_only_drop_parts=1, to manage data lifecycles efficiently [2].
Set up streaming data ingestion to focus on one or a few partitions per INSERT operation [4].
Test your setup with sample data to confirm partition creation and ensure queries perform as expected.
Monitoring and Maintenance:
Use system.parts to monitor partition sizes and the number of active parts [2].
Query the _partition_value virtual column to list all unique partitions [1].
Run OPTIMIZE TABLE sparingly, during off-peak times, to merge smaller parts [7].
Use EXPLAIN to review query execution plans and identify bottlenecks [7].
Set up alerts to track partition size growth and detect query performance issues.
Keep detailed documentation of your partitioning strategy and configuration changes.
Ongoing Optimization:
Regularly evaluate how partitions are being accessed. Adjust your strategy as your data grows or usage patterns evolve.
When partitions near their size limits, consider sharding to scale horizontally.
Update TTL policies to reflect any changes in data retention needs.
Test schema changes in a staging environment before rolling them out to production.
FAQs
What are the benefits of using dynamic partitioning in ClickHouse® for managing streaming data?
Dynamic partitioning in ClickHouse® takes the hassle out of managing streaming data by automatically breaking it into logical segments based on predefined keys. This removes the burden of manual partitioning, allowing the system to seamlessly adapt to real-time data streams.
With this approach, data organization becomes far more efficient, leading to noticeable improvements in query performance. The system can quickly pinpoint and process the data you need. On top of that, it boosts scalability and optimizes resource usage, making it a powerful choice for managing large, high-speed datasets.
How do I choose the right partition key for better performance and scalability in ClickHouse®?
When working with ClickHouse®, picking the right partition key is crucial for efficient data management and query performance. A good rule of thumb is to go for a low cardinality key. This means selecting a key that keeps data evenly distributed across partitions, avoiding unnecessary fragmentation. Why? Because uneven or overly fragmented partitions can slow down your queries.
Steer clear of overly specific keys, like customer IDs, as they tend to create partitions that are too granular. Instead, focus on a key that matches your query patterns and strikes a balance between partition size and query speed. A common choice for time-series data, for instance, is a date-based key, which often provides a good mix of simplicity and efficiency.
Getting your partitioning strategy right is essential, especially as your dataset scales up. It ensures your system remains manageable and performs well even as data volumes increase.
How do I monitor and optimize partition merging to avoid performance issues in a ClickHouse® setup?
When working with ClickHouse®, keeping an eye on partition merging is essential for maintaining system performance. Start by checking the system.merges table, which provides details on ongoing merge operations. This can help you understand how these processes might affect your system's performance. For a deeper dive into partition activity and to spot inefficiencies, explore the system.part_log table.
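A quick sketch of such a check:

-- currently running merges and how far along they are
SELECT database, table, elapsed, progress, num_parts
FROM system.merges;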
If you notice areas for improvement, tweak MergeTree settings such as max_bytes_to_merge_at_max_space_in_pool and merge_with_ttl_timeout to align merging behavior with your workload's needs. It's also a good idea to routinely review merge levels and partition sizes. Doing so helps ensure smooth operations and avoids performance bottlenecks.