TOO_MANY_PARTS ClickHouse error

This error occurs when a ClickHouse table has too many parts, which can cause performance issues and system instability. It's common with high-frequency inserts or improper partitioning strategies.

ClickHouse stores table data as immutable parts that background merges continually combine into larger ones. When inserts create parts faster than merges can absorb them and the number of active parts in a partition crosses the parts_to_throw_insert threshold (300 by default on many versions), further inserts fail with TOO_MANY_PARTS. This typically happens with high-frequency single-row inserts, an overly granular partition key, or background merges starved of resources.

What causes this error

You'll typically see it when:

  • High-frequency INSERT operations, each creating a new small part
  • Too many concurrent insert operations writing to the same table
  • Partitioning by a high-cardinality column, which multiplies the part count
  • Background merges that are too slow or starved of CPU, memory, or disk I/O
  • MergeTree engine settings that throttle or limit merging

Every query must touch each active part it scans, so a high part count degrades read performance even before inserts start failing. Implement proper partitioning and monitor merge operations.
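
The limits that trigger insert throttling and then the error are MergeTree-level settings. As a quick sketch (the setting names are standard; defaults vary by ClickHouse version), you can inspect the thresholds in effect:

Check part-count thresholds
SELECT
    name,
    value,
    description
FROM system.merge_tree_settings
WHERE name IN ('parts_to_delay_insert', 'parts_to_throw_insert', 'max_parts_in_total')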

Example errors

Fails: the target partition already holds too many active parts
INSERT INTO events (user_id, event_type, timestamp) VALUES
(123, 'click', '2024-01-01 10:00:00')
-- Error: TOO_MANY_PARTS (rejected until merges reduce the part count)
Fails: high-frequency inserts
-- Each INSERT creates at least one new part; rapid small inserts
-- accumulate parts faster than background merges can combine them
INSERT INTO metrics (timestamp, value) VALUES
('2024-01-01 10:00:00', 100),
('2024-01-01 10:00:01', 101),
('2024-01-01 10:00:02', 102)
-- Error: TOO_MANY_PARTS
Fails: improper partitioning
-- Partitioning by high-cardinality column
CREATE TABLE events (
    id UInt64,
    user_id UInt64,
    event_type String,
    timestamp DateTime
) ENGINE = MergeTree()
PARTITION BY user_id  -- One partition per user: far too many partitions
ORDER BY (timestamp, id)

How to fix it

Check current parts

Monitor the number of parts in your table:

Check table parts
SELECT
    database,
    table,
    partition,
    name,
    active,
    rows,
    bytes_on_disk
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'your_table_name'
ORDER BY partition, name

Monitor merge operations

Background merges don't appear in system.processes, which only lists client queries. Check running merges in the system.merges table:

Check merge operations
SELECT
    database,
    table,
    elapsed,
    progress,
    num_parts,
    result_part_name,
    memory_usage
FROM system.merges
WHERE database = currentDatabase()

Optimize table

Force a merge of existing parts. Note that OPTIMIZE ... FINAL rewrites the data it touches and is expensive on large tables, so treat it as remediation, not a routine job:

Optimize table
-- Optimize table to merge parts
OPTIMIZE TABLE your_table_name FINAL;

-- Or optimize a specific partition; with toYYYYMM partitioning,
-- partition IDs look like 202401
OPTIMIZE TABLE your_table_name PARTITION ID '202401' FINAL;

Check partitioning strategy

Review and improve your partitioning strategy:

Improve partitioning
-- Instead of partitioning by user_id (high cardinality)
-- Use time-based partitioning
CREATE TABLE events (
    id UInt64,
    user_id UInt64,
    event_type String,
    timestamp DateTime
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)  -- Monthly partitions
ORDER BY (timestamp, user_id)

Common patterns and solutions

Time-based partitioning

Use time-based partitioning for time-series data:

Time-based partitioning
CREATE TABLE events (
    id UInt64,
    user_id UInt64,
    event_type String,
    timestamp DateTime
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)  -- Monthly partitions
ORDER BY (timestamp, user_id)
SETTINGS
    merge_with_ttl_timeout = 86400,  -- Re-run TTL merges at most daily (only applies if the table has a TTL)
    merge_max_block_size = 8192      -- Rows per block read during merges

Batch inserts

Each INSERT statement creates at least one new part, so batch many rows into a single INSERT to reduce part creation:

Batch inserts
-- Instead of single inserts, use batch inserts
INSERT INTO events (user_id, event_type, timestamp) VALUES
(123, 'click', '2024-01-01 10:00:00'),
(124, 'view', '2024-01-01 10:00:01'),
(125, 'purchase', '2024-01-01 10:00:02'),
(126, 'click', '2024-01-01 10:00:03')
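
If the client can't batch, recent ClickHouse versions can buffer small inserts server-side with asynchronous inserts. A minimal sketch using the standard async_insert settings (check availability on your version, and note that Tinybird manages ingestion through its own APIs):

Async inserts
-- The server buffers small inserts and flushes them as larger parts
INSERT INTO events (user_id, event_type, timestamp)
SETTINGS async_insert = 1, wait_for_async_insert = 1
VALUES (123, 'click', '2024-01-01 10:00:00')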

Merge settings optimization

Tune the table's part-count thresholds. These are MergeTree settings, so they are changed with MODIFY SETTING rather than a SETTINGS clause:

Merge settings
ALTER TABLE your_table_name
MODIFY SETTING
    parts_to_delay_insert = 150,        -- Throttle inserts once a partition has this many active parts
    parts_to_throw_insert = 300,        -- Reject inserts with TOO_MANY_PARTS beyond this count
    min_bytes_for_wide_part = 10485760  -- Keep small parts in the compact format

Background merge configuration

The settings that govern background merging are also per-table MergeTree settings, not session settings, so apply them with ALTER TABLE:

Background merge
ALTER TABLE your_table_name
MODIFY SETTING
    old_parts_lifetime = 480,  -- Seconds to keep replaced parts on disk after a merge
    max_bytes_to_merge_at_max_space_in_pool = 161061273600  -- Largest part a merge may produce (150 GiB)

Tinybird-specific notes

In Tinybird, TOO_MANY_PARTS errors often occur when:

  • Data Sources receive high-frequency, small ingestions
  • Copy Pipes write their output in many small batches
  • Materialized Views accumulate too many parts
  • Background merge operations can't keep up

To debug in Tinybird:

  1. Check Data Source insertion patterns
  2. Review Pipe transformation frequency
  3. Monitor Materialized View part counts
  4. Consider batching data operations

In Tinybird, use the Data Source monitoring to track insertion patterns and optimize accordingly.
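
As a sketch, assuming access to the tinybird.datasources_ops_log service Data Source (column names may vary by workspace), you can spot Data Sources ingesting in many small, frequent operations:

Ingestion patterns
SELECT
    toStartOfMinute(timestamp) AS minute,
    datasource_name,
    count() AS operations,
    sum(rows) AS rows_ingested
FROM tinybird.datasources_ops_log
WHERE timestamp > now() - INTERVAL 1 HOUR
GROUP BY minute, datasource_name
ORDER BY minute DESC, operations DESC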

Best practices

Partitioning strategy

  • Use time-based partitioning for time-series data
  • Avoid partitioning by high-cardinality columns
  • Keep partition sizes reasonable
  • Monitor partition growth over time (see the query sketch below)
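
Before settling on a partition key, you can estimate how many partitions each candidate would produce. A sketch against the events table from the examples above; a good key yields tens to hundreds of partitions, not thousands:

Partition key cardinality check
SELECT
    uniq(toYYYYMM(timestamp)) AS partitions_by_month,
    uniq(user_id) AS partitions_by_user_id
FROM events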

Insert optimization

  • Use batch inserts instead of many single-row inserts
  • Space inserts out at appropriate intervals
  • Monitor insertion rates (see the query sketch after this list)
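
The query log records every completed INSERT, so you can measure the actual insert rate. A sketch (system.query_log is enabled by default):

Insert rate check
SELECT
    toStartOfMinute(event_time) AS minute,
    count() AS insert_queries,
    sum(written_rows) AS rows_written
FROM system.query_log
WHERE query_kind = 'Insert'
  AND type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute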

Merge optimization

  • Configure appropriate merge settings per table
  • Monitor merge operation performance (see the metrics query below)
  • Ensure sufficient CPU, memory, and disk I/O for background merges
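
Current merge activity also shows up in the server's runtime metrics. A quick sketch:

Merge metrics
SELECT
    metric,
    value
FROM system.metrics
WHERE metric ILIKE '%merge%'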

Configuration options

Merge settings

Merge configuration
-- Merge-related settings live in system.merge_tree_settings,
-- not system.settings
SELECT
    name,
    value,
    description
FROM system.merge_tree_settings
WHERE name LIKE '%merge%'

Table settings

Table settings
-- Configure table-specific settings (MODIFY SETTING, not SETTINGS)
ALTER TABLE your_table_name
MODIFY SETTING
    merge_with_ttl_timeout = 3600,  -- Re-run TTL merges at most hourly
    merge_max_block_size = 8192     -- Rows per block read during merges

System settings

System settings
-- Session-level settings that influence how many parts inserts create
SET max_insert_block_size = 1048576;       -- Rows per block the server forms on INSERT
SET min_insert_block_size_rows = 1048576;  -- Squash smaller insert blocks together
SET async_insert = 1;                      -- Buffer small inserts server-side (recent versions)

Alternative solutions

Use ReplacingMergeTree

If high-frequency inserts re-send the same rows, ReplacingMergeTree deduplicates them at merge time instead of accumulating copies:

ReplacingMergeTree
CREATE TABLE events (
    id UInt64,
    user_id UInt64,
    event_type String,
    timestamp DateTime,
    version UInt32
) ENGINE = ReplacingMergeTree(version)
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, user_id, id)

Implement TTL

Use TTL to automatically remove old parts:

TTL implementation
CREATE TABLE events (
    id UInt64,
    user_id UInt64,
    event_type String,
    timestamp DateTime
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (timestamp, user_id)
TTL timestamp + INTERVAL 1 YEAR  -- Remove data older than 1 year

Use AggregatingMergeTree

Use AggregatingMergeTree for pre-aggregated data:

AggregatingMergeTree
CREATE TABLE event_summary (
    date Date,
    user_id UInt64,
    event_count AggregateFunction(count),
    last_event SimpleAggregateFunction(max, DateTime)  -- plain columns aren't aggregated on merge
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (date, user_id)

Monitoring and prevention

Parts monitoring

Parts tracking
-- Monitor parts count over time
SELECT
    database,
    table,
    partition,
    COUNT(*) as part_count,
    SUM(rows) as total_rows,
    SUM(bytes_on_disk) as total_bytes
FROM system.parts
WHERE database = currentDatabase()
  AND active = 1
GROUP BY database, table, partition
ORDER BY part_count DESC

Merge performance monitoring

Merge performance
-- Completed merges and their durations from the part log
-- (enabled by default in recent ClickHouse versions)
SELECT
    event_time,
    table,
    part_name,
    rows,
    round(duration_ms / 1000, 1) AS duration_s
FROM system.part_log
WHERE event_type = 'MergeParts'
  AND event_date >= today() - 1
ORDER BY event_time DESC
LIMIT 20

Table health monitoring

Table health
-- Monitor table health metrics
SELECT
    database,
    table,
    COUNT(*) as total_parts,
    countIf(active) AS active_parts,
    SUM(rows) as total_rows,
    SUM(bytes_on_disk) as total_bytes
FROM system.parts
WHERE database = currentDatabase()
GROUP BY database, table
ORDER BY total_parts DESC
