SUSPICIOUS_TYPE_FOR_LOW_CARDINALITY ClickHouse error¶
This error occurs when ClickHouse detects that a column with low cardinality (few unique values) is using a data type that's not optimal for storage and performance. It's a warning about potential inefficiency.
The SUSPICIOUS_TYPE_FOR_LOW_CARDINALITY
error in ClickHouse (and Tinybird) happens when the system detects that a column with low cardinality (few unique values) is using a data type that's not optimal for storage and performance. This is a warning that suggests using more appropriate data types like LowCardinality
or Enum
for better efficiency.
What causes this error¶
You'll typically see it when:
- Using String type for columns with few unique values
- Not using LowCardinality for repetitive string data
- Missing Enum types for categorical data
- Using wide data types for narrow value ranges
- String columns with high repetition
- Missing optimization for categorical columns
- Inefficient storage for status or type fields
This error suggests optimization opportunities. Consider using LowCardinality or Enum types for columns with few unique values.
Example errors¶
Fails: String type for low cardinality
CREATE TABLE events ( id UInt64, event_type String, -- Only has 5-10 unique values status String, -- Only has 3-4 unique values timestamp DateTime ) ENGINE = MergeTree() ORDER BY (timestamp, id) -- Warning: SUSPICIOUS_TYPE_FOR_LOW_CARDINALITY
Fails: Wide type for narrow range
CREATE TABLE users ( id UInt64, country String, -- Only 50-100 unique countries language String, -- Only 10-20 unique languages created_at DateTime ) ENGINE = MergeTree() ORDER BY (created_at, id) -- Warning: SUSPICIOUS_TYPE_FOR_LOW_CARDINALITY
Fails: Missing optimization for categories
CREATE TABLE orders ( id UInt64, order_status String, -- Only: 'pending', 'confirmed', 'shipped', 'delivered' payment_method String, -- Only: 'credit_card', 'paypal', 'bank_transfer' created_at DateTime ) ENGINE = MergeTree() ORDER BY (created_at, id) -- Warning: SUSPICIOUS_TYPE_FOR_LOW_CARDINALITY
How to fix it¶
Use LowCardinality type¶
Convert String columns to LowCardinality for better performance:
Fix: use LowCardinality
CREATE TABLE events ( id UInt64, event_type LowCardinality(String), -- Optimized for few unique values status LowCardinality(String), -- Optimized for few unique values timestamp DateTime ) ENGINE = MergeTree() ORDER BY (timestamp, id)
Use Enum types¶
Use Enum for columns with very few, fixed values:
Fix: use Enum types
CREATE TABLE orders ( id UInt64, order_status Enum('pending' = 1, 'confirmed' = 2, 'shipped' = 3, 'delivered' = 4), payment_method Enum('credit_card' = 1, 'paypal' = 2, 'bank_transfer' = 3), created_at DateTime ) ENGINE = MergeTree() ORDER BY (created_at, id)
Analyze cardinality¶
Check the actual cardinality of your columns:
Analyze column cardinality
SELECT name, type, COUNT(DISTINCT value) as unique_values, COUNT(*) as total_rows FROM ( SELECT 'event_type' as name, event_type as value FROM events UNION ALL SELECT 'status' as name, status as value FROM events ) GROUP BY name, type ORDER BY unique_values
Optimize existing tables¶
Alter existing tables to use better types:
Optimize existing table
-- Convert String columns to LowCardinality ALTER TABLE events MODIFY COLUMN event_type LowCardinality(String); ALTER TABLE events MODIFY COLUMN status LowCardinality(String); -- Or convert to Enum if values are fixed ALTER TABLE orders MODIFY COLUMN order_status Enum('pending' = 1, 'confirmed' = 2, 'shipped' = 3, 'delivered' = 4);
Common patterns and solutions¶
Categorical data optimization¶
Optimize columns with categorical values:
Categorical optimization
CREATE TABLE products ( id UInt64, category LowCardinality(String), -- Product categories brand LowCardinality(String), -- Brand names condition Enum('new' = 1, 'used' = 2, 'refurbished' = 3), price Decimal(10, 2), created_at DateTime ) ENGINE = MergeTree() ORDER BY (category, brand, created_at)
Status and type fields¶
Optimize status and type columns:
Status field optimization
CREATE TABLE transactions ( id UInt64, transaction_type Enum('purchase' = 1, 'refund' = 2, 'exchange' = 3), status Enum('pending' = 1, 'completed' = 2, 'failed' = 3, 'cancelled' = 4), amount Decimal(10, 2), created_at DateTime ) ENGINE = MergeTree() ORDER BY (transaction_type, status, created_at)
Geographic data¶
Optimize geographic columns:
Geographic optimization
CREATE TABLE users ( id UInt64, country LowCardinality(String), -- Country names region LowCardinality(String), -- State/Province city LowCardinality(String), -- City names timezone LowCardinality(String), -- Timezone identifiers created_at DateTime ) ENGINE = MergeTree() ORDER BY (country, region, created_at)
Configuration data¶
Optimize configuration and settings columns:
Configuration optimization
CREATE TABLE user_settings ( user_id UInt64, theme Enum('light' = 1, 'dark' = 2, 'auto' = 3), language LowCardinality(String), -- Language codes currency LowCardinality(String), -- Currency codes notification_level Enum('none' = 1, 'low' = 2, 'medium' = 3, 'high' = 4), updated_at DateTime ) ENGINE = MergeTree() ORDER BY (user_id, updated_at)
Tinybird-specific notes¶
In Tinybird, SUSPICIOUS_TYPE_FOR_LOW_CARDINALITY warnings often occur when:
- Data Sources have inefficient column types
- Pipe transformations don't optimize data types
- Materialized Views use suboptimal schemas
- API endpoints return unoptimized data
To debug in Tinybird:
- Review Data Source schema for optimization opportunities
- Check Pipe transformations for type optimization
- Verify Materialized View column types
- Consider using LowCardinality for categorical data
In Tinybird, use the Data Source preview to analyze column cardinality and optimize types accordingly.
Best practices¶
Type selection¶
- Use LowCardinality(String) for columns with < 10,000 unique values
- Use Enum for columns with < 100 fixed values
- Use String only for truly high-cardinality data
- Consider storage and query performance trade-offs
Schema design¶
- Analyze data patterns before choosing types
- Monitor column cardinality over time
- Use appropriate types for categorical data
- Document type selection rationale
Performance optimization¶
- Benchmark different type choices
- Monitor storage usage and query performance
- Consider compression ratios
- Balance storage efficiency with query flexibility
Configuration options¶
LowCardinality settings¶
LowCardinality configuration
-- Check LowCardinality settings SELECT name, value, description FROM system.settings WHERE name LIKE '%low_cardinality%'
Enum optimization¶
Enum configuration
-- Configure Enum behavior SET allow_experimental_enum_data_type = 1; SET allow_experimental_low_cardinality_type = 1;
Type inference¶
Type inference settings
-- Enable automatic type inference SET allow_experimental_infer_schema = 1; SET allow_experimental_auto_type_inference = 1;
Alternative solutions¶
Use FixedString¶
Use FixedString for fixed-length categorical data:
FixedString optimization
CREATE TABLE products ( id UInt64, sku FixedString(10), -- Fixed-length SKU codes category_code FixedString(3), -- Fixed-length category codes brand_code FixedString(2), -- Fixed-length brand codes price Decimal(10, 2), created_at DateTime ) ENGINE = MergeTree() ORDER BY (category_code, brand_code, created_at)
Implement custom encoding¶
Create custom encoding for categorical data:
Custom encoding
-- Create mapping table for categories CREATE TABLE category_mapping ( id UInt8, name String ) ENGINE = Memory; INSERT INTO category_mapping VALUES (1, 'electronics'), (2, 'clothing'), (3, 'books'), (4, 'home'); -- Use UInt8 in main table CREATE TABLE products ( id UInt64, category_id UInt8, -- Reference to category_mapping name String, price Decimal(10, 2) ) ENGINE = MergeTree() ORDER BY (category_id, created_at)
Use Array types¶
Use Array types for multiple categorical values:
Array optimization
CREATE TABLE users ( id UInt64, interests Array(LowCardinality(String)), -- User interests skills Array(LowCardinality(String)), -- User skills created_at DateTime ) ENGINE = MergeTree() ORDER BY (created_at, id)
Monitoring and prevention¶
Cardinality monitoring¶
Cardinality tracking
-- Monitor column cardinality over time SELECT table, column, COUNT(DISTINCT value) as unique_values, COUNT(*) as total_rows, unique_values / total_rows as uniqueness_ratio FROM ( SELECT 'events' as table, 'event_type' as column, event_type as value FROM events UNION ALL SELECT 'events' as table, 'status' as column, status as value FROM events ) GROUP BY table, column ORDER BY uniqueness_ratio
Storage optimization¶
Storage monitoring
-- Monitor storage usage by column type SELECT table, column, type, COUNT(*) as row_count, SUM(bytes_on_disk) as storage_bytes FROM system.columns WHERE database = currentDatabase() GROUP BY table, column, type ORDER BY storage_bytes DESC
Performance monitoring¶
Performance tracking
-- Track query performance by column type -- Example pseudo-code: -- -- def track_column_performance(table, column, query_type, execution_time): -- logger.info(f"Column {column} in table {table} - {query_type}: {execution_time}s") -- -- # Track performance metrics -- increment_counter('column_query_performance', { -- 'table': table, -- 'column': column, -- 'query_type': query_type, -- 'execution_time': execution_time -- })