SUSPICIOUS_TYPE_FOR_LOW_CARDINALITY ClickHouse error¶
This error occurs when ClickHouse detects that a column with low cardinality (few unique values) is using a data type that's not optimal for storage and performance. It's a warning about potential inefficiency.
The SUSPICIOUS_TYPE_FOR_LOW_CARDINALITY error in ClickHouse (and Tinybird) happens when the system detects that a column with low cardinality (few unique values) is using a data type that's not optimal for storage and performance. This is a warning that suggests using more appropriate data types like LowCardinality or Enum for better efficiency.
What causes this error¶
You'll typically see it when:
- Using String type for columns with few unique values
- Not using LowCardinality for repetitive string data
- Missing Enum types for categorical data
- Using wide data types for narrow value ranges
- String columns with high repetition
- Missing optimization for categorical columns
- Inefficient storage for status or type fields
This error suggests optimization opportunities. Consider using LowCardinality or Enum types for columns with few unique values.
Example errors¶
Fails: String type for low cardinality
CREATE TABLE events (
id UInt64,
event_type String, -- Only has 5-10 unique values
status String, -- Only has 3-4 unique values
timestamp DateTime
) ENGINE = MergeTree()
ORDER BY (timestamp, id)
-- Warning: SUSPICIOUS_TYPE_FOR_LOW_CARDINALITY
Fails: Wide type for narrow range
CREATE TABLE users (
id UInt64,
country String, -- Only 50-100 unique countries
language String, -- Only 10-20 unique languages
created_at DateTime
) ENGINE = MergeTree()
ORDER BY (created_at, id)
-- Warning: SUSPICIOUS_TYPE_FOR_LOW_CARDINALITY
Fails: Missing optimization for categories
CREATE TABLE orders (
id UInt64,
order_status String, -- Only: 'pending', 'confirmed', 'shipped', 'delivered'
payment_method String, -- Only: 'credit_card', 'paypal', 'bank_transfer'
created_at DateTime
) ENGINE = MergeTree()
ORDER BY (created_at, id)
-- Warning: SUSPICIOUS_TYPE_FOR_LOW_CARDINALITY
How to fix it¶
Use LowCardinality type¶
Convert String columns to LowCardinality for better performance:
Fix: use LowCardinality
CREATE TABLE events (
id UInt64,
event_type LowCardinality(String), -- Optimized for few unique values
status LowCardinality(String), -- Optimized for few unique values
timestamp DateTime
) ENGINE = MergeTree()
ORDER BY (timestamp, id)
Use Enum types¶
Use Enum for columns with very few, fixed values:
Fix: use Enum types
CREATE TABLE orders (
id UInt64,
order_status Enum('pending' = 1, 'confirmed' = 2, 'shipped' = 3, 'delivered' = 4),
payment_method Enum('credit_card' = 1, 'paypal' = 2, 'bank_transfer' = 3),
created_at DateTime
) ENGINE = MergeTree()
ORDER BY (created_at, id)
Analyze cardinality¶
Check the actual cardinality of your columns:
Analyze column cardinality
SELECT
name,
type,
COUNT(DISTINCT value) as unique_values,
COUNT(*) as total_rows
FROM (
SELECT 'event_type' as name, event_type as value FROM events
UNION ALL
SELECT 'status' as name, status as value FROM events
)
GROUP BY name, type
ORDER BY unique_values
Optimize existing tables¶
Alter existing tables to use better types:
Optimize existing table
-- Convert String columns to LowCardinality
ALTER TABLE events MODIFY COLUMN event_type LowCardinality(String);
ALTER TABLE events MODIFY COLUMN status LowCardinality(String);
-- Or convert to Enum if values are fixed
ALTER TABLE orders MODIFY COLUMN order_status Enum('pending' = 1, 'confirmed' = 2, 'shipped' = 3, 'delivered' = 4);
Common patterns and solutions¶
Categorical data optimization¶
Optimize columns with categorical values:
Categorical optimization
CREATE TABLE products (
id UInt64,
category LowCardinality(String), -- Product categories
brand LowCardinality(String), -- Brand names
condition Enum('new' = 1, 'used' = 2, 'refurbished' = 3),
price Decimal(10, 2),
created_at DateTime
) ENGINE = MergeTree()
ORDER BY (category, brand, created_at)
Status and type fields¶
Optimize status and type columns:
Status field optimization
CREATE TABLE transactions (
id UInt64,
transaction_type Enum('purchase' = 1, 'refund' = 2, 'exchange' = 3),
status Enum('pending' = 1, 'completed' = 2, 'failed' = 3, 'cancelled' = 4),
amount Decimal(10, 2),
created_at DateTime
) ENGINE = MergeTree()
ORDER BY (transaction_type, status, created_at)
Geographic data¶
Optimize geographic columns:
Geographic optimization
CREATE TABLE users (
id UInt64,
country LowCardinality(String), -- Country names
region LowCardinality(String), -- State/Province
city LowCardinality(String), -- City names
timezone LowCardinality(String), -- Timezone identifiers
created_at DateTime
) ENGINE = MergeTree()
ORDER BY (country, region, created_at)
Configuration data¶
Optimize configuration and settings columns:
Configuration optimization
CREATE TABLE user_settings (
user_id UInt64,
theme Enum('light' = 1, 'dark' = 2, 'auto' = 3),
language LowCardinality(String), -- Language codes
currency LowCardinality(String), -- Currency codes
notification_level Enum('none' = 1, 'low' = 2, 'medium' = 3, 'high' = 4),
updated_at DateTime
) ENGINE = MergeTree()
ORDER BY (user_id, updated_at)
Tinybird-specific notes¶
In Tinybird, SUSPICIOUS_TYPE_FOR_LOW_CARDINALITY warnings often occur when:
- Data Sources have inefficient column types
- Pipe transformations don't optimize data types
- Materialized Views use suboptimal schemas
- API endpoints return unoptimized data
To debug in Tinybird:
- Review Data Source schema for optimization opportunities
- Check Pipe transformations for type optimization
- Verify Materialized View column types
- Consider using LowCardinality for categorical data
In Tinybird, use the Data Source preview to analyze column cardinality and optimize types accordingly.
Best practices¶
Type selection¶
- Use LowCardinality(String) for columns with < 10,000 unique values
- Use Enum for columns with < 100 fixed values
- Use String only for truly high-cardinality data
- Consider storage and query performance trade-offs
Schema design¶
- Analyze data patterns before choosing types
- Monitor column cardinality over time
- Use appropriate types for categorical data
- Document type selection rationale
Performance optimization¶
- Benchmark different type choices
- Monitor storage usage and query performance
- Consider compression ratios
- Balance storage efficiency with query flexibility
Configuration options¶
LowCardinality settings¶
LowCardinality configuration
-- Check LowCardinality settings
SELECT
name,
value,
description
FROM system.settings
WHERE name LIKE '%low_cardinality%'
Enum optimization¶
Enum configuration
-- Configure Enum behavior SET allow_experimental_enum_data_type = 1; SET allow_experimental_low_cardinality_type = 1;
Type inference¶
Type inference settings
-- Enable automatic type inference SET allow_experimental_infer_schema = 1; SET allow_experimental_auto_type_inference = 1;
Alternative solutions¶
Use FixedString¶
Use FixedString for fixed-length categorical data:
FixedString optimization
CREATE TABLE products (
id UInt64,
sku FixedString(10), -- Fixed-length SKU codes
category_code FixedString(3), -- Fixed-length category codes
brand_code FixedString(2), -- Fixed-length brand codes
price Decimal(10, 2),
created_at DateTime
) ENGINE = MergeTree()
ORDER BY (category_code, brand_code, created_at)
Implement custom encoding¶
Create custom encoding for categorical data:
Custom encoding
-- Create mapping table for categories
CREATE TABLE category_mapping (
id UInt8,
name String
) ENGINE = Memory;
INSERT INTO category_mapping VALUES
(1, 'electronics'),
(2, 'clothing'),
(3, 'books'),
(4, 'home');
-- Use UInt8 in main table
CREATE TABLE products (
id UInt64,
category_id UInt8, -- Reference to category_mapping
name String,
price Decimal(10, 2)
) ENGINE = MergeTree()
ORDER BY (category_id, created_at)
Use Array types¶
Use Array types for multiple categorical values:
Array optimization
CREATE TABLE users (
id UInt64,
interests Array(LowCardinality(String)), -- User interests
skills Array(LowCardinality(String)), -- User skills
created_at DateTime
) ENGINE = MergeTree()
ORDER BY (created_at, id)
Monitoring and prevention¶
Cardinality monitoring¶
Cardinality tracking
-- Monitor column cardinality over time
SELECT
table,
column,
COUNT(DISTINCT value) as unique_values,
COUNT(*) as total_rows,
unique_values / total_rows as uniqueness_ratio
FROM (
SELECT 'events' as table, 'event_type' as column, event_type as value FROM events
UNION ALL
SELECT 'events' as table, 'status' as column, status as value FROM events
)
GROUP BY table, column
ORDER BY uniqueness_ratio
Storage optimization¶
Storage monitoring
-- Monitor storage usage by column type
SELECT
table,
column,
type,
COUNT(*) as row_count,
SUM(bytes_on_disk) as storage_bytes
FROM system.columns
WHERE database = currentDatabase()
GROUP BY table, column, type
ORDER BY storage_bytes DESC
Performance monitoring¶
Performance tracking
-- Track query performance by column type
-- Example pseudo-code:
--
-- def track_column_performance(table, column, query_type, execution_time):
-- logger.info(f"Column {column} in table {table} - {query_type}: {execution_time}s")
--
-- # Track performance metrics
-- increment_counter('column_query_performance', {
-- 'table': table,
-- 'column': column,
-- 'query_type': query_type,
-- 'execution_time': execution_time
-- })