Slow queries troubleshooting

Common issues and solutions for slow query performance in ClickHouse and Tinybird.

Common causes

Missing or incorrect sorting keys

Issue: Queries scanning entire tables because filters don't match sorting keys

Solution: Ensure your sorting key (ENGINE_SORTING_KEY) includes columns you filter on, ordered by selectivity

-- Data Source with proper sorting key
SCHEMA >
  `user_id` Int64,
  `date` DateTime,
  `event_type` String

ENGINE "MergeTree"
ENGINE_SORTING_KEY "user_id, date"

Filter queries using columns in the sorting key, ordered by how they appear in the key:

-- Fast: filters by user_id (first in sorting key), then date
SELECT * FROM events
WHERE user_id = 123 AND date >= '2023-01-01'

Large result sets

Issue: Queries returning too much data

Solution: Use LIMIT and proper filtering

-- Add limits to prevent large results
SELECT * FROM events
WHERE date >= today() - 7
LIMIT 1000

Performance patterns

Inefficient WHERE clauses

Issue: Filters not matching sorting key order, causing full table scans

Solution: Filter by sorting key columns in the same order they appear in the sorting key

-- Fast: filters match sorting key order (user_id, date)
SELECT * FROM events
WHERE user_id = 123 AND date >= '2023-01-01' AND event_type = 'click'

The order matters: filter by the most selective columns first, matching your sorting key order. Columns in the sorting key should be the ones you use for filtering.

Complex aggregations

Issue: Heavy aggregations on large datasets at query time

Solution: Pre-aggregate data using Materialized Views to shift computation to ingestion time

-- Create Materialized View for pre-aggregation
CREATE MATERIALIZED VIEW daily_stats
ENGINE = SummingMergeTree()
ORDER BY (user_id, date)
AS SELECT
  user_id,
  toDate(timestamp) as date,
  sum(value) as total_value,
  count() as event_count
FROM events
GROUP BY user_id, toDate(timestamp)

This moves the aggregation from query time to ingestion time, significantly improving query performance.

Query optimization

Using FINAL modifier

Issue: Queries not using FINAL for ReplacingMergeTree

Solution: Add FINAL when needed

-- Use FINAL for latest versions
SELECT * FROM events FINAL WHERE user_id = 123

Avoiding SELECT *

Issue: Selecting unnecessary columns

Solution: Select only needed columns

-- Select specific columns
SELECT user_id, event_type, timestamp
FROM events
WHERE date = today()

Memory and resource issues

TOO_MANY_ROWS_OR_BYTES

Issue: Queries exceeding memory limits

Solution: Filter data and use proper limits

-- Filter data to reduce memory usage
SELECT * FROM events
WHERE date >= today() - 30
LIMIT 10000

Memory pressure

Issue: Queries consuming too much memory

Solution: Optimize query structure

-- Break down complex queries
WITH filtered_data AS (
  SELECT * FROM events WHERE date >= today() - 7
)
SELECT count() FROM filtered_data

Best practices

  1. Use sorting keys - Set ENGINE_SORTING_KEY with columns you filter on, ordered by selectivity
  2. Filter by sorting key - Filter using columns in the sorting key, in the same order
  3. Filter first - Apply WHERE clauses before complex operations like aggregations
  4. Pre-aggregate - Use Materialized Views to shift aggregations from query time to ingestion time
  5. Limit results - Always use LIMIT for large datasets
  6. Select specific columns - Avoid SELECT * on large tables
Updated