---
title: "ClickHouse fast queries"
excerpt: "ClickHouse is fast by default, but how fast depends entirely on schema design and query patterns. The difference between a 50ms query and a 50-second query on the same data is usually one of five things: sort key misuse, missing pre-aggregation, COUNT DISTINCT overhead, no PREWHERE, or querying raw data when a rollup exists."
authors: "Tinybird"
categories: "AI Resources"
createdOn: "2026-06-03 00:00:00"
publishedOn: "2026-06-03 00:00:00"
updatedOn: "2026-06-03 00:00:00"
status: "published"
---

ClickHouse® is fast by default. But "fast by default" means fast when the schema and queries are designed well. The same data can return in 40ms or 40 seconds depending on whether the sort key is used, whether you're scanning raw events instead of a rollup, and whether the BI tool wrapping your queries is generating `COUNT(DISTINCT)` and `SELECT *` without you realizing it.

Most [SQL query performance](https://www.tinybird.co/blog/how-to-make-sql-database-faster) improvements in ClickHouse come from schema decisions made at table creation time, not from query hints or index tuning after the fact. Get the sort key and table engine right and many queries are fast without further optimization.

This post covers the five most impactful changes you can make to get faster queries in ClickHouse, plus diagnostics and BI tool patterns.

## 1. Sort key design

The sort key is the most important performance lever in ClickHouse. It determines the order of rows on disk, which determines how much data ClickHouse reads to satisfy a filter.

ClickHouse stores data in granules (default: 8,192 rows each) and reads the minimum number of granules that could contain rows matching your WHERE clause. The primary key index maps the first row of each granule, so if rows are sorted by `(org_id, timestamp)`, a query filtering on `org_id = 'acme'` skips all granules where the entire range belongs to other orgs.

**Wrong sort key:**

```sql
CREATE TABLE events (
    event_time  DateTime,
    user_id     String,
    org_id      String,
    event_type  LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (user_id, event_time);  -- but queries filter on org_id first
```

A query `WHERE org_id = 'acme' AND event_time >= ...` scans the full table because org_id is not in the sort key. ClickHouse reads every granule to find matching rows.

**Right sort key:**

```sql
ORDER BY (org_id, event_time, event_type, user_id)
```

Now `WHERE org_id = 'acme' AND event_time >= now() - 7 DAY` reads only the granules belonging to that org in that time range. At 100 billion rows with 1,000 orgs, this is the difference between scanning 100 billion rows and scanning 100 million.

Rules:

- Put your highest-cardinality filter first, usually tenant or org ID.
- Time column second. Most queries have a time range.
- Additional filter columns after time.
- Never put a column you never filter on in the sort key.

## 2. LowCardinality on dimension columns

`LowCardinality` is one of the highest-impact optimizations available for free in ClickHouse. It applies dictionary encoding to columns with a bounded set of distinct values (typically fewer than 10,000), which reduces memory usage for GROUP BY operations and speeds up filtering significantly.

```sql
-- Without LowCardinality: stores raw string bytes for every row
event_type  String,

-- With LowCardinality: stores an integer dictionary key per row
event_type  LowCardinality(String),
```

For event tracking schemas, apply `LowCardinality` to: event name/type, platform, device type, country, plan tier, status values, any string with a bounded set of values. The compression improvement is typically 3-10x on those columns, and GROUP BY on `LowCardinality` columns uses integer comparison rather than string comparison.

Don't apply `LowCardinality` to high-cardinality columns like user_id, session_id, or order_id. The overhead of managing a dictionary larger than 65,535 values negates the benefit.

## 3. Avoid functions on sort key columns

Once you have the right sort key, the most common way to bypass it is wrapping a sort key column in a function in your WHERE clause. ClickHouse can't use the sort index when the filter column is transformed.

```sql
-- All of these break sort key usage
WHERE toDate(event_time) = today()
WHERE toString(org_id) = 'acme'
WHERE lower(event_type) = 'purchase'
WHERE toYYYYMM(event_time) = 202605
```

All of these force a full scan. Rewrite as range conditions on the raw column:

```sql
-- Uses sort key correctly
WHERE event_time >= today() AND event_time < today() + 1
WHERE org_id = 'acme'
WHERE event_type = 'purchase'
WHERE event_time >= '2026-05-01' AND event_time < '2026-06-01'
```

This is the most common performance issue when BI tools generate SQL against ClickHouse. Tools like Grafana, Metabase, and Tableau sometimes wrap date columns in functions. The [5 rules for writing faster SQL queries](https://www.tinybird.co/blog/5-rules-for-writing-faster-sql-queries) in ClickHouse all trace back to respecting the sort key and partition boundaries.

## 4. PREWHERE for selective filters

`PREWHERE` is a ClickHouse-specific optimization that reads a filter column first, finds matching rows, and then reads the remaining columns only for those rows. It's most effective when the filter is highly selective and the table is wide.

```sql
-- Standard WHERE: reads all columns, then filters
SELECT user_id, event_type, properties
FROM events
WHERE event_time >= now() - INTERVAL 7 DAY
  AND event_name = 'purchase';

-- PREWHERE: reads event_name first, finds matching rows, then reads other columns
SELECT user_id, event_type, properties
FROM events
PREWHERE event_name = 'purchase'
WHERE event_time >= now() - INTERVAL 7 DAY;
```

On a wide table where `event_name = 'purchase'` matches 0.5% of rows, `PREWHERE` reduces I/O by ~99.5% for the non-PREWHERE columns. ClickHouse applies this optimization automatically in many cases, but explicitly specifying `PREWHERE` ensures it applies when your most selective filter is on a non-sort-key column.

Don't use `PREWHERE` with:

- Columns that are part of the sort key (the optimizer already uses them efficiently).
- Non-selective filters (filters matching >10% of rows gain little from PREWHERE).
- `Nullable` columns (requires extra null checks that can negate the benefit).

For wide tables with many columns and selective attribute filters, `PREWHERE` consistently shows 5-20x I/O reduction. The [ClickHouse query optimization](https://www.tinybird.co/blog/clickhouse-query-optimization) documentation covers when the optimizer applies it automatically and when to force it manually.

## 5. Replace COUNT(DISTINCT) with uniq()

`COUNT(DISTINCT column)` is exact but expensive at scale. ClickHouse uses a hash set to track distinct values, which requires memory proportional to cardinality and grows linearly with the number of rows.

`uniq()` uses HyperLogLog with ~2% error and is 10-100x faster on high-cardinality columns. For most product analytics use cases, the approximation is imperceptible:

```sql
-- Slow: exact distinct count
SELECT COUNT(DISTINCT user_id) FROM events WHERE event_time >= today() - 7;

-- Fast: ~2% error, 10-100x faster
SELECT uniq(user_id) FROM events WHERE event_time >= today() - 7;
```

For low-cardinality columns (dozens to hundreds of distinct values), the difference is negligible either way. Use `uniqExact()` when exactness is contractually required (billing, compliance). Use `uniq()` for dashboards and product metrics.

The same pattern applies to approximate percentiles. `quantile(0.95)(duration_ms)` is much faster than `quantileExact(0.95)(duration_ms)` at scale, with error rates low enough for SLA monitoring.

## 6. Pre-aggregate with materialized views

Scanning billions of raw events for every dashboard query is the most common reason ClickHouse deployments feel slow after initial setup. The data is large, queries scan a lot of it, and dashboards that load in seconds at first take 10+ seconds as the table grows.

The fix is pre-aggregation. A materialized view computes aggregates at insert time and writes them to a rollup table. Dashboard queries hit the rollup, not the raw table. The pattern and options for [materialized views in ClickHouse](https://www.tinybird.co/blog/backfilling-materialized-views-diy-vs-tinybird-and-best-practices) include backfilling historical data when you add a new rollup.

```sql
CREATE TABLE events_daily
(
    day             Date,
    org_id          String,
    event_name      LowCardinality(String),
    events          UInt64,
    unique_users    AggregateFunction(uniq, String),
    revenue         AggregateFunction(sum, Float64)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(day)
ORDER BY (org_id, day, event_name);

CREATE MATERIALIZED VIEW events_daily_mv TO events_daily AS
SELECT
    toDate(event_time)      AS day,
    org_id,
    event_name,
    count()                 AS events,
    uniqState(user_id)      AS unique_users,
    sumState(revenue)       AS revenue
FROM events
GROUP BY day, org_id, event_name;

-- Dashboard query reads rollup, not raw events
SELECT
    day,
    event_name,
    sum(events)             AS total_events,
    uniqMerge(unique_users) AS unique_users,
    sumMerge(revenue)       AS revenue
FROM events_daily
WHERE org_id = 'acme'
  AND day >= today() - 30
GROUP BY day, event_name
ORDER BY day;
```

A 30-day dashboard query now reads at most 30 rows per org/event-name combination, regardless of whether the underlying table has 1 billion or 1 trillion rows. Query latency stays flat as data grows.

## Diagnosing slow queries

ClickHouse's `system.query_log` records every query with rows read, bytes read, and duration. Use it to find what's slow:

```sql
SELECT
    substr(query, 1, 120)           AS query_preview,
    read_rows,
    formatReadableSize(read_bytes)  AS bytes_read,
    query_duration_ms,
    user
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time >= now() - INTERVAL 1 HOUR
  AND query_duration_ms > 500
ORDER BY read_bytes DESC
LIMIT 20;
```

A query with high `read_bytes` relative to `read_rows` is reading wide rows, often `SELECT *` from a wide table. A query with high `read_rows` but low `read_bytes` is reading narrow data but scanning too many rows, usually a sort key miss.

For deeper diagnosis, use `EXPLAIN PIPELINE` to see the query execution plan:

```sql
EXPLAIN PIPELINE
SELECT org_id, count(), uniq(user_id)
FROM events
WHERE event_time >= now() - INTERVAL 7 DAY
GROUP BY org_id;
```

The pipeline output shows which indexes are used, how many granules are read, and where parallelism happens. A query using no index shows `FullScan`. A query using the sort key shows `PrimaryKeyCondition`. If you see `FullScan` on a query you expect to be indexed, check for function wrapping on filter columns.

## Common fixes for BI tool-generated SQL

When a BI tool connects to ClickHouse, the SQL it generates often bypasses the optimizations above:

| BI-generated SQL | Problem | Fix |
| --- | --- | --- |
| `SELECT *` | Reads all columns | Define a view with only the needed columns |
| `WHERE toDate(ts) = ...` | Wraps sort key | Rewrite as range filter in the BI tool |
| `COUNT(DISTINCT x)` | Slow exact count | Use `uniq()` in a pre-aggregated layer |
| Query hits raw table | No pre-aggregation | Point BI tool at rollup table |
| No partition filter | Reads all partitions | Add `event_time` filter to every query |

The cleanest fix is to create dedicated views or rollup tables that the BI tool queries, designed so that even a `SELECT *` on them is fast. The view selects only the columns the BI tool uses, filters to the relevant time range, and reads from the pre-aggregated table.

## Tinybird for fast query APIs

For query APIs that need to be fast under concurrent load, Tinybird is managed ClickHouse with [analytics at scale](https://www.tinybird.co/blog/analytics-at-scale) and observability built in. Pipes let you define the optimized query once (with proper sort key usage, PREWHERE, and rollup tables) and expose it as a parameterized HTTP endpoint. The built-in query analytics show you latency percentiles, cache hit rates, and read bytes per endpoint, so you can identify regressions before your users do.

{% cta
  title="Fast ClickHouse queries, managed"
  text="Tinybird is managed ClickHouse with SQL endpoints, observability, and automatic scaling. Ship in hours, not weeks."
  button={href: "https://cloud.tinybird.co/signup", target: "_blank", text: "Try Tinybird free"}
/%}
