ClickHouse® is a columnar database built for exactly the queries analysts run most: aggregations across large date ranges, GROUP BY across multiple dimensions, COUNT DISTINCT on high-cardinality columns, and time-series rollups. Queries that take minutes in Postgres take milliseconds here.
ClickHouse is "a high-performance, column-oriented SQL database management system (DBMS) for online analytical processing (OLAP)." Unlike row-store databases, where even a query touching two columns reads entire rows from disk, ClickHouse stores each column independently. A query that only needs event_time, event_type, and user_id reads exactly those three columns and nothing else.
It runs standard SQL. If you know SQL, you can query ClickHouse today. There are a few extensions worth knowing, a few patterns that unlock the most common analyst workflows, and a few gotchas that catch people coming from row-store databases. This post covers all three.
If you're using Tinybird, managed ClickHouse with a SQL editor and API layer, every example here runs exactly as written in the Tinybird workspace.
What ClickHouse is fast at
The queries that are painful in row-store databases are ClickHouse's native workload. If you're coming from a warehouse like BigQuery or Snowflake, the positioning is different, ClickHouse is optimized for analytics at scale rather than scheduled batch queries:
-- Count distinct users who performed an action, by day, last 90 days
SELECT
toDate(event_time) AS day,
event_type,
count() AS events,
uniq(user_id) AS unique_users
FROM events
WHERE event_time >= now() - INTERVAL 90 DAY
GROUP BY day, event_type
ORDER BY day;
This query scans 90 days of event data across potentially billions of rows. In ClickHouse, with a well-designed schema, it returns in under a second. The reason is physical: because column values are stored sequentially on disk, only the four referenced columns are read. Everything else stays compressed and unread.
This block-wise storage alignment is specifically what makes analytical queries faster, no unnecessary data is loaded, compared to row-based storage, where entire rows including irrelevant columns are always read.
The SQL you already know, plus some useful extensions
ClickHouse supports the SQL standard you're used to: SELECT, WHERE, GROUP BY, ORDER BY, JOIN, HAVING, window functions, CTEs. A few additions are worth learning early.
uniq() instead of COUNT(DISTINCT ...)
COUNT(DISTINCT col) in standard SQL is exact but expensive at scale. ClickHouse's uniq() uses a HyperLogLog sketch, approximately 2% error, but orders of magnitude faster on large datasets. For even faster approximation with slightly less accuracy, uniqHLL12() uses a smaller sketch:
-- Approximate (fast, ~2% error)
SELECT uniq(user_id) AS unique_users FROM events;
-- Even faster approximation (useful on very large tables)
SELECT uniqHLL12(user_id) AS unique_users FROM events;
-- Exact (slower, uses more memory)
SELECT uniqExact(user_id) AS unique_users FROM events;
For most analyst use cases, understanding scale, spotting trends, comparing segments, the approximation is fine. Use uniqExact() when you need the precise number, such as billing reconciliation or compliance reporting.
countIf() and sumIf() for conditional aggregation
Instead of multiple subqueries or CASE statements, use the If variants. One pass over the data, multiple conditional aggregations:
SELECT
campaign_id,
countIf(event_type = 'impression') AS impressions,
countIf(event_type = 'click') AS clicks,
round(
countIf(event_type = 'click') /
nullIf(countIf(event_type = 'impression'), 0) * 100,
2
) AS ctr_pct,
sumIf(revenue, event_type = 'conversion') AS revenue
FROM ad_events
WHERE event_time >= now() - INTERVAL 7 DAY
GROUP BY campaign_id
ORDER BY impressions DESC;
nullIf(x, 0) prevents division by zero when a campaign has impressions but no clicks, returning NULL instead of an error.
PREWHERE for selective filters
ClickHouse's PREWHERE is a performance optimization not available in standard SQL. It reads only the filter column first, identifies matching rows, and then reads the remaining columns only for those rows:
-- Standard WHERE: reads all columns for all rows, then filters
SELECT user_id, revenue FROM events
WHERE event_type = 'conversion';
-- PREWHERE: reads event_type first, then user_id/revenue only for matches
SELECT user_id, revenue FROM events
PREWHERE event_type = 'conversion';
On a table with many columns where the filter is highly selective, PREWHERE can cut I/O dramatically. On a table with many columns where the filter is highly selective, PREWHERE can cut I/O dramatically. In Tinybird, the query optimizer applies it automatically in many cases, but knowing the concept helps you write queries that the optimizer can work with.
Date and time functions
ClickHouse has a rich set of date functions more ergonomic than standard SQL:
-- Truncate to time buckets (useful for GROUP BY)
toStartOfHour(event_time)
toStartOfDay(event_time)
toStartOfWeek(event_time) -- Monday
toStartOfMonth(event_time)
-- Relative time filters
WHERE event_time >= now() - INTERVAL 7 DAY
WHERE event_time >= today() - 30
WHERE toDate(event_time) = yesterday()
-- Extract components
toDayOfWeek(event_time) -- 1 = Monday, 7 = Sunday
toHour(event_time) -- 0-23
toISOWeek(event_time) -- ISO week number
Array and JSON functions for semi-structured data
If your events store extra attributes as JSON or arrays:
-- Extract typed fields from JSON
SELECT
JSONExtractString(properties, 'plan') AS plan,
JSONExtractInt(properties, 'seat_count') AS seats,
JSONExtractFloat(properties, 'mrr') AS mrr
FROM accounts;
-- Flatten arrays into rows (useful for tag analysis)
SELECT arrayJoin(tags) AS tag, count() AS cnt
FROM content
GROUP BY tag
ORDER BY cnt DESC;
In Tinybird, the workspace UI shows column types and lets you autocomplete JSON paths once your datasource is loaded, useful when exploring event schemas you didn't write yourself.
Funnel analysis
Funnel queries are one of the most common analyst requests. ClickHouse handles them efficiently with countIf and the built-in windowFunnel function.
A simple step-by-step funnel:
SELECT
countIf(event = 'signed_up') AS step_1_signup,
countIf(event = 'onboarding_completed') AS step_2_onboarding,
countIf(event = 'first_query_run') AS step_3_first_query,
round(countIf(event = 'onboarding_completed') /
nullIf(countIf(event = 'signed_up'), 0) * 100, 1) AS step1_to_step2_pct,
round(countIf(event = 'first_query_run') /
nullIf(countIf(event = 'onboarding_completed'), 0) * 100, 1) AS step2_to_step3_pct
FROM user_events
WHERE event_time >= now() - INTERVAL 30 DAY;
For strict ordered funnels, where a user must complete step N before counting for step N+1, use windowFunnel:
SELECT level, count() AS users
FROM (
SELECT
user_id,
windowFunnel(7 * 86400)( -- 7-day window in seconds
event_time,
event = 'signed_up',
event = 'onboarding_completed',
event = 'first_query_run',
event = 'invited_teammate'
) AS level
FROM user_events
WHERE event_time >= now() - INTERVAL 90 DAY
GROUP BY user_id
)
GROUP BY level
ORDER BY level;
windowFunnel returns the deepest step reached within the time window. In Tinybird, this becomes a Pipe endpoint your product team calls with ?days=30 or ?days=90, no SQL access required on their side.
Cohort retention analysis
Retention is a weekly or monthly analysis in most BI tools. In ClickHouse, a full cohort retention table runs in seconds even with years of history:
WITH
cohorts AS (
SELECT user_id, toStartOfWeek(min(event_time)) AS cohort_week
FROM user_events
WHERE event = 'signed_up'
GROUP BY user_id
),
activity AS (
SELECT DISTINCT user_id, toStartOfWeek(event_time) AS active_week
FROM user_events
WHERE event_time >= now() - INTERVAL 52 WEEK
)
SELECT
c.cohort_week,
dateDiff('week', c.cohort_week, a.active_week) AS weeks_since_signup,
count(DISTINCT a.user_id) AS retained_users,
count(DISTINCT c2.user_id) AS cohort_size,
round(count(DISTINCT a.user_id) /
nullIf(count(DISTINCT c2.user_id), 0) * 100, 1) AS retention_pct
FROM cohorts c
JOIN cohorts c2 ON c.cohort_week = c2.cohort_week
JOIN activity a ON c.user_id = a.user_id
WHERE dateDiff('week', c.cohort_week, a.active_week) BETWEEN 0 AND 12
GROUP BY c.cohort_week, weeks_since_signup
ORDER BY c.cohort_week, weeks_since_signup;
Percentile and distribution analysis
Standard SQL's AVG hides a lot. For latency, revenue, and engagement, percentiles tell a more honest story:
SELECT
endpoint,
count() AS requests,
avg(duration_ms) AS avg_ms,
quantile(0.50)(duration_ms) AS p50_ms,
quantile(0.90)(duration_ms) AS p90_ms,
quantile(0.99)(duration_ms) AS p99_ms
FROM api_requests
WHERE timestamp >= now() - INTERVAL 24 HOUR
GROUP BY endpoint
ORDER BY p99_ms DESC;
quantile() uses T-Digest approximation. For exact percentiles in SLA reporting, use quantileExact().
Period-over-period comparison with lagInFrame
lagInFrame lets you access the previous period's value for period-over-period comparisons without a self-join:
SELECT
day,
revenue,
lagInFrame(revenue, 1, 0) OVER (ORDER BY day) AS prev_day_revenue,
round(
(revenue - lagInFrame(revenue, 1, 0) OVER (ORDER BY day)) /
nullIf(lagInFrame(revenue, 1, 0) OVER (ORDER BY day), 0) * 100,
1
) AS pct_change
FROM daily_revenue
ORDER BY day;
This is cleaner than a self-join and significantly faster on large tables.
Window functions for rankings and running totals
ClickHouse supports full SQL window function syntax:
-- 7-day rolling average revenue
SELECT
day,
revenue,
round(avg(revenue) OVER (
ORDER BY day
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
), 2) AS rolling_7d_avg
FROM daily_revenue
ORDER BY day;
-- Rank users by spend within each plan tier
SELECT
user_id,
plan,
total_spend,
rank() OVER (PARTITION BY plan ORDER BY total_spend DESC) AS spend_rank
FROM user_spend
WHERE month = '2026-05';
Reading query performance
ClickHouse returns statistics with every result. In the Tinybird query editor, check elapsed, rows_read, and bytes_read at the bottom of every query run:
{
"statistics": {
"elapsed": 0.043,
"rows_read": 4820000,
"bytes_read": 96400000
}
}
43ms to scan 4.8 million rows. If rows_read is much higher than you expected, your query is scanning too much. Common causes:
- A function wraps an indexed column:
toDate(timestamp) = '2026-05-25'disables the index. Rewrite astimestamp >= '2026-05-25' AND timestamp < '2026-05-26' - The time filter column isn't first in the table's
ORDER BYkey - No partition key, so a monthly filter still reads the whole table
Sampling for fast exploration
When exploring a large dataset for the first time, use SAMPLE to get a representative subset without waiting for a full scan:
-- Query 1% of the data, scale results ×100
SELECT event_type, count() * 100 AS estimated_count
FROM events SAMPLE 0.01
WHERE event_time >= now() - INTERVAL 30 DAY
GROUP BY event_type
ORDER BY estimated_count DESC;
Results scale linearly. This is how you explore a 10-billion-row dataset in under a second before deciding which query to optimize. More patterns are in how to make SQL database faster.
Sharing analysis with your team via Tinybird
One of the recurring friction points for analysts is re-running the same query when stakeholders ask for slightly different cuts of the data. Tinybird solves this with Pipes: SQL queries you write once and publish as parameterized HTTP endpoints.
Write the cohort analysis query, publish it as an endpoint:
-- retention_by_cohort.pipe
NODE cohorts
SQL >
SELECT user_id, toStartOfWeek(min(event_time)) AS cohort_week
FROM user_events
WHERE event = 'signed_up'
GROUP BY user_id
NODE retention
SQL >
%
SELECT
c.cohort_week,
dateDiff('week', c.cohort_week, a.active_week) AS week_number,
round(count(DISTINCT a.user_id) /
nullIf(count(DISTINCT c2.user_id), 0) * 100, 1) AS retention_pct
FROM cohorts c
JOIN cohorts c2 ON c.cohort_week = c2.cohort_week
JOIN (
SELECT DISTINCT user_id, toStartOfWeek(event_time) AS active_week
FROM user_events
WHERE event_time >= now() - INTERVAL {{ Int32(weeks, 12) }} WEEK
) a ON c.user_id = a.user_id
WHERE week_number BETWEEN 0 AND {{ Int32(max_week, 8) }}
GROUP BY c.cohort_week, week_number
ORDER BY c.cohort_week, week_number
TYPE ENDPOINT
Your product team calls retention_by_cohort.json?weeks=24&max_week=12 themselves. Your dashboard connects to the same URL. You update the SQL once and everyone gets fresh results. No Slack message asking you to re-run the pivot table.
Tinybird also connects to BI tools, Grafana, Metabase, Superset, via its ClickHouse-compatible query interface. If your team already has a dashboard tool, you can point it at Tinybird and get ClickHouse performance without changing workflows.
What ClickHouse is NOT good at
Point lookups are slower than Postgres. Fetching a single row by primary key is what Postgres is optimized for. ClickHouse's strength is scanning millions of rows, not returning one.
Updates and deletes are expensive. ClickHouse is append-first. ALTER TABLE ... DELETE and UPDATE are asynchronous and heavyweight. Don't use it for data that changes frequently at the row level.
JOINs on two large tables require care. ClickHouse loads the right-side table into memory during a hash join. Filter aggressively before joining, and put the smaller table on the right:
SELECT e.user_id, u.plan
FROM events e
INNER JOIN (
SELECT user_id, plan FROM users WHERE plan IN ('pro', 'enterprise')
) u ON e.user_id = u.user_id
WHERE e.event_time >= now() - INTERVAL 30 DAY;
