Product analytics is essentially a series of questions against an event log: who did what, in what order, how often, and when did they stop. The database that answers those questions fast, over the full history of your event data, is the one that makes product decisions fast.
ClickHouse® is the engine behind several major product analytics platforms. Its columnar storage, vectorized aggregation, and high-cardinality GROUP BY performance are a near-perfect match for event tracking workloads. Mixpanel, Amplitude, and PostHog all use ClickHouse under the hood. The event-driven architecture best practices for high-volume event systems consistently point to columnar storage for the analytical layer.
This post covers how to design an event tracking schema in ClickHouse, implement session analysis, run funnel and retention queries, compute active user metrics, and expose those analytics as real-time API endpoints.
Event schema design
The foundational decision for event tracking in ClickHouse is the schema. Event data is append-only, which makes MergeTree the right engine. The sort key determines which queries are fast.
CREATE TABLE events
(
event_time DateTime64(3),
event_name LowCardinality(String),
user_id String,
session_id String,
device_id String,
platform LowCardinality(String), -- 'web', 'ios', 'android'
country LowCardinality(String),
properties String, -- JSON blob for flexible properties
-- Materialized columns extracted from JSON for fast access
product_id String MATERIALIZED JSONExtractString(properties, 'product_id'),
plan LowCardinality(String) MATERIALIZED JSONExtractString(properties, 'plan'),
revenue Float64 MATERIALIZED JSONExtractFloat(properties, 'revenue')
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_name, toStartOfHour(event_time), user_id);
On the JSON properties column. Storing flexible event properties as JSON in a single String column is the standard approach. It avoids the schema churn that comes from adding new event properties. The materialized columns extract the values you query most frequently at insert time, so they're stored separately and indexed without re-parsing the JSON on every query.
On sort key order. (event_name, event_time, user_id) is correct for event tracking. Most product analytics queries filter on event name first (WHERE event_name = 'signup'), then time, then optionally by user. Flipping to (user_id, event_time, event_name) optimizes for per-user queries at the expense of event-level aggregations, which are more common in product analytics workloads.
Session analysis
A session is a sequence of events from the same user within an inactivity timeout. Computing sessions in ClickHouse requires finding gaps in each user's event stream. The standard approach uses a window function to detect new sessions:
SELECT
user_id,
session_id,
min(event_time) AS session_start,
max(event_time) AS session_end,
count() AS events_in_session,
dateDiff('second', min(event_time), max(event_time)) AS duration_sec
FROM (
SELECT
user_id,
event_time,
sum(is_new_session) OVER (
PARTITION BY user_id
ORDER BY event_time
) AS session_id
FROM (
SELECT
user_id,
event_time,
if(
dateDiff('second',
lagInFrame(event_time, 1, event_time) OVER (
PARTITION BY user_id ORDER BY event_time
),
event_time
) > 1800, -- 30-minute inactivity timeout
1, 0
) AS is_new_session
FROM events
WHERE event_time >= today() - 7
)
)
GROUP BY user_id, session_id
ORDER BY session_start DESC;
For most production use, pre-compute sessions in a materialized view rather than computing them on every query. Session boundaries detected at insert time result in a sessions table that aggregates event-level data into session summaries. Investigation queries can then join sessions to raw events without recomputing the sessionization logic each time.
Session quality metrics give you a richer picture than session counts alone:
SELECT
platform,
toDate(session_start) AS day,
count() AS sessions,
avg(duration_sec) AS avg_duration_sec,
avg(events_in_session) AS avg_events,
countIf(duration_sec < 10) AS bounces,
round(countIf(duration_sec < 10) / count() * 100, 1) AS bounce_rate
FROM sessions
WHERE session_start >= today() - 30
GROUP BY platform, day
ORDER BY day DESC, sessions DESC;
Funnel analysis
Funnel analysis, how many users complete each step of a defined sequence, is the most common product analytics query. ClickHouse's windowFunnel function computes this natively:
-- Signup funnel: account_created -> onboarding_completed -> first_key_action -> upgrade
SELECT
platform,
count() AS users_entered,
countIf(funnel_step >= 2) AS completed_step_2,
countIf(funnel_step >= 3) AS completed_step_3,
countIf(funnel_step >= 4) AS converted,
round(countIf(funnel_step >= 4) / count() * 100, 1) AS overall_cvr
FROM (
SELECT
user_id,
argMax(platform, event_time) AS platform,
windowFunnel(604800)( -- 7-day window
event_time,
event_name = 'account_created',
event_name = 'onboarding_completed',
event_name = 'first_key_action',
event_name = 'upgrade'
) AS funnel_step
FROM events
WHERE event_time >= today() - 30
AND event_name IN ('account_created', 'onboarding_completed', 'first_key_action', 'upgrade')
GROUP BY user_id
)
GROUP BY platform
ORDER BY users_entered DESC;
windowFunnel(604800) requires the steps to happen in order within a 7-day window. Users who complete step 2 before step 1, or who complete the sequence but take longer than 7 days, don't count. This matches how product teams actually think about funnel completion and is how ClickHouse streaming analytics platforms implement behavioral funnels.
Retention analysis
Retention measures how many users from a starting cohort return on subsequent days or weeks:
SELECT
cohort_date,
days_after,
retained_users,
cohort_size,
round(retained_users / cohort_size * 100, 1) AS retention_rate
FROM (
SELECT
toDate(first_event.event_time) AS cohort_date,
dateDiff('day', first_event.event_time, r.event_time) AS days_after,
uniq(r.user_id) AS retained_users
FROM events AS r
INNER JOIN (
SELECT user_id, min(event_time) AS event_time
FROM events
WHERE event_name = 'account_created'
GROUP BY user_id
) AS first_event USING (user_id)
WHERE r.event_name = 'key_action'
AND r.event_time >= first_event.event_time
AND days_after <= 30
GROUP BY cohort_date, days_after
) AS retained
INNER JOIN (
SELECT toDate(min(event_time)) AS cohort_date, uniq(user_id) AS cohort_size
FROM events
WHERE event_name = 'account_created'
GROUP BY cohort_date
) AS cohorts USING (cohort_date)
ORDER BY cohort_date, days_after;
This returns day-1, day-7, day-14, day-30 retention broken out by signup cohort. For a high-level view, aggregate cohort_date by week or month. For feature-level retention (which features correlate with day-30 retention), add a feature engagement flag to the right side of the join.
Active user counts (DAU/WAU/MAU)
SELECT
toDate(event_time) AS day,
uniq(user_id) AS dau,
uniqIf(user_id,
event_time >= toDate(event_time) - 6
) AS wau_rolling,
uniqIf(user_id,
event_time >= toDate(event_time) - 29
) AS mau_rolling,
round(
uniq(user_id) /
uniqIf(user_id, event_time >= toDate(event_time) - 29) * 100, 1
) AS dau_mau_ratio
FROM events
WHERE event_name != 'page_view'
AND event_time >= today() - 90
GROUP BY day
ORDER BY day DESC;
The DAU/MAU ratio (stickiness) is a product health metric that's trivial to compute in ClickHouse but expensive in a traditional warehouse because it requires scanning the full event history for each day's MAU calculation.
For high-volume products, uniq() (HyperLogLog, ~2% error) is substantially faster than uniqExact(). The difference between 47,823 and 47,906 daily active users is not actionable. Use uniqExact() only where exact counts are contractually required.
Pre-aggregated metrics table
For dashboards that product managers load throughout the day, pre-aggregate core metrics into a daily rollup. Materialized views compute these aggregates at insert time, meaning dashboard queries hit a pre-computed table rather than the full event stream:
CREATE TABLE product_metrics_daily
(
day Date,
event_name LowCardinality(String),
platform LowCardinality(String),
country LowCardinality(String),
events UInt64,
unique_users AggregateFunction(uniq, String),
unique_sessions AggregateFunction(uniq, String)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(day)
ORDER BY (day, event_name, platform, country);
CREATE MATERIALIZED VIEW product_metrics_daily_mv TO product_metrics_daily AS
SELECT
toDate(event_time) AS day,
event_name,
platform,
country,
count() AS events,
uniqState(user_id) AS unique_users,
uniqState(session_id) AS unique_sessions
FROM events
GROUP BY day, event_name, platform, country;
Dashboard queries read from product_metrics_daily with uniqMerge(unique_users). The raw event table stays for investigation queries. Both paths stay fast regardless of how many total events are in the base table.
Real-time alerting with materialized views
For monitoring critical product events in real time, such as error rate spikes, payment failures, or unexpected drops in key metrics, a materialized view writing to a summary table provides the aggregated signal:
CREATE TABLE error_summary_5m
(
window_start DateTime,
error_type LowCardinality(String),
platform LowCardinality(String),
error_count UInt64,
affected_users AggregateFunction(uniq, String)
)
ENGINE = AggregatingMergeTree
ORDER BY (window_start, error_type, platform);
CREATE MATERIALIZED VIEW error_summary_5m_mv TO error_summary_5m AS
SELECT
toStartOfFiveMinutes(event_time) AS window_start,
JSONExtractString(properties, 'error_type') AS error_type,
platform,
countState() AS error_count,
uniqState(user_id) AS affected_users
FROM events
WHERE event_name = 'error'
GROUP BY window_start, error_type, platform;
An alerting pipeline queries this table every minute for windows where error count or affected users exceed a threshold. The query reads from the pre-aggregated summary rather than scanning raw events, staying fast even at millions of events per second.
Real-time event API with Tinybird
For product analytics dashboards that engineering, product, or customers see, ClickHouse queries need to be exposed as HTTP endpoints with access control. Tinybird handles this with Pipes, turning your SQL into a parameterized JSON endpoint. For real-time data ingestion, the Events API makes events queryable within seconds:
-- event_counts.pipe
NODE counts
SQL >
%
SELECT
toDate(event_time) AS day,
event_name,
count() AS events,
uniq(user_id) AS unique_users
FROM events
WHERE org_id = {{ String(org_id, required=True) }}
AND event_time >= {{ DateTime(start_date, '2026-01-01 00:00:00') }}
AND event_time < {{ DateTime(end_date, '2026-12-31 23:59:59') }}
GROUP BY day, event_name
ORDER BY day, events DESC
TYPE ENDPOINT
Your frontend calls /event_counts.json?org_id=acme&start_date=2026-05-01&end_date=2026-06-01. ClickHouse runs it in milliseconds. For build real-time apps patterns where product usage data drives in-product features, this endpoint pattern is how you get sub-50ms latency at the p99.
