ClickHouse event tracking

Product analytics is essentially a series of questions against an event log: who did what, in what order, how often, and when did they stop. The database that answers those questions fast, over the full history of your event data, is the one that makes product decisions fast.

ClickHouse® is the engine behind several major product analytics platforms. Its columnar storage, vectorized aggregation, and high-cardinality GROUP BY performance are a near-perfect match for event tracking workloads. Mixpanel, Amplitude, and PostHog all use ClickHouse under the hood. The event-driven architecture best practices for high-volume event systems consistently point to columnar storage for the analytical layer.

This post covers how to design an event tracking schema in ClickHouse, implement session analysis, run funnel and retention queries, compute active user metrics, and expose those analytics as real-time API endpoints.

Event schema design

The foundational decision for event tracking in ClickHouse is the schema. Event data is append-only, which makes MergeTree the right engine. The sort key determines which queries are fast.

CREATE TABLE events
(
    event_time      DateTime64(3),
    event_name      LowCardinality(String),
    user_id         String,
    session_id      String,
    device_id       String,
    platform        LowCardinality(String),  -- 'web', 'ios', 'android'
    country         LowCardinality(String),
    properties      String,                  -- JSON blob for flexible properties
    -- Materialized columns extracted from JSON for fast access
    product_id      String                   MATERIALIZED JSONExtractString(properties, 'product_id'),
    plan            LowCardinality(String)   MATERIALIZED JSONExtractString(properties, 'plan'),
    revenue         Float64                  MATERIALIZED JSONExtractFloat(properties, 'revenue')
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_name, toStartOfHour(event_time), user_id);

On the JSON properties column. Storing flexible event properties as JSON in a single String column is the standard approach. It avoids the schema churn that comes from adding new event properties. The materialized columns extract the values you query most frequently at insert time, so they're stored separately and indexed without re-parsing the JSON on every query.

On sort key order. (event_name, event_time, user_id) is correct for event tracking. Most product analytics queries filter on event name first (WHERE event_name = 'signup'), then time, then optionally by user. Flipping to (user_id, event_time, event_name) optimizes for per-user queries at the expense of event-level aggregations, which are more common in product analytics workloads.

Session analysis

A session is a sequence of events from the same user within an inactivity timeout. Computing sessions in ClickHouse requires finding gaps in each user's event stream. The standard approach uses a window function to detect new sessions:

SELECT
    user_id,
    session_id,
    min(event_time)             AS session_start,
    max(event_time)             AS session_end,
    count()                     AS events_in_session,
    dateDiff('second', min(event_time), max(event_time)) AS duration_sec
FROM (
    SELECT
        user_id,
        event_time,
        sum(is_new_session) OVER (
            PARTITION BY user_id
            ORDER BY event_time
        )                                               AS session_id
    FROM (
        SELECT
            user_id,
            event_time,
            if(
                dateDiff('second',
                    lagInFrame(event_time, 1, event_time) OVER (
                        PARTITION BY user_id ORDER BY event_time
                    ),
                    event_time
                ) > 1800,  -- 30-minute inactivity timeout
                1, 0
            ) AS is_new_session
        FROM events
        WHERE event_time >= today() - 7
    )
)
GROUP BY user_id, session_id
ORDER BY session_start DESC;

For most production use, pre-compute sessions in a materialized view rather than computing them on every query. Session boundaries detected at insert time result in a sessions table that aggregates event-level data into session summaries. Investigation queries can then join sessions to raw events without recomputing the sessionization logic each time.

Session quality metrics give you a richer picture than session counts alone:

SELECT
    platform,
    toDate(session_start)                                   AS day,
    count()                                                 AS sessions,
    avg(duration_sec)                                       AS avg_duration_sec,
    avg(events_in_session)                                  AS avg_events,
    countIf(duration_sec < 10)                              AS bounces,
    round(countIf(duration_sec < 10) / count() * 100, 1)   AS bounce_rate
FROM sessions
WHERE session_start >= today() - 30
GROUP BY platform, day
ORDER BY day DESC, sessions DESC;

Funnel analysis

Funnel analysis, how many users complete each step of a defined sequence, is the most common product analytics query. ClickHouse's windowFunnel function computes this natively:

-- Signup funnel: account_created -> onboarding_completed -> first_key_action -> upgrade
SELECT
    platform,
    count()                     AS users_entered,
    countIf(funnel_step >= 2)   AS completed_step_2,
    countIf(funnel_step >= 3)   AS completed_step_3,
    countIf(funnel_step >= 4)   AS converted,
    round(countIf(funnel_step >= 4) / count() * 100, 1) AS overall_cvr
FROM (
    SELECT
        user_id,
        argMax(platform, event_time) AS platform,
        windowFunnel(604800)(         -- 7-day window
            event_time,
            event_name = 'account_created',
            event_name = 'onboarding_completed',
            event_name = 'first_key_action',
            event_name = 'upgrade'
        ) AS funnel_step
    FROM events
    WHERE event_time >= today() - 30
      AND event_name IN ('account_created', 'onboarding_completed', 'first_key_action', 'upgrade')
    GROUP BY user_id
)
GROUP BY platform
ORDER BY users_entered DESC;

windowFunnel(604800) requires the steps to happen in order within a 7-day window. Users who complete step 2 before step 1, or who complete the sequence but take longer than 7 days, don't count. This matches how product teams actually think about funnel completion and is how ClickHouse streaming analytics platforms implement behavioral funnels.

Retention analysis

Retention measures how many users from a starting cohort return on subsequent days or weeks:

SELECT
    cohort_date,
    days_after,
    retained_users,
    cohort_size,
    round(retained_users / cohort_size * 100, 1) AS retention_rate
FROM (
    SELECT
        toDate(first_event.event_time)                          AS cohort_date,
        dateDiff('day', first_event.event_time, r.event_time)   AS days_after,
        uniq(r.user_id)                                         AS retained_users
    FROM events AS r
    INNER JOIN (
        SELECT user_id, min(event_time) AS event_time
        FROM events
        WHERE event_name = 'account_created'
        GROUP BY user_id
    ) AS first_event USING (user_id)
    WHERE r.event_name = 'key_action'
      AND r.event_time >= first_event.event_time
      AND days_after <= 30
    GROUP BY cohort_date, days_after
) AS retained
INNER JOIN (
    SELECT toDate(min(event_time)) AS cohort_date, uniq(user_id) AS cohort_size
    FROM events
    WHERE event_name = 'account_created'
    GROUP BY cohort_date
) AS cohorts USING (cohort_date)
ORDER BY cohort_date, days_after;

This returns day-1, day-7, day-14, day-30 retention broken out by signup cohort. For a high-level view, aggregate cohort_date by week or month. For feature-level retention (which features correlate with day-30 retention), add a feature engagement flag to the right side of the join.

Active user counts (DAU/WAU/MAU)

SELECT
    toDate(event_time)      AS day,
    uniq(user_id)           AS dau,
    uniqIf(user_id,
        event_time >= toDate(event_time) - 6
    )                       AS wau_rolling,
    uniqIf(user_id,
        event_time >= toDate(event_time) - 29
    )                       AS mau_rolling,
    round(
        uniq(user_id) /
        uniqIf(user_id, event_time >= toDate(event_time) - 29) * 100, 1
    )                       AS dau_mau_ratio
FROM events
WHERE event_name != 'page_view'
  AND event_time >= today() - 90
GROUP BY day
ORDER BY day DESC;

The DAU/MAU ratio (stickiness) is a product health metric that's trivial to compute in ClickHouse but expensive in a traditional warehouse because it requires scanning the full event history for each day's MAU calculation.

For high-volume products, uniq() (HyperLogLog, ~2% error) is substantially faster than uniqExact(). The difference between 47,823 and 47,906 daily active users is not actionable. Use uniqExact() only where exact counts are contractually required.

Pre-aggregated metrics table

For dashboards that product managers load throughout the day, pre-aggregate core metrics into a daily rollup. Materialized views compute these aggregates at insert time, meaning dashboard queries hit a pre-computed table rather than the full event stream:

CREATE TABLE product_metrics_daily
(
    day             Date,
    event_name      LowCardinality(String),
    platform        LowCardinality(String),
    country         LowCardinality(String),
    events          UInt64,
    unique_users    AggregateFunction(uniq, String),
    unique_sessions AggregateFunction(uniq, String)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(day)
ORDER BY (day, event_name, platform, country);

CREATE MATERIALIZED VIEW product_metrics_daily_mv TO product_metrics_daily AS
SELECT
    toDate(event_time)      AS day,
    event_name,
    platform,
    country,
    count()                 AS events,
    uniqState(user_id)      AS unique_users,
    uniqState(session_id)   AS unique_sessions
FROM events
GROUP BY day, event_name, platform, country;

Dashboard queries read from product_metrics_daily with uniqMerge(unique_users). The raw event table stays for investigation queries. Both paths stay fast regardless of how many total events are in the base table.

Real-time alerting with materialized views

For monitoring critical product events in real time, such as error rate spikes, payment failures, or unexpected drops in key metrics, a materialized view writing to a summary table provides the aggregated signal:

CREATE TABLE error_summary_5m
(
    window_start    DateTime,
    error_type      LowCardinality(String),
    platform        LowCardinality(String),
    error_count     UInt64,
    affected_users  AggregateFunction(uniq, String)
)
ENGINE = AggregatingMergeTree
ORDER BY (window_start, error_type, platform);

CREATE MATERIALIZED VIEW error_summary_5m_mv TO error_summary_5m AS
SELECT
    toStartOfFiveMinutes(event_time)    AS window_start,
    JSONExtractString(properties, 'error_type') AS error_type,
    platform,
    countState()                        AS error_count,
    uniqState(user_id)                  AS affected_users
FROM events
WHERE event_name = 'error'
GROUP BY window_start, error_type, platform;

An alerting pipeline queries this table every minute for windows where error count or affected users exceed a threshold. The query reads from the pre-aggregated summary rather than scanning raw events, staying fast even at millions of events per second.

Real-time event API with Tinybird

For product analytics dashboards that engineering, product, or customers see, ClickHouse queries need to be exposed as HTTP endpoints with access control. Tinybird handles this with Pipes, turning your SQL into a parameterized JSON endpoint. For real-time data ingestion, the Events API makes events queryable within seconds:

-- event_counts.pipe
NODE counts
SQL >
  %
  SELECT
    toDate(event_time)  AS day,
    event_name,
    count()             AS events,
    uniq(user_id)       AS unique_users
  FROM events
  WHERE org_id   = {{ String(org_id, required=True) }}
    AND event_time >= {{ DateTime(start_date, '2026-01-01 00:00:00') }}
    AND event_time <  {{ DateTime(end_date, '2026-12-31 23:59:59') }}
  GROUP BY day, event_name
  ORDER BY day, events DESC

TYPE ENDPOINT

Your frontend calls /event_counts.json?org_id=acme&start_date=2026-05-01&end_date=2026-06-01. ClickHouse runs it in milliseconds. For build real-time apps patterns where product usage data drives in-product features, this endpoint pattern is how you get sub-50ms latency at the p99.

Event tracking at any scale

Tinybird is managed ClickHouse with a streaming Events API. Ingest millions of events per second, query in milliseconds.

Try Tinybird free

Skip the infra work. Deploy your first ClickHouse^®
project now.

Blog

Skip the infra work. Deploy your first ClickHouse^®
project now.

Skip the infra work. Deploy your first ClickHouse^®
project now.

ClickHouse event tracking

Event schema design

Session analysis

Funnel analysis

Retention analysis

Active user counts (DAU/WAU/MAU)

Pre-aggregated metrics table

Real-time alerting with materialized views

Real-time event API with Tinybird

Event tracking at any scale

Skip the infra work. Deploy your first ClickHouse project now

Skip the infra work. Deploy your first ClickHouse® project now.

Blog

Skip the infra work. Deploy your first ClickHouse® project now.

Skip the infra work. Deploy your first ClickHouse® project now.

ClickHouse event tracking

Event schema design

Session analysis

Funnel analysis

Retention analysis

Active user counts (DAU/WAU/MAU)

Pre-aggregated metrics table

Real-time alerting with materialized views

Real-time event API with Tinybird

Event tracking at any scale

Skip the infra work. Deploy your first ClickHouse project now

Skip the infra work. Deploy your first ClickHouse^®
project now.

Skip the infra work. Deploy your first ClickHouse^®
project now.

Skip the infra work. Deploy your first ClickHouse^®
project now.