---
title: "ClickHouse ecommerce data"
excerpt: "Ecommerce generates event data at a volume and velocity that traditional databases struggle to query in real time. ClickHouse handles order tracking, inventory analytics, real-time merchandising, and customer behavior analysis at the scale modern ecommerce produces."
authors: "Tinybird"
categories: "AI Resources"
createdOn: "2026-06-03 00:00:00"
publishedOn: "2026-06-03 00:00:00"
updatedOn: "2026-06-03 00:00:00"
status: "published"
---

Ecommerce analytics has a pattern that doesn't fit traditional databases well: massive event volumes (every page view, add-to-cart, product impression), combined with a need to query that data interactively, in real time, for both internal dashboards and customer-facing features like order tracking and personalized recommendations.

Postgres and MySQL handle transactions well. They don't handle analytical queries over 500 million events. ClickHouse® does both ingestion at scale and sub-second analytical queries over that data, which makes it a natural fit for ecommerce data infrastructure.

This post covers the schema patterns, behavioral analysis queries, real-time inventory tracking, customer segmentation, and in-product analytics that matter most for ecommerce workloads.

## Core schema: events and orders

The most important design decision is treating behavioral data and transactional data separately. Orders belong in an OLTP database (Postgres, MySQL). Behavioral events, product impressions, funnel analytics, and operational dashboards belong in ClickHouse.

The behavioral event table:

```sql
CREATE TABLE ecommerce_events
(
    event_time      DateTime64(3),
    event_type      LowCardinality(String),  -- 'view', 'add_cart', 'checkout', 'purchase'
    session_id      String,
    user_id         Nullable(String),         -- null for anonymous visitors
    product_id      String,
    category        LowCardinality(String),
    price           Nullable(Decimal(12, 2)),
    currency        LowCardinality(String),
    source          LowCardinality(String),   -- 'organic', 'paid', 'email', 'direct'
    device_type     LowCardinality(String),   -- 'desktop', 'mobile', 'tablet'
    country         LowCardinality(String)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_type, toStartOfHour(event_time), user_id, product_id);
```

`LowCardinality` on event type, category, source, device, and country is important here. These columns appear in almost every WHERE clause and GROUP BY. Dictionary encoding makes them 2-4x faster to filter and aggregate with significantly better compression.

The order summary table synced from your OLTP database via [change data capture](https://www.tinybird.co/blog/change-data-capture-tools):

```sql
CREATE TABLE orders
(
    order_id        String,
    user_id         String,
    status          LowCardinality(String),
    total_amount    Decimal(12, 2),
    item_count      UInt16,
    created_at      DateTime,
    updated_at      DateTime,
    _version        UInt64,
    _deleted        UInt8 DEFAULT 0
)
ENGINE = ReplacingMergeTree(_version)
ORDER BY order_id;
```

This handles CDC updates from your OLTP store. Every status change (pending, confirmed, shipped, delivered, returned) arrives as a new insert with a higher `_version`. Queries use the `argMax` pattern to get current state without waiting for background deduplication.

## Funnel analysis

Conversion funnel analysis is the most common ecommerce analytics query. How many visitors go from product view to add-to-cart to checkout to purchase?

```sql
SELECT
    source,
    device_type,
    uniqIf(session_id, event_type = 'view')      AS views,
    uniqIf(session_id, event_type = 'add_cart')  AS add_to_cart,
    uniqIf(session_id, event_type = 'checkout')  AS checkouts,
    uniqIf(session_id, event_type = 'purchase')  AS purchases,
    round(
        uniqIf(session_id, event_type = 'purchase') /
        uniqIf(session_id, event_type = 'view') * 100, 2
    )                                            AS overall_cvr
FROM ecommerce_events
WHERE event_time >= today() - 7
GROUP BY source, device_type
ORDER BY views DESC;
```

For ordered funnel sequences within a session (did this visitor complete step 1, then step 2, then step 3 in order), ClickHouse's `windowFunnel` handles it without a separate correlation engine:

```sql
SELECT
    product_id,
    countIf(funnel_step >= 1)  AS reached_view,
    countIf(funnel_step >= 2)  AS reached_add_cart,
    countIf(funnel_step >= 3)  AS reached_purchase
FROM (
    SELECT
        product_id,
        windowFunnel(3600)(
            event_time,
            event_type = 'view',
            event_type = 'add_cart',
            event_type = 'purchase'
        ) AS funnel_step
    FROM ecommerce_events
    WHERE event_time >= today() - 30
    GROUP BY product_id, session_id
)
GROUP BY product_id
ORDER BY reached_view DESC
LIMIT 50;
```

This shows which products have the worst drop-off between add-to-cart and purchase. For a deeper look at improving ecommerce funnel analytics at scale, [build real-time ecommerce analytics APIs with Kafka](https://www.tinybird.co/blog/build-real-time-ecommerce-analytics-api-kafka) walks through the full ingestion-to-query pipeline.

## Real-time product performance

For merchandising teams who need to see what's selling right now:

```sql
SELECT
    product_id,
    count()                                         AS page_views,
    uniq(session_id)                                AS unique_visitors,
    countIf(event_type = 'add_cart')                AS add_to_cart,
    countIf(event_type = 'purchase')                AS purchases,
    round(
        countIf(event_type = 'purchase') /
        nullIf(countIf(event_type = 'view'), 0) * 100, 2
    )                                               AS view_to_purchase_rate,
    round(sumIf(price, event_type = 'purchase'), 2) AS revenue
FROM ecommerce_events
WHERE event_time >= now() - INTERVAL 1 HOUR
GROUP BY product_id
ORDER BY revenue DESC
LIMIT 20;
```

This query runs against the last hour of raw events and returns in milliseconds. For a merchandising dashboard that refreshes every 30 seconds, this is fast enough to query live without pre-aggregation. For [faster dashboards](https://www.tinybird.co/blog/7-tips-to-make-your-dashboards-faster), pre-aggregate into a daily rollup for time windows beyond a few hours:

```sql
CREATE MATERIALIZED VIEW product_daily_stats_mv
TO product_daily_stats AS
SELECT
    toDate(event_time)                          AS day,
    product_id,
    category,
    countIf(event_type = 'view')                AS views,
    uniqIf(session_id, event_type = 'view')     AS unique_views,
    countIf(event_type = 'add_cart')            AS add_to_cart,
    countIf(event_type = 'purchase')            AS purchases,
    sumIf(price, event_type = 'purchase')       AS revenue
FROM ecommerce_events
GROUP BY day, product_id, category;
```

Daily performance dashboards hit the rollup table. The same product page viewed 10 million times in a month results in a single row per product per day in the rollup.

## Customer behavior and segmentation

For RFM (recency, frequency, monetary) segmentation, which feeds targeting lists, churn models, and re-engagement campaigns:

```sql
SELECT
    user_id,
    dateDiff('day', max(event_time), now())                     AS recency_days,
    countIf(event_type = 'purchase')                            AS frequency,
    round(sumIf(price, event_type = 'purchase'), 2)             AS monetary,
    multiIf(
        dateDiff('day', max(event_time), now()) <= 7,  'active',
        dateDiff('day', max(event_time), now()) <= 30, 'recent',
        dateDiff('day', max(event_time), now()) <= 90, 'at_risk',
        'churned'
    ) AS recency_segment
FROM ecommerce_events
WHERE user_id IS NOT NULL
GROUP BY user_id
ORDER BY monetary DESC;
```

For repeat purchase rate by acquisition source, showing which channels produce the most loyal customers:

```sql
SELECT
    source,
    uniq(user_id)                               AS total_customers,
    uniqIf(user_id, purchase_count >= 2)        AS repeat_buyers,
    round(
        uniqIf(user_id, purchase_count >= 2) /
        uniq(user_id) * 100, 1
    )                                           AS repeat_rate_pct
FROM (
    SELECT
        user_id,
        argMax(source, event_time)          AS source,
        countIf(event_type = 'purchase')    AS purchase_count
    FROM ecommerce_events
    WHERE user_id IS NOT NULL
    GROUP BY user_id
)
GROUP BY source
ORDER BY total_customers DESC;
```

For [real-time personalization](https://www.tinybird.co/blog/real-time-personalization), segment membership can be computed on the fly at request time rather than pre-batching into a separate CRM system. A parameterized SQL endpoint returns the segment for a given user in milliseconds, which is fast enough to use as a targeting signal in your product.

## A/B testing and experiments

Ecommerce teams run constant experiments: new checkout flows, pricing changes, recommendation algorithms. ClickHouse handles the statistical aggregation for these experiments natively:

```sql
SELECT
    variant,
    uniq(user_id)                                           AS users,
    countIf(event_type = 'purchase')                        AS conversions,
    round(
        countIf(event_type = 'purchase') /
        uniq(user_id) * 100, 3
    )                                                       AS cvr_pct,
    round(sumIf(price, event_type = 'purchase'), 2)         AS total_revenue,
    round(
        sumIf(price, event_type = 'purchase') /
        uniq(user_id), 2
    )                                                       AS revenue_per_user
FROM ecommerce_events
WHERE experiment_id = 'checkout_v2'
  AND event_time >= '2026-05-15'
GROUP BY variant
ORDER BY variant;
```

For statistical significance on conversion rates, add the normal approximation confidence interval:

```sql
SELECT
    variant,
    cvr,
    1.96 * sqrt(cvr * (1 - cvr) / users) AS margin_of_error
FROM (
    SELECT
        variant,
        uniq(user_id)                                        AS users,
        countIf(event_type = 'purchase') / uniq(user_id)    AS cvr
    FROM ecommerce_events
    WHERE experiment_id = 'checkout_v2'
    GROUP BY variant
)
ORDER BY variant;
```

Running this directly in ClickHouse over full experiment data is faster than exporting a sample to Python, and avoids the sampling bias that comes from only analyzing a subset of the experiment.

## Order status tracking for real-time customer dashboards

For [in-product dashboards](https://www.tinybird.co/blog/inproduct-dashboards) that customers see, such as an order history page that shows live statuses, ClickHouse with parameterized SQL endpoints handles the per-customer lookup pattern without per-user licensing:

```sql
-- order_history.pipe
NODE status
SQL >
  %
  SELECT
    order_id,
    argMax(status, updated_at)       AS current_status,
    argMax(total_amount, updated_at) AS total_amount,
    min(created_at)                  AS placed_at,
    max(updated_at)                  AS last_updated
  FROM orders
  WHERE user_id = {{ String(user_id, required=True) }}
    AND _deleted = 0
  GROUP BY order_id
  ORDER BY placed_at DESC
  LIMIT {{ Int32(limit, 20) }}

TYPE ENDPOINT
```

Your storefront calls `/order_history.json?user_id=abc123&limit=10`. No intermediary service, no per-user query routing, no BI tool license. The SQL enforces that each user only sees their own data.

## Inventory and catalog analytics

ClickHouse handles inventory as a streaming event log, treating every sale and restock as an event rather than updating a mutable row. This is [real-time inventory management](https://www.tinybird.co/blog/real-time-inventory-management-with-lambda-architecture) without the complexity of lambda architectures:

```sql
-- Real-time low stock alerts
SELECT
    product_id,
    sumIf(quantity_delta, event_type = 'restock')        AS total_restocked,
    abs(sumIf(quantity_delta, event_type = 'sale'))       AS total_sold,
    sumIf(quantity_delta, event_type = 'restock') -
    abs(sumIf(quantity_delta, event_type = 'sale'))       AS current_stock
FROM inventory_events
WHERE product_id IN (SELECT product_id FROM product_catalog WHERE active = 1)
GROUP BY product_id
HAVING current_stock < 10
ORDER BY current_stock ASC;
```

For high-velocity products (flash sales, hot items), materialize current stock in a `ReplacingMergeTree` table and query that instead, using the event log only for historical audit trails.

Trend detection for restocking decisions:

```sql
SELECT
    product_id,
    toStartOfWeek(event_time)           AS week,
    abs(sumIf(quantity_delta, event_type = 'sale')) AS units_sold
FROM inventory_events
WHERE event_time >= today() - 90
  AND event_type = 'sale'
GROUP BY product_id, week
ORDER BY product_id, week;
```

Combine with a ranking window function to surface which products are accelerating in sales velocity over the past 4 weeks.

## Tinybird for ecommerce

Tinybird is managed ClickHouse with streaming ingestion and HTTP endpoints. The Events API ingests behavioral events directly from your frontend at 1,000+ requests per second. SQL Pipes become the API layer your storefront and internal dashboards call.

For ecommerce teams, the typical setup is: ingest events via the Events API, sync order state via CDC from Postgres, define Pipes for the queries above, and expose them as JSON endpoints. The pipeline from raw event to customer-facing dashboard runs without a separate analytics service.

{% cta
  title="Real-time ecommerce analytics without the infrastructure"
  text="Tinybird is managed ClickHouse with streaming ingestion and SQL endpoints. Connect your storefront in minutes."
  button={href: "https://cloud.tinybird.co/signup", target: "_blank", text: "Try Tinybird free"}
/%}
