Ecommerce analytics has a pattern that doesn't fit traditional databases well: massive event volumes (every page view, add-to-cart, product impression), combined with a need to query that data interactively, in real time, for both internal dashboards and customer-facing features like order tracking and personalized recommendations.
Postgres and MySQL handle transactions well. They don't handle analytical queries over 500 million events. ClickHouse® does both ingestion at scale and sub-second analytical queries over that data, which makes it a natural fit for ecommerce data infrastructure.
This post covers the schema patterns, behavioral analysis queries, real-time inventory tracking, customer segmentation, and in-product analytics that matter most for ecommerce workloads.
Core schema: events and orders
The most important design decision is treating behavioral data and transactional data separately. Orders belong in an OLTP database (Postgres, MySQL). Behavioral events, product impressions, funnel analytics, and operational dashboards belong in ClickHouse.
The behavioral event table:
CREATE TABLE ecommerce_events
(
event_time DateTime64(3),
event_type LowCardinality(String), -- 'view', 'add_cart', 'checkout', 'purchase'
session_id String,
user_id Nullable(String), -- null for anonymous visitors
product_id String,
category LowCardinality(String),
price Nullable(Decimal(12, 2)),
currency LowCardinality(String),
source LowCardinality(String), -- 'organic', 'paid', 'email', 'direct'
device_type LowCardinality(String), -- 'desktop', 'mobile', 'tablet'
country LowCardinality(String)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (event_type, toStartOfHour(event_time), user_id, product_id);
LowCardinality on event type, category, source, device, and country is important here. These columns appear in almost every WHERE clause and GROUP BY. Dictionary encoding makes them 2-4x faster to filter and aggregate with significantly better compression.
The order summary table synced from your OLTP database via change data capture:
CREATE TABLE orders
(
order_id String,
user_id String,
status LowCardinality(String),
total_amount Decimal(12, 2),
item_count UInt16,
created_at DateTime,
updated_at DateTime,
_version UInt64,
_deleted UInt8 DEFAULT 0
)
ENGINE = ReplacingMergeTree(_version)
ORDER BY order_id;
This handles CDC updates from your OLTP store. Every status change (pending, confirmed, shipped, delivered, returned) arrives as a new insert with a higher _version. Queries use the argMax pattern to get current state without waiting for background deduplication.
Funnel analysis
Conversion funnel analysis is the most common ecommerce analytics query. How many visitors go from product view to add-to-cart to checkout to purchase?
SELECT
source,
device_type,
uniqIf(session_id, event_type = 'view') AS views,
uniqIf(session_id, event_type = 'add_cart') AS add_to_cart,
uniqIf(session_id, event_type = 'checkout') AS checkouts,
uniqIf(session_id, event_type = 'purchase') AS purchases,
round(
uniqIf(session_id, event_type = 'purchase') /
uniqIf(session_id, event_type = 'view') * 100, 2
) AS overall_cvr
FROM ecommerce_events
WHERE event_time >= today() - 7
GROUP BY source, device_type
ORDER BY views DESC;
For ordered funnel sequences within a session (did this visitor complete step 1, then step 2, then step 3 in order), ClickHouse's windowFunnel handles it without a separate correlation engine:
SELECT
product_id,
countIf(funnel_step >= 1) AS reached_view,
countIf(funnel_step >= 2) AS reached_add_cart,
countIf(funnel_step >= 3) AS reached_purchase
FROM (
SELECT
product_id,
windowFunnel(3600)(
event_time,
event_type = 'view',
event_type = 'add_cart',
event_type = 'purchase'
) AS funnel_step
FROM ecommerce_events
WHERE event_time >= today() - 30
GROUP BY product_id, session_id
)
GROUP BY product_id
ORDER BY reached_view DESC
LIMIT 50;
This shows which products have the worst drop-off between add-to-cart and purchase. For a deeper look at improving ecommerce funnel analytics at scale, build real-time ecommerce analytics APIs with Kafka walks through the full ingestion-to-query pipeline.
Real-time product performance
For merchandising teams who need to see what's selling right now:
SELECT
product_id,
count() AS page_views,
uniq(session_id) AS unique_visitors,
countIf(event_type = 'add_cart') AS add_to_cart,
countIf(event_type = 'purchase') AS purchases,
round(
countIf(event_type = 'purchase') /
nullIf(countIf(event_type = 'view'), 0) * 100, 2
) AS view_to_purchase_rate,
round(sumIf(price, event_type = 'purchase'), 2) AS revenue
FROM ecommerce_events
WHERE event_time >= now() - INTERVAL 1 HOUR
GROUP BY product_id
ORDER BY revenue DESC
LIMIT 20;
This query runs against the last hour of raw events and returns in milliseconds. For a merchandising dashboard that refreshes every 30 seconds, this is fast enough to query live without pre-aggregation. For faster dashboards, pre-aggregate into a daily rollup for time windows beyond a few hours:
CREATE MATERIALIZED VIEW product_daily_stats_mv
TO product_daily_stats AS
SELECT
toDate(event_time) AS day,
product_id,
category,
countIf(event_type = 'view') AS views,
uniqIf(session_id, event_type = 'view') AS unique_views,
countIf(event_type = 'add_cart') AS add_to_cart,
countIf(event_type = 'purchase') AS purchases,
sumIf(price, event_type = 'purchase') AS revenue
FROM ecommerce_events
GROUP BY day, product_id, category;
Daily performance dashboards hit the rollup table. The same product page viewed 10 million times in a month results in a single row per product per day in the rollup.
Customer behavior and segmentation
For RFM (recency, frequency, monetary) segmentation, which feeds targeting lists, churn models, and re-engagement campaigns:
SELECT
user_id,
dateDiff('day', max(event_time), now()) AS recency_days,
countIf(event_type = 'purchase') AS frequency,
round(sumIf(price, event_type = 'purchase'), 2) AS monetary,
multiIf(
dateDiff('day', max(event_time), now()) <= 7, 'active',
dateDiff('day', max(event_time), now()) <= 30, 'recent',
dateDiff('day', max(event_time), now()) <= 90, 'at_risk',
'churned'
) AS recency_segment
FROM ecommerce_events
WHERE user_id IS NOT NULL
GROUP BY user_id
ORDER BY monetary DESC;
For repeat purchase rate by acquisition source, showing which channels produce the most loyal customers:
SELECT
source,
uniq(user_id) AS total_customers,
uniqIf(user_id, purchase_count >= 2) AS repeat_buyers,
round(
uniqIf(user_id, purchase_count >= 2) /
uniq(user_id) * 100, 1
) AS repeat_rate_pct
FROM (
SELECT
user_id,
argMax(source, event_time) AS source,
countIf(event_type = 'purchase') AS purchase_count
FROM ecommerce_events
WHERE user_id IS NOT NULL
GROUP BY user_id
)
GROUP BY source
ORDER BY total_customers DESC;
For real-time personalization, segment membership can be computed on the fly at request time rather than pre-batching into a separate CRM system. A parameterized SQL endpoint returns the segment for a given user in milliseconds, which is fast enough to use as a targeting signal in your product.
A/B testing and experiments
Ecommerce teams run constant experiments: new checkout flows, pricing changes, recommendation algorithms. ClickHouse handles the statistical aggregation for these experiments natively:
SELECT
variant,
uniq(user_id) AS users,
countIf(event_type = 'purchase') AS conversions,
round(
countIf(event_type = 'purchase') /
uniq(user_id) * 100, 3
) AS cvr_pct,
round(sumIf(price, event_type = 'purchase'), 2) AS total_revenue,
round(
sumIf(price, event_type = 'purchase') /
uniq(user_id), 2
) AS revenue_per_user
FROM ecommerce_events
WHERE experiment_id = 'checkout_v2'
AND event_time >= '2026-05-15'
GROUP BY variant
ORDER BY variant;
For statistical significance on conversion rates, add the normal approximation confidence interval:
SELECT
variant,
cvr,
1.96 * sqrt(cvr * (1 - cvr) / users) AS margin_of_error
FROM (
SELECT
variant,
uniq(user_id) AS users,
countIf(event_type = 'purchase') / uniq(user_id) AS cvr
FROM ecommerce_events
WHERE experiment_id = 'checkout_v2'
GROUP BY variant
)
ORDER BY variant;
Running this directly in ClickHouse over full experiment data is faster than exporting a sample to Python, and avoids the sampling bias that comes from only analyzing a subset of the experiment.
Order status tracking for real-time customer dashboards
For in-product dashboards that customers see, such as an order history page that shows live statuses, ClickHouse with parameterized SQL endpoints handles the per-customer lookup pattern without per-user licensing:
-- order_history.pipe
NODE status
SQL >
%
SELECT
order_id,
argMax(status, updated_at) AS current_status,
argMax(total_amount, updated_at) AS total_amount,
min(created_at) AS placed_at,
max(updated_at) AS last_updated
FROM orders
WHERE user_id = {{ String(user_id, required=True) }}
AND _deleted = 0
GROUP BY order_id
ORDER BY placed_at DESC
LIMIT {{ Int32(limit, 20) }}
TYPE ENDPOINT
Your storefront calls /order_history.json?user_id=abc123&limit=10. No intermediary service, no per-user query routing, no BI tool license. The SQL enforces that each user only sees their own data.
Inventory and catalog analytics
ClickHouse handles inventory as a streaming event log, treating every sale and restock as an event rather than updating a mutable row. This is real-time inventory management without the complexity of lambda architectures:
-- Real-time low stock alerts
SELECT
product_id,
sumIf(quantity_delta, event_type = 'restock') AS total_restocked,
abs(sumIf(quantity_delta, event_type = 'sale')) AS total_sold,
sumIf(quantity_delta, event_type = 'restock') -
abs(sumIf(quantity_delta, event_type = 'sale')) AS current_stock
FROM inventory_events
WHERE product_id IN (SELECT product_id FROM product_catalog WHERE active = 1)
GROUP BY product_id
HAVING current_stock < 10
ORDER BY current_stock ASC;
For high-velocity products (flash sales, hot items), materialize current stock in a ReplacingMergeTree table and query that instead, using the event log only for historical audit trails.
Trend detection for restocking decisions:
SELECT
product_id,
toStartOfWeek(event_time) AS week,
abs(sumIf(quantity_delta, event_type = 'sale')) AS units_sold
FROM inventory_events
WHERE event_time >= today() - 90
AND event_type = 'sale'
GROUP BY product_id, week
ORDER BY product_id, week;
Combine with a ranking window function to surface which products are accelerating in sales velocity over the past 4 weeks.
Tinybird for ecommerce
Tinybird is managed ClickHouse with streaming ingestion and HTTP endpoints. The Events API ingests behavioral events directly from your frontend at 1,000+ requests per second. SQL Pipes become the API layer your storefront and internal dashboards call.
For ecommerce teams, the typical setup is: ingest events via the Events API, sync order state via CDC from Postgres, define Pipes for the queries above, and expose them as JSON endpoints. The pipeline from raw event to customer-facing dashboard runs without a separate analytics service.
