Adtech has one of the most demanding data profiles in software. Bid requests arrive at millions per second. Impressions and clicks need deduplication across distributed collectors. Campaign reporting needs to be fresh enough that traders can act on it. And all of this has to run while keeping p99 query latency low enough for dashboards to feel live.
Postgres doesn't survive contact with this workload. Most teams end up running Kafka for ingestion, Spark for aggregation, and a custom reporting layer on top. It works. It also requires a dedicated team to operate.
ClickHouse® is the database that production adtech platforms actually use. InMobi migrated their ad reporting APIs to ClickHouse and cut P99 latency from over 60 seconds to under 3 seconds at peak hours, while reducing monthly infrastructure costs by 80% (from $40K to $8K). The platform now handles 400,000+ queries per day across 10TB of uncompressed data.
This post covers the schema patterns, ingestion architecture, and SQL that make those results possible, and how Tinybird puts managed ClickHouse with a built-in API layer in front of all of it.
Why ClickHouse fits adtech
Real-time bidding systems process billions of auction events daily. Each auction produces bid requests, responses, wins, and losses, all within milliseconds. The reporting layer has to aggregate all of that into campaign metrics fast enough to be actionable.
The effective adtech analytics stack follows a clear pattern:
Kafka → ClickHouse Raw Events → Materialized Views → Aggregated Tables → Reporting APIs
ClickHouse fits this pattern because of its columnar storage, vectorized execution, and high ingestion throughput. It scans only the columns a query needs, not entire rows, which means aggregating campaign spend over billions of impression rows reads only the spend column and the partition matching the time filter.
Impression and click tracking schema
The core event table needs to handle high insert volume and support fast time-range aggregations for reporting:
CREATE TABLE ad_events
(
event_time DateTime,
event_type LowCardinality(String), -- 'impression', 'click', 'conversion'
ad_id String,
campaign_id String,
publisher_id String,
user_id String,
region LowCardinality(String),
device_type LowCardinality(String),
bid_price Nullable(Float32),
clearing_price Nullable(Float32),
impression_id String,
won UInt8 -- 1 = win, 0 = loss
)
ENGINE = ReplacingMergeTree(event_time)
PARTITION BY toYYYYMMDD(event_time)
ORDER BY (campaign_id, ad_id, event_time, impression_id);
ReplacingMergeTree handles deduplication. When the same impression_id arrives more than once from distributed collectors, ClickHouse keeps the row with the latest event_time during background merges. For exact deduplication at query time, add FINAL:
SELECT count() FROM ad_events FINAL
WHERE campaign_id = 'camp_abc'
AND event_type = 'impression'
AND event_time >= now() - INTERVAL 1 HOUR;
FINAL is slower than omitting it. For dashboards where a small overcounting margin is acceptable, drop it and accept eventual consistency on deduplication.
Pre-aggregated campaign metrics rollup
Counting raw impressions and clicks on every dashboard load doesn't scale at billions of rows. Build a materialized rollup that pre-aggregates by hour:
CREATE TABLE campaign_metrics_1h
(
hour DateTime,
campaign_id String,
ad_id String,
region LowCardinality(String),
device_type LowCardinality(String),
impressions UInt64,
clicks UInt64,
wins UInt64,
spend AggregateFunction(sum, Float32),
unique_users AggregateFunction(uniq, String)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(hour)
ORDER BY (hour, campaign_id, ad_id, region, device_type);
Unlike scheduled jobs, materialized views in ClickHouse act as insert triggers. When data arrives in the source table, the view transforms and inserts it into the target table automatically:
CREATE MATERIALIZED VIEW campaign_metrics_mv
TO campaign_metrics_1h AS
SELECT
toStartOfHour(event_time) AS hour,
campaign_id,
ad_id,
region,
device_type,
countIf(event_type = 'impression') AS impressions,
countIf(event_type = 'click') AS clicks,
sum(won) AS wins,
sumState(clearing_price) AS spend,
uniqState(user_id) AS unique_users
FROM ad_events
GROUP BY hour, campaign_id, ad_id, region, device_type;
Queries against campaign_metrics_1h scan pre-aggregated rows instead of raw events. A dashboard querying 7 days reads at most 168 rows per campaign (one per hour) instead of millions of raw impressions.
CTR, win rate, and CPC in SQL
Standard adtech metrics against the pre-aggregated rollup:
SELECT
campaign_id,
sum(impressions) AS total_impressions,
sum(clicks) AS total_clicks,
sum(wins) AS total_wins,
round(sum(clicks) /
nullIf(sum(impressions), 0) * 100, 2) AS ctr_pct,
round(sum(wins) /
nullIf(sum(impressions), 0) * 100, 2) AS win_rate_pct,
sumMerge(spend) AS total_spend,
round(sumMerge(spend) /
nullIf(sum(clicks), 0), 4) AS cpc,
uniqMerge(unique_users) AS reach
FROM campaign_metrics_1h
WHERE hour >= now() - INTERVAL 7 DAY
GROUP BY campaign_id
ORDER BY total_impressions DESC
LIMIT 50;
sumMerge and uniqMerge finalize the pre-computed aggregate states from AggregatingMergeTree. Counting distinct users over a week re-merges HyperLogLog sketches, cheap regardless of the time range.
Real-time budget pacing
Budget pacing, checking whether campaigns are spending at the expected rate given the time of day, is one of the harder real-time reporting problems in adtech. In ClickHouse it's a straightforward query against the wins rollup:
SELECT
campaign_id,
sumMerge(spend) AS spend_so_far,
toHour(now()) / 24.0 AS day_fraction,
round(sumMerge(spend) / nullIf(toHour(now()) / 24.0, 0), 2) AS projected_daily_spend
FROM campaign_metrics_1h
WHERE hour >= toStartOfDay(now())
GROUP BY campaign_id
ORDER BY projected_daily_spend DESC;
A campaign with projected_daily_spend far above its budget needs throttling. One far below is underpacing. This query runs in milliseconds, fast enough to call on every trader dashboard refresh.
Bid stream ingestion via Kafka
For production bid streams, the standard pattern is Kafka → ClickHouse Kafka engine → MergeTree storage. Reading from a Kafka engine table is destructive (offsets are committed on read), so you always pair it with a materialized view that redirects consumed messages into a persistent MergeTree table. This three-part pattern (Kafka engine, materialized view, MergeTree storage) is the standard ClickHouse streaming ingestion architecture:
-- 1. Kafka engine table (queue, not storage)
CREATE TABLE bid_events_kafka
(
event_time DateTime,
auction_id String,
campaign_id String,
bid_price Float32,
won UInt8,
region String
)
ENGINE = Kafka
SETTINGS
kafka_broker_list = 'kafka:9092',
kafka_topic_list = 'bid-events',
kafka_group_name = 'ch-adtech-consumer',
kafka_format = 'JSONEachRow';
-- 2. MergeTree destination (real storage)
CREATE TABLE bid_events (...) ENGINE = MergeTree ...;
-- 3. Materialized view connects the two
CREATE MATERIALIZED VIEW bid_events_mv TO bid_events AS
SELECT * FROM bid_events_kafka;
Once the view exists, ClickHouse begins consuming from Kafka automatically. No application-side batching code required.
If you're using Tinybird, the Kafka connector replaces all three of these steps, you connect your Kafka topic in the workspace UI and Tinybird manages the consumer, the storage engine, and the offset tracking.
Frequency capping
Frequency capping requires knowing how many times a user has seen a specific ad in the last N hours. On the raw event table, it's a narrow index scan, fast because the sorting key puts campaign_id and ad_id near the filter:
SELECT count() AS impressions_last_24h
FROM ad_events
WHERE user_id = {user_id:String}
AND ad_id = {ad_id:String}
AND event_type = 'impression'
AND event_time >= now() - INTERVAL 24 HOUR;
For frequency data that needs to be served at bid latency (under 10ms), pre-aggregate into a per-user summary table updated continuously via materialized view and serve from that.
Tinybird for adtech analytics
Running ClickHouse in production means managing clusters, replication, backups, and schema migrations. For most adtech teams, that's not the core competency. Self-hosted ClickHouse cost adds up fast once you factor in engineering time.
Tinybird is managed ClickHouse with a serverless API layer. Define schemas as .datasource files, write transformations as SQL Pipes, and deploy with tb deploy. The Events API handles ingestion at 1K+ req/sec without batching overhead. Each Pipe becomes an HTTP endpoint your reporting layer or trading desk can call directly.
# Ingest bid events
curl -X POST "https://api.tinybird.co/v0/events?name=ad_events" \
-H "Authorization: Bearer $TB_TOKEN" \
-d '{"event_time":"2026-05-25T10:00:00Z","event_type":"impression","campaign_id":"camp_abc","won":1,"clearing_price":2.40,"region":"us"}'
The campaign report as a parameterized endpoint:
-- campaign_report.pipe
NODE report
SQL >
%
SELECT
campaign_id,
sum(impressions) AS impressions,
sum(clicks) AS clicks,
round(sum(clicks) / nullIf(sum(impressions), 0) * 100, 2) AS ctr_pct,
round(sum(wins) / nullIf(sum(impressions), 0) * 100, 2) AS win_rate_pct,
sumMerge(spend) AS spend
FROM campaign_metrics_1h
WHERE hour >= now() - INTERVAL {{ Int32(hours, 24) }} HOUR
AND campaign_id = {{ String(campaign_id, '') }}
AND region = {{ String(region, '') }}
GROUP BY campaign_id
ORDER BY impressions DESC
TYPE ENDPOINT
Your trading desk hits campaign_report.json?hours=1 for the last hour, or ?hours=24®ion=us for US performance yesterday. Filters are query parameters. ClickHouse runs underneath. You don't manage any of it.
Ship your adtech analytics layerTinybird is managed ClickHouse® with ingestion, deduplication, and HTTP endpoints. No cluster to run.Try Tinybird free