ClickHouse for adtech platforms

Adtech has one of the most demanding data profiles in software. Bid requests arrive at millions per second. Impressions and clicks need deduplication across distributed collectors. Campaign reporting needs to be fresh enough that traders can act on it. And all of this has to run while keeping p99 query latency low enough for dashboards to feel live.

Postgres doesn't survive contact with this workload. Most teams end up running Kafka for ingestion, Spark for aggregation, and a custom reporting layer on top. It works. It also requires a dedicated team to operate.

ClickHouse® is the database that production adtech platforms actually use. InMobi migrated their ad reporting APIs to ClickHouse and cut P99 latency from over 60 seconds to under 3 seconds at peak hours, while reducing monthly infrastructure costs by 80% (from $40K to $8K). The platform now handles 400,000+ queries per day across 10TB of uncompressed data.

This post covers the schema patterns, ingestion architecture, and SQL that make those results possible, and how Tinybird puts managed ClickHouse with a built-in API layer in front of all of it.

Why ClickHouse fits adtech

Real-time bidding systems process billions of auction events daily. Each auction produces bid requests, responses, wins, and losses, all within milliseconds. The reporting layer has to aggregate all of that into campaign metrics fast enough to be actionable.

The effective adtech analytics stack follows a clear pattern:

Kafka → ClickHouse Raw Events → Materialized Views → Aggregated Tables → Reporting APIs

ClickHouse fits this pattern because of its columnar storage, vectorized execution, and high ingestion throughput. It scans only the columns a query needs, not entire rows, which means aggregating campaign spend over billions of impression rows reads only the spend column and the partition matching the time filter.

Impression and click tracking schema

The core event table needs to handle high insert volume and support fast time-range aggregations for reporting:

CREATE TABLE ad_events
(
    event_time      DateTime,
    event_type      LowCardinality(String),   -- 'impression', 'click', 'conversion'
    ad_id           String,
    campaign_id     String,
    publisher_id    String,
    user_id         String,
    region          LowCardinality(String),
    device_type     LowCardinality(String),
    bid_price       Nullable(Float32),
    clearing_price  Nullable(Float32),
    impression_id   String,
    won             UInt8                     -- 1 = win, 0 = loss
)
ENGINE = ReplacingMergeTree(event_time)
PARTITION BY toYYYYMMDD(event_time)
ORDER BY (campaign_id, ad_id, event_time, impression_id);

ReplacingMergeTree handles deduplication. When the same impression_id arrives more than once from distributed collectors, ClickHouse keeps the row with the latest event_time during background merges. For exact deduplication at query time, add FINAL:

SELECT count() FROM ad_events FINAL
WHERE campaign_id = 'camp_abc'
  AND event_type = 'impression'
  AND event_time >= now() - INTERVAL 1 HOUR;

FINAL is slower than omitting it. For dashboards where a small overcounting margin is acceptable, drop it and accept eventual consistency on deduplication.

Pre-aggregated campaign metrics rollup

Counting raw impressions and clicks on every dashboard load doesn't scale at billions of rows. Build a materialized rollup that pre-aggregates by hour:

CREATE TABLE campaign_metrics_1h
(
    hour            DateTime,
    campaign_id     String,
    ad_id           String,
    region          LowCardinality(String),
    device_type     LowCardinality(String),
    impressions     UInt64,
    clicks          UInt64,
    wins            UInt64,
    spend           AggregateFunction(sum, Float32),
    unique_users    AggregateFunction(uniq, String)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(hour)
ORDER BY (hour, campaign_id, ad_id, region, device_type);

Unlike scheduled jobs, materialized views in ClickHouse act as insert triggers. When data arrives in the source table, the view transforms and inserts it into the target table automatically:

CREATE MATERIALIZED VIEW campaign_metrics_mv
TO campaign_metrics_1h AS
SELECT
    toStartOfHour(event_time)              AS hour,
    campaign_id,
    ad_id,
    region,
    device_type,
    countIf(event_type = 'impression')     AS impressions,
    countIf(event_type = 'click')          AS clicks,
    sum(won)                               AS wins,
    sumState(clearing_price)               AS spend,
    uniqState(user_id)                     AS unique_users
FROM ad_events
GROUP BY hour, campaign_id, ad_id, region, device_type;

Queries against campaign_metrics_1h scan pre-aggregated rows instead of raw events. A dashboard querying 7 days reads at most 168 rows per campaign (one per hour) instead of millions of raw impressions.

CTR, win rate, and CPC in SQL

Standard adtech metrics against the pre-aggregated rollup:

SELECT
    campaign_id,
    sum(impressions)                                           AS total_impressions,
    sum(clicks)                                                AS total_clicks,
    sum(wins)                                                  AS total_wins,
    round(sum(clicks) /
          nullIf(sum(impressions), 0) * 100, 2)                AS ctr_pct,
    round(sum(wins) /
          nullIf(sum(impressions), 0) * 100, 2)                AS win_rate_pct,
    sumMerge(spend)                                            AS total_spend,
    round(sumMerge(spend) /
          nullIf(sum(clicks), 0), 4)                           AS cpc,
    uniqMerge(unique_users)                                    AS reach
FROM campaign_metrics_1h
WHERE hour >= now() - INTERVAL 7 DAY
GROUP BY campaign_id
ORDER BY total_impressions DESC
LIMIT 50;

sumMerge and uniqMerge finalize the pre-computed aggregate states from AggregatingMergeTree. Counting distinct users over a week re-merges HyperLogLog sketches, cheap regardless of the time range.

Real-time budget pacing

Budget pacing, checking whether campaigns are spending at the expected rate given the time of day, is one of the harder real-time reporting problems in adtech. In ClickHouse it's a straightforward query against the wins rollup:

SELECT
    campaign_id,
    sumMerge(spend)                                                 AS spend_so_far,
    toHour(now()) / 24.0                                            AS day_fraction,
    round(sumMerge(spend) / nullIf(toHour(now()) / 24.0, 0), 2)   AS projected_daily_spend
FROM campaign_metrics_1h
WHERE hour >= toStartOfDay(now())
GROUP BY campaign_id
ORDER BY projected_daily_spend DESC;

A campaign with projected_daily_spend far above its budget needs throttling. One far below is underpacing. This query runs in milliseconds, fast enough to call on every trader dashboard refresh.

Bid stream ingestion via Kafka

For production bid streams, the standard pattern is Kafka → ClickHouse Kafka engine → MergeTree storage. Reading from a Kafka engine table is destructive (offsets are committed on read), so you always pair it with a materialized view that redirects consumed messages into a persistent MergeTree table. This three-part pattern (Kafka engine, materialized view, MergeTree storage) is the standard ClickHouse streaming ingestion architecture:

-- 1. Kafka engine table (queue, not storage)
CREATE TABLE bid_events_kafka
(
    event_time     DateTime,
    auction_id     String,
    campaign_id    String,
    bid_price      Float32,
    won            UInt8,
    region         String
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'kafka:9092',
    kafka_topic_list  = 'bid-events',
    kafka_group_name  = 'ch-adtech-consumer',
    kafka_format      = 'JSONEachRow';

-- 2. MergeTree destination (real storage)
CREATE TABLE bid_events (...) ENGINE = MergeTree ...;

-- 3. Materialized view connects the two
CREATE MATERIALIZED VIEW bid_events_mv TO bid_events AS
SELECT * FROM bid_events_kafka;

Once the view exists, ClickHouse begins consuming from Kafka automatically. No application-side batching code required.

If you're using Tinybird, the Kafka connector replaces all three of these steps, you connect your Kafka topic in the workspace UI and Tinybird manages the consumer, the storage engine, and the offset tracking.

Frequency capping

Frequency capping requires knowing how many times a user has seen a specific ad in the last N hours. On the raw event table, it's a narrow index scan, fast because the sorting key puts campaign_id and ad_id near the filter:

SELECT count() AS impressions_last_24h
FROM ad_events
WHERE user_id      = {user_id:String}
  AND ad_id        = {ad_id:String}
  AND event_type   = 'impression'
  AND event_time   >= now() - INTERVAL 24 HOUR;

For frequency data that needs to be served at bid latency (under 10ms), pre-aggregate into a per-user summary table updated continuously via materialized view and serve from that.

Tinybird for adtech analytics

Running ClickHouse in production means managing clusters, replication, backups, and schema migrations. For most adtech teams, that's not the core competency. Self-hosted ClickHouse cost adds up fast once you factor in engineering time.

Tinybird is managed ClickHouse with a serverless API layer. Define schemas as .datasource files, write transformations as SQL Pipes, and deploy with tb deploy. The Events API handles ingestion at 1K+ req/sec without batching overhead. Each Pipe becomes an HTTP endpoint your reporting layer or trading desk can call directly.

# Ingest bid events
curl -X POST "https://api.tinybird.co/v0/events?name=ad_events" \
  -H "Authorization: Bearer $TB_TOKEN" \
  -d '{"event_time":"2026-05-25T10:00:00Z","event_type":"impression","campaign_id":"camp_abc","won":1,"clearing_price":2.40,"region":"us"}'

The campaign report as a parameterized endpoint:

-- campaign_report.pipe
NODE report
SQL >
  %
  SELECT
    campaign_id,
    sum(impressions)                                           AS impressions,
    sum(clicks)                                                AS clicks,
    round(sum(clicks) / nullIf(sum(impressions), 0) * 100, 2) AS ctr_pct,
    round(sum(wins)   / nullIf(sum(impressions), 0) * 100, 2) AS win_rate_pct,
    sumMerge(spend)                                            AS spend
  FROM campaign_metrics_1h
  WHERE hour >= now() - INTERVAL {{ Int32(hours, 24) }} HOUR
    AND campaign_id = {{ String(campaign_id, '') }}
    AND region = {{ String(region, '') }}
  GROUP BY campaign_id
  ORDER BY impressions DESC

TYPE ENDPOINT
Your trading desk hits campaign_report.json?hours=1 for the last hour, or ?hours=24&region=us for US performance yesterday. Filters are query parameters. ClickHouse runs underneath. You don't manage any of it.
Ship your adtech analytics layer
Tinybird is managed ClickHouse® with ingestion, deduplication, and HTTP endpoints. No cluster to run.
Try Tinybird free

Skip the infra work. Deploy your first ClickHouse^®
project now.

Blog

Skip the infra work. Deploy your first ClickHouse^®
project now.

Skip the infra work. Deploy your first ClickHouse^®
project now.

ClickHouse for adtech platforms

Why ClickHouse fits adtech

Impression and click tracking schema

Pre-aggregated campaign metrics rollup

CTR, win rate, and CPC in SQL

Real-time budget pacing

Bid stream ingestion via Kafka

Frequency capping

Tinybird for adtech analytics

Ship your adtech analytics layer

Skip the infra work. Deploy your first ClickHouse project now

Skip the infra work. Deploy your first ClickHouse® project now.

Blog

Skip the infra work. Deploy your first ClickHouse® project now.

Skip the infra work. Deploy your first ClickHouse® project now.

ClickHouse for adtech platforms

Why ClickHouse fits adtech

Impression and click tracking schema

Pre-aggregated campaign metrics rollup

CTR, win rate, and CPC in SQL

Real-time budget pacing

Bid stream ingestion via Kafka

Frequency capping

Tinybird for adtech analytics

Ship your adtech analytics layer

Skip the infra work. Deploy your first ClickHouse project now

Skip the infra work. Deploy your first ClickHouse^®
project now.

Skip the infra work. Deploy your first ClickHouse^®
project now.

Skip the infra work. Deploy your first ClickHouse^®
project now.