---
title: "ClickHouse for adtech platforms"
excerpt: "Adtech runs on high-volume event streams, sub-second campaign reporting, and deduplication at scale. Here's how ClickHouse handles all three, with real production benchmarks and patterns used at scale."
authors: "Tinybird"
categories: "AI Resources"
createdOn: "2026-05-25 00:00:00"
publishedOn: "2026-05-25 00:00:00"
updatedOn: "2026-05-25 00:00:00"
status: "published"
---

Adtech has one of the most demanding data profiles in software. Bid requests arrive at millions per second. Impressions and clicks need deduplication across distributed collectors. Campaign reporting needs to be fresh enough that traders can act on it. And all of this has to run while keeping p99 query latency low enough for dashboards to feel live.

Postgres doesn't survive contact with this workload. Most teams end up running Kafka for ingestion, Spark for aggregation, and a custom reporting layer on top. It works. It also requires a dedicated team to operate.

ClickHouse® is the database that production adtech platforms actually use. InMobi migrated their ad reporting APIs to ClickHouse and [cut P99 latency from over 60 seconds to under 3 seconds](https://clickhouse.com/blog/inmobi) at peak hours, while reducing monthly infrastructure costs by 80% (from $40K to $8K). The platform now handles 400,000+ queries per day across 10TB of uncompressed data.

This post covers the schema patterns, ingestion architecture, and SQL that make those results possible, and how Tinybird puts managed ClickHouse with a built-in API layer in front of all of it.

## Why ClickHouse fits adtech

Real-time bidding systems process billions of auction events daily. Each auction produces bid requests, responses, wins, and losses, all within milliseconds. The reporting layer has to aggregate all of that into campaign metrics fast enough to be actionable.

The effective adtech analytics stack follows a clear pattern:

```text
Kafka → ClickHouse Raw Events → Materialized Views → Aggregated Tables → Reporting APIs
```

ClickHouse fits this pattern because of its columnar storage, vectorized execution, and high ingestion throughput. It scans only the columns a query needs, not entire rows, which means aggregating campaign spend over billions of impression rows reads only the spend column and the partition matching the time filter.

## Impression and click tracking schema

The core event table needs to handle high insert volume and support fast time-range aggregations for reporting:

```sql
CREATE TABLE ad_events
(
    event_time      DateTime,
    event_type      LowCardinality(String),   -- 'impression', 'click', 'conversion'
    ad_id           String,
    campaign_id     String,
    publisher_id    String,
    user_id         String,
    region          LowCardinality(String),
    device_type     LowCardinality(String),
    bid_price       Nullable(Float32),
    clearing_price  Nullable(Float32),
    impression_id   String,
    won             UInt8                     -- 1 = win, 0 = loss
)
ENGINE = ReplacingMergeTree(event_time)
PARTITION BY toYYYYMMDD(event_time)
ORDER BY (campaign_id, ad_id, event_time, impression_id);
```

`ReplacingMergeTree` handles deduplication. When the same `impression_id` arrives more than once from distributed collectors, ClickHouse keeps the row with the latest `event_time` during background merges. For exact deduplication at query time, add `FINAL`:

```sql
SELECT count() FROM ad_events FINAL
WHERE campaign_id = 'camp_abc'
  AND event_type = 'impression'
  AND event_time >= now() - INTERVAL 1 HOUR;
```

`FINAL` is slower than omitting it. For dashboards where a small overcounting margin is acceptable, drop it and accept eventual consistency on deduplication.

## Pre-aggregated campaign metrics rollup

Counting raw impressions and clicks on every dashboard load doesn't scale at billions of rows. Build a materialized rollup that pre-aggregates by hour:

```sql
CREATE TABLE campaign_metrics_1h
(
    hour            DateTime,
    campaign_id     String,
    ad_id           String,
    region          LowCardinality(String),
    device_type     LowCardinality(String),
    impressions     UInt64,
    clicks          UInt64,
    wins            UInt64,
    spend           AggregateFunction(sum, Float32),
    unique_users    AggregateFunction(uniq, String)
)
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(hour)
ORDER BY (hour, campaign_id, ad_id, region, device_type);
```

Unlike scheduled jobs, materialized views in ClickHouse act as insert triggers. When data arrives in the source table, the view transforms and inserts it into the target table automatically:

```sql
CREATE MATERIALIZED VIEW campaign_metrics_mv
TO campaign_metrics_1h AS
SELECT
    toStartOfHour(event_time)              AS hour,
    campaign_id,
    ad_id,
    region,
    device_type,
    countIf(event_type = 'impression')     AS impressions,
    countIf(event_type = 'click')          AS clicks,
    sum(won)                               AS wins,
    sumState(clearing_price)               AS spend,
    uniqState(user_id)                     AS unique_users
FROM ad_events
GROUP BY hour, campaign_id, ad_id, region, device_type;
```

Queries against `campaign_metrics_1h` scan pre-aggregated rows instead of raw events. A dashboard querying 7 days reads at most 168 rows per campaign (one per hour) instead of millions of raw impressions.

## CTR, win rate, and CPC in SQL

Standard adtech metrics against the pre-aggregated rollup:

```sql
SELECT
    campaign_id,
    sum(impressions)                                           AS total_impressions,
    sum(clicks)                                                AS total_clicks,
    sum(wins)                                                  AS total_wins,
    round(sum(clicks) /
          nullIf(sum(impressions), 0) * 100, 2)                AS ctr_pct,
    round(sum(wins) /
          nullIf(sum(impressions), 0) * 100, 2)                AS win_rate_pct,
    sumMerge(spend)                                            AS total_spend,
    round(sumMerge(spend) /
          nullIf(sum(clicks), 0), 4)                           AS cpc,
    uniqMerge(unique_users)                                    AS reach
FROM campaign_metrics_1h
WHERE hour >= now() - INTERVAL 7 DAY
GROUP BY campaign_id
ORDER BY total_impressions DESC
LIMIT 50;
```

`sumMerge` and `uniqMerge` finalize the pre-computed aggregate states from `AggregatingMergeTree`. Counting distinct users over a week re-merges HyperLogLog sketches, cheap regardless of the time range.

## Real-time budget pacing

Budget pacing, checking whether campaigns are spending at the expected rate given the time of day, is one of the harder real-time reporting problems in adtech. In ClickHouse it's a straightforward query against the wins rollup:

```sql
SELECT
    campaign_id,
    sumMerge(spend)                                                 AS spend_so_far,
    toHour(now()) / 24.0                                            AS day_fraction,
    round(sumMerge(spend) / nullIf(toHour(now()) / 24.0, 0), 2)   AS projected_daily_spend
FROM campaign_metrics_1h
WHERE hour >= toStartOfDay(now())
GROUP BY campaign_id
ORDER BY projected_daily_spend DESC;
```

A campaign with `projected_daily_spend` far above its budget needs throttling. One far below is underpacing. This query runs in milliseconds, fast enough to call on every trader dashboard refresh.

## Bid stream ingestion via Kafka

For production bid streams, the standard pattern is Kafka → ClickHouse Kafka engine → MergeTree storage. Reading from a Kafka engine table is destructive (offsets are committed on read), so you always pair it with a materialized view that redirects consumed messages into a persistent MergeTree table. This three-part pattern (Kafka engine, materialized view, MergeTree storage) is the standard [ClickHouse streaming ingestion](https://www.tinybird.co/blog/clickhouse-streaming-analytics) architecture:

```sql
-- 1. Kafka engine table (queue, not storage)
CREATE TABLE bid_events_kafka
(
    event_time     DateTime,
    auction_id     String,
    campaign_id    String,
    bid_price      Float32,
    won            UInt8,
    region         String
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'kafka:9092',
    kafka_topic_list  = 'bid-events',
    kafka_group_name  = 'ch-adtech-consumer',
    kafka_format      = 'JSONEachRow';

-- 2. MergeTree destination (real storage)
CREATE TABLE bid_events (...) ENGINE = MergeTree ...;

-- 3. Materialized view connects the two
CREATE MATERIALIZED VIEW bid_events_mv TO bid_events AS
SELECT * FROM bid_events_kafka;
```

Once the view exists, ClickHouse begins consuming from Kafka automatically. No application-side batching code required.

If you're using Tinybird, the [Kafka connector](https://www.tinybird.co/blog/clickhouse-kafka-connector) replaces all three of these steps, you connect your Kafka topic in the workspace UI and Tinybird manages the consumer, the storage engine, and the offset tracking.

## Frequency capping

Frequency capping requires knowing how many times a user has seen a specific ad in the last N hours. On the raw event table, it's a narrow index scan, fast because the sorting key puts `campaign_id` and `ad_id` near the filter:

```sql
SELECT count() AS impressions_last_24h
FROM ad_events
WHERE user_id      = {user_id:String}
  AND ad_id        = {ad_id:String}
  AND event_type   = 'impression'
  AND event_time   >= now() - INTERVAL 24 HOUR;
```

For frequency data that needs to be served at bid latency (under 10ms), pre-aggregate into a per-user summary table updated continuously via materialized view and serve from that.

## Tinybird for adtech analytics

Running ClickHouse in production means managing clusters, replication, backups, and schema migrations. For most adtech teams, that's not the core competency. [Self-hosted ClickHouse cost](https://www.tinybird.co/blog/self-hosted-clickhouse-cost) adds up fast once you factor in engineering time.

Tinybird is managed ClickHouse with a serverless API layer. Define schemas as `.datasource` files, write transformations as SQL Pipes, and deploy with `tb deploy`. The Events API handles ingestion at 1K+ req/sec without batching overhead. Each Pipe becomes an HTTP endpoint your reporting layer or trading desk can call directly.

```bash
# Ingest bid events
curl -X POST "https://api.tinybird.co/v0/events?name=ad_events" \
  -H "Authorization: Bearer $TB_TOKEN" \
  -d '{"event_time":"2026-05-25T10:00:00Z","event_type":"impression","campaign_id":"camp_abc","won":1,"clearing_price":2.40,"region":"us"}'
```

The campaign report as a parameterized endpoint:

```sql
-- campaign_report.pipe
NODE report
SQL >
  %
  SELECT
    campaign_id,
    sum(impressions)                                           AS impressions,
    sum(clicks)                                                AS clicks,
    round(sum(clicks) / nullIf(sum(impressions), 0) * 100, 2) AS ctr_pct,
    round(sum(wins)   / nullIf(sum(impressions), 0) * 100, 2) AS win_rate_pct,
    sumMerge(spend)                                            AS spend
  FROM campaign_metrics_1h
  WHERE hour >= now() - INTERVAL {{ Int32(hours, 24) }} HOUR
    {% if defined(campaign_id) %}
    AND campaign_id = {{ String(campaign_id, '') }}
    {% end %}
    {% if defined(region) %}
    AND region = {{ String(region, '') }}
    {% end %}
  GROUP BY campaign_id
  ORDER BY impressions DESC

TYPE ENDPOINT
```

Your trading desk hits `campaign_report.json?hours=1` for the last hour, or `?hours=24&region=us` for US performance yesterday. Filters are query parameters. ClickHouse runs underneath. You don't manage any of it.

{% cta
  title="Ship your adtech analytics layer"
  text="Tinybird is managed ClickHouse® with ingestion, deduplication, and HTTP endpoints. No cluster to run."
  button={href: "https://cloud.tinybird.co/signup", target: "_blank", text: "Try Tinybird free"}
/%}
