---
title: "ClickHouse for backend engineers"
excerpt: "How to integrate ClickHouse into a backend service: schema design for event tracking, ingestion via HTTP or Kafka, querying from application code, and when managed ClickHouse saves more time than running your own cluster."
authors: "Tinybird"
categories: "AI Resources"
createdOn: "2026-05-25 00:00:00"
publishedOn: "2026-05-25 00:00:00"
updatedOn: "2026-05-25 00:00:00"
status: "published"
---

Your Postgres database is slowing down under analytics queries. Your product team wants user-facing dashboards. Your data team wants to run cohort analysis. None of this belongs in your primary database.

ClickHouse® is where analytics workloads go. It is "a high-performance, column-oriented SQL database management system (DBMS) for online analytical processing (OLAP)", purpose-built for the aggregations, time-series queries, and GROUP BY operations that OLTP databases handle poorly.

This post is the integration guide for backend engineers who haven't used ClickHouse before: schema design, ingestion patterns, application-side querying, and when to use managed ClickHouse instead of running your own cluster.

## How ClickHouse is different from Postgres

The key mental model shift: ClickHouse is not a general-purpose database. It's optimized for one specific workload, scanning large amounts of data and aggregating it fast.

| | Postgres | ClickHouse |
| --- | --- | --- |
| Storage format | Row-store | Columnar |
| Best for | OLTP: many small reads/writes | OLAP: large scans, aggregations |
| Updates/deletes | Fast | Expensive, async |
| JOINs | Strong | Use with care on large tables |
| Point lookups | Fast | Slower than Postgres |
| GROUP BY over millions of rows | Slow | Milliseconds |
| Compression ratio | ~1:1 | ~5:10x on typical event data |

Use Postgres for your application data: user accounts, orders, sessions, anything with frequent row-level writes. Use ClickHouse for analytics: event tracking, time-series metrics, anything you aggregate at query time. They're complementary, not competing.

## Table design for event tracking

The most common ClickHouse pattern in a backend service is an event table: user actions, service errors, or system metrics.

```sql
CREATE TABLE user_events
(
    timestamp    DateTime,
    user_id      String,
    org_id       String,
    event        LowCardinality(String),
    properties   String,           -- JSON blob for flexible attributes
    duration_ms  Nullable(UInt32),
    error        Nullable(String)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (org_id, user_id, timestamp);
```

Three decisions that matter most:

**`LowCardinality(String)` on `event`.** Event names are bounded, a few hundred at most. `LowCardinality` dictionary-encodes them internally, making GROUP BY and WHERE on that column 2-4x faster than plain `String`, with much better compression.

**Partition by month, sort by `(org_id, user_id, timestamp)`.** Multi-tenant SaaS almost always filters by org first. Putting `org_id` first in the sort key means per-org queries scan only that org's rows within a partition. Queries for a specific user within an org use both prefix columns.

**`properties` as a JSON string.** Flexible schemas in ClickHouse work best with a JSON blob for variable attributes. Access fields with `JSONExtractString(properties, 'key')` at query time. For fields you query often, extract them into typed columns after your access patterns stabilize, it's a cheap `ALTER TABLE ... ADD COLUMN`.

Always specify only the columns you need and avoid `SELECT *` in production queries. Wide tables with unused columns waste I/O on every scan.

## Writing events: HTTP ingestion

ClickHouse exposes an HTTP interface. Use the official `@clickhouse/client` Node.js package rather than raw fetch, it handles connection pooling, query retries, and format encoding:

```bash
npm install @clickhouse/client
```

```typescript
import { createClient } from "@clickhouse/client";

const client = createClient({
  host:     process.env.CLICKHOUSE_URL,
  database: "analytics",
  username: "writer",
  password: process.env.CLICKHOUSE_PASSWORD,
  max_open_connections: 10,
});

async function trackEvent(event: UserEvent) {
  await client.insert({
    table:  "user_events",
    values: [event],
    format: "JSONEachRow",
  });
}
```

In practice, batching is more efficient. ClickHouse processes inserts in "parts", many tiny inserts create many small parts and strain the background merge process. Batch to at least 1,000 rows per insert or flush on a time interval:

```typescript
class EventBuffer {
  private buffer: UserEvent[] = [];
  private timer: NodeJS.Timeout;

  constructor(private readonly flushMs = 1000) {
    this.timer = setInterval(() => this.flush(), this.flushMs);
  }

  push(event: UserEvent) {
    this.buffer.push(event);
    if (this.buffer.length >= 1000) this.flush();
  }

  private async flush() {
    if (this.buffer.length === 0) return;
    const batch = this.buffer.splice(0);
    await client.insert({ table: "user_events", values: batch, format: "JSONEachRow" });
  }
}
```

## Writing events: Kafka ingestion

If you're already running Kafka, [Kafka to ClickHouse](https://www.tinybird.co/blog/kafka-to-clickhouse-example) covers the full setup. The table engine consumes directly from a topic. The standard pattern uses three components: a Kafka engine table (the queue), a MergeTree destination (the storage), and a materialized view that connects them:

```sql
-- 1. Kafka engine: queue only, not storage
CREATE TABLE user_events_kafka
(
    timestamp    DateTime,
    user_id      String,
    org_id       String,
    event        String,
    properties   String
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list  = 'kafka:9092',
    kafka_topic_list   = 'user-events',
    kafka_group_name   = 'ch-analytics-consumer',
    kafka_format       = 'JSONEachRow';

-- 2. MergeTree destination (real storage)
CREATE TABLE user_events (...) ENGINE = MergeTree ...;

-- 3. Materialized view: fires on every consumed batch
CREATE MATERIALIZED VIEW user_events_mv TO user_events AS
SELECT * FROM user_events_kafka;
```

Once the view exists, ClickHouse begins consuming automatically and persisting rows into `user_events`. No application-side consumer code required.

Key Kafka engine parameters:

- `kafka_broker_list`: comma-separated `host:port` list
- `kafka_topic_list`: topic names to consume
- `kafka_group_name`: consumer group ID, ClickHouse tracks offsets per group
- `kafka_format`: `JSONEachRow`, `Avro`, `ProtobufSingle`, `CSV`, etc.
- `kafka_num_consumers`: number of consumer threads per table

Monitor consumer lag via `system.kafka_consumers` for operational visibility.

## Querying from application code

Use the `@clickhouse/client` query API with parameterized queries to avoid injection:

```typescript
async function getOrgMetrics(orgId: string, days: number) {
  const result = await client.query({
    query: `
      SELECT
        toStartOfDay(timestamp) AS day,
        event,
        count()                 AS total,
        uniq(user_id)           AS unique_users
      FROM user_events
      WHERE org_id    = {org_id:String}
        AND timestamp >= now() - INTERVAL {days:UInt32} DAY
      GROUP BY day, event
      ORDER BY day
    `,
    query_params: { org_id: orgId, days },
    format: "JSONEachRow",
  });

  return result.json();
}
```

ClickHouse's `{name:Type}` parameterization validates types before the query reaches the database, passing a string for `days` returns an error at the API layer, not a runtime exception from ClickHouse.

## Caching for user-facing APIs

For dashboard APIs serving many users, cache ClickHouse responses at the application layer. A simple Redis TTL cache in front of repeated identical queries reduces load without sacrificing freshness:

```typescript
const CACHE_TTL = 60; // seconds

async function getCachedMetrics(orgId: string, days: number) {
  const cacheKey = `metrics:${orgId}:${days}`;
  const cached   = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  const data = await getOrgMetrics(orgId, days);
  await redis.setex(cacheKey, CACHE_TTL, JSON.stringify(data));
  return data;
}
```

For queries powering public-facing dashboards that many users share (same parameters, different users), a 60-second TTL typically cuts ClickHouse load by 80-90% without any visible staleness.

## Enriching analytics with your OLTP data

Sometimes you need to join analytics data in ClickHouse with reference data that lives in Postgres. ClickHouse's PostgreSQL integration engine lets you query Postgres tables directly from ClickHouse SQL without copying data:

```sql
-- Reference data via PostgreSQL engine (read-only)
CREATE TABLE pg_users
ENGINE = PostgreSQL('postgres:5432', 'mydb', 'users', 'ch_reader', 'password');

-- Join ClickHouse events with Postgres user data
SELECT
    e.event,
    u.plan,
    count() AS events
FROM user_events e
JOIN pg_users u ON e.user_id = u.id
WHERE e.timestamp >= now() - INTERVAL 7 DAY
GROUP BY e.event, u.plan;
```

This is useful for enriching analytics with plan tier, account status, or other dimensions that live in your transactional database and change over time.

## Schema migrations

ClickHouse supports adding nullable columns without a full table rewrite, but [schema migrations while streaming](https://www.tinybird.co/blog/clickhouse-schema-migration-while-streaming) require more care than a simple `ALTER TABLE`:

```sql
-- Safe: adding a new nullable column
ALTER TABLE user_events ADD COLUMN plan LowCardinality(Nullable(String));

-- Safe: adding a column with a default value
ALTER TABLE user_events ADD COLUMN sdk_version String DEFAULT '';
```

Renaming columns and changing types are more involved, ClickHouse doesn't have the same `ALTER COLUMN TYPE` semantics as Postgres. Plan your schema upfront. Use the JSON blob pattern for attributes that are still evolving, and promote hot fields to typed columns as access patterns stabilize.

Tinybird handles schema migrations automatically. When you change a `.datasource` file and run `tb deploy`, Tinybird executes the migration safely, adding or modifying columns without downtime.

## When to use managed ClickHouse

Running ClickHouse in production means managing: cluster provisioning and scaling, replication and failover, schema migrations across nodes, backups, monitoring, and alerting. For teams where analytics infrastructure isn't the core product, that's significant ongoing overhead. [Self-hosted ClickHouse cost](https://www.tinybird.co/blog/self-hosted-clickhouse-cost) adds up fast once you factor in engineering time.

Tinybird is managed ClickHouse with a built-in API layer. The Events API accepts JSON at 1K+ req/sec, no client library or batching code needed:

```typescript
// No ClickHouse client, no batching code
await fetch("https://api.tinybird.co/v0/events?name=user_events", {
  method:  "POST",
  headers: { Authorization: `Bearer ${process.env.TB_TOKEN}` },
  body:    JSON.stringify(event),
});
```

Your query becomes a parameterized endpoint your frontend can call directly, the pattern behind [build real-time apps](https://www.tinybird.co/blog/build-real-time-apps) without a custom backend layer:

```sql
-- org_metrics.pipe
NODE metrics
SQL >
  %
  SELECT
    toStartOfDay(timestamp)  AS day,
    event,
    count()                  AS total,
    uniq(user_id)            AS unique_users
  FROM user_events
  WHERE org_id    = {{ String(org_id, required=True) }}
    AND timestamp >= now() - INTERVAL {{ Int32(days, 30) }} DAY
  GROUP BY day, event
  ORDER BY day

TYPE ENDPOINT
```

`required=True` means the endpoint returns a 400 if `org_id` is missing, no accidental full-table scans. Auth, parameter validation, connection pooling, and HTTP serving are all handled. No backend route needed for your analytics layer.

{% cta
  title="Add analytics to your backend without the infra"
  text="Tinybird is managed ClickHouse® with an HTTP ingestion API and SQL endpoints. Ship in hours, not weeks."
  button={href: "https://cloud.tinybird.co/signup", target: "_blank", text: "Try Tinybird free"}
/%}
