ClickHouse for backend engineers

Your Postgres database is slowing down under analytics queries. Your product team wants user-facing dashboards. Your data team wants to run cohort analysis. None of this belongs in your primary database.

ClickHouse® is where analytics workloads go. It is "a high-performance, column-oriented SQL database management system (DBMS) for online analytical processing (OLAP)", purpose-built for the aggregations, time-series queries, and GROUP BY operations that OLTP databases handle poorly.

This post is the integration guide for backend engineers who haven't used ClickHouse before: schema design, ingestion patterns, application-side querying, and when to use managed ClickHouse instead of running your own cluster.

How ClickHouse is different from Postgres

The key mental model shift: ClickHouse is not a general-purpose database. It's optimized for one specific workload, scanning large amounts of data and aggregating it fast.

	Postgres	ClickHouse
Storage format	Row-store	Columnar
Best for	OLTP: many small reads/writes	OLAP: large scans, aggregations
Updates/deletes	Fast	Expensive, async
JOINs	Strong	Use with care on large tables
Point lookups	Fast	Slower than Postgres
GROUP BY over millions of rows	Slow	Milliseconds
Compression ratio	~1:1	~5:10x on typical event data

Use Postgres for your application data: user accounts, orders, sessions, anything with frequent row-level writes. Use ClickHouse for analytics: event tracking, time-series metrics, anything you aggregate at query time. They're complementary, not competing.

Table design for event tracking

The most common ClickHouse pattern in a backend service is an event table: user actions, service errors, or system metrics.

CREATE TABLE user_events
(
    timestamp    DateTime,
    user_id      String,
    org_id       String,
    event        LowCardinality(String),
    properties   String,           -- JSON blob for flexible attributes
    duration_ms  Nullable(UInt32),
    error        Nullable(String)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (org_id, user_id, timestamp);

Three decisions that matter most:

LowCardinality(String) on event. Event names are bounded, a few hundred at most. LowCardinality dictionary-encodes them internally, making GROUP BY and WHERE on that column 2-4x faster than plain String, with much better compression.

Partition by month, sort by (org_id, user_id, timestamp). Multi-tenant SaaS almost always filters by org first. Putting org_id first in the sort key means per-org queries scan only that org's rows within a partition. Queries for a specific user within an org use both prefix columns.

properties as a JSON string. Flexible schemas in ClickHouse work best with a JSON blob for variable attributes. Access fields with JSONExtractString(properties, 'key') at query time. For fields you query often, extract them into typed columns after your access patterns stabilize, it's a cheap ALTER TABLE ... ADD COLUMN.

Always specify only the columns you need and avoid SELECT * in production queries. Wide tables with unused columns waste I/O on every scan.

Writing events: HTTP ingestion

ClickHouse exposes an HTTP interface. Use the official @clickhouse/client Node.js package rather than raw fetch, it handles connection pooling, query retries, and format encoding:

npm install @clickhouse/client

import { createClient } from "@clickhouse/client";

const client = createClient({
  host:     process.env.CLICKHOUSE_URL,
  database: "analytics",
  username: "writer",
  password: process.env.CLICKHOUSE_PASSWORD,
  max_open_connections: 10,
});

async function trackEvent(event: UserEvent) {
  await client.insert({
    table:  "user_events",
    values: [event],
    format: "JSONEachRow",
  });
}

In practice, batching is more efficient. ClickHouse processes inserts in "parts", many tiny inserts create many small parts and strain the background merge process. Batch to at least 1,000 rows per insert or flush on a time interval:

class EventBuffer {
  private buffer: UserEvent[] = [];
  private timer: NodeJS.Timeout;

  constructor(private readonly flushMs = 1000) {
    this.timer = setInterval(() => this.flush(), this.flushMs);
  }

  push(event: UserEvent) {
    this.buffer.push(event);
    if (this.buffer.length >= 1000) this.flush();
  }

  private async flush() {
    if (this.buffer.length === 0) return;
    const batch = this.buffer.splice(0);
    await client.insert({ table: "user_events", values: batch, format: "JSONEachRow" });
  }
}

Writing events: Kafka ingestion

If you're already running Kafka, Kafka to ClickHouse covers the full setup. The table engine consumes directly from a topic. The standard pattern uses three components: a Kafka engine table (the queue), a MergeTree destination (the storage), and a materialized view that connects them:

-- 1. Kafka engine: queue only, not storage
CREATE TABLE user_events_kafka
(
    timestamp    DateTime,
    user_id      String,
    org_id       String,
    event        String,
    properties   String
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list  = 'kafka:9092',
    kafka_topic_list   = 'user-events',
    kafka_group_name   = 'ch-analytics-consumer',
    kafka_format       = 'JSONEachRow';

-- 2. MergeTree destination (real storage)
CREATE TABLE user_events (...) ENGINE = MergeTree ...;

-- 3. Materialized view: fires on every consumed batch
CREATE MATERIALIZED VIEW user_events_mv TO user_events AS
SELECT * FROM user_events_kafka;

Once the view exists, ClickHouse begins consuming automatically and persisting rows into user_events. No application-side consumer code required.

Key Kafka engine parameters:

kafka_broker_list: comma-separated host:port list
kafka_topic_list: topic names to consume
kafka_group_name: consumer group ID, ClickHouse tracks offsets per group
kafka_format: JSONEachRow, Avro, ProtobufSingle, CSV, etc.
kafka_num_consumers: number of consumer threads per table

Monitor consumer lag via system.kafka_consumers for operational visibility.

Querying from application code

Use the @clickhouse/client query API with parameterized queries to avoid injection:

async function getOrgMetrics(orgId: string, days: number) {
  const result = await client.query({
    query: `
      SELECT
        toStartOfDay(timestamp) AS day,
        event,
        count()                 AS total,
        uniq(user_id)           AS unique_users
      FROM user_events
      WHERE org_id    = {org_id:String}
        AND timestamp >= now() - INTERVAL {days:UInt32} DAY
      GROUP BY day, event
      ORDER BY day
    `,
    query_params: { org_id: orgId, days },
    format: "JSONEachRow",
  });

  return result.json();
}

ClickHouse's {name:Type} parameterization validates types before the query reaches the database, passing a string for days returns an error at the API layer, not a runtime exception from ClickHouse.

Caching for user-facing APIs

For dashboard APIs serving many users, cache ClickHouse responses at the application layer. A simple Redis TTL cache in front of repeated identical queries reduces load without sacrificing freshness:

const CACHE_TTL = 60; // seconds

async function getCachedMetrics(orgId: string, days: number) {
  const cacheKey = `metrics:${orgId}:${days}`;
  const cached   = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  const data = await getOrgMetrics(orgId, days);
  await redis.setex(cacheKey, CACHE_TTL, JSON.stringify(data));
  return data;
}

For queries powering public-facing dashboards that many users share (same parameters, different users), a 60-second TTL typically cuts ClickHouse load by 80-90% without any visible staleness.

Enriching analytics with your OLTP data

Sometimes you need to join analytics data in ClickHouse with reference data that lives in Postgres. ClickHouse's PostgreSQL integration engine lets you query Postgres tables directly from ClickHouse SQL without copying data:

-- Reference data via PostgreSQL engine (read-only)
CREATE TABLE pg_users
ENGINE = PostgreSQL('postgres:5432', 'mydb', 'users', 'ch_reader', 'password');

-- Join ClickHouse events with Postgres user data
SELECT
    e.event,
    u.plan,
    count() AS events
FROM user_events e
JOIN pg_users u ON e.user_id = u.id
WHERE e.timestamp >= now() - INTERVAL 7 DAY
GROUP BY e.event, u.plan;

This is useful for enriching analytics with plan tier, account status, or other dimensions that live in your transactional database and change over time.

Schema migrations

ClickHouse supports adding nullable columns without a full table rewrite, but schema migrations while streaming require more care than a simple ALTER TABLE:

-- Safe: adding a new nullable column
ALTER TABLE user_events ADD COLUMN plan LowCardinality(Nullable(String));

-- Safe: adding a column with a default value
ALTER TABLE user_events ADD COLUMN sdk_version String DEFAULT '';

Renaming columns and changing types are more involved, ClickHouse doesn't have the same ALTER COLUMN TYPE semantics as Postgres. Plan your schema upfront. Use the JSON blob pattern for attributes that are still evolving, and promote hot fields to typed columns as access patterns stabilize.

Tinybird handles schema migrations automatically. When you change a .datasource file and run tb deploy, Tinybird executes the migration safely, adding or modifying columns without downtime.

When to use managed ClickHouse

Running ClickHouse in production means managing: cluster provisioning and scaling, replication and failover, schema migrations across nodes, backups, monitoring, and alerting. For teams where analytics infrastructure isn't the core product, that's significant ongoing overhead. Self-hosted ClickHouse cost adds up fast once you factor in engineering time.

Tinybird is managed ClickHouse with a built-in API layer. The Events API accepts JSON at 1K+ req/sec, no client library or batching code needed:

// No ClickHouse client, no batching code
await fetch("https://api.tinybird.co/v0/events?name=user_events", {
  method:  "POST",
  headers: { Authorization: `Bearer ${process.env.TB_TOKEN}` },
  body:    JSON.stringify(event),
});

Your query becomes a parameterized endpoint your frontend can call directly, the pattern behind build real-time apps without a custom backend layer:

-- org_metrics.pipe
NODE metrics
SQL >
  %
  SELECT
    toStartOfDay(timestamp)  AS day,
    event,
    count()                  AS total,
    uniq(user_id)            AS unique_users
  FROM user_events
  WHERE org_id    = {{ String(org_id, required=True) }}
    AND timestamp >= now() - INTERVAL {{ Int32(days, 30) }} DAY
  GROUP BY day, event
  ORDER BY day

TYPE ENDPOINT

required=True means the endpoint returns a 400 if org_id is missing, no accidental full-table scans. Auth, parameter validation, connection pooling, and HTTP serving are all handled. No backend route needed for your analytics layer.

Add analytics to your backend without the infra

Tinybird is managed ClickHouse® with an HTTP ingestion API and SQL endpoints. Ship in hours, not weeks.

Try Tinybird free

How ClickHouse is different from Postgres

The key mental model shift: ClickHouse is not a general-purpose database. It's optimized for one specific workload, scanning large amounts of data and aggregating it fast.

	Postgres	ClickHouse
Storage format	Row-store	Columnar
Best for	OLTP: many small reads/writes	OLAP: large scans, aggregations
Updates/deletes	Fast	Expensive, async
JOINs	Strong	Use with care on large tables
Point lookups	Fast	Slower than Postgres
GROUP BY over millions of rows	Slow	Milliseconds
Compression ratio	~1:1	~5:10x on typical event data

Table design for event tracking

The most common ClickHouse pattern in a backend service is an event table: user actions, service errors, or system metrics.

CREATE TABLE user_events
(
    timestamp    DateTime,
    user_id      String,
    org_id       String,
    event        LowCardinality(String),
    properties   String,           -- JSON blob for flexible attributes
    duration_ms  Nullable(UInt32),
    error        Nullable(String)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (org_id, user_id, timestamp);

Three decisions that matter most:

Always specify only the columns you need and avoid SELECT * in production queries. Wide tables with unused columns waste I/O on every scan.

Writing events: HTTP ingestion

ClickHouse exposes an HTTP interface. Use the official @clickhouse/client Node.js package rather than raw fetch, it handles connection pooling, query retries, and format encoding:

npm install @clickhouse/client

import { createClient } from "@clickhouse/client";

const client = createClient({
  host:     process.env.CLICKHOUSE_URL,
  database: "analytics",
  username: "writer",
  password: process.env.CLICKHOUSE_PASSWORD,
  max_open_connections: 10,
});

async function trackEvent(event: UserEvent) {
  await client.insert({
    table:  "user_events",
    values: [event],
    format: "JSONEachRow",
  });
}

class EventBuffer {
  private buffer: UserEvent[] = [];
  private timer: NodeJS.Timeout;

  constructor(private readonly flushMs = 1000) {
    this.timer = setInterval(() => this.flush(), this.flushMs);
  }

  push(event: UserEvent) {
    this.buffer.push(event);
    if (this.buffer.length >= 1000) this.flush();
  }

  private async flush() {
    if (this.buffer.length === 0) return;
    const batch = this.buffer.splice(0);
    await client.insert({ table: "user_events", values: batch, format: "JSONEachRow" });
  }
}

Writing events: Kafka ingestion

-- 1. Kafka engine: queue only, not storage
CREATE TABLE user_events_kafka
(
    timestamp    DateTime,
    user_id      String,
    org_id       String,
    event        String,
    properties   String
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list  = 'kafka:9092',
    kafka_topic_list   = 'user-events',
    kafka_group_name   = 'ch-analytics-consumer',
    kafka_format       = 'JSONEachRow';

-- 2. MergeTree destination (real storage)
CREATE TABLE user_events (...) ENGINE = MergeTree ...;

-- 3. Materialized view: fires on every consumed batch
CREATE MATERIALIZED VIEW user_events_mv TO user_events AS
SELECT * FROM user_events_kafka;

Once the view exists, ClickHouse begins consuming automatically and persisting rows into user_events. No application-side consumer code required.

Key Kafka engine parameters:

kafka_broker_list: comma-separated host:port list
kafka_topic_list: topic names to consume
kafka_group_name: consumer group ID, ClickHouse tracks offsets per group
kafka_format: JSONEachRow, Avro, ProtobufSingle, CSV, etc.
kafka_num_consumers: number of consumer threads per table

Monitor consumer lag via system.kafka_consumers for operational visibility.

Querying from application code

Use the @clickhouse/client query API with parameterized queries to avoid injection:

async function getOrgMetrics(orgId: string, days: number) {
  const result = await client.query({
    query: `
      SELECT
        toStartOfDay(timestamp) AS day,
        event,
        count()                 AS total,
        uniq(user_id)           AS unique_users
      FROM user_events
      WHERE org_id    = {org_id:String}
        AND timestamp >= now() - INTERVAL {days:UInt32} DAY
      GROUP BY day, event
      ORDER BY day
    `,
    query_params: { org_id: orgId, days },
    format: "JSONEachRow",
  });

  return result.json();
}

Caching for user-facing APIs

For dashboard APIs serving many users, cache ClickHouse responses at the application layer. A simple Redis TTL cache in front of repeated identical queries reduces load without sacrificing freshness:

const CACHE_TTL = 60; // seconds

async function getCachedMetrics(orgId: string, days: number) {
  const cacheKey = `metrics:${orgId}:${days}`;
  const cached   = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);

  const data = await getOrgMetrics(orgId, days);
  await redis.setex(cacheKey, CACHE_TTL, JSON.stringify(data));
  return data;
}

For queries powering public-facing dashboards that many users share (same parameters, different users), a 60-second TTL typically cuts ClickHouse load by 80-90% without any visible staleness.

Enriching analytics with your OLTP data

-- Reference data via PostgreSQL engine (read-only)
CREATE TABLE pg_users
ENGINE = PostgreSQL('postgres:5432', 'mydb', 'users', 'ch_reader', 'password');

-- Join ClickHouse events with Postgres user data
SELECT
    e.event,
    u.plan,
    count() AS events
FROM user_events e
JOIN pg_users u ON e.user_id = u.id
WHERE e.timestamp >= now() - INTERVAL 7 DAY
GROUP BY e.event, u.plan;

This is useful for enriching analytics with plan tier, account status, or other dimensions that live in your transactional database and change over time.

Schema migrations

ClickHouse supports adding nullable columns without a full table rewrite, but schema migrations while streaming require more care than a simple ALTER TABLE:

-- Safe: adding a new nullable column
ALTER TABLE user_events ADD COLUMN plan LowCardinality(Nullable(String));

-- Safe: adding a column with a default value
ALTER TABLE user_events ADD COLUMN sdk_version String DEFAULT '';

Tinybird handles schema migrations automatically. When you change a .datasource file and run tb deploy, Tinybird executes the migration safely, adding or modifying columns without downtime.

When to use managed ClickHouse

Tinybird is managed ClickHouse with a built-in API layer. The Events API accepts JSON at 1K+ req/sec, no client library or batching code needed:

// No ClickHouse client, no batching code
await fetch("https://api.tinybird.co/v0/events?name=user_events", {
  method:  "POST",
  headers: { Authorization: `Bearer ${process.env.TB_TOKEN}` },
  body:    JSON.stringify(event),
});

Your query becomes a parameterized endpoint your frontend can call directly, the pattern behind build real-time apps without a custom backend layer:

-- org_metrics.pipe
NODE metrics
SQL >
  %
  SELECT
    toStartOfDay(timestamp)  AS day,
    event,
    count()                  AS total,
    uniq(user_id)            AS unique_users
  FROM user_events
  WHERE org_id    = {{ String(org_id, required=True) }}
    AND timestamp >= now() - INTERVAL {{ Int32(days, 30) }} DAY
  GROUP BY day, event
  ORDER BY day

TYPE ENDPOINT

Add analytics to your backend without the infra

Tinybird is managed ClickHouse® with an HTTP ingestion API and SQL endpoints. Ship in hours, not weeks.

Try Tinybird free

Skip the infra work. Deploy your first ClickHouse^®
project now.

Blog

Skip the infra work. Deploy your first ClickHouse^®
project now.

ClickHouse for backend engineers

How ClickHouse is different from Postgres

Table design for event tracking

Writing events: HTTP ingestion

Writing events: Kafka ingestion

Querying from application code

Caching for user-facing APIs

Enriching analytics with your OLTP data

Schema migrations

When to use managed ClickHouse

Add analytics to your backend without the infra

Skip the infra work. Deploy your first ClickHouse project now

Skip the infra work. Deploy your first ClickHouse^®
project now.

ClickHouse for backend engineers

How ClickHouse is different from Postgres

Table design for event tracking

Writing events: HTTP ingestion

Writing events: Kafka ingestion

Querying from application code

Caching for user-facing APIs

Enriching analytics with your OLTP data

Schema migrations

When to use managed ClickHouse

Add analytics to your backend without the infra

Skip the infra work. Deploy your first ClickHouse project now

Skip the infra work. Deploy your first ClickHouse® project now.

Blog

Skip the infra work. Deploy your first ClickHouse® project now.

ClickHouse for backend engineers

How ClickHouse is different from Postgres

Table design for event tracking

Writing events: HTTP ingestion

Writing events: Kafka ingestion

Querying from application code

Caching for user-facing APIs

Enriching analytics with your OLTP data

Schema migrations

When to use managed ClickHouse

Add analytics to your backend without the infra

Skip the infra work. Deploy your first ClickHouse project now

Skip the infra work. Deploy your first ClickHouse® project now.

ClickHouse for backend engineers

How ClickHouse is different from Postgres

Table design for event tracking

Writing events: HTTP ingestion

Writing events: Kafka ingestion

Querying from application code

Caching for user-facing APIs

Enriching analytics with your OLTP data

Schema migrations

When to use managed ClickHouse

Add analytics to your backend without the infra

Skip the infra work. Deploy your first ClickHouse project now

Skip the infra work. Deploy your first ClickHouse^®
project now.

Skip the infra work. Deploy your first ClickHouse^®
project now.

Skip the infra work. Deploy your first ClickHouse^®
project now.