---
title: "How to build real-time APIs with ClickHouse® using Tinybird"
excerpt: "Build real-time APIs on ClickHouse with Tinybird. Skip the infrastructure work and ship analytics endpoints in hours."
authors: "Cameron Archer"
categories: "AI Resources"
createdOn: "2025-11-07 17:08:23"
publishedOn: "2025-11-07 17:08:23"
updatedOn: "2025-11-07 17:08:23"
status: "published"
---

If you're building an analytics feature into a user-facing application or SaaS, you'll likely need an API route to query and serve data from the database. When those APIs need to handle high concurrency, complex aggregations, and sub-second latency at scale, ClickHouse® becomes the natural choice for your database.

Building these APIs involves more than just exposing ClickHouse® SQL queries over HTTP. This guide walks through the complete process: ingesting streaming data, data modeling for fast queries, writing parameterized SQL, and deploying secure endpoints using Tinybird's [managed ClickHouse® platform](https://www.tinybird.co/product/managed-clickhouse).

## Why ClickHouse® excels at real-time analytics

Building real-time APIs with ClickHouse® means exposing analytical queries through HTTP endpoints that applications can call. ClickHouse® is a great [database for real-time analytics](https://www.tinybird.co/blog/best-database-for-real-time-analytics). It stores data in columns rather than rows, reads only the columns a query needs, and processes data in batches using SIMD instructions. This columnar format combined with vectorized execution makes ClickHouse® fast at aggregations and filters, which are the operations most analytics APIs perform.

ClickHouse® automatically compresses data based on column characteristics, often achieving 10x to 100x compression ratios. Smaller data sizes mean fewer disk reads, which translates directly to faster query execution. For APIs serving millions of requests daily, this compression reduces both infrastructure costs and response times.

The database handles high-concurrency reads on billion-row datasets without performance degradation. Each query runs independently in a shared-nothing architecture, avoiding the lock contention that slows down transactional databases during high traffic.

## Tinybird for managed ClickHouse®

If your end goal with ClickHouse® is an API that can serve user-facing anlaytics features, Tinybird might be a great managed ClickHouse® choice. It offers developer-focused abstractions that allow you to build ClickHouse®-based API endpoints with zero infra setup. You can learn more about Tinybird's approach to ClickHouse® in our [comparison guide between Tinybird and OSS ClickHouse®](https://www.tinybird.co/blog/tinybird-vs-clickhouse). For teams evaluating whether to self-host or use a managed service, our [analysis of self-hosted ClickHouse® costs](https://www.tinybird.co/blog-posts/self-hosted-clickhouse-cost) provides a detailed breakdown of infrastructure and engineering expenses.

## Prerequisities for using Tinybird

You'll need a Tinybird account and the Tinybird CLI installed locally to follow along. [Tinybird](https://tinybird.co) provides a managed ClickHouse® platform that handles infrastructure setup while offering local development through Docker.

### 1. Install the CLI

Install the Tinybird CLI using the installation script:

```bash
curl -L https://tinybird.co | sh
```

Verify the installation by checking the version:

```bash
tb --version
```

### 2. Authenticate and create a workspace

Log in to Tinybird or create a free account:

```bash
tb login
```

The command opens a browser window for authentication. You'll be prompted to created a cloud workspace, which is a logical collection of resources. After logging in, the CLI stores credentials locally.

### 3. Start the local container

Next, start Tinybird Local, a compact version of Tinybird that runs in a Docker container.

```bash
tb local start
```

## Step-by-step pipeline to ingest streaming data

Real-time APIs require continuous data ingestion from streaming sources. Tinybird provides two primary ways to stream data: Kafka connections and the Events API.

### Connect a Kafka topic to Tinybird

For those with existing Kafka infrastructure, the Kafka connector provides a managed ingestion connector that handles[consumer creation, schema validation, streaming ingestion](https://www.tinybird.co/blog/kafka-to-api-endpoints) automatically without any additional infrastructure.

Create a Kafka connector in Tinybird by specifying broker addresses and authentication credentials. Navigate to the Tinybird UI and select "add data source" then "Kafka". You can also use the CLI:

```bash
tb connection create kafka
```

This kicks off a setup wizard that ultimately creates a `.connection` file that will look something like this:

```tinybird
TYPE kafka
KAFKA_BOOTSTRAP_SERVERS <BOOTSTRAP_SERVERS:PORT>
KAFKA_SECURITY_PROTOCOL SASL_SSL
KAFKA_SASL_MECHANISM PLAIN
KAFKA_KEY {{ tb_secret("KAFKA_KEY", "key") }}
KAFKA_SECRET {{ tb_secret("KAFKA_SECRET", "secret") }}
```

Note the use of Tinybird's Environment Variables to store and retrieve secrets - useful for authenticating the connection across different environments.

Then, you create a data source file to consume events from the Kafka stream and define their storage schema:

```tinybird
SCHEMA >
   `timestamp` DateTime(3) `json:$.timestamp`,
   `session_id` String `json:$.session_id`,
   `action` LowCardinality(String) `json:$.action`,
   `version` LowCardinality(String) `json:$.version`,
   `payload` String `json:$.payload`,
   `data` String `json:$`

KAFKA_CONNECTION_NAME kafka_sample # The name of the .connection file
KAFKA_TOPIC test_topic
KAFKA_GROUP_ID {{ tb_secret("KAFKA_GROUP_ID") }}
```

Once the connection and data source are created and deployed, Tinybird validates the connection and begins consuming messages from the specified topic.

### Stream to Tinybird with HTTP

For those without existing Kafka setups, the Tinybird Events API provides a lightweight HTTP streaming endpoint that accepts 1,000 req/s at 20 MB/s limit. You just POST a JSON payload to the endpoint and Tinybird handles writing it to the database.

```bash
curl \
    -H "Authorization: Bearer <import_token>" \
    -d $'{"date": "2020-04-05 00:05:38", "city": "Chicago"}\n{"date": "2020-04-05 00:07:22", "city": "Madrid"}\n' \
    'https://api.us-west-2.aws.tinybird.co/v0/events?name=events_test'
```

Similarly, you'd create a `.datasource` file that defines the schema for incoming events from the Events API, setting JSONPaths to define how the incoming JSON messages get parsed and stored. This file also specifies column names, data types, table engine and other table configuration:

```tinybird
SCHEMA >
    `event_id` String `json:$.event.id`,
    `user_id` String `json:$.user.id`,
    `event_type` String `json:$.event.type`,
    `timestamp` DateTime64(3) `json:$.timestamp`,
    `payload` String `json:$`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp, user_id"
ENGINE_TTL "toDateTime(timestamp) + toIntervalDay(90)"
```

The partition key organizes data by month. The sorting key optimizes queries that filter by timestamp or user\_id, which are common patterns in API queries.

### Define your API endpoint

In Tinybird, pipes define SQL transformations that can be automatically deployed as API endpoints. Pipes can be broken into multiple nodes and include support for templating language to define logic and query parameters:

```tinybird
TOKEN "analytics_read" READ -- creates a read token for this API

NODE aggregate_user_events
SQL >
    %
    SELECT
        user_id,
        event_type,
        count() as event_count,
        uniq(event_id) as unique_events,
        min(timestamp) as first_event,
        max(timestamp) as last_event,
        round(avg(length(payload)), 2) as avg_payload_size
    FROM user_events
    WHERE timestamp >= {{ DateTime(date_from) }} -- query params
      AND timestamp < {{ DateTime(date_to) }}
      {% if defined(user_id) %}
        AND user_id = {{ String(user_id) }}
      {% else %}
        {{ error('user_id must be defined') }} -- API error logic
      {% end %}
      {% if defined(event_type) %}
        AND event_type IN {{ Array(event_type, 'String') }}
      {% end %}
    GROUP BY user_id
    ORDER BY event_count DESC
    LIMIT {{ Int32(limit, 100) }} -- dynamic limit

TYPE endpoint -- publish as an API endpoint
```

### Validate locally

Test your project configuration before deploying to production:

```bash
tb dev
```

This builds and validates your project locally (with hot reload on change)

You can generate sample data and append it to your data source:

```bash
tb mock user_events --rows 1000
tb datasource append user_events fixtures/user_events.ndjson
```

You can then fetch your API locally and validate results:

```bash
tb token copy "analytics_read" && EXPORT READ_TOKEN=$(pbpaste)
curl -H "Authorization: Bearer $READ_TOKEN" -d http://localhost:7181/v0/pipes/my_api.json?user_id=123&date_from=2025-10-01&date_to=2025-11-01
```

### Deploy

Finally, deploy to the cloud:

```bash
tb --cloud deploy
```

You'll now have a hosted and scalable real-time API endpoint you can fetch from your application:

```javascript
const TB_HOST = process.env.TB_HOST;
const READ_TOKEN = process.env.READ_TOKEN;

async function fetchMyApi(params = {}) {
  const queryString = new URLSearchParams(params).toString();
  const url = `${TB_HOST}/v0/pipes/my_api.json?${queryString}`;
  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${READ_TOKEN}` }
  });
  if (!res.ok) {
    throw new Error(`Request failed: ${res.status} ${res.statusText}`);
  }
  return res.json();
}

// Example usage
fetchMyApi({ user_id: 123, date_from: '2025-10-01', date_to: '2025-11-01' })
  .then(data => console.log(data))
  .catch(err => console.error(err));
```

## Modeling data for low-latency queries

Schema design directly impacts API performance. The table engine, partition strategy, and indexes determine whether queries return in milliseconds or seconds.

### Choosing the right table engine

The MergeTree family offers several engines optimized for different use cases:

* `MergeTree`: The default choice for most analytical workloads, providing fast inserts and efficient queries
* `ReplacingMergeTree`: Deduplicates rows with the same sorting key, useful for upsert patterns where you want only the latest version of each record
* `AggregatingMergeTree`: Pre-aggregates data during merges, reducing query time for dashboards that always show the same aggregations

For user event tracking, `MergeTree` works well:

```sql
ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (user_id, timestamp)
```

For user profiles that update frequently, `ReplacingMergeTree` prevents duplicates:

```sql
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY user_id
```

### Partitioning by time and id

Partitioning splits large tables into smaller physical chunks that ClickHouse® can prune during queries. Time-based partitioning works well for event data because queries often filter by date ranges.

Partition by month for tables with moderate write volumes:

```sql
PARTITION BY toYYYYMM(timestamp)
```

For [high-volume tables](https://www.tinybird.co/blog/real-time-analytics-with-billion-rows-at-scale), you could partition by day:

```sql
PARTITION BY toDate(timestamp)
```

Avoid creating thousands of small partitions. ClickHouse® tracks metadata for each partition, and too many partitions degrade performance. A good guideline is keeping partitions between 10GB and 100GB after compression.

### Prevent heavy scans with filters

Always include filters on indexed columns to avoid full table scans. Queries without proper filters can timeout or consume excessive resources.

Add a `LIMIT` clause as an additional safety measure:

```tinybird
LIMIT {{Int32(limit, 100)}}
```

This prevents accidentally returning millions of rows when a query is too broad. For production APIs, consider adding a maximum limit to prevent abuse.

### Auto-generated OpenAPI docs

Tinybird automatically generates OpenAPI documentation for your endpoints. Access the docs in the UI or export them for integration with API management tools. The documentation includes parameter descriptions, response schemas, and example requests.

## Monitoring API performance with service data sources

Tinybird provides built-in observability through observability UIs as well as service data sources that automatically log every API request, data source operation, and system metric. These queryable logs enable performance monitoring, cost tracking, and debugging without external tools.

### Real-time API statistics with pipe_stats_rt

The `tinybird.pipe_stats_rt` Service Data Source logs every API request in real-time and retains data for the last week. Query this data source to analyze endpoint performance:

```tinybird
SELECT
    pipe_name,
    count() as request_count,
    avg(duration) as avg_latency_seconds,
    quantile(0.95)(duration) as p95_latency_seconds,
    sum(read_bytes) as total_bytes_read,
    sum(read_rows) as total_rows_read
FROM tinybird.pipe_stats_rt
WHERE start_datetime >= now() - interval 1 hour
  AND pipe_name = 'aggregate_user_events'
GROUP BY pipe_name
```

This query shows request volume, average latency, and p95 latency for the last hour. The `duration` field measures total request time in seconds, including query execution and data transfer.

### Monitoring data source operations

The `tinybird.datasources_ops_log` tracks all operations on your data sources, including appends, deletes, and materialized view populates. Monitor ingestion performance and identify failed operations:

```tinybird
SELECT
    datasource_name,
    event_type,
    result,
    count() as operation_count,
    avg(elapsed_time) as avg_elapsed_seconds,
    sum(rows) as total_rows_affected
FROM tinybird.datasources_ops_log
WHERE timestamp >= now() - interval 24 hour
  AND result = 'error'
GROUP BY datasource_name, event_type, result
ORDER BY operation_count DESC
```

The `elapsed_time` field measures operation duration in seconds. The `error` field contains detailed error messages when `result = 'error'`, which helps debug failed ingestions or materializations.

### Tracking vCPU consumption

Monitor compute usage by querying CPU time across pipes and data sources. This helps identify resource-intensive operations and optimize costs:

```tinybird
SELECT
    pipe_name,
    count() as request_count,
    round(sum(cpu_time), 2) as total_cpu_seconds,
    round(avg(cpu_time), 4) as avg_cpu_per_request
FROM tinybird.pipe_stats_rt
WHERE start_datetime >= now() - interval 7 day
GROUP BY pipe_name
ORDER BY total_cpu_seconds DESC
LIMIT 10
```

For data source operations, check `datasources_ops_log`:

```tinybird
SELECT
    datasource_name,
    event_type,
    round(sum(cpu_time), 2) as total_cpu_seconds,
    count() as total_operations
FROM tinybird.datasources_ops_log
WHERE timestamp >= now() - interval 7 day
GROUP BY datasource_name, event_type
ORDER BY total_cpu_seconds DESC
```

The `cpu_time` field measures actual CPU seconds consumed, which directly impacts vCPU hour billing on Developer and Enterprise plans.

### Building custom monitoring endpoints

Create pipes that query service data sources to build custom monitoring dashboards. This endpoint monitors error rates across all APIs:

```tinybird
NODE error_rate_by_endpoint
SQL >
    SELECT
        pipe_name,
        countIf(error = 1) as error_count,
        count() as total_requests,
        round(error_count / total_requests * 100, 2) as error_rate_percent
    FROM tinybird.pipe_stats_rt
    WHERE start_datetime >= now() - interval 1 hour
    GROUP BY pipe_name
    HAVING error_rate_percent > 0
    ORDER BY error_rate_percent DESC

TYPE endpoint
```

Deploy this as an API endpoint and query it from external monitoring tools or set up alerts based on the response.

## When to choose managed Tinybird over self-hosting ClickHouse®

Self-hosting ClickHouse® gives you full control but requires significant DevOps investment. Managed services trade some flexibility for faster time to value and reduced operational burden.

### Time to value for small teams

Setting up a production-ready ClickHouse® cluster involves configuring replication, backups, monitoring, and security. This process typically takes weeks or months for teams unfamiliar with ClickHouse® internals. Tinybird provides a working environment in minutes, letting developers focus on building features rather than managing infrastructure.

Small engineering teams often lack dedicated DevOps resources. Managed services eliminate the need to hire specialists or train existing engineers on ClickHouse® operations.

### Built-in autoscaling and backups

Tinybird automatically scales compute resources based on query load, preventing over-provisioning during low traffic and maintaining availability during spikes. This elasticity reduces costs compared to maintaining fixed-size clusters that handle peak load.

Automated backups run continuously without manual configuration. Point-in-time recovery lets you restore data to any moment in the past, protecting against accidental deletions or data corruption.

[Sign up for a free Tinybird account](https://cloud.tinybird.co/signup) to start building ClickHouse®-powered APIs without infrastructure management. The free tier includes 10GB of storage and 10 million API requests per month.

## Frequently asked questions about building real-time APIs with ClickHouse®

### How do I migrate existing data from self-hosted ClickHouse® to Tinybird?

Use the `clickhouse-client` to export data as NDJSON, then import it using `tb datasource append`. For large datasets, export in batches to avoid memory issues. Tinybird also supports direct ClickHouse®-to-ClickHouse® replication for ongoing sync during migration.

### Can I invoke Tinybird endpoints from serverless functions like AWS Lambda?

Yes, Tinybird endpoints are standard HTTPS APIs that work with any HTTP client. Include your API token in the request header for authentication. Serverless functions can call Tinybird APIs just like any other REST service.

### What happens if my API query takes longer than the timeout limit?

Tinybird enforces a 10-second timeout for API queries by default. Queries exceeding this limit return a timeout error. Optimize slow queries by adding indexes, using projections, or pre-aggregating data with materialized views.

/
