If you're building an analytics feature into a user-facing application or SaaS, you'll likely need an API route to query and serve data from the database. When those APIs need to handle high concurrency, complex aggregations, and sub-second latency at scale, ClickHouse becomes the natural choice for your database.
Building these APIs involves more than just exposing ClickHouse SQL queries over HTTP. This guide walks through the complete process: ingesting streaming data, data modeling for fast queries, writing parameterized SQL, and deploying secure endpoints using Tinybird's managed ClickHouse platform.
Why ClickHouse excels at real-time analytics
Building real-time APIs with ClickHouse means exposing analytical queries through HTTP endpoints that applications can call. ClickHouse is a great database for real-time analytics. It stores data in columns rather than rows, reads only the columns a query needs, and processes data in batches using SIMD instructions. This columnar format combined with vectorized execution makes ClickHouse fast at aggregations and filters, which are the operations most analytics APIs perform.
ClickHouse automatically compresses data based on column characteristics, often achieving 10x to 100x compression ratios. Smaller data sizes mean fewer disk reads, which translates directly to faster query execution. For APIs serving millions of requests daily, this compression reduces both infrastructure costs and response times.
The database handles high-concurrency reads on billion-row datasets without performance degradation. Each query runs independently in a shared-nothing architecture, avoiding the lock contention that slows down transactional databases during high traffic.
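To make that concrete, here's the shape of query this design favors, using a hypothetical page_views table that isn't part of this guide's project: ClickHouse reads only the referenced columns and aggregates them in vectorized batches.
SELECT
    toStartOfHour(timestamp) AS hour,
    country,
    count() AS views,
    uniq(user_id) AS unique_users
FROM page_views
WHERE timestamp >= now() - INTERVAL 1 DAY
GROUP BY hour, country
ORDER BY hour, views DESC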
Tinybird for managed ClickHouse
If your end goal with ClickHouse is an API that serves user-facing analytics features, Tinybird might be a great managed ClickHouse choice. It offers developer-focused abstractions that let you build ClickHouse-based API endpoints with zero infrastructure setup. You can learn more about Tinybird's approach to ClickHouse in our comparison guide between Tinybird and OSS ClickHouse. For teams evaluating whether to self-host or use a managed service, our analysis of self-hosted ClickHouse costs provides a detailed breakdown of infrastructure and engineering expenses.
Prerequisites for using Tinybird
You'll need a Tinybird account and the Tinybird CLI installed locally to follow along. Tinybird provides a managed ClickHouse platform that handles infrastructure setup while offering local development through Docker.
1. Install the CLI
Install the Tinybird CLI using the installation script:
curl -L https://tinybird.co | sh
Verify the installation by checking the version:
tb --version
2. Authenticate and create a workspace
Log in to Tinybird or create a free account:
tb login
The command opens a browser window for authentication. You'll be prompted to create a cloud workspace, which is a logical collection of resources. After logging in, the CLI stores credentials locally.
3. Start the local container
Next, start Tinybird Local, a compact version of Tinybird that runs in a Docker container.
tb local start
Step-by-step pipeline to ingest streaming data
Real-time APIs require continuous data ingestion from streaming sources. Tinybird provides two primary ways to stream data: Kafka connections and the Events API.
Connect a Kafka topic to Tinybird
If you already have Kafka infrastructure, the Kafka connector provides managed ingestion that handles consumer creation, schema validation, and streaming ingestion automatically, without any additional infrastructure.
Create a Kafka connector in Tinybird by specifying broker addresses and authentication credentials. Navigate to the Tinybird UI and select "add data source" then "Kafka". You can also use the CLI:
tb connection create kafka
This kicks off a setup wizard that ultimately creates a .connection file that will look something like this:
TYPE kafka
KAFKA_BOOTSTRAP_SERVERS <BOOTSTRAP_SERVERS:PORT>
KAFKA_SECURITY_PROTOCOL SASL_SSL
KAFKA_SASL_MECHANISM PLAIN
KAFKA_KEY {{ tb_secret("KAFKA_KEY", "key") }}
KAFKA_SECRET {{ tb_secret("KAFKA_SECRET", "secret") }}
Note the use of Tinybird's environment variables to store and retrieve secrets, which is useful for authenticating the connection across different environments.
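Those secrets need to exist in each environment before you deploy. Here's a minimal sketch of setting them, assuming your version of the CLI includes a tb secret command (check tb --help if yours differs):
# Set secrets in the local environment
tb secret set KAFKA_KEY "your-kafka-key"
tb secret set KAFKA_SECRET "your-kafka-secret"

# Set the same secrets in the cloud workspace before deploying there
tb --cloud secret set KAFKA_KEY "your-kafka-key"
tb --cloud secret set KAFKA_SECRET "your-kafka-secret"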
Then, you create a data source file to consume events from the Kafka stream and define their storage schema:
SCHEMA >
`timestamp` DateTime64(3) `json:$.timestamp`,
`session_id` String `json:$.session_id`,
`action` LowCardinality(String) `json:$.action`,
`version` LowCardinality(String) `json:$.version`,
`payload` String `json:$.payload`,
`data` String `json:$`
KAFKA_CONNECTION_NAME kafka_sample # The name of the .connection file
KAFKA_TOPIC test_topic
KAFKA_GROUP_ID {{ tb_secret("KAFKA_GROUP_ID") }}
Once the connection and data source are created and deployed, Tinybird validates the connection and begins consuming messages from the specified topic.
Stream to Tinybird with HTTP
For those without an existing Kafka setup, the Tinybird Events API provides a lightweight HTTP streaming endpoint that accepts up to 1,000 requests per second at 20 MB/s. You just POST JSON events to the endpoint and Tinybird handles writing them to the database.
curl \
-H "Authorization: Bearer <import_token>" \
-d $'{"date": "2020-04-05 00:05:38", "city": "Chicago"}\n{"date": "2020-04-05 00:07:22", "city": "Madrid"}\n' \
'https://api.us-west-2.aws.tinybird.co/v0/events?name=events_test'
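The same request from application code is just an HTTP POST with newline-delimited JSON. Here's a minimal sketch using Node 18+ fetch; the host, data source name, and TB_APPEND_TOKEN environment variable are placeholders you'd swap for your own values:
const TB_HOST = 'https://api.us-west-2.aws.tinybird.co';
const APPEND_TOKEN = process.env.TB_APPEND_TOKEN;

async function sendEvents(events) {
  // The Events API expects newline-delimited JSON in the request body
  const body = events.map(e => JSON.stringify(e)).join('\n');
  const res = await fetch(`${TB_HOST}/v0/events?name=events_test`, {
    method: 'POST',
    headers: { Authorization: `Bearer ${APPEND_TOKEN}` },
    body
  });
  if (!res.ok) throw new Error(`Ingest failed: ${res.status}`);
  return res.json();
}

// Example usage
sendEvents([
  { date: '2020-04-05 00:05:38', city: 'Chicago' },
  { date: '2020-04-05 00:07:22', city: 'Madrid' }
]).catch(err => console.error(err));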
Similarly, you'd create a .datasource file that defines the schema for incoming events from the Events API, using JSONPaths to define how the incoming JSON messages get parsed and stored. This file also specifies column names, data types, the table engine, and other table settings:
SCHEMA >
`event_id` String `json:$.event.id`,
`user_id` String `json:$.user.id`,
`event_type` String `json:$.event.type`,
`timestamp` DateTime64(3) `json:$.timestamp`,
`payload` String `json:$`
ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp, user_id"
ENGINE_TTL "toDateTime(timestamp) + toIntervalDay(90)"
The partition key organizes data by month. The sorting key optimizes queries that filter by timestamp or user_id, which are common patterns in API queries.
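For example, a query shaped like the one below (with illustrative values) only touches the October 2025 partition and uses the primary index on timestamp to skip granules outside the range:
SELECT count() AS events
FROM user_events
WHERE timestamp >= toDateTime64('2025-10-01 00:00:00', 3)
  AND timestamp < toDateTime64('2025-11-01 00:00:00', 3)
  AND user_id = 'user_123'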
Define your API endpoint
In Tinybird, pipes define SQL transformations that can be automatically deployed as API endpoints. Pipes can be broken into multiple nodes and support a templating language for defining logic and query parameters:
TOKEN "analytics_read" READ -- creates a read token for this API
NODE aggregate_user_events
SQL >
%
SELECT
user_id,
event_type,
count() as event_count,
uniq(event_id) as unique_events,
min(timestamp) as first_event,
max(timestamp) as last_event,
round(avg(length(payload)), 2) as avg_payload_size
FROM user_events
WHERE timestamp >= {{ DateTime(date_from) }} -- query params
AND timestamp < {{ DateTime(date_to) }}
{% if not defined(user_id) %}
{{ error('user_id must be defined') }} -- API error logic
{% end %}
AND user_id = {{ String(user_id) }}
AND event_type IN {{ Array(event_type, 'String') }}
GROUP BY user_id, event_type
ORDER BY event_count DESC
LIMIT {{ Int32(limit, 100) }} -- dynamic limit
TYPE endpoint -- publish as an API endpoint
Validate locally
Test your project configuration before deploying to production:
tb dev
This builds and validates your project locally (with hot reload on change).
You can generate sample data and append it to your data source:
tb mock user_events --rows 1000
tb datasource append user_events fixtures/user_events.ndjson
You can then fetch your API locally and validate results:
tb token copy "analytics_read" && export READ_TOKEN=$(pbpaste)
curl -H "Authorization: Bearer $READ_TOKEN" "http://localhost:7181/v0/pipes/my_api.json?user_id=123&date_from=2025-10-01&date_to=2025-11-01"
Deploy
Finally, deploy to the cloud:
tb --cloud deploy
You'll now have a hosted and scalable real-time API endpoint you can fetch from your application:
const TB_HOST = process.env.TB_HOST;
const READ_TOKEN = process.env.READ_TOKEN;
async function fetchMyApi(params = {}) {
const queryString = new URLSearchParams(params).toString();
const url = `${TB_HOST}/v0/pipes/my_api.json?${queryString}`;
const res = await fetch(url, {
headers: { Authorization: `Bearer ${READ_TOKEN}` }
});
if (!res.ok) {
throw new Error(`Request failed: ${res.status} ${res.statusText}`);
}
return res.json();
}
// Example usage
fetchMyApi({ user_id: 123, date_from: '2025-10-01', date_to: '2025-11-01' })
.then(data => console.log(data))
.catch(err => console.error(err));
Modeling data for low-latency queries
Schema design directly impacts API performance. The table engine, partition strategy, and indexes determine whether queries return in milliseconds or seconds.
Choosing the right table engine
The MergeTree family offers several engines optimized for different use cases:
MergeTree: The default choice for most analytical workloads, providing fast inserts and efficient queries
ReplacingMergeTree: Deduplicates rows with the same sorting key, useful for upsert patterns where you want only the latest version of each record
AggregatingMergeTree: Pre-aggregates data during merges, reducing query time for dashboards that always show the same aggregations
For user event tracking, MergeTree works well:
ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (user_id, timestamp)
For user profiles that update frequently, ReplacingMergeTree prevents duplicates:
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY user_id
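Keep in mind that ReplacingMergeTree only deduplicates when background merges run, so duplicates can still appear at query time. One common workaround is FINAL, which forces deduplication during the read at some performance cost (user_profiles here is a placeholder name for such a table):
SELECT *
FROM user_profiles FINAL
WHERE user_id = 'user_123'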
Partitioning by time and id
Partitioning splits large tables into smaller physical chunks that ClickHouse can prune during queries. Time-based partitioning works well for event data because queries often filter by date ranges.
Partition by month for tables with moderate write volumes:
PARTITION BY toYYYYMM(timestamp)
For high-volume tables, you could partition by day:
PARTITION BY toDate(timestamp)
Avoid creating thousands of small partitions. ClickHouse tracks metadata for each partition, and too many partitions degrade performance. A good guideline is keeping partitions between 10GB and 100GB after compression.
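If you have direct SQL access to your ClickHouse instance, you can sanity-check partition counts and sizes with a query against system.parts (replace the table name with your own):
SELECT
    partition,
    count() AS parts,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE table = 'user_events' AND active
GROUP BY partition
ORDER BY partition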
Prevent heavy scans with filters
Always include filters on indexed columns to avoid full table scans. Queries without proper filters can timeout or consume excessive resources.
Add a LIMIT clause as an additional safety measure:
LIMIT {{ Int32(limit, 100) }}
This prevents accidentally returning millions of rows when a query is too broad. For production APIs, consider adding a maximum limit to prevent abuse.
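One way to enforce that ceiling is to validate the parameter in the pipe's template before it reaches the query. This is a sketch assuming the template engine accepts comparisons on typed parameters inside an if block; the 1000-row cap is an arbitrary example:
%
{% if Int32(limit, 100) > 1000 %}
{{ error('limit must be 1000 or less') }}
{% end %}
SELECT user_id, event_type, count() AS event_count
FROM user_events
GROUP BY user_id, event_type
ORDER BY event_count DESC
LIMIT {{ Int32(limit, 100) }}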
Auto-generated OpenAPI docs
Tinybird automatically generates OpenAPI documentation for your endpoints. Access the docs in the UI or export them for integration with API management tools. The documentation includes parameter descriptions, response schemas, and example requests.
Monitoring API performance with service data sources
Tinybird provides built-in observability through its UI as well as service data sources that automatically log every API request, data source operation, and system metric. These queryable logs enable performance monitoring, cost tracking, and debugging without external tools.
Real-time API statistics with pipe_stats_rt
The tinybird.pipe_stats_rt Service Data Source logs every API request in real time and retains data for the last week. Query this data source to analyze endpoint performance:
SELECT
pipe_name,
count() as request_count,
avg(duration) as avg_latency_seconds,
quantile(0.95)(duration) as p95_latency_seconds,
sum(read_bytes) as total_bytes_read,
sum(read_rows) as total_rows_read
FROM tinybird.pipe_stats_rt
WHERE start_datetime >= now() - interval 1 hour
AND pipe_name = 'aggregate_user_events'
GROUP BY pipe_name
This query shows request volume, average latency, and p95 latency for the last hour. The duration field measures total request time in seconds, including query execution and data transfer.
Monitoring data source operations
The tinybird.datasources_ops_log tracks all operations on your data sources, including appends, deletes, and materialized view populates. Monitor ingestion performance and identify failed operations:
SELECT
datasource_name,
event_type,
result,
count() as operation_count,
avg(elapsed_time) as avg_elapsed_seconds,
sum(rows) as total_rows_affected
FROM tinybird.datasources_ops_log
WHERE timestamp >= now() - interval 24 hour
AND result = 'error'
GROUP BY datasource_name, event_type, result
ORDER BY operation_count DESC
The elapsed_time field measures operation duration in seconds. The error field contains detailed error messages when result = 'error', which helps debug failed ingestions or materializations.
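For example, to pull the most recent error messages directly, using the same fields described above:
SELECT timestamp, datasource_name, event_type, error
FROM tinybird.datasources_ops_log
WHERE result = 'error'
AND timestamp >= now() - interval 24 hour
ORDER BY timestamp DESC
LIMIT 20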
Tracking vCPU consumption
Monitor compute usage by querying CPU time across pipes and data sources. This helps identify resource-intensive operations and optimize costs:
SELECT
pipe_name,
count() as request_count,
round(sum(cpu_time), 2) as total_cpu_seconds,
round(avg(cpu_time), 4) as avg_cpu_per_request
FROM tinybird.pipe_stats_rt
WHERE start_datetime >= now() - interval 7 day
GROUP BY pipe_name
ORDER BY total_cpu_seconds DESC
LIMIT 10
For data source operations, check datasources_ops_log:
SELECT
datasource_name,
event_type,
round(sum(cpu_time), 2) as total_cpu_seconds,
count() as total_operations
FROM tinybird.datasources_ops_log
WHERE timestamp >= now() - interval 7 day
GROUP BY datasource_name, event_type
ORDER BY total_cpu_seconds DESC
The cpu_time field measures actual CPU seconds consumed, which directly impacts vCPU hour billing on Developer and Enterprise plans.
Building custom monitoring endpoints
Create pipes that query service data sources to build custom monitoring dashboards. This endpoint monitors error rates across all APIs:
NODE error_rate_by_endpoint
SQL >
SELECT
pipe_name,
countIf(error = 1) as error_count,
count() as total_requests,
round(error_count / total_requests * 100, 2) as error_rate_percent
FROM tinybird.pipe_stats_rt
WHERE start_datetime >= now() - interval 1 hour
GROUP BY pipe_name
HAVING error_rate_percent > 0
ORDER BY error_rate_percent DESC
TYPE endpoint
Deploy this as an API endpoint and query it from external monitoring tools or set up alerts based on the response.
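As a sketch, a small script (Node 18+) could poll that endpoint on a schedule and forward any rows it returns to a webhook such as Slack. It assumes the node above is published as a pipe named error_rate_by_endpoint and that ALERT_WEBHOOK_URL is a webhook you supply:
const TB_HOST = process.env.TB_HOST;
const READ_TOKEN = process.env.READ_TOKEN;
const ALERT_WEBHOOK_URL = process.env.ALERT_WEBHOOK_URL;

async function checkErrorRates() {
  const res = await fetch(`${TB_HOST}/v0/pipes/error_rate_by_endpoint.json`, {
    headers: { Authorization: `Bearer ${READ_TOKEN}` }
  });
  if (!res.ok) throw new Error(`Monitoring query failed: ${res.status}`);
  const { data } = await res.json();

  // Any row returned means an endpoint had errors in the last hour
  for (const row of data) {
    await fetch(ALERT_WEBHOOK_URL, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        text: `${row.pipe_name}: ${row.error_rate_percent}% errors (${row.error_count}/${row.total_requests})`
      })
    });
  }
}

checkErrorRates().catch(err => console.error(err));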
When to choose managed Tinybird over self-hosting ClickHouse
Self-hosting ClickHouse gives you full control but requires significant DevOps investment. Managed services trade some flexibility for faster time to value and reduced operational burden.
Time to value for small teams
Setting up a production-ready ClickHouse cluster involves configuring replication, backups, monitoring, and security. This process typically takes weeks or months for teams unfamiliar with ClickHouse internals. Tinybird provides a working environment in minutes, letting developers focus on building features rather than managing infrastructure.
Small engineering teams often lack dedicated DevOps resources. Managed services eliminate the need to hire specialists or train existing engineers on ClickHouse operations.
Built-in autoscaling and backups
Tinybird automatically scales compute resources based on query load, preventing over-provisioning during low traffic and maintaining availability during spikes. This elasticity reduces costs compared to maintaining fixed-size clusters that handle peak load.
Automated backups run continuously without manual configuration. Point-in-time recovery lets you restore data to any moment in the past, protecting against accidental deletions or data corruption.
Sign up for a free Tinybird account to start building ClickHouse-powered APIs without infrastructure management. The free tier includes 10GB of storage and 10 million API requests per month.
Frequently asked questions about building real-time APIs with ClickHouse
How do I migrate existing data from self-hosted ClickHouse to Tinybird?
Use the clickhouse-client to export data as NDJSON, then import it using tb datasource append. For large datasets, export in batches to avoid memory issues. Tinybird also supports direct ClickHouse-to-ClickHouse replication for ongoing sync during migration.
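A minimal sketch of that workflow, assuming a source table named events and the user_events data source from this guide:
# Export from self-hosted ClickHouse as newline-delimited JSON
clickhouse-client --query "SELECT * FROM events FORMAT JSONEachRow" > events.ndjson

# Append the file to the Tinybird data source
tb datasource append user_events events.ndjson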
Can I invoke Tinybird endpoints from serverless functions like AWS Lambda?
Yes, Tinybird endpoints are standard HTTPS APIs that work with any HTTP client. Include your API token in the request header for authentication. Serverless functions can call Tinybird APIs just like any other REST service.
What happens if my API query takes longer than the timeout limit?
Tinybird enforces a 10-second timeout for API queries by default. Queries exceeding this limit return a timeout error. Optimize slow queries by adding indexes, using projections, or pre-aggregating data with materialized views.
