Kafka is capable of producing millions of events per second, but those events only become useful when you can consume and query them. ClickHouse is a popular database for analyzing Kafka topic streams, and there are several ways to get Kafka data into ClickHouse: directly through its built-in Kafka table engine, or through managed connectors from Tinybird.
This guide walks through complete examples of connecting Kafka to ClickHouse, from basic table setup to production-ready streaming pipelines with materialized views and API endpoints.
What is the ClickHouse Kafka engine?
The Kafka table engine is a built-in ClickHouse feature that reads streaming data directly from Apache Kafka topics. It acts as a consumer that continuously pulls messages from Kafka and makes them queryable in ClickHouse without needing separate ETL tools or batch loading scripts.
Unlike batch ingestion, which loads data at scheduled intervals, the Kafka engine provides the continuous data flow that 90% of organizations consider important or very important for their analytics needs. Messages arrive in ClickHouse as soon as they're published to Kafka, so your analytics reflect what's happening right now rather than what happened hours ago, a capability that 59% of SMBs are already using for real-time analytics.
Here's how it works: you create a special table type that connects to your Kafka cluster and subscribes to one or more topics. When you query this table, ClickHouse reads the latest messages from Kafka. The data isn't stored in this table permanently though, so you'll typically use a materialized view to move it into a MergeTree table for long-term storage.
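To make that pattern concrete before walking through it step by step, here's a minimal sketch of the glue: a materialized view that reads from a Kafka engine table and writes into a MergeTree table. The table names are illustrative only; the quickstart below builds each piece with a full schema.

-- Illustrative sketch: events_kafka is a Kafka engine table and events is
-- a MergeTree table; the materialized view copies each batch of consumed
-- messages into permanent storage as it arrives.
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT * FROM events_kafka;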
Prerequisites for a Kafka-to-ClickHouse pipeline
Before connecting Kafka to ClickHouse, you'll want a few components in place. These requirements make sure your pipeline can establish connections, authenticate properly, and handle data flow between systems.
Kafka broker
You'll need a running Kafka instance that is network-accessible from your ClickHouse server. The Kafka broker handles message storage and delivery, and you'll need permissions to create topics and to produce and consume messages on them.
ClickHouse server or Tinybird workspace
You can use either a self-hosted ClickHouse installation or a managed ClickHouse service like Tinybird. Self-hosting gives you complete control but requires expertise in distributed systems, storage optimization, and performance tuning; meanwhile, 57.7% of organizations have already moved to cloud-based streaming analytics solutions. Tinybird provides a managed ClickHouse service that handles infrastructure automatically, letting you focus on building data pipelines rather than managing clusters.
Network and auth requirements
Your ClickHouse server requires network access to your Kafka brokers, which might mean configuring firewall rules or security groups. You'll also need Kafka connection strings, authentication credentials (if your cluster uses SASL or SSL), and any consumer group configurations your organization requires.
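Before creating any tables, it's worth confirming that the ClickHouse host can actually reach the broker. The hostname, port, and client.properties file below are placeholders for your own cluster details:

# Check that the broker port is reachable from the ClickHouse host
nc -zv kafka.example.com 9092

# List topics to confirm the broker accepts your credentials
# (client.properties is a placeholder file holding SASL/SSL settings, if required)
kafka-topics.sh --list \
  --bootstrap-server kafka.example.com:9092 \
  --command-config client.properties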
Quickstart example: streaming JSON from Kafka to ClickHouse
This example walks through the complete workflow for streaming JSON data from a Kafka topic into a ClickHouse table using the Kafka table engine. You'll create a Kafka topic, define the necessary ClickHouse tables, and verify that data flows correctly through the pipeline.
1. Create a Kafka topic
First, create a Kafka topic to hold your streaming data. This command creates a topic called user_events with a single partition:
kafka-topics.sh --create --topic user_events \
--bootstrap-server localhost:9092 \
--partitions 1 \
--replication-factor 1
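You can confirm the topic exists and check its partition count with the describe command:

kafka-topics.sh --describe --topic user_events \
  --bootstrap-server localhost:9092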
2. Create the Kafka engine table
The Kafka engine table acts as a consumer that reads from your topic. This table definition specifies the Kafka broker address, topic name, consumer group, and message format:
CREATE TABLE user_events_kafka (
user_id String,
event_type String,
timestamp DateTime64(3),
properties String
)
ENGINE = Kafka
SETTINGS
kafka_broker_list = 'localhost:9092',
kafka_topic_list = 'user_events',
kafka_group_name = 'clickhouse_consumer',
kafka_format = 'JSONEachRow';
The JSONEachRow format expects one JSON object per line, which is how most Kafka producers send data.
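For reference, messages for the schema above would look like this, with one JSON object per line and keys matching the column names:

{"user_id":"user_123","event_type":"page_view","timestamp":"2024-01-15 10:30:00.000","properties":"{}"}
{"user_id":"user_456","event_type":"click","timestamp":"2024-01-15 10:30:05.000","properties":"{\"button\":\"signup\"}"}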
3. Create the target MergeTree table
Data from the Kafka engine table requires a permanent home. Create a MergeTree table with the same schema to store your events:
CREATE TABLE user_events (
user_id String,
event_type String,
timestamp DateTime64(3),
properties String
)
ENGINE = MergeTree()
ORDER BY (event_type, timestamp);
The ORDER BY clause determines how ClickHouse sorts and stores data on disk, which affects query performance. Ordering by event_type and timestamp works well for queries that filter by event type and time range.
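For example, a query like this one only has to scan the rows for a single event type within a narrow time window (the one-hour window is just an illustration):

SELECT count() AS page_views
FROM user_events
WHERE event_type = 'page_view'
  AND timestamp >= now() - INTERVAL 1 HOUR;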
4. Insert sample messages
Push test messages to your Kafka topic using the Kafka console producer:
echo '{"user_id":"user_123","event_type":"page_view","timestamp":"2024-01-15 10:30:00.000","properties":"{}"}' | \
kafka-console-producer.sh --topic user_events --bootstrap-server localhost:9092
You can send multiple messages by repeating this command or by reading from a file containing one JSON object per line.
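For example, assuming a file named events.jsonl containing one JSON object per line, you can pipe the whole file into the producer:

# events.jsonl is a placeholder name for a file of newline-delimited JSON
kafka-console-producer.sh --topic user_events \
  --bootstrap-server localhost:9092 < events.jsonl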
5. Verify the data
Query the Kafka engine table to see the latest messages. This query reads directly from Kafka without writing anything to permanent storage. Note that recent ClickHouse versions disable direct SELECTs from streaming engines by default, so you may need to enable the stream_like_engine_allow_direct_select setting first:
SELECT * FROM user_events_kafka LIMIT 5;
Keep in mind that reading from the Kafka engine table consumes the messages for its consumer group, so each message only appears once and won't be returned by later queries. Once you set up a materialized view in the next section, messages will move to permanent storage automatically instead.
Alternative approach: Stream with HTTP
If you don't need the full complexity of Kafka but still want streaming ingestion, Tinybird's Events API provides a lightweight HTTP-based alternative. Instead of managing Kafka brokers, topics, and consumer groups, you can stream data directly to ClickHouse using standard HTTP POST requests.
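As a rough sketch, sending an event is a single HTTP request. The data source name, token variable, and API host below are placeholders, and the exact host depends on your workspace's region:

curl -X POST "https://api.tinybird.co/v0/events?name=user_events" \
  -H "Authorization: Bearer $TINYBIRD_TOKEN" \
  -d '{"user_id":"user_123","event_type":"page_view","timestamp":"2024-01-15 10:30:00.000"}'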