ClickHouse® + Elasticsearch — 3 Ways to Connect
These are the main options for building an Elasticsearch to ClickHouse® pipeline:
- Tinybird
- ClickHouse® Cloud + ClickPipes (Kafka source)
- Self-managed or custom (Kafka bridge, Logstash, or batch via scroll API)
Elasticsearch is a distributed search and analytics engine used for full-text search, log analytics, observability, and APM. Many teams want a copy of that data in ClickHouse® for columnar analytical queries, reporting, and real-time dashboards at scale.
An Elasticsearch to ClickHouse® setup uses Logstash, Kafka bridging, or the Elasticsearch scroll API to move data into ClickHouse® for real-time analytics without putting analytical load on Elasticsearch.
Below we compare all three options in depth—architecture, real configuration examples, trade-offs, and when to use each.
Looking for minimal ops and instant APIs?
Tinybird combines managed ingestion from your Logstash→Kafka or Logstash→HTTP path, managed ClickHouse®, and one-click API publishing from SQL—no Kafka engine to operate yourself.
Three ways to integrate Elasticsearch with ClickHouse®
Below are the three options to connect Elasticsearch to ClickHouse®, in order.
Option 1: Tinybird — managed ClickHouse® with API layer
Tinybird is a real-time data platform built on ClickHouse®. It combines ingestion, storage, and API publishing in one product.
How it works: configure a Logstash pipeline with an HTTP or Kafka output that forwards documents to Tinybird. The http output sends documents directly to the Tinybird Events API. The kafka output writes to a topic consumed by the Tinybird Kafka connector.
Data lands in Tinybird's ClickHouse®-backed data sources. You define Pipes (SQL) and publish them as REST endpoints.
Logstash pipeline → Tinybird Events API:
```conf
input {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    index => "logs-*"
    user => "elastic"
    password => "${ES_PASSWORD}"
    schedule => "* * * * *"
    query => '{"query": {"range": {"@timestamp": {"gte": "now-1m"}}}}'
  }
}

filter {
  mutate {
    rename => { "@timestamp" => "ts" }
    remove_field => ["@version", "tags"]
  }
}

output {
  http {
    url => "https://api.tinybird.co/v0/events?name=es_logs"
    http_method => "post"
    headers => { "Authorization" => "Bearer ${TB_TOKEN}" }
    format => "json_batch"
    codec => "json"
  }
}
```
When Tinybird fits:
- You want to get Elasticsearch data into ClickHouse® with minimal ops
- You need APIs and real-time dashboards from the same data
- You prefer an Elasticsearch to ClickHouse® pipeline with an API layer built in
Prerequisites: Logstash 8.x with the elasticsearch input and either http or kafka output. Data flows: Elasticsearch → Logstash → Kafka or HTTP → Tinybird. You run Elasticsearch and Logstash; Tinybird runs ingestion, storage, and API publishing.
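If you want to smoke-test the Events API leg without standing up Logstash, the same path can be exercised in a few lines of Python. This is a minimal sketch, not a production ingester: the `es_logs` data source name and the `TB_TOKEN` environment variable are assumptions matching the config above, and the Events API accepts newline-delimited JSON.

```python
import json
import os
import urllib.request

def to_ndjson(rows):
    """Serialize a batch of dicts as newline-delimited JSON, the shape the Events API ingests."""
    return "\n".join(json.dumps(r) for r in rows).encode("utf-8")

def send_to_tinybird(rows, datasource="es_logs"):
    # Assumes TB_TOKEN holds a Tinybird token with append scope for the data source
    req = urllib.request.Request(
        f"https://api.tinybird.co/v0/events?name={datasource}",
        data=to_ndjson(rows),
        headers={"Authorization": f"Bearer {os.environ['TB_TOKEN']}"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

if __name__ == "__main__":
    batch = [{"ts": "2026-01-01 00:00:00", "level": "info", "message": "hello"}]
    print(to_ndjson(batch).decode())
```

In production you would batch a few hundred rows per request rather than posting one document at a time.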
Option 2: ClickHouse® Cloud + ClickPipes (Kafka source)
ClickHouse® Cloud's ClickPipes supports Kafka as a data source. There is no native Elasticsearch connector.
How it works: configure a Logstash pipeline with a Kafka output that writes documents to a topic. Then create a Kafka ClickPipe in the ClickHouse® Cloud console pointing to your broker and topic.
Logstash pipeline → Kafka:
```conf
input {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    index => "logs-*"
    schedule => "* * * * *"
  }
}

output {
  kafka {
    bootstrap_servers => "kafka:9092"
    topic_id => "es-logs"
    codec => "json"
  }
}
```
ClickHouse® Cloud destination table:
```sql
CREATE TABLE es_logs
(
    ts DateTime,
    index_name LowCardinality(String),
    level LowCardinality(String),
    service LowCardinality(String),
    message String,
    doc_id String
)
ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(ts)
ORDER BY (index_name, level, ts, doc_id);
```
When it fits:
- You want managed ClickHouse® with a Kafka-based ingestion path
- You're already on ClickHouse® Cloud and can configure Logstash→Kafka
- You'll add your own API or BI layer on top of ClickHouse® Cloud
Prerequisites: Logstash with a Kafka output. Data flows: Elasticsearch → Logstash → Kafka → ClickPipes → ClickHouse® Cloud.
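If Logstash is not an option, the Elasticsearch → Kafka leg can also be scripted. A sketch, assuming the kafka-python package, the broker address, and the topic name from the config above; the message-shaping function is the part the ClickPipe schema depends on, so the producer call is isolated behind a function you only call when a broker is reachable:

```python
import json

def doc_to_message(doc):
    """Flatten an Elasticsearch hit into the JSON message the Kafka ClickPipe consumes."""
    src = doc.get("_source", {})
    return json.dumps({
        "ts": src.get("@timestamp"),
        "index_name": doc.get("_index", ""),
        "level": src.get("level", ""),
        "service": src.get("service", ""),
        "message": src.get("message", ""),
        "doc_id": doc.get("_id", ""),
    }).encode("utf-8")

def publish(docs, topic="es-logs", bootstrap="kafka:9092"):
    # Requires the kafka-python package and a reachable broker
    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers=bootstrap)
    for doc in docs:
        producer.send(topic, doc_to_message(doc))
    producer.flush()
```

Field names here must match the ClickHouse® Cloud destination table column for column, since ClickPipes maps JSON keys to columns.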
Option 3: Self-managed or custom (Kafka, Logstash, or batch scroll API)
With self-managed ClickHouse®, the streaming pattern uses Logstash → Kafka → Kafka table engine → materialized view → MergeTree.
ClickHouse® Kafka table engine + materialized view:
```sql
-- Kafka engine reads from the topic
CREATE TABLE es_logs_kafka
(
    ts DateTime,
    index_name String,
    level String,
    service String,
    message String,
    doc_id String
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'kafka:9092',
    kafka_topic_list = 'es-logs',
    kafka_group_name = 'clickhouse_es_consumer',
    kafka_format = 'JSONEachRow';

-- Target MergeTree table
CREATE TABLE es_logs
(
    ts DateTime,
    index_name LowCardinality(String),
    level LowCardinality(String),
    service LowCardinality(String),
    message String,
    doc_id String
)
ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(ts)
ORDER BY (index_name, level, ts, doc_id);

-- Materialized view moves rows from the Kafka engine into MergeTree
CREATE MATERIALIZED VIEW es_logs_mv TO es_logs AS
SELECT * FROM es_logs_kafka;
```
Batch sync via the scroll API (for initial load or periodic sync):
```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan
import clickhouse_driver

es = Elasticsearch("https://localhost:9200", basic_auth=("elastic", "password"))
ch = clickhouse_driver.Client("localhost")

rows = []
for doc in scan(es, index="logs-*", query={"query": {"match_all": {}}}):
    src = doc["_source"]
    rows.append({
        "ts": src.get("@timestamp"),
        "index_name": doc["_index"],
        "level": src.get("level", ""),
        "service": src.get("service", ""),
        "message": src.get("message", ""),
        "doc_id": doc["_id"],
    })
    if len(rows) >= 1000:  # flush in batches of 1000
        ch.execute("INSERT INTO es_logs VALUES", rows)
        rows.clear()

if rows:  # flush the remainder
    ch.execute("INSERT INTO es_logs VALUES", rows)
```
When it fits:
- You already run ClickHouse® and Kafka and want full control
- You have data-engineering capacity to manage the full stack
- Batch scroll export works when sub-minute freshness is not required
Decision framework: which option fits your situation
The right choice depends on four variables: freshness requirement, team capacity, whether you need an API layer, and cost tolerance.
| Situation | Recommended option |
|---|---|
| Real-time analytics, need REST APIs, minimal ops | Tinybird |
| Already on ClickHouse® Cloud, have Kafka + Logstash | ClickPipes + Kafka |
| Self-managed ClickHouse® + Kafka, full control | Self-managed |
| One-time migration or periodic batch sync | Batch scroll API |
| Observability platform with long-retention analytics | Tinybird or self-managed |
Choose Tinybird when you need real-time data ingestion from Elasticsearch and REST APIs from the same data—without operating ClickHouse® infrastructure.
Choose ClickPipes when you're already on ClickHouse® Cloud and have Logstash→Kafka running. You get managed ClickHouse® ingestion but build your own serving layer.
Choose self-managed when your team is comfortable operating Logstash, Kafka, and ClickHouse® and needs full schema and configuration control.
Summary table
| Option | Ingestion path | API layer | Ops burden | Kafka required |
|---|---|---|---|---|
| Tinybird | Logstash HTTP or Kafka | Built in (Pipes) | Low | No |
| ClickHouse® Cloud ClickPipes | Logstash → Kafka | Build your own | Medium | Yes |
| Self-managed | Logstash → Kafka or batch scroll | Build your own | High | Depends |
What is Elasticsearch and why integrate it with ClickHouse®?
Elasticsearch as the data source
Elasticsearch is a distributed search engine built on Lucene. It provides an HTTP REST API, inverted-index storage, and a flexible JSON document model. Teams use it for full-text search, log ingestion, observability (APM traces, metrics), and Kibana dashboards.
For analytical workloads at scale—long-retention analytics, aggregations across millions of documents, or user-facing analytics—running heavy Elasticsearch aggregations becomes expensive and impacts search performance.
Sending a copy of that data to ClickHouse® gives you a dedicated analytical store: columnar, optimized for large scans and low latency aggregations, without affecting Elasticsearch query performance.
How to get data out of Elasticsearch
Elasticsearch does not have a native ClickHouse® output. Data moves out via Logstash pipelines (streaming) or the scroll API (batch).
The elasticsearch input plugin supports scheduled queries with query DSL. The kafka output writes documents to Kafka topics. The http output sends documents to HTTP endpoints (e.g. Tinybird Events API).
For batch: the scroll API or point-in-time + search_after supports paginated export by time range. The Python elasticsearch.helpers.scan() helper abstracts the scroll protocol, processing documents in configurable batch sizes.
Why route Elasticsearch data to ClickHouse®
An Elasticsearch to ClickHouse® pipeline lets you run analytical queries over Elasticsearch data at scale without impacting search performance or scaling Elasticsearch for analytical loads.
ClickHouse®'s columnar storage achieves 10x–100x compression compared to Elasticsearch's inverted-index model for typical log data. Aggregation queries that take seconds in Elasticsearch often run in milliseconds in ClickHouse®.
Schema and pipeline design
Mapping Elasticsearch documents to ClickHouse® tables
Elasticsearch document data typically includes @timestamp, _index, level, service, message, and nested JSON fields.
A well-designed ClickHouse® target schema:
```sql
CREATE TABLE es_logs
(
    ts DateTime,
    index_name LowCardinality(String),  -- Elasticsearch index name
    level LowCardinality(String),       -- log/error/warn
    service LowCardinality(String),     -- source service
    message String,
    doc_id String,                      -- Elasticsearch _id for dedup
    extra String                        -- JSON for variable fields
)
ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(ts)
ORDER BY (index_name, service, level, ts, doc_id);
```
Use LowCardinality(String) for low-cardinality fields (level, index, service) to improve compression and filtering speed. Use ReplacingMergeTree with doc_id for deduplication.
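ReplacingMergeTree's collapse behavior is easy to misread, so here is a small Python sketch of what a SELECT ... FINAL read returns when there is no version column: among rows sharing the same ORDER BY key, the last-inserted row wins. This only simulates the semantics; real deduplication happens inside ClickHouse® during background merges.

```python
def replacing_final(rows, key_cols):
    """Simulate SELECT ... FINAL on a ReplacingMergeTree without a version column:
    for duplicate ORDER BY keys, the last-inserted row survives."""
    latest = {}
    for row in rows:  # insertion order stands in for ClickHouse part order
        latest[tuple(row[c] for c in key_cols)] = row
    return list(latest.values())

rows = [
    {"index_name": "logs", "level": "error", "ts": "t1", "doc_id": "a", "message": "v1"},
    {"index_name": "logs", "level": "error", "ts": "t1", "doc_id": "a", "message": "v2"},  # retry duplicate
    {"index_name": "logs", "level": "info", "ts": "t2", "doc_id": "b", "message": "ok"},
]
deduped = replacing_final(rows, ["index_name", "level", "ts", "doc_id"])
```

The retried document collapses to a single row, which is why including doc_id in the ORDER BY key makes at-least-once delivery safe.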
Handling large indices and schema drift
For batch scroll of large indices, chunk by time range using search_after with a point-in-time (PIT):
```python
# Open a point-in-time so pagination sees a consistent snapshot
pit = es.open_point_in_time(index="logs-*", keep_alive="5m")
pit_id = pit["id"]

search_after = None
while True:
    body = {
        "query": {"range": {"@timestamp": {"gte": "2026-01-01", "lt": "2026-02-01"}}},
        "sort": [{"@timestamp": "asc"}, {"_id": "asc"}],
        "size": 1000,
        "pit": {"id": pit_id, "keep_alive": "5m"},
    }
    if search_after:
        body["search_after"] = search_after
    result = es.search(body=body)
    hits = result["hits"]["hits"]
    if not hits:
        break
    # process hits...
    search_after = hits[-1]["sort"]

# Release the point-in-time when done
es.close_point_in_time(id=pit_id)
```
Schema drift: new Elasticsearch fields require updating the ClickHouse® schema. Plan for nullable columns (Nullable(String)) for optional fields, or a catch-all extra String JSON column.
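One way to make the catch-all extra column concrete: split each document into the fixed columns the table knows about and serialize everything else as a JSON string. A sketch; the KNOWN field set is an assumption matching the schema above, and new fields survive in extra until you promote them to real columns.

```python
import json

# Fields with dedicated columns in the es_logs schema (assumption)
KNOWN = {"@timestamp", "level", "service", "message"}

def split_row(doc):
    """Map known fields to typed columns; stash unanticipated fields in `extra` as JSON."""
    src = doc["_source"]
    return {
        "ts": src.get("@timestamp"),
        "index_name": doc["_index"],
        "level": src.get("level", ""),
        "service": src.get("service", ""),
        "message": src.get("message", ""),
        "doc_id": doc["_id"],
        "extra": json.dumps({k: v for k, v in src.items() if k not in KNOWN}),
    }
```

At query time the stashed fields stay reachable via ClickHouse® JSON functions such as JSONExtractString(extra, 'trace_id').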
Failure modes to plan for
- Duplicates from retries: Logstash at-least-once delivery can re-send documents. Use _id as the deduplication key with ReplacingMergeTree.
- Pipeline lag: monitor Kafka consumer lag and define a freshness SLA. Alert when lag exceeds your threshold.
- Large index scans: for batch scroll, avoid unbounded queries. Always filter by time range and use search_after + PIT for large indices.
- Elasticsearch query load: Logstash input queries add load to Elasticsearch. Schedule during off-peak hours or use a dedicated replica node.
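The freshness-SLA check above reduces to comparing the newest ts in ClickHouse® against the wall clock. A sketch of the decision logic only; the 120-second SLA and the query that would fetch max_ts are assumptions:

```python
from datetime import datetime, timedelta, timezone

def freshness_breach(max_ts, sla_seconds=120, now=None):
    """Return True when the newest ingested row is older than the freshness SLA."""
    now = now or datetime.now(timezone.utc)
    return (now - max_ts) > timedelta(seconds=sla_seconds)

# In production max_ts would come from something like:
#   clickhouse_driver.Client("localhost").execute("SELECT max(ts) FROM es_logs")[0][0]
```

Run it on a schedule and page when it returns True; the same function works for Kafka consumer-lag timestamps.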
Why ClickHouse® for Elasticsearch analytics
ClickHouse® is a columnar OLAP database built for analytical queries over large volumes. MergeTree tables and vectorized execution deliver sub-second queries on billions of rows.
ClickHouse®'s columnar compression reduces I/O for log and event analytics by 10x–100x compared to Elasticsearch's inverted-index model. For the same log data, ClickHouse® stores more data, queries faster, and costs less for analytical workloads.
An Elasticsearch to ClickHouse® setup fits real-time analytics, long-retention log analytics, and user-facing analytics, without scaling Elasticsearch to meet analytical demand.
Why Tinybird is the best Elasticsearch to ClickHouse® option
Most teams don't need to operate ClickHouse® infrastructure—they need fast analytics on Elasticsearch data exposed as APIs.
Tinybird is purpose-built for this. You configure a Logstash pipeline that sends data to Tinybird's Events API or Kafka connector. Tinybird handles ingestion, storage, and API publishing in one product.
You avoid operating the ClickHouse® Kafka engine and the overhead of maintaining a separate API layer. Define Pipes in SQL, publish as REST endpoints, and serve dashboards or product features with sub-100ms latency and automatic scaling.
The Logstash http output replaces the need for Kafka entirely: documents go directly from Elasticsearch → Logstash → Tinybird.
Frequently Asked Questions (FAQs)
Does ClickHouse® Cloud support Elasticsearch natively?
ClickHouse® Cloud does not have a native Elasticsearch connector in ClickPipes. You use the Kafka data source: configure a Logstash pipeline with a Kafka output, then create a Kafka ClickPipe to load into ClickHouse® Cloud.
You operate the Elasticsearch → Logstash → Kafka path; ClickHouse® Cloud ingests from Kafka.
Can I use Tinybird for Elasticsearch to ClickHouse® without Kafka?
Yes. Use the Logstash HTTP output to send documents directly to the Tinybird Events API. No Kafka cluster required.
Tinybird stores data in ClickHouse®-backed data sources and lets you publish Pipes as REST APIs. You own the Logstash pipeline; Tinybird is the destination and API layer.
How do I handle document deduplication when integrating Elasticsearch with ClickHouse®?
Use the Elasticsearch _id field as your deduplication key in ClickHouse®. Configure ReplacingMergeTree where the version column is derived from a monotonic timestamp or update counter.
With Logstash at-least-once delivery, the same document may arrive more than once. ReplacingMergeTree collapses duplicates during background merges. Use FINAL at query time for strong consistency.
What is the difference between Logstash streaming and scroll API batch sync?
Logstash streaming sends documents as they are indexed in Elasticsearch—near real-time (seconds latency). This keeps ClickHouse® data fresh but adds continuous load to Elasticsearch.
Scroll API batch sync exports documents on a schedule (e.g. hourly or daily). Latency equals your schedule interval. Use batch sync for one-time migrations or when freshness requirements are relaxed.
Is an Elasticsearch to ClickHouse® pipeline good for long-retention log analytics?
Yes—this is one of the primary motivations. Elasticsearch is expensive for long-retention: storage and compute scale linearly with data volume, and hot-tier costs are high.
Move aged logs to ClickHouse® where columnar compression reduces storage by 10x and analytical query costs drop significantly. Keep Elasticsearch for recent logs (hot tier, search) and ClickHouse® for historical analytics and reporting.
How does ClickHouse® compare to Elasticsearch for aggregation queries?
ClickHouse® significantly outperforms Elasticsearch for analytical aggregations. Columnar storage means queries only read the columns they need. Elasticsearch's document-oriented inverted index reads full documents and aggregates at query time.
For GROUP BY aggregations, time-series rollups, and large-range scans, ClickHouse® typically runs 10x–100x faster than Elasticsearch at equivalent data volumes.
