---
title: "ClickHouse® + Elasticsearch — 3 Ways to Connect in 2026"
excerpt: "Get Elasticsearch data into ClickHouse® for analytics. Tinybird, ClickPipes Kafka, or self-managed—compare the three options and choose the right pipeline architecture."
authors: "Tinybird"
categories: "AI Resources"
createdOn: "2026-03-30 00:00:00"
publishedOn: "2026-03-30 00:00:00"
updatedOn: "2026-03-30 00:00:00"
status: "published"
---

**ClickHouse® + Elasticsearch — 3 Ways to Connect in {{ year }}**

These are the main options for a **ClickHouse® and Elasticsearch integration** pipeline:

1. [Tinybird](https://www.tinybird.co/)
2. ClickHouse® Cloud + ClickPipes (Kafka source)
3. Self-managed or custom (Kafka bridge, Logstash, or batch via scroll API)

**Elasticsearch** is a **distributed search and analytics engine** used for **full-text search**, **log analytics**, **observability**, and **APM**. Many teams want a copy of that data in ClickHouse® for **columnar analytical queries**, **reporting**, and [real-time dashboards](https://www.tinybird.co/blog/real-time-dashboards-are-they-worth-it) at scale.

A **ClickHouse® and Elasticsearch integration** uses **Logstash**, **Kafka bridging**, or the **Elasticsearch scroll API** to move Elasticsearch data into ClickHouse® for [real-time analytics](https://www.tinybird.co/blog/real-time-analytics-a-definitive-guide) without putting analytical load on Elasticsearch.

Below we compare all three options in depth—architecture, real configuration examples, trade-offs, and when to use each.

**Looking for minimal ops and instant APIs?**

Tinybird combines **managed ingestion** from your Logstash→Kafka or Logstash→HTTP path, **managed ClickHouse®**, and **one-click API publishing** from SQL—no Kafka engine to operate yourself.

## **Three ways to integrate ClickHouse® with Elasticsearch**

This section is the core: the three options to connect Elasticsearch to ClickHouse®, in order.

### **Option 1: Tinybird — managed ClickHouse® with API layer**

Tinybird is a **real-time data platform** built on ClickHouse®. It combines ingestion, storage, and API publishing in one product.

**How it works:** configure a **Logstash pipeline** with an HTTP or Kafka output that forwards documents to Tinybird. The `http` output sends documents directly to the **Tinybird Events API**. The `kafka` output writes to a topic consumed by the **Tinybird Kafka connector**.

Data lands in Tinybird's ClickHouse®-backed data sources. You define **Pipes** (SQL) and publish them as REST endpoints.

**Logstash pipeline → Tinybird Events API:**

```ruby
input {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    index => "logs-*"
    user => "elastic"
    password => "${ES_PASSWORD}"
    schedule => "* * * * *"
    query => '{"query": {"range": {"@timestamp": {"gte": "now-1m"}}}}'
  }
}

filter {
  mutate {
    rename => { "@timestamp" => "ts" }
    remove_field => ["@version", "tags"]
  }
}

output {
  http {
    url => "https://api.tinybird.co/v0/events?name=es_logs"
    http_method => "post"
    headers => { "Authorization" => "Bearer ${TB_TOKEN}" }
    format => "json_batch"
    codec => "json"
  }
}
```
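
To make the shape of that HTTP call concrete, here is a minimal Python sketch that builds the same request by hand. It assumes the Events API accepts newline-delimited JSON in the POST body; `to_ndjson` and `build_request` are hypothetical helper names, the sample documents are made up, and the token is a placeholder. The request is only constructed, not sent.

```python
import json
import urllib.request

# URL copied from the Logstash example above; "es_logs" is the data source name.
TINYBIRD_EVENTS_URL = "https://api.tinybird.co/v0/events?name=es_logs"


def to_ndjson(docs):
    """Serialize an iterable of dicts as newline-delimited JSON bytes."""
    lines = (json.dumps(doc, separators=(",", ":")) for doc in docs)
    return "\n".join(lines).encode("utf-8")


def build_request(docs, token, url=TINYBIRD_EVENTS_URL):
    """Build (but do not send) a POST request carrying the NDJSON payload."""
    return urllib.request.Request(
        url,
        data=to_ndjson(docs),
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


sample_docs = [
    {"ts": "2026-03-30T12:00:00Z", "level": "error", "message": "timeout"},
    {"ts": "2026-03-30T12:00:01Z", "level": "info", "message": "recovered"},
]
request = build_request(sample_docs, token="<TB_TOKEN>")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# urllib.request.urlopen(request) would send it; left out so the sketch has no side effects.
```

Batching several documents per request, as the Logstash output does, keeps the request count low compared with one call per document.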

**When Tinybird fits:**

- You want **getting Elasticsearch data into ClickHouse®** with minimal ops
- You need APIs and [real-time dashboards](https://www.tinybird.co/blog/real-time-dashboards-are-they-worth-it) from the same data
- You prefer an **Elasticsearch to ClickHouse® pipeline** with an API layer built in

**Prerequisites:** Logstash 8.x with the `elasticsearch` input and either `http` or `kafka` output. Data flows: Elasticsearch → Logstash → Kafka or HTTP → Tinybird. You run Elasticsearch and Logstash; Tinybird runs ingestion, storage, and API publishing.

### **Option 2: ClickHouse® Cloud + ClickPipes (Kafka source)**

ClickHouse® Cloud's **ClickPipes** supports **Kafka** as a data source. There is no native Elasticsearch connector.

**How it works:** configure a **Logstash pipeline** with a Kafka output that writes documents to a topic. Then create a **Kafka ClickPipe** in the ClickHouse® Cloud console pointing to your broker and topic.

**Logstash pipeline → Kafka:**

```ruby
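input {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    index => "logs-*"
    schedule => "* * * * *"
    # Without a range query, every scheduled run re-reads the whole index
    query => '{"query": {"range": {"@timestamp": {"gte": "now-1m"}}}}'
    # Expose _index and _id so they can be copied into the event
    docinfo => true
    docinfo_target => "[@metadata][doc]"
  }
}

filter {
  mutate {
    # Match the column names of the destination table below
    rename => { "@timestamp" => "ts" }
    add_field => {
      "index_name" => "%{[@metadata][doc][_index]}"
      "doc_id" => "%{[@metadata][doc][_id]}"
    }
  }
}

output {
  kafka {
    bootstrap_servers => "kafka:9092"
    topic_id => "es-logs"
    codec => "json"
  }
}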

**ClickHouse® Cloud destination table:**

```sql
CREATE TABLE es_logs
(
    ts          DateTime,
    index_name  LowCardinality(String),
    level       LowCardinality(String),
    service     LowCardinality(String),
    message     String,
    doc_id      String
)
ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(ts)
ORDER BY (index_name, level, ts, doc_id);
```

**When it fits:**

- You want managed ClickHouse® with a Kafka-based ingestion path
- You're already on ClickHouse® Cloud and can configure Logstash→Kafka
- You'll add your own API or BI layer on top of ClickHouse® Cloud

**Prerequisites:** Logstash with a Kafka output. Data flows: Elasticsearch → Logstash → Kafka → ClickPipes → ClickHouse® Cloud.

### **Option 3: Self-managed or custom (Kafka, Logstash, or batch scroll API)**

With **self-managed ClickHouse®**, the streaming pattern uses **Logstash** → **Kafka** → **Kafka table engine** → **materialized view** → **MergeTree**.

**ClickHouse® Kafka table engine + materialized view:**

```sql
-- Kafka engine reads from topic.
-- ts is read as String because Logstash emits ISO 8601 timestamps
-- (e.g. 2026-03-30T12:00:00.000Z), which the default DateTime parser may reject.
CREATE TABLE es_logs_kafka
(
    ts         String,
    index_name String,
    level      String,
    service    String,
    message    String,
    doc_id     String
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'kafka:9092',
    kafka_topic_list = 'es-logs',
    kafka_group_name = 'clickhouse_es_consumer',
    kafka_format = 'JSONEachRow';

-- Target MergeTree table
CREATE TABLE es_logs
(
    ts         DateTime,
    index_name LowCardinality(String),
    level      LowCardinality(String),
    service    LowCardinality(String),
    message    String,
    doc_id     String
)
ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(ts)
ORDER BY (index_name, level, ts, doc_id);

-- Materialized view: parses the timestamp and copies rows into es_logs
CREATE MATERIALIZED VIEW es_logs_mv TO es_logs AS
SELECT
    parseDateTimeBestEffort(ts) AS ts,
    index_name,
    level,
    service,
    message,
    doc_id
FROM es_logs_kafka;
```

**Batch sync** via the **scroll API** (for initial load or periodic sync):

```python
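import os
from datetime import datetime

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan
import clickhouse_driver

# Read the password from the environment instead of hardcoding it
es = Elasticsearch(
    "https://localhost:9200",
    basic_auth=("elastic", os.environ["ES_PASSWORD"]),
)
ch = clickhouse_driver.Client("localhost")

INSERT_SQL = (
    "INSERT INTO es_logs (ts, index_name, level, service, message, doc_id) VALUES"
)


def parse_ts(value):
    # clickhouse_driver expects datetime objects for DateTime columns, while
    # Elasticsearch returns ISO 8601 strings such as "2026-03-30T12:00:00.000Z"
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


rows = []
for doc in scan(es, index="logs-*", query={"query": {"match_all": {}}}):
    src = doc["_source"]
    rows.append({
        "ts": parse_ts(src["@timestamp"]),
        "index_name": doc["_index"],
        "level": src.get("level", ""),
        "service": src.get("service", ""),
        "message": src.get("message", ""),
        "doc_id": doc["_id"],
    })
    # Flush every 1,000 rows to keep memory bounded
    if len(rows) >= 1000:
        ch.execute(INSERT_SQL, rows)
        rows = []
if rows:
    ch.execute(INSERT_SQL, rows)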
```
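
The flush-every-1,000-rows pattern in the script above can be factored into a small generator, which keeps the export loop free of bookkeeping. This is a sketch, not part of any client library; `in_batches` is a hypothetical helper name.

```python
from itertools import islice


def in_batches(iterable, size):
    """Yield lists of up to `size` items until the iterable is exhausted."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Example with plain integers standing in for rows
for batch in in_batches(range(7), 3):
    print(batch)
```

In the export script, each yielded batch would be passed to a single `INSERT`, so the final partial batch needs no special handling.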

**When it fits:**

- You already run ClickHouse® and Kafka and want full control
- You have data-engineering capacity to manage the full stack
- Batch scroll export works when sub-minute freshness is not required

### **Decision framework: which option fits your situation**

The right choice depends on **four variables**: freshness requirement, team capacity, whether you need an API layer, and cost tolerance.

| Situation | Recommended option |
| --- | --- |
| Real-time analytics, need REST APIs, minimal ops | Tinybird |
| Already on ClickHouse® Cloud, have Kafka + Logstash | ClickPipes + Kafka |
| Self-managed ClickHouse® + Kafka, full control | Self-managed |
| One-time migration or periodic batch sync | Batch scroll API |
| Observability platform with long-retention analytics | Tinybird or self-managed |

**Choose Tinybird** when you need [real-time data ingestion](https://www.tinybird.co/blog/real-time-data-ingestion) from Elasticsearch and REST APIs from the same data—without operating ClickHouse® infrastructure.

**Choose ClickPipes** when you're already on ClickHouse® Cloud and have Logstash→Kafka running. You get managed ClickHouse® ingestion but build your own serving layer.

**Choose self-managed** when your team is comfortable operating Logstash, Kafka, and ClickHouse® and needs full schema and configuration control.

### **Summary table**

| Option | Ingestion path | API layer | Ops burden | Kafka required |
| --- | --- | --- | --- | --- |
| Tinybird | Logstash HTTP or Kafka | Built in (Pipes) | Low | No |
| ClickHouse® Cloud ClickPipes | Logstash → Kafka | Build your own | Medium | Yes |
| Self-managed | Logstash → Kafka or batch scroll | Build your own | High | Depends |

## **What is Elasticsearch and why integrate it with ClickHouse®?**

### **Elasticsearch as the data source**

Elasticsearch is a **distributed search engine** built on Lucene. It provides an HTTP REST API, **inverted-index** storage, and a flexible JSON document model. Teams use it for **full-text search**, **log ingestion**, **observability** (APM traces, metrics), and **Kibana dashboards**.

For **analytical workloads at scale**—long-retention analytics, aggregations across millions of documents, or [user-facing analytics](https://www.tinybird.co/blog/user-facing-analytics)—running heavy Elasticsearch aggregations becomes expensive and impacts search performance.

Sending a copy of that data to ClickHouse® gives you a **dedicated analytical store**: columnar, optimized for large scans and [low latency](https://www.cisco.com/site/us/en/learn/topics/cloud-networking/what-is-low-latency.html) aggregations, without affecting Elasticsearch query performance.

### **How to get data out of Elasticsearch**

Elasticsearch does not have a native ClickHouse® output. For continuous sync, data moves out via **Logstash** pipelines; for batch export, you read it through the search APIs.

The **`elasticsearch` input plugin** supports scheduled queries with `query` DSL. The `kafka` output writes documents to Kafka topics. The `http` output sends documents to HTTP endpoints (e.g. Tinybird Events API).

For batch export, the **scroll API** or **point-in-time + `search_after`** supports paginated reads, which you can bound by time range. The Python `elasticsearch.helpers.scan()` helper wraps the scroll protocol: it fetches pages of configurable size and yields documents one at a time.

### **Why route Elasticsearch data to ClickHouse®**

An **Elasticsearch to ClickHouse® pipeline** lets you run analytical queries over Elasticsearch data at scale without impacting search performance or scaling Elasticsearch for analytical loads.

ClickHouse®'s **columnar storage** can achieve **10x–100x compression** compared to Elasticsearch's inverted-index model for typical log data, although the ratio depends on the data and is worth measuring on your own indices. Aggregation queries that take seconds in Elasticsearch often run in milliseconds in ClickHouse®.

## **Schema and pipeline design**

### **Mapping Elasticsearch documents to ClickHouse® tables**

Elasticsearch document data typically includes `@timestamp`, `_index`, `level`, `service`, `message`, and nested JSON fields.

A well-designed ClickHouse® target schema:

```sql
CREATE TABLE es_logs
(
    ts          DateTime,
    index_name  LowCardinality(String),  -- Elasticsearch index name
    level       LowCardinality(String),  -- log level, e.g. info/warn/error
    service     LowCardinality(String),  -- source service
    message     String,
    doc_id      String,                  -- Elasticsearch _id for dedup
    extra       String                   -- JSON for variable fields
)
ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(ts)
ORDER BY (index_name, service, level, ts, doc_id);
```

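Use `LowCardinality(String)` for **low-cardinality fields** (level, index, service) to improve compression and filtering speed. `ReplacingMergeTree` deduplicates rows that share the same `ORDER BY` key, which is why `doc_id` is part of that key; a re-sent document collapses into one row only if its other key columns are unchanged.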
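
As an illustration of the mapping, the following Python sketch turns one Elasticsearch search hit into a row shaped like the `es_logs` table above. The helper names (`parse_es_timestamp`, `hit_to_row`) and the sample hit are hypothetical, and treating timestamps without an offset as UTC is an assumption.

```python
import json
from datetime import datetime, timezone

# Source fields that get their own column; everything else lands in `extra`.
PROMOTED_FIELDS = ("level", "service", "message")


def parse_es_timestamp(value):
    """Parse an ISO 8601 string such as '2026-03-30T12:00:00.123Z' to naive UTC, second precision."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:  # assumption: no offset means UTC
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc).replace(tzinfo=None, microsecond=0)


def hit_to_row(hit):
    """Map a search hit (with _index, _id, _source) to an es_logs row dict."""
    source = hit["_source"]
    extra = {
        key: value
        for key, value in source.items()
        if key != "@timestamp" and key not in PROMOTED_FIELDS
    }
    row = {
        "ts": parse_es_timestamp(source["@timestamp"]),
        "index_name": hit["_index"],
        "doc_id": hit["_id"],
        "extra": json.dumps(extra, sort_keys=True),
    }
    for name in PROMOTED_FIELDS:
        row[name] = str(source.get(name, ""))  # empty string when the field is absent
    return row


example_hit = {
    "_index": "logs-2026.03",
    "_id": "abc123",
    "_source": {
        "@timestamp": "2026-03-30T12:00:00.123Z",
        "level": "error",
        "message": "upstream timeout",
        "host": {"name": "web-1"},
    },
}
print(hit_to_row(example_hit))
```

Nested objects such as `host` are serialized into `extra` rather than flattened, which keeps the table stable when documents carry fields the schema does not know about.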

### **Handling large indices and schema drift**

For **batch scroll** of large indices, chunk by time range using `search_after` with a point-in-time (PIT):

```python
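# Assumes `es` is an Elasticsearch client, as in the batch example earlier
# Open a point-in-time
pit = es.open_point_in_time(index="logs-*", keep_alive="5m")
pit_id = pit["id"]

search_after = None
try:
    while True:
        body = {
            "query": {"range": {"@timestamp": {"gte": "2026-01-01", "lt": "2026-02-01"}}},
            # _shard_doc is the PIT tiebreaker; Elasticsearch advises against sorting on _id
            "sort": [{"@timestamp": "asc"}, {"_shard_doc": "asc"}],
            "size": 1000,
            "pit": {"id": pit_id, "keep_alive": "5m"},
        }
        if search_after:
            body["search_after"] = search_after

        result = es.search(body=body)
        pit_id = result["pit_id"]  # the PIT id can change between requests
        hits = result["hits"]["hits"]
        if not hits:
            break
        # process hits...
        search_after = hits[-1]["sort"]
finally:
    # Release the point-in-time so Elasticsearch can free its resources
    es.close_point_in_time(id=pit_id)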
```
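
Chunking by time range can be sketched as a generator of half-open windows, each of which becomes the `range` filter for one export pass. The helper names `time_windows` and `range_query` are hypothetical, and the one-day step is only an example.

```python
from datetime import datetime, timedelta


def time_windows(start, end, step):
    """Yield half-open (window_start, window_end) pairs covering [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


def range_query(window_start, window_end):
    """Build an Elasticsearch range filter on @timestamp for one window."""
    return {
        "range": {
            "@timestamp": {
                "gte": window_start.isoformat(),
                "lt": window_end.isoformat(),
            }
        }
    }


for window_start, window_end in time_windows(
    datetime(2026, 1, 1), datetime(2026, 1, 4), timedelta(days=1)
):
    print(range_query(window_start, window_end))
```

Because the windows are half-open (`gte` and `lt`), a document on a boundary belongs to exactly one window, so a failed window can be re-run without touching its neighbours.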

**Schema drift:** new Elasticsearch fields require updating the ClickHouse® schema. Plan for **nullable columns** (`Nullable(String)`) for optional fields, or a catch-all `extra String` JSON column.
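
One way to notice drift early is to compare the fields seen in a batch of documents against the fields the pipeline already maps. The sketch below is an assumption-laden illustration: `detect_new_fields` and `alter_statements` are hypothetical helpers, only top-level fields are inspected, and the generated statements are suggestions to review rather than something to run blindly.

```python
def detect_new_fields(docs, known_fields):
    """Return the sorted top-level field names present in docs but not yet mapped."""
    seen = set()
    for doc in docs:
        seen.update(doc.keys())
    return sorted(seen - set(known_fields))


def alter_statements(table, fields):
    """Suggest one ADD COLUMN statement per new field, as Nullable(String)."""
    return [
        f"ALTER TABLE {table} ADD COLUMN IF NOT EXISTS `{name}` Nullable(String)"
        for name in fields
        if "`" not in name  # skip names that cannot be safely quoted here
    ]


known = {"@timestamp", "level", "service", "message"}
batch = [
    {"@timestamp": "2026-03-30T12:00:00Z", "level": "info", "message": "ok"},
    {"@timestamp": "2026-03-30T12:00:01Z", "level": "warn", "trace_id": "t-1"},
]
new_fields = detect_new_fields(batch, known)
for statement in alter_statements("es_logs", new_fields):
    print(statement)
```

Fields that appear rarely are usually better left in the catch-all `extra` column; promoting a field to its own column pays off once queries filter or group on it.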

### **Failure modes to plan for**

- **Duplicates from retries:** Logstash at-least-once delivery can re-send documents. Use `_id` as deduplication key with `ReplacingMergeTree`.
- **Pipeline lag:** monitor Kafka consumer lag and define a freshness SLA. Alert when lag exceeds your threshold.
- **Large index scans:** for batch scroll, avoid unbounded queries. Always filter by time range and use `search_after` + PIT for large indices.
- **Elasticsearch query load:** Logstash input queries add load to Elasticsearch. Schedule during off-peak hours or use a dedicated replica node.
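
A freshness SLA from the list above can be checked with very little code once you have the newest timestamp in the destination table, for example from `SELECT max(ts) FROM es_logs`. The sketch below assumes that value has already been fetched; `freshness_lag` and `breaches_sla` are hypothetical helper names and the five-minute SLA is only an example. It complements Kafka consumer-lag monitoring rather than replacing it.

```python
from datetime import datetime, timedelta


def freshness_lag(latest_ts, now):
    """Return how far the newest ingested row trails the current time (never negative)."""
    return max(now - latest_ts, timedelta(0))


def breaches_sla(latest_ts, now, sla):
    """True when the newest row is older than the agreed freshness SLA."""
    return freshness_lag(latest_ts, now) > sla


now = datetime(2026, 3, 30, 12, 10, 0)
latest_row = datetime(2026, 3, 30, 12, 2, 0)  # would come from max(ts) in practice
sla = timedelta(minutes=5)

print("lag:", freshness_lag(latest_row, now))
print("alert:", breaches_sla(latest_row, now, sla))
```

Measuring lag at the destination catches stalls anywhere in the pipeline, including a Logstash input that has silently stopped querying Elasticsearch.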

## **Why ClickHouse® for Elasticsearch analytics**

ClickHouse® is a **columnar OLAP [database](https://www.oracle.com/database/what-is-database/)** built for analytical queries over large volumes. **MergeTree** tables and **vectorized execution** deliver **sub-second queries** on billions of rows.

ClickHouse®'s columnar compression can reduce I/O for log and event analytics by **10x–100x** compared to Elasticsearch's inverted-index model, depending on the data. For the same log data, ClickHouse® generally uses less storage, answers analytical queries faster, and costs less to run for analytical workloads.

A **ClickHouse® and Elasticsearch integration** fits [real-time analytics](https://www.tinybird.co/blog/real-time-analytics-a-definitive-guide), long-retention log analytics, and [user-facing analytics](https://www.tinybird.co/blog/user-facing-analytics), without scaling Elasticsearch to meet analytical demand.

## **Why Tinybird is the best Elasticsearch to ClickHouse® option**

Most teams don't need to operate ClickHouse® infrastructure—they need **fast analytics on Elasticsearch data exposed as APIs**.

Tinybird is purpose-built for this. You configure a Logstash pipeline that sends data to Tinybird's Events API or Kafka connector. Tinybird handles **ingestion, storage, and API publishing** in one product.

You **avoid operating the ClickHouse® Kafka engine** and the overhead of maintaining a separate API layer. Define Pipes in SQL, publish as REST endpoints, and serve dashboards or product features with **sub-100ms latency** and automatic scaling.

The Logstash `http` output replaces the need for Kafka entirely: documents go directly from Elasticsearch → Logstash → Tinybird.

## **Frequently Asked Questions (FAQs)**

### **Does ClickHouse® Cloud support Elasticsearch natively?**

ClickHouse® Cloud does not have a **native Elasticsearch connector** in ClickPipes. You use the **Kafka** data source: configure a Logstash pipeline with a Kafka output, then create a Kafka ClickPipe to load into ClickHouse® Cloud.

You operate the Elasticsearch → Logstash → Kafka path; ClickHouse® Cloud ingests from Kafka.

### **Can I use Tinybird for Elasticsearch to ClickHouse® without Kafka?**

Yes. Use the **Logstash HTTP output** to send documents directly to the **Tinybird Events API**. No Kafka cluster required.

Tinybird stores data in ClickHouse®-backed data sources and lets you publish Pipes as REST APIs. You own the Logstash pipeline; Tinybird is the destination and API layer.

### **How do I handle document deduplication when integrating ClickHouse® with Elasticsearch?**

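Include the Elasticsearch **`_id`** field (stored as `doc_id`) in the table's `ORDER BY` key. **ReplacingMergeTree** keeps one row per distinct sorting key. The schemas in this article declare no version column, so the most recently inserted row wins; if documents can be updated, pass a version column derived from a monotonic timestamp or update counter, for example `ReplacingMergeTree(updated_at)`, where `updated_at` is a hypothetical column.

With Logstash at-least-once delivery, the same document may arrive more than once. `ReplacingMergeTree` collapses duplicates during background merges, which run at an unspecified time, so duplicates can remain visible until then. Add `FINAL` to a query when you need deduplicated results immediately, at the cost of slower reads.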
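
The collapsing rule can be modelled in a few lines of Python. This is a toy model of the merge semantics, not ClickHouse® code: `collapse_rows` is a hypothetical function, and the sample rows and the `version` column are made up for illustration.

```python
def collapse_rows(rows, key_columns, version_column=None):
    """Keep one row per sorting key.

    With a version column the highest version wins (ties go to the later row);
    without one, the last inserted row wins.
    """
    kept = {}
    for row in rows:  # rows are assumed to be in insertion order
        key = tuple(row[column] for column in key_columns)
        current = kept.get(key)
        if (
            current is None
            or version_column is None
            or row[version_column] >= current[version_column]
        ):
            kept[key] = row
    return list(kept.values())


rows = [
    {"doc_id": "a", "ts": 1, "version": 1, "message": "first delivery"},
    {"doc_id": "a", "ts": 1, "version": 2, "message": "updated document"},
    {"doc_id": "b", "ts": 1, "version": 1, "message": "other document"},
    {"doc_id": "a", "ts": 1, "version": 1, "message": "late retry of first delivery"},
]

print(collapse_rows(rows, key_columns=("ts", "doc_id")))
print(collapse_rows(rows, key_columns=("ts", "doc_id"), version_column="version"))
```

The two calls differ for document `a`: without a version column the late retry replaces the update, while with one the update survives, which is the case a version column is meant to protect.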

### **What is the difference between Logstash streaming and scroll API batch sync?**

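**Logstash streaming** is in practice scheduled polling: the `elasticsearch` input runs its query on a cron `schedule` and forwards the matching documents. Freshness is bounded by the schedule interval (one minute in the examples above). This keeps ClickHouse® data fresh but adds continuous query load to Elasticsearch.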

**Scroll API batch sync** exports documents on a schedule (e.g. hourly or daily). Latency equals your schedule interval. Use batch sync for one-time migrations or when freshness requirements are relaxed.

### **Is a ClickHouse® and Elasticsearch integration good for long-retention log analytics?**

Yes—this is one of the primary motivations. Elasticsearch is expensive for long-retention: storage and compute scale linearly with data volume, and hot-tier costs are high.

Move aged logs to ClickHouse®, where columnar compression can reduce storage by **10x** or more and analytical query costs drop significantly. Keep Elasticsearch for recent logs (hot tier, search) and ClickHouse® for historical analytics and reporting.

### **How does ClickHouse® compare to Elasticsearch for aggregation queries?**

ClickHouse® significantly outperforms Elasticsearch for analytical aggregations. Columnar storage means queries only read the columns they need. Elasticsearch is optimized for search and document retrieval; its aggregations are computed shard by shard and merged on a coordinating node, which tends to cost more memory and CPU on large scans.

For `GROUP BY` aggregations, time-series rollups, and large-range scans, ClickHouse® typically runs **10x–100x faster** than Elasticsearch at equivalent data volumes.
