Developers choosing between ClickHouse® and Elasticsearch often assume they're picking between two databases with overlapping capabilities. The reality is more nuanced: ClickHouse® excels at analytical queries over structured data, while Elasticsearch specializes in full-text search and log exploration.
This article explains what each system does well, where they struggle, and whether ClickHouse® can replace Elasticsearch for search workloads. You'll learn how their architectures differ, when to use one versus the other, and how to integrate both systems when you need specialized capabilities from each.
What each database is built to solve
ClickHouse® is a columnar database built for analytical processing (OLAP), real-time analytics, and data warehousing. Elasticsearch is a search engine built on Apache Lucene for full-text search, log analysis, and document exploration.
The core difference comes down to what each system optimizes for. ClickHouse® stores data in columns and excels at aggregating billions of rows quickly. Elasticsearch uses an inverted index that maps words to documents, making text search and relevance ranking fast.
Real-time analytics workloads
ClickHouse® handles queries like GROUP BY aggregations, time-series analysis, and dashboard metrics by reading only the columns you need. When you run a query that sums revenue by region across 10 billion rows, ClickHouse® skips the columns it doesn't need, which reduces I/O and speeds up the query.
Columnar storage also compresses better because similar data types stored together compress more efficiently than mixed row data. This means you store more data in less space and scan through it faster.
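To make the compression argument concrete, here is a minimal sketch that uses run-length encoding as a stand-in for a real codec. ClickHouse®'s actual codecs (such as LZ4 and ZSTD) are more sophisticated; the rows and the `run_length_encode` helper are hypothetical and only illustrate why adjacent, similar values compress well.

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]

# Hypothetical rows of (region, status), sorted by region.
rows = [("eu", "ok")] * 3 + [("eu", "error")] + [("us", "ok")] * 4

# Row layout interleaves fields from different columns.
row_stream = [field for row in rows for field in row]
# Column layout keeps each column's values adjacent.
region_column = [region for region, _ in rows]

row_runs = run_length_encode(row_stream)        # no repeats to collapse
column_runs = run_length_encode(region_column)  # two long runs

print(len(row_runs), column_runs)
```

In the interleaved stream no two neighbouring values are equal, so nothing collapses, while the region column reduces to two runs.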
Log and document search workloads
Elasticsearch specializes in finding text patterns and ranking results by relevance. The inverted index maps every word to the documents containing it, so searching for "error" across millions of log entries happens in milliseconds.
Beyond search, Elasticsearch handles log aggregation and exploratory queries where you're filtering semi-structured JSON documents. Tools like Kibana connect directly to Elasticsearch for visualization and exploration.
How data is stored and indexed
ClickHouse® and Elasticsearch organize data differently, which determines what each does well and where it struggles.
| Feature | ClickHouse® | Elasticsearch |
|---|---|---|
| Storage model | Columnar segments | Document-oriented with inverted index |
| Index type | Sparse primary key | Inverted index per field |
| Compression | High (10x-100x) | Moderate (inverted index overhead) |
| Write pattern | Batch-optimized | Near real-time indexing |
Columnar segments and sparse indexes in ClickHouse®
ClickHouse® stores each column separately in compressed files and divides rows into logical blocks called granules. When you query specific columns, ClickHouse® reads only those columns from disk rather than entire rows.
The sparse primary key index stores one entry per granule (typically 8,192 rows) instead of indexing every row. This keeps the index small enough to fit in memory while still providing fast range scans for analytical queries.
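The idea behind a sparse index can be sketched in a few lines. This is an illustration, not ClickHouse®'s implementation: it assumes integer keys, a tiny granule size, and hypothetical helper names, and it only shows how keeping one mark per granule still narrows a range scan.

```python
from bisect import bisect_left, bisect_right

GRANULE_SIZE = 4  # tiny for illustration; the article cites 8,192 rows

def build_sparse_index(sorted_keys, granule_size=GRANULE_SIZE):
    """Keep only the first key of every granule."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]

def granules_for_range(index, low, high):
    """Return indices of granules that may contain keys in [low, high]."""
    # Step back one granule from the first mark >= low, because the
    # previous granule can still hold keys equal to or above low.
    first = max(bisect_left(index, low) - 1, 0)
    # The last candidate is the granule whose first key is <= high.
    last = bisect_right(index, high) - 1
    return list(range(first, last + 1))

keys = list(range(100, 120))      # 20 sorted primary-key values
index = build_sparse_index(keys)  # one mark per granule
hits = granules_for_range(index, 105, 110)
print(index, hits)
```

Only two of the five granules need to be read for the range 105 to 110, even though the index holds just five entries for twenty rows.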
Materialized views in ClickHouse® pre-aggregate data at write time, turning expensive GROUP BY queries into fast lookups. You define a materialized view once, and it maintains aggregations automatically as new data arrives.
Inverted index and shards in Elasticsearch
Elasticsearch builds an inverted index for each field, mapping terms to documents. This structure makes text search fast but requires more storage and processing compared to columnar formats.
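A toy inverted index shows the term-to-document mapping described above. It is a sketch under simple assumptions (whitespace tokenization, lowercase terms, in-memory sets); Lucene's real structures add analyzers, compressed postings, and scoring data, and the function names here are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search_all(index, *terms):
    """Return ids of documents containing every term (AND semantics)."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()

logs = {
    1: "disk error on node-3",
    2: "request completed",
    3: "timeout error on node-7",
}
index = build_inverted_index(logs)
print(search_all(index, "error"), search_all(index, "error", "timeout"))
```

A lookup touches only the postings for the query terms, which is why search cost tracks the number of matches rather than the number of documents.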
Sharding distributes data across nodes, with each shard holding a subset of documents. Queries run in parallel across shards and merge results, providing horizontal scalability for both indexing and search.
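The scatter-gather pattern can be sketched as follows. The shard count, the CRC32 routing, and the document shape are assumptions made for the example; Elasticsearch's own routing and merge logic differ in detail.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32

NUM_SHARDS = 3  # hypothetical cluster size

def shard_for(doc_id):
    """Route a document to a shard by hashing its id."""
    return crc32(doc_id.encode()) % NUM_SHARDS

def count_by_level(shard_docs):
    """Per-shard partial aggregation."""
    return Counter(doc["level"] for doc in shard_docs)

docs = [
    {"id": f"doc-{i}", "level": "error" if i % 4 == 0 else "info"}
    for i in range(20)
]
shards = [[] for _ in range(NUM_SHARDS)]
for doc in docs:
    shards[shard_for(doc["id"])].append(doc)

# Scatter: each shard aggregates its own documents in parallel.
with ThreadPoolExecutor() as pool:
    partials = list(pool.map(count_by_level, shards))

# Gather: the coordinator merges the partial results.
merged = sum(partials, Counter())
print(merged)
```

The merged totals are the same however the documents land on shards, which is what lets the cluster add nodes without changing query results.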
Query languages and developer experience
ClickHouse® uses a SQL dialect that stays close to standard SQL. Elasticsearch's primary interface is a JSON-based Query DSL with its own syntax to learn.
SQL and materialized views in ClickHouse®
SQL in ClickHouse® works like you'd expect, with support for joins, subqueries, window functions, and aggregations. Here's a query counting events by type:
SELECT event_type, count() AS total
FROM events
WHERE timestamp >= now() - INTERVAL 1 DAY
GROUP BY event_type
ORDER BY total DESC
Materialized views pre-compute aggregations that update automatically. This turns expensive queries into fast lookups without changing your application code.
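Conceptually, a materialized view moves aggregation work from read time to write time. The class below is a hypothetical in-memory model of that idea, not ClickHouse® code: every insert updates a running count, so the read path never scans raw rows.

```python
from collections import defaultdict

class EventCountView:
    """Keeps per-type counts up to date as rows are inserted."""

    def __init__(self):
        self.raw_rows = []
        self.counts = defaultdict(int)

    def insert(self, rows):
        """Store the batch and fold it into the aggregate in the same step."""
        for row in rows:
            self.raw_rows.append(row)
            self.counts[row["event_type"]] += 1

    def totals(self):
        """Read path: a lookup over the aggregate, with no scan of raw_rows."""
        return sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)

view = EventCountView()
view.insert([{"event_type": "click"}, {"event_type": "view"}])
view.insert([{"event_type": "click"}])
print(view.totals())
```

The trade-off is extra work on every insert in exchange for reads whose cost depends on the number of groups, not the number of rows.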
JSON DSL and pipeline tooling in Elasticsearch
Elasticsearch queries use nested JSON that can get complex quickly. Here's a basic aggregation:
{
  "query": {
    "range": {
      "timestamp": {
        "gte": "now-1d"
      }
    }
  },
  "aggs": {
    "by_type": {
      "terms": {
        "field": "event_type"
      }
    }
  }
}
Tools like Kibana provide visual query builders that generate the JSON for you. However, programmatic queries still require building JSON structures rather than composing SQL strings.
Performance comparison on ingest, storage, and aggregations
Both systems deliver sub-second query latency, but they excel at different workloads.
Batch and streaming ingest throughput
ClickHouse® achieves high ingest rates by batching inserts and writing compressed columnar blocks. The native protocol supports millions of rows per second on commodity hardware.
Elasticsearch indexes documents individually or in small batches through its REST API. The indexing process builds inverted indexes in near real-time, which adds overhead but makes data searchable within seconds.
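Client-side batching, which the ClickHouse® ingest path rewards, can be sketched with a small buffer. The `BatchingWriter` class and its `send_batch` callback are hypothetical; in practice the callback would wrap whatever insert call your client library provides.

```python
class BatchingWriter:
    """Buffers rows and hands them to `send_batch` in groups of `batch_size`."""

    def __init__(self, send_batch, batch_size=1000):
        self.send_batch = send_batch
        self.batch_size = batch_size
        self.buffer = []

    def write(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered, then start a fresh buffer."""
        if self.buffer:
            self.send_batch(self.buffer)
            self.buffer = []

sent_batches = []  # stand-in for the database: records each batch received
writer = BatchingWriter(sent_batches.append, batch_size=3)
for i in range(7):
    writer.write({"event_id": i})
writer.flush()  # push the final partial batch

print([len(batch) for batch in sent_batches])
```

Seven rows arrive as three inserts instead of seven, and a production version would usually also flush on a timer so that slow streams do not sit in the buffer.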
Compression and storage footprint
Columnar compression in ClickHouse® typically achieves 10x to 100x compression depending on data types. Storing integers, dates, and low-cardinality strings together compresses very efficiently.
Elasticsearch stores the original document plus inverted indexes for each field. Because of that overhead, some published comparisons report Elasticsearch using 12x to 19x more disk space than ClickHouse® for the same raw data, though the ratio depends heavily on mappings and on which fields are indexed.
Aggregation latency at high cardinality
ClickHouse® handles high-cardinality GROUP BY queries by reading only needed columns and using vectorized execution. Queries aggregating billions of rows often complete in under a second.
Elasticsearch aggregations work well for moderate cardinality but slow down when grouping by high-cardinality fields. Memory pressure increases as Elasticsearch builds aggregation buckets.
Can ClickHouse® do full-text search and relevance ranking?
ClickHouse® provides basic text matching but lacks the relevance scoring and linguistic features that Elasticsearch offers. You can search for text patterns, but ClickHouse® won’t rank results by relevance automatically.
The architecture explains this limitation. ClickHouse® optimizes for scanning and aggregating columns, not for maintaining inverted indexes that map terms to documents efficiently.
Tokenization and n-gram index options
ClickHouse® offers tokenbf_v1 and ngrambf_v1 bloom filter indexes for basic text matching. These speed up LIKE and hasToken() queries by filtering out granules that don’t contain the search terms:
CREATE TABLE logs (
    timestamp DateTime,
    message String,
    INDEX message_tokens message TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 1
) ENGINE = MergeTree()
ORDER BY timestamp;
However, these indexes don’t provide ranking or relevance scoring. They only speed up filtering by reducing the number of granules ClickHouse® reads from disk. That is often enough to make text filtering workable at scale, as long as you don't need ranked results.
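The granule-skipping behavior of a token Bloom filter can be modeled in a short sketch. This is not ClickHouse®'s implementation: the filter size, the SHA-256 based hashing, and the whitespace tokenizer are assumptions chosen to keep the example self-contained.

```python
import hashlib

class TokenBloomFilter:
    """A small Bloom filter over whitespace-separated tokens."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, token):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{token}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, token):
        for pos in self._positions(token):
            self.bits |= 1 << pos

    def might_contain(self, token):
        """False means definitely absent; True means possibly present."""
        return all((self.bits >> pos) & 1 for pos in self._positions(token))

def build_granule_filters(granules):
    """Build one filter per granule from the tokens of its messages."""
    filters = []
    for messages in granules:
        bloom = TokenBloomFilter()
        for message in messages:
            for token in message.lower().split():
                bloom.add(token)
        filters.append(bloom)
    return filters

def search(granules, filters, token):
    """Scan only granules whose filter says the token might be present."""
    matches = []
    for messages, bloom in zip(granules, filters):
        if not bloom.might_contain(token):
            continue  # whole granule skipped without reading it
        matches.extend(m for m in messages if token in m.lower().split())
    return matches

granules = [["disk error on sda", "retrying write"], ["request ok", "cache hit"]]
filters = build_granule_filters(granules)
matches = search(granules, filters, "error")
print(matches)
```

A Bloom filter can return false positives but never false negatives, so skipping is always safe: the worst case is reading a granule that turns out to contain no match.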
Rank functions and limit by for scoring
You can implement basic scoring using string functions like position() to find term locations or countMatches() to count occurrences:
SELECT message, position(message, 'error') AS match_position
FROM logs
WHERE message LIKE '%error%'
ORDER BY match_position
LIMIT 100
This differs fundamentally from BM25 and other information retrieval algorithms that Elasticsearch uses. ClickHouse® finds text matches but won’t automatically rank results by relevance, term frequency, or document importance.
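For contrast, here is a compact sketch of Okapi BM25, the kind of scoring the position-based query above does not perform. It assumes whitespace tokenization and the commonly used parameters k1 = 1.2 and b = 0.75; the function name and sample documents are illustrative, and Lucene's production implementation includes optimizations not shown here.

```python
import math
from collections import Counter

def bm25_scores(query, documents, k1=1.2, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    scores = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            docs_with_term = sum(1 for t in tokenized if term in t)
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
            tf = term_counts[term]
            # Length normalization: long documents need more hits to score well.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "error error disk failure",
    "minor error during a long routine nightly backup job run",
    "request completed",
]
scores = bm25_scores("error", docs)
print(scores)
```

The short document that mentions the term twice outranks the long document that mentions it once, and the document without the term scores zero. Term frequency, rarity, and document length all feed the ranking, which a position() sort cannot capture.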
OpenSearch ClickHouse® integration options
Many teams run both systems together, using ClickHouse® for analytics and Elasticsearch for search.
Kafka or connector pipelines
Kafka acts as a buffer between systems, with producers writing events once and multiple consumers reading for different purposes. Both ClickHouse® and Elasticsearch consume from the same Kafka topics:
CREATE TABLE events_queue (
    event_id String,
    user_id String,
    timestamp DateTime
) ENGINE = Kafka()
SETTINGS kafka_broker_list = 'localhost:9092',
         kafka_topic_list = 'events',
         kafka_group_name = 'clickhouse_consumer',
         kafka_format = 'JSONEachRow';
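The reason one topic can feed both systems is that each consumer group tracks its own read position. The toy `Topic` class below models that behavior in memory; it is not a Kafka client, and the `search_indexer` group name is a hypothetical stand-in for whatever process indexes into Elasticsearch.

```python
class Topic:
    """An append-only log with an independent read offset per consumer group."""

    def __init__(self):
        self.log = []
        self.offsets = {}

    def produce(self, event):
        self.log.append(event)

    def poll(self, group, max_events=100):
        """Return unread events for `group` and advance only that group's offset."""
        start = self.offsets.get(group, 0)
        batch = self.log[start:start + max_events]
        self.offsets[group] = start + len(batch)
        return batch

events = Topic()
for i in range(5):
    events.produce({"event_id": str(i)})

# Each group reads the full stream at its own pace.
to_clickhouse = events.poll("clickhouse_consumer")
to_search = events.poll("search_indexer", max_events=2)

print(len(to_clickhouse), [e["event_id"] for e in to_search])
```

Because offsets are per group, a slow or temporarily offline search indexer does not hold back analytics ingestion, and it can catch up from where it stopped.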
Connector frameworks like Airbyte or custom scripts can also sync data between systems. You might load raw events into ClickHouse®, run aggregations, then push summary statistics to Elasticsearch for visualization.
Cross-engine dictionary lookups
ClickHouse® dictionaries can query external data sources including HTTP endpoints. You could use this to enrich ClickHouse® queries with data stored in Elasticsearch, though performance depends heavily on network latency.
This pattern works better for small, slowly-changing reference data than for large-scale joins. The dictionary cache helps, but frequent lookups to Elasticsearch can become a bottleneck.
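The effect of caching on remote lookups can be illustrated with a small time-to-live cache. This is a generic sketch rather than ClickHouse®'s dictionary machinery; `CachedLookup`, `fetch_country`, and the sample data are hypothetical, and the fetch function stands in for an HTTP request.

```python
import time

class CachedLookup:
    """Caches results of a slow `fetch(key)` call for `ttl_seconds`."""

    def __init__(self, fetch, ttl_seconds=300, clock=time.monotonic):
        self.fetch = fetch
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (value, time stored)

    def get(self, key):
        now = self.clock()
        cached = self.entries.get(key)
        if cached is not None and now - cached[1] < self.ttl_seconds:
            return cached[0]  # fresh entry: no remote call
        value = self.fetch(key)
        self.entries[key] = (value, now)
        return value

remote_calls = []

def fetch_country(user_id):
    """Stand-in for an HTTP request to the remote source."""
    remote_calls.append(user_id)
    return {"u1": "DE", "u2": "US"}.get(user_id, "unknown")

lookup = CachedLookup(fetch_country)
results = [lookup.get("u1"), lookup.get("u1"), lookup.get("u2")]
print(results, remote_calls)
```

Three lookups cost two remote calls here. With a large or fast-changing key space the hit rate drops and nearly every lookup pays the network round trip, which is the bottleneck described above.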
ClickHouse® OpenSearch compatibility considerations
Migrating between ClickHouse® and Elasticsearch requires careful mapping of data types and query patterns. Trip.com’s migration achieved 4x to 30x faster query performance after moving from Elasticsearch to ClickHouse®.
Field mapping and type conversion
ClickHouse® uses strict typing with explicit conversion functions. Elasticsearch infers types from JSON documents and handles some conversions automatically. A String in ClickHouse® maps to text or keyword in Elasticsearch depending on whether you need full-text search or exact matching.
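A migration script usually starts from an explicit type table. The mapping below is an assumed, partial starting point for illustration and should be checked against the versions you run; the `elasticsearch_mapping` helper and the column names are hypothetical.

```python
# Assumed, partial mapping for illustration; verify against your versions.
CLICKHOUSE_TO_ELASTICSEARCH = {
    "Int32": "integer",
    "Int64": "long",
    "Float64": "double",
    "DateTime": "date",
    "Bool": "boolean",
}

def elasticsearch_mapping(columns, full_text_columns=()):
    """Build an index mapping body from {column_name: clickhouse_type}."""
    properties = {}
    for name, ch_type in columns.items():
        if ch_type == "String":
            # Strings need a decision: analyzed text or exact-match keyword.
            es_type = "text" if name in full_text_columns else "keyword"
        else:
            es_type = CLICKHOUSE_TO_ELASTICSEARCH[ch_type]
        properties[name] = {"type": es_type}
    return {"mappings": {"properties": properties}}

columns = {"event_id": "String", "message": "String", "timestamp": "DateTime"}
mapping = elasticsearch_mapping(columns, full_text_columns={"message"})
print(mapping)
```

Raising a KeyError on an unmapped type is deliberate in this sketch: it forces a conscious decision for every column instead of silently falling back to a default.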
Nested JSON structures work differently too. ClickHouse® flattens nested objects into separate columns or uses the Nested data type for arrays of objects. Elasticsearch stores nested documents and queries them with special nested query syntax.
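Flattening nested objects into separate columns can be sketched like this. The dotted naming scheme and the `flatten` helper are assumptions for the example; arrays are left untouched because they would map to array or Nested columns instead.

```python
def flatten(document, prefix=""):
    """Turn nested objects into dotted column names; leave other values as-is."""
    columns = {}
    for key, value in document.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            columns.update(flatten(value, prefix=f"{name}."))
        else:
            columns[name] = value
    return columns

doc = {
    "user": {"id": "u1", "geo": {"country": "DE"}},
    "tags": ["a", "b"],
    "status": 200,
}
flat = flatten(doc)
print(flat)
```

Each resulting key becomes a candidate column, which is where the strict typing discussed earlier applies: every flattened field needs one consistent type across all documents.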
Refresh intervals and consistency
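With synchronous inserts, ClickHouse® makes data visible to queries on the receiving server as soon as the INSERT returns. On replicated tables, other replicas see the data after replication, and asynchronous inserts become visible only after the server flushes its buffer, so reads elsewhere can lag by a short interval. The MergeTree family of engines merges data parts in the background, which is transparent to queries.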
Elasticsearch uses a refresh interval (default 1 second) before new documents become searchable. You can force a refresh for immediate visibility, but this impacts indexing throughput.
Operational cost and scaling differences
Infrastructure requirements vary significantly between ClickHouse® and Elasticsearch.
Hardware efficiency and disk usage
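ClickHouse® typically requires less memory and disk space for analytical workloads due to columnar compression. As a conservative rule of thumb, a dataset using 1 TB in ClickHouse® might need 3–5 TB or more in Elasticsearch because of inverted index overhead and stored source documents. The exact ratio depends on mappings and on which fields are indexed.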
Memory requirements also differ. ClickHouse® can process queries larger than available RAM by streaming data from disk. Elasticsearch relies heavily on heap memory for aggregations and caching, often requiring more expensive high-memory instances.
Cluster management overhead
ClickHouse® clusters use a shared-nothing architecture where each node stores its own shards on local disk. Replication is configured per table, you define cluster topology in configuration files, and ClickHouse® Keeper coordinates the replicas.
Elasticsearch handles cluster management automatically with master nodes coordinating shard allocation and rebalancing. This automation helps but adds complexity, especially when nodes fail or network partitions occur.
When to choose ClickHouse®, Elasticsearch, or both
The choice depends on your primary workload and whether you need specialized features from each system.
Choose ClickHouse® when:
- Your queries aggregate or filter structured data more than searching text
- You need to store large volumes of time-series or event data cost-effectively
- Sub-second analytical queries on billions of rows matter more than text search
- Your team prefers SQL over JSON query syntax
Choose Elasticsearch when:
- Full-text search with relevance ranking is a core requirement
- You're building a log analysis or observability platform
- Document-oriented data with flexible schemas fits your use case
- You need the Elastic Stack ecosystem (Kibana, Logstash, Beats)
Use both when:
- You need both analytical aggregations and full-text search
- Different teams have different query patterns (analytics vs search)
- You can justify the operational overhead of running two systems
Single-stack analytics and search
ClickHouse® can handle basic text matching for applications where search is secondary to analytics. If you're building an internal dashboard that occasionally filters by text but mostly aggregates metrics, ClickHouse® alone might work.
The tokenbf_v1 and ngrambf_v1 indexes provide acceptable performance for simple text filters. You won't get relevance ranking, but for many internal tools, exact matching or simple pattern matching is enough.
Split-stack with ETL offload
Running both systems makes sense when search and analytics are equally important. You can stream events to both ClickHouse® and Elasticsearch from Kafka, or use ClickHouse® for raw data storage and push aggregated results to Elasticsearch for visualization.
This architecture separates concerns but requires coordination. Schema changes, data quality issues, and synchronization delays all become operational considerations when managing two systems.
Ship search-grade analytics faster with Tinybird
Tinybird provides a managed ClickHouse® platform that eliminates infrastructure setup and cluster management. Developers can focus on writing SQL and building features rather than tuning ClickHouse® configurations or managing DevOps.
The platform includes streaming ingestion from sources like Kafka, data source versioning, and automatically generated REST APIs from SQL queries. This means you define a ClickHouse® table, write a query, and get a production-ready API endpoint in minutes. Sign up for a free Tinybird plan to try ClickHouse® without the infrastructure work.
FAQs about ClickHouse® and Elasticsearch
How does ClickHouse® relevance scoring compare to BM25?
ClickHouse® lacks built-in relevance algorithms like BM25 that Elasticsearch uses for ranking search results. You can implement basic scoring with string functions, but it won't match Elasticsearch's text ranking capabilities.
Does ClickHouse® support geo-search with polygons?
ClickHouse® provides geographic functions such as greatCircleDistance and pointInPolygon, plus geohash and H3 helpers, so point-in-polygon filters are possible. It lacks the indexed geo-shape queries and geo-aggregations that Elasticsearch offers through its geo-spatial data types.
What security features differ between the two engines?
Both systems support user authentication and TLS encryption. Elasticsearch offers field-level and document-level security as part of its role model, while ClickHouse® approaches the same problem with column-level grants and row policies defined in SQL.
