---
title: "9 ClickHouse® Features Most Teams Don't Know They're Missing"
excerpt: "Uncover 9 hidden ClickHouse® features that boost query speed and simplify analytics far beyond basic OLAP capabilities."
authors: "Tinybird"
categories: "AI Resources"
createdOn: "2026-01-15 00:00:00"
publishedOn: "2026-01-15 00:00:00"
updatedOn: "2026-01-15 00:00:00"
status: "published"
---

**These are the ClickHouse® features and capabilities that extend beyond basic OLAP queries:**

1. **Projections (automatic query acceleration with alternative layouts)**  
2. **Materialized Views (incremental and refreshable pre-aggregations)**  
3. **Dictionaries (in-memory key-value lookups replacing expensive joins)**  
4. **Lightweight Deletes (fast logical deletion without rewriting parts)**  
5. **ReplacingMergeTree (upsert patterns through merge-time deduplication)**  
6. **Data Skipping Indexes (secondary indexes for selective queries)**  
7. **ClickHouse® Cloud (managed operations eliminating cluster administration)**  
8. **ClickHouse® Keeper (coordination for replication and distributed DDL)**  
9. **Tinybird: Real-Time Analytics APIs Platform Built on ClickHouse®**

ClickHouse® is a columnar OLAP [database](https://www.oracle.com/database/what-is-database/) delivering exceptional performance for analytical queries through sparse primary indexes, vectorized execution, and MergeTree storage optimized for scans and aggregations.

It excels at raw query speed on append-only data. For many teams, it can also feel like it's **missing capabilities when used as a complete analytics backend**—natural upserts, complex join optimization, operational simplicity, and platform features beyond the database engine.

Here's what actually happens: You choose ClickHouse® because you need **fast analytical queries** on large datasets. Query performance is excellent—sub-second aggregations over billions of rows with proper ORDER BY optimization.

So you deploy ClickHouse® for production analytics. Design physical layouts carefully with ORDER BY selection for sparse primary index efficiency. Build [real-time data ingestion](https://www.tinybird.co/blog/real-time-data-ingestion) pipelines handling streaming data from Kafka. Denormalize tables into wide structures avoiding expensive joins. Manage cluster operations—replication, backups, upgrades, monitoring.

Six months later, you have **excellent query performance** on properly modeled data. You also discover that **ClickHouse® has more features than you're using**—capabilities that solve common pain points without leaving the ClickHouse® ecosystem:

**Projections** provide alternative physical layouts or pre-aggregations chosen automatically by optimizer—accelerating queries without manual rewriting.

**Materialized views** shift aggregation work from query time to insert time—incremental updates for streaming or scheduled refreshes for batch.

**Dictionaries** replace expensive dimension joins with in-memory O(1) lookups—dramatically faster enrichment without join complexity.

**Lightweight deletes** enable fast logical deletion visible immediately—compliance requirements met without mutation rewrite costs.

**ReplacingMergeTree** handles CDC and upserts naturally—latest record wins during merges without manual deduplication logic.

**Data skipping indexes** accelerate selective queries on non-primary-key columns—filtering efficiency beyond sparse primary index.

**ClickHouse® Cloud** eliminates operational overhead—managed backups, upgrades, scaling, and high availability.

**ClickHouse® Keeper** provides coordination infrastructure—replication and distributed DDL without ZooKeeper dependency.

Someone asks: "Can ClickHouse® handle upserts?" or "How do we accelerate this query without changing ORDER BY?" The answer reveals what ClickHouse® actually provides—**extensive feature set** beyond basic columnar storage that most teams underutilize.

**The uncomfortable reality: most teams seeking "ClickHouse® with more features" don't need different databases—they need to discover and use ClickHouse®'s built-in capabilities solving their exact problems.**

This article explores ClickHouse® features extending core OLAP capabilities—when projections accelerate queries transparently, when dictionaries replace joins efficiently, when materialized views optimize serving patterns, and when operational features simplify production deployments.

## **1. Projections: Automatic Query Acceleration with Alternative Layouts**

**Projections** provide ClickHouse®'s **most underutilized acceleration feature**—storing data in alternative physical layouts or pre-aggregated forms chosen automatically by the optimizer.

### **What projections provide**

Projections deliver **transparent query optimization** without SQL rewriting:

**Normal projections** store same data with different ORDER BY—optimizer selects layout matching query filters and sorting automatically.

**Aggregate projections** pre-compute aggregations (SUM, COUNT, AVG, GROUP BY)—queries reading aggregates hit pre-computed results instead of scanning raw data.

**Automatic selection** by query optimizer—no query rewriting required when projection matches query pattern.

**Multiple projections per table**—define several alternative layouts or aggregations serving different query patterns.

**Inspectable through system tables**—`system.projections` shows defined projections and their configurations.

### **How to use projections effectively**

Define projections for **specific query patterns** causing performance issues:

```sql
-- Normal projection with alternative ORDER BY
ALTER TABLE events
ADD PROJECTION p_user_time (
    SELECT *
    ORDER BY user_id, timestamp
);

-- Aggregate projection for a common dashboard query
ALTER TABLE events
ADD PROJECTION p_daily_counts (
    SELECT
        toDate(timestamp) AS day,
        event_type,
        count() AS cnt
    GROUP BY day, event_type
);
```

**Build projections** after defining—initial population processes existing data:

```sql
ALTER TABLE events MATERIALIZE PROJECTION p_daily_counts;
```

### **The storage trade-off**

Projections provide **query acceleration** with **storage costs**:

**Duplicate storage**—each projection stores data additionally (normal projections) or aggregated subset (aggregate projections).

**Merge overhead**—projections rebuild during background merges maintaining consistency with base data.

**Design carefully**—too many projections multiply storage and merge costs without proportional query benefits.

**Monitor usage**—track which projections optimizer actually uses via query logs and `system.query_log`.
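As a rough check, recent ClickHouse® versions record which projections served each query in the `projections` column of `system.query_log`; a minimal sketch:

```sql
-- Which projections actually served queries in the last 24 hours
SELECT
    query,
    projections
FROM system.query_log
WHERE event_time >= now() - INTERVAL 1 DAY
  AND type = 'QueryFinish'
  AND notEmpty(projections)
ORDER BY event_time DESC
LIMIT 20;
```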

### **When projections make sense**

Use projections when:

* **Specific query patterns** execute frequently requiring different ORDER BY than primary key  
* **Pre-aggregated dashboards**—same aggregations queried repeatedly justifying pre-computation  
* **Query rewriting infeasible**—optimizer choosing layout automatically simpler than application changes  
* **Storage acceptable**—additional space cost justified by query performance gains

Projections solve query acceleration transparently. They're ClickHouse®'s native alternative to manual denormalization or external caching.

## **2. Materialized Views: Incremental and Refreshable Pre-Aggregations**

**Materialized views** in ClickHouse® provide **two distinct patterns**—incremental (trigger-based) and refreshable (scheduled)—for shifting computation from query time to insert/refresh time.

### **What materialized views provide**

ClickHouse® materialized views deliver **pre-computation** with different semantics than traditional databases:

**Incremental materialized views** apply query transformations to each inserted block—function as insert triggers writing to destination tables.

**Refreshable materialized views** execute query periodically on complete dataset—scheduled full recomputation like traditional database MVs.

**Destination table control**—incremental MVs write to explicitly defined tables you query directly.

**Aggregation at insert time**—incremental MVs shift computation from reads (query time) to writes (insert time).

**Scheduled refresh**—refreshable MVs execute on configurable intervals recomputing from source data.

### **How to use incremental materialized views**

Define MV as **insert trigger** writing aggregated data:

```sql
-- Create destination table for aggregated data
CREATE TABLE events_hourly (
    hour DateTime,
    event_type String,
    count UInt64
) ENGINE = SummingMergeTree()
ORDER BY (hour, event_type);

-- Incremental MV applying to each insert
CREATE MATERIALIZED VIEW events_hourly_mv TO events_hourly
AS SELECT
    toStartOfHour(timestamp) AS hour,
    event_type,
    count() AS count
FROM events
GROUP BY hour, event_type;
```

**Incremental updates**—each insert to `events` triggers MV query writing aggregates to `events_hourly`.
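Because SummingMergeTree collapses rows only during background merges, queries against the destination table should re-aggregate; a minimal sketch using the tables above:

```sql
-- Re-aggregate at read time: merges are asynchronous, so partially
-- summed rows may coexist until the next merge
SELECT
    hour,
    event_type,
    sum(count) AS total
FROM events_hourly
GROUP BY hour, event_type
ORDER BY hour DESC;
```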

### **How to use refreshable materialized views**

Define MV with **scheduled refresh** for full recomputation:

```sql
CREATE MATERIALIZED VIEW dashboard_summary
REFRESH EVERY 1 HOUR
-- A storage engine is required when no TO table is given
ENGINE = MergeTree
ORDER BY (day, user_id)
AS SELECT
    toDate(timestamp) AS day,
    user_id,
    count() AS events,
    sum(revenue) AS total_revenue
FROM events
WHERE timestamp >= today() - INTERVAL 30 DAY
GROUP BY day, user_id;
```

**Periodic execution**—MV refreshes every hour recomputing from source table completely.
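Recent ClickHouse® versions expose refresh state through the `system.view_refreshes` table (exact columns vary by version); a sketch:

```sql
-- Inspect scheduled refresh status for refreshable materialized views
SELECT
    database,
    view,
    status,
    next_refresh_time
FROM system.view_refreshes;
```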

### **The incremental versus refreshable choice**

Choose based on **update pattern** and **data characteristics**:

**Incremental MVs** for streaming data—continuous updates as events arrive; query destination table directly for pre-aggregated results.

**Refreshable MVs** for expensive queries—full recomputation acceptable on schedule (hourly, daily); simpler than incremental logic for complex transformations.

**Incremental trade-offs**—faster queries (pre-aggregated) but more insert overhead and destination table management.

**Refreshable trade-offs**—simple scheduled refresh but full recomputation cost and stale data between refreshes.

### **When materialized views make sense**

Use materialized views when:

* **Repeated aggregations**—same GROUP BY queries executed frequently across [real-time dashboards](https://www.tinybird.co/blog/real-time-dashboards-are-they-worth-it) or API endpoints  
* **Streaming ingestion**—incremental MVs update aggregates continuously as events arrive  
* **Expensive queries**—refreshable MVs precompute complex joins or window functions on schedules  
* **Query latency critical**—pre-computation eliminates aggregation cost at query time

Materialized views solve pre-aggregation natively in ClickHouse®. They're essential for serving dashboards efficiently.

## **3. Dictionaries: In-Memory Key-Value Lookups Replacing Expensive Joins**

**Dictionaries** provide ClickHouse®'s **dimension enrichment optimization**—in-memory key-value structures replacing joins with O(1) lookups.

### **What dictionaries provide**

Dictionaries deliver **fast dimensional lookups** without join execution:

**In-memory key-value storage**—dimensions loaded into memory with primary key access patterns.

**Multiple source types**—load from ClickHouse® tables, HTTP endpoints, external databases (MySQL, PostgreSQL), files, or executable scripts.

**Automatic refresh**—configurable update intervals keeping dictionaries synchronized with source data.

**Dictionary functions**—`dictGet`, `dictGetOrDefault` replacing JOIN syntax with direct lookups.

**Layout types**—flat, hashed, complex_key_hashed, cache optimizing different cardinality and access patterns.

**Inspectable metadata**—`system.dictionaries` shows dictionary status, source, and refresh timing.

### **How to use dictionaries effectively**

Define dictionary for **dimension enrichment**:

```sql
CREATE DICTIONARY user_attributes (
    user_id UInt64,
    country String,
    plan String,
    signup_date Date
)
PRIMARY KEY user_id
SOURCE(CLICKHOUSE(
    HOST 'localhost'
    PORT 9000
    USER 'default'
    TABLE 'users'
    DB 'analytics'
))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 360);
```

**Use dictionary functions** instead of JOINs:

```sql
-- Instead of an expensive JOIN
SELECT
    user_id,
    dictGet('user_attributes', 'country', user_id) AS country,
    dictGet('user_attributes', 'plan', user_id) AS plan,
    count() AS events
FROM events
GROUP BY user_id;
```
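When a key may be absent from the dictionary, `dictGetOrDefault` returns a fallback value instead of the type's default; a sketch:

```sql
-- Fall back to 'unknown' for user_ids missing from the dictionary
SELECT
    user_id,
    dictGetOrDefault('user_attributes', 'plan', user_id, 'unknown') AS plan,
    count() AS events
FROM events
GROUP BY user_id, plan;
```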

### **The performance advantage**

Dictionaries provide **dramatic speedup** over joins for dimension enrichment:

**O(1) lookup time**—hash-based access versus join execution with data shuffling.

**Memory locality**—dictionary data loaded in memory versus disk reads for join tables.

**No join overhead**—eliminates join algorithm selection, hash table building, and result materialization.

**Concurrent query efficiency**—shared dictionary in memory versus per-query join execution.

### **The dictionary trade-offs**

Dictionaries optimize **specific patterns** with **constraints**:

**Memory requirements**—dictionaries consume memory proportional to dimension table size.

**Eventual consistency**—refresh intervals mean dictionary lags source; choose LIFETIME balancing freshness versus reload overhead.

**Lookup-only semantics**—dictionaries don't support complex join conditions beyond primary key equality.

**Design for static/slowly-changing**—best for dimensions changing infrequently (countries, products, user profiles) not rapidly mutating data.

### **When dictionaries make sense**

Use dictionaries when:

* **Dimension enrichment** dominates query patterns—adding user attributes, product details, geography to fact events  
* **Small to medium dimensions**—tables fitting comfortably in memory (millions of rows acceptable)  
* **Lookup by primary key**—equality conditions on single key or composite keys  
* **Read-heavy workloads**—dimensions queried frequently but updated infrequently

Dictionaries solve dimensional joins efficiently. They're essential for operational analytics with enrichment.

## **4. Lightweight Deletes: Fast Logical Deletion Without Rewriting Parts**

**Lightweight DELETE** provides ClickHouse®'s **fast deletion mechanism**—marking rows deleted with immediate query visibility without rewriting data parts.

### **What lightweight DELETE provides**

Lightweight DELETE delivers **operational deletion** with different semantics than mutations:

**Immediate visibility**—deleted rows filtered from queries instantly without waiting for part rewrites.

**Logical deletion**—marks rows deleted in metadata; physical cleanup happens asynchronously during merges.

**Low write amplification**—avoids rewriting entire data parts immediately reducing I/O and latency.

**Simplified compliance**—fast deletion for GDPR/CCPA requirements without mutation complexity.

**Mask-based filtering**—uses deletion masks checked during query execution filtering marked rows.

### **How to use lightweight DELETE**

Execute **standard DELETE syntax** with lightweight semantics:

```sql
-- Lightweight DELETE for compliance
DELETE FROM events
WHERE user_id = 12345;

-- Lightweight DELETE with complex conditions
DELETE FROM events
WHERE event_date < today() - INTERVAL 90 DAY
  AND event_type IN ('debug', 'test');
```

**Lightweight deletes are enabled by default** in recent versions; the query-level setting controlling them is `enable_lightweight_delete`:

```sql
-- Re-enable (or disable with 0) lightweight semantics for the current session
SET enable_lightweight_delete = 1;
```

### **The lightweight versus mutation choice**

Choose deletion strategy based on **requirements**:

**Lightweight DELETE when:**

* Immediate deletion visibility required (compliance, user requests)  
* Deletion patterns are selective (specific users, date ranges)  
* Physical cleanup delay acceptable (deleted data persists on disk until merges)  
* Write amplification minimization matters

**Mutations (ALTER TABLE ... DELETE) when:**

* Physical deletion required immediately (security, disk space)  
* Deleting large portions of table where rewrite cost acceptable  
* Batch deletion patterns during maintenance windows
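For contrast, the mutation path uses `ALTER TABLE ... DELETE` and rewrites the affected parts; a sketch, with progress tracked in `system.mutations`:

```sql
-- Mutation-based delete: physically rewrites parts, reclaiming disk space
ALTER TABLE events DELETE WHERE event_date < today() - INTERVAL 365 DAY;

-- Mutations run asynchronously; check completion
SELECT mutation_id, command, is_done
FROM system.mutations
WHERE table = 'events';
```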

### **The trade-off understanding**

Lightweight DELETE provides **fast deletion** with **deferred cleanup**:

**Query overhead**—checking deletion masks adds small cost to query execution.

**Storage persistence**—deleted rows occupy disk space until background merges eliminate them physically.

**Merge dependency**—physical cleanup happens during merges; may require triggering merges explicitly for immediate space reclamation.

**Monitoring cleanup**—track merge progress and disk space to verify physical deletion completion.
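A rough monitoring sketch: check active parts and disk footprint in `system.parts`, then force a merge if space must be reclaimed sooner:

```sql
-- Disk footprint of the table's active parts
SELECT
    count() AS parts,
    sum(rows) AS rows,
    formatReadableSize(sum(bytes_on_disk)) AS on_disk
FROM system.parts
WHERE table = 'events' AND active;

-- Force merges to physically drop rows marked deleted (expensive; use sparingly)
OPTIMIZE TABLE events FINAL;
```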

### **When lightweight DELETE makes sense**

Use lightweight DELETE when:

* **Compliance requirements**—GDPR right to erasure, CCPA deletion requests requiring fast response  
* **Selective deletion**—removing specific users, sessions, or date ranges  
* **Operational corrections**—fixing data quality issues quickly  
* **Write amplification concerns**—minimizing I/O impact of deletion operations

Lightweight DELETE solves fast deletion requirements. It's ClickHouse®'s answer to operational deletion needs.

## **5. ReplacingMergeTree: Upsert Patterns Through Merge-Time Deduplication**

**ReplacingMergeTree** provides ClickHouse®'s **upsert capability**—automatically deduplicating rows during background merges keeping latest version.

### **What ReplacingMergeTree provides**

ReplacingMergeTree delivers **CDC and upsert patterns** without manual deduplication:

**Merge-time deduplication**—background merges replace duplicate rows (same ORDER BY key) keeping latest version.

**Version column**—optional column determining which row is "latest" (highest value wins during deduplication).

**Deletion marking**—optional `is_deleted` column (0/1) lets merges drop rows; the \-1/+1 sign pattern belongs to the CollapsingMergeTree variants.

**Eventual consistency**—deduplication happens during merges not immediately on insert.

**CDC compatibility**—natural fit for change data capture streams with updates and deletes.

**No explicit upsert syntax**—insert all events; merges handle deduplication automatically.

### **How to use ReplacingMergeTree effectively**

Define table with **deduplication strategy**:

```sql
CREATE TABLE user_profiles (
    user_id UInt64,
    email String,
    plan String,
    updated_at DateTime
    -- other columns
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY user_id;
```

**Insert updates freely**—merges deduplicate keeping row with highest `updated_at`:

```sql
-- Initial insert
INSERT INTO user_profiles VALUES (123, 'user@example.com', 'free', '2025-01-01 10:00:00');

-- Update (insert a new version)
INSERT INTO user_profiles VALUES (123, 'user@example.com', 'pro', '2025-01-14 15:30:00');

-- Merges eventually keep only the latest version per user_id
```

### **The eventual consistency reality**

ReplacingMergeTree provides **deduplication eventually** with **query considerations**:

**Immediate queries may see duplicates**—merges run asynchronously; FINAL modifier forces deduplication at query time.

**OPTIMIZE TABLE**—manually trigger merges for immediate deduplication:

```sql
OPTIMIZE TABLE user_profiles FINAL;
```

**FINAL modifier**—forces deduplication during query execution (slower queries, guaranteed latest):

```sql
SELECT * FROM user_profiles FINAL WHERE user_id = 123;
```

**Application logic awareness**—design applications tolerating duplicates or using FINAL when correctness critical.
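A common alternative to FINAL is aggregating to the latest version per key with `argMax`, which can be cheaper than FINAL for wide scans:

```sql
-- Latest row per user_id without FINAL: pick each value at the max updated_at
SELECT
    user_id,
    argMax(email, updated_at) AS email,
    argMax(plan, updated_at) AS plan
FROM user_profiles
GROUP BY user_id;
```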

### **The merge-time deduplication trade-offs**

ReplacingMergeTree optimizes **insert throughput** with **query complexity**:

**Fast inserts**—no deduplication overhead at insert time; write all updates directly.

**Query cost**—reading without FINAL may return duplicates; FINAL adds deduplication overhead.

**Storage until merge**—duplicates consume space until merges eliminate them.

**Merge monitoring**—track merge progress ensuring deduplication happens timely.

### **When ReplacingMergeTree makes sense**

Use ReplacingMergeTree when:

* **CDC streams**—consuming database change events with updates and deletes that propagate efficiently to each [downstream system](https://medium.com/@ogunodabas/downstream-upstream-system-c1dc6cf4b59e)  
* **Mutable dimensions**—user profiles, product catalogs, configuration data changing over time  
* **Insert performance priority**—high-throughput ingestion more important than immediate deduplication  
* **Eventual consistency acceptable**—application logic handles duplicates or uses FINAL when needed

ReplacingMergeTree solves upsert patterns natively. It's ClickHouse®'s CDC-friendly engine choice.

## **6. Data Skipping Indexes: Secondary Indexes for Selective Queries**

**Data skipping indexes** provide ClickHouse®'s **secondary indexing**—accelerating queries on non-primary-key columns through granule-level filtering.

### **What data skipping indexes provide**

Data skipping indexes deliver **selective query acceleration** beyond sparse primary index:

**Granule-level filtering**—indexes enable skipping entire granules (8192 rows default) when conditions don't match.

**Multiple index types**—minmax, set, bloom_filter, ngrambf_v1, tokenbf_v1 optimizing different query patterns.

**Column subset indexing**—indexes on specific columns used in WHERE clauses frequently.

**Automatic index usage**—optimizer applies indexes when beneficial without query hints.

**Low storage overhead**—indexes store aggregate information per granule not per row.

### **How to define data skipping indexes**

Create indexes for **selective filter patterns**:

```sql
-- MinMax index for range queries
ALTER TABLE events
ADD INDEX idx_user_id user_id TYPE minmax GRANULARITY 4;

-- Bloom filter for equality checks
ALTER TABLE events
ADD INDEX idx_event_type event_type TYPE bloom_filter GRANULARITY 4;

-- Set index for IN clauses with limited cardinality
ALTER TABLE events
ADD INDEX idx_country country TYPE set(100) GRANULARITY 4;

-- NGram index for LIKE queries
ALTER TABLE logs
ADD INDEX idx_message message TYPE ngrambf_v1(3, 512, 2, 0) GRANULARITY 4;
```

**Materialize indexes** for existing data:

```sql
ALTER TABLE events MATERIALIZE INDEX idx_user_id;
```

### **The index type selection**

Choose index types based on **query patterns**:

**minmax**—range queries on numeric or date columns (>, <, BETWEEN).

**set(N)**—IN clauses or equality checks with limited distinct values (cardinality < N).

**bloom_filter**—equality checks (=) or IN with higher cardinality; probabilistic filtering.

**ngrambf_v1**—LIKE queries or substring matching; useful for text search.

**tokenbf_v1**—token-based matching for analyzing text by tokens.

### **The skipping index trade-offs**

Data skipping indexes provide **query acceleration** with **storage and maintenance costs**:

**Index storage**—additional space per granule for index data structures.

**Insert overhead**—indexes update during inserts adding processing time.

**GRANULARITY parameter**—controls index density (higher = less storage but less precise skipping).

**Effectiveness varies**—indexes help selective queries; full scans don't benefit and pay index overhead.

**Monitor effectiveness**—check `system.query_log` and EXPLAIN to verify index usage.
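`EXPLAIN indexes = 1` shows which primary and skipping indexes prune granules for a given query; a sketch:

```sql
-- Verify the bloom_filter index actually prunes granules for this filter
EXPLAIN indexes = 1
SELECT count()
FROM events
WHERE event_type = 'purchase';
```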

### **When data skipping indexes make sense**

Use data skipping indexes when:

* **Selective queries** filtering on non-primary-key columns frequently  
* **Known filter patterns**—specific columns in WHERE clauses repeatedly  
* **High-cardinality columns**—primary key doesn't provide sufficient pruning  
* **Range or equality searches**—patterns matching index types (minmax, bloom_filter, set)

Data skipping indexes solve selective query acceleration. They complement sparse primary indexes for specific patterns.

## **7. ClickHouse® Cloud: Managed Operations Eliminating Cluster Administration**

ClickHouse® Cloud provides managed ClickHouse® service—automated operations eliminating self-managed cluster complexity through modern [cloud computing](https://www.ibm.com/think/topics/cloud-computing) infrastructure.

### **What ClickHouse® Cloud provides**

ClickHouse® Cloud delivers **operational simplicity** through managed infrastructure:

**Automatic backups**—scheduled backups with point-in-time restore capabilities included by default.

**Regular upgrades**—platform handles ClickHouse® version updates without manual coordination.

**Auto-scaling**—serverless compute adjusting resources automatically based on workload.

**High availability**—built-in replication and failover without manual configuration.

**SharedMergeTree architecture**—compute-storage separation with object storage backend.

**Private connectivity**—AWS PrivateLink, GCP Private Service Connect, Azure Private Link support.

**Monitoring dashboards**—integrated observability without deploying separate monitoring infrastructure.

**Support and SLAs**—vendor support with service level agreements for production workloads.

### **The self-managed versus Cloud differences**

ClickHouse® Cloud automates **operations self-managed requires building**:

**Self-managed challenges:**

* Manual backup strategies and recovery procedures  
* Coordinating cluster upgrades across replicas  
* Capacity planning and scaling coordination  
* Replication configuration with Keeper or ZooKeeper  
* Monitoring infrastructure deployment and maintenance  
* Security patching and operational maintenance

**Cloud automation:**

* Backups automatic with UI-based restore  
* Upgrades managed by platform  
* Serverless scaling without capacity planning  
* Replication configured automatically  
* Monitoring included in platform  
* Security patches handled transparently

### **The Cloud trade-offs**

ClickHouse® Cloud provides **operational simplicity** with **cost and control considerations**:

**Consumption-based pricing**—compute and storage charged separately based on usage.

**Less infrastructure control**—platform manages configuration; customization limited compared to self-managed.

**Vendor dependency**—migration from Cloud requires recreating infrastructure elsewhere.

**Network egress costs**—data transfer charges for queries from external applications.

**Regional availability**—service availability depends on Cloud provider regions.

### **When ClickHouse® Cloud makes sense**

Choose ClickHouse® Cloud when:

* **Operational burden**—managing ClickHouse® clusters outweighs benefits of infrastructure control  
* **Team expertise**—SQL strength exceeds distributed database operations knowledge  
* **Rapid deployment**—production ClickHouse® needed quickly without infrastructure setup  
* **Variable workloads**—serverless scaling handles unpredictable traffic automatically  
* **Managed operations value**—backups, upgrades, monitoring, support justify platform costs

ClickHouse® Cloud solves operational complexity. It's managed ClickHouse® for teams prioritizing applications over infrastructure.

## **8. ClickHouse® Keeper: Coordination for Replication and Distributed DDL**

**ClickHouse® Keeper** provides **coordination infrastructure**—ZooKeeper-compatible service managing replication and distributed operations.

### **What ClickHouse® Keeper provides**

ClickHouse® Keeper delivers **distributed coordination** for ClickHouse® clusters:

**Replication coordination**—ReplicatedMergeTree engines use Keeper for synchronizing data parts across replicas.

**Distributed DDL**—schema changes coordinated across cluster nodes through Keeper.

**Insert deduplication**—Keeper stores insert block hashes preventing duplicate inserts within configurable window.

**ZooKeeper compatibility**—drop-in replacement for ZooKeeper with compatible protocol.

**Simpler operations**—single-purpose service versus general-purpose ZooKeeper reducing operational complexity.

**Raft consensus**—built on the Raft algorithm providing fault tolerance and leader election.

**Lightweight**—smaller memory footprint than ZooKeeper for ClickHouse®-specific workloads.

### **How ClickHouse® Keeper works**

Keeper provides **metadata storage and coordination**:

**Replication metadata**—ReplicatedMergeTree stores part checksums, merge operations, and mutation tracking in Keeper.

**Leader election**—replicas coordinate which performs merges and mutations through Keeper.

**Insert deduplication**—hash sums of inserted blocks stored in Keeper preventing duplicates within retention window.

**DDL queue**—distributed DDL statements queued in Keeper ensuring consistent schema across replicas.

**Snapshot and logs**—Keeper maintains state through snapshots and operation logs for recovery.
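Coordination metadata is inspectable from SQL through the `system.zookeeper` table, which requires a path filter; `/clickhouse` is the conventional root but depends on your configuration:

```sql
-- Browse Keeper znodes under the configured ClickHouse® root
SELECT name, value
FROM system.zookeeper
WHERE path = '/clickhouse';
```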

### **The Keeper versus ZooKeeper choice**

ClickHouse® Keeper provides **ClickHouse®-optimized coordination** versus general-purpose ZooKeeper:

**Keeper advantages:**

* Optimized for ClickHouse® access patterns  
* Lower memory usage for typical workloads  
* Simpler deployment (single binary with ClickHouse®)  
* Faster recovery and snapshot operations  
* ClickHouse®-native monitoring integration

**When ZooKeeper still makes sense:**

* Existing ZooKeeper infrastructure serving multiple systems  
* Organizational standards requiring ZooKeeper  
* Complex multi-tenant coordination beyond ClickHouse®

### **The replication deduplication mechanism**

Keeper enables **exactly-once insert semantics** through hash tracking:

**Insert deduplication window**—configurable retention for tracking insert block hashes, bounded by block count and age (`replicated_deduplication_window` and `replicated_deduplication_window_seconds`).

**Async inserts impact**—deduplication applies to async insert boundaries affecting retry behavior.

**Monitoring deduplication**—track duplicate detection through system metrics and logs.

**Tune window size**—balance deduplication guarantees versus Keeper memory usage.
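The window is controlled per replicated table through MergeTree settings; a sketch (the values here are illustrative, not recommendations):

```sql
-- Widen insert deduplication to 1000 recent blocks or 24 hours, whichever trims first
ALTER TABLE events
MODIFY SETTING
    replicated_deduplication_window = 1000,
    replicated_deduplication_window_seconds = 86400;
```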

### **When ClickHouse® Keeper matters**

Use ClickHouse® Keeper when:

* **Replicated tables**—ReplicatedMergeTree engines requiring coordination infrastructure  
* **High availability**—multi-replica deployments needing failover coordination  
* **Distributed operations**—coordinating DDL changes across cluster nodes  
* **Insert deduplication**—preventing duplicate data from retry logic or network issues  
* **Simpler than ZooKeeper**—prefer single-purpose ClickHouse®-native service

ClickHouse® Keeper solves coordination infrastructure. It's essential for production ClickHouse® clusters with replication.

## **9. Tinybird: Real-Time Analytics APIs Platform Built on ClickHouse®**

[Tinybird](https://www.tinybird.co/) provides a development layer on top of ClickHouse® that transforms analytical queries into low-latency APIs—simplifying the construction of data-driven applications and products.

### **What Tinybird provides**

Tinybird delivers development abstractions that accelerate analytical pipeline creation:

**Pipes as SQL transformations**—define transformation logic through chained SQL; each Pipe executes one or more queries on ClickHouse® producing intermediate or final datasets.

**Automatic APIs from SQL**—publish any Pipe as HTTP REST endpoint; Tinybird generates APIs with parameters, authentication, and versioning automatically.

**Managed ingestion**—native connectors for Kafka, HTTP events, CSV/JSON files, and webhooks; backpressure handling and retry logic included.

**CLI and Git workflow**—local development with `tb` CLI; version control for Pipes, Data Sources, and configurations through Git for CI/CD integration.

**Integrated observability**—latency, throughput, and error metrics per endpoint; query traceability from API request to ClickHouse® execution.

**Copy and snapshot management**—automatic Materialized Views for optimization; fixture and branch management for development and testing.

### **How to use Tinybird effectively**

Define Data Sources mapping ClickHouse® tables or ingestion:

```
DESCRIPTION >
    Data Source with explicit schema

SCHEMA >
    `timestamp` DateTime,
    `user_id` UInt64,
    `event_type` String,
    `properties` String

ENGINE "MergeTree"
ENGINE_SORTING_KEY "timestamp, user_id"
```

Create Pipes (`.pipe` files, e.g. `user_events_summary.pipe`) transforming data with parameterized SQL:

```
NODE endpoint
SQL >
    SELECT
        toDate(timestamp) AS date,
        event_type,
        count() AS event_count
    FROM events
    WHERE user_id = {{ UInt64(user_id, 0) }}
      AND timestamp >= {{ DateTime(start_date, '2025-01-01 00:00:00') }}
      AND timestamp < {{ DateTime(end_date, '2025-12-31 00:00:00') }}
    GROUP BY date, event_type
    ORDER BY date DESC
```

Publish as API with single command:

```bash
tb pipe publish user_events_summary

# Generates: https://api.tinybird.co/v0/pipes/user_events_summary.json?user_id=123
```

### **The API-first development model**

Tinybird inverts the traditional analytics → application flow:

**Traditional ClickHouse® development:**

1. Write SQL queries in ClickHouse® client  
2. Integrate queries into application code  
3. Manage connection pooling, retry logic, parameters  
4. Deploy entire application for query changes  
5. Build custom monitoring and logging

**Development with Tinybird:**

1. Write SQL Pipes with parameters  
2. Publish as API endpoint  
3. Consume from application via HTTP  
4. Update Pipes independently through Git push  
5. Observability included in platform

### **The Tinybird trade-offs**

Tinybird provides development velocity with cost and abstraction considerations:

**Additional layer**—abstraction over ClickHouse®; complex queries may require understanding both levels.

**Consumption-based pricing**—charged per GB processed and requests; can be more expensive than self-managed ClickHouse® at very high scale.

**Less direct ClickHouse® control**—advanced configurations (custom merge tree settings, distributed tables) limited versus self-managed.

**Moderate vendor lock-in**—Pipes and Data Sources portable to SQL; migration requires rebuilding workflow and APIs.

**Optimized for APIs**—excellent for serving patterns; extensive batch processing may benefit from direct ClickHouse®.

### **When Tinybird makes sense**

Use Tinybird when:

* **Product analytics and dashboards**—low-latency APIs serving user-facing applications; real-time metrics  
* **Data products**—building analytical features into applications (recommendations, segmentation, personalization)  
* **Rapid prototyping**—validating analytical ideas without infrastructure; concept to production in hours  
* **Small teams**—maximizing output with limited resources; avoiding ClickHouse® infrastructure and API layer management  
* **Streaming analytics**—concurrent ingestion and queries on fresh data; sub-100ms latency requirements  
* **CI/CD data pipelines**—Git versioning of transformations; automated testing and deployment

Tinybird solves the "last mile" between ClickHouse® and applications. It's the development platform for teams building products on real-time analytics, prioritizing velocity and simplicity over complete infrastructure control.

---

**Real example**: SaaS metrics dashboard with dynamic filters by customer, date, and dimensions—Tinybird generates 10+ API endpoints from SQL Pipes; frontend consumes APIs directly; logic updates through Git push without application redeployment.

## **Decision Framework: Choosing Which ClickHouse® Features to Use**

### **Start with pain points**

**Slow queries on different ORDER BY?** Projections provide alternative layouts automatically.

**Repeated aggregations?** Materialized views pre-compute results at insert or refresh time.

**Expensive dimension joins?** Dictionaries replace joins with in-memory lookups.

**Deletion requirements?** Lightweight DELETE for compliance; mutations for physical cleanup.

**CDC and upserts?** ReplacingMergeTree handles deduplication through merges.

**Selective queries on non-primary columns?** Data skipping indexes accelerate filtering.

**Operational overhead?** ClickHouse® Cloud eliminates cluster management.

**Replication needs?** ClickHouse® Keeper coordinates distributed operations.

### **Evaluate query patterns**

**Dashboard tiles with same aggregations?** Materialized views or aggregate projections pre-compute.

**Dimensional enrichment everywhere?** Dictionaries optimize lookups versus repeated joins.

**Multiple query patterns on same table?** Normal projections provide alternative ORDER BY options.

**Selective filtering frequent?** Data skipping indexes on commonly filtered columns.

### **Consider operational maturity**

**Self-managing clusters?** Keeper essential for replication; projections and indexes optimize performance.

**Limited operations expertise?** ClickHouse® Cloud simplifies infrastructure; focus on materialized views and dictionaries for query optimization.

**CDC integration required?** ReplacingMergeTree with lightweight deletes handles mutable data patterns.

**Compliance requirements?** Lightweight DELETE provides fast deletion for GDPR/CCPA.

### **Calculate implementation complexity**

**Projections**—medium complexity; define and materialize per table with storage trade-off.

**Materialized views**—medium to high; design destination tables and refresh strategies carefully.

**Dictionaries**—low to medium; straightforward for static dimensions; refresh strategy for changing data.

**Lightweight DELETE**—low complexity; standard SQL syntax with semantic understanding.

**ReplacingMergeTree**—medium complexity; requires FINAL usage decisions and merge monitoring.

**Data skipping indexes**—medium complexity; index type selection and GRANULARITY tuning.

**ClickHouse® Cloud**—low operational complexity; platform handles infrastructure.

**ClickHouse® Keeper**—medium complexity for deployment; low ongoing maintenance.

## **Frequently Asked Questions (FAQs)**

### **Do projections replace materialized views?**

**No—different use cases.** Projections provide alternative physical layouts or pre-aggregations within same table; optimizer chooses automatically. Materialized views create separate destination tables with explicit queries. Use projections for transparent acceleration; use MVs when explicit table management and complex transformations needed.

### **Can dictionaries handle millions of rows?**

**Yes—memory permitting.** Dictionaries load completely in memory; millions of rows work fine on modern servers. Monitor memory usage and choose appropriate LAYOUT (hashed, complex_key_hashed) for your cardinality. For very large dimensions, consider materialized view pre-joins instead.
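`system.dictionaries` reports per-dictionary memory and load status, which makes the memory check straightforward:

```sql
-- Memory footprint and status of loaded dictionaries
SELECT
    name,
    status,
    element_count,
    formatReadableSize(bytes_allocated) AS memory
FROM system.dictionaries;
```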

### **How do lightweight deletes affect query performance?**

**Minimal overhead**—deletion masks checked during query execution add small cost. Performance impact depends on deletion volume; heavily deleted tables may show degradation until merges physically remove data. Monitor query performance and trigger merges if needed.

### **What's the difference between ReplacingMergeTree and CollapsingMergeTree?**

**ReplacingMergeTree** keeps latest row (by version column) during deduplication. **CollapsingMergeTree** uses sign column (-1/+1) collapsing pairs; useful for increment/decrement patterns. **VersionedCollapsingMergeTree** combines both—version and sign. Choose based on CDC pattern: standard updates use Replacing; increment/decrement use Collapsing.

### **Should I always use ClickHouse® Cloud instead of self-managed?**

**Depends on priorities.** ClickHouse® Cloud simplifies operations eliminating backup, upgrade, scaling complexity—choose when operational simplicity matters. Self-managed provides infrastructure control, custom configurations, potentially lower costs at scale—choose when engineering expertise available and control requirements justify operational burden.

### **How many projections can one table have?**

**Technically unlimited—practically few.** Each projection duplicates storage and adds merge overhead. Typically 2-3 projections per table maximum balancing query acceleration versus storage/maintenance costs. Monitor storage growth and merge performance.

### **Do data skipping indexes slow down inserts?**

**Yes—slightly.** Indexes add overhead during inserts building index structures. Impact minimal for most workloads but measurable at very high insert rates. Monitor insert performance and disable unused indexes if overhead unacceptable.

---

**Most teams using ClickHouse® underutilize built-in features solving common problems.**

The question isn't "what database has more features than ClickHouse®?" The question is "**which ClickHouse® features solve my specific requirements?**"

For **query acceleration**:

**Projections** provide alternative layouts and pre-aggregations chosen automatically. **Materialized views** shift computation to insert/refresh time. **Data skipping indexes** accelerate selective filtering.

For **data patterns**:

**Dictionaries** replace expensive joins with in-memory lookups. **ReplacingMergeTree** handles CDC and upserts through merge-time deduplication. **Lightweight DELETE** provides fast deletion for compliance.

For **operations**:

**ClickHouse® Cloud** eliminates cluster management through managed service. **ClickHouse® Keeper** provides coordination for replication and distributed operations.

**The right ClickHouse® features aren't always the newest or most complex—they're the ones matching your specific query patterns, data characteristics, and operational requirements.**

Start by identifying pain points: slow queries, expensive joins, operational complexity. Then evaluate which ClickHouse® features solve those specific problems. Many teams discover solutions already exist in ClickHouse® without needing different databases or extensive custom engineering.

ClickHouse® provides extensive capabilities beyond basic columnar storage. Learning and using these features effectively often delivers better results than seeking alternative databases or evaluating the [best database for real-time analytics](https://www.tinybird.co/blog/best-database-for-real-time-analytics). Understand the trade-offs—storage costs, merge overhead, operational complexity—and apply features strategically where benefits justify costs.  
