These are the ClickHouse® features and capabilities that extend beyond basic OLAP queries:
- Projections (automatic query acceleration with alternative layouts)
- Materialized Views (incremental and refreshable pre-aggregations)
- Dictionaries (in-memory key-value lookups replacing expensive joins)
- Lightweight Deletes (fast logical deletion without rewriting parts)
- ReplacingMergeTree (upsert patterns through merge-time deduplication)
- Data Skipping Indexes (secondary indexes for selective queries)
- ClickHouse® Cloud (managed operations eliminating cluster administration)
- ClickHouse® Keeper (coordination for replication and distributed DDL)
- Tinybird (real-time analytics APIs platform built on ClickHouse®)
ClickHouse® is a columnar OLAP database delivering exceptional performance for analytical queries through sparse primary indexes, vectorized execution, and MergeTree storage optimized for scans and aggregations.
It excels at raw query speed on append-only data. Yet for many teams it appears to lack capabilities when used as a complete analytics backend—natural upserts, complex join optimization, operational simplicity, and platform features beyond the database engine.
Here's what actually happens: You choose ClickHouse® because you need fast analytical queries on large datasets. Query performance is excellent—sub-second aggregations over billions of rows with proper ORDER BY optimization.
So you deploy ClickHouse® for production analytics. You design physical layouts carefully, choosing an ORDER BY for sparse primary index efficiency. You build real-time ingestion pipelines handling streaming data from Kafka. You denormalize tables into wide structures to avoid expensive joins. You manage cluster operations—replication, backups, upgrades, monitoring.
Six months later, you have excellent query performance on properly modeled data. You also discover that ClickHouse® has more features than you're using—capabilities that solve common pain points without leaving the ClickHouse® ecosystem:
Projections provide alternative physical layouts or pre-aggregations chosen automatically by optimizer—accelerating queries without manual rewriting.
Materialized views shift aggregation work from query time to insert time—incremental updates for streaming or scheduled refreshes for batch.
Dictionaries replace expensive dimension joins with in-memory O(1) lookups—dramatically faster enrichment without join complexity.
Lightweight deletes enable fast logical deletion visible immediately—compliance requirements met without mutation rewrite costs.
ReplacingMergeTree handles CDC and upserts naturally—latest record wins during merges without manual deduplication logic.
Data skipping indexes accelerate selective queries on non-primary-key columns—filtering efficiency beyond sparse primary index.
ClickHouse® Cloud eliminates operational overhead—managed backups, upgrades, scaling, and high availability.
ClickHouse® Keeper provides coordination infrastructure—replication and distributed DDL without ZooKeeper dependency.
Someone asks: "Can ClickHouse® handle upserts?" or "How do we accelerate this query without changing ORDER BY?" The answer reveals what ClickHouse® actually provides—extensive feature set beyond basic columnar storage that most teams underutilize.
The uncomfortable reality: most teams seeking "ClickHouse® with more features" don't need different databases—they need to discover and use ClickHouse®'s built-in capabilities solving their exact problems.
This article explores ClickHouse® features extending core OLAP capabilities—when projections accelerate queries transparently, when dictionaries replace joins efficiently, when materialized views optimize serving patterns, and when operational features simplify production deployments.
1. Projections: Automatic Query Acceleration with Alternative Layouts
Projections provide ClickHouse®'s most underutilized acceleration feature—storing data in alternative physical layouts or pre-aggregated forms chosen automatically by the optimizer.
What projections provide
Projections deliver transparent query optimization without SQL rewriting:
Normal projections store same data with different ORDER BY—optimizer selects layout matching query filters and sorting automatically.
Aggregate projections pre-compute aggregations (SUM, COUNT, AVG, GROUP BY)—queries reading aggregates hit pre-computed results instead of scanning raw data.
Automatic selection by query optimizer—no query rewriting required when projection matches query pattern.
Multiple projections per table—define several alternative layouts or aggregations serving different query patterns.
Inspectable through system tables—system.projections shows defined projections and their configurations.
How to use projections effectively
Define projections for specific query patterns causing performance issues:
```sql
-- Normal projection with alternative ORDER BY
ALTER TABLE events
ADD PROJECTION p_user_time (
    SELECT *
    ORDER BY user_id, timestamp
);

-- Aggregate projection for common dashboard query
ALTER TABLE events
ADD PROJECTION p_daily_counts (
    SELECT
        toDate(timestamp) AS day,
        event_type,
        count() AS cnt
    GROUP BY day, event_type
);
```
Build projections after defining them—materializing the projection populates it for existing data:
```sql
ALTER TABLE events MATERIALIZE PROJECTION p_daily_counts;
```
The storage trade-off
Projections provide query acceleration with storage costs:
Duplicate storage—each projection stores data additionally (normal projections) or aggregated subset (aggregate projections).
Merge overhead—projections rebuild during background merges maintaining consistency with base data.
Design carefully—too many projections multiply storage and merge costs without proportional query benefits.
Monitor usage—track which projections optimizer actually uses via query logs and system.query_log.
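A starting point for that monitoring: the query log records which projections served each query. A minimal sketch, assuming the projections column that recent ClickHouse® versions expose in system.query_log:

```sql
-- List recent queries that read from a projection
SELECT
    event_time,
    query,
    projections
FROM system.query_log
WHERE type = 'QueryFinish'
  AND notEmpty(projections)
ORDER BY event_time DESC
LIMIT 10;
```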
When projections make sense
Use projections when:
- Specific query patterns execute frequently requiring different ORDER BY than primary key
- Pre-aggregated dashboards—same aggregations queried repeatedly justifying pre-computation
- Query rewriting infeasible—optimizer choosing layout automatically simpler than application changes
- Storage acceptable—additional space cost justified by query performance gains
Projections solve query acceleration transparently. They're ClickHouse®'s native alternative to manual denormalization or external caching.
2. Materialized Views: Incremental and Refreshable Pre-Aggregations
Materialized views in ClickHouse® provide two distinct patterns—incremental (trigger-based) and refreshable (scheduled)—for shifting computation from query time to insert/refresh time.
What materialized views provide
ClickHouse® materialized views deliver pre-computation with different semantics than traditional databases:
Incremental materialized views apply query transformations to each inserted block—function as insert triggers writing to destination tables.
Refreshable materialized views execute query periodically on complete dataset—scheduled full recomputation like traditional database MVs.
Destination table control—incremental MVs write to explicitly defined tables you query directly.
Aggregation at insert time—incremental MVs shift computation from reads (query time) to writes (insert time).
Scheduled refresh—refreshable MVs execute on configurable intervals recomputing from source data.
How to use incremental materialized views
Define MV as insert trigger writing aggregated data:
```sql
-- Create destination table for aggregated data
CREATE TABLE events_hourly (
    hour DateTime,
    event_type String,
    count UInt64
) ENGINE = SummingMergeTree()
ORDER BY (hour, event_type);

-- Incremental MV applying to each insert
CREATE MATERIALIZED VIEW events_hourly_mv TO events_hourly
AS SELECT
    toStartOfHour(timestamp) AS hour,
    event_type,
    count() AS count
FROM events
GROUP BY hour, event_type;
```
Incremental updates—each insert to events triggers MV query writing aggregates to events_hourly.
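When reading the destination table, re-aggregate over the key: SummingMergeTree collapses rows only at merge time, so a sum() also covers parts that have not merged yet. A short example against the events_hourly table defined above:

```sql
-- Re-aggregate at read time to cover rows not yet collapsed by merges
SELECT
    hour,
    event_type,
    sum(count) AS count
FROM events_hourly
WHERE hour >= now() - INTERVAL 1 DAY
GROUP BY hour, event_type
ORDER BY hour;
```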
How to use refreshable materialized views
Define MV with scheduled refresh for full recomputation:
```sql
-- Note: older ClickHouse® versions gate this behind allow_experimental_refreshable_materialized_view
CREATE MATERIALIZED VIEW dashboard_summary
REFRESH EVERY 1 HOUR
ENGINE = MergeTree
ORDER BY (day, user_id)
AS SELECT
    toDate(timestamp) AS day,
    user_id,
    count() AS events,
    sum(revenue) AS total_revenue
FROM events
WHERE timestamp >= today() - INTERVAL 30 DAY
GROUP BY day, user_id;
```
Periodic execution—MV refreshes every hour recomputing from source table completely.
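Refresh progress is observable from SQL. A hedged sketch, assuming the system.view_refreshes table that recent versions provide for refreshable materialized views:

```sql
-- Shows status, last refresh, and next scheduled refresh per refreshable MV
SELECT *
FROM system.view_refreshes;
```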
The incremental versus refreshable choice
Choose based on update pattern and data characteristics:
Incremental MVs for streaming data—continuous updates as events arrive; query destination table directly for pre-aggregated results.
Refreshable MVs for expensive queries—full recomputation acceptable on schedule (hourly, daily); simpler than incremental logic for complex transformations.
Incremental trade-offs—faster queries (pre-aggregated) but more insert overhead and destination table management.
Refreshable trade-offs—simple scheduled refresh but full recomputation cost and stale data between refreshes.
When materialized views make sense
Use materialized views when:
- Repeated aggregations—same GROUP BY queries executed frequently across real-time dashboards or API endpoints
- Streaming ingestion—incremental MVs update aggregates continuously as events arrive
- Expensive queries—refreshable MVs precompute complex joins or window functions on schedules
- Query latency critical—pre-computation eliminates aggregation cost at query time
Materialized views solve pre-aggregation natively in ClickHouse®. They're essential for serving dashboards efficiently.
3. Dictionaries: In-Memory Key-Value Lookups Replacing Expensive Joins
Dictionaries provide ClickHouse®'s dimension enrichment optimization—in-memory key-value structures replacing joins with O(1) lookups.
What dictionaries provide
Dictionaries deliver fast dimensional lookups without join execution:
In-memory key-value storage—dimensions loaded into memory with primary key access patterns.
Multiple source types—load from ClickHouse® tables, HTTP endpoints, external databases (MySQL, PostgreSQL), files, or executable scripts.
Automatic refresh—configurable update intervals keeping dictionaries synchronized with source data.
Dictionary functions—dictGet, dictGetOrDefault replacing JOIN syntax with direct lookups.
Layout types—flat, hashed, complex_key_hashed, cache optimizing different cardinality and access patterns.
Inspectable metadata—system.dictionaries shows dictionary status, source, and refresh timing.
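For example, loading state and memory footprint are visible with a simple query against system.dictionaries:

```sql
-- Check dictionary status, size, and last successful refresh
SELECT
    name,
    status,
    element_count,
    bytes_allocated,
    last_successful_update_time
FROM system.dictionaries;
```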
How to use dictionaries effectively
Define dictionary for dimension enrichment:
```sql
CREATE DICTIONARY user_attributes (
    user_id UInt64,
    country String,
    plan String,
    signup_date Date
)
PRIMARY KEY user_id
SOURCE(CLICKHOUSE(
    HOST 'localhost'
    PORT 9000
    USER 'default'
    TABLE 'users'
    DB 'analytics'
))
LAYOUT(HASHED())
LIFETIME(MIN 300 MAX 360);
```
Use dictionary functions instead of JOINs:
```sql
-- Instead of expensive JOIN
SELECT
    user_id,
    dictGet('user_attributes', 'country', user_id) AS country,
    dictGet('user_attributes', 'plan', user_id) AS plan,
    count() AS events
FROM events
GROUP BY user_id;
```
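When a key may be missing from the dictionary, dictGetOrDefault returns an explicit fallback instead of the attribute type's default value. A short sketch reusing the hypothetical user_attributes dictionary:

```sql
-- Fall back to 'unknown' for user_ids absent from the dictionary
SELECT
    user_id,
    dictGetOrDefault('user_attributes', 'plan', user_id, 'unknown') AS plan
FROM events
LIMIT 10;
```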
The performance advantage
Dictionaries provide dramatic speedup over joins for dimension enrichment:
O(1) lookup time—hash-based access versus join execution with data shuffling.
Memory locality—dictionary data loaded in memory versus disk reads for join tables.
No join overhead—eliminates join algorithm selection, hash table building, and result materialization.
Concurrent query efficiency—shared dictionary in memory versus per-query join execution.
The dictionary trade-offs
Dictionaries optimize specific patterns with constraints:
Memory requirements—dictionaries consume memory proportional to dimension table size.
Eventual consistency—refresh intervals mean dictionary lags source; choose LIFETIME balancing freshness versus reload overhead.
Lookup-only semantics—dictionaries don't support complex join conditions beyond primary key equality.
Design for static/slowly-changing—best for dimensions changing infrequently (countries, products, user profiles) not rapidly mutating data.
When dictionaries make sense
Use dictionaries when:
- Dimension enrichment dominates query patterns—adding user attributes, product details, geography to fact events
- Small to medium dimensions—tables fitting comfortably in memory (millions of rows acceptable)
- Lookup by primary key—equality conditions on single key or composite keys
- Read-heavy workloads—dimensions queried frequently but updated infrequently
Dictionaries solve dimensional joins efficiently. They're essential for operational analytics with enrichment.
4. Lightweight Deletes: Fast Logical Deletion Without Rewriting Parts
Lightweight DELETE provides ClickHouse®'s fast deletion mechanism—marking rows deleted with immediate query visibility without rewriting data parts.
What lightweight DELETE provides
Lightweight DELETE delivers operational deletion with different semantics than mutations:
Immediate visibility—deleted rows filtered from queries instantly without waiting for part rewrites.
Logical deletion—marks rows deleted in metadata; physical cleanup happens asynchronously during merges.
Low write amplification—avoids rewriting entire data parts immediately reducing I/O and latency.
Simplified compliance—fast deletion for GDPR/CCPA requirements without mutation complexity.
Mask-based filtering—uses deletion masks checked during query execution filtering marked rows.
How to use lightweight DELETE
Execute standard DELETE syntax with lightweight semantics:
```sql
-- Lightweight DELETE for compliance
DELETE FROM events
WHERE user_id = 12345;

-- Lightweight DELETE with complex conditions
DELETE FROM events
WHERE event_date < today() - INTERVAL 90 DAY
  AND event_type IN ('debug', 'test');
```
Lightweight DELETE is enabled by default in recent versions; it is controlled by a session-level setting:

```sql
-- Enable lightweight DELETE for the session (default in recent versions)
SET enable_lightweight_delete = 1;
```
The lightweight versus mutation choice
Choose deletion strategy based on requirements:
Lightweight DELETE when:
- Immediate deletion visibility required (compliance, user requests)
- Deletion patterns are selective (specific users, date ranges)
- Physical cleanup delay acceptable (deleted data persists on disk until merges)
- Write amplification minimization matters
Mutations (ALTER TABLE ... DELETE) when:
- Physical deletion required immediately (security, disk space)
- Deleting large portions of table where rewrite cost acceptable
- Batch deletion patterns during maintenance windows
The lightweight DELETE trade-offs
Lightweight DELETE provides fast deletion with deferred cleanup:
Query overhead—checking deletion masks adds small cost to query execution.
Storage persistence—deleted rows occupy disk space until background merges eliminate them physically.
Merge dependency—physical cleanup happens during merges; reclaiming space promptly may require triggering merges explicitly, as shown below.
Monitoring cleanup—track merge progress and disk space to verify physical deletion completion.
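A minimal sketch of forcing that cleanup on the hypothetical events table; OPTIMIZE ... FINAL rewrites parts, so treat it as an occasional, heavyweight operation rather than routine maintenance:

```sql
-- Force a merge so rows marked as deleted are physically removed
OPTIMIZE TABLE events FINAL;
```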
When lightweight DELETE makes sense
Use lightweight DELETE when:
- Compliance requirements—GDPR right to erasure, CCPA deletion requests requiring fast response
- Selective deletion—removing specific users, sessions, or date ranges
- Operational corrections—fixing data quality issues quickly
- Write amplification concerns—minimizing I/O impact of deletion operations
Lightweight DELETE solves fast deletion requirements. It's ClickHouse®'s answer to operational deletion needs.
5. ReplacingMergeTree: Upsert Patterns Through Merge-Time Deduplication
ReplacingMergeTree provides ClickHouse®'s upsert capability—automatically deduplicating rows during background merges keeping latest version.
What ReplacingMergeTree provides
ReplacingMergeTree delivers CDC and upsert patterns without manual deduplication:
Merge-time deduplication—background merges replace duplicate rows (same ORDER BY key) keeping latest version.
Version column—optional column determining which row is "latest" (highest value wins during deduplication).
Deletion handling—ReplacingMergeTree optionally accepts an is_deleted column marking rows for removal; the -1/+1 sign pattern belongs to the related CollapsingMergeTree and VersionedCollapsingMergeTree engines.
Eventual consistency—deduplication happens during merges not immediately on insert.
CDC compatibility—natural fit for change data capture streams with updates and deletes.
No explicit upsert syntax—insert all events; merges handle deduplication automatically.
How to use ReplacingMergeTree effectively
Define table with deduplication strategy:
```sql
CREATE TABLE user_profiles (
    user_id UInt64,
    email String,
    plan String,
    updated_at DateTime
    -- other columns as needed
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY user_id;
```
Insert updates freely—merges deduplicate keeping row with highest updated_at:
```sql
-- Initial insert
INSERT INTO user_profiles VALUES (123, 'user@example.com', 'free', '2025-01-01 10:00:00');

-- Update (insert new version)
INSERT INTO user_profiles VALUES (123, 'user@example.com', 'pro', '2025-01-14 15:30:00');

-- Merges eventually keep only latest version per user_id
```
The eventual consistency reality
ReplacingMergeTree provides deduplication eventually with query considerations:
Immediate queries may see duplicates—merges run asynchronously; FINAL modifier forces deduplication at query time.
OPTIMIZE TABLE—manually trigger merges for immediate deduplication:
```sql
OPTIMIZE TABLE user_profiles FINAL;
```
FINAL modifier—forces deduplication during query execution (slower queries, guaranteed latest):
```sql
SELECT * FROM user_profiles FINAL WHERE user_id = 123;
```
Application logic awareness—design applications tolerating duplicates or using FINAL when correctness critical.
The merge-time deduplication trade-offs
ReplacingMergeTree optimizes insert throughput with query complexity:
Fast inserts—no deduplication overhead at insert time; write all updates directly.
Query cost—reading without FINAL may return duplicates; FINAL adds deduplication overhead.
Storage until merge—duplicates consume space until merges eliminate them.
Merge monitoring—track merge progress ensuring deduplication happens timely.
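Merge activity is visible from SQL; a small sketch against system.merges for the hypothetical user_profiles table:

```sql
-- Watch in-flight merges that will collapse duplicate versions
SELECT
    table,
    elapsed,
    progress,
    num_parts
FROM system.merges
WHERE table = 'user_profiles';
```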
When ReplacingMergeTree makes sense
Use ReplacingMergeTree when:
- CDC streams—consuming database change events with updates and deletes that propagate to downstream systems
- Mutable dimensions—user profiles, product catalogs, configuration data changing over time
- Insert performance priority—high-throughput ingestion more important than immediate deduplication
- Eventual consistency acceptable—application logic handles duplicates or uses FINAL when needed
ReplacingMergeTree solves upsert patterns natively. It's ClickHouse®'s CDC-friendly engine choice.
6. Data Skipping Indexes: Secondary Indexes for Selective Queries
Data skipping indexes provide ClickHouse®'s secondary indexing—accelerating queries on non-primary-key columns through granule-level filtering.
What data skipping indexes provide
Data skipping indexes deliver selective query acceleration beyond sparse primary index:
Granule-level filtering—indexes enable skipping entire granules (8192 rows default) when conditions don't match.
Multiple index types—minmax, set, bloom_filter, ngrambf_v1, tokenbf_v1 optimizing different query patterns.
Column subset indexing—indexes on specific columns used in WHERE clauses frequently.
Automatic index usage—optimizer applies indexes when beneficial without query hints.
Low storage overhead—indexes store aggregate information per granule not per row.
How to define data skipping indexes
Create indexes for selective filter patterns:
```sql
-- MinMax index for range queries
ALTER TABLE events
ADD INDEX idx_user_id user_id TYPE minmax GRANULARITY 4;

-- Bloom filter for equality checks
ALTER TABLE events
ADD INDEX idx_event_type event_type TYPE bloom_filter GRANULARITY 4;

-- Set index for IN clauses with limited cardinality
ALTER TABLE events
ADD INDEX idx_country country TYPE set(100) GRANULARITY 4;

-- NGram index for LIKE queries
ALTER TABLE logs
ADD INDEX idx_message message TYPE ngrambf_v1(3, 512, 2, 0) GRANULARITY 4;
```
Materialize indexes for existing data:
```sql
ALTER TABLE events MATERIALIZE INDEX idx_user_id;
```
The index type selection
Choose index types based on query patterns:
minmax—range queries on numeric or date columns (>, <, BETWEEN).
set(N)—IN clauses or equality checks with limited distinct values (cardinality < N).
bloom_filter—equality checks (=) or IN with higher cardinality; probabilistic filtering.
ngrambf_v1—LIKE queries or substring matching; useful for text search.
tokenbf_v1—token-based matching on whole words (tokens split on non-alphanumeric characters); useful for searching logs.
The skipping index trade-offs
Data skipping indexes provide query acceleration with storage and maintenance costs:
Index storage—additional space per granule for index data structures.
Insert overhead—indexes update during inserts adding processing time.
GRANULARITY parameter—controls index density (higher = less storage but less precise skipping).
Effectiveness varies—indexes help selective queries; full scans don't benefit and pay index overhead.
Monitor effectiveness—check system.query_log and EXPLAIN to verify index usage.
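To confirm an index actually prunes granules for a given query, EXPLAIN with the indexes flag lists the filtering steps. A short example reusing the hypothetical idx_event_type index defined above:

```sql
-- Show primary key and skipping index usage for this query
EXPLAIN indexes = 1
SELECT count()
FROM events
WHERE event_type = 'purchase';
```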
When data skipping indexes make sense
Use data skipping indexes when:
- Selective queries filtering on non-primary-key columns frequently
- Known filter patterns—specific columns in WHERE clauses repeatedly
- High-cardinality columns—primary key doesn't provide sufficient pruning
- Range or equality searches—patterns matching index types (minmax, bloom_filter, set)
Data skipping indexes solve selective query acceleration. They complement sparse primary indexes for specific patterns.
7. ClickHouse® Cloud: Managed Operations Eliminating Cluster Administration
ClickHouse® Cloud provides a managed ClickHouse® service—automated operations eliminating self-managed cluster complexity.
What ClickHouse® Cloud provides
ClickHouse® Cloud delivers operational simplicity through managed infrastructure:
Automatic backups—scheduled backups with point-in-time restore capabilities included by default.
Regular upgrades—platform handles ClickHouse® version updates without manual coordination.
Auto-scaling—serverless compute adjusting resources automatically based on workload.
High availability—built-in replication and failover without manual configuration.
SharedMergeTree architecture—compute-storage separation with object storage backend.
Private connectivity—AWS PrivateLink, GCP Private Service Connect, Azure Private Link support.
Monitoring dashboards—integrated observability without deploying separate monitoring infrastructure.
Support and SLAs—vendor support with service level agreements for production workloads.
The self-managed versus Cloud differences
ClickHouse® Cloud automates operations that self-managed deployments require you to build:
Self-managed challenges:
- Manual backup strategies and recovery procedures
- Coordinating cluster upgrades across replicas
- Capacity planning and scaling coordination
- Replication configuration with Keeper or ZooKeeper
- Monitoring infrastructure deployment and maintenance
- Security patching and operational maintenance
Cloud automation:
- Backups automatic with UI-based restore
- Upgrades managed by platform
- Serverless scaling without capacity planning
- Replication configured automatically
- Monitoring included in platform
- Security patches handled transparently
The Cloud trade-offs
ClickHouse® Cloud provides operational simplicity with cost and control considerations:
Consumption-based pricing—compute and storage charged separately based on usage.
Less infrastructure control—platform manages configuration; customization limited compared to self-managed.
Vendor dependency—migration from Cloud requires recreating infrastructure elsewhere.
Network egress costs—data transfer charges for queries from external applications.
Regional availability—service availability depends on Cloud provider regions.
When ClickHouse® Cloud makes sense
Choose ClickHouse® Cloud when:
- Operational burden—managing ClickHouse® clusters outweighs benefits of infrastructure control
- Team expertise—SQL strength exceeds distributed database operations knowledge
- Rapid deployment—production ClickHouse® needed quickly without infrastructure setup
- Variable workloads—serverless scaling handles unpredictable traffic automatically
- Managed operations value—backups, upgrades, monitoring, support justify platform costs
ClickHouse® Cloud solves operational complexity. It's managed ClickHouse® for teams prioritizing applications over infrastructure.
8. ClickHouse® Keeper: Coordination for Replication and Distributed DDL
ClickHouse® Keeper provides coordination infrastructure—ZooKeeper-compatible service managing replication and distributed operations.
What ClickHouse® Keeper provides
ClickHouse® Keeper delivers distributed coordination for ClickHouse® clusters:
Replication coordination—ReplicatedMergeTree engines use Keeper for synchronizing data parts across replicas.
Distributed DDL—schema changes coordinated across cluster nodes through Keeper.
Insert deduplication—Keeper stores insert block hashes preventing duplicate inserts within configurable window.
ZooKeeper compatibility—drop-in replacement for ZooKeeper with compatible protocol.
Simpler operations—single-purpose service versus general-purpose ZooKeeper reducing operational complexity.
RAFT consensus—built on RAFT algorithm providing fault tolerance and leader election.
Lightweight—smaller memory footprint than ZooKeeper for ClickHouse®-specific workloads.
How ClickHouse® Keeper works
Keeper provides metadata storage and coordination:
Replication metadata—ReplicatedMergeTree stores part checksums, merge operations, and mutation tracking in Keeper.
Leader election—replicas coordinate which performs merges and mutations through Keeper.
Insert deduplication—hash sums of inserted blocks stored in Keeper preventing duplicates within retention window.
DDL queue—distributed DDL statements queued in Keeper ensuring consistent schema across replicas.
Snapshot and logs—Keeper maintains state through snapshots and operation logs for recovery.
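Replication is opted into per table. A minimal sketch of a replicated table whose part metadata and replication log live in Keeper; the table name is hypothetical, and the {shard} and {replica} macros are assumed to be defined in the server configuration:

```sql
-- Each replica registers itself under the same Keeper path
CREATE TABLE events_replicated
(
    timestamp DateTime,
    user_id UInt64,
    event_type String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
ORDER BY (timestamp, user_id);
```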
The Keeper versus ZooKeeper choice
ClickHouse® Keeper provides ClickHouse®-optimized coordination versus general-purpose ZooKeeper:
Keeper advantages:
- Optimized for ClickHouse® access patterns
- Lower memory usage for typical workloads
- Simpler deployment (single binary with ClickHouse®)
- Faster recovery and snapshot operations
- ClickHouse®-native monitoring integration
When ZooKeeper still makes sense:
- Existing ZooKeeper infrastructure serving multiple systems
- Organizational standards requiring ZooKeeper
- Complex multi-tenant coordination beyond ClickHouse®
The replication deduplication mechanism
Keeper enables exactly-once insert semantics through hash tracking:
Insert deduplication window—configurable retention (default minutes to hours) for tracking insert block hashes.
Async inserts impact—deduplication applies to async insert boundaries affecting retry behavior.
Monitoring deduplication—track duplicate detection through system metrics and logs.
Tune window size—balance deduplication guarantees versus Keeper memory usage.
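The window is tunable per table through MergeTree settings. A hedged sketch on the hypothetical events_replicated table above; defaults differ between versions:

```sql
-- Keep hashes of the last 1000 insert blocks, for at most one hour
ALTER TABLE events_replicated
MODIFY SETTING
    replicated_deduplication_window = 1000,
    replicated_deduplication_window_seconds = 3600;
```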
When ClickHouse® Keeper matters
Use ClickHouse® Keeper when:
- Replicated tables—ReplicatedMergeTree engines requiring coordination infrastructure
- High availability—multi-replica deployments needing failover coordination
- Distributed operations—coordinating DDL changes across cluster nodes
- Insert deduplication—preventing duplicate data from retry logic or network issues
- Simpler than ZooKeeper—prefer single-purpose ClickHouse®-native service
ClickHouse® Keeper solves coordination infrastructure. It's essential for production ClickHouse® clusters with replication.
9. Tinybird: Real-Time Analytics APIs Platform Built on ClickHouse®
Tinybird provides a development layer on top of ClickHouse® that transforms analytical queries into low-latency APIs—simplifying the construction of data-driven applications and products.
What Tinybird provides
Tinybird delivers development abstractions that accelerate analytical pipeline creation:
Pipes as SQL transformations—define transformation logic through chained SQL; each Pipe executes one or more queries on ClickHouse® producing intermediate or final datasets.
Automatic APIs from SQL—publish any Pipe as HTTP REST endpoint; Tinybird generates APIs with parameters, authentication, and versioning automatically.
Managed ingestion—native connectors for Kafka, HTTP events, CSV/JSON files, and webhooks; backpressure handling and retry logic included.
CLI and Git workflow—local development with tb CLI; version control for Pipes, Data Sources, and configurations through Git for CI/CD integration.
Integrated observability—latency, throughput, and error metrics per endpoint; query traceability from API request to ClickHouse® execution.
Copy and snapshot management—automatic Materialized Views for optimization; fixture and branch management for development and testing.
How to use Tinybird effectively
Define Data Sources mapping ClickHouse® tables or ingestion:
```sql
-- Data Source with explicit schema
SCHEMA >
    `timestamp` DateTime,
    `user_id` UInt64,
    `event_type` String,
    `properties` String

ENGINE "MergeTree"
ENGINE_SORTING_KEY "timestamp, user_id"
```
Create Pipes transforming data with parameterized SQL:
```sql
-- Pipe: user_events_summary.pipe
NODE endpoint
SQL >
    SELECT
        toDate(timestamp) AS date,
        event_type,
        count() AS event_count
    FROM events
    WHERE user_id = {{ UInt64(user_id, 0) }}
      AND timestamp >= {{DateTime(start_date, '2025-01-01')}}
      AND timestamp < {{DateTime(end_date, now())}}
    GROUP BY date, event_type
    ORDER BY date DESC
```
Publish as API with single command:
```bash
tb pipe publish user_events_summary
# Generates: https://api.tinybird.co/v0/pipes/user_events_summary.json?user_id=123
```
The API-first development model
Tinybird inverts the traditional analytics → application flow:
Traditional ClickHouse® development:
- Write SQL queries in ClickHouse® client
- Integrate queries into application code
- Manage connection pooling, retry logic, parameters
- Deploy entire application for query changes
- Build custom monitoring and logging
Development with Tinybird:
- Write SQL Pipes with parameters
- Publish as API endpoint
- Consume from application via HTTP
- Update Pipes independently through Git push
- Observability included in platform
The Tinybird trade-offs
Tinybird provides development velocity with cost and abstraction considerations:
Additional layer—abstraction over ClickHouse®; complex queries may require understanding both levels.
Consumption-based pricing—charged per GB processed and requests; can be more expensive than self-managed ClickHouse® at very high scale.
Less direct ClickHouse® control—advanced configurations (custom merge tree settings, distributed tables) limited versus self-managed.
Moderate vendor lock-in—Pipes and Data Sources portable to SQL; migration requires rebuilding workflow and APIs.
Optimized for APIs—excellent for serving patterns; extensive batch processing may benefit from direct ClickHouse®.
When Tinybird makes sense
Use Tinybird when:
- Product analytics and dashboards—low-latency APIs serving user-facing applications; real-time metrics.
- Data products—building analytical features into applications (recommendations, segmentation, personalization).
- Rapid prototyping—validating analytical ideas without infrastructure; concept to production in hours.
- Small teams—maximizing output with limited resources; avoiding ClickHouse® infrastructure and API layer management.
- Streaming analytics—concurrent ingestion and queries on fresh data; sub-100ms latency requirements.
- CI/CD data pipelines—Git versioning of transformations; automated testing and deployment.
Tinybird solves the "last mile" between ClickHouse® and applications. It's the development platform for teams building products on real-time analytics, prioritizing velocity and simplicity over complete infrastructure control.
Real example: SaaS metrics dashboard with dynamic filters by customer, date, and dimensions—Tinybird generates 10+ API endpoints from SQL Pipes; frontend consumes APIs directly; logic updates through Git push without application redeployment.
Decision Framework: Choosing Which ClickHouse® Features to Use
Start with pain points
Slow queries on different ORDER BY? Projections provide alternative layouts automatically.
Repeated aggregations? Materialized views pre-compute results at insert or refresh time.
Expensive dimension joins? Dictionaries replace joins with in-memory lookups.
Deletion requirements? Lightweight DELETE for compliance; mutations for physical cleanup.
CDC and upserts? ReplacingMergeTree handles deduplication through merges.
Selective queries on non-primary columns? Data skipping indexes accelerate filtering.
Operational overhead? ClickHouse® Cloud eliminates cluster management.
Replication needs? ClickHouse® Keeper coordinates distributed operations.
Evaluate query patterns
Dashboard tiles with same aggregations? Materialized views or aggregate projections pre-compute.
Dimensional enrichment everywhere? Dictionaries optimize lookups versus repeated joins.
Multiple query patterns on same table? Normal projections provide alternative ORDER BY options.
Selective filtering frequent? Data skipping indexes on commonly filtered columns.
Consider operational maturity
Self-managing clusters? Keeper essential for replication; projections and indexes optimize performance.
Limited operations expertise? ClickHouse® Cloud simplifies infrastructure; focus on materialized views and dictionaries for query optimization.
CDC integration required? ReplacingMergeTree with lightweight deletes handles mutable data patterns.
Compliance requirements? Lightweight DELETE provides fast deletion for GDPR/CCPA.
Calculate implementation complexity
Projections—medium complexity; define and materialize per table with storage trade-off.
Materialized views—medium to high; design destination tables and refresh strategies carefully.
Dictionaries—low to medium; straightforward for static dimensions; refresh strategy for changing data.
Lightweight DELETE—low complexity; standard SQL syntax with semantic understanding.
ReplacingMergeTree—medium complexity; requires FINAL usage decisions and merge monitoring.
Data skipping indexes—medium complexity; index type selection and GRANULARITY tuning.
ClickHouse® Cloud—low operational complexity; platform handles infrastructure.
ClickHouse® Keeper—medium complexity for deployment; low ongoing maintenance.
Frequently Asked Questions (FAQs)
Do projections replace materialized views?
No—different use cases. Projections provide alternative physical layouts or pre-aggregations within same table; optimizer chooses automatically. Materialized views create separate destination tables with explicit queries. Use projections for transparent acceleration; use MVs when explicit table management and complex transformations needed.
Can dictionaries handle millions of rows?
Yes—memory permitting. Dictionaries load completely in memory; millions of rows work fine on modern servers. Monitor memory usage and choose appropriate LAYOUT (hashed, complex_key_hashed) for your cardinality. For very large dimensions, consider materialized view pre-joins instead.
How do lightweight deletes affect query performance?
Minimal overhead—deletion masks checked during query execution add small cost. Performance impact depends on deletion volume; heavily deleted tables may show degradation until merges physically remove data. Monitor query performance and trigger merges if needed.
What's the difference between ReplacingMergeTree and CollapsingMergeTree?
ReplacingMergeTree keeps latest row (by version column) during deduplication. CollapsingMergeTree uses sign column (-1/+1) collapsing pairs; useful for increment/decrement patterns. VersionedCollapsingMergeTree combines both—version and sign. Choose based on CDC pattern: standard updates use Replacing; increment/decrement use Collapsing.
Should I always use ClickHouse® Cloud instead of self-managed?
Depends on priorities. ClickHouse® Cloud simplifies operations eliminating backup, upgrade, scaling complexity—choose when operational simplicity matters. Self-managed provides infrastructure control, custom configurations, potentially lower costs at scale—choose when engineering expertise available and control requirements justify operational burden.
How many projections can one table have?
Technically unlimited—practically few. Each projection duplicates storage and adds merge overhead. Typically 2-3 projections per table maximum balancing query acceleration versus storage/maintenance costs. Monitor storage growth and merge performance.
Do data skipping indexes slow down inserts?
Yes—slightly. Indexes add overhead during inserts building index structures. Impact minimal for most workloads but measurable at very high insert rates. Monitor insert performance and disable unused indexes if overhead unacceptable.
Most teams using ClickHouse® underutilize built-in features solving common problems.
The question isn't "what database has more features than ClickHouse®?" The question is "which ClickHouse® features solve my specific requirements?"
For query acceleration:
Projections provide alternative layouts and pre-aggregations chosen automatically. Materialized views shift computation to insert/refresh time. Data skipping indexes accelerate selective filtering.
For data patterns:
Dictionaries replace expensive joins with in-memory lookups. ReplacingMergeTree handles CDC and upserts through merge-time deduplication. Lightweight DELETE provides fast deletion for compliance.
For operations:
ClickHouse® Cloud eliminates cluster management through managed service. ClickHouse® Keeper provides coordination for replication and distributed operations.
The right ClickHouse® features aren't always the newest or most complex—they're the ones matching your specific query patterns, data characteristics, and operational requirements.
Start by identifying pain points: slow queries, expensive joins, operational complexity. Then evaluate which ClickHouse® features solve those specific problems. Many teams discover solutions already exist in ClickHouse® without needing different databases or extensive custom engineering.
ClickHouse® provides extensive capabilities beyond basic columnar storage. Learning and using these features effectively often delivers better results than seeking alternative databases or evaluating the best database for real-time analytics. Understand the trade-offs—storage costs, merge overhead, operational complexity—and apply features strategically where benefits justify costs.
