ClickHouse® is an open-source, column-oriented database built for analytics on large datasets, while Umbra is an in-memory database designed for fast analytical queries with full transaction support. Both target analytical workloads, but ClickHouse® focuses on streaming ingestion and horizontal scaling across multiple servers, whereas Umbra prioritizes single-node query speed with ACID guarantees.
This guide compares their architectures, performance characteristics, operational requirements, and ideal use cases to help you choose the right database for your analytics workload.
What is ClickHouse® and what is Umbra
ClickHouse® overview
ClickHouse® is a columnar database management system originally developed at Yandex and open-sourced in 2016 under the Apache 2.0 license. The database handles online analytical processing (OLAP) workloads where data arrives continuously and queries aggregate across millions or billions of rows. ClickHouse® uses the MergeTree family of storage engines, which compresses data efficiently, and executes queries using vectorized processing, which can scan columns at billions of rows per second on standard hardware.
Umbra overview
Umbra is a research database system from the Technical University of Munich, now commercialized as CedarDB. The system delivers fast analytical query performance through an in-memory, vectorized execution model. Unlike ClickHouse®, Umbra provides full ACID transaction support and runs on single nodes with large memory capacity rather than distributed clusters. Umbra has gained attention for strong performance in ClickBench, where it shows competitive or faster query speeds compared to ClickHouse® and other analytical databases.
Storage and execution architecture differences
ClickHouse® and Umbra make different architectural trade-offs. ClickHouse® optimizes for horizontal scalability and streaming ingestion, while Umbra optimizes for single-node query speed with transactional guarantees.
Columnar merge tree storage
ClickHouse® stores data in columns using the MergeTree engine family, organizing rows into sorted, immutable parts that merge periodically in the background. This approach compresses data efficiently because similar values sit together, and queries read only the columns they need rather than entire rows. The merge tree design supports high-throughput ingestion because new data writes to small parts that combine later, avoiding rewrites of large table portions on every insert. However, updates and deletes are asynchronous mutations rather than immediate operations, making ClickHouse® less suitable for transactional workloads.
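The part-and-merge idea can be sketched in a few lines of Python. This is a toy model only, not ClickHouse® code: the `ToyMergeTree` class and its methods are hypothetical names chosen for illustration, and real parts are columnar, compressed, and stored on disk.

```python
import heapq


class ToyMergeTree:
    """Toy model of sorted, immutable parts that merge later (not ClickHouse code)."""

    def __init__(self):
        # Each part is an immutable, sorted tuple of (key, value) rows.
        self.parts = []

    def insert(self, rows):
        # Every insert becomes its own small sorted part; existing parts are untouched.
        self.parts.append(tuple(sorted(rows)))

    def merge(self):
        # Stand-in for the background merge: combine all sorted parts into one.
        self.parts = [tuple(heapq.merge(*self.parts))]

    def scan(self):
        # A query sees rows from every part, returned in key order.
        return list(heapq.merge(*self.parts))


table = ToyMergeTree()
table.insert([(3, "c"), (1, "a")])
table.insert([(2, "b")])
parts_before = len(table.parts)
table.merge()
parts_after = len(table.parts)
```

Inserts stay cheap because they never rewrite existing parts; the cost of keeping data sorted is deferred to the merge step.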
In-memory vector store design
Umbra uses an in-memory storage model where data stays primarily in RAM, with spill-to-disk mechanisms for datasets exceeding available memory. The system organizes data in a columnar format optimized for vectorized execution, processing batches of values in tight loops that leverage CPU cache and SIMD instructions. This memory-first approach minimizes disk I/O during query execution, producing low query latencies for datasets that fit in memory. Hardware requirements scale directly with dataset size, though, and operational costs can be higher for large workloads compared to disk-based systems.
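As a rough way to reason about that sizing constraint, the following sketch estimates whether a table would fit in RAM. Every number and name here is an assumption made for illustration (column widths, the 50% headroom factor, the function names); none of it comes from Umbra or CedarDB documentation.

```python
def estimate_bytes(row_count, column_widths, compression_ratio=1.0):
    """Approximate table size: rows times bytes per row, divided by an assumed compression ratio."""
    raw_bytes = row_count * sum(column_widths.values())
    return raw_bytes / compression_ratio


def fits_in_memory(required_bytes, ram_bytes, headroom=0.5):
    """Keep a fraction of RAM free for query intermediates (the headroom value is an assumption)."""
    return required_bytes <= ram_bytes * headroom


GIB = 1024 ** 3
# Hypothetical schema: two 8-byte columns and one 4-byte column, 20 bytes per row.
widths = {"ts": 8, "user_id": 8, "value": 4}
needed = estimate_bytes(100_000_000, widths)

fits_8_gib = fits_in_memory(needed, 8 * GIB)
fits_2_gib = fits_in_memory(needed, 2 * GIB)
```

With these assumed figures the table needs about 2 GB uncompressed, so it fits on the 8 GiB machine but not on the 2 GiB one.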
Vectorized query execution paths
Both ClickHouse® and Umbra use vectorized query execution, where operations process batches of rows at once rather than one row at a time. This technique improves CPU efficiency by reducing branching, improving cache locality, and enabling SIMD (Single Instruction, Multiple Data) instructions that perform the same operation on multiple values simultaneously.
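A minimal sketch of the difference in shape between row-at-a-time and batch-at-a-time processing is shown below. The column names and batch size are made up for the example, and pure Python will not show the speedup; the point is only that the batched version works on column slices in tight loops, which is the structure that lets a native engine use SIMD.

```python
def sum_row_at_a_time(rows):
    # One row per iteration: the filter and the aggregation interleave for every row.
    total = 0
    for row in rows:
        if row["status"] == 200:
            total += row["bytes"]
    return total


def sum_batched(status_col, bytes_col, batch_size=4):
    # Columns are separate sequences; each step handles a whole batch of values.
    total = 0
    for start in range(0, len(status_col), batch_size):
        status_batch = status_col[start:start + batch_size]
        bytes_batch = bytes_col[start:start + batch_size]
        # Step 1: evaluate the predicate for the whole batch into a selection mask.
        mask = [s == 200 for s in status_batch]
        # Step 2: aggregate only the selected values of the batch.
        total += sum(b for b, keep in zip(bytes_batch, mask) if keep)
    return total


status = [200, 404, 200, 200, 500, 200]
size = [10, 20, 30, 40, 50, 60]
rows = [{"status": s, "bytes": b} for s, b in zip(status, size)]

row_total = sum_row_at_a_time(rows)
batch_total = sum_batched(status, size)
```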
In ClickHouse®, vectorized execution combines with data-skipping techniques like predicate pushdown and partition pruning, which avoid reading irrelevant data before processing begins. Umbra's execution engine is tuned for low-latency queries and includes advanced optimizations like adaptive query compilation, which generates machine code for each query and spends more compilation effort on the parts of a query that run longest.
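The pruning idea can be illustrated with a toy example: if each chunk of data carries the minimum and maximum value of a column, a range filter can skip whole chunks without reading their rows. The `Part` type and `query_sum` function are hypothetical, and the sketch says nothing about how either engine stores this metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    min_day: int   # smallest day value stored in this part
    max_day: int   # largest day value stored in this part
    rows: tuple    # (day, value) pairs


def query_sum(parts, lo, hi):
    """Sum values with lo <= day <= hi, skipping parts whose range cannot match."""
    total = 0
    scanned = 0
    for part in parts:
        if part.max_day < lo or part.min_day > hi:
            continue  # pruned: no row in this part can satisfy the filter
        scanned += 1
        total += sum(value for day, value in part.rows if lo <= day <= hi)
    return total, scanned


parts = [
    Part(1, 10, ((1, 5), (10, 7))),
    Part(11, 20, ((11, 1), (15, 2), (20, 3))),
    Part(21, 30, ((25, 100),)),
]
total, scanned = query_sum(parts, 12, 20)
```

Only the middle part is read for this filter; the other two are rejected from their metadata alone.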
Performance on analytical benchmarks and real-time ingestion
ClickHouse® and Umbra both perform well in analytical benchmarks, but their strengths appear in different workload types.
ClickBench results summary
ClickBench measures query performance across 43 queries, mostly aggregations and filters, over a single wide table of roughly 100 million rows. In recent results, Umbra (via CedarDB) completes queries faster than ClickHouse® on single-node deployments, though ClickHouse® has achieved 25% performance improvements through infrastructure optimizations like migrating to ARM-based instances. ClickHouse® remains competitive and can outperform Umbra in distributed configurations where queries parallelize across multiple servers. Outside the benchmark itself, which runs queries one at a time, ClickHouse®'s query latency tends to be more consistent under concurrent load, while Umbra's performance varies with memory availability and dataset size.
Streaming ingestion throughput
ClickHouse® handles continuous, high-volume data ingestion from sources like Kafka, event streams, and application logs. Ingestion rates on the order of millions of rows per second per server are achievable (figures around 4 million are cited), depending on row width, batch size, and hardware. This makes it well-suited for real-time analytics where data arrives constantly and queries run against recent data.
Umbra optimizes for batch loading rather than streaming ingestion. Its in-memory architecture means ingestion throughput is limited by available RAM and the time required to rebuild in-memory structures. For applications depending on real-time data pipelines, ClickHouse®'s streaming capabilities provide a clear advantage.
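On the ClickHouse® side, streaming pipelines usually group incoming events into batches before inserting, since each insert creates a new part that must later be merged. A minimal sketch of such a buffer follows. `BatchBuffer` and `flush_fn` are hypothetical names; in a real pipeline `flush_fn` would call whatever client performs the insert, and a time-based flush would normally be added as well.

```python
class BatchBuffer:
    """Collect rows and hand them to flush_fn in batches (illustrative only)."""

    def __init__(self, flush_fn, max_rows=3):
        self.flush_fn = flush_fn   # assumed callback that performs the actual insert
        self.max_rows = max_rows
        self.rows = []

    def add(self, row):
        self.rows.append(row)
        if len(self.rows) >= self.max_rows:
            self.flush()

    def flush(self):
        # Swap the buffer out first so a new batch starts empty.
        if self.rows:
            batch, self.rows = self.rows, []
            self.flush_fn(batch)


flushed = []                       # stands in for the database in this sketch
buffer = BatchBuffer(flushed.append, max_rows=3)
for event_id in range(7):
    buffer.add({"event_id": event_id})
buffer.flush()                     # flush the final partial batch
batch_sizes = [len(batch) for batch in flushed]
```

Seven events arrive, but the sink sees only three inserts, which keeps the number of small parts down.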
Query latency under concurrency
When multiple users or applications query a database simultaneously, concurrency control and resource contention affect query latency. ClickHouse® handles high concurrency well because its columnar storage and query execution engine optimize for parallel reads, using techniques like query throttling and resource pools to prevent any single query from monopolizing resources.
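One simple form of throttling is admission control: cap how many queries run at once and make the rest wait. The sketch below shows the idea with a semaphore. It is an assumption-laden illustration, not a description of ClickHouse® internals, and `QueryLimiter` and `fake_query` are invented for the example.

```python
import threading
import time


class QueryLimiter:
    """Allow at most max_concurrent queries to run at the same time."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self.running = 0
        self.peak = 0   # highest concurrency observed, kept for inspection

    def run(self, query_fn):
        with self._slots:              # blocks while all slots are taken
            with self._lock:
                self.running += 1
                self.peak = max(self.peak, self.running)
            try:
                return query_fn()
            finally:
                with self._lock:
                    self.running -= 1


def fake_query():
    time.sleep(0.01)   # pretend to do some work
    return 1


limiter = QueryLimiter(max_concurrent=2)
results = []
threads = [
    threading.Thread(target=lambda: results.append(limiter.run(fake_query)))
    for _ in range(6)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

All six queries complete, but never more than two at a time, so one burst of traffic cannot consume every core.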
Umbra's in-memory design delivers low latencies for individual queries, but under high concurrency, memory contention and cache eviction can degrade performance. ClickHouse® is generally more predictable in multi-tenant or high-concurrency environments, while Umbra excels when query volume is lower but speed is the top priority.
ACID guarantees, concurrency, and consistency
ClickHouse® and Umbra differ significantly in transactional capabilities, reflecting their different design goals.
Transaction models
ClickHouse® doesn't support traditional ACID transactions the way transactional databases such as PostgreSQL do. Inserts are atomic at the block level, meaning either an entire batch of rows is written or none of it is, but there's no support for multi-statement transactions that span multiple tables or require rollback.
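Block-level atomicity can be pictured as "stage the whole batch, then publish it in one step". The toy table below is a sketch of that behavior under the assumption that a bad row fails the entire block; `ToyTable`, `insert_block`, and `require_positive` are hypothetical names.

```python
class ToyTable:
    """Toy table where a block of rows becomes visible all at once or not at all."""

    def __init__(self):
        self.rows = []

    def insert_block(self, block, validate):
        staged = []
        for row in block:
            validate(row)          # raises ValueError on a bad row
            staged.append(row)
        # Publish only after every row in the block has been accepted.
        self.rows.extend(staged)


def require_positive(row):
    if row["amount"] <= 0:
        raise ValueError(f"invalid amount: {row['amount']}")


table = ToyTable()
table.insert_block([{"amount": 5}, {"amount": 7}], require_positive)

try:
    # The first row is valid, but the second fails, so neither is written.
    table.insert_block([{"amount": 1}, {"amount": -1}], require_positive)
    second_block_failed = False
except ValueError:
    second_block_failed = True
```

What the sketch does not provide is exactly what the paragraph above describes as missing: a way to group several such inserts, possibly into different tables, and roll them back together.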
Updates and deletes in ClickHouse® are asynchronous mutations applied in the background, so they're not immediately visible to queries. Umbra provides full ACID transaction support with multi-version concurrency control (MVCC), allowing it to handle mixed read-write workloads and maintain consistency across concurrent transactions.
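The core of MVCC is that writers add new versions instead of overwriting, and each reader sees only the versions committed before its snapshot. The sketch below shows that rule in isolation. It is a generic illustration with invented names (`MVCCStore`, `begin`, `commit_write`), not Umbra's implementation, and it omits write conflicts and garbage collection.

```python
class MVCCStore:
    """Minimal multi-version store: readers see the state as of their snapshot."""

    def __init__(self):
        self.versions = {}      # key -> list of (commit_ts, value), oldest first
        self.last_commit = 0

    def begin(self):
        # A transaction's snapshot is the latest commit timestamp at its start.
        return self.last_commit

    def commit_write(self, key, value):
        # Writers append a new version rather than replacing the old one.
        self.last_commit += 1
        self.versions.setdefault(key, []).append((self.last_commit, value))
        return self.last_commit

    def read(self, key, snapshot_ts):
        # Return the newest version committed at or before the snapshot.
        visible = None
        for commit_ts, value in self.versions.get(key, []):
            if commit_ts > snapshot_ts:
                break
            visible = value
        return visible


store = MVCCStore()
store.commit_write("balance", 100)
snapshot = store.begin()                 # a long-running reader starts here
store.commit_write("balance", 80)        # a concurrent writer commits later

old_view = store.read("balance", snapshot)
new_view = store.read("balance", store.begin())
```

The long-running reader keeps seeing 100 while new transactions see 80, so readers and writers do not block each other.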
Isolation levels
Umbra supports standard SQL isolation levels, including read committed and serializable, which allow applications to control the visibility of uncommitted changes and prevent anomalies like dirty reads and phantom rows.
Queries in ClickHouse® read from a consistent snapshot of data at the time the query starts, but there's no mechanism to lock rows or prevent other queries from reading the same data simultaneously. This makes ClickHouse® simpler to operate but less suitable for applications requiring strict consistency guarantees.
Operational overhead and total cost of ownership
Operational complexity and cost depend on factors like dataset size, query volume, and the need for distributed deployments.
Cluster provisioning and scaling effort
ClickHouse® supports horizontal scaling through sharding, where data is distributed across multiple servers, and replication, where each shard is copied to multiple nodes for redundancy. Setting up a ClickHouse® cluster requires configuring ZooKeeper or ClickHouse® Keeper, defining shard and replica topologies, and managing distributed tables that route queries to the correct nodes. This adds operational complexity but allows ClickHouse® to scale to petabytes of data and handle thousands of queries per second. ClickHouse® Cloud customers collectively run 5.5 billion queries per day.
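Sharding comes down to a deterministic rule that maps each row to one server. A hash of a sharding key is a common choice, sketched below. The function names and the choice of SHA-256 are assumptions for illustration; ClickHouse® lets you define the sharding expression yourself, and this is not its routing code.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a string key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(rows, key_field, num_shards):
    """Split rows into per-shard lists based on the sharding key."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(row[key_field], num_shards)].append(row)
    return shards


events = [
    {"user_id": "alice", "action": "click"},
    {"user_id": "bob", "action": "view"},
    {"user_id": "alice", "action": "purchase"},
    {"user_id": "carol", "action": "click"},
]
shards = route(events, "user_id", num_shards=3)
```

Because the hash is stable, all rows for one user land on the same shard, which keeps per-user aggregations local to a single server.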
Umbra scales vertically by adding more memory and CPU to a single server, which simplifies operations but limits the maximum dataset size and query throughput to what one machine can handle.
Storage and compute cost trade-offs
ClickHouse®'s disk-based storage model means storage costs scale with data volume, but compute costs can be optimized by using cheaper hardware or scaling up only when query load increases. Umbra's in-memory architecture means both storage and compute costs scale with dataset size because more memory is required to hold larger datasets.
For workloads where data is frequently queried but rarely updated, ClickHouse®'s approach is more cost-effective. For workloads where query latency is the top priority and datasets are small enough to fit in memory, Umbra's performance may justify the higher hardware costs.
Cloud-managed options and vendor ecosystem
Both ClickHouse® and Umbra are available as managed services, though the maturity and breadth of offerings differ significantly.
ClickHouse® Cloud and partner platforms
ClickHouse® is available as a managed service from ClickHouse® Inc. through ClickHouse® Cloud, as well as from third-party providers like Altinity and Aiven. These services handle cluster provisioning, scaling, backups, and upgrades, allowing teams to use ClickHouse® without managing infrastructure.
ClickHouse® Cloud offers features like auto-scaling, pay-as-you-go pricing, and integration with popular data sources, including eligibility for Microsoft Azure Consumption Commitment for enterprises using Azure.
Umbra commercial availability
Umbra's commercial offering, CedarDB, is available as a managed service or self-hosted deployment. The service is newer and less widely adopted than ClickHouse® Cloud, which means documentation, community support, and ecosystem integrations are still developing. CedarDB targets organizations that prioritize query speed and are willing to adopt newer technology in exchange for performance gains.
Tinybird managed ClickHouse® option
Tinybird is a managed ClickHouse® platform designed for software developers who want to integrate ClickHouse® into application backends without managing infrastructure. Unlike ClickHouse® Cloud, which focuses on enterprise analytics, Tinybird provides developer-friendly features like API endpoints that expose ClickHouse® queries as REST APIs, CI/CD workflows for deploying data pipelines as code, and built-in observability for monitoring query performance. Tinybird handles scaling, replication, and performance tuning automatically, allowing developers to focus on building features rather than managing databases.
When to choose ClickHouse®, Umbra, or Tinybird
The decision between ClickHouse®, Umbra, and Tinybird depends on your workload characteristics, operational preferences, and team capabilities.
Choose ClickHouse® if:
- Your dataset is large (terabytes to petabytes) and growing continuously
- You need high-throughput streaming ingestion from sources like Kafka or event streams
- Your queries are primarily analytical (aggregations, filtering, time-series analysis) rather than transactional
- You have operational expertise to manage distributed systems, or you're willing to use a managed service like ClickHouse® Cloud
Choose Umbra (CedarDB) if:
- Your dataset fits comfortably in memory (up to a few terabytes with sufficient RAM)
- Query latency is the top priority, and you're willing to pay for high-memory hardware
- You need full ACID transaction support for mixed read-write workloads
- Your workload is primarily batch-oriented rather than real-time streaming
Choose Tinybird if:
- You're a developer building real-time analytics into an application
- You want to use ClickHouse® without managing infrastructure, scaling, or DevOps
- You need to expose ClickHouse® queries as APIs for your application backend
- Developer experience and time-to-production are more important than fine-grained control over infrastructure
For most developers building applications with real-time analytics, Tinybird provides a faster path to production because it eliminates the operational complexity of ClickHouse® while preserving its performance advantages. Sign up for a free Tinybird plan to get started in minutes.
FAQs about ClickHouse® vs Umbra
What SQL dialect differences exist between ClickHouse® and Umbra?
Both ClickHouse® and Umbra support standard SQL for most analytical queries, but ClickHouse® has a more extensive set of non-standard functions and syntax extensions. ClickHouse® includes specialized functions for time-series analysis, approximate aggregations, and array processing that may not be available in Umbra. Umbra adheres more closely to standard SQL, which can make it easier to port queries from other databases, but it may lack some of the advanced analytical capabilities that ClickHouse® provides.
Does Umbra support distributed deployments across multiple servers?
Umbra is primarily designed for single-node deployments with large memory capacity rather than distributed architectures. While the CedarDB commercial offering may include clustering features in the future, the current focus is on vertical scaling within a single server. ClickHouse®, by contrast, was built from the ground up to support distributed deployments with sharding and replication, making it a better choice for workloads that exceed the capacity of a single machine.
Can ClickHouse® handle mixed analytical and transactional workloads effectively?
ClickHouse® excels at analytical queries but lacks full ACID transaction support for transactional workloads. While it can handle updates and deletes through asynchronous mutations, these operations aren't suitable for high-frequency transactional patterns like those found in OLTP databases. For applications requiring both real-time analytics and transactional consistency, a common pattern is to use a transactional database like PostgreSQL for writes and replicate data to ClickHouse® for analytics.
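One possible shape for that pattern is a change log written in the same transaction as the business data, which a separate process then copies to the analytics store. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for the transactional database and a plain list as a stand-in for ClickHouse®. The table names, function names, and the change-log approach itself are assumptions for illustration; production setups often rely on change data capture tooling instead.

```python
import sqlite3

# sqlite3 stands in for the transactional database in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE change_log ("
    "seq INTEGER PRIMARY KEY AUTOINCREMENT, order_id INTEGER, amount INTEGER)"
)


def place_order(order_id, amount):
    # The order and its change-log entry commit together or not at all.
    with conn:
        conn.execute("INSERT INTO orders (id, amount) VALUES (?, ?)", (order_id, amount))
        conn.execute(
            "INSERT INTO change_log (order_id, amount) VALUES (?, ?)", (order_id, amount)
        )


def replicate(analytics_rows, last_seq):
    # Copy change-log entries newer than last_seq to the analytics store.
    cursor = conn.execute(
        "SELECT seq, order_id, amount FROM change_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    )
    for seq, order_id, amount in cursor:
        analytics_rows.append({"order_id": order_id, "amount": amount})
        last_seq = seq
    return last_seq


analytics = []                      # stands in for a ClickHouse table
place_order(1, 50)
place_order(2, 75)
offset = replicate(analytics, 0)
place_order(3, 20)
offset = replicate(analytics, offset)
```

The transactional side keeps its ACID guarantees, while the analytics side receives an append-only stream, which is the kind of write pattern ClickHouse® handles well.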
