PostgreSQL with pg_mooncake recently cracked the top 10 on ClickBench, a benchmark typically dominated by specialized analytical databases like ClickHouse®. This achievement raised questions about whether adding columnar storage to PostgreSQL through an extension can match the performance of a purpose-built OLAP database at scale.
This article compares ClickHouse® and PostgreSQL with pg_mooncake across architecture, benchmark performance, operational complexity, and real-world use cases to help you decide which system fits your analytical workload.
How columnar storage changes PostgreSQL performance
ClickHouse® is a native columnar database optimized for analytical queries by storing data in columns. PostgreSQL with the pg_mooncake extension adds columnstore tables and vectorized execution to PostgreSQL, using DuckDB's engine for analytical queries. This hybrid approach lets PostgreSQL handle both transactional and analytical workloads in one system, though it adds complexity compared to ClickHouse®'s purpose-built design.
Traditional row-oriented databases like PostgreSQL store entire rows together on disk. When you run an analytical query that only needs a few columns from a wide table, the database still reads all columns for each row. Columnar storage flips this by storing each column separately, so queries only read what they actually need.
The performance difference shows up clearly with wide fact tables. A query that aggregates three columns from a table with fifty columns skips the other 47 columns entirely, reading roughly 94% less data with columnar storage than with row storage (assuming similar column widths).
Read amplification reduction
Read amplification happens when a database reads significantly more data from disk than the query needs. In row-oriented storage, selecting a single column from a table forces the database to read every column in every row, even though most of that data gets discarded.
- Physical organization: Columnar storage puts each column in its own segment on disk, so a query selecting three columns from a million-row table only reads those three columns
- I/O savings: The reduction in disk reads translates directly to faster query times and lower CPU usage for decompression and filtering
- Analytical workloads: This matters most for queries that scan large portions of a table but only examine a few columns, as the sketch after this list shows
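To make the pattern concrete, here is a hypothetical query against a wide fact table (the table and column names are illustrative, not taken from any benchmark):

```sql
-- Hypothetical wide table: assume page_hits has ~50 columns
-- (url, user_agent, country, duration_ms, ...).
-- A row store reads all ~50 columns of every row to answer this query;
-- a column store reads only the country and duration_ms segments.
SELECT country, AVG(duration_ms) AS avg_duration
FROM page_hits
GROUP BY country;
```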
Compression ratios on wide fact tables
Columnar storage achieves better compression than row storage because similar data types stored together compress more efficiently. A column of integers or timestamps contains repeating patterns and limited value ranges, which compression algorithms exploit.
pg_mooncake uses DuckDB's compression techniques on its columnstore tables, typically achieving compression ratios between 5x and 20x depending on data characteristics. ClickHouse® uses specialized codecs like LZ4, ZSTD, and Delta encoding, with production deployments achieving 15-20x compression ratios on analytical data. Higher compression means less disk space and faster queries since the database reads less data from disk, though compression and decompression add CPU overhead.
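As a sketch of how per-column codecs are declared in ClickHouse® (the table and column names are illustrative):

```sql
-- Per-column codecs: Delta suits mostly-increasing timestamps,
-- ZSTD is a general-purpose compressor, Gorilla targets floats.
CREATE TABLE events
(
    event_time DateTime CODEC(Delta, ZSTD),
    user_id    UInt64   CODEC(ZSTD),
    url        String   CODEC(ZSTD(3)),
    metric     Float64  CODEC(Gorilla)
)
ENGINE = MergeTree
ORDER BY (user_id, event_time);
```

Delta stores differences between consecutive values, which makes the subsequent ZSTD pass far more effective on timestamp columns.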
Impact on updates and deletes
Columnar storage trades write performance for read performance. Updating or deleting individual rows in a columnar table costs more than in row storage because the database might need to rewrite entire column segments rather than modifying a single row in place.
PostgreSQL's native row storage handles updates efficiently through its MVCC (Multi-Version Concurrency Control) system, which creates new row versions without immediately removing old ones. pg_mooncake maintains transactional integrity by storing table metadata in PostgreSQL's heap tables, but updates and deletes on columnstore tables still require more work. ClickHouse® handles updates and deletes through mutations, which are asynchronous background operations that rewrite affected data parts.
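In ClickHouse®, both operations use the ALTER TABLE mutation syntax (reusing the illustrative events table from above):

```sql
-- Mutations run asynchronously and rewrite affected parts in the background.
ALTER TABLE events DELETE WHERE user_id = 42;
ALTER TABLE events UPDATE metric = 0 WHERE event_time < '2024-01-01';

-- Progress is visible in the system.mutations table.
SELECT command, is_done FROM system.mutations WHERE table = 'events';
```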
What is pg_mooncake and how does it work
pg_mooncake is a PostgreSQL extension that adds columnar storage capabilities by integrating DuckDB's execution engine. The extension lets you create columnstore tables that live alongside regular PostgreSQL heap tables, giving you the option to use columnar storage for analytical workloads while keeping row storage for transactional workloads.
When you create a columnstore table, pg_mooncake stores the actual data in DuckDB's columnar format while maintaining table metadata in PostgreSQL's system catalogs. The extension routes analytical queries on columnstore tables through DuckDB's vectorized execution engine, which processes data in batches rather than row-by-row.
Installation and extension state
Installing pg_mooncake requires building the extension from source or using a pre-built package if available for your PostgreSQL version. After installation, you enable the extension in your database with `CREATE EXTENSION pg_mooncake`. The extension is currently in active development and may not support all PostgreSQL features or data types.
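A minimal sketch of getting started, based on pg_mooncake's documented `USING columnstore` syntax (details may differ across versions; the table is illustrative):

```sql
-- Enable the extension in the current database (binaries must be installed first).
CREATE EXTENSION pg_mooncake;

-- Columnstore tables coexist with ordinary heap tables in the same database.
CREATE TABLE analytics_events (
    event_time timestamptz,
    user_id    bigint,
    url        text
) USING columnstore;
```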
Row to columnar conversion path
Converting an existing PostgreSQL table to columnstore format happens through a new table creation process rather than an in-place conversion. You create a new columnstore table with the same schema as your existing table, then copy data from the old table to the new one. pg_mooncake stores columnstore table data in a separate location from PostgreSQL's data directory, typically in a DuckDB database file.
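A sketch of the copy-based path, assuming a hypothetical orders heap table (if your pg_mooncake version doesn't support LIKE for columnstore tables, spell out the column list instead):

```sql
-- Not an in-place conversion: create a columnstore twin, then copy rows.
CREATE TABLE orders_columnar (LIKE orders) USING columnstore;
INSERT INTO orders_columnar SELECT * FROM orders;

-- Optionally swap names once the copy is verified.
ALTER TABLE orders RENAME TO orders_heap;
ALTER TABLE orders_columnar RENAME TO orders;
```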
SQL feature gaps
Columnstore tables in pg_mooncake don't support all PostgreSQL features that work with regular heap tables. B-tree and GiST indexes aren't available on columnstore tables because the columnar storage model makes traditional indexing less useful. Foreign key constraints, triggers, and some data types may not work with columnstore tables either.
ClickHouse® fundamentals for real-time analytics
ClickHouse® is a column-oriented database management system built specifically for online analytical processing (OLAP). Unlike PostgreSQL, which started as a general-purpose relational database and added analytical capabilities through extensions, ClickHouse® was designed from the ground up to handle high-volume analytical queries on large datasets. The database uses a distributed architecture that spreads data across multiple servers and processes queries in parallel.
MergeTree architecture
The MergeTree table engine is ClickHouse®'s primary storage engine for analytical workloads. It stores data in sorted parts on disk, with each part containing a subset of table rows sorted by the table's primary key. When new data arrives, ClickHouse® writes it to a new part rather than updating existing parts, then background processes later merge smaller parts into larger ones.
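A minimal MergeTree definition; the ORDER BY clause defines the sort key used inside every part (names here are illustrative):

```sql
CREATE TABLE hits
(
    event_date Date,
    counter_id UInt32,
    url        String
)
ENGINE = MergeTree
ORDER BY (counter_id, event_date);

-- Each INSERT creates a new part; background merges combine them.
-- Active parts are visible in system.parts:
SELECT name, rows, bytes_on_disk
FROM system.parts
WHERE table = 'hits' AND active;
```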
Materialized views for rollups
Materialized views in ClickHouse® pre-compute and store query results, which speeds up repeated analytical queries. When you create a materialized view, ClickHouse® automatically updates it as new data arrives in the source table. This differs from PostgreSQL's materialized views, which require manual refreshes.
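A sketch of a rollup that ClickHouse® keeps current on every insert into the source table (building on the illustrative events table from earlier):

```sql
-- The view's SummingMergeTree target collapses rows sharing the same
-- (url, day) key by summing hits during background merges.
CREATE MATERIALIZED VIEW daily_hits
ENGINE = SummingMergeTree
ORDER BY (url, day)
AS SELECT
    url,
    toDate(event_time) AS day,
    count() AS hits
FROM events
GROUP BY url, day;
```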
Adaptive concurrency model
ClickHouse® handles concurrent analytical queries through a thread pool that dynamically allocates resources based on query complexity and system load. Each query can use multiple threads to parallelize operations like reading data, filtering, and aggregating. The number of threads per query adjusts based on the query's memory requirements and the number of other queries running at the same time.
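Thread usage can also be capped per query with settings such as max_threads (the value here is an arbitrary example):

```sql
-- Cap this query at 4 threads regardless of available cores.
SELECT url, count() AS hits
FROM events
GROUP BY url
SETTINGS max_threads = 4;
```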
Benchmark methodology and test setup
ClickBench is a standardized benchmark for analytical databases that measures query performance on a realistic dataset. The benchmark uses 100 million rows of web analytics data with columns for timestamps, URLs, user agents, and various metrics. The test suite includes 43 queries covering common analytical patterns like simple aggregations, filtered scans, group-by operations with multiple dimensions, and queries with complex expressions.
Hardware and cloud specs
Fair benchmark comparisons use identical or equivalent hardware for each database system. ClickBench results typically report instance types from major cloud providers like AWS, GCP, or Azure, along with CPU core count, memory size, and storage type. The benchmark's reference configuration, for example, is an AWS c6a.4xlarge instance with 16 vCPUs, 32 GB RAM, and a 500 GB gp2 EBS volume.
Dataset size and schema
The ClickBench dataset contains 99,997,497 rows representing web hit logs. The schema includes approximately 100 columns with various data types like integers, strings, timestamps, and floating-point numbers. Column cardinality varies widely, from low-cardinality enum-like fields with dozens of distinct values to high-cardinality string columns with millions of unique values.
Query suite selection
- Simple queries: Compute single aggregates like `COUNT(*)` or `SUM(column)`
- Complex queries: Join multiple conditions, compute percentiles, or group by several dimensions
- Scan patterns: Some queries scan the entire dataset, others filter to small subsets using `WHERE` clauses (representative examples follow below)
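The queries below illustrate these patterns; they are representative rather than verbatim ClickBench queries, and the column names are assumed:

```sql
-- Simple: one aggregate over the full table.
SELECT COUNT(*) FROM hits;

-- Complex: filtered scan with grouping, ordering, and a limit.
SELECT user_agent, COUNT(*) AS c
FROM hits
WHERE url LIKE '%search%'
GROUP BY user_agent
ORDER BY c DESC
LIMIT 10;
```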
Latency and throughput results on ClickBench
According to published ClickBench results, PostgreSQL with pg_mooncake placed inside the top 10, though specific query times varied significantly by query type. ClickHouse® typically ranks near the top of the benchmark across most query types. Both systems complete most queries in under a second on the standard ClickBench dataset.
Single-node results
On single-node configurations, ClickHouse® typically completes the full ClickBench query suite faster than PostgreSQL with pg_mooncake. For simple aggregation queries that scan large portions of the dataset, ClickHouse® often shows 2-5x faster execution times. Queries with complex filtering or string operations show smaller performance differences.
| Query Type | ClickHouse® (median) | pg_mooncake (median) | ClickHouse® speedup |
|---|---|---|---|
| Simple aggregations | 0.02s | 0.08s | 4x |
| Filtered scans | 0.15s | 0.25s | 1.7x |
| Multi-dimension grouping | 0.45s | 0.80s | 1.8x |
| String operations | 0.30s | 0.35s | 1.2x |
Tuning effort required to reach these numbers
Achieving optimal benchmark performance requires configuration changes and schema optimizations beyond default settings. ClickHouse® typically requires less tuning to reach good performance because its defaults assume analytical workloads. PostgreSQL's defaults optimize for transactional workloads, so reaching competitive analytical performance needs more configuration changes.
PostgreSQL's default configuration assumes small datasets and transactional workloads. For analytical performance, you'll want to increase `shared_buffers` to allocate more memory for caching data, often setting it to 25-40% of available RAM. The `work_mem` setting controls memory available for sort and hash operations in queries, and analytical queries often need more working memory than the default 4 MB.
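For example, on a 32 GB machine the changes might look like this (values are illustrative starting points, not recommendations):

```sql
ALTER SYSTEM SET shared_buffers = '8GB';  -- ~25% of RAM; requires a server restart
ALTER SYSTEM SET work_mem = '256MB';      -- per sort/hash node, so size cautiously
SELECT pg_reload_conf();                  -- applies work_mem; shared_buffers waits for restart
```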
Cost and ops overhead at scale
Operating analytical databases at scale involves ongoing costs beyond initial setup. ClickHouse® and PostgreSQL with pg_mooncake have different operational characteristics that affect long-term costs. ClickHouse®'s specialized design often results in lower storage costs and simpler scaling, while pg_mooncake's PostgreSQL integration might reduce operational overhead if you're already running PostgreSQL.
Data compression efficiency affects storage costs over time. ClickHouse® typically achieves 5-10x compression on analytical datasets, meaning 1 TB of raw data occupies 100-200 GB on disk. pg_mooncake's compression ratios depend on DuckDB's compression algorithms and typically fall in a similar range. However, if you maintain both row-oriented tables for transactional access and columnstore tables for analytics, you'll need roughly twice the storage.
When to keep analytics in Postgres vs adopt ClickHouse®
The decision between PostgreSQL with pg_mooncake and ClickHouse® depends on your specific workload characteristics, team capabilities, and growth expectations. If your analytical workload is tightly coupled with transactional operations, keeping everything in PostgreSQL might reduce architectural complexity. If analytics dominate your workload and require maximum performance, ClickHouse®'s specialized design often provides better results.
Workload patterns favoring Postgres
PostgreSQL with pg_mooncake makes sense when your workload combines frequent transactional operations with analytical queries:
- Small to medium datasets: Under 1 TB where PostgreSQL's performance suffices
- Mixed workloads: Applications needing both ACID transactions and analytics
- Existing PostgreSQL expertise: Teams already operating PostgreSQL in production
- Frequent updates: Applications modifying analytical data regularly
Tipping points for ClickHouse®
ClickHouse® becomes the better choice when analytical query performance becomes a bottleneck or when data volumes grow beyond what PostgreSQL handles efficiently:
- Query complexity: Analytical queries with multiple aggregations and large scans
- Data volume: Multi-terabyte datasets with billions of rows
- High concurrency: Many simultaneous analytical queries from dashboards or reports
- Insert throughput: High-volume streaming data ingestion
Hybrid architectures
Some teams use both PostgreSQL and ClickHouse® for different purposes. PostgreSQL handles transactional workloads and serves as the source of truth, while ClickHouse® stores denormalized copies optimized for analytics. This architecture requires data replication from PostgreSQL to ClickHouse®, typically through change data capture (CDC) tools or application-level writes to both systems.
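One low-friction starting point, before investing in full CDC, is ClickHouse®'s PostgreSQL table engine, which forwards reads to a live PostgreSQL table (the connection details and the order_id column are placeholders):

```sql
-- Read-through table: each query is executed against PostgreSQL at runtime.
-- Useful for prototyping; sustained analytics should use native storage.
CREATE TABLE pg_orders
ENGINE = PostgreSQL('pg-host:5432', 'shop_db', 'orders', 'ch_reader', 'secret');

-- Snapshot the data into a native MergeTree table for fast analytics.
CREATE TABLE orders_local
ENGINE = MergeTree ORDER BY order_id
AS SELECT * FROM pg_orders;
```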
Tinybird for managed ClickHouse® without the ops burden
Tinybird provides a managed ClickHouse® platform that eliminates infrastructure setup and operational complexity. Instead of provisioning servers, configuring clusters, and managing scaling, you define data sources and SQL queries that Tinybird deploys as API endpoints. The platform handles ClickHouse® cluster management, automatic scaling, and performance optimization behind the scenes.
Tinybird's developer-focused approach includes features like local development with `tb dev`, version control for SQL pipelines, and CI/CD integration for deploying changes. Sign up for a free Tinybird account to try managed ClickHouse® without infrastructure work.
FAQs about ClickHouse® vs pg_mooncake
How hard is it to migrate schemas from Postgres to ClickHouse®?
Schema migration requires mapping PostgreSQL data types to ClickHouse® equivalents and rewriting queries to use ClickHouse® syntax. Common types like integers, strings, and timestamps map directly, but PostgreSQL-specific types like JSONB need alternative approaches in ClickHouse®.
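As a sketch, a PostgreSQL table such as users(id bigint, name text, signup timestamptz, tags jsonb) might map to ClickHouse® like this (the table is hypothetical and mapping choices vary by workload):

```sql
CREATE TABLE users
(
    id     Int64,                -- bigint -> Int64
    name   String,               -- text -> String
    signup DateTime64(6, 'UTC'), -- timestamptz -> DateTime64 with a timezone
    tags   String                -- jsonb has no direct equivalent; one option
                                 -- is storing it as String and using JSON functions
)
ENGINE = MergeTree
ORDER BY id;
```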
Can pg_mooncake be combined with Citus or TimescaleDB?
pg_mooncake is a separate extension that modifies PostgreSQL's storage layer, which may conflict with other extensions that also modify storage or query execution. Testing compatibility in a development environment before production use is important.
Does pg_mooncake support JSONB or arrays?
pg_mooncake's support for complex PostgreSQL data types like JSONB and arrays is limited because DuckDB's type system differs from PostgreSQL's. Simple arrays of primitive types might work, but nested structures and JSONB operations may not translate correctly.
What happens to VACUUM when using pg_mooncake?
PostgreSQL's VACUUM process reclaims storage from deleted or updated rows in heap tables. Columnstore tables in pg_mooncake use a different storage format managed by DuckDB, so traditional VACUUM operations don't apply to them.
