Aggregating millions of events at read time can be expensive. It's better to incrementally update aggregates as new data arrives. ClickHouse materialized views solve this by pre-computing results on an ingest trigger, shifting computational work from query time to insert time.
This article walks through creating materialized views in ClickHouse with complete working examples, explains when to use them over alternatives like projections, and shows how to avoid common performance pitfalls.
What is a materialized view in ClickHouse?
A materialized view in ClickHouse is a database object that stores the results of a query and automatically updates when new data arrives in the source table. When you insert rows into the source table, the materialized view runs its SELECT query on those new rows and writes the transformed results to a target table.
Regular views recalculate results every time you query them, which means the database does the same work over and over. Materialized views do the computational work once during data insertion, so SELECT queries run faster because the results already exist.
This pattern works well for aggregations and transformations you run repeatedly. If you frequently query daily user activity summaries from a raw events table, a materialized view can pre-aggregate those metrics as new events arrive.
Full create materialized view syntax explained
The CREATE MATERIALIZED VIEW statement follows this structure:
CREATE MATERIALIZED VIEW view_name [TO target_table] [ENGINE = engine_type] [POPULATE] AS SELECT ...
Each clause controls where data goes, how it's stored, and whether historical data gets processed.
TO clause and internal table
The TO clause directs output to an existing table you've already created. Using TO existing_table means you define the target table's schema, engine, and settings before creating the view.
Without a TO clause, ClickHouse automatically creates an internal table to store results. This internal table uses the view's name prefixed with .inner., and ClickHouse infers its schema from the SELECT query.
Creating the target table explicitly with TO gives you control over table engines, partitioning, and ordering. This approach also makes it easier to query the aggregated data directly without referencing the materialized view name.
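A small sketch of the difference, using hypothetical src and tgt tables:
-- With TO: you create and control the target table yourself
CREATE TABLE src (x UInt64) ENGINE = MergeTree ORDER BY x;
CREATE TABLE tgt (x UInt64) ENGINE = MergeTree ORDER BY x;
CREATE MATERIALIZED VIEW mv_to TO tgt AS SELECT x FROM src;
-- Without TO: ClickHouse creates a hidden inner table inferred from the SELECT
CREATE MATERIALIZED VIEW mv_inner ENGINE = MergeTree ORDER BY x AS SELECT x FROM src;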
POPULATE option
The POPULATE keyword tells ClickHouse to process all existing data in the source table when you create the materialized view. Without POPULATE, the view only processes new rows inserted after creation.
Use POPULATE when you have historical data that belongs in your materialized view. The operation runs synchronously and can take significant time on large tables, blocking until all existing data has been processed. Two caveats: rows inserted into the source table while POPULATE runs may be missed by the view, and standard ClickHouse does not allow POPULATE together with a TO clause.
Refresh schedule with REFRESH EVERY
Refreshable materialized views update on a schedule rather than in real time. The syntax REFRESH EVERY interval defines how often ClickHouse recalculates the entire view from scratch.
For example, CREATE MATERIALIZED VIEW daily_summary REFRESH EVERY 1 DAY AS SELECT ... rebuilds the view once per day. This pattern works when you don't require real-time updates and want to reduce the overhead of continuous incremental processing.
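Here's a minimal sketch of a refreshable view; the events source table is hypothetical, and older ClickHouse versions require enabling the experimental refreshable materialized views setting:
CREATE MATERIALIZED VIEW daily_summary_refreshable
REFRESH EVERY 1 DAY
ENGINE = MergeTree()
ORDER BY event_date
AS
SELECT
    toDate(event_time) AS event_date,
    count() AS total_events
FROM events
GROUP BY event_date;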
Engine choice for the inner table
The table engine determines how ClickHouse stores and merges the data written by your materialized view. MergeTree engines work for most use cases, while specialized engines like SummingMergeTree or AggregatingMergeTree optimize specific aggregation patterns.
- MergeTree: General-purpose engine for filtering and transformation without heavy aggregation
- SummingMergeTree: Automatically sums numeric columns during background merges, reducing storage for accumulated totals
- AggregatingMergeTree: Stores intermediate aggregation states using functions like sumState() or uniqState(), allowing you to combine partial aggregates later (see the sketch after this list)
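To make the AggregatingMergeTree pattern concrete, here's a minimal sketch; the events source table and its columns are hypothetical:
CREATE TABLE daily_uniques (
    event_date Date,
    users AggregateFunction(uniq, UInt64)
) ENGINE = AggregatingMergeTree()
ORDER BY event_date;

CREATE MATERIALIZED VIEW mv_daily_uniques
TO daily_uniques
AS
SELECT
    toDate(event_time) AS event_date,
    uniqState(user_id) AS users
FROM events
GROUP BY event_date;

-- At query time, finalize the partial states with the -Merge combinator
SELECT event_date, uniqMerge(users) AS unique_users
FROM daily_uniques
GROUP BY event_date;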
Step-by-step example: aggregate pageviews into daily metrics
This example shows how to build a materialized view that aggregates raw web pageview events into daily summary statistics. You'll create a source table for raw events, define a materialized view with aggregation logic, insert sample data, and query the pre-computed results.
1. Create the raw events table
First, create a table to store individual pageview events:
CREATE TABLE pageview_events (
event_time DateTime,
user_id UInt64,
page_url String,
session_id String
) ENGINE = MergeTree()
ORDER BY (event_time, user_id);
This table uses MergeTree with an ordering key on event_time and user_id, which optimizes queries that filter by time ranges or specific users.
2. Create the target table for aggregated data
Next, create a separate table to store the daily aggregated metrics:
CREATE TABLE daily_pageview_summary (
event_date Date,
total_pageviews UInt64,
unique_users UInt64,
unique_sessions UInt64
) ENGINE = SummingMergeTree()
ORDER BY event_date;
The SummingMergeTree engine automatically sums the numeric columns during background merges, keeping storage compact as data accumulates. One caveat: the view aggregates each insert block independently, so summing total_pageviews across blocks is correct, but summing uniq() results from separate inserts can overcount unique users and sessions. For exact uniques across many inserts, use AggregatingMergeTree with uniqState(), as sketched earlier.
3. Create the materialized view
Now define the materialized view that transforms raw events into daily aggregates:
CREATE MATERIALIZED VIEW mv_daily_pageviews
TO daily_pageview_summary
AS
SELECT
toDate(event_time) AS event_date,
count() AS total_pageviews,
uniq(user_id) AS unique_users,
uniq(session_id) AS unique_sessions
FROM pageview_events
GROUP BY event_date;
This view groups events by date and calculates three metrics: total pageviews, unique users, and unique sessions.
4. Insert sample data
Insert some test pageview events:
INSERT INTO pageview_events VALUES
('2024-01-15 10:30:00', 101, '/home', 'session_1'),
('2024-01-15 10:35:00', 101, '/products', 'session_1'),
('2024-01-15 11:00:00', 102, '/home', 'session_2'),
('2024-01-15 14:20:00', 103, '/about', 'session_3'),
('2024-01-16 09:15:00', 101, '/home', 'session_4'),
('2024-01-16 09:20:00', 104, '/products', 'session_5');
As soon as the rows are inserted, the materialized view processes them and writes aggregated results to daily_pageview_summary.
5. Query the aggregated table
Query the summary table to see the pre-computed daily metrics:
SELECT * FROM daily_pageview_summary ORDER BY event_date;
The query returns quickly because the aggregation already happened at insert time:
┌─event_date─┬─total_pageviews─┬─unique_users─┬─unique_sessions─┐
│ 2024-01-15 │ 4 │ 3 │ 3 │
│ 2024-01-16 │ 2 │ 2 │ 2 │
└────────────┴─────────────────┴──────────────┴─────────────────┘
How to back-fill data with ClickHouse populate materialized view
When you create a materialized view on a table that already contains data, the view only processes new inserts by default. To include historical data in your materialized view, use the POPULATE keyword during view creation.
When to use populate
Use POPULATE when you're adding a materialized view to an existing table with historical data. For example, if you have six months of pageview events and create a new daily summary view, POPULATE processes all six months of history.
Another common scenario is replacing an existing materialized view with new aggregation logic. You'd drop the old view, create the new one with POPULATE, and all historical data gets reprocessed according to the new logic.
Risks of data duplication
The POPULATE operation processes all rows currently in the source table, which can create duplicates if the target table already contains data. This typically happens when you drop and recreate a materialized view without first truncating its target table.
To avoid duplicates, either write into a fresh target table or explicitly truncate the existing target before backfilling. Note that standard ClickHouse rejects POPULATE combined with a TO clause, so for a view like mv_daily_pageviews that writes to an explicit target table, backfill historical data with an INSERT INTO ... SELECT that mirrors the view's query:
TRUNCATE TABLE daily_pageview_summary;
INSERT INTO daily_pageview_summary
SELECT
    toDate(event_time) AS event_date,
    count() AS total_pageviews,
    uniq(user_id) AS unique_users,
    uniq(session_id) AS unique_sessions
FROM pageview_events
GROUP BY event_date;
Incremental vs. refreshable materialized views
ClickHouse supports two fundamentally different patterns for materialized views: incremental views that update in real-time as data arrives, and refreshable views that rebuild on a schedule.
Incremental materialized views process new rows synchronously during INSERT operations. As soon as a block of data lands in the source table, the view's SELECT query runs on that block and writes the results to the target table.
Refreshable materialized views recalculate their entire result set at scheduled intervals, such as every hour or every day. By default each refresh atomically replaces the view's contents; the APPEND modifier adds new rows on each refresh instead.
Here's how the two patterns compare:
| Characteristic | Incremental | Refreshable |
| --- | --- | --- |
| Update frequency | Real-time with each insert | Scheduled intervals |
| Resource usage | Overhead on every insert | Batch processing at refresh time |
| Data freshness | Always current | Stale until next refresh |
| Source data requirements | Append-only or careful handling of updates | Any data pattern |
Choose incremental views when you require up-to-the-second accuracy and your source table receives a steady stream of new data. Choose refreshable views when scheduled updates are acceptable and you want to minimize insert-time overhead.
ClickHouse projection vs materialized view: when to choose each
Projections and materialized views both pre-compute query results in ClickHouse, but they have different architectures and performance characteristics.
Read latency comparison
Projections are stored within the same table as your source data, which means ClickHouse can automatically choose to use them during queries without you explicitly referencing them. The query optimizer selects the most efficient projection based on your WHERE clause and ORDER BY.
Materialized views store results in separate tables, which means you explicitly query the target table to access pre-computed results. For most analytical queries, both approaches deliver sub-second latency.
Storage overhead comparison
Projections are part of the source table's data structure, stored alongside the original data and participating in the same merge operations. Each projection adds storage proportional to the columns it includes and how it transforms them.
Materialized views create entirely separate tables with their own storage, partitioning, and merge behavior. This separation gives you more flexibility to use different table engines or retention policies.
Maintenance workflow comparison
Projections are managed as part of the source table definition using ALTER TABLE commands. Adding or removing a projection requires an ALTER statement, and ClickHouse handles the projection's lifecycle automatically when you drop the table.
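For instance, here's a minimal sketch that adds a daily-count projection to the pageview_events table from the earlier example (the projection name is made up):
ALTER TABLE pageview_events
ADD PROJECTION daily_counts (
    SELECT toDate(event_time), count()
    GROUP BY toDate(event_time)
);

-- Build the projection for parts that existed before the ALTER
ALTER TABLE pageview_events MATERIALIZE PROJECTION daily_counts;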
Materialized views exist as separate database objects with their own lifecycle. You create and drop them independently, which gives you more flexibility but also means more objects to manage and monitor.
Performance tips and common pitfalls
Materialized views can significantly speed up queries, but poorly designed views can slow down inserts or produce incorrect results.
Avoid SELECT * in the view query
Using SELECT * in a materialized view copies all columns from the source table, even if you only require a few. This increases storage, slows down inserts, and makes the view harder to maintain when the source table schema changes.
Instead, explicitly list only the columns you require:
-- Bad: copies all columns
CREATE MATERIALIZED VIEW mv_bad
ENGINE = MergeTree() ORDER BY tuple()
AS SELECT * FROM events;
-- Good: only necessary columns
CREATE MATERIALIZED VIEW mv_good
ENGINE = MergeTree() ORDER BY (event_date, user_id)
AS SELECT event_date, user_id, event_type FROM events;
Match inner and outer ORDER BY
The target table's ORDER BY determines how data is physically stored on disk. When the materialized view's GROUP BY matches this ordering, ClickHouse can write data more efficiently and merge operations run faster.
For example, if your target table uses ORDER BY (event_date, user_id), structure your view's GROUP BY to match:
CREATE TABLE daily_user_summary (
event_date Date,
user_id UInt64,
pageviews UInt64
) ENGINE = SummingMergeTree()
ORDER BY (event_date, user_id);
CREATE MATERIALIZED VIEW mv_daily_user
TO daily_user_summary
AS
SELECT
toDate(event_time) AS event_date,
user_id,
count() AS pageviews
FROM events
GROUP BY event_date, user_id;
Monitor insert latency
Materialized views add processing overhead to INSERT operations because ClickHouse runs the view's query on every batch of new rows. Multiple materialized views on the same source table multiply this overhead, with performance degrading nearly linearly as you add more views.
Monitor your insert latency using the system.query_log table:
SELECT
query_duration_ms,
query
FROM system.query_log
WHERE type = 'QueryFinish'
AND query LIKE '%INSERT INTO events%'
ORDER BY event_time DESC
LIMIT 10;
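You can also break down processing time per view with the system.query_views_log table (populated when the log_query_views setting is enabled); a quick sketch:
SELECT
    view_name,
    view_duration_ms,
    written_rows
FROM system.query_views_log
ORDER BY event_time DESC
LIMIT 10;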
Creating materialized views with Tinybird: avoiding expensive backfills
Tinybird is a managed ClickHouse service that simplifies materialized view creation and solves one of ClickHouse's most painful challenges: populating materialized views on tables with existing data.
When you create a materialized view with POPULATE in standard ClickHouse, the operation runs synchronously while processing potentially billions of rows. For large tables, this can take hours or even days, during which your production queries compete for resources with the backfill operation.
Tinybird eliminates this problem through ultra-fast compute/compute separation, where populate operations run on dedicated compute resources that don't impact your live query traffic.
How Tinybird handles populates differently
First, install the Tinybird CLI and start a local ClickHouse container for testing:
curl -L tinybird.co | sh
tb login
tb local start
Create a data source file called events.datasource:
SCHEMA >
`event_time` DateTime,
`user_id` UInt64,
`event_type` String,
`value` Float64
ENGINE MergeTree
ENGINE_SORTING_KEY event_time, user_id
Next, create the target datasource for your materialized view, called daily_summary_mv.datasource:
SCHEMA >
`event_date` Date,
`event_type` String,
`event_count` UInt64,
`total_value` Float64
ENGINE SummingMergeTree
ENGINE_SORTING_KEY event_date, event_type
Then create a pipe file called daily_summary.pipe that defines your materialized view:
NODE daily_aggregation
SQL >
SELECT
toDate(event_time) AS event_date,
event_type,
count() AS event_count,
sum(value) AS total_value
FROM events
GROUP BY event_date, event_type
TYPE materialized
DATASOURCE daily_summary_mv
Testing locally with tb dev
Test your materialized view locally before deploying:
tb dev
This starts an interactive development session where you can append data to your source and query the materialized view to verify the aggregation logic works correctly. The local environment behaves identically to production, so you can catch issues before they affect live data.
Deploying with automatic populate handling
When you're ready to deploy, Tinybird detects if your materialized view needs to process historical data and handles it automatically:
tb --cloud deploy
Tinybird runs populate operations on dedicated compute infrastructure separate from your production query cluster. This means:
- No production query impact: Your live API endpoints and dashboards continue serving queries with zero degradation while the backfill runs
- Faster completion: Populate operations can use more CPU and memory than your production cluster allows, completing in hours instead of days
- Automatic retry logic: If the populate fails partway through, Tinybird automatically resumes from the last successful checkpoint
- Zero downtime: New data continues flowing into your materialized view through the normal incremental path while historical data backfills
The standard ClickHouse populate problem
In standard ClickHouse, creating a materialized view with POPULATE on a table with 1 billion existing rows presents several challenges. Because POPULATE can't be combined with a TO clause, you also have to let ClickHouse create the view's inner table:
-- This runs synchronously and blocks the session
CREATE MATERIALIZED VIEW mv_daily_summary
ENGINE = SummingMergeTree()
ORDER BY (event_date, event_type)
POPULATE -- Can take 6-12+ hours on large tables
AS SELECT ...
During this operation:
- The POPULATE query competes with production queries for CPU, memory, and disk I/O
- Your dashboards and APIs experience degraded performance or timeouts
- The operation holds locks that can block other DDL operations
- If it fails halfway through, you start over from scratch
Many teams work around this by:
- Running populates during maintenance windows (requiring downtime)
- Manually batching the backfill with INSERT INTO ... SELECT in smaller chunks (sketched below)
- Building custom scripts to parallelize the work across multiple queries
- Simply skipping historical data and only processing new inserts going forward
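A manual chunked backfill typically filters the source by time range and repeats per chunk. This sketch assumes an events source table and a daily_summary target whose columns match the SELECT (both hypothetical); the date bounds are illustrative:
-- Backfill one month, then repeat with the next range
INSERT INTO daily_summary
SELECT
    toDate(event_time) AS event_date,
    event_type,
    count() AS event_count,
    sum(value) AS total_value
FROM events
WHERE event_time >= toDateTime('2024-01-01 00:00:00')
  AND event_time < toDateTime('2024-02-01 00:00:00')
GROUP BY event_date, event_type;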
For more insights on managing materialized views and memory issues in production, see our guide on operating ClickHouse at scale.
Tinybird's approach removes the need for any of these workarounds. The platform automatically handles compute separation, so you get fast, reliable populates without impacting production workloads.
Exposing the view as a secure API endpoint
Tinybird lets you query your materialized views through REST APIs with built-in authentication and parameterization.
Define a pipe with parameters
Create a pipe file that queries your materialized view with dynamic filters:
TOKEN daily_summary_read READ
NODE endpoint
SQL >
SELECT
event_date,
event_type,
event_count,
total_value
FROM daily_summary_mv
WHERE event_date >= {{Date(start_date, '2024-01-01')}}
AND event_date <= {{Date(end_date, today())}}
{% if defined(event_type) %}
AND event_type = {{String(event_type)}}
{% end %}
ORDER BY event_date DESC
TYPE endpoint
The {{Date(start_date, '2024-01-01')}} syntax creates a URL parameter with a default value. Deploy this pipe:
tb --cloud deploy
Now you can query your materialized view via HTTP:
curl "https://api.tinybird.co/v0/pipes/daily_summary.json?start_date=2024-01-01&end_date=2024-01-31&event_type=purchase&token=your_token"
Next steps: build fast analytics without running ClickHouse yourself
Materialized views in ClickHouse provide significant query acceleration, but managing ClickHouse infrastructure adds operational overhead. Tinybird eliminates this complexity by providing a fully managed ClickHouse service designed for developers.
With Tinybird, you define your data sources and materialized views as code, test locally, and deploy to production in minutes. The platform handles cluster scaling, backup, monitoring, and optimization automatically.
Sign up for a free Tinybird account to try materialized views with managed ClickHouse infrastructure. The free tier includes enough resources to build and test real-world analytics use cases.
FAQs about ClickHouse materialized views
How do I alter a materialized view without dropping data?
ClickHouse does support ALTER TABLE view_name MODIFY QUERY to swap a materialized view's SELECT in place, but it doesn't reprocess existing data or change the target table's schema. The safer workflow is to create a new materialized view with the updated logic, backfill historical data, and drop the old view once you've verified the new one works correctly.
How can I monitor lag between the source and the materialized view?
Query the system.parts table to compare last modification times between your source table and the materialized view's target table. Significant time differences indicate that inserts to the source table might be slow or that the materialized view's processing is falling behind.
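A quick sketch using this article's example tables:
SELECT
    table,
    max(modification_time) AS last_write
FROM system.parts
WHERE active
  AND table IN ('pageview_events', 'daily_pageview_summary')
GROUP BY table;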
What happens to a materialized view after a table schema change?
Schema changes to source tables can break materialized views if the view references columns that were removed or if data types become incompatible. You'll typically drop and recreate the materialized view after significant schema changes to handle the new structure correctly.
Does a materialized view increase storage costs significantly?
Materialized views create additional storage overhead since they store transformed or aggregated data separately from the source table. The actual storage increase depends on how much the view reduces data volume through aggregation. Views that heavily aggregate data often use less storage than the source table, while views that simply reorder data can nearly double storage requirements.
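To measure the overhead directly, you can compare on-disk sizes in system.parts; a sketch using this article's example tables:
SELECT
    table,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.parts
WHERE active
  AND table IN ('pageview_events', 'daily_pageview_summary')
GROUP BY table;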