---
title: "ClickHouse® materialized view: how to materialize data in ClickHouse®"
excerpt: "ClickHouse CREATE MATERIALIZED VIEW examples that actually work. Copy these patterns and adapt them for your streaming pipelines."
authors: "Cameron Archer"
categories: "AI Resources"
createdOn: "2025-10-14 19:26:38"
publishedOn: "2025-10-16 19:26:38"
updatedOn: "2025-10-16 19:26:38"
status: "published"
---

Aggregating millions of events at read time can be expensive. It's better to incrementally update aggregates as new data arrives. ClickHouse{% sup %}®{% /sup %} materialized views solve this by pre-computing results on an ingest trigger, shifting computational work from query time to insert time.

This article walks through creating materialized views in ClickHouse{% sup %}®{% /sup %} with complete working examples, explains when to use them over alternatives like projections, and shows how to avoid common performance pitfalls.

## What is a materialized view in ClickHouse{% sup %}®{% /sup %}?

A materialized view in ClickHouse{% sup %}®{% /sup %} is a database object that stores the results of a query and automatically updates when new data arrives in the source table. When you insert rows into the source table, the materialized view runs its SELECT query on those new rows and writes the transformed results to a target table.

Regular views recalculate results every time you query them, which means the database does the same work over and over. Materialized views do the computational work once during data insertion, so SELECT queries run faster because the results already exist.

This pattern works well for aggregations and transformations you run repeatedly. If you frequently query daily user activity summaries from a raw events table, a materialized view can pre-aggregate those metrics as new events arrive.

## Full create materialized view syntax explained

The `CREATE MATERIALIZED VIEW` statement follows this structure: `CREATE MATERIALIZED VIEW view_name [TO target_table] [ENGINE = engine_type] [POPULATE] AS SELECT ...`. Each clause controls where data goes, how it's stored, and whether historical data gets processed. Note that `POPULATE` cannot be combined with a `TO` clause; to back-fill an explicit target table, use `INSERT INTO ... SELECT` instead.

### `TO` clause and internal table

The `TO` clause directs output to an existing table you've already created. Using `TO existing_table` means you define the target table's schema, engine, and settings before creating the view.

Without a `TO` clause, ClickHouse{% sup %}®{% /sup %} automatically creates an internal table to store results, inferring its schema from the SELECT query. In databases using the default Atomic engine, this internal table is named `.inner_id.<uuid>`; in older Ordinary databases it uses the view's name prefixed with `.inner.`.
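As a sketch, assuming a `pageview_events` source table like the one created later in this article, a view defined without `TO` needs its own `ENGINE` clause, and you query it by the view's name:

```sql
-- Without TO, ClickHouse manages a hidden internal table for the view.
CREATE MATERIALIZED VIEW mv_no_target
ENGINE = MergeTree()
ORDER BY event_date
AS
SELECT
    toDate(event_time) AS event_date,
    count() AS views
FROM pageview_events
GROUP BY event_date;

-- Querying the view name reads transparently from the internal table:
SELECT * FROM mv_no_target;
```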

Creating the target table explicitly with `TO` gives you control over table engines, partitioning, and ordering. This approach also makes it easier to query the aggregated data directly without referencing the materialized view name.

### `POPULATE` option

The `POPULATE` keyword tells ClickHouse{% sup %}®{% /sup %} to process all existing data in the source table when you create the materialized view. Without `POPULATE`, the view only processes new rows inserted after creation.

Use `POPULATE` when you have historical data that belongs in your materialized view. The operation runs synchronously and can take significant time on large tables, blocking until all existing data has been processed. Be aware that rows inserted into the source table while the populate is running may be missed, so it's safest to pause ingestion during the operation.

### Refresh schedule with `REFRESH EVERY`

Refreshable materialized views update on a schedule rather than in real time. The syntax `REFRESH EVERY interval` defines how often ClickHouse{% sup %}®{% /sup %} recalculates the entire view from scratch.

For example, `CREATE MATERIALIZED VIEW daily_summary REFRESH EVERY 1 DAY AS SELECT ...` rebuilds the view once per day. This pattern works when you don't require real-time updates and want to reduce the overhead of continuous incremental processing.
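A fuller sketch, assuming a `daily_summary` target table already exists and a ClickHouse{% sup %}®{% /sup %} version recent enough to support refreshable materialized views:

```sql
-- Refreshable view that rebuilds its result set once per day.
CREATE MATERIALIZED VIEW daily_summary_refreshable
REFRESH EVERY 1 DAY
TO daily_summary
AS
SELECT
    toDate(event_time) AS event_date,
    count() AS total_events
FROM pageview_events
GROUP BY event_date;
```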

### Engine choice for the inner table

The table engine determines how ClickHouse{% sup %}®{% /sup %} stores and merges the data written by your materialized view. `MergeTree` engines work for most use cases, while specialized engines like `SummingMergeTree` or `AggregatingMergeTree` optimize specific aggregation patterns.

* **MergeTree**: General-purpose engine for filtering and transformation without heavy aggregation
* **SummingMergeTree**: Automatically sums numeric columns during background merges, reducing storage for accumulated totals
* **AggregatingMergeTree**: Stores intermediate aggregation states using functions like `sumState()` or `uniqState()`, allowing you to combine partial aggregates later
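The `AggregatingMergeTree` pattern deserves a concrete sketch, since it's the correct way to maintain exact unique counts incrementally. Assuming the `pageview_events` table used later in this article, the target table stores intermediate states in `AggregateFunction` columns, and queries finalize them with the `-Merge` combinator:

```sql
-- Target table holds partial aggregation states, not final values.
CREATE TABLE daily_uniques (
    event_date Date,
    users AggregateFunction(uniq, UInt64)
) ENGINE = AggregatingMergeTree()
ORDER BY event_date;

-- The view writes states with the -State combinator:
CREATE MATERIALIZED VIEW mv_daily_uniques TO daily_uniques AS
SELECT
    toDate(event_time) AS event_date,
    uniqState(user_id) AS users
FROM pageview_events
GROUP BY event_date;

-- Finalize the states at query time with the -Merge combinator:
SELECT event_date, uniqMerge(users) AS unique_users
FROM daily_uniques
GROUP BY event_date;
```

Because states combine correctly during merges, this pattern returns exact unique counts even when a day's events arrive across many insert batches.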

## Step-by-step example: aggregate pageviews into daily metrics

This example shows how to build a materialized view that aggregates raw web pageview events into daily summary statistics. You'll create a source table for raw events, define a materialized view with aggregation logic, insert sample data, and query the pre-computed results.

### 1. Create the raw events table

First, [create a table](https://www.tinybird.co/blog-posts/clickhouse-create-table-example) to store individual pageview events:

```sql
CREATE TABLE pageview_events (
    event_time DateTime,
    user_id UInt64,
    page_url String,
    session_id String
) ENGINE = MergeTree()
ORDER BY (event_time, user_id);
```

This table uses `MergeTree` with an ordering key on `event_time` and `user_id`, which optimizes queries that filter by time ranges or specific users.

### 2. Create the target table for aggregated data

Next, create a separate table to store the daily aggregated metrics:

```sql
CREATE TABLE daily_pageview_summary (
    event_date Date,
    total_pageviews UInt64,
    unique_users UInt64,
    unique_sessions UInt64
) ENGINE = SummingMergeTree()
ORDER BY event_date;
```

The [`SummingMergeTree` engine automatically sums](https://www.tinybird.co/blog-posts/roll-up-data-with-materialized-views) the numeric columns during background merges, keeping storage compact as data accumulates. One caveat: summing is correct for additive counts like `total_pageviews`, but `uniq()` results are not additive, so unique counts computed across multiple insert batches for the same date will be over-counted. For exact uniques across batches, use `AggregatingMergeTree` with `uniqState()` instead.

### 3. Create the materialized view

Now define the materialized view that transforms raw events into daily aggregates:

```sql
CREATE MATERIALIZED VIEW mv_daily_pageviews
TO daily_pageview_summary
AS
SELECT
    toDate(event_time) AS event_date,
    count() AS total_pageviews,
    uniq(user_id) AS unique_users,
    uniq(session_id) AS unique_sessions
FROM pageview_events
GROUP BY event_date;
```

This view groups events by date and calculates three metrics: total pageviews, unique users, and unique sessions. Note that the view aggregates each inserted block independently; the target table's engine is responsible for combining those partial results during background merges.

### 4. Insert sample data

Insert some test pageview events:

```sql
INSERT INTO pageview_events VALUES
    ('2024-01-15 10:30:00', 101, '/home', 'session_1'),
    ('2024-01-15 10:35:00', 101, '/products', 'session_1'),
    ('2024-01-15 11:00:00', 102, '/home', 'session_2'),
    ('2024-01-15 14:20:00', 103, '/about', 'session_3'),
    ('2024-01-16 09:15:00', 101, '/home', 'session_4'),
    ('2024-01-16 09:20:00', 104, '/products', 'session_5');
```

As soon as the rows are inserted, the materialized view processes them and writes aggregated results to `daily_pageview_summary`.

### 5. Query the aggregated table

Query the summary table to see the pre-computed daily metrics:

```sql
SELECT
    event_date,
    sum(total_pageviews) AS total_pageviews,
    sum(unique_users) AS unique_users,
    sum(unique_sessions) AS unique_sessions
FROM daily_pageview_summary
GROUP BY event_date
ORDER BY event_date;
```

Summing at query time guards against reading parts that `SummingMergeTree` hasn't merged yet. The query returns results almost instantly because the aggregation already happened at insert time:

```text
┌─event_date─┬─total_pageviews─┬─unique_users─┬─unique_sessions─┐
│ 2024-01-15 │               4 │            3 │               3 │
│ 2024-01-16 │               2 │            2 │               2 │
└────────────┴─────────────────┴──────────────┴─────────────────┘
```

## How to back-fill data with ClickHouse{% sup %}®{% /sup %} populate materialized view

When you create a materialized view on a table that already contains data, the view only processes new inserts by default. To include historical data in your materialized view, you use the `POPULATE` keyword during view creation.

### When to use populate

Use `POPULATE` when you're adding a materialized view to an existing table with historical data. For example, if you have six months of pageview events and create a new daily summary view, `POPULATE` processes all six months of history.

Another common scenario is replacing an existing materialized view with new aggregation logic. You'd drop the old view, create the new one with `POPULATE`, and all historical data gets reprocessed according to the new logic.

### Risks of data duplication

The [POPULATE operation processes all rows](https://www.tinybird.co/blog-posts/compute-compute-separation-for-populates) currently in the source table, which can create duplicates if the target table already contains data. This typically happens when you drop and recreate a materialized view without first truncating its target table.

A further wrinkle: standard ClickHouse{% sup %}®{% /sup %} does not allow `POPULATE` together with a `TO` clause. When your view writes to an explicit target table, back-fill it with an `INSERT INTO ... SELECT` that mirrors the view's query. To avoid duplicates, truncate the target table first if it already contains data:

```sql
TRUNCATE TABLE daily_pageview_summary;

-- The view itself only processes new inserts; back-fill history manually:
INSERT INTO daily_pageview_summary
SELECT
    toDate(event_time) AS event_date,
    count() AS total_pageviews,
    uniq(user_id) AS unique_users,
    uniq(session_id) AS unique_sessions
FROM pageview_events
GROUP BY event_date;
```

## Incremental vs. refreshable materialized views

ClickHouse{% sup %}®{% /sup %} supports two fundamentally different patterns for materialized views: incremental views that update in real time as data arrives, and refreshable views that rebuild on a schedule.

Incremental materialized views process new rows immediately during INSERT operations. As soon as data lands in the source table, the view's SELECT query [runs synchronously on INSERT](https://clickhouse.com/docs/managing-data/materialized-views-versus-projections), processing the new rows and writing results to the target table.

Refreshable materialized views recalculate their entire result set at scheduled intervals, such as every hour or every day, supporting both [APPEND and REPLACE modes](https://clickhouse.com/docs/best-practices/use-materialized-views) for different use cases.

Here's how the two patterns compare:

| Characteristic             | Incremental                                | Refreshable                      |
|----------------------------|--------------------------------------------|----------------------------------|
| Update frequency           | Real-time with each insert                 | Scheduled intervals              |
| Resource usage             | Overhead on every insert                   | Batch processing at refresh time |
| Data freshness             | Always current                             | Stale until next refresh         |
| Source data requirements   | Append-only or careful handling of updates | Any data pattern                 |

Choose incremental views when you require up-to-the-second accuracy and your source table receives a steady stream of new data. Choose refreshable views when scheduled updates are acceptable and you want to minimize insert-time overhead.

## ClickHouse{% sup %}®{% /sup %} projection vs materialized view: when to choose each

Projections and materialized views both pre-compute query results in ClickHouse{% sup %}®{% /sup %}, but they have different architectures and performance characteristics.

### Read latency comparison

Projections are stored within the same table as your source data, which means ClickHouse{% sup %}®{% /sup %} can automatically choose to use them during queries without you explicitly referencing them. The query optimizer selects the most efficient projection based on your WHERE clause and ORDER BY.

Materialized views store results in separate tables, which means you explicitly query the target table to access pre-computed results. For most analytical queries, both approaches deliver sub-second latency.

### Storage overhead comparison

Projections are part of the source table's data structure, stored alongside the original data and participating in the same merge operations. Each projection adds storage proportional to the columns it includes and how it transforms them.

Materialized views create entirely separate tables with their own storage, partitioning, and merge behavior. This separation gives you more flexibility to use different table engines or retention policies.

### Maintenance workflow comparison

Projections are managed as part of the source table definition using `ALTER TABLE` commands. Adding or removing a projection requires an `ALTER` statement, and ClickHouse{% sup %}®{% /sup %} handles the projection's lifecycle automatically when you drop the table.
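As a sketch of that workflow, assuming the `pageview_events` table from the earlier example:

```sql
-- Add a pre-aggregated projection inside the source table itself:
ALTER TABLE pageview_events
ADD PROJECTION daily_counts (
    SELECT toDate(event_time), count()
    GROUP BY toDate(event_time)
);

-- Build the projection for rows that already exist in the table:
ALTER TABLE pageview_events MATERIALIZE PROJECTION daily_counts;
```

After this, queries that group by `toDate(event_time)` can use the projection automatically, without referencing it by name.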

Materialized views exist as separate database objects with their own lifecycle. You create and drop them independently, which gives you more flexibility but also means more objects to manage and monitor.

## Performance tips and common pitfalls

Materialized views can significantly speed up queries, but [poorly designed views can slow down inserts](https://www.tinybird.co/blog-posts/data-rules-materialized-views-40000-dollars) or produce incorrect results.

### Avoid `SELECT *` in the view query

Using `SELECT *` in a materialized view copies all columns from the source table, even if you only require a few. This increases storage, slows down inserts, and makes the view harder to maintain when the source table schema changes.

Instead, explicitly list only the columns you require:

```sql
-- Bad: copies all columns
CREATE MATERIALIZED VIEW mv_bad
ENGINE = MergeTree() ORDER BY tuple()
AS SELECT * FROM events;

-- Good: only the necessary columns
CREATE MATERIALIZED VIEW mv_good
ENGINE = MergeTree() ORDER BY (event_date, user_id)
AS SELECT event_date, user_id, event_type FROM events;
```

### Match inner and outer `ORDER BY`

The target table's `ORDER BY` determines how data is physically stored on disk. When the materialized view's GROUP BY matches this ordering, ClickHouse{% sup %}®{% /sup %} can write data more efficiently and merge operations run faster.

For example, if your target table uses [ORDER BY (event_date, user_id)](https://www.tinybird.co/blog-posts/5-rules-for-writing-faster-sql-queries), structure your view's GROUP BY to match:

```sql
CREATE TABLE daily_user_summary (
    event_date Date,
    user_id UInt64,
    pageviews UInt64
) ENGINE = SummingMergeTree()
ORDER BY (event_date, user_id);

CREATE MATERIALIZED VIEW mv_daily_user
TO daily_user_summary
AS
SELECT
    toDate(event_time) AS event_date,
    user_id,
    count() AS pageviews
FROM events
GROUP BY event_date, user_id;
```

### Monitor insert latency

Materialized views add processing overhead to INSERT operations because ClickHouse{% sup %}®{% /sup %} runs the view's query on every batch of new rows. Multiple materialized views on the same source table multiply this overhead, with performance [degrading nearly linearly](https://yandex.cloud/en/blog/posts/2025/03/materialized-views) as you add more views.

Monitor your insert latency using the `system.query_log` table:

```sql
SELECT
    query_duration_ms,
    query
FROM system.query_log
WHERE type = 'QueryFinish'
  AND query LIKE '%INSERT INTO events%'
ORDER BY event_time DESC
LIMIT 10;
```

## Creating materialized views with Tinybird: avoiding expensive backfills

[Tinybird](https://tinybird.co) is a managed ClickHouse{% sup %}®{% /sup %} service that simplifies materialized view creation and solves one of ClickHouse{% sup %}®{% /sup %}'s most painful challenges: populating materialized views on tables with existing data.

When you create a materialized view with `POPULATE` in standard ClickHouse{% sup %}®{% /sup %}, the operation blocks your cluster while processing potentially billions of rows. For large tables, this can take hours or even days, during which your production queries compete for resources with the backfill operation.

Tinybird eliminates this problem through [ultra-fast compute/compute separation](https://www.tinybird.co/blog-posts/compute-compute-separation-for-faster-populates), where populate operations run on dedicated compute resources that don't impact your live query traffic.

### How Tinybird handles populates differently

First, install the Tinybird CLI and start a local ClickHouse{% sup %}®{% /sup %} container for testing:

```sh
curl -L tinybird.co | sh
tb login
tb local start
```

Create a data source file called `events.datasource`:

```tinybird
SCHEMA >
    `event_time` DateTime,
    `user_id` UInt64,
    `event_type` String,
    `value` Float64

ENGINE MergeTree
ENGINE_SORTING_KEY event_time, user_id
```

Next, create the target datasource for your materialized view called `daily_summary_mv.datasource`:

```tinybird
SCHEMA >
    `event_date` Date,
    `event_type` String,
    `event_count` UInt64,
    `total_value` Float64

ENGINE SummingMergeTree
ENGINE_SORTING_KEY event_date, event_type
```

Then create a pipe file called `daily_summary.pipe` that defines your materialized view:

```tinybird
NODE daily_aggregation
SQL >
    SELECT
        toDate(event_time) AS event_date,
        event_type,
        count() AS event_count,
        sum(value) AS total_value
    FROM events
    GROUP BY event_date, event_type

TYPE materialized
DATASOURCE daily_summary_mv
```

### Testing locally with `tb dev`

Test your materialized view locally before deploying:

```sh
tb dev
```

This starts an interactive development session where you can append data to your source and query the materialized view to verify the aggregation logic works correctly. The local environment behaves identically to production, so you can catch issues before they affect live data.

### Deploying with automatic populate handling

When you're ready to deploy, Tinybird detects if your materialized view needs to process historical data and handles it automatically:

```sh
tb --cloud deploy
```

Tinybird runs populate operations on dedicated compute infrastructure separate from your production query cluster. This means:

1. **No production query impact**: Your live API endpoints and dashboards continue serving queries with zero degradation while the backfill runs
2. **Faster completion**: Populate operations can use more CPU and memory than your production cluster allows, completing in hours instead of days
3. **Automatic retry logic**: If the populate fails partway through, Tinybird automatically resumes from the last successful checkpoint
4. **Zero downtime**: New data continues flowing into your materialized view through the normal incremental path while historical data backfills

### The standard ClickHouse{% sup %}®{% /sup %} populate problem

In standard ClickHouse{% sup %}®{% /sup %}, creating a materialized view with `POPULATE` on a table with 1 billion existing rows presents several challenges:

```sql
-- This runs synchronously and blocks; POPULATE also can't be
-- combined with TO, so the view must manage its own internal table.
CREATE MATERIALIZED VIEW mv_daily_summary
ENGINE = SummingMergeTree()
ORDER BY event_date
POPULATE  -- can take many hours on billion-row tables
AS SELECT ...
```

During this operation:

* The `POPULATE` query competes with production queries for CPU, memory, and disk I/O
* Your dashboards and APIs experience degraded performance or timeouts
* The operation holds locks that can block other DDL operations
* If it fails halfway through, you start over from scratch

Many teams work around this by:

* Running populates during maintenance windows (requiring downtime)
* Manually batching the backfill with `INSERT INTO ... SELECT` in smaller chunks
* Building custom scripts to parallelize the work across multiple queries
* Simply skipping historical data and only processing new inserts going forward

For more insights on managing materialized views and memory issues in production, see our guide on [operating ClickHouse{% sup %}®{% /sup %} at scale](https://www.tinybird.co/blog-posts/what-i-learned-operating-clickhouse).

Tinybird's approach removes the need for any of these workarounds. The platform automatically handles compute separation, so you get fast, reliable populates without impacting production workloads.

## Exposing the view as a secure API endpoint

Tinybird lets you query your materialized views through REST APIs with built-in authentication and parameterization.

### Define a pipe with parameters

Create a pipe file that queries your materialized view with dynamic filters:

```tinybird
TOKEN daily_summary_read READ

NODE endpoint
SQL >
    SELECT
        event_date,
        event_type,
        event_count,
        total_value
    FROM daily_summary_mv
    WHERE event_date >= {{Date(start_date, '2024-01-01')}}
      AND event_date <= {{Date(end_date, today())}}
    {% if defined(event_type) %}
      AND event_type = {{String(event_type)}}
    {% end %}
    ORDER BY event_date DESC

TYPE endpoint
```

The `{{Date(start_date, '2024-01-01')}}` syntax creates a URL parameter with a default value. Deploy this pipe:

```sh
tb --cloud deploy
```

Now you can query your materialized view via HTTP:

```sh
curl "https://api.tinybird.co/v0/pipes/daily_summary.json?start_date=2024-01-01&end_date=2024-01-31&event_type=purchase&token=your_token"
```

## Next steps: build fast analytics without running ClickHouse{% sup %}®{% /sup %} yourself

Materialized views in ClickHouse{% sup %}®{% /sup %} provide significant query acceleration, but managing ClickHouse{% sup %}®{% /sup %} infrastructure adds operational overhead. Tinybird eliminates this complexity by providing a [fully managed ClickHouse{% sup %}®{% /sup %} service](https://www.tinybird.co/product/managed-clickhouse) designed for developers.

With Tinybird, you define your data sources and materialized views as code, test locally, and deploy to production in minutes. The platform handles cluster scaling, backup, monitoring, and optimization automatically.

[Sign up for a free Tinybird account](https://cloud.tinybird.co/signup) to try materialized views with managed ClickHouse{% sup %}®{% /sup %} infrastructure. The free tier includes enough resources to build and test real-world analytics use cases.

## FAQs about ClickHouse{% sup %}®{% /sup %} materialized views

### How do I alter a materialized view without dropping data?

You cannot alter materialized views directly in ClickHouse{% sup %}®{% /sup %}. The typical workflow is to create a new materialized view with the updated logic, optionally use `POPULATE` to backfill historical data, then drop the old view once you've verified the new one works correctly.
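As a hedged sketch using the view from the earlier example (dropping the old view first prevents both views from double-writing to the shared target table):

```sql
-- Stop the old view from processing new inserts:
DROP VIEW mv_daily_pageviews;

-- Create the replacement with the updated logic:
CREATE MATERIALIZED VIEW mv_daily_pageviews_v2
TO daily_pageview_summary
AS
SELECT
    toDate(event_time) AS event_date,
    count() AS total_pageviews,
    uniq(user_id) AS unique_users,
    uniq(session_id) AS unique_sessions
FROM pageview_events
GROUP BY event_date;

-- If inserts continued between the DROP and the CREATE, back-fill
-- that window with INSERT INTO ... SELECT over the missed range.
```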

### How can I monitor lag between the source and the materialized view?

Query the `system.parts` table to compare last modification times between your source table and the materialized view's target table. Significant time differences indicate that inserts to the source table might be slow or that the materialized view's processing is falling behind.
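For example, assuming the source and target tables from earlier in this article, that comparison looks like:

```sql
-- Latest part write time per table; a large gap suggests the view
-- or its merges are falling behind.
SELECT
    table,
    max(modification_time) AS last_part_written
FROM system.parts
WHERE active
  AND table IN ('pageview_events', 'daily_pageview_summary')
GROUP BY table;
```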

### What happens to a materialized view after a table schema change?

Schema changes to source tables can break materialized views if the view references columns that were removed or if data types become incompatible. You'll typically drop and recreate the materialized view after significant schema changes to handle the new structure correctly.

### Does a materialized view increase storage costs significantly?

Materialized views create additional storage overhead since they store transformed or aggregated data separately from the source table. The actual storage increase depends on how much the view reduces data volume through aggregation. Views that heavily aggregate data often use less storage than the source table, while views that simply reorder data can nearly double storage requirements.
