---
title: "ClickHouse® CTE example: How to use WITH in ClickHouse®"
excerpt: "Learn ClickHouse® CTE examples with practical WITH clause syntax, performance tips, and real-world use cases to simplify complex SQL queries effectively."
authors: "Cameron Archer"
categories: "AI Resources"
createdOn: "2025-10-14 19:29:22"
publishedOn: "2025-10-16 19:29:22"
updatedOn: "2025-10-16 19:29:22"
status: "published"
---

Complex SQL queries often become difficult to read when the same filtering or aggregation logic appears multiple times. Common table expressions (CTEs) in ClickHouse{% sup %}®{% /sup %} solve this by letting you define temporary named result sets using the `WITH` clause, which you can then reference throughout your query like a regular table.

This article shows you how to write CTEs in ClickHouse{% sup %}®{% /sup %}, when to use them instead of subqueries or materialized views, and how to deploy CTE-based queries as production API endpoints.

## What is a common table expression in ClickHouse{% sup %}®{% /sup %}

A common table expression (CTE) in ClickHouse{% sup %}®{% /sup %} is a temporary named result set that you define within a single query using the `WITH` clause. The CTE exists only during query execution and lets you break down complex logic into readable, reusable parts.

You write a CTE by giving a subquery a name, then reference that name later in your main query. This is particularly useful when you need to filter data in stages or when the same intermediate calculation appears multiple times in your query. Instead of copying the same subquery logic repeatedly, you define it once as a CTE and reference it by name.

Here's what a basic CTE looks like:

```sql
WITH active_users AS (
    SELECT user_id
    FROM users
    WHERE status = 'active'
)
SELECT
    au.user_id,
    COUNT(*) AS event_count
FROM events e
JOIN active_users au ON e.user_id = au.user_id
GROUP BY au.user_id
ORDER BY event_count DESC;
```

In this query, `active_users` is the CTE. It filters the users table to only active users, then the main query joins this filtered set with the events table to count how many events each active user has generated.

## ClickHouse{% sup %}®{% /sup %} WITH clause syntax

The `WITH` clause in ClickHouse{% sup %}®{% /sup %} follows a straightforward pattern where you define one or more CTEs before your main `SELECT` statement. Each CTE has a name, followed by `AS`, followed by a subquery in parentheses.

The basic structure looks like this:

```sql
WITH
    cte_name AS (
        SELECT ...
    )
SELECT ...
FROM cte_name;
```
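Beyond subquery CTEs, ClickHouse{% sup %}®{% /sup %}'s `WITH` clause also supports binding a scalar expression to a name, which you can then reuse anywhere in the query. For example, assuming an `events` table with a `timestamp` column:

```sql
WITH toStartOfMonth(now()) AS month_start
SELECT count() AS events_this_month
FROM events
WHERE timestamp >= month_start;
```

This form is handy for naming a constant (a date boundary, a threshold) once instead of repeating the expression in several predicates.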

You can define multiple CTEs by separating them with commas:

```sql
WITH
    first_cte AS (
        SELECT ...
    ),
    second_cte AS (
        SELECT ...
    )
SELECT ...
FROM first_cte
JOIN second_cte ON ...;
```

Each CTE must be defined before you reference it: later CTEs can use earlier ones, but not the other way around.
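For example, a later CTE can build on an earlier one (using the same `events` table that appears throughout this article):

```sql
WITH
    logins AS (
        SELECT user_id, timestamp
        FROM events
        WHERE event_type = 'login'
    ),
    first_logins AS (
        -- References the earlier logins CTE
        SELECT user_id, min(timestamp) AS first_login
        FROM logins
        GROUP BY user_id
    )
SELECT count() FROM first_logins;
```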

## Non-recursive CTE example you can copy-paste

Here's a practical example that calculates user engagement metrics using CTEs. This query finds users who have made at least one purchase, then calculates their average session duration.

```sql
WITH
    purchasing_users AS (
        SELECT DISTINCT user_id
        FROM events
        WHERE event_type = 'purchase'
    ),
    session_durations AS (
        SELECT
            user_id,
            session_id,
            max(timestamp) - min(timestamp) AS duration_seconds
        FROM events
        WHERE user_id IN (SELECT user_id FROM purchasing_users)
        GROUP BY user_id, session_id
    )
SELECT
    user_id,
    COUNT(DISTINCT session_id) AS total_sessions,
    AVG(duration_seconds) AS avg_session_duration
FROM session_durations
GROUP BY user_id
ORDER BY avg_session_duration DESC
LIMIT 100;
```

The first CTE (`purchasing_users`) filters down to users who have completed at least one purchase. The second CTE (`session_durations`) calculates how long each session lasted for those purchasing users by finding the difference between the earliest and latest event timestamp in each session. The main query then aggregates these durations to show the average session length per user.

This pattern is common in analytics where you filter data in stages and then perform calculations on the filtered results.

## Referencing a CTE multiple times in one query

One advantage of CTEs is that you can reference the same named result set multiple times within a single query. This eliminates repeated subquery logic, which both reduces code duplication and makes queries easier to maintain.

Here's an example that references a CTE in two different places:

```sql
WITH high_value_customers AS (
    SELECT
        customer_id,
        SUM(order_amount) AS total_spent
    FROM orders
    WHERE order_date >= '2024-01-01'
    GROUP BY customer_id
    HAVING total_spent > 1000
)
SELECT
    'High Value' AS segment,
    COUNT(*) AS customer_count,
    AVG(total_spent) AS avg_spent
FROM high_value_customers

UNION ALL

SELECT
    'Top 10' AS segment,
    COUNT(*) AS customer_count,
    AVG(total_spent) AS avg_spent
FROM (
    SELECT *
    FROM high_value_customers
    ORDER BY total_spent DESC
    LIMIT 10
);
```

In this query, the `high_value_customers` CTE appears twice: once in the first `SELECT` and again in the subquery within the `UNION ALL`. Without the CTE, you'd write the same aggregation logic twice, which increases the chance of inconsistencies if you later modify the filtering criteria.

CTEs also work well in self-joins where you compare rows from the same filtered dataset:

```sql
WITH daily_metrics AS (
    SELECT
        toDate(timestamp) AS date,
        COUNT(*) AS event_count
    FROM events
    GROUP BY date
)
SELECT
    today.date,
    today.event_count AS today_count,
    yesterday.event_count AS yesterday_count,
    today.event_count - yesterday.event_count AS daily_change
FROM daily_metrics today
LEFT JOIN daily_metrics yesterday
    ON yesterday.date = today.date - INTERVAL 1 DAY
ORDER BY today.date DESC;
```

Here the same CTE joins to itself to compare daily event counts with the previous day's counts. For a complete guide to [LEFT JOIN syntax and performance](https://www.tinybird.co/blog-posts/clickhouse-left-join-example), see our dedicated LEFT JOIN tutorial.

## Does ClickHouse{% sup %}®{% /sup %} support recursive CTEs

ClickHouse{% sup %}®{% /sup %} added limited support for recursive CTEs starting in [version 24.4](https://clickhouse.com/docs/whats-new/changelog/2024), but the implementation differs from databases like PostgreSQL or SQL Server. Recursive CTEs in ClickHouse{% sup %}®{% /sup %} work for specific use cases but have constraints that affect how you can use them.

A recursive CTE references itself within its own definition, which allows you to perform iterative calculations or traverse hierarchical data. The syntax uses `WITH RECURSIVE` instead of just `WITH`:

```sql
WITH RECURSIVE counter AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1
    FROM counter
    WHERE n < 10
)
SELECT * FROM counter;
```

This query generates numbers from 1 to 10 by starting with 1 and recursively adding 1 until the condition `n < 10` is no longer true.

However, ClickHouse{% sup %}®{% /sup %}'s recursive CTE implementation is more restrictive than in other databases. The recursion depth is limited, and certain operations that work in other systems might not be supported. For complex hierarchical queries or deep recursion, you might need alternative approaches like array functions or multiple query steps.
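For instance, the number-generating query above can be rewritten without any recursion using ClickHouse{% sup %}®{% /sup %}'s table and array functions, which sidesteps recursion depth limits entirely:

```sql
-- Generate 1..10 with the numbers() table function
-- (numbers(10) yields 0..9, so shift by one)
SELECT number + 1 AS n
FROM numbers(10);

-- Equivalent approach with an array expression:
-- range(1, 11) builds [1..10], arrayJoin expands it into rows
SELECT arrayJoin(range(1, 11)) AS n;
```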

If you're working with hierarchical data like organizational charts or category trees, test your recursive queries carefully. Consider whether a materialized view or a different data modeling approach might be more reliable for your use case.

## CTE performance and memory considerations

Unlike some databases, ClickHouse{% sup %}®{% /sup %} does not materialize CTE results by default. The `WITH` clause works by substitution: ClickHouse{% sup %}®{% /sup %} inlines the CTE's subquery at every place you reference it, and [each reference is executed independently](https://clickhouse.com/docs/en/sql-reference/statements/select/with/) rather than reusing a cached result. This behavior has tradeoffs depending on your data volume and query structure.

Because each reference re-executes the CTE's subquery, referencing an expensive CTE several times multiplies its cost in both CPU and memory. The `max_memory_usage` setting controls how much memory a single query can use, and exceeding this limit causes the query to fail.

For CTEs that produce small to medium-sized intermediate results (thousands to millions of rows), this overhead typically isn't a concern. However, if a CTE scans billions of rows or produces wide intermediate results with many columns, repeated references can push the query past its memory limit.
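When a legitimately heavy CTE-based query needs more headroom, you can adjust the limit per query with a `SETTINGS` clause (the table here stands in for your own):

```sql
-- Allow this query up to ~20 GB of RAM;
-- beyond that it fails with a MEMORY_LIMIT_EXCEEDED error
SELECT user_id, COUNT(*) AS event_count
FROM events
GROUP BY user_id
SETTINGS max_memory_usage = 20000000000;
```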

One approach is to [filter or aggregate data as early as possible](https://www.tinybird.co/blog-posts/5-rules-for-writing-faster-sql-queries) in your CTE definitions:

```sql
-- Less efficient: large intermediate result
WITH all_events AS (
    SELECT * FROM events
)
SELECT COUNT(*)
FROM all_events
WHERE event_type = 'purchase';

-- More efficient: filter early
WITH purchase_events AS (
    SELECT * FROM events
    WHERE event_type = 'purchase'
)
SELECT COUNT(*) FROM purchase_events;
```

The second version filters down to only purchase events in the CTE, which reduces the amount of data held in memory.

## When to use a CTE versus a subquery or materialized view

Choosing between CTEs, subqueries, and materialized views depends on your specific use case, query complexity, and performance requirements. Each approach has different characteristics that make it more suitable for certain scenarios.

**CTEs work well when you:**

* Reference the same intermediate result multiple times in a single query
* Want to improve readability by breaking complex logic into named steps
* Organize multi-stage filtering or aggregation in a clear sequence

**Subqueries are better when you:**

* Only need the intermediate result once
* Have a simple filtering condition that doesn't benefit from being named
* Want to avoid the overhead of materializing a result set in memory

**[Materialized views](https://www.tinybird.co/blog-posts/what-are-materialized-views-and-why-do-they-matter-for-realtime) are the right choice when you:**

* Run the same aggregation or transformation repeatedly across many queries
* Can accept slightly stale data that updates periodically rather than in real-time
* Want to pre-compute expensive operations to improve query response times

Here's a comparison of the same logic implemented three ways:

```sql
-- As a CTE
WITH recent_orders AS (
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    WHERE order_date >= today() - INTERVAL 30 DAY
    GROUP BY customer_id
)
SELECT * FROM recent_orders WHERE total > 500;

-- As a subquery
SELECT * FROM (
    SELECT customer_id, SUM(amount) AS total
    FROM orders
    WHERE order_date >= today() - INTERVAL 30 DAY
    GROUP BY customer_id
) WHERE total > 500;
```

The CTE and subquery versions produce identical results and have similar performance characteristics. The CTE version is more readable if you reference `recent_orders` multiple times or if your query has several stages of logic.

A materialized view would store the aggregated results persistently, which makes subsequent queries much faster but requires additional storage and adds complexity to your data pipeline.
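As a rough sketch, the same rolling aggregation could be maintained by a materialized view like the following. This assumes the `orders` table from the example above; a production version would pick the target engine and sorting key to match your query patterns:

```sql
CREATE MATERIALIZED VIEW customer_daily_totals
ENGINE = SummingMergeTree
ORDER BY (customer_id, order_date)
AS
SELECT
    customer_id,
    order_date,
    sum(amount) AS total
FROM orders
GROUP BY customer_id, order_date;
```

Note that with `SummingMergeTree`, rows are combined lazily during background merges, so queries against the view should still aggregate with `sum(total) ... GROUP BY customer_id` to get correct results.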

## Avoiding CTEs with Tinybird's multi-node pipes

[Tinybird](https://tinybird.co) is a [managed ClickHouse{% sup %}®{% /sup %} service](https://www.tinybird.co/product/managed-clickhouse) that provides a better alternative to CTEs: multi-node pipes. Instead of nesting logic within a single query using `WITH` clauses, you break your query into separate, referenceable nodes that make your logic more modular and reusable.

CTEs in ClickHouse{% sup %}®{% /sup %} have limitations: each CTE only exists within its parent query, making them non-reusable across different queries. Tinybird's multi-node approach solves this differently:

* **Individual nodes** can be referenced anywhere within the same pipe - similar to how you'd reference a CTE, but cleaner and more testable
* **Complete pipes** can be referenced across your entire project using `SELECT * FROM pipe_name` - allowing you to chain entire transformation pipelines together and reuse them project-wide

This means a pipe can contain multiple transformation steps as nodes, and then the entire pipe becomes a reusable building block for other pipes in your project.

### Why multi-node pipes are better than CTEs

**Problem with CTEs:**

```sql
-- Every query needs to redefine the same CTE
WITH filtered_events AS (
    SELECT * FROM user_events
    WHERE timestamp >= today() - 30
)
SELECT ... FROM filtered_events ...
```

**Tinybird's solution:**

Define `filtered_events` once as a node within a pipe, then:

* Reference it across multiple nodes in the same pipe
* Or reference the entire pipe from other pipes across your project

### 1. Create the data source

First, install the Tinybird CLI and start a local development environment:

```bash
curl -L tinybird.co | sh
tb login
tb local start
```

Create a sample data file called `user_events.ndjson` with event data:

```json
{"user_id": "user_1", "event_type": "login", "timestamp": "2024-01-15 08:30:00", "session_id": "session_100"}
{"user_id": "user_1", "event_type": "pageview", "timestamp": "2024-01-15 08:31:00", "session_id": "session_100"}
{"user_id": "user_2", "event_type": "login", "timestamp": "2024-01-15 09:00:00", "session_id": "session_101"}
{"user_id": "user_2", "event_type": "purchase", "timestamp": "2024-01-15 09:15:00", "session_id": "session_101"}
{"user_id": "user_3", "event_type": "login", "timestamp": "2024-01-15 10:00:00", "session_id": "session_102"}
```

Generate and populate the data source:

```bash
tb datasource generate user_events --file user_events.ndjson
tb build
tb datasource append user_events --file user_events.ndjson
```

### 2. Build a multi-node pipe

Create a file called `user_activity_summary.pipe` with multiple nodes instead of CTEs:

```tinybird
TOKEN user_activity_read READ

DESCRIPTION >
    Calculate user activity metrics using multi-node architecture instead of CTEs

NODE filtered_events
SQL >
    %
    SELECT
        user_id,
        event_type,
        timestamp,
        session_id
    FROM user_events
    WHERE 1=1
    {% if defined(event_type) %}
        AND event_type = {{ String(event_type) }}
    {% end %}
    {% if defined(start_date) %}
        AND timestamp >= {{ DateTime(start_date) }}
    {% end %}

NODE user_activity_endpoint
SQL >
    %
    SELECT
        user_id,
        COUNT(*) AS event_count,
        COUNT(DISTINCT session_id) AS session_count,
        min(timestamp) AS first_event,
        max(timestamp) AS last_event
    FROM filtered_events
    GROUP BY user_id
    ORDER BY event_count DESC
    LIMIT {{ Int32(limit, 10) }}

TYPE endpoint
```

This pipe defines two nodes:

1. **`filtered_events`**: Handles filtering logic with parameterization
2. **`user_activity_endpoint`**: Aggregates data from the first node

Each node is an independent transformation step. The second node references the first by name, just like you'd reference a CTE, but with key advantages:

* **Within-pipe reusability**: Multiple nodes in the same pipe can reference `filtered_events`
* **Cross-project reusability**: Other pipes can reference the entire `user_activity_summary` pipe
* **Testing**: You can test each node independently during development
* **Clarity**: Each transformation step is clearly separated
* **Performance**: Tinybird optimizes the entire chain of nodes as a single query

For example, within this same pipe, another node could also use the filtered events:

```tinybird
NODE event_type_breakdown
SQL >
    SELECT
        event_type,
        COUNT(*) as type_count
    FROM filtered_events
    GROUP BY event_type
```

Or a completely different pipe could reference the entire `user_activity_summary` output:

```tinybird
NODE top_users
SQL >
    SELECT *
    FROM user_activity_summary
    WHERE event_count > 10
```

Test it locally:

```bash
tb dev
```

Then query the endpoint at `http://localhost:7181/v0/pipes/user_activity_summary.json?event_type=login&limit=5`.

### 3. Push and test the API

Deploy your data source and pipe to Tinybird Cloud:

```bash
tb --cloud deploy
```

Get the API endpoint URL:

```bash
tb --cloud endpoint url user_activity_summary --language curl
```

This returns a curl command you can run to test your deployed API:

```bash
curl -X GET "https://api.us-east.tinybird.co/v0/pipes/user_activity_summary.json?event_type=login&limit=5&token=p.ey..."
```

The response is a JSON object with your query results:

```json
{
  "meta": [
    {"name": "user_id", "type": "String"},
    {"name": "event_count", "type": "UInt64"},
    {"name": "session_count", "type": "UInt64"},
    {"name": "first_event", "type": "DateTime"},
    {"name": "last_event", "type": "DateTime"}
  ],
  "data": [
    {
      "user_id": "user_1",
      "event_count": 1,
      "session_count": 1,
      "first_event": "2024-01-15 08:30:00",
      "last_event": "2024-01-15 08:30:00"
    }
  ],
  "rows": 1
}
```

You can now call this API from your application code using any HTTP client. The multi-node query runs on managed ClickHouse{% sup %}®{% /sup %} infrastructure without you handling cluster scaling, connection pooling, or performance tuning.

### Benefits of multi-node pipes over CTEs

This multi-node approach provides several advantages over traditional CTEs:

1. **Pipe reusability**: The entire `user_activity_summary` pipe can be referenced by other pipes in your project using `SELECT * FROM user_activity_summary`. This lets you build transformation pipelines once and reuse them across multiple endpoints without duplicating logic.

2. **Independent testing**: During development with `tb dev`, you can test each node separately to verify the filtering logic works before building the aggregation layer.

3. **Better debugging**: When a query fails, Tinybird shows you which specific node caused the error, making it easier to identify and fix issues.

4. **Simplified collaboration**: Team members can understand and modify individual nodes without parsing complex nested CTEs.

5. **Version control friendly**: Each node change shows up clearly in git diffs, unlike CTE changes buried within large query blocks.

## Streaming ingestion and CTE consistency

ClickHouse{% sup %}®{% /sup %} uses an eventually consistent model for data, which means recently ingested data might not immediately appear in query results. This behavior affects CTEs the same way it affects regular queries.

When you ingest data into ClickHouse{% sup %}®{% /sup %}, it's written to immutable parts that merge in the background. With replicated tables or asynchronous inserts, newly written rows may not be visible to every query immediately, and with engines like `ReplacingMergeTree`, superseded rows linger until a background merge deduplicates them. This typically resolves within seconds, but the exact timing depends on your table engine and merge settings.

If you're using CTEs to filter or aggregate streaming data, be aware that your CTE might not include the absolute latest records. For most analytics use cases, this delay is acceptable because the data is still fresh enough for decision-making.

If you need stronger consistency guarantees, you can use the `FINAL` modifier in your queries, though this comes with a performance cost:

```sql
WITH recent_events AS (
    SELECT *
    FROM events FINAL
    WHERE timestamp >= now() - INTERVAL 1 HOUR
)
SELECT COUNT(*) FROM recent_events;
```

The `FINAL` modifier forces ClickHouse{% sup %}®{% /sup %} to merge parts on the fly, which ensures you see the most up-to-date data but makes queries slower, particularly on large tables.
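If `FINAL` proves too slow, a common alternative is to deduplicate at query time inside a CTE with `argMax`. This sketch assumes a hypothetical `ReplacingMergeTree` table keyed by `event_id` with an `updated_at` version column:

```sql
WITH latest_events AS (
    -- Keep only the most recent version of each event,
    -- picking the event_type from the row with the highest updated_at
    SELECT
        event_id,
        argMax(event_type, updated_at) AS event_type
    FROM events
    GROUP BY event_id
)
SELECT event_type, COUNT(*) AS n
FROM latest_events
GROUP BY event_type;
```

This trades the on-the-fly merge of `FINAL` for an explicit aggregation, which often parallelizes better on large tables.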

For real-time dashboards or monitoring systems, test your queries to understand the typical lag between data ingestion and query visibility, then design your CTEs and aggregations accordingly.

## Next steps for faster analytics with Tinybird

Tinybird provides a managed ClickHouse{% sup %}®{% /sup %} platform that handles infrastructure complexity so you can focus on building analytics features. The service includes built-in streaming ingestion, automatic scaling, and API generation that turns your SQL queries into production endpoints.

If you're integrating ClickHouse{% sup %}®{% /sup %} into an application and want to avoid managing clusters, setting up observability, or tuning performance at scale, Tinybird handles those operational concerns. You can define data pipelines as code, test locally, and deploy to production with standard CI/CD workflows.

[Sign up for a free Tinybird plan](https://cloud.tinybird.co/signup) to start building with managed ClickHouse{% sup %}®{% /sup %}. The platform includes a CLI for local development, comprehensive documentation, and example projects that demonstrate common analytics patterns using CTEs and other ClickHouse{% sup %}®{% /sup %} features.

## FAQs about ClickHouse{% sup %}®{% /sup %} CTEs

### How many CTEs can a single query contain?

ClickHouse{% sup %}®{% /sup %} doesn't enforce a hard limit on the number of CTEs you can define in one query. However, each CTE consumes memory during execution, so practical limits depend on your available RAM and the size of each CTE's result set. Most queries use between one and five CTEs.

### Can I reference the same CTE in a JOIN operation?

Yes, you can reference the same CTE multiple times in JOIN operations, including self-joins where you join a CTE to itself. This is one of the main advantages of CTEs over subqueries, as it lets you write the logic once and reference it multiple times without duplication.
