Complex SQL queries often become difficult to read when the same filtering or aggregation logic appears multiple times. Common table expressions (CTEs) in ClickHouse solve this by letting you define temporary named result sets using the WITH clause, which you can then reference throughout your query like a regular table.
This article shows you how to write CTEs in ClickHouse, when to use them instead of subqueries or materialized views, and how to deploy CTE-based queries as production API endpoints.
What is a common table expression in ClickHouse
A common table expression (CTE) in ClickHouse is a temporary named result set that you define within a single query using the WITH clause. The CTE exists only during query execution and lets you break down complex logic into readable, reusable parts.
You write a CTE by giving a subquery a name, then reference that name later in your main query. This is particularly useful when you need to filter data in stages or when the same intermediate calculation appears multiple times in your query. Instead of copying the same subquery logic repeatedly, you define it once as a CTE and reference it by name.
Here's what a basic CTE looks like:
WITH active_users AS (
    SELECT user_id
    FROM users
    WHERE status = 'active'
)
SELECT
    au.user_id,
    COUNT(*) AS event_count
FROM events e
JOIN active_users au ON e.user_id = au.user_id
GROUP BY au.user_id
ORDER BY event_count DESC;
In this query, active_users is the CTE. It filters the users table to only active users, then the main query joins this filtered set with the events table to count how many events each active user has generated.
ClickHouse WITH clause syntax
The WITH clause in ClickHouse follows a straightforward pattern where you define one or more CTEs before your main SELECT statement. Each CTE has a name, followed by AS, followed by a subquery in parentheses.
The basic structure looks like this:
WITH
    cte_name AS (
        SELECT ...
    )
SELECT ...
FROM cte_name;
You can define multiple CTEs by separating them with commas:
WITH
    first_cte AS (
        SELECT ...
    ),
    second_cte AS (
        SELECT ...
    )
SELECT ...
FROM first_cte
JOIN second_cte ON ...;
Each CTE must be defined before you reference it: a later CTE can build on an earlier one, but you can't reference a CTE before its definition.
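For example, here's a minimal sketch (reusing the users and events tables from the example above) where the second CTE builds on the first:
WITH
    active_users AS (
        SELECT user_id
        FROM users
        WHERE status = 'active'
    ),
    -- This CTE can reference active_users because it's defined above
    active_user_events AS (
        SELECT e.user_id, e.event_type
        FROM events e
        JOIN active_users au ON e.user_id = au.user_id
    )
SELECT event_type, COUNT(*) AS event_count
FROM active_user_events
GROUP BY event_type;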
Non-recursive CTE example you can copy-paste
Here's a practical example that calculates user engagement metrics using CTEs. This query finds users who have made at least one purchase, then calculates their average session duration.
WITH
    purchasing_users AS (
        SELECT DISTINCT user_id
        FROM events
        WHERE event_type = 'purchase'
    ),
    session_durations AS (
        SELECT
            user_id,
            session_id,
            max(timestamp) - min(timestamp) AS duration_seconds
        FROM events
        WHERE user_id IN (SELECT user_id FROM purchasing_users)
        GROUP BY user_id, session_id
    )
SELECT
    user_id,
    COUNT(DISTINCT session_id) AS total_sessions,
    AVG(duration_seconds) AS avg_session_duration
FROM session_durations
GROUP BY user_id
ORDER BY avg_session_duration DESC
LIMIT 100;
The first CTE (purchasing_users) filters down to users who have completed at least one purchase. The second CTE (session_durations) calculates how long each session lasted for those purchasing users by finding the difference between the earliest and latest event timestamps in each session. The main query then aggregates these durations to show the average session length per user.
This pattern is common in analytics where you filter data in stages and then perform calculations on the filtered results.
Referencing a CTE multiple times in one query
One advantage of CTEs is that you can reference the same named result set multiple times within a single query. This eliminates repeated subquery logic, which both reduces code duplication and makes queries easier to maintain.
Here's an example that references a CTE in two different places:
WITH high_value_customers AS (
    SELECT
        customer_id,
        SUM(order_amount) AS total_spent
    FROM orders
    WHERE order_date >= '2024-01-01'
    GROUP BY customer_id
    HAVING total_spent > 1000
)
SELECT
    'High Value' AS segment,
    COUNT(*) AS customer_count,
    AVG(total_spent) AS avg_spent
FROM high_value_customers
UNION ALL
SELECT
    'Top 10' AS segment,
    COUNT(*) AS customer_count,
    AVG(total_spent) AS avg_spent
FROM (
    SELECT *
    FROM high_value_customers
    ORDER BY total_spent DESC
    LIMIT 10
);
In this query, the high_value_customers CTE appears twice: once in the first SELECT and again in the subquery within the UNION ALL. Without the CTE, you'd write the same aggregation logic twice, which increases the chance of inconsistencies if you later modify the filtering criteria.
CTEs also work well in self-joins where you compare rows from the same filtered dataset:
WITH daily_metrics AS (
    SELECT
        toDate(timestamp) AS date,
        COUNT(*) AS event_count
    FROM events
    GROUP BY date
)
SELECT
    today.date,
    today.event_count AS today_count,
    yesterday.event_count AS yesterday_count,
    today.event_count - yesterday.event_count AS daily_change
FROM daily_metrics today
LEFT JOIN daily_metrics yesterday
    ON yesterday.date = today.date - INTERVAL 1 DAY
ORDER BY today.date DESC;
Here the same CTE joins to itself to compare daily event counts with the previous day's counts.
Does ClickHouse support recursive CTEs
ClickHouse added limited support for recursive CTEs starting in version 24.4, but the implementation differs from databases like PostgreSQL or SQL Server. Recursive CTEs in ClickHouse work for specific use cases but have constraints that affect how you can use them.
A recursive CTE references itself within its own definition, which allows you to perform iterative calculations or traverse hierarchical data. The syntax uses WITH RECURSIVE instead of just WITH:
WITH RECURSIVE counter AS (
    SELECT 1 AS n
    UNION ALL
    SELECT n + 1
    FROM counter
    WHERE n < 10
)
SELECT * FROM counter;
This query generates numbers from 1 to 10 by starting with 1 and recursively adding 1 until the condition n < 10 is no longer true.
However, ClickHouse's recursive CTE implementation is more restrictive than in other databases. The recursion depth is limited, and certain operations that work in other systems might not be supported. For complex hierarchical queries or deep recursion, you might need alternative approaches like array functions or multiple query steps.
If you're working with hierarchical data like organizational charts or category trees, test your recursive queries carefully. Consider whether a materialized view or a different data modeling approach might be more reliable for your use case.
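For simple sequences like the counter above, you can often avoid recursion entirely. As a minimal sketch, the numbers table function produces the same result without a recursive CTE:
-- Generates 1 through 10 without recursion
SELECT number + 1 AS n
FROM numbers(10);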
CTE performance and memory considerations
CTEs in ClickHouse are not materialized by default. The query planner inlines each CTE's definition at every place you reference it, so a CTE referenced twice is typically computed twice rather than computed once and stored. This behavior has both advantages and tradeoffs depending on your data volume and query structure.
When a CTE's subquery runs, its intermediate results are held in RAM like those of any other subquery, and that memory counts toward the max_memory_usage setting, which controls how much memory a single query can use. Exceeding the limit causes the query to fail, and because each reference re-executes the subquery, an expensive CTE referenced several times multiplies both CPU and memory cost.
For CTEs that produce small to medium-sized result sets (thousands to millions of rows), this overhead typically isn't a concern, and the readability benefit usually outweighs the cost. However, if a CTE scans billions of rows or produces wide intermediate results with many columns, you might hit memory limits.
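If you want to bound a specific query rather than rely on the server-wide default, you can attach the setting to the query itself. The limit below is purely illustrative:
-- Cap this query at roughly 4 GB of memory (illustrative value)
SELECT COUNT(*)
FROM events
SETTINGS max_memory_usage = 4000000000;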
One approach is to filter or aggregate data as early as possible in your CTE definitions:
-- Less efficient: large intermediate result
WITH all_events AS (
    SELECT * FROM events
)
SELECT COUNT(*)
FROM all_events
WHERE event_type = 'purchase';

-- More efficient: filter early
WITH purchase_events AS (
    SELECT * FROM events
    WHERE event_type = 'purchase'
)
SELECT COUNT(*) FROM purchase_events;
The second version filters down to only purchase events in the CTE, which reduces the amount of data held in memory.
When to use a CTE versus a subquery or materialized view
Choosing between CTEs, subqueries, and materialized views depends on your specific use case, query complexity, and performance requirements. Each approach has different characteristics that make it more suitable for certain scenarios.
CTEs work well when you:
- Reference the same intermediate result multiple times in a single query
- Want to improve readability by breaking complex logic into named steps
- Organize multi-stage filtering or aggregation in a clear sequence
Subqueries are better when you:
- Only need the intermediate result once
- Have a simple filtering condition that doesn't benefit from being named
- Want to avoid the overhead of materializing a result set in memory
Materialized views are the right choice when you:
- Run the same aggregation or transformation repeatedly across many queries
- Can accept slightly stale data that updates periodically rather than in real-time
- Want to pre-compute expensive operations to improve query response times
Here's the same logic implemented as a CTE and as a subquery, with a materialized view sketch following below:
-- As a CTE
WITH recent_orders AS (
SELECT customer_id, SUM(amount) AS total
FROM orders
WHERE order_date >= today() - INTERVAL 30 DAY
GROUP BY customer_id
)
SELECT * FROM recent_orders WHERE total > 500;
-- As a subquery
SELECT * FROM (
SELECT customer_id, SUM(amount) AS total
FROM orders
WHERE order_date >= today() - INTERVAL 30 DAY
GROUP BY customer_id
) WHERE total > 500;
The CTE and subquery versions produce identical results and have similar performance characteristics. The CTE version is more readable if you reference recent_orders multiple times or if your query has several stages of logic.
A materialized view would store the aggregated results persistently, which makes subsequent queries much faster but requires additional storage and adds complexity to your data pipeline.
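As a sketch of that third approach, the same aggregation could be maintained incrementally with a materialized view. The name daily_order_totals is hypothetical, and note that the 30-day filter moves to query time, because a materialized view transforms rows as they're inserted rather than re-evaluating a rolling window:
-- Incrementally maintained daily totals per customer
CREATE MATERIALIZED VIEW daily_order_totals
ENGINE = SummingMergeTree
ORDER BY (customer_id, order_date)
AS SELECT
    customer_id,
    order_date,
    sum(amount) AS total
FROM orders
GROUP BY customer_id, order_date;

-- The rolling 30-day window is applied when you query the view
SELECT customer_id, sum(total) AS total
FROM daily_order_totals
WHERE order_date >= today() - INTERVAL 30 DAY
GROUP BY customer_id
HAVING total > 500;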
Avoiding CTEs with Tinybird's multi-node pipes
Tinybird is a managed ClickHouse service that provides a better alternative to CTEs: multi-node pipes. Instead of nesting logic within a single query using WITH clauses, you break your query into separate, referenceable nodes that make your logic more modular and reusable.
CTEs in ClickHouse have limitations: each CTE only exists within its parent query, making them non-reusable across different queries. Tinybird's multi-node approach solves this differently:
- Individual nodes can be referenced anywhere within the same pipe, similar to how you'd reference a CTE, but cleaner and more testable
- Complete pipes can be referenced across your entire project using SELECT * FROM pipe_name, allowing you to chain entire transformation pipelines together and reuse them project-wide
This means a pipe can contain multiple transformation steps as nodes, and then the entire pipe becomes a reusable building block for other pipes in your project.
Why multi-node pipes are better than CTEs
Problem with CTEs:
-- Every query needs to redefine the same CTE
WITH filtered_events AS (
    SELECT * FROM user_events
    WHERE timestamp >= today() - 30
)
SELECT ... FROM filtered_events ...
Tinybird's solution:
Define filtered_events once as a node within a pipe, then:
- Reference it across multiple nodes in the same pipe
- Or reference the entire pipe from other pipes across your project
1. Create the data source
First, install the Tinybird CLI and start a local development environment:
curl -L tinybird.co | sh
tb login
tb local start
Create a sample data file called user_events.ndjson with event data:
{"user_id": "user_1", "event_type": "login", "timestamp": "2024-01-15 08:30:00", "session_id": "session_100"}
{"user_id": "user_1", "event_type": "pageview", "timestamp": "2024-01-15 08:31:00", "session_id": "session_100"}
{"user_id": "user_2", "event_type": "login", "timestamp": "2024-01-15 09:00:00", "session_id": "session_101"}
{"user_id": "user_2", "event_type": "purchase", "timestamp": "2024-01-15 09:15:00", "session_id": "session_101"}
{"user_id": "user_3", "event_type": "login", "timestamp": "2024-01-15 10:00:00", "session_id": "session_102"}
Generate and populate the data source:
tb datasource generate user_events --file user_events.ndjson
tb build
tb datasource append user_events --file user_events.ndjson
2. Build a multi-node pipe
Create a file called user_activity_summary.pipe with multiple nodes instead of CTEs:
TOKEN user_activity_read READ

DESCRIPTION >
    Calculate user activity metrics using multi-node architecture instead of CTEs

NODE filtered_events
SQL >
    %
    SELECT
        user_id,
        event_type,
        timestamp,
        session_id
    FROM user_events
    WHERE 1=1
    {% if defined(event_type) %}
        AND event_type = {{ String(event_type) }}
    {% end %}
    {% if defined(start_date) %}
        AND timestamp >= {{ DateTime(start_date) }}
    {% end %}

NODE user_activity_endpoint
SQL >
    %
    SELECT
        user_id,
        COUNT(*) AS event_count,
        COUNT(DISTINCT session_id) AS session_count,
        min(timestamp) AS first_event,
        max(timestamp) AS last_event
    FROM filtered_events
    GROUP BY user_id
    ORDER BY event_count DESC
    LIMIT {{ Int32(limit, 10) }}

TYPE endpoint
This pipe defines two nodes:
- filtered_events: Handles filtering logic with parameterization
- user_activity_endpoint: Aggregates data from the first node
Each node is an independent transformation step. The second node references the first by name, just like you'd reference a CTE, but with key advantages:
- Within-pipe reusability: Multiple nodes in the same pipe can reference filtered_events
- Cross-project reusability: Other pipes can reference the entire user_activity_summary pipe
- Testing: You can test each node independently during development
- Clarity: Each transformation step is clearly separated
- Performance: Tinybird optimizes the entire chain of nodes as a single query
For example, within this same pipe, another node could also use the filtered events:
NODE event_type_breakdown
SQL >
    SELECT
        event_type,
        COUNT(*) AS type_count
    FROM filtered_events
    GROUP BY event_type
Or a completely different pipe could reference the entire user_activity_summary output:
NODE top_users
SQL >
    SELECT *
    FROM user_activity_summary
    WHERE event_count > 10
Test it locally:
tb dev
Then query the endpoint at http://localhost:7181/pipes/user_activity_summary.json?event_type=login&limit=5.
3. Push and test the API
Deploy your data source and pipe to Tinybird Cloud:
tb --cloud deploy
Get the API endpoint URL:
tb --cloud endpoint url user_activity_summary --language curl
This returns a curl command you can run to test your deployed API:
curl -X GET "https://api.us-east.tinybird.co/v0/pipes/user_activity_summary.json?event_type=login&limit=5&token=p.ey..."
The response is a JSON object with your query results:
{
    "meta": [
        {"name": "user_id", "type": "String"},
        {"name": "event_count", "type": "UInt64"},
        {"name": "session_count", "type": "UInt64"},
        {"name": "first_event", "type": "DateTime"},
        {"name": "last_event", "type": "DateTime"}
    ],
    "data": [
        {
            "user_id": "user_1",
            "event_count": 1,
            "session_count": 1,
            "first_event": "2024-01-15 08:30:00",
            "last_event": "2024-01-15 08:30:00"
        }
    ],
    "rows": 1
}
You can now call this API from your application code using any HTTP client. The multi-node query runs on managed ClickHouse infrastructure without you handling cluster scaling, connection pooling, or performance tuning.
Benefits of multi-node pipes over CTEs
This multi-node approach provides several advantages over traditional CTEs:
- Pipe reusability: The entire user_activity_summary pipe can be referenced by other pipes in your project using SELECT * FROM user_activity_summary. This lets you build transformation pipelines once and reuse them across multiple endpoints without duplicating logic.
- Independent testing: During development with tb dev, you can test each node separately to verify the filtering logic works before building the aggregation layer.
- Better debugging: When a query fails, Tinybird shows you which specific node caused the error, making it easier to identify and fix issues.
- Simplified collaboration: Team members can understand and modify individual nodes without parsing complex nested CTEs.
- Version control friendly: Each node change shows up clearly in git diffs, unlike CTE changes buried within large query blocks.
Streaming ingestion and CTE consistency
ClickHouse uses an eventually consistent model for data, which means recently ingested data might not immediately appear in query results. This behavior affects CTEs the same way it affects regular queries.
When you ingest data into ClickHouse, it's written to parts that are merged in the background. Queries read from the parts that are currently available, which might not include the most recently written data until the merge completes. This typically happens within seconds, but the exact timing depends on your table engine and merge settings.
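You can see this parts-based storage directly in the system tables. Here's a quick diagnostic sketch, assuming the events table from earlier:
-- Count the active data parts backing the events table
SELECT count() AS active_parts
FROM system.parts
WHERE table = 'events' AND active;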
If you're using CTEs to filter or aggregate streaming data, be aware that your CTE might not include the absolute latest records. For most analytics use cases, this delay is acceptable because the data is still fresh enough for decision-making.
If you need stronger consistency guarantees, you can use the FINAL modifier in your queries, though this comes with a performance cost:
WITH recent_events AS (
    SELECT *
    FROM events FINAL
    WHERE timestamp >= now() - INTERVAL 1 HOUR
)
SELECT COUNT(*) FROM recent_events;
The FINAL modifier forces ClickHouse to merge parts on the fly, which ensures you see the most up-to-date data but makes queries slower, particularly on large tables.
For real-time dashboards or monitoring systems, test your queries to understand the typical lag between data ingestion and query visibility, then design your CTEs and aggregations accordingly.
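One simple way to estimate that lag, assuming the events table's timestamp column reflects when each row was produced, is to compare the newest visible row against the current time:
-- Rough estimate of ingestion-to-visibility lag, in seconds
SELECT now() - max(timestamp) AS seconds_behind
FROM events;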
Next steps for faster analytics with Tinybird
Tinybird provides a managed ClickHouse platform that handles infrastructure complexity so you can focus on building analytics features. The service includes built-in streaming ingestion, automatic scaling, and API generation that turns your SQL queries into production endpoints.
If you're integrating ClickHouse into an application and want to avoid managing clusters, setting up observability, or tuning performance at scale, Tinybird handles those operational concerns. You can define data pipelines as code, test locally, and deploy to production with standard CI/CD workflows.
Sign up for a free Tinybird plan to start building with managed ClickHouse. The platform includes a CLI for local development, comprehensive documentation, and example projects that demonstrate common analytics patterns using CTEs and other ClickHouse features.
FAQs about ClickHouse CTEs
How many CTEs can a single query contain?
ClickHouse doesn't enforce a hard limit on the number of CTEs you can define in one query. However, each CTE consumes memory during execution, so practical limits depend on your available RAM and the size of each CTE's result set. Most queries use between one and five CTEs.
Can I reference the same CTE in a JOIN operation?
Yes, you can reference the same CTE multiple times in JOIN operations, including self-joins where you join a CTE to itself. This is one of the main advantages of CTEs over subqueries, as it lets you write the logic once and reference it multiple times without duplication.