When you query two tables in Clickhouse, you often want all records from one table regardless of whether matching records exist in the other. LEFT JOIN
solves this by preserving every row from the left table and filling in data from the right table only where matches are found.
This guide walks through LEFT JOIN
syntax, explains how Clickhouse handles matching and non-matching rows, and shows you how to avoid common mistakes that lead to unexpected results or poor performance.
What a left join does in Clickhouse
A LEFT JOIN
in Clickhouse returns all records from the left table and the matched records from the right table. When no match exists in the right table for a row in the left table, the result set contains NULL
values for all columns from the right table.
Think of it this way: the left table stays complete, and the right table adds optional information where it can. If you're joining a users table with an orders table, every user appears in the results, even users who never placed an order.
Left join syntax and parameters
The basic structure follows this pattern: SELECT ... FROM table1 LEFT JOIN table2 ON condition
. The ON
clause tells Clickhouse how to match rows between tables.
SELECT
users.name,
orders.product
FROM users
LEFT JOIN orders ON users.user_id = orders.user_id
This query fetches all users and their products, if they have any.
ON clause and multi-key joins
Single-column joins work when one field connects the tables. For more complex relationships, combine multiple conditions with AND
:
SELECT
orders.order_id,
inventory.quantity
FROM orders
LEFT JOIN inventory ON orders.product_id = inventory.product_id
AND orders.warehouse = inventory.warehouse
Both conditions work together to find the right match. A product in the wrong warehouse won't match.
ANY modifier to drop duplicates
When the right table has multiple matching rows for a single left table row, LEFT ANY JOIN
returns only the first match:
SELECT
customers.name,
orders.order_date
FROM customers
LEFT ANY JOIN orders ON customers.customer_id = orders.customer_id
Clickhouse picks the first matching row it finds. If you want a specific row, filter or sort the right table first using a subquery.
ALL modifier to keep duplicates
The LEFT ALL JOIN
modifier keeps all matching combinations. This is the default behavior in Clickhouse, so you can omit ALL
and get the same result:
SELECT
customers.name,
purchases.amount
FROM customers
LEFT ALL JOIN purchases ON customers.customer_id = purchases.customer_id
If one customer has three purchases, that customer appears three times in the results.
Step-by-step left join example with sample tables
Let's walk through a complete example using two tables: one for users and another for orders. This shows how LEFT JOIN
handles both matching and non-matching rows.
1. Create source data sources
First, create two tables with appropriate data types:
CREATE TABLE users (
user_id UInt32,
name String,
email String,
signup_date Date
) ENGINE = MergeTree()
ORDER BY user_id;
CREATE TABLE orders (
order_id UInt32,
user_id UInt32,
product String,
amount Decimal(10, 2),
order_date Date
) ENGINE = MergeTree()
ORDER BY order_id;
The user_id
column appears in both tables and serves as the join key.
2. Insert sample rows
Add data that includes users with orders and users without orders:
INSERT INTO users VALUES
(1, 'Alice Johnson', 'alice@example.com', '2024-01-15'),
(2, 'Bob Smith', 'bob@example.com', '2024-02-20'),
(3, 'Carol White', 'carol@example.com', '2024-03-10'),
(4, 'David Brown', 'david@example.com', '2024-04-05');
INSERT INTO orders VALUES
(101, 1, 'Laptop', 999.99, '2024-05-01'),
(102, 1, 'Mouse', 29.99, '2024-05-03'),
(103, 2, 'Keyboard', 79.99, '2024-05-07'),
(104, 3, 'Monitor', 299.99, '2024-05-12');
Notice that user_id 4 (David Brown) has no orders.
3. Run the left join query
Execute a LEFT JOIN
to get all users with their order information:
SELECT
u.name,
u.email,
o.product,
o.amount
FROM users AS u
LEFT JOIN orders AS o ON u.user_id = o.user_id
ORDER BY u.user_id;
This query returns every user, whether they placed orders or not.
4. Verify expected output
The result shows all four users:
┌─name──────────┬─email─────────────┬─product──┬──amount─┐
│ Alice Johnson │ alice@example.com │ Laptop │ 999.99 │
│ Alice Johnson │ alice@example.com │ Mouse │ 29.99 │
│ Bob Smith │ bob@example.com │ Keyboard │ 79.99 │
│ Carol White │ carol@example.com │ Monitor │ 299.99 │
│ David Brown │ david@example.com │ NULL │ NULL │
└───────────────┴───────────────────┴──────────┴─────────┘
Alice appears twice because she has two orders. David appears once with NULL
values because he has no orders.
Handling NULLs and missing keys
When a LEFT JOIN
finds no matching row in the right table, Clickhouse fills all right-table columns with NULL
values. You can replace NULL
values with defaults using the COALESCE
function:
SELECT
u.name,
COALESCE(o.product, 'No orders') AS product,
COALESCE(o.amount, 0) AS amount
FROM users AS u
LEFT JOIN orders AS o ON u.user_id = o.user_id;
The COALESCE
function returns the first non-NULL
argument. David Brown now shows "No orders" instead of NULL
.
Choosing ANY vs ALL for left join
Pick LEFT ANY JOIN
when you want exactly one row per left-table record, which Clickhouse can optimize through direct joins with O(1) lookup when using dictionaries. Pick LEFT ALL JOIN
(or just LEFT JOIN
) when you want every matching combination.
- LEFT ANY JOIN: Returns one row per left-table record, even when multiple right-table matches exist
- LEFT ALL JOIN: Returns every matching combination, which can multiply rows
For user dashboards where each user appears once, use LEFT ANY JOIN
. For transaction reports where you analyze every individual order, use LEFT ALL JOIN
.
Which join algorithm Clickhouse uses for left join
Clickhouse automatically picks a join algorithm based on table sizes and available memory. The database chooses between hash joins, merge joins, and direct joins without manual configuration in most cases.
Hash join
Hash joins build an in-memory hash table from the right-side table. Clickhouse then probes this hash table for each left-table row to find matches. This algorithm works well when the right table fits comfortably in memory and is the default choice for most LEFT JOIN
queries.
Partial merge join
When both tables are already sorted by the join key, Clickhouse can use a merge join algorithm. This approach reads both tables in order and matches rows without building a hash table, which reduces memory usage by over 2× compared to hash joins for very large datasets.
Join algorithm hints
You can force Clickhouse to use a specific algorithm by adding a SETTINGS
clause:
SELECT
u.name,
o.product
FROM users AS u
LEFT JOIN orders AS o ON u.user_id = o.user_id
SETTINGS join_algorithm = 'hash';
Valid options include hash
, partial_merge
, direct
, and auto
. The auto
setting lets Clickhouse choose, which typically produces the best performance.
When to use full join instead of left join
A FULL JOIN
returns all rows from both tables, with NULL
s filling in where matches don't exist on either side. LEFT JOIN
only preserves all rows from the left table.
Join Type | Left Table Rows | Right Table Rows | Unmatched Behavior |
---|---|---|---|
LEFT JOIN | All preserved | Only matched | NULL s for right columns |
FULL JOIN | All preserved | All preserved | NULL s for either side |
Use LEFT JOIN
when you want to keep all records from one primary table and optionally enrich them with data from a secondary table. Use FULL JOIN
when both tables contain important records that might not have matches, like reconciling two independent data sources.
Common mistakes and how to debug them
Several common errors can cause unexpected results or query failures when writing LEFT JOIN
queries.
Data type mismatch
Join columns need compatible data types. Joining a UInt32
column to a String
column produces a type mismatch error. Convert one column to match the other using CAST
or type conversion functions before joining.
Wrong join order
The order of tables in a LEFT JOIN
affects both results and performance. The left table (listed first) determines which rows appear in the output. Placing a large table on the right side can improve performance because Clickhouse builds internal structures from the right table, though your query logic determines which table goes where.
Missing default values
Forgetting to handle NULL
values in calculations leads to NULL
results:
-- This returns NULL for users without orders
SELECT
u.name,
o.amount * 1.1 AS amount_with_tax
FROM users AS u
LEFT JOIN orders AS o ON u.user_id = o.user_id;
Wrap calculations in COALESCE
or filter out NULL
s with a WHERE
clause.
From query to API in Tinybird
Tinybird converts Clickhouse queries into hosted REST API endpoints. Here's a complete pipe file that creates an API endpoint for a LEFT JOIN
query:
TOKEN user_orders_read READ
DESCRIPTION >
Get all users with their order history
NODE user_orders_endpoint
SQL >
%
SELECT
u.user_id,
u.name,
u.email,
o.order_id,
o.product,
o.amount
FROM users AS u
LEFT JOIN orders AS o ON u.user_id = o.user_id
{/% if defined(user_id) %}
WHERE u.user_id = {{UInt32(user_id)}}
{/% end %}
ORDER BY u.user_id
LIMIT {{Int32(limit, 100)}}
TYPE endpoint
Deploy this pipe with tb --cloud deploy
, and Tinybird generates a URL you can call from any HTTP client. The API accepts user_id
and limit
parameters and returns JSON responses.
Want to try this yourself? Sign up for a free Tinybird account to test LEFT JOIN
queries without infrastructure setup.
Build fast, skip the ops
Tinybird provides a managed Clickhouse service that handles cluster setup, scaling, and maintenance. Developers can focus on writing SQL and building features rather than managing infrastructure.
The platform includes built-in data ingestion, automatic query optimization, and API generation from SQL queries. LEFT JOIN
queries that might take weeks to productionize on self-hosted Clickhouse can be deployed as APIs in minutes with Tinybird.
FAQs about Clickhouse left join
What is the memory impact of a large Clickhouse LEFT JOIN?
Memory usage depends on the size of the right table and the join algorithm Clickhouse selects. Hash joins load the entire right table into memory, so a 10GB right table requires at least 10GB of available RAM, though grace hash joins use 50% less memory by spilling to disk when needed. Clickhouse automatically switches to merge joins or other algorithms when memory constraints exist, though this can slow query execution.
Can I LEFT JOIN a streaming data source with a static table in Clickhouse?
Yes, you can join streaming tables with static reference tables. The static table data represents a snapshot at the moment the query executes. This pattern works well for enriching real-time events with slowly changing dimension data like user profiles or product catalogs.
How does Clickhouse handle duplicate join keys in the right table?
By default, Clickhouse uses LEFT ALL JOIN
behavior, which returns every matching combination. If the right table has five rows matching one left-table row, the result includes all five combinations. Use LEFT ANY JOIN
to limit results to one match per left-table row, though Clickhouse doesn't guarantee which match it selects without explicit ordering./