---
title: "ClickHouse® query examples: query the fastest OLAP database"
excerpt: "ClickHouse query examples that solve real problems. Copy, paste, and adapt these patterns for your own analytical workloads."
authors: "Cameron Archer"
categories: "AI Resources"
createdOn: "2025-10-14 19:45:43"
publishedOn: "2025-10-17 19:45:43"
updatedOn: "2025-10-17 19:45:43"
status: "published"
---

ClickHouse{% sup %}®{% /sup %} queries use SQL syntax extended with specialized functions and clauses built for columnar storage and analytical workloads at scale. The database supports standard `SELECT`, `JOIN`, and `GROUP BY` operations while adding features like `LIMIT BY` for top-N-per-group queries, `SAMPLE` for fast prototyping on large datasets, and approximate aggregation functions that trade precision for speed.

This guide covers the query patterns you'll use most often when working with ClickHouse{% sup %}®{% /sup %}, from basic filtering and aggregation to advanced optimization techniques and troubleshooting slow queries.

## Writing your first ClickHouse{% sup %}®{% /sup %} select

A basic `SELECT` statement retrieves data from tables, and you can control the output format, limit rows, and choose columns using standard SQL clauses.

To start querying ClickHouse{% sup %}®{% /sup %}, you first need a table with data. Here's how to create a simple events table and run your first queries.

### Create table sample_events

The `CREATE TABLE` statement defines the schema for storing data. In ClickHouse{% sup %}®{% /sup %}, you specify column names, data types, and a table engine that determines how data is stored and queried.

```sql
CREATE TABLE sample_events (
    timestamp DateTime,
    user_id UInt64,
    event_name String,
    event_value Float64
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id);
```

The `MergeTree` engine is the most common choice for analytical queries. The `ORDER BY` clause defines the primary key, which determines how data is sorted on disk and affects query performance.

### Basic select star

The `SELECT *` syntax retrieves all columns from a table. This works well for exploring data but can be inefficient on large tables with many columns.

```sql
SELECT * FROM sample_events LIMIT 10;
```

The `LIMIT` clause restricts the number of rows returned. Always use `LIMIT` when exploring data to avoid accidentally pulling millions of rows.

ClickHouse{% sup %}®{% /sup %} returns results in a default tabular format, but you can specify different output formats using the `FORMAT` clause. For example, `FORMAT JSON` returns results as JSON objects, while `FORMAT TabSeparated` outputs tab-delimited text.
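For example, to get the same exploratory query back as JSON instead of the default table:

```sql
-- Return results as a JSON object with column metadata
SELECT * FROM sample_events LIMIT 10 FORMAT JSON;
```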

### Selecting specific columns

Selecting only the columns you need improves query performance, especially on wide tables. ClickHouse{% sup %}®{% /sup %}'s columnar storage means it only reads the columns referenced in your query, and [high compression ratios](https://markaicode.com/time-series-databases-2025-comparison/) further reduce the I/O required for each read.

```sql
SELECT
    toDate(timestamp) AS event_date,
    user_id,
    event_name
FROM sample_events
LIMIT 100;
```

Column aliases using `AS` make results more readable. Functions like `toDate()` transform data types during the query, converting a `DateTime` to a `Date` by truncating the time portion.

## Core query patterns for filtering and ordering

Filtering and sorting data efficiently matters more in ClickHouse{% sup %}®{% /sup %} than in row-based databases. The order in which you filter data and the functions you use can [significantly affect query speed](https://www.tinybird.co/blog-posts/5-rules-for-writing-faster-sql-queries).

### Where clause basics

The `WHERE` clause filters rows based on conditions. ClickHouse{% sup %}®{% /sup %} offers specialized functions that work better with its columnar storage than generic SQL functions.

```sql
SELECT user_id, event_name, timestamp
FROM sample_events
WHERE event_name = 'page_view'
  AND timestamp >= '2024-01-01'
  AND user_id > 1000;
```

String matching functions like `startsWith()`, `endsWith()`, and `match()` often perform better than SQL `LIKE` patterns. Date filtering using comparison operators works with both `Date` and `DateTime` types.

For date-based filtering, ClickHouse{% sup %}®{% /sup %} provides functions that convert dates to integers for faster comparisons:

```sql
SELECT user_id, COUNT(*) AS event_count
FROM sample_events
WHERE startsWith(event_name, 'purchase')
  AND toYYYYMM(timestamp) = 202401
GROUP BY user_id;
```

The `toYYYYMM()` function converts dates to year-month integers, which speeds up filtering when you're grouping by month.

### Order by vs limit by

ClickHouse{% sup %}®{% /sup %} includes a `LIMIT BY` clause that doesn't exist in standard SQL. While `ORDER BY` sorts all results and `LIMIT` restricts total rows, `LIMIT BY` returns the top N rows per group.

```sql
SELECT user_id, event_name, timestamp
FROM sample_events
ORDER BY timestamp DESC
LIMIT 3 BY user_id;
```

This query returns up to 3 events for each distinct `user_id`, ordered by timestamp. You can think of `LIMIT BY` as a more efficient alternative to window functions for top-N-per-group queries.

The difference matters for performance: `LIMIT BY` can process data as it reads, while window functions typically require more memory and processing.
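For comparison, here's a sketch of the window-function equivalent of the `LIMIT 3 BY` query above. It produces the same top-3-per-user result, but it has to assign a row number to every row in a subquery before filtering:

```sql
-- Window-function equivalent of LIMIT 3 BY user_id (typically more memory-intensive)
SELECT user_id, event_name, timestamp
FROM (
    SELECT
        user_id,
        event_name,
        timestamp,
        row_number() OVER (PARTITION BY user_id ORDER BY timestamp DESC) AS rn
    FROM sample_events
)
WHERE rn <= 3;
```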

### Using sample for fast prototyping

The `SAMPLE` clause lets you query a fraction of your data, which speeds up development and testing on large tables. ClickHouse{% sup %}®{% /sup %} uses deterministic sampling, so the same sample returns consistent results.

```sql
SELECT event_name, COUNT(*) AS count
FROM sample_events
SAMPLE 0.1
GROUP BY event_name;
```

This query processes approximately 10% of the table. The sampling is deterministic based on the primary key, so repeated queries return the same rows. You can also specify an exact number of rows: `SAMPLE 10000` returns roughly 10,000 rows.

## Aggregations and group by at scale

ClickHouse{% sup %}®{% /sup %} aggregation functions are optimized for processing billions of rows quickly. Many functions offer approximate versions that trade precision for speed, which is often acceptable for analytics use cases.

### Count distinct tricks

[Counting unique values](https://www.tinybird.co/blog-posts/the-simplest-way-to-count-100-billion-unique-ids-part-1) is expensive in any database. ClickHouse{% sup %}®{% /sup %} provides several functions with different speed and accuracy tradeoffs:

- `uniq()`: Fast approximate count using HyperLogLog, typically within 2% accuracy
- `uniqExact()`: Precise count but slower and more memory-intensive
- `uniqCombined()`: Balanced approach using multiple algorithms for better accuracy than `uniq()`

```sql
SELECT
    uniq(user_id) AS approx_users,
    uniqExact(user_id) AS exact_users,
    uniqCombined(user_id) AS combined_users
FROM sample_events;
```

For most analytics, `uniq()` provides enough accuracy while being 5-10x faster than `uniqExact()`. Use `uniqExact()` only when you need guaranteed precision, such as for billing or compliance reporting.

### Approximate functions

Beyond unique counts, ClickHouse{% sup %}®{% /sup %} offers approximate functions for percentiles, top values, and other statistical calculations.

```sql
SELECT
    event_name,
    quantile(0.95)(event_value) AS p95_value,
    topK(5)(user_id) AS top_5_users
FROM sample_events
GROUP BY event_name;
```

The `quantile()` function calculates percentiles using sampling, which is much faster than sorting all values. The `topK()` function returns the K most frequent values using a space-efficient algorithm. These functions work well for dashboards and monitoring, where approximate answers delivered quickly are more valuable than exact answers delivered slowly; on well-designed schemas they can return results in milliseconds even on billion-row tables.

### With totals for rollups

The `WITH TOTALS` modifier adds a summary row to your results, which is useful for creating subtotals and grand totals in a single query.

```sql
SELECT
    event_name,
    COUNT(*) AS event_count,
    AVG(event_value) AS avg_value
FROM sample_events
GROUP BY event_name
WITH TOTALS;
```

The final row in the result contains aggregates across all groups. This eliminates the need for a separate query to calculate overall totals.

## Joining data the ClickHouse{% sup %}®{% /sup %} way

Joins in ClickHouse{% sup %}®{% /sup %} work differently than in traditional databases due to its distributed and columnar architecture. Understanding these differences helps you [write faster queries](https://www.tinybird.co/blog-posts/tips-6-faster-joins-without-joining-using-where).

### Inner vs left join

ClickHouse{% sup %}®{% /sup %} supports standard SQL join syntax, but join order matters significantly for performance. The table on the right side of the join is loaded into memory, so [put the smaller table there](https://www.tinybird.co/blog-posts/clickhouse-joins-improvements).

```sql
SELECT
    e.user_id,
    e.event_name,
    u.user_name
FROM sample_events e
INNER JOIN users u ON e.user_id = u.user_id
WHERE e.timestamp >= '2024-01-01';
```

Always filter the larger table before joining when possible. Moving the `WHERE` clause to filter `sample_events` before the join reduces the number of rows that need to be matched. [LEFT JOIN](https://www.tinybird.co/blog-posts/clickhouse-left-join-example) works similarly but includes all rows from the left table even when there's no match.
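As a sketch, the same query as a `LEFT JOIN` keeps events even for user IDs missing from `users`. Note that by default ClickHouse{% sup %}®{% /sup %} fills unmatched columns with type defaults (an empty string for `String`) rather than `NULL`, unless the `join_use_nulls` setting is enabled:

```sql
-- Keeps all matching events; unmatched u.user_name is '' unless join_use_nulls = 1
SELECT
    e.user_id,
    e.event_name,
    u.user_name
FROM sample_events e
LEFT JOIN users u ON e.user_id = u.user_id
WHERE e.timestamp >= '2024-01-01';
```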

### Array join for nested data

The `ARRAY JOIN` clause flattens array columns into multiple rows, which is common when working with nested or semi-structured data.

```sql
SELECT
    user_id,
    tag
FROM user_profiles
ARRAY JOIN tags AS tag
WHERE tag = 'premium';
```

This query converts an array of tags for each user into separate rows, one per tag. You can then filter, group, or join on the flattened values. `ARRAY JOIN` is particularly useful for JSON data that contains arrays.

### Global join on distributed tables

When querying distributed tables across multiple servers, regular joins only see data on each individual server. The `GLOBAL` keyword broadcasts the right table to all servers.

```sql
SELECT
    e.user_id,
    e.event_count,
    u.signup_date
FROM distributed_events e
GLOBAL INNER JOIN users u ON e.user_id = u.user_id;
```

`GLOBAL JOIN` has performance costs because it transfers the entire right table to every server. Use it only when necessary, and always put the smaller table on the right side.

## Sampling limit by and other ClickHouse{% sup %}®{% /sup %} specific clauses

ClickHouse{% sup %}®{% /sup %} extends standard SQL with clauses designed specifically for analytical queries. These extensions often provide simpler syntax and better performance than standard SQL alternatives.

### Limit by pattern

The `LIMIT BY` clause returns the top N rows for each group without requiring window functions or subqueries.

```sql
SELECT
    user_id,
    event_name,
    timestamp,
    event_value
FROM sample_events
ORDER BY event_value DESC
LIMIT 5 BY user_id;
```

This returns the 5 highest-value events for each user. The query processes data efficiently because it doesn't need to rank all rows or create temporary tables. You can combine `LIMIT BY` with `OFFSET` to skip rows: `LIMIT 5 OFFSET 2 BY user_id` returns rows 3-7 for each user.

### Sample clause syntax

Beyond the percentage-based sampling shown earlier, `SAMPLE` accepts different formats for controlling how much data to read.

```sql
-- Sample 10% of data
SELECT COUNT(*) FROM sample_events SAMPLE 0.1;

-- Sample approximately 1 million rows
SELECT COUNT(*) FROM sample_events SAMPLE 1000000;

-- Sample with offset for A/B testing
SELECT COUNT(*) FROM sample_events SAMPLE 1/2 OFFSET 1/2;
```

The offset syntax divides data into segments, which is useful for splitting datasets for parallel processing or creating consistent test/control groups.

### Final keyword caveats

The `FINAL` modifier forces ClickHouse{% sup %}®{% /sup %} to merge all parts of a `ReplacingMergeTree` or `CollapsingMergeTree` table, returning only the final version of each row.

```sql
SELECT user_id, user_name, last_login
FROM users FINAL
WHERE user_id = 12345;
```

While `FINAL` simplifies queries by handling deduplication automatically, it significantly slows down query execution. The database must merge data at query time instead of using pre-merged parts. Alternatives include using materialized views to pre-aggregate data or designing schemas that avoid the need for `FINAL`.

## Optimizing query speed with settings and final

Query performance in ClickHouse{% sup %}®{% /sup %} depends on both how you write SQL and the settings you use. A few key settings control memory usage, parallelism, and query behavior.

The `max_threads` setting controls how many CPU cores ClickHouse{% sup %}®{% /sup %} uses for a query. By default, ClickHouse{% sup %}®{% /sup %} uses all available cores, but you can limit this for queries that don't need maximum speed:

```sql
SELECT COUNT(*) FROM sample_events
SETTINGS max_threads = 4;
```

The `max_memory_usage` setting prevents queries from consuming too much RAM. When a query exceeds this limit, ClickHouse{% sup %}®{% /sup %} stops execution and returns an error:

```sql
SELECT user_id, COUNT(*) FROM sample_events
GROUP BY user_id
SETTINGS max_memory_usage = 10000000000;
```

For queries on tables with frequent updates, avoid `FINAL` when possible. Instead, use `GROUP BY` with `argMax()` to get the latest value for each key:

```sql
SELECT
    user_id,
    argMax(user_name, updated_at) AS current_name
FROM users
GROUP BY user_id;
```

This approach processes data faster than `FINAL` because it uses ClickHouse{% sup %}®{% /sup %}'s aggregation optimizations. Common optimization patterns include:

- **Filter early:** Apply `WHERE` clauses before `JOIN` operations to reduce data volume
- **Select fewer columns:** Only retrieve columns you need, especially on wide tables
- **Use primary key order:** Query conditions that match the `ORDER BY` clause run faster
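For example, because `sample_events` is ordered by `(timestamp, user_id)`, a range filter on `timestamp` lets ClickHouse{% sup %}®{% /sup %} use the primary index to skip entire data granules instead of scanning the full table:

```sql
-- Fast: the filter matches the leading column of the table's ORDER BY
SELECT COUNT(*)
FROM sample_events
WHERE timestamp >= '2024-01-01 00:00:00'
  AND timestamp < '2024-02-01 00:00:00';
```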

## Troubleshooting slow queries in production

When queries run slower than expected, ClickHouse{% sup %}®{% /sup %} provides system tables that show exactly what's happening. These tables contain query logs, performance metrics, and resource usage data.

### Using system query_log

The `system.query_log` table records every query executed on the server, including execution time, memory usage, and rows processed.

```sql
SELECT
    query_duration_ms,
    query,
    read_rows,
    read_bytes,
    memory_usage
FROM system.query_log
WHERE query_duration_ms > 1000
  AND type = 'QueryFinish'
ORDER BY query_duration_ms DESC
LIMIT 10;
```

This query finds the 10 slowest queries that took over 1 second. The `read_rows` and `read_bytes` columns show how much data each query processed, which often explains slow performance. You can filter by user, query type, or time range to [investigate specific performance issues](https://www.tinybird.co/blog-posts/eating-our-own-dog-food-how-we-investigate-performance-bottlenecks-using-our-product-and-google-sheets).

### Interpreting system metrics

The `system.metrics` table provides real-time counters for server activity, while `system.events` shows cumulative statistics since server startup.

```sql
SELECT
    metric,
    value
FROM system.metrics
WHERE metric LIKE '%Memory%'
   OR metric LIKE '%Query%';
```

Key metrics to monitor include:

- `MemoryTracking`: Current memory usage across all queries
- `Query`: Number of queries currently executing
- `BackgroundPoolTask`: Background merge and mutation activity

High values for `MemoryTracking` indicate queries using too much RAM, while many `BackgroundPoolTask` entries suggest heavy write activity that might slow down reads.
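To see the cumulative counters from `system.events` alongside these point-in-time snapshots, query it the same way (the available event names vary somewhat by ClickHouse{% sup %}®{% /sup %} version):

```sql
-- Cumulative counts since server startup
SELECT event, value
FROM system.events
WHERE event IN ('Query', 'SelectQuery', 'InsertQuery');
```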

## Turning a query into a realtime API with Tinybird

Once you've written a ClickHouse{% sup %}®{% /sup %} query that answers an analytical question, you often need to expose it as an API endpoint so your application can consume the results. Tinybird makes this simple by deploying SQL queries as parameterized REST APIs. You can also connect directly from your application using Python (see our [ClickHouse{% sup %}®{% /sup %} Python client guide](https://www.tinybird.co/blog-posts/clickhouse-python-example)), Java, or Go client libraries.

### Create data source

Tinybird data sources are ClickHouse{% sup %}®{% /sup %} tables that you can populate using the CLI, API, or streaming connectors. To follow these examples, [sign up for a free Tinybird account](https://cloud.tinybird.co/signup) and install the CLI.

First, create a sample data file called `events.ndjson`:

```json
{"timestamp":"2024-01-15 10:23:45","user_id":1001,"event_name":"page_view","event_value":1.5}
{"timestamp":"2024-01-15 10:24:12","user_id":1002,"event_name":"purchase","event_value":49.99}
{"timestamp":"2024-01-15 10:25:03","user_id":1001,"event_name":"add_to_cart","event_value":29.99}
```

Then create and populate a data source:

```bash
tb datasource create events --file events.ndjson
tb datasource append events --file events.ndjson
```

Tinybird infers the schema from your NDJSON data, creating appropriate column types automatically. You can also define schemas explicitly using `.datasource` files for more control.
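As a sketch, an explicit `.datasource` file for the events table above might look like this, with JSONPath mappings telling Tinybird which NDJSON field feeds each column:

```tinybird
SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `user_id` UInt64 `json:$.user_id`,
    `event_name` String `json:$.event_name`,
    `event_value` Float64 `json:$.event_value`

ENGINE "MergeTree"
ENGINE_SORTING_KEY "timestamp, user_id"
```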

### Pipe file with parameterized SQL

Pipes are Tinybird's way of defining SQL queries that can be deployed as API endpoints. Create a file called `top_events.pipe`:

```tinybird
TOKEN top_events_read READ

DESCRIPTION >
    Get top events by value for a specific user

NODE endpoint
SQL >
    %
    SELECT
        event_name,
        SUM(event_value) AS total_value,
        COUNT(*) AS event_count
    FROM events
    WHERE user_id = {{ Int64(user_id, 1001) }}
      AND timestamp >= {{ DateTime(start_date, '2024-01-01 00:00:00') }}
    GROUP BY event_name
    ORDER BY total_value DESC
    LIMIT {{ Int32(limit, 10) }}

TYPE endpoint
```

The `{{ }}` syntax defines query parameters with types and default values. Tinybird validates these parameters and automatically generates API documentation.

### Deploy and call the API

```bash
tb --cloud deploy
```

Tinybird returns a URL for your new API endpoint. You can test it immediately:

```bash
curl "https://api.tinybird.co/v0/pipes/top_events.json?user_id=1001&start_date=2024-01-01&token=YOUR_TOKEN"
```

The API returns JSON with your query results, including metadata about rows processed and query execution time. You can call this endpoint from any application without managing database connections or worrying about connection pooling.
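The response shape looks roughly like this (the values shown are illustrative):

```json
{
  "meta": [
    {"name": "event_name", "type": "String"},
    {"name": "total_value", "type": "Float64"},
    {"name": "event_count", "type": "UInt64"}
  ],
  "data": [
    {"event_name": "purchase", "total_value": 49.99, "event_count": 1}
  ],
  "rows": 1,
  "statistics": {"elapsed": 0.002, "rows_read": 3, "bytes_read": 120}
}
```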

## Why developer experience beats database operations

Managing ClickHouse{% sup %}®{% /sup %} infrastructure involves cluster configuration, replication setup, backup management, and performance tuning. These tasks take time away from building features that directly benefit users.

Tinybird provides [managed ClickHouse{% sup %}®{% /sup %} infrastructure](https://www.tinybird.co/product/managed-clickhouse) so developers can focus on writing queries and building APIs instead of managing servers. The platform handles cluster scaling, backup automation, and performance optimization automatically. The API-first approach means you don't write custom backend code to expose ClickHouse{% sup %}®{% /sup %} queries.

Here's how the approaches compare:

| Self-hosted ClickHouse{% sup %}®{% /sup %} | Tinybird |
|---|---|
| Manual cluster setup and configuration | Instant workspace creation |
| Custom API development for each query | SQL pipes deploy as APIs automatically |
| Manual scaling and performance tuning | Automatic scaling based on usage |
| DIY monitoring and alerting | Built-in observability and query analytics |

For teams building analytics features into applications, Tinybird eliminates weeks of infrastructure work. You can go from raw data to production API in hours instead of months. [Sign up for a free Tinybird account](https://cloud.tinybird.co/signup) to start building ClickHouse{% sup %}®{% /sup %}-powered APIs without managing infrastructure.

## FAQs about ClickHouse{% sup %}®{% /sup %} queries

### Can I use window functions in ClickHouse{% sup %}®{% /sup %}?

Yes, ClickHouse{% sup %}®{% /sup %} supports window functions like `ROW_NUMBER()` and `RANK()` with `OVER()` clauses, and provides `lagInFrame()` and `leadInFrame()` as equivalents to standard `LAG()` and `LEAD()`. They work similarly to standard SQL but benefit from ClickHouse{% sup %}®{% /sup %}'s columnar storage for better performance on large datasets.
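For example, numbering each user's events chronologically:

```sql
SELECT
    user_id,
    event_name,
    timestamp,
    row_number() OVER (PARTITION BY user_id ORDER BY timestamp) AS event_sequence
FROM sample_events;
```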

### Does ClickHouse{% sup %}®{% /sup %} support multi-statement transactions?

No, ClickHouse{% sup %}®{% /sup %} doesn't support traditional ACID transactions across multiple statements. Each `INSERT` or `ALTER` operation is atomic, but you can't group multiple operations into a single transaction with rollback capability.

### How do I secure a Tinybird API endpoint?

Tinybird API endpoints use token-based authentication with different permission levels. You can create read-only tokens for public APIs, write tokens for data ingestion, or admin tokens for schema changes. Tokens are included in the URL or headers of API requests.
