ClickHouse® queries use SQL syntax extended with specialized functions and clauses built for columnar storage and analytical workloads at scale. The database supports standard SELECT, JOIN, and GROUP BY operations while adding features like LIMIT BY for top-N-per-group queries, SAMPLE for fast prototyping on large datasets, and approximate aggregation functions that trade precision for speed.
This guide covers the query patterns you'll use most often when working with ClickHouse, from basic filtering and aggregation to advanced optimization techniques and troubleshooting slow queries.
Writing your first ClickHouse select
A basic SELECT statement retrieves data from tables, and you can control output format, limit rows, and filter columns using standard SQL clauses.
To start querying ClickHouse, you first need a table with data. Here's how to create a simple events table and run your first queries.
Create table sample_events
The CREATE TABLE statement defines the schema for storing data. In ClickHouse, you specify column names, data types, and a table engine that determines how data is stored and queried.
CREATE TABLE sample_events (
timestamp DateTime,
user_id UInt64,
event_name String,
event_value Float64
) ENGINE = MergeTree()
ORDER BY (timestamp, user_id);
The MergeTree engine is the most common choice for analytical queries. The ORDER BY clause defines the primary key by default, which determines how data is sorted on disk and affects query performance.
Basic select star
The SELECT * syntax retrieves all columns from a table. This works well for exploring data but can be inefficient on large tables with many columns.
SELECT * FROM sample_events LIMIT 10;
The LIMIT clause restricts the number of rows returned. In production queries, always use LIMIT when exploring data to avoid accidentally pulling millions of rows.
ClickHouse returns results in a default tabular format, but you can specify different output formats using the FORMAT clause. For example, FORMAT JSON returns results as JSON objects, while FORMAT TabSeparated outputs tab-delimited text.
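For example, appending FORMAT JSON to the exploration query above returns the same rows as JSON:
SELECT * FROM sample_events LIMIT 10 FORMAT JSON;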
Selecting specific columns
Selecting only the columns you need improves query performance, especially on wide tables. ClickHouse's columnar storage means it only reads the columns referenced in your query, and per-column compression further reduces the I/O required.
SELECT
toDate(timestamp) AS event_date,
user_id,
event_name
FROM sample_events
LIMIT 100;
Column aliases using AS make results more readable. Functions like toDate() transform data types during the query, converting a DateTime to a Date by truncating the time portion.
Core query patterns for filtering and ordering
Filtering and sorting data efficiently matters more in ClickHouse than in row-based databases. The order in which you filter data and the functions you use can significantly affect query speed.
Where clause basics
The WHERE clause filters rows based on conditions. ClickHouse offers specialized functions that work better with its columnar storage than generic SQL functions.
SELECT user_id, event_name, timestamp
FROM sample_events
WHERE event_name = 'page_view'
AND timestamp >= '2024-01-01'
AND user_id > 1000;
String matching functions like startsWith(), endsWith(), and match() often perform better than SQL LIKE patterns. Date filtering using comparison operators works with both Date and DateTime types.
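A few quick examples against the sample_events table above (the event name patterns are illustrative):
SELECT COUNT(*) FROM sample_events WHERE startsWith(event_name, 'page_');
SELECT COUNT(*) FROM sample_events WHERE endsWith(event_name, '_view');
SELECT COUNT(*) FROM sample_events WHERE match(event_name, '^page_.*_view$');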
For date-based filtering, ClickHouse provides functions that convert dates to integers for faster comparisons:
SELECT user_id, COUNT(*) AS event_count
FROM sample_events
WHERE startsWith(event_name, 'purchase')
AND toYYYYMM(timestamp) = 202401
GROUP BY user_id;
The toYYYYMM() function converts dates to year-month integers, which speeds up filtering when you're grouping by month.
Order by vs limit by
ClickHouse includes a LIMIT BY clause that doesn't exist in standard SQL. While ORDER BY sorts all results and LIMIT restricts total rows, LIMIT BY returns the top N rows per group.
SELECT user_id, event_name, timestamp
FROM sample_events
ORDER BY timestamp DESC
LIMIT 3 BY user_id;
This query returns up to 3 events for each distinct user_id, ordered by timestamp. You can think of LIMIT BY as a more efficient alternative to window functions for top-N-per-group queries.
The difference matters for performance: LIMIT BY can process data as it reads, while window functions typically require more memory and processing.
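For comparison, here's a sketch of the same top-3-per-user query written with a window function; the LIMIT BY version above is typically simpler and cheaper:
SELECT user_id, event_name, timestamp
FROM (
    SELECT
        user_id,
        event_name,
        timestamp,
        row_number() OVER (PARTITION BY user_id ORDER BY timestamp DESC) AS rn
    FROM sample_events
)
WHERE rn <= 3;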
Using sample for fast prototyping
The SAMPLE clause lets you query a fraction of your data, which speeds up development and testing on large tables. ClickHouse uses deterministic sampling, so the same sample returns consistent results. Note that SAMPLE only works on MergeTree-family tables created with a SAMPLE BY expression (a sampling-ready table definition is sketched at the end of this section).
SELECT event_name, COUNT(*) AS count
FROM sample_events
SAMPLE 0.1
GROUP BY event_name;
This query processes approximately 10% of the table. The sampling is deterministic based on the sampling key, so repeated queries return the same rows. You can also ask for an approximate row count: SAMPLE 10000 reads roughly 10,000 rows.
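A minimal sketch of a sampling-ready table (sampled_events is a hypothetical name; intHash32(user_id) is a common sampling expression, and it must appear in the sorting key):
CREATE TABLE sampled_events (
    timestamp DateTime,
    user_id UInt64,
    event_name String
) ENGINE = MergeTree()
ORDER BY (timestamp, intHash32(user_id))
SAMPLE BY intHash32(user_id);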
Aggregations and group by at scale
ClickHouse aggregation functions are optimized for processing billions of rows quickly. Many functions offer approximate versions that trade precision for speed, which is often acceptable for analytics use cases.
Count distinct tricks
Counting unique values is expensive in any database. ClickHouse provides several functions with different speed and accuracy tradeoffs:
- uniq(): Fast approximate count using HyperLogLog, typically within 2% accuracy
- uniqExact(): Precise count but slower and more memory-intensive
- uniqCombined(): Balanced approach using multiple algorithms for better accuracy than uniq()
SELECT
uniq(user_id) AS approx_users,
uniqExact(user_id) AS exact_users,
uniqCombined(user_id) AS combined_users
FROM sample_events;
For most analytics, uniq() provides enough accuracy while being 5-10x faster than uniqExact(). Use uniqExact() only when you need guaranteed precision, such as for billing or compliance reporting.
Approximate functions
Beyond unique counts, ClickHouse offers approximate functions for percentiles, top values, and other statistical calculations.
SELECT
event_name,
quantile(0.95)(event_value) AS p95_value,
topK(5)(user_id) AS top_5_users
FROM sample_events
GROUP BY event_name;
The quantile() function calculates percentiles using sampling, which is much faster than sorting all values. The topK() function returns the K most frequent values using a space-efficient algorithm. These functions work well for dashboards and monitoring, where approximate answers delivered quickly are more valuable than exact answers delivered slowly, and they make sub-second aggregations over billions of rows realistic.
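When you need several percentiles at once, the quantiles() function computes them in a single pass and returns an array:
SELECT
    event_name,
    quantiles(0.5, 0.9, 0.99)(event_value) AS p50_p90_p99
FROM sample_events
GROUP BY event_name;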
With totals for rollups
The WITH TOTALS modifier adds a summary row to your results, which is useful for creating subtotals and grand totals in a single query.
SELECT
event_name,
COUNT(*) AS event_count,
AVG(event_value) AS avg_value
FROM sample_events
GROUP BY event_name
WITH TOTALS;
The final row in the result contains aggregates across all groups. This eliminates the need for a separate query to calculate overall totals.
Joining data the ClickHouse way
Joins in ClickHouse work differently than in traditional databases due to its distributed and columnar architecture. Understanding these differences helps you write faster queries.
Inner vs left join
ClickHouse supports standard SQL join syntax, but join order matters significantly for performance. The table on the right side of the join is loaded into memory, so put the smaller table there.
SELECT
e.user_id,
e.event_name,
u.user_name
FROM sample_events e
INNER JOIN users u ON e.user_id = u.user_id
WHERE e.timestamp >= '2024-01-01';
Always filter the larger table before joining when possible. Moving the WHERE clause to filter sample_events before the join reduces the number of rows that need to be matched. LEFT JOIN works similarly but includes all rows from the left table even when there's no match.
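One way to make the filter-first pattern explicit is to pre-filter the large table in a subquery; a sketch using the same tables:
SELECT
    e.user_id,
    e.event_name,
    u.user_name
FROM (
    SELECT user_id, event_name
    FROM sample_events
    WHERE timestamp >= '2024-01-01'
) AS e
INNER JOIN users u ON e.user_id = u.user_id;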
Array join for nested data
The ARRAY JOIN clause flattens array columns into multiple rows, which is common when working with nested or semi-structured data. The arrayJoin() function is its expression-level equivalent.
SELECT
user_id,
arrayJoin(tags) AS tag
FROM user_profiles
WHERE tag = 'premium';
This query converts an array of tags for each user into separate rows, one per tag. You can then filter, group, or join on the flattened values. ARRAY JOIN is particularly useful for JSON data that contains arrays.
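The clause form expresses the same flattening without a function call; a sketch against the same user_profiles table:
SELECT user_id, tag
FROM user_profiles
ARRAY JOIN tags AS tag
WHERE tag = 'premium';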
Global join on distributed tables
When querying distributed tables across multiple servers, regular joins only see data on each individual server. The GLOBAL keyword broadcasts the right table to all servers.
SELECT
e.user_id,
e.event_count,
u.signup_date
FROM distributed_events e
GLOBAL INNER JOIN users u ON e.user_id = u.user_id;
GLOBAL JOIN has performance costs because it transfers the entire right table to every server. Use it only when necessary, and always put the smaller table on the right side.
Sampling, limit by, and other ClickHouse-specific clauses
ClickHouse extends standard SQL with clauses designed specifically for analytical queries. These extensions often provide simpler syntax and better performance than standard SQL alternatives.
Limit by pattern
The LIMIT BY clause returns the top N rows for each group without requiring window functions or subqueries.
SELECT
user_id,
event_name,
timestamp,
event_value
FROM sample_events
ORDER BY event_value DESC
LIMIT 5 BY user_id;
This returns the 5 highest-value events for each user. The query processes data efficiently because it doesn't need to rank all rows or create temporary tables. You can combine LIMIT BY with OFFSET to skip rows: LIMIT 5 OFFSET 2 BY user_id returns rows 3 through 7 for each user.
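Written out in full, the offset variant looks like this:
SELECT user_id, event_name, event_value
FROM sample_events
ORDER BY event_value DESC
LIMIT 5 OFFSET 2 BY user_id;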
Sample clause syntax
Beyond the percentage-based sampling shown earlier, SAMPLE accepts different formats for controlling how much data to read.
-- Sample 10% of data
SELECT COUNT(*) FROM sample_events SAMPLE 0.1;
-- Sample approximately 1 million rows
SELECT COUNT(*) FROM sample_events SAMPLE 1000000;
-- Sample with offset for A/B testing
SELECT COUNT(*) FROM sample_events SAMPLE 1/2 OFFSET 1/2;
The offset syntax divides data into segments, which is useful for splitting datasets for parallel processing or creating consistent test/control groups.
Final keyword caveats
The FINAL modifier forces ClickHouse to merge all parts of a ReplacingMergeTree or CollapsingMergeTree table at query time, returning only the final version of each row.
SELECT user_id, user_name, last_login
FROM users FINAL
WHERE user_id = 12345;
While FINAL simplifies queries by handling deduplication automatically, it significantly slows down query execution. The database must merge data at query time instead of using pre-merged parts. Alternatives include using materialized views to pre-aggregate data or designing schemas that avoid the need for FINAL.
Optimizing query speed with settings and final
Query performance in ClickHouse depends on both how you write SQL and the settings you use. A few key settings control memory usage, parallelism, and query behavior.
The max_threads setting controls how many CPU cores ClickHouse uses for a query. By default, ClickHouse uses all available cores, but you can limit this for queries that don't need maximum speed:
SELECT COUNT(*) FROM sample_events
SETTINGS max_threads = 4;
The max_memory_usage setting prevents queries from consuming too much RAM. When a query exceeds this limit, ClickHouse stops execution and returns an error:
SELECT user_id, COUNT(*) FROM sample_events
GROUP BY user_id
SETTINGS max_memory_usage = 10000000000;
For queries on tables with frequent updates, avoid FINAL when possible. Instead, use GROUP BY with argMax() to get the latest value for each key:
SELECT
user_id,
argMax(user_name, updated_at) AS current_name
FROM users
GROUP BY user_id;
This approach processes data faster than FINAL because it uses ClickHouse's aggregation optimizations. Common optimization patterns include:
- Filter early: Apply WHERE clauses before JOIN operations to reduce data volume
- Select fewer columns: Only retrieve columns you need, especially on wide tables
- Use primary key order: Query conditions that match the ORDER BY clause run faster (see the sketch below)
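As a sketch of the last point: because sample_events is ordered by (timestamp, user_id), a range filter on the leading timestamp column lets ClickHouse skip entire data granules instead of scanning the whole table:
SELECT COUNT(*)
FROM sample_events
WHERE timestamp >= '2024-01-01' AND timestamp < '2024-02-01';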
Troubleshooting slow queries in production
When queries run slower than expected, ClickHouse provides system tables that show exactly what's happening. These tables contain query logs, performance metrics, and resource usage data.
Using system query_log
The system.query_log table records every query executed on the server, including execution time, memory usage, and rows processed.
SELECT
query_duration_ms,
query,
read_rows,
read_bytes,
memory_usage
FROM system.query_log
WHERE query_duration_ms > 1000
AND type = 'QueryFinish'
ORDER BY query_duration_ms DESC
LIMIT 10;
This query finds the 10 slowest queries that took over 1 second. The read_rows and read_bytes columns show how much data each query processed, which often explains slow performance. You can filter by user, query type, or time range to investigate specific performance issues.
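For example, a sketch that narrows the log to the last hour of finished queries:
SELECT query, query_duration_ms, memory_usage
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time >= now() - INTERVAL 1 HOUR
ORDER BY query_duration_ms DESC
LIMIT 5;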
Interpreting system metrics
The system.metrics table provides real-time counters for server activity, while system.events shows cumulative statistics since server startup.
SELECT
metric,
value
FROM system.metrics
WHERE metric LIKE '%Memory%'
OR metric LIKE '%Query%';
Key metrics to monitor include:
- MemoryTracking: Current memory usage across all queries
- Query: Number of queries currently executing
- BackgroundPoolTask: Background merge and mutation activity
High values for MemoryTracking indicate queries using too much RAM, while many BackgroundPoolTask entries suggest heavy write activity that might slow down reads.
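To see the cumulative side, query system.events for counters such as Query, SelectQuery, and InsertQuery:
SELECT event, value
FROM system.events
WHERE event IN ('Query', 'SelectQuery', 'InsertQuery');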
Turning a query into a realtime API with Tinybird
Once you've written a ClickHouse query that answers an analytical question, you often need to expose it as an API endpoint so your application can consume the results. Tinybird makes this simple by deploying SQL queries as parameterized REST APIs. You can also connect directly from your application using Python, Java, or Go client libraries.
Create data source
Tinybird data sources are ClickHouse tables that you can populate using the CLI, API, or streaming connectors. To follow these examples, sign up for a free Tinybird account and install the CLI.
First, create a sample data file called events.ndjson:
{"timestamp":"2024-01-15 10:23:45","user_id":1001,"event_name":"page_view","event_value":1.5}
{"timestamp":"2024-01-15 10:24:12","user_id":1002,"event_name":"purchase","event_value":49.99}
{"timestamp":"2024-01-15 10:25:03","user_id":1001,"event_name":"add_to_cart","event_value":29.99}
Then create and populate a data source:
tb datasource create events --file events.ndjson
tb datasource append events --file events.ndjson
Tinybird infers the schema from your NDJSON data, creating appropriate column types automatically. You can also define schemas explicitly using .datasource files for more control.
Pipe file with parameterized SQL
Pipes are Tinybird's way of defining SQL queries that can be deployed as API endpoints. Create a file called top_events.pipe:
TOKEN top_events_read READ
DESCRIPTION >
Get top events by value for a specific user
NODE endpoint
SQL >
%
SELECT
event_name,
SUM(event_value) AS total_value,
COUNT(*) AS event_count
FROM events
WHERE user_id = {{ Int64(user_id, 1001) }}
AND timestamp >= {{ DateTime(start_date, '2024-01-01 00:00:00') }}
GROUP BY event_name
ORDER BY total_value DESC
LIMIT {{ Int32(limit, 10) }}
TYPE endpoint
The {{ }} syntax defines query parameters with types and default values. Tinybird validates these parameters and automatically generates API documentation.
Deploy and call the API
tb --cloud deploy
Tinybird returns a URL for your new API endpoint. You can test it immediately:
curl "https://api.tinybird.co/v0/pipes/top_events.json?user_id=1001&start_date=2024-01-01&token=YOUR_TOKEN"
The API returns JSON with your query results, including metadata about rows processed and query execution time. You can call this endpoint from any application without managing database connections or worrying about connection pooling.
Why developer experience beats database operations
Managing ClickHouse infrastructure involves cluster configuration, replication setup, backup management, and performance tuning. These tasks take time away from building features that directly benefit users.
Tinybird provides managed ClickHouse infrastructure so developers can focus on writing queries and building APIs instead of managing servers. The platform handles cluster scaling, backup automation, and performance optimization automatically. The API-first approach means you don't write custom backend code to expose ClickHouse queries.
Here's how the approaches compare:
| Self-hosted ClickHouse | Tinybird |
| --- | --- |
| Manual cluster setup and configuration | Instant workspace creation |
| Custom API development for each query | SQL pipes deploy as APIs automatically |
| Manual scaling and performance tuning | Automatic scaling based on usage |
| DIY monitoring and alerting | Built-in observability and query analytics |
For teams building analytics features into applications, Tinybird eliminates weeks of infrastructure work. You can go from raw data to production API in hours instead of weeks. Sign up for a free Tinybird account to start building ClickHouse-powered APIs without managing infrastructure.
FAQs about ClickHouse queries
Can I use window functions in ClickHouse?
Yes, ClickHouse supports window functions like ROW_NUMBER(), RANK(), LAG(), and LEAD() with OVER() clauses. They work similarly to standard SQL but benefit from ClickHouse's columnar storage for better performance on large datasets.
Does ClickHouse support multi-statement transactions?
No, ClickHouse doesn't support traditional ACID transactions across multiple statements. Each INSERT or ALTER operation is atomic, but you can't group multiple operations into a single transaction with rollback capability.
How do I secure a Tinybird API endpoint?
Tinybird API endpoints use token-based authentication with different permission levels. You can create read-only tokens for public APIs, write tokens for data ingestion, or admin tokens for schema changes. Tokens are included in the URL or headers of API requests.