Creating a table in ClickHouse requires three components: column definitions with data types, a table engine (usually MergeTree), and an ORDER BY clause that determines how data is physically sorted on disk. This combination controls both how your data is stored and how efficiently ClickHouse can query it.
This guide walks through the complete `CREATE TABLE` syntax with working examples, explains how to choose the right engine and sorting keys, and shows how to deploy the same table structure using Tinybird's managed ClickHouse platform.
What you need before creating your first ClickHouse table
To create a table in ClickHouse, you use the `CREATE TABLE` statement with column definitions, data types, and a table engine. The MergeTree engine is the most common choice for general-purpose analytics tables because it handles large datasets efficiently and supports features like primary keys and partitioning.
Before writing your first `CREATE TABLE` statement, you'll need access to a ClickHouse instance, either through a local installation, a cloud service, or a managed platform like Tinybird. You'll also need a database where the table will live, which you can create with `CREATE DATABASE database_name` if one doesn't already exist.
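For example, a minimal sketch (the database name `analytics` is illustrative):

```sql
-- Create the database once; IF NOT EXISTS makes this safe to re-run.
CREATE DATABASE IF NOT EXISTS analytics;

-- Either reference tables as analytics.table_name,
-- or switch the session default:
USE analytics;
```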
Minimal create table statement that always works
The simplest working `CREATE TABLE` statement includes three required components: column definitions with types, a table engine, and an `ORDER BY` clause. Here's a minimal example that creates an events table:
CREATE TABLE events (
event_id UInt64,
timestamp DateTime,
user_id String
)
ENGINE = MergeTree
ORDER BY event_id;
This statement creates a table named `events` with three columns, uses the MergeTree engine for storage, and orders data by `event_id`. You can copy and paste this into any ClickHouse client or console to create your first table.
1. Define columns with types
Each column in a ClickHouse table requires a name and a data type. Common types include `String` for text, `UInt32` or `UInt64` for unsigned integers, `DateTime` for timestamps, and `Float64` for floating-point numbers.
Here's how different column types look in practice:
CREATE TABLE user_events (
user_id UInt32,
email String,
signup_date DateTime,
revenue Float64,
is_active UInt8
)
ENGINE = MergeTree
ORDER BY user_id;
The `UInt8` type works well for boolean flags because it stores values 0 or 1 efficiently. For larger integer values, `UInt32` handles numbers up to about 4.3 billion, while `UInt64` supports even larger ranges.
2. Pick an engine
The table engine determines how data is stored and queried. MergeTree is the default choice for most analytical workloads because it provides fast queries, efficient compression, and support for primary keys and partitioning.
Other engines serve specific purposes:
- MergeTree: General-purpose analytics with fast queries and efficient compression.
- ReplacingMergeTree: Automatic deduplication based on primary key.
- SummingMergeTree: Automatic aggregation of numeric columns during background merges.
- AggregatingMergeTree: Pre-aggregated metrics storage.
For most use cases, starting with MergeTree is the right choice. You can always migrate to a specialized engine later if your requirements change.
3. Add order by
The `ORDER BY` clause is required for MergeTree tables and determines how data is physically sorted on disk. This sorting directly affects query performance because ClickHouse can skip reading irrelevant data blocks when filtering on the ordered columns.
CREATE TABLE page_views (
user_id UInt32,
page_url String,
view_time DateTime
)
ENGINE = MergeTree
ORDER BY (user_id, view_time);
In this example, data is sorted first by `user_id`, then by `view_time` within each user. Queries that filter by `user_id`, or by both `user_id` and `view_time`, will run faster than queries filtering only by `page_url`.
Choose a table engine and why MergeTree is the default
MergeTree serves as the foundation for most ClickHouse tables because it balances performance, flexibility, and features. The engine stores data in sorted parts that are periodically merged in the background, which enables both fast inserts and efficient queries.
Specialized MergeTree variants extend the base engine with additional capabilities. ReplacingMergeTree removes duplicate rows with the same primary key during background merges, which helps maintain unique records without requiring explicit deduplication queries. SummingMergeTree automatically sums numeric columns for rows with identical primary keys, reducing storage and speeding up aggregation queries.
| Engine | Best for | Key behavior |
| --- | --- | --- |
| MergeTree | General analytics, time-series data | Sorted storage with efficient queries |
| ReplacingMergeTree | Event streams with duplicate records | Keeps latest version of duplicate rows |
| SummingMergeTree | Pre-aggregated metrics | Automatically sums numeric columns |
| AggregatingMergeTree | Complex pre-aggregations | Stores partial aggregate states |
For new projects, start with MergeTree unless you have a specific need for deduplication or aggregation. The base engine handles most analytical workloads without additional complexity.
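As a sketch of what a specialized variant looks like in practice, here's a hypothetical ReplacingMergeTree table (the table, columns, and version column are illustrative, not from the examples above):

```sql
-- Rows sharing the same sorting key are collapsed during background
-- merges; the optional version column (updated_at) keeps the latest row.
CREATE TABLE user_profiles (
    user_id UInt64,
    name String,
    updated_at DateTime
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY user_id;

-- Deduplication is eventual; add FINAL to force it at query time.
SELECT * FROM user_profiles FINAL WHERE user_id = 42;
```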
Define order by, primary key, and partition by correctly
The `ORDER BY`, `PRIMARY KEY`, and `PARTITION BY` clauses control how ClickHouse organizes data physically on disk. Understanding the relationship between these three concepts helps you design tables that query efficiently at scale.

The `ORDER BY` clause determines the physical sort order of data within each data part. ClickHouse uses this ordering to skip reading irrelevant blocks during queries, which significantly speeds up filtering and aggregation. By default, the primary key matches the `ORDER BY` expression unless you specify a different `PRIMARY KEY` explicitly.
CREATE TABLE events (
user_id UInt32,
event_type String,
timestamp DateTime,
data String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (user_id, timestamp)
PRIMARY KEY user_id;
In this example, data is partitioned by month, sorted by both `user_id` and `timestamp`, but the primary key only includes `user_id` (ClickHouse requires the primary key to be a prefix of the `ORDER BY` expression). This configuration works well when most queries filter by `user_id`, since the primary key index will be smaller and faster to scan.
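To confirm that a query actually benefits from the primary key, recent ClickHouse versions can show which index granules are selected with `EXPLAIN` (the filter value here is illustrative):

```sql
EXPLAIN indexes = 1
SELECT count() FROM events WHERE user_id = 42;
```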
Common partition keys for time-series data
Partitioning splits your table into separate physical directories on disk based on the partition key expression. This makes it efficient to drop old data or query recent time ranges without scanning the entire table.
For time-series data, common partition expressions include:
- `toYYYYMM(timestamp)` for monthly partitions
- `toDate(timestamp)` for daily partitions
- `toStartOfWeek(timestamp)` for weekly partitions
- `toYYYYMMDD(timestamp)` for daily partitions with a different format
Monthly partitions balance query performance with the overhead of managing partition files. Daily partitions work better for high-volume tables where you frequently drop old data or query recent days.
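Partitioning is what makes retention cheap. With `PARTITION BY toYYYYMM(timestamp)` as in the example above, dropping a month of data is a near-instant metadata operation (the partition value is illustrative):

```sql
-- Remove all of January 2024 without scanning or rewriting other data.
ALTER TABLE events DROP PARTITION 202401;
```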
Primary key and order by best practices
The primary key determines which columns are indexed for fast lookups, while `ORDER BY` controls the physical sort order. In most cases, both clauses match because the sorted order naturally provides an efficient index.

You might use a different primary key when your `ORDER BY` includes many columns but most queries only filter on the first few. A shorter primary key means a smaller index that fits in memory more easily, which speeds up query planning.
CREATE TABLE logs (
server_id UInt16,
timestamp DateTime64(3),
level String,
message String
)
ENGINE = MergeTree
ORDER BY (server_id, timestamp, level)
PRIMARY KEY server_id;
Choose low-cardinality columns first in your `ORDER BY` expression. Columns like `server_id` or `user_id` that have hundreds or thousands of distinct values work better as leading columns than high-cardinality columns like `timestamp` or `message`.
Variants like if not exists, create table as, and on cluster
The `CREATE TABLE IF NOT EXISTS` syntax prevents errors when running the same DDL statement multiple times. This makes deployment scripts idempotent, so you can safely re-run them without checking whether tables already exist.
CREATE TABLE IF NOT EXISTS metrics (
metric_name String,
value Float64,
recorded_at DateTime
)
ENGINE = MergeTree
ORDER BY (metric_name, recorded_at);
The `CREATE TABLE AS` syntax copies the structure and optionally the data from another table or query result. For copying structure only, use `CREATE TABLE events_backup AS events`. To copy both structure and data, append an `AS SELECT` query; note that this form needs its own engine and sorting key unless a default table engine is configured.
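Sketched out as concrete statements (the `AS SELECT` form is given an explicit engine and sorting key, since it defines a brand-new table from a query result):

```sql
-- Structure and engine settings only, no data:
CREATE TABLE events_backup AS events;

-- Structure and data from a query result:
CREATE TABLE events_2024
ENGINE = MergeTree
ORDER BY event_id
AS SELECT * FROM events WHERE toYear(timestamp) = 2024;
```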
For distributed ClickHouse clusters, the `ON CLUSTER` clause creates the same table definition across all nodes in a named cluster:
CREATE TABLE events ON CLUSTER production_cluster (
event_id UInt64,
timestamp DateTime
)
ENGINE = MergeTree
ORDER BY event_id;
Insert sample data and run a sanity query
After creating a table, inserting sample data verifies that the table structure works as expected. The `INSERT INTO` statement adds rows using either the `VALUES` clause for small datasets or the `FORMAT` clause for bulk loading.
INSERT INTO events (event_id, timestamp, user_id)
VALUES
(1, '2024-01-15 10:30:00', 'user_123'),
(2, '2024-01-15 10:31:00', 'user_456'),
(3, '2024-01-15 10:32:00', 'user_789');
This inserts three rows into the `events` table. For larger datasets, the `FORMAT` clause supports formats like CSV, JSON, or Parquet for efficient bulk loading.
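For example, a CSV file can be bulk-loaded from the command line with clickhouse-client (the file name is illustrative):

```shell
# Stream a local CSV file into the events table.
clickhouse-client --query "INSERT INTO events FORMAT CSV" < events.csv
```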
Insert values syntax
The `VALUES` clause works well for inserting a few rows manually or in application code. Each row appears in parentheses with values matching the column order:
INSERT INTO user_events (user_id, email, signup_date)
VALUES
(1, 'alice@example.com', '2024-01-01'),
(2, 'bob@example.com', '2024-01-02');
For JSON data, the `FORMAT JSONEachRow` clause reads newline-delimited JSON where each line is a separate object:
INSERT INTO events FORMAT JSONEachRow
{"event_id": 1, "timestamp": "2024-01-15 10:30:00", "user_id": "user_123"}
{"event_id": 2, "timestamp": "2024-01-15 10:31:00", "user_id": "user_456"}
Query and verify row count
Running a simple `SELECT` query confirms that your data inserted correctly. Start with `COUNT(*)` to check the total number of rows, then inspect a few actual rows to confirm the data looks correct:
SELECT COUNT(*) FROM events;
SELECT * FROM events ORDER BY event_id LIMIT 3;
Modify the table later with alter table
The `ALTER TABLE` statement changes table structure after creation. This lets you add columns, drop columns, or modify settings without recreating the entire table and reloading all data.
ALTER TABLE events ADD COLUMN session_id String AFTER user_id;
This adds a new `session_id` column positioned after the existing `user_id` column. The `AFTER` clause controls placement, though column order typically doesn't affect query performance in ClickHouse.
Add or remove columns safely
Adding columns is generally safe and fast because ClickHouse stores columns separately on disk. New columns don't require rewriting existing data, so the operation completes almost instantly even on large tables.
ALTER TABLE events ADD COLUMN country String DEFAULT 'US';
ALTER TABLE events DROP COLUMN session_id;
Dropping columns also happens quickly because ClickHouse simply marks the column as deleted without immediately removing the physical files. The actual disk space is reclaimed during the next background merge.
Create a temporary table for ephemeral analysis
Temporary tables exist only for the duration of your session and automatically disappear when you disconnect. This makes them useful for intermediate calculations, testing, or one-time analysis without cluttering your database with permanent tables.
CREATE TEMPORARY TABLE temp_aggregates (
user_id UInt32,
total_events UInt64
)
ENGINE = Memory;
INSERT INTO temp_aggregates
SELECT user_id, COUNT(*) as total_events
FROM events
GROUP BY user_id;
The Memory engine stores data entirely in RAM, which provides very fast reads and writes at the cost of durability.
From DDL to production: the same table in Tinybird
Tinybird is a managed ClickHouse service that handles infrastructure setup, scaling, and operations so you can focus on building features instead of managing clusters. Creating tables in Tinybird uses the same ClickHouse syntax you've learned, but with additional tooling for version control and deployment.
Instead of running `CREATE TABLE` statements directly, Tinybird uses declarative .datasource files that define your table schema as code. This approach makes it easy to track changes in Git, test locally, and deploy to production with confidence.
Write the .datasource file
A `.datasource` file defines your table structure using the same column types and engine options as raw ClickHouse DDL. Here's how the `events` table looks in Tinybird syntax:
SCHEMA >
`event_id` UInt64 `json:$.event_id`,
`timestamp` DateTime `json:$.timestamp`,
`user_id` String `json:$.user_id`
ENGINE "MergeTree"
ENGINE_SORTING_KEY "event_id"
The `SCHEMA` section lists columns and types, with inline JSON path mappings using the `json:$.field_name` syntax to map JSON fields from ingestion APIs to each column. The `ENGINE` and `ENGINE_SORTING_KEY` settings map directly to the `ENGINE` and `ORDER BY` clauses in standard ClickHouse DDL. Save this as `events.datasource` in your project directory.
Deploy and stream data
The Tinybird CLI deploys your data source definitions to the cloud and provides commands for appending data. First, install the CLI and authenticate:
curl -L tinybird.co | sh
tb login
Then deploy your project to create the table:
tb --cloud deploy
This creates the ClickHouse table in your Tinybird workspace. To append data, use the `tb datasource append` command with a file containing sample data. For production workloads, Tinybird's Events API provides a streaming ingestion endpoint that accepts NDJSON over HTTP, letting your application send events in real time without managing Kafka, batch jobs, or other ingestion infrastructure.
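As a sketch, sending a single event to the Events API looks roughly like this (the token is a placeholder and the base URL depends on your workspace region):

```shell
# Append one NDJSON event to the events data source.
curl -X POST "https://api.tinybird.co/v0/events?name=events" \
  -H "Authorization: Bearer $TB_TOKEN" \
  -d '{"event_id": 1, "timestamp": "2024-01-15 10:30:00", "user_id": "user_123"}'
```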
Next steps to build real-time APIs with Tinybird
Creating tables is just the first step. The real value comes from querying that data and exposing results to your application through APIs.
Tinybird turns SQL queries into hosted REST APIs automatically. You write a SQL query in a `.pipe` file, define parameters for filtering or pagination, and deploy it as an endpoint. Your application then calls the API with HTTP requests instead of connecting directly to ClickHouse.
This architecture simplifies your backend code because you don't need to manage database connections, connection pools, or query optimization. Tinybird handles authentication, rate limiting, caching, and scaling so your APIs stay fast even under high concurrency.
To get started, sign up for a free Tinybird account and follow the quickstart guide to create your first data source and API endpoint. The entire process takes about 15 minutes from signup to your first working API.
FAQs about creating tables in ClickHouse
What happens if I omit ORDER BY in ClickHouse?
ClickHouse requires an `ORDER BY` clause for MergeTree tables and returns an error if you omit it. If you genuinely don't need sorting, you can write `ORDER BY tuple()`, but the clause itself is mandatory because it determines how data is physically sorted on disk, which affects both query performance and storage efficiency.
How do I pick between ReplacingMergeTree and SummingMergeTree engines?
Use ReplacingMergeTree when you need to deduplicate rows with the same primary key, keeping only the most recent version. Use SummingMergeTree when you want to automatically sum numeric columns during background merges, which works well for pre-aggregated metrics.
Can I change the table engine after the table is created?
No, you cannot change the engine of an existing table. To switch engines, create a new table with the desired engine and use `INSERT INTO new_table SELECT * FROM old_table` to copy the data.
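A sketch of the full swap, assuming the `events` table from earlier and a move to ReplacingMergeTree:

```sql
-- 1. Create the replacement table with the new engine.
CREATE TABLE events_new (
    event_id UInt64,
    timestamp DateTime,
    user_id String
)
ENGINE = ReplacingMergeTree
ORDER BY event_id;

-- 2. Copy the data.
INSERT INTO events_new SELECT * FROM events;

-- 3. Swap names once the copy is verified.
RENAME TABLE events TO events_old, events_new TO events;
```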
How do I set TTL to automatically expire old partitions?
Add a `TTL` clause to your `CREATE TABLE` statement with a Date or DateTime column and an interval expression. For example, `TTL timestamp + INTERVAL 30 DAY` deletes data older than 30 days automatically during background merges.
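A minimal sketch applying this to the earlier events schema (the table name is illustrative):

```sql
CREATE TABLE events_ttl (
    event_id UInt64,
    timestamp DateTime,
    user_id String
)
ENGINE = MergeTree
ORDER BY event_id
TTL timestamp + INTERVAL 30 DAY;
```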
Is there a way to create the same table across multiple ClickHouse clusters automatically?
Yes, use the `ON CLUSTER` clause in your `CREATE TABLE` statement. This creates identical tables across all nodes in a named cluster configuration, which is useful for distributed or replicated setups.