Creating a table in ClickHouse requires three components: column definitions with data types, a table engine (usually MergeTree), and an ORDER BY clause that determines how data is physically sorted on disk. This combination controls both how your data is stored and how efficiently ClickHouse can query it.
This guide walks through the complete `CREATE TABLE` syntax with working examples, explains how to choose the right engine and sorting keys, and shows how to deploy the same table structure using Tinybird's managed ClickHouse platform.
What you need before creating your first ClickHouse table
To create a table in ClickHouse, you use the `CREATE TABLE` statement with column definitions, data types, and a table engine. The MergeTree engine is the most common choice for general-purpose analytics tables because it handles large datasets efficiently and supports features like primary keys and partitioning.
Before writing your first `CREATE TABLE` statement, you'll need access to a ClickHouse instance, either through a local installation, a cloud service, or a managed platform like Tinybird. You'll also need a database where the table will live, which you can create with `CREATE DATABASE database_name` if one doesn't already exist.
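For example, a minimal sketch (the database name `analytics` is illustrative):

```sql
-- Create the database once; IF NOT EXISTS makes this safe to re-run.
CREATE DATABASE IF NOT EXISTS analytics;

-- Either reference tables as analytics.table_name,
-- or switch the session default:
USE analytics;
```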
Minimal create table statement that always works
The simplest working `CREATE TABLE` statement includes three required components: column definitions with types, a table engine, and an `ORDER BY` clause. Here's a minimal example that creates an events table:
CREATE TABLE events (
event_id UInt64,
timestamp DateTime,
user_id String
)
ENGINE = MergeTree
ORDER BY event_id;
This statement creates a table named `events` with three columns, uses the MergeTree engine for storage, and orders data by `event_id`. You can copy and paste this into any ClickHouse client or console to create your first table.
1. Define columns with types
Each column in a ClickHouse table requires a name and a data type. Common types include `String` for text, `UInt32` or `UInt64` for unsigned integers, `DateTime` for timestamps, and `Float64` for floating-point numbers.
Here's how different column types look in practice:
CREATE TABLE user_events (
user_id UInt32,
email String,
signup_date DateTime,
revenue Float64,
is_active UInt8
)
ENGINE = MergeTree
ORDER BY user_id;
The `UInt8` type works well for boolean flags because it stores values 0 or 1 efficiently. For larger integer values, `UInt32` handles numbers up to about 4.3 billion, while `UInt64` supports even larger ranges.
2. Pick an engine
The table engine determines how data is stored and queried. MergeTree is the default choice for most analytical workloads because it provides fast queries, efficient compression, and support for primary keys and partitioning.
Other engines serve specific purposes:
- MergeTree: General-purpose analytics with fast queries and efficient compression.
- ReplacingMergeTree: Automatic deduplication based on primary key.
- SummingMergeTree: Automatic aggregation of numeric columns during background merges.
- AggregatingMergeTree: Pre-aggregated metrics storage.
For most use cases, starting with MergeTree is the right choice. You can always migrate to a specialized engine later if your requirements change.
3. Add order by
The `ORDER BY` clause is required for MergeTree tables and determines how data is physically sorted on disk. This sorting directly affects query performance because ClickHouse can skip reading irrelevant data blocks when filtering on the ordered columns.
CREATE TABLE page_views (
user_id UInt32,
page_url String,
view_time DateTime
)
ENGINE = MergeTree
ORDER BY (user_id, view_time);
In this example, data is sorted first by `user_id`, then by `view_time` within each user. Queries that filter by `user_id`, or by both `user_id` and `view_time`, will run faster than queries filtering only by `page_url`.
Choose a table engine and why MergeTree is the default
MergeTree serves as the foundation for most ClickHouse tables because it balances performance, flexibility, and features. The engine stores data in sorted parts that are periodically merged in the background, which enables both fast inserts and efficient queries.
Specialized MergeTree variants extend the base engine with additional capabilities. ReplacingMergeTree removes duplicate rows with the same primary key during background merges, which helps maintain unique records without requiring explicit deduplication queries. SummingMergeTree automatically sums numeric columns for rows with identical primary keys, reducing storage and speeding up aggregation queries.
| Engine | Best for | Key behavior |
| --- | --- | --- |
| MergeTree | General analytics, time-series data | Sorted storage with efficient queries |
| ReplacingMergeTree | Event streams with duplicate records | Keeps latest version of duplicate rows |
| SummingMergeTree | Pre-aggregated metrics | Automatically sums numeric columns |
| AggregatingMergeTree | Complex pre-aggregations | Stores partial aggregate states |
For new projects, start with MergeTree unless you have a specific need for deduplication or aggregation. The base engine handles most analytical workloads without additional complexity.
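As a sketch of what a specialized variant looks like in practice, here's a hypothetical ReplacingMergeTree table (the table, columns, and version column are illustrative, not from the examples above):

```sql
-- Rows sharing the same sorting key are collapsed during background
-- merges; the optional version column (updated_at) keeps the latest row.
CREATE TABLE user_profiles (
    user_id UInt64,
    name String,
    updated_at DateTime
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY user_id;

-- Deduplication is eventual; add FINAL to force it at query time.
SELECT * FROM user_profiles FINAL WHERE user_id = 42;
```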
Define order by, primary key, and partition by correctly
The `ORDER BY`, `PRIMARY KEY`, and `PARTITION BY` clauses control how ClickHouse organizes data physically on disk. Understanding the relationship between these three concepts helps you design tables that query efficiently at scale.

The `ORDER BY` clause determines the physical sort order of data within each data part. ClickHouse uses this ordering to skip reading irrelevant blocks during queries, which significantly speeds up filtering and aggregation. By default, the primary key matches the `ORDER BY` expression unless you specify a different `PRIMARY KEY` explicitly.
CREATE TABLE events (
user_id UInt32,
event_type String,
timestamp DateTime,
data String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(timestamp)
ORDER BY (user_id, timestamp)
PRIMARY KEY user_id;
In this example, data is partitioned by month, sorted by both `user_id` and `timestamp`, but the primary key only includes `user_id` (ClickHouse requires the primary key to be a prefix of the `ORDER BY` expression). This configuration works well when most queries filter by `user_id`, since the primary key index will be smaller and faster to scan.
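To confirm that a query actually benefits from the primary key, recent ClickHouse versions can show which index granules are selected with `EXPLAIN` (the filter value here is illustrative):

```sql
EXPLAIN indexes = 1
SELECT count() FROM events WHERE user_id = 42;
```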
Common partition keys for time-series data
Partitioning splits your table into separate physical directories on disk based on the partition key expression. This makes it efficient to drop old data or query recent time ranges without scanning the entire table.
For time-series data, common partition expressions include:
- `toYYYYMM(timestamp)` for monthly partitions
- `toDate(timestamp)` for daily partitions
- `toStartOfWeek(timestamp)` for weekly partitions
- `toYYYYMMDD(timestamp)` for daily partitions with a different format
Monthly partitions balance query performance with the overhead of managing partition files. Daily partitions work better for high-volume tables where you frequently drop old data or query recent days.
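Partitioning is what makes retention cheap. With `PARTITION BY toYYYYMM(timestamp)` as in the example above, dropping a month of data is a near-instant metadata operation (the partition value is illustrative):

```sql
-- Remove all of January 2024 without scanning or rewriting other data.
ALTER TABLE events DROP PARTITION 202401;
```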
Primary key and order by best practices
The primary key determines which columns are indexed for fast lookups, while `ORDER BY` controls the physical sort order. In most cases, both clauses match because the sorted order naturally provides an efficient index.

You might use a different primary key when your `ORDER BY` includes many columns but most queries only filter on the first few. A shorter primary key means a smaller index that fits in memory more easily, which speeds up query planning.
CREATE TABLE logs (
server_id UInt16,
timestamp DateTime64(3),
level String,
message String
)
ENGINE = MergeTree
ORDER BY (server_id, timestamp, level)
PRIMARY KEY server_id;
Choose low-cardinality columns first in your `ORDER BY` expression. Columns like `server_id` or `user_id` that have hundreds or thousands of distinct values work better as leading columns than high-cardinality columns like `timestamp` or `message`.
Variants like if not exists, create table as, and on cluster
The `CREATE TABLE IF NOT EXISTS` syntax prevents errors when running the same DDL statement multiple times. This makes deployment scripts idempotent, so you can safely re-run them without checking whether tables already exist.
CREATE TABLE IF NOT EXISTS metrics (
metric_name String,
value Float64,
recorded_at DateTime
)
ENGINE = MergeTree
ORDER BY (metric_name, recorded_at);
The `CREATE TABLE AS` syntax copies the structure and optionally the data from another table or query result. For copying structure only, use `CREATE TABLE events_backup AS events`. To copy both structure and data, append an `AS SELECT` query; note that this form needs its own engine and sorting key unless a default table engine is configured.
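Sketched out as concrete statements (the `AS SELECT` form is given an explicit engine and sorting key, since it defines a brand-new table from a query result):

```sql
-- Structure and engine settings only, no data:
CREATE TABLE events_backup AS events;

-- Structure and data from a query result:
CREATE TABLE events_2024
ENGINE = MergeTree
ORDER BY event_id
AS SELECT * FROM events WHERE toYear(timestamp) = 2024;
```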
For distributed ClickHouse clusters, the `ON CLUSTER` clause creates the same table definition across all nodes in a named cluster:
CREATE TABLE events ON CLUSTER production_cluster (
event_id UInt64,
timestamp DateTime
)
ENGINE = MergeTree
ORDER BY event_id;
Insert sample data and run a sanity query
After creating a table, inserting sample data verifies that the table structure works as expected. The `INSERT INTO` statement adds rows using either the `VALUES` clause for small datasets or the `FORMAT` clause for bulk loading.
INSERT INTO events (event_id, timestamp, user_id)
VALUES
(1, '2024-01-15 10:30:00', 'user_123'),
(2, '2024-01-15 10:31:00', 'user_456'),
(3, '2024-01-15 10:32:00', 'user_789');
This inserts three rows into the `events` table. For larger datasets, the `FORMAT` clause supports formats like CSV, JSON, or Parquet for efficient bulk loading.
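For example, a CSV file can be bulk-loaded from the command line with clickhouse-client (the file name is illustrative):

```shell
# Stream a local CSV file into the events table.
clickhouse-client --query "INSERT INTO events FORMAT CSV" < events.csv
```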
Insert values syntax
The `VALUES` clause works well for inserting a few rows manually or in application code. Each row appears in parentheses with values matching the column order:
INSERT INTO user_events (user_id, email, signup_date)
VALUES
(1, 'alice@example.com', '2024-01-01'),
(2, 'bob@example.com', '2024-01-02');
For JSON data, the `FORMAT JSONEachRow` clause reads newline-delimited JSON where each line is a separate object:
INSERT INTO events FORMAT JSONEachRow
{"event_id": 1, "timestamp": "2024-01-15 10:30:00", "user_id": "user_123"}
{"event_id": 2, "timestamp": "2024-01-15 10:31:00", "user_id": "user_456"}
Query and verify row count
Running a simple `SELECT` query confirms that your data inserted correctly. Start with `COUNT(*)` to check the total number of rows, then inspect a few actual rows to confirm the data looks correct:
SELECT COUNT(*) FROM events;
SELECT * FROM events ORDER BY event_id LIMIT 3;
Modify the table later with alter table
The `ALTER TABLE` statement changes table structure after creation. This lets you add columns, drop columns, or modify settings without recreating the entire table and reloading all data.
ALTER TABLE events ADD COLUMN session_id String AFTER user_id;
This adds a new `session_id` column positioned after the existing `user_id` column. The `AFTER` clause controls placement, though column order typically doesn't affect query performance in ClickHouse.
Add or remove columns safely
Adding columns is generally safe and fast because ClickHouse stores columns separately on disk. New columns don't require rewriting existing data, so the operation completes almost instantly even on large tables.
ALTER TABLE events ADD COLUMN country String DEFAULT 'US';
ALTER TABLE events DROP COLUMN session_id;
Dropping columns also happens quickly because ClickHouse simply marks the column as deleted without immediately removing the physical files. The actual disk space is reclaimed during the next background merge.
Create a temporary table for ephemeral analysis
Temporary tables exist only for the duration of your session and automatically disappear when you disconnect. This makes them useful for intermediate calculations, testing, or one-time analysis without cluttering your database with permanent tables.
CREATE TEMPORARY TABLE temp_aggregates (
user_id UInt32,
total_events UInt64
)
ENGINE = Memory;
INSERT INTO temp_aggregates
SELECT user_id, COUNT(*) as total_events
FROM events
GROUP BY user_id;
The Memory engine stores data entirely in RAM, which provides very fast reads and writes at the cost of durability.
From DDL to production: the same table in Tinybird
Tinybird is a managed ClickHouse service that handles infrastructure setup, scaling, and operations so you can focus on building features instead of managing clusters. Creating tables in Tinybird uses the same ClickHouse syntax you've learned, but with additional tooling for version control and deployment.
Instead of running `CREATE TABLE` statements directly, Tinybird uses declarative .datasource files that define your table schema as code. This approach makes it easy to track changes in Git, test locally, and deploy to production with confidence.
Write the .datasource file
A `.datasource` file defines your table structure using the same column types and engine options as raw ClickHouse DDL. Here's how the `events` table looks in Tinybird syntax:
SCHEMA >
`event_id` UInt64 `json:$.event_id`,
`timestamp` DateTime `json:$.timestamp`,
`user_id` String `json:$.user_id`
ENGINE "MergeTree"
ENGINE_SORTING_KEY "event_id"
The `SCHEMA` section lists columns and types, with inline JSON path mappings using the `json:$.field_name` syntax to map JSON fields from ingestion APIs to each column. The `ENGINE` and `ENGINE_SORTING_KEY` settings map directly to the `ENGINE` and `ORDER BY` clauses in standard ClickHouse DDL. Save this as `events.datasource` in your project directory.
Deploy and stream data
The Tinybird CLI deploys your data source definitions to the cloud and provides commands for appending data. First, install the CLI and authenticate:
curl -L tinybird.co | sh
tb login
Then deploy your project to create the table:
tb --cloud deploy
This creates the ClickHouse table in your Tinybird workspace. To append data, use the `tb datasource append` command with a file containing sample data. For production workloads, Tinybird's Events API provides a streaming ingestion endpoint that accepts NDJSON over HTTP, letting your application send events in real time without managing Kafka, batch jobs, or other ingestion infrastructure.
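As a sketch, sending a single event to the Events API looks roughly like this (the token is a placeholder and the base URL depends on your workspace region):

```shell
# Append one NDJSON event to the events data source.
curl -X POST "https://api.tinybird.co/v0/events?name=events" \
  -H "Authorization: Bearer $TB_TOKEN" \
  -d '{"event_id": 1, "timestamp": "2024-01-15 10:30:00", "user_id": "user_123"}'
```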
Next steps to build real-time APIs with Tinybird
Creating tables is just the first step. The real value comes from querying that data and exposing results to your application through APIs.
Tinybird turns SQL queries into hosted REST APIs automatically. You write a SQL query in a `.pipe` file, define parameters for filtering or pagination, and deploy it as an endpoint. Your application then calls the API with HTTP requests instead of connecting directly to ClickHouse.
This architecture simplifies your backend code because you don't need to manage database connections, connection pools, or query optimization. Tinybird handles authentication, rate limiting, caching, and scaling so your APIs stay fast even under high concurrency.
To get started, sign up for a free Tinybird account and follow the quickstart guide to create your first data source and API endpoint. The entire process takes about 15 minutes from signup to your first working API.
FAQs about creating tables in ClickHouse
What happens if I omit ORDER BY in ClickHouse?
ClickHouse requires an `ORDER BY` clause for MergeTree tables and returns an error if you omit it. If you genuinely don't need sorting, you can write `ORDER BY tuple()`, but the clause itself is mandatory because it determines how data is physically sorted on disk, which affects both query performance and storage efficiency.
How do I pick between ReplacingMergeTree and SummingMergeTree engines?
Use ReplacingMergeTree when you need to deduplicate rows with the same primary key, keeping only the most recent version. Use SummingMergeTree when you want to automatically sum numeric columns during background merges, which works well for pre-aggregated metrics.
Can I change the table engine after the table is created?
No, you cannot change the engine of an existing table. To switch engines, create a new table with the desired engine and use `INSERT INTO new_table SELECT * FROM old_table` to copy the data.
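A sketch of the full swap, assuming the `events` table from earlier and a move to ReplacingMergeTree:

```sql
-- 1. Create the replacement table with the new engine.
CREATE TABLE events_new (
    event_id UInt64,
    timestamp DateTime,
    user_id String
)
ENGINE = ReplacingMergeTree
ORDER BY event_id;

-- 2. Copy the data.
INSERT INTO events_new SELECT * FROM events;

-- 3. Swap names once the copy is verified.
RENAME TABLE events TO events_old, events_new TO events;
```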
How do I set TTL to automatically expire old partitions?
Add a `TTL` clause to your `CREATE TABLE` statement with a Date or DateTime column and an interval expression. For example, `TTL timestamp + INTERVAL 30 DAY` deletes data older than 30 days automatically during background merges.
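A minimal sketch applying this to the earlier events schema (the table name is illustrative):

```sql
CREATE TABLE events_ttl (
    event_id UInt64,
    timestamp DateTime,
    user_id String
)
ENGINE = MergeTree
ORDER BY event_id
TTL timestamp + INTERVAL 30 DAY;
```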
Is there a way to create the same table across multiple ClickHouse clusters automatically?
Yes, use the `ON CLUSTER` clause in your `CREATE TABLE` statement. This creates identical tables across all nodes in a named cluster configuration, which is useful for distributed or replicated setups.