Schema migrations in ClickHouse change your table structure. When you're iterating on a ClickHouse schema in production, things can get dicey. ClickHouse itself doesn't have built-in handling for live schema migrations, so to avoid data loss in production you need to plan carefully and choose your tools wisely.
Let's dig into why ClickHouse migrations can corrupt or lose data, how you can employ different tools and strategies, and the specific steps to evolve your schema safely in production without downtime.
What is a ClickHouse schema migration?
A schema migration changes the structure of your database tables while they're running in production. In ClickHouse, you might add columns, change data types, modify table settings, or create indexes. Unlike traditional databases such as PostgreSQL or MySQL, ClickHouse doesn't include built-in migration tools, so you'll use external tools or write custom scripts to manage schema changes.
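For concreteness, here are a few typical schema-change statements run against a hypothetical events table (the table, column, and index names are illustrative):

-- Add a column; in ClickHouse this is usually a cheap, metadata-only change
ALTER TABLE events ADD COLUMN IF NOT EXISTS country LowCardinality(String) DEFAULT '';

-- Change a column's type; this rewrites existing data parts
ALTER TABLE events MODIFY COLUMN user_id UInt64;

-- Add a data-skipping index to speed up filtered queries
ALTER TABLE events ADD INDEX idx_country country TYPE set(100) GRANULARITY 4;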
Schema migrations work differently in ClickHouse because of its columnar storage architecture. The database is optimized for analytical workloads with high data throughput, not transactional updates. A change that looks trivial in a row-oriented database, such as converting a column's type, can trigger a full table rewrite in ClickHouse, especially when you're streaming hundreds of megabytes per second into your tables.
Why migrations can cause data loss in ClickHouse
Data loss happens when write operations conflict with ongoing DDL commands. ClickHouse processes ALTER TABLE statements as mutations that modify data parts on disk. Mutations cannot be rolled back once started, so if one fails partway through, you can end up with inconsistent table states or corrupted data.
The risk gets worse with distributed tables that span multiple nodes. When a schema change runs across a cluster, network failures or memory constraints on individual nodes can cause some replicas to succeed while others fail. ALTER queries coordinate via ZooKeeper and wait for all replicas, but interruptions can leave an inconsistent state where different nodes have different schemas.
Active data ingestion compounds the problem. If your application keeps writing data while a mutation runs, ClickHouse might reject those writes, queue them indefinitely, or, in the worst case, write them against the wrong schema version. Mutations apply only to existing parts; new inserts are not rewritten by the running mutation.
Common causes of data loss:
- Active writes during mutations: New data arrives while the schema is changing
- Distributed table coordination failures: Network issues cause replicas to diverge
- Memory exhaustion: Large alterations exceed available RAM and crash nodes
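One way to catch these problems early is to watch the system.mutations table, which tracks every in-flight and failed mutation. A minimal monitoring query:

SELECT table, mutation_id, command, parts_to_do, is_done, latest_fail_reason
FROM system.mutations
WHERE is_done = 0 OR latest_fail_reason != ''
ORDER BY create_time DESC;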
How ClickHouse handles DDL under the hood
ClickHouse stores data in immutable parts on disk. Each part contains a subset of rows organized by columns. When you run a DDL statement, ClickHouse doesn't modify existing parts directly. Instead, it creates a mutation task that processes each part separately, writing new versions with the updated schema.
This part-based architecture means some schema changes only update metadata without touching data files. Adding a nullable column with a default value just updates the table's metadata and doesn't rewrite any parts. Other changes, like converting a column from String to Int64, require reading every part, transforming the data, and writing new parts to disk.
The difference between metadata changes and data mutations determines both speed and risk. Metadata-only changes complete in milliseconds. Mutations that rewrite data can take hours or days on multi-terabyte tables.
Three types of DDL operations:
- Metadata changes: Adding nullable columns or modifying table settings
- Data mutations: Changing column types or adding non-nullable columns
- Full rewrites: Major structural changes that require creating new tables
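As a rough sketch, here is what each class can look like in SQL against a hypothetical events table (exact behavior varies by ClickHouse version and table engine):

-- Metadata change: completes almost instantly
ALTER TABLE events ADD COLUMN session_id Nullable(String);

-- Data mutation: rewrites every part that contains the column
ALTER TABLE events MODIFY COLUMN revenue Decimal(18, 2);

-- Full rewrite: changing the sorting key means creating a new table and copying
-- data with INSERT INTO ... SELECT (see the shadow-table steps later in this article)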
Declarative vs imperative migrations for ClickHouse
Imperative migrations use explicit SQL commands executed in a specific sequence. You write ALTER TABLE statements, track which migrations have run, and manually roll back changes if something fails. You get complete control over each step, but you have to carefully plan every move from one schema version to the next.
Declarative migrations work differently. You define the desired end state of your schema in a configuration file, and an external tool calculates the necessary changes to reach that state. The tool handles version tracking, generates migration plans, creates and destroys auxiliary tables, performs backfills and often provides automatic rollback.
Declarative approach with Atlas
Atlas is an open-source schema-as-code tool that supports ClickHouse. You define your tables in HCL or SQL format, and Atlas compares your desired schema against the current database state to generate migration files. Atlas can preview changes before applying them and maintains a migration history for rollbacks.
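With Atlas you can express the desired state as plain SQL and let the tool diff it against the live database. A minimal sketch of such a desired-state file, assuming a file named schema.sql and an illustrative events table (neither is an Atlas requirement):

-- schema.sql: the schema you want to end up with
CREATE TABLE events
(
    timestamp DateTime,
    user_id UInt64,
    event LowCardinality(String),
    country LowCardinality(String) DEFAULT ''
)
ENGINE = MergeTree
ORDER BY (user_id, timestamp);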
Declarative approach with Tinybird
Users of Tinybird's managed ClickHouse platform can also perform declarative schema migrations on ClickHouse tables. Because the schema is defined in a plaintext .datasource file, a Tinybird user simply defines the new schema in that file and deploys it. Tinybird handles all of the underlying complexity in the background (more below).
Imperative SQL scripts
Writing SQL scripts directly gives you the most flexibility for complex migrations. You create numbered migration files like 001_add_user_column.sql and 002_create_events_table.sql, then execute them in order using a bash script or clickhouse-client. This works well for small teams or straightforward schema changes but becomes harder to manage as your schema grows.
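A numbered migration file is just plain DDL. The 001_add_user_column.sql mentioned above might contain nothing more than this (the column name and default are illustrative):

-- 001_add_user_column.sql
ALTER TABLE events ADD COLUMN IF NOT EXISTS user_id String DEFAULT '';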
Tool comparison: Atlas, Tinybird, Flyway, and plain SQL
Different migration tools offer varying levels of automation. The right choice depends on your team's ClickHouse experience, deployment frequency, and how much control you want over the migration process.
1. Atlas
Atlas provides declarative schema management with automatic migration planning. It calculates the difference between your current schema and desired state, generates migration files, and tracks which migrations have been applied. Atlas works well for teams that want infrastructure-as-code workflows and need support for multiple database types beyond ClickHouse.
2. Tinybird
Tinybird is a managed ClickHouse service that handles schema migrations automatically as part of its deployment process. When you update a data source definition and run tb --cloud deploy, Tinybird calculates the necessary schema changes and applies them in production. Under the hood this is a complex process: Tinybird has to handle intermediate states under live ingestion load, route incoming data from the old table through auxiliary tables to the new one, manage the timing of each re-route, and backfill the final table as needed. Because Tinybird manages the entire migration automatically, there are no manual migration scripts or step-by-step changes to maintain.
3. Flyway
Flyway is a mature migration tool with broad database support, including ClickHouse. It uses versioned SQL files and tracks migration state in a special table. Flyway's main advantage is its widespread adoption and extensive documentation, though it requires more manual configuration than declarative tools.
4. Plain SQL with bash scripts
Custom bash scripts that execute SQL files using clickhouse-client give you the most control and the most risk. You maintain migration files in version control, write your own state-tracking logic, and handle rollbacks manually. This approach has no dependencies but requires significant upfront work to build reliable tooling.
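If you go this route, the main thing you have to build yourself is state tracking. One common sketch is a small bookkeeping table that records which files have been applied; the table name and columns below are an assumption, not a standard:

-- Track which migration files have already run
CREATE TABLE IF NOT EXISTS schema_migrations
(
    version UInt32,
    name String,
    applied_at DateTime DEFAULT now()
)
ENGINE = MergeTree
ORDER BY version;

-- The runner script inserts a row after each successful migration
INSERT INTO schema_migrations (version, name) VALUES (1, '001_add_user_column.sql');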
| Tool | Automation | Rollbacks | Learning curve | Best for |
|---|---|---|---|---|
| Atlas | High | Automatic | Medium | Infrastructure-as-code teams |
| Tinybird | Very high | Automatic | Low | Developers building applications |
| Flyway | Medium | Manual | Low | Teams using multiple databases |
| Plain SQL | Low | Custom | High | ClickHouse experts |
Five steps to run zero-downtime migrations
Zero-downtime migrations let you change your schema without interrupting data ingestion or query traffic. This approach takes longer than direct alterations but eliminates the risk of rejected writes or query failures during the migration.
1. Audit existing DDL in version control
Before making any changes, capture your current schema state in SQL files and commit them to version control. Run SHOW CREATE TABLE for each table you plan to modify and save the output. This baseline lets you track what changed and provides a rollback point if the migration fails.
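You can also dump every table's DDL in one pass rather than running SHOW CREATE TABLE one table at a time, since system.tables stores each table's CREATE statement:

SELECT name, create_table_query
FROM system.tables
WHERE database = currentDatabase()
ORDER BY name;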
2. Create auxiliary or "shadow" tables
Build a new table with your desired schema alongside the existing production table. Name it something like events_new or events_v2 so it's clear this is a migration target. Shadow tables let you test the new schema with real data before switching traffic.
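A shadow table is just a regular CREATE TABLE with the target schema. For example, if the migration converts user_id from String to UInt64 (the columns here are illustrative):

CREATE TABLE events_new
(
    timestamp DateTime,
    user_id UInt64,  -- was String in the original events table
    event LowCardinality(String)
)
ENGINE = MergeTree
ORDER BY (user_id, timestamp);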
3. Dual-write with materialized views
Set up a materialized view that copies incoming data from your original table to the shadow table. This keeps both tables in sync during the migration without modifying your application code. If the new schema needs any transformation, such as a cast or a renamed column, express it in the view's SELECT so rows are converted as they arrive.
CREATE MATERIALIZED VIEW events_to_events_new
TO events_new
AS SELECT * FROM events;
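If the schemas differ, SELECT * won't produce rows that match the shadow table. Continuing the hypothetical String-to-UInt64 change from the previous step, the view would instead cast the column explicitly:

CREATE MATERIALIZED VIEW events_to_events_new
TO events_new
AS SELECT
    timestamp,
    toUInt64OrZero(user_id) AS user_id,  -- convert the old String column to the new type
    event
FROM events;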
4. Backfill historical data
Copy existing data from the original table to the shadow table using INSERT INTO ... SELECT. For large tables, do this in batches based on date ranges or primary key values to avoid overwhelming your cluster. Monitor query performance during the backfill to make sure it doesn't impact production traffic.
INSERT INTO events_new
SELECT * FROM events
WHERE timestamp >= '2024-01-01' AND timestamp < '2024-02-01';
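After each batch, verify that row counts line up between the two tables before moving on. A simple check, bucketed to match the batch above:

SELECT toStartOfMonth(timestamp) AS month, count() AS rows
FROM events
WHERE timestamp >= '2024-01-01' AND timestamp < '2024-02-01'
GROUP BY month;

-- Run the same query against events_new and compare the counts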
5. Atomic swap and cleanup
Once the shadow table contains all historical data and the materialized view is keeping it current, swap the table names in a single atomic operation. ClickHouse's EXCHANGE TABLES command swaps table names atomically, so queries never see an intermediate state. After the swap, drop the materialized view and the old table.
EXCHANGE TABLES events AND events_new;
-- Drop the dual-write view first so it stops copying into the old table
DROP VIEW events_to_events_new;
-- After the swap, the old production data now lives under the events_new name
DROP TABLE events_new;
CI/CD workflow with versioned migrations
Integrating schema migrations into your deployment pipeline prevents manual errors and makes sure changes are tested before reaching production. A good CI/CD workflow validates migrations in development, tests them against production-like data, and provides automatic rollback if issues arise.
1. Validate in pull requests
Run migration tools in dry-run mode during code review to preview the SQL commands that will execute. Tools like Atlas can generate a migration plan and show you exactly which ALTER TABLE statements will run. This lets reviewers catch potential issues before merge.
2. Test in staging environment
Apply migrations to a staging cluster that mirrors production data volume and query patterns. Run your application's test suite against the staging environment to verify that schema changes don't break existing queries or data pipelines. Load testing during this phase catches performance regressions before they affect users.
3. Deploy with monitoring and rollback triggers
When deploying to production, monitor key metrics like query latency, error rates, and ingestion throughput. Set up automatic rollback triggers that revert the migration if these metrics cross predefined thresholds. This safety net catches issues that didn't appear in staging due to differences in traffic patterns or data distribution.
Use Tinybird for automatic schema migrations in prod
Tinybird handles schema migrations automatically through its data source versioning system. When you modify a .datasource file and run tb deploy, Tinybird analyzes the schema change and determines whether it can apply the change in place or needs to create a new table version.
For backward-compatible changes like adding nullable columns, Tinybird updates the schema without interrupting data ingestion. For breaking changes, Tinybird creates a new table version, migrates existing data, and updates all dependent pipes to use the new version. This happens transparently without manual DDL scripts or migration tools.
The Tinybird CLI includes a local development environment where you can test schema changes against sample data before deploying to production. Run tb local start to spin up a local ClickHouse instance, make your schema changes, and verify that queries still work as expected. When you're ready, tb --cloud deploy pushes changes to your production workspace with automatic validation and rollback capabilities.
To get started with Tinybird's managed ClickHouse so you can build without worrying about complex schema changes, sign up for a free account. You can go from zero to a production API backed by ClickHouse in under an hour, without writing any migration scripts or managing cluster infrastructure.
Frequently asked questions about ClickHouse schema migrations
What ClickHouse versions support online ALTER operations?
Online ALTER TABLE operations became stable in ClickHouse version 20.3 and have improved significantly in releases after 21.8. Recent versions handle mutations more efficiently and provide better progress tracking, but the fundamental behavior of part-based rewrites remains the same across versions.
Can I rename a column without rewriting data in ClickHouse?
Recent ClickHouse versions support ALTER TABLE ... RENAME COLUMN, which renames a column without rewriting its data. If that's not available on your version, or you need to change the column's type at the same time, add a new column with the desired name, copy data from the old column using an ALTER TABLE ... UPDATE mutation or a materialized view, then drop the old column once the migration is complete.
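A sketch of both options, with illustrative column names; note that the copy step is itself a mutation and rewrites the affected parts:

-- On recent versions, a direct rename avoids the copy entirely
ALTER TABLE events RENAME COLUMN user_id TO user_identifier;

-- Fallback: add, copy, then drop
ALTER TABLE events ADD COLUMN user_identifier String DEFAULT '';
ALTER TABLE events UPDATE user_identifier = user_id WHERE 1;  -- runs as a mutation
ALTER TABLE events DROP COLUMN user_id;  -- only after the mutation finishes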
How do I estimate migration time on large ClickHouse tables?
Migration duration depends more on data size in bytes and disk I/O speed than row count. A rough estimate is to divide your table's compressed size by your disk's sequential write speed. A 1TB table on NVMe SSDs writing at 2GB/s takes around 8-10 minutes, but add overhead for reading the original data and CPU time for any data transformations.
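To get the compressed size this estimate is based on, sum the table's active parts in system.parts (swap in your own database and table names):

SELECT
    formatReadableSize(sum(data_compressed_bytes)) AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed
FROM system.parts
WHERE database = currentDatabase()
  AND table = 'events'
  AND active;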
Is it safer to create a new table and INSERT SELECT for major changes?
Yes, the shadow table approach with CREATE TABLE and INSERT INTO ... SELECT gives you better control and an easier rollback than in-place ALTER TABLE statements. You can validate the new table's data before switching traffic, and if something goes wrong, your original table remains untouched.
What happens if a ClickHouse migration fails halfway through completion?
A failed mutation doesn't corrupt the data already on disk, because ClickHouse processes each part independently: parts that completed the mutation have the new schema, while unprocessed parts keep the old one. You can retry the mutation, and ClickHouse will skip parts that were already rewritten. However, you may need to manually clean up temporary files in the detached directory if the failure was due to disk space or memory issues.
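When a mutation is stuck rather than cleanly failed, you can inspect it in system.mutations and, if needed, cancel it with KILL MUTATION before retrying (the table name is illustrative):

-- Find the stuck mutation and its failure reason
SELECT mutation_id, command, latest_fail_reason
FROM system.mutations
WHERE table = 'events' AND is_done = 0;

-- Cancel it; parts that were already rewritten keep the new schema
KILL MUTATION WHERE table = 'events' AND mutation_id = 'mutation_3.txt';  -- use the id from the query above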