Schema migrations in ClickHouse® change your table structure. When you're iterating on a ClickHouse® schema in production, things can get dicey. ClickHouse® itself doesn't have built-in handling for live schema migrations, so to avoid data loss you need to plan carefully and choose your tools wisely.
Let's dig into why ClickHouse migrations can corrupt or lose data, how you can employ different tools and strategies, and the specific steps to evolve your schema safely in production without downtime.
What is a ClickHouse schema migration?
A schema migration changes the structure of your database tables while they're running in production. In ClickHouse, you might add columns, change data types, modify table settings, or create indexes. Unlike traditional databases such as PostgreSQL or MySQL, ClickHouse doesn't include built-in migration tools, so you'll use external tools or write custom scripts to manage schema changes.
Schema migrations work differently in ClickHouse because of its columnar storage architecture. The database is optimized for analytical workloads with high data throughput, not transactional updates. What looks like a simple column addition in a row-oriented database can trigger a full table rewrite in ClickHouse, especially when you're streaming hundreds of megabytes per second into your tables.
Why migrations can cause data loss in ClickHouse
Data loss happens when write operations conflict with ongoing DDL commands. ClickHouse processes ALTER TABLE statements as mutations that modify data parts on disk, and if a mutation fails partway through, you can end up with inconsistent table states or corrupted data as mutations cannot be rolled back once started.
The risk gets worse with distributed tables that span multiple nodes. When a schema change runs across a cluster, network failures or memory constraints on individual nodes can cause some replicas to succeed while others fail. ALTER queries coordinate via ZooKeeper and wait for all replicas, but interruptions can cause inconsistent state where different nodes have different schemas.
Active data ingestion makes this worse. If your application keeps writing data while a mutation runs, ClickHouse might reject those writes, queue them indefinitely, or in the worst case, write them to the wrong schema version. Mutations apply only to parts that existed when the mutation was created, so rows inserted mid-migration can land in a different shape than the data being rewritten.
Common causes of data loss:
- Active writes during mutations: New data arrives while the schema is changing
- Distributed table coordination failures: Network issues cause replicas to diverge
- Memory exhaustion: Large alterations exceed available RAM and crash nodes
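One way to spot replica divergence after an interrupted distributed DDL is to compare table definitions across the cluster. Here is a minimal sketch, assuming an illustrative cluster name `my_cluster` and a table `events` in the `default` database:

```sql
-- Compare the CREATE TABLE statement reported by every replica in the cluster
SELECT
    hostName() AS replica,
    create_table_query
FROM clusterAllReplicas('my_cluster', system.tables)
WHERE database = 'default' AND name = 'events';
```

If the query returns more than one distinct `create_table_query`, some replicas did not apply the change and the DDL needs to be replayed before continuing.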
How ClickHouse handles DDL under the hood
ClickHouse stores data in immutable parts on disk. Each part contains a subset of rows organized by columns. When you run a DDL statement, ClickHouse doesn't modify existing parts directly. Instead, it creates a mutation task that processes each part separately, writing new versions with the updated schema.
This part-based architecture means some schema changes only update metadata without touching data files. Adding a nullable column with a default value just updates the table's metadata and doesn't rewrite any parts. Other changes, like converting a column from String to Int64, require reading every part, transforming the data, and writing new parts to disk.
The difference between metadata changes and data mutations determines both speed and risk. Metadata-only changes complete in milliseconds. Mutations that rewrite data can take hours or days on multi-terabyte tables.
Three types of DDL operations:
- Metadata changes: Adding nullable columns or modifying table settings
- Data mutations: Changing column types or adding non-nullable columns
- Full rewrites: Major structural changes that require creating new tables
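To make the distinction concrete, here is a hedged sketch against an illustrative `events` table: the first statement is metadata-only, the second triggers a mutation that rewrites parts, and the final query watches mutation progress:

```sql
-- Metadata-only: completes almost instantly, no parts are rewritten
ALTER TABLE events ADD COLUMN IF NOT EXISTS country Nullable(String);

-- Data mutation: every part containing user_id is rewritten with the new type
ALTER TABLE events MODIFY COLUMN user_id Int64;

-- Track mutation progress while it runs
SELECT mutation_id, command, parts_to_do, is_done
FROM system.mutations
WHERE table = 'events' AND NOT is_done;
```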
Declarative vs imperative migrations for ClickHouse
Imperative migrations use explicit SQL commands executed in a specific sequence. You write ALTER TABLE statements, track which migrations have run, and manually roll back changes if something fails. You have complete control over each step, but must carefully plan for moving from one version to the next.
Declarative migrations work differently. You define the desired end state of your schema in a configuration file, and an external tool calculates the necessary changes to reach that state. The tool handles version tracking, generates migration plans, creates and destroys auxiliary tables, performs backfills and often provides automatic rollback.
Declarative approach with Atlas
Atlas is an open-source schema-as-code tool that supports ClickHouse. You define your tables in HCL or SQL format, and Atlas compares your desired schema against the current database state to generate migration files. Atlas can preview changes before applying them and maintains a migration history for rollbacks.
Declarative approach with Tinybird
Users of Tinybird's managed ClickHouse platform can also perform declarative schema migrations on ClickHouse tables. As the schema is defined in a plaintext .datasource file, a Tinybird user simply needs to define a new schema in that file and deploy it. Tinybird handles all of the underlying complexity in the background (more below).
Imperative SQL scripts
Writing SQL scripts directly gives you the most flexibility for complex migrations. You create numbered migration files like 001_add_user_column.sql and 002_create_events_table.sql, then execute them in order using a bash script or clickhouse-client. This works well for small teams or straightforward schema changes but becomes harder to manage as your schema grows.
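As a rough sketch of what one of those files might contain, assuming a hypothetical `events` table and a simple `schema_migrations` tracking table maintained by the runner script (neither is part of ClickHouse itself):

```sql
-- 001_add_user_column.sql: the migration itself
ALTER TABLE events ADD COLUMN IF NOT EXISTS user_id Nullable(String);

-- Record the applied version so the runner can skip it next time
-- (assumes: CREATE TABLE schema_migrations (version UInt32, applied_at DateTime DEFAULT now())
--  ENGINE = MergeTree ORDER BY version;)
INSERT INTO schema_migrations (version) VALUES (1);
```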
A broader view of ClickHouse schema migration tooling
ClickHouse teams don't all manage schema changes in the same way. Some rely on a simple folder of versioned SQL files; others define a desired end state in configuration and let a tool compute the diff. Some need approvals, audit trails, or environment-specific configs. All of these approaches are valid as long as they respect how ClickHouse executes DDL and maintains replica consistency.

At the core is one principle: treat the schema as code. Define structures in version control, apply them through a controlled promotion path, and record exactly what happened in production. Because ClickHouse lacks built-in migration orchestration, this model is the safest way to evolve tables at scale without breaking ingestion or queries.
General-purpose migration CLIs that support ClickHouse
Several CLI-based migration frameworks now offer native or community drivers for ClickHouse. They follow the classic pattern: a directory with numbered migration scripts, a system table for migration state, and a command that applies anything pending.

golang-migrate is a compact tool used heavily in containerized CI/CD pipelines. It works well for metadata-only ALTER operations like adding nullable columns or creating new tables. It's simple, deterministic, and integrates easily with existing automation. For teams that just need a repeatable runner for ClickHouse DDL, it's a solid fit.

Liquibase takes a declarative YAML/XML approach, expressing changesets as configuration rather than raw SQL, and is widely used across relational databases like PostgreSQL. Each change can be tagged, rolled back, or applied selectively across environments, which helps large organizations keep multiple ClickHouse clusters in sync. Because ClickHouse can turn certain ALTER statements into long-running mutations, Liquibase users typically review any change that may trigger a rewrite manually.

dbmate, Alembic, and similar small tools focus on Git-first workflows: simple SQL migrations run in sequence, tracked in code. They don't attempt to interpret ClickHouse behavior but are excellent when the team already understands part mutations, replication, and non-rollbackable DDL.

Bytebase-style platforms introduce a governance layer. They track who applied what, when, and in which environment, with optional approval flows. For teams under compliance or audit pressure, this gives full visibility and accountability.

Every one of these solutions shares the same objective: make schema changes repeatable, visible, and verifiable. With ClickHouse's immutable-part architecture, that consistency is critical, since a failed mutation on one replica can leave a mixed schema state if changes are not replayed in order.
ClickHouse-focused utilities
Beyond generic tools, a new wave of utilities is built specifically for ClickHouse's columnar design and replicated architecture. They recognize the need to coordinate schema changes across shards, manage MergeTree engines, and rebuild Materialized Views when target tables evolve.

These utilities often support environment-specific configuration, allowing a single logical change to be applied across dev, staging, and prod with cluster-aware parameters. They also make it easier to rerun migrations safely if replicas were unavailable the first time.

Their advantage lies in native awareness of ClickHouse internals: replicated tables, mutation order, dependency handling, and view recreation. They're explicit about ALTER sequencing, ensuring all nodes converge to the same schema version and no replica drifts.
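The cluster awareness these tools wrap is ultimately ClickHouse's own ON CLUSTER machinery. A minimal sketch, assuming an illustrative cluster name `my_cluster` and an `events` table:

```sql
-- Propagate the change to every shard and replica in the named cluster
ALTER TABLE events ON CLUSTER my_cluster
    ADD COLUMN IF NOT EXISTS country Nullable(String);

-- Inspect how the distributed DDL entries progressed on each host
SELECT * FROM system.distributed_ddl_queue;
```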
Choosing the right level of abstraction
You can frame schema migration strategies around three main axes.

1. Automation level

If you prefer to define a final schema and let a tool compute diffs automatically, go declarative. Tools like schema-as-code frameworks inspect the live database, compare it with your desired state, and generate a precise plan. If you need full control of each step, especially for massive tables where an ALTER implies terabytes of rewrite, imperative migrations remain the safest path.

2. Number of environments

Single-cluster setups can thrive with plain SQL runners. Multi-region or multi-stage pipelines benefit from tools that track history in a metadata table and support manual marking of applied migrations. This makes cross-environment synchronization predictable.

3. Integration with existing pipelines

If your CI/CD already uses containerized jobs, adding a CLI like golang-migrate or dbmate fits naturally. If you're managing infrastructure with IaC frameworks, embedding your ClickHouse schema in that workflow maintains consistency between app, data, and infra.

Across all these models, the golden rule remains: ClickHouse schema changes must be deliberate and versioned. Because mutations cannot roll back, planning and tracking every step is the only way to ensure replica safety and operational reliability.
Tool comparison: Atlas, Tinybird, Flyway, and plain SQL
Different migration tools offer varying levels of automation. The right choice depends on your team's ClickHouse experience, deployment frequency, and how much control you want over the migration process.
1. Atlas
Atlas provides declarative schema management with automatic migration planning. It calculates the difference between your current schema and desired state, generates migration files, and tracks which migrations have been applied. Atlas works well for teams that want infrastructure-as-code workflows and need support for multiple database types beyond ClickHouse.
2. Tinybird
Tinybird is a managed ClickHouse service that handles schema migrations automatically as part of its deployment process. When you update a data source definition and run tb --cloud deploy, Tinybird calculates the necessary schema changes and applies them in production. Under the hood this is a complex process that must handle intermediate states under live ingestion load: routing data from old tables through auxiliary tables to new ones, managing the timing of each cutover, and backfilling the final table as needed. Tinybird manages the entire schema migration process automatically, which removes manual migration tasks and step-by-step changes.
3. Flyway
Flyway is a mature migration tool with broad database support, including ClickHouse. It uses versioned SQL files and tracks migration state in a special table. Flyway's main advantage is its widespread adoption and extensive documentation, though it requires more manual configuration than declarative tools.
4. Plain SQL with bash scripts
Custom bash scripts that execute SQL files using clickhouse-client provide more control with more risk. You maintain migration files in version control, write your own state tracking logic, and handle rollbacks manually. This approach has no dependencies but requires significant upfront work to build reliable tooling.
| Tool | Automation | Rollbacks | Learning curve | Best for |
|---|---|---|---|---|
| Atlas | High | Automatic | Medium | Infrastructure-as-code teams |
| Tinybird | Very High | Automatic | Low | Developers building applications |
| Flyway | Medium | Manual | Low | Teams using multiple databases |
| Plain SQL | Low | Custom | High | ClickHouse experts |
Ingestion and Materialized View patterns for resilient schema evolution
Schema migrations are much simpler when data ingestion is decoupled from analytical storage. A proven production pattern in ClickHouse is to separate raw ingestion from structured tables, using a short-lived ingestion layer and Materialized Views that project data into optimized schemas. This architecture absorbs schema changes without blocking writes or losing events.
A flexible ingestion layer
The ingestion layer captures data exactly as produced by upstream systems, often in semi-structured JSON, whether from web services or platforms like the Apache web server. Typical columns include event metadata (device, app, timestamp) and one JSON field containing the full payload. Tables are usually partitioned by month and ordered by timestamp or ID, allowing easy pruning and efficient merges.

This layer provides major benefits:

- It can accept new fields instantly, without altering the schema.
- It provides a safe landing zone for malformed or experimental payloads.
- It supports multiple downstream projections for analytics, metrics, or monitoring.
- It allows re-ingestion or replay from object storage (S3, GCS, Parquet) when logic changes.

By design, this table is transient. Teams often set TTL rules to control size and recompact data aggressively.
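As a concrete sketch of such a landing table, with hypothetical names (`ingest_raw_events`, a raw JSON payload column, monthly partitioning, and a three-month TTL chosen purely for illustration):

```sql
-- Raw ingestion table: accepts events as-is, parsed later by a projecting view
CREATE TABLE ingest_raw_events
(
    ingested_at DateTime DEFAULT now(),
    device      String,
    app         String,
    payload     String   -- full JSON payload, kept unparsed
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(ingested_at)
ORDER BY ingested_at
TTL ingested_at + INTERVAL 3 MONTH;   -- keep the raw layer transient
```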
Materialized Views projecting into stable tables
Materialized Views read from the ingestion table and populate target tables with strict schemas. Within the view, developers can parse JSON, cast data types, assign defaults, and filter rows before insertion. This creates a clean boundary between unstructured input and structured analytics.

When evolving a destination schema, the process is consistent:

1. Pause or drop the current Materialized View.
2. Run ALTER TABLE … ADD COLUMN on the target table, using nullable or default values to avoid full rewrites.
3. Recreate the view to include the new field.
4. Optionally backfill data using INSERT INTO … SELECT for specific partitions.

Ingestion never stops. The pipeline continues to write raw events, and only the projection logic changes. A sketch of these steps follows below.
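This is a minimal sketch of that cycle, assuming the hypothetical `ingest_raw_events` table above, a target table `analytics_events` with columns (ingested_at, device, user_id, country), and a view named `mv_raw_to_analytics`:

```sql
-- 1. Drop the current projection
DROP VIEW IF EXISTS mv_raw_to_analytics;

-- 2. Metadata-only change on the target table
ALTER TABLE analytics_events ADD COLUMN IF NOT EXISTS country Nullable(String);

-- 3. Recreate the projection including the new field
CREATE MATERIALIZED VIEW mv_raw_to_analytics TO analytics_events AS
SELECT
    ingested_at,
    device,
    JSONExtractString(payload, 'user_id') AS user_id,
    JSONExtractString(payload, 'country') AS country
FROM ingest_raw_events;

-- 4. Optionally backfill one partition of raw data that arrived before the change
INSERT INTO analytics_events (ingested_at, device, user_id, country)
SELECT
    ingested_at,
    device,
    JSONExtractString(payload, 'user_id'),
    JSONExtractString(payload, 'country')
FROM ingest_raw_events
WHERE toYYYYMM(ingested_at) = 202401;
```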
Partition-based backfill
For large datasets, partition-based backfills keep operations efficient. Because most ClickHouse tables are partitioned monthly, teams can:

1. Drop or detach one partition at a time.
2. Re-read that month from the ingestion table or archived Parquet files.
3. Reapply transformations and insert the processed data.
4. Validate counts and metrics before proceeding.

This incremental approach avoids cluster-wide rewrites and aligns with ClickHouse's merge and compaction behavior. One such cycle is sketched below.
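A hedged example of a single partition cycle, reusing the hypothetical `ingest_raw_events` and `analytics_events` tables from above and the January 2024 partition as a placeholder:

```sql
-- Remove the partition being rebuilt
ALTER TABLE analytics_events DROP PARTITION 202401;

-- Re-read that month from the raw layer and reapply the transformations
INSERT INTO analytics_events (ingested_at, device, user_id, country)
SELECT
    ingested_at,
    device,
    JSONExtractString(payload, 'user_id'),
    JSONExtractString(payload, 'country')
FROM ingest_raw_events
WHERE toYYYYMM(ingested_at) = 202401;

-- Validate before moving on to the next partition
SELECT count() FROM analytics_events WHERE toYYYYMM(ingested_at) = 202401;
```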
Handling schema drift
In real pipelines, new fields appear unexpectedly. Devices, apps, or services evolve. With the ingestion-projection model, these variations are harmless:

- Ingestion tables accept new fields without modification.
- Teams inspect the data, decide if a new column is needed, then update the projection.
- They reprocess only affected partitions, leaving the rest untouched.

This enables incremental evolution. You see real data first, then commit the change when it's justified.
Operational best practices
Materialized Views bring flexibility but require discipline:

- Limit view count. Too many views on a fast ingestion table can cause "Too many parts" errors.
- Coordinate DDL and view changes. Since ClickHouse DDL isn't transactional, perform combined changes in controlled windows.
- Maintain ordering. When re-inserting data, preserve partition order and sorting keys to sustain merge efficiency.
- Respect TTL timelines. Don't backfill after the raw data expires.

Even with these caveats, the separation of ingestion and analytics provides enormous stability. You can upgrade schemas live, replay history, and deploy iterative model updates without interrupting ingestion.
Continuous schema evolution
Once this layered architecture is in place, declarative and operational processes align perfectly. The schema-as-code layer tracks versioned definitions, while the ingestion buffer provides a safety valve for continuous change. Together they enable:

- Uninterrupted ingestion during DDL updates.
- Controlled backfills and partial replays.
- Consistent validation of every applied change.
- Auditable, reproducible schema evolution across environments.

By combining these principles, ClickHouse users can maintain high-velocity data pipelines, stable production schemas, and zero-downtime releases, even as data formats and upstream producers evolve daily.
Five steps to run zero-downtime migrations
Zero-downtime migrations let you change your schema without interrupting data ingestion or query traffic. This approach takes longer than direct alterations but eliminates the risk of rejected writes or query failures during the migration.
1. Audit existing DDL in version control
Before making any changes, capture your current schema state in SQL files and commit them to version control. Run SHOW CREATE TABLE for each table you plan to modify and save the output. This baseline lets you track what changed and provides a rollback point if the migration fails.
2. Create auxiliary or "shadow" tables
Build a new table with your desired schema alongside the existing production table. Name it something like events_new or events_v2 so it's clear this is a migration target. Shadow tables let you test the new schema with real data before switching traffic.
3. Dual-write with materialized views
Set up a materialized view that copies incoming data from your original table to the shadow table. This keeps both tables in sync during the migration without modifying your application code. The view's SELECT can also transform rows to match the new schema as they arrive.
```sql
-- Forward every new insert on the original table into the shadow table.
-- A materialized view only sees rows inserted after it is created,
-- which is why the backfill step below is still required.
CREATE MATERIALIZED VIEW events_to_events_new
TO events_new
AS SELECT * FROM events;
```
4. Backfill historical data
Copy existing data from the original table to the shadow table using INSERT INTO ... SELECT. For large tables, do this in batches based on date ranges or primary key values to avoid overwhelming your cluster. Monitor query performance during backfill to make sure it doesn't impact production traffic.
```sql
-- Backfill one date range at a time to limit load on the cluster
INSERT INTO events_new
SELECT * FROM events
WHERE timestamp >= '2024-01-01' AND timestamp < '2024-02-01';
```
5. Atomic swap and cleanup
Once the shadow table contains all historical data and the materialized view is keeping it current, swap the tables in a single atomic operation. ClickHouse's EXCHANGE TABLES command (supported on Atomic databases) swaps the two table names atomically, so queries never see an intermediate state. After the swap, drop the materialized view and then the old table.
```sql
EXCHANGE TABLES events AND events_new;

-- Drop the dual-write view first so new inserts stop targeting the old table,
-- then drop the old table (which now lives under the events_new name)
DROP VIEW events_to_events_new;
DROP TABLE events_new;
```
CI/CD workflow with versioned migrations
Integrating schema migrations into your deployment pipeline prevents manual errors and makes sure changes are tested before reaching production. A good CI/CD workflow validates migrations in development, tests them against production-like data, and provides automatic rollback if issues arise.
1. Validate in pull requests
Run migration tools in dry-run mode during code review to preview the SQL commands that will execute. Tools like Atlas can generate a migration plan and show you exactly which ALTER TABLE statements will run. This lets reviewers catch potential issues before merge.
2. Test in staging environment
Apply migrations to a staging cluster that mirrors production data volume and query patterns. Run your application's test suite against the staging environment to verify that schema changes don't break existing queries or data pipelines. Load testing during this phase catches performance regressions before they affect users.
3. Deploy with monitoring and rollback triggers
When deploying to production, monitor key metrics like query latency, error rates, and ingestion throughput. Set up automatic rollback triggers that revert the migration if these metrics cross predefined thresholds. This safety net catches issues that didn't appear in staging due to differences in traffic patterns or data distribution.
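What the triggers look like depends on your observability stack, but the signals themselves are available inside ClickHouse. A hedged sketch of a check you might run on a schedule, using the built-in system.query_log table (the five-minute window is illustrative):

```sql
-- Approximate p95 latency and error rate over the last 5 minutes
SELECT
    quantile(0.95)(query_duration_ms)     AS p95_latency_ms,
    countIf(exception != '') / count()    AS error_rate
FROM system.query_log
WHERE event_time > now() - INTERVAL 5 MINUTE
  AND type IN ('QueryFinish', 'ExceptionWhileProcessing');
```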
Use Tinybird for automatic schema migrations in prod
Tinybird handles schema migrations automatically through its data source versioning system. When you modify a .datasource file and run tb deploy, Tinybird analyzes the schema change and determines whether it can apply the change in place or needs to create a new table version.
For backward-compatible changes like adding nullable columns, Tinybird updates the schema without interrupting data ingestion. For breaking changes, Tinybird creates a new table version, migrates existing data, and updates all dependent pipes to use the new version. This happens transparently without manual DDL scripts or migration tools.
The Tinybird CLI includes a local development environment where you can test schema changes against sample data before deploying to production. Run tb local start to spin up a local ClickHouse instance, make your schema changes, and verify that queries still work as expected. When you're ready, tb --cloud deploy pushes changes to your production workspace with automatic validation and rollback capabilities.
To get started with Tinybird's managed ClickHouse so you can build without worrying about complex schema changes, sign up for a free account. You can go from zero to a production API backed by ClickHouse in under an hour, without writing any migration scripts or managing cluster infrastructure.
Frequently asked questions about ClickHouse schema migrations
What ClickHouse versions support online ALTER operations?
Online ALTER TABLE operations became stable in ClickHouse version 20.3 and have improved significantly in releases after 21.8. Recent versions handle mutations more efficiently and provide better progress tracking, but the fundamental behavior of part-based rewrites remains the same across versions.
Can I rename a column without rewriting data in ClickHouse?
Recent ClickHouse versions support ALTER TABLE ... RENAME COLUMN, which renames the column's on-disk files rather than rewriting the table's data, so a plain rename is cheap. On older releases, or when you need to change the column's type at the same time, the safer pattern is to add a new column with the desired name, copy data from the old column using an ALTER TABLE ... UPDATE mutation or a materialized view, then drop the old column once the migration is complete.
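A minimal sketch of both paths, assuming an illustrative `events` table with a column `user` being renamed to `user_id`:

```sql
-- Lightweight rename on recent ClickHouse versions
ALTER TABLE events RENAME COLUMN user TO user_id;

-- Fallback pattern: add, copy, drop
ALTER TABLE events ADD COLUMN IF NOT EXISTS user_id String;
ALTER TABLE events UPDATE user_id = user WHERE 1;   -- runs as a mutation
ALTER TABLE events DROP COLUMN user;                -- only after the mutation finishes
```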
How do I estimate migration time on large ClickHouse tables?
Migration duration depends more on data size in bytes and disk I/O speed than row count. A rough estimate is to divide your table's compressed size by your disk's sequential write speed. A 1TB table on NVMe SSDs writing at 2GB/s takes around 8-10 minutes, but add overhead for reading the original data and CPU time for any data transformations.
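To get the size figure for that estimate, you can query the system.parts table; a small sketch, with the database and table names as placeholders:

```sql
-- Compressed and uncompressed size of the active parts you are about to mutate
SELECT
    formatReadableSize(sum(data_compressed_bytes))   AS compressed,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed
FROM system.parts
WHERE active AND database = 'default' AND table = 'events';
```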
Is it safer to create a new table and INSERT SELECT for major changes?
Yes, the shadow table approach with CREATE TABLE and INSERT INTO ... SELECT gives you better control and easier rollback than in-place ALTER TABLE statements. You can validate the new table's data before switching traffic, and if something goes wrong, your original table remains untouched.
What happens if a ClickHouse migration fails partway through?
Failed mutations leave your table in a consistent state because ClickHouse processes each part independently. The parts that completed the mutation contain the new schema, while unprocessed parts keep the old schema. You can retry the mutation, and ClickHouse will skip parts that already completed. However, you may need to manually clean up temporary files in the detached directory if the failure was due to disk space or memory issues.
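To see what state a failed mutation left behind, and to cancel one that can never succeed, you can use system.mutations and KILL MUTATION. A sketch with illustrative database, table, and mutation_id values:

```sql
-- Inspect unfinished mutations and their last failure reason
SELECT mutation_id, command, parts_to_do, latest_fail_reason
FROM system.mutations
WHERE table = 'events' AND NOT is_done;

-- Cancel a mutation that cannot complete (mutation_id is a placeholder)
KILL MUTATION WHERE database = 'default' AND table = 'events' AND mutation_id = 'mutation_42.txt';
```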
