When you insert updated records into ClickHouse, you often want to keep only the latest version of each row rather than accumulating duplicates. The ReplacingMergeTree table engine handles this automatically by removing duplicate rows during background merge operations, keeping either the most recently inserted row or the one with the highest version number.
This article explains how ReplacingMergeTree works, when to use it, and how to configure it correctly with working examples for common patterns like CDC ingestion, user state tracking, and streaming data deduplication.
What ReplacingMergeTree is and when to use it
ReplacingMergeTree is a ClickHouse table engine that removes duplicate rows with the same sorting key during background merge operations. The engine keeps only one row from each group of duplicates, typically the row with the highest version number or the most recently inserted row.
You'll find this engine useful when maintaining current state tables where each record represents the latest version of an entity. For example, if you're tracking user profiles and receive updates throughout the day, ReplacingMergeTree automatically keeps only the newest profile data for each user. The same pattern works well for product catalogs, order statuses, or any scenario where you care about current state rather than full history.
The deduplication happens during merge operations, not when you insert data. This means you might see duplicate rows in query results until ClickHouse completes a merge in the background.
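As a quick illustration, assuming the `user_activity` table created later in this article, both of these queries are valid but can return different results until a merge runs:

```sql
-- Two inserted rows share the same sorting key
SELECT count() FROM user_activity WHERE user_id = 42;        -- may return 2
SELECT count() FROM user_activity FINAL WHERE user_id = 42;  -- always returns 1
```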
How row versioning and deduplication work
When ClickHouse merges data parts in a ReplacingMergeTree table, it looks for rows with identical values in all columns specified by the `ORDER BY` clause. For each group of matching rows, the engine picks one to keep and discards the rest.
The engine decides which row to keep using one of two methods:
- Version column: If you specify a version column when creating the table, ClickHouse keeps the row with the highest value in that column
- Insertion order: Without a version column, the engine keeps the last inserted row based on when data arrived
The timing of merges depends on ClickHouse's internal scheduling and how you insert data. Smaller, more frequent inserts create more parts that need merging, while larger batch inserts create fewer parts.
Create table syntax with minimal and advanced parameters
The basic syntax for creating a ReplacingMergeTree table uses `ENGINE = ReplacingMergeTree()` with an `ORDER BY` clause that defines your deduplication key.
Minimal example
The simplest ReplacingMergeTree table uses only the `ORDER BY` clause to define which columns identify unique rows:
```sql
CREATE TABLE user_activity (
    user_id UInt64,
    activity_type String,
    event_time DateTime,
    session_id String
) ENGINE = ReplacingMergeTree()
ORDER BY (user_id, activity_type);
```
This table deduplicates based on the combination of `user_id` and `activity_type`. When multiple rows have the same values for both columns, ClickHouse keeps the last inserted row during merges.
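A quick sketch of that behavior, with illustrative values. `OPTIMIZE TABLE ... FINAL` forces the merge so the result is visible immediately:

```sql
INSERT INTO user_activity VALUES (42, 'login', '2024-01-01 10:00:00', 'sess_a');
INSERT INTO user_activity VALUES (42, 'login', '2024-01-01 11:00:00', 'sess_b');

-- Force a merge so deduplication happens now instead of in the background
OPTIMIZE TABLE user_activity FINAL;

-- One row remains: the last inserted one (session_id = 'sess_b')
SELECT * FROM user_activity WHERE user_id = 42;
```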
Adding a version column
For more control over which row to keep, you can specify a version column as a parameter to the engine:
```sql
CREATE TABLE user_activity (
    user_id UInt64,
    activity_type String,
    event_time DateTime,
    session_id String,
    version UInt64
) ENGINE = ReplacingMergeTree(version)
ORDER BY (user_id, activity_type);
```
Now ClickHouse keeps the row with the highest `version` value for each unique combination of `user_id` and `activity_type`. This works well when you have explicit version numbers from your source system, like CDC sequence numbers from PostgreSQL.
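For example (illustrative values), the row with the higher version wins even when it was inserted first:

```sql
INSERT INTO user_activity VALUES (7, 'purchase', '2024-01-01 12:00:00', 'sess_x', 5);
INSERT INTO user_activity VALUES (7, 'purchase', '2024-01-01 12:05:00', 'sess_y', 3);

OPTIMIZE TABLE user_activity FINAL;

-- Keeps the version = 5 row ('sess_x'), even though it arrived first
SELECT * FROM user_activity WHERE user_id = 7;
```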
Using an is_deleted column for deletes
For scenarios where you want to mark rows as deleted rather than physically removing them, ReplacingMergeTree supports an optional `is_deleted` column as a second engine parameter:
```sql
CREATE TABLE user_activity (
    user_id UInt64,
    activity_type String,
    event_time DateTime,
    session_id String,
    version UInt64,
    is_deleted UInt8
) ENGINE = ReplacingMergeTree(version, is_deleted)
ORDER BY (user_id, activity_type);
```
The `is_deleted` column uses `0` for active rows and `1` for deleted rows. When the row with the highest version for a key is marked deleted, recent ClickHouse versions exclude that key from `FINAL` query results, and `OPTIMIZE TABLE ... FINAL CLEANUP` physically removes the rows during the merge (the `CLEANUP` clause is gated behind a setting in some versions).
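A sketch of marking a delete (values illustrative):

```sql
-- Insert a tombstone with a higher version to mark the key as deleted
INSERT INTO user_activity VALUES (7, 'purchase', now(), 'sess_x', 6, 1);

-- On older versions, add WHERE is_deleted = 0 explicitly to filter tombstones
SELECT * FROM user_activity FINAL WHERE user_id = 7;
```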
Choosing the right ORDER BY and version columns
The `ORDER BY` clause in a ReplacingMergeTree table serves two purposes: it defines the physical sorting of data on disk and acts as the deduplication key. Choosing the right columns for this clause directly affects both query performance and deduplication behavior.
Single primary key
For tables where each row represents a unique entity, use a single column in your `ORDER BY` clause:
```sql
CREATE TABLE user_profiles (
    user_id UInt64,
    email String,
    name String,
    updated_at DateTime,
    version UInt64
) ENGINE = ReplacingMergeTree(version)
ORDER BY user_id;
```
This pattern works well for maintaining current state tables where each user, product, or order has exactly one active record at any time.
Composite key
When your business logic requires multiple columns to identify a unique row, use a composite key:
```sql
CREATE TABLE user_sessions (
    user_id UInt64,
    session_id String,
    start_time DateTime,
    end_time DateTime,
    version UInt64
) ENGINE = ReplacingMergeTree(version)
ORDER BY (user_id, session_id);
```
The order of columns in the composite key matters for query performance. Place lower-cardinality columns before higher-cardinality ones, following the same principles as any MergeTree table.
High-cardinality caution
Avoid using high-cardinality columns like timestamps or UUIDs in your `ORDER BY` clause. A high-cardinality column is one with many unique values, like a timestamp that changes every second or a UUID that's different for every row. Using columns like this in the `ORDER BY` clause creates too many unique combinations, which prevents effective deduplication and slows down merge performance.
Instead, use stable identifiers that truly represent your business key, and store high-cardinality values as regular columns.
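A sketch of the anti-pattern and its fix (the `_v2` table name is just for illustration):

```sql
-- Anti-pattern: event_time in the sorting key makes every row unique,
-- so nothing ever deduplicates
-- ORDER BY (user_id, activity_type, event_time)

-- Better: keep the stable business key in ORDER BY and store the
-- timestamp as a regular column
CREATE TABLE user_activity_v2 (
    user_id UInt64,
    activity_type String,
    event_time DateTime,
    version UInt64
) ENGINE = ReplacingMergeTree(version)
ORDER BY (user_id, activity_type);
```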
Querying the latest row state without surprises
A regular `SELECT` query against a ReplacingMergeTree table returns all rows, including duplicates, until ClickHouse completes background merge operations. You'll see duplicate rows in your results because ClickHouse stores data in multiple parts on disk, and deduplication only happens when those parts merge together.
SELECT FINAL
The `FINAL` modifier forces ClickHouse to deduplicate rows at query time, guaranteeing you see only the latest version of each row:
```sql
SELECT
    user_id,
    activity_type,
    event_time,
    session_id
FROM user_activity FINAL
WHERE user_id = 12345;
```
This query returns deduplicated results immediately, regardless of whether background merges have completed. However, `FINAL` adds overhead because ClickHouse performs the deduplication on every query execution. The performance impact grows with table size and the number of data parts, though partitioning the table so that duplicates never cross partition boundaries reduces the cost considerably.
Materialized views without FINAL
A better approach for frequently accessed deduplicated data uses materialized views to pre-compute the latest state:
```sql
CREATE MATERIALIZED VIEW user_activity_latest
ENGINE = AggregatingMergeTree()
ORDER BY (user_id, activity_type)
AS SELECT
    user_id,
    activity_type,
    argMaxState(event_time, version) AS latest_event_time,
    argMaxState(session_id, version) AS latest_session_id
FROM user_activity
GROUP BY user_id, activity_type;
```
This materialized view maintains deduplicated state automatically as new data arrives. Queries against the view stay fast because most of the deduplication work happens at insert time; at read time you only finalize the pre-aggregated states.
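Because `argMaxState` stores partial aggregate states, reads from the view need the matching `-Merge` combinator:

```sql
SELECT
    user_id,
    activity_type,
    argMaxMerge(latest_event_time) AS event_time,
    argMaxMerge(latest_session_id) AS session_id
FROM user_activity_latest
GROUP BY user_id, activity_type;
```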
When to run SELECT FINAL or OPTIMIZE FINAL
The decision to use `FINAL` or trigger manual optimization depends on your latency requirements and query patterns.
Latency tradeoffs
`SELECT ... FINAL` adds query overhead but guarantees consistency for every read. This approach makes sense for low-frequency queries where you need guaranteed deduplication, like administrative dashboards or compliance reports. For high-throughput query workloads, the overhead of `FINAL` becomes too expensive.
You can accept eventual consistency and let background merges handle deduplication naturally. Most analytics use cases tolerate seeing duplicate rows for a few minutes until merges complete.
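If a hot query path needs deduplicated results without paying for `FINAL`, query-time deduplication with `argMax` is a common alternative (a sketch against the example table):

```sql
SELECT
    user_id,
    activity_type,
    argMax(event_time, version) AS event_time,
    argMax(session_id, version) AS session_id
FROM user_activity
GROUP BY user_id, activity_type;
```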
Automated nightly optimize
For batch processing scenarios where you need deduplicated data at specific times, schedule `OPTIMIZE TABLE ... FINAL` to force merges:
```sql
OPTIMIZE TABLE user_activity FINAL;
```
This command merges all parts immediately and completes all pending deduplication. You might run this nightly before generating daily reports, or before exporting data to downstream systems. Keep in mind that `OPTIMIZE TABLE ... FINAL` is resource-intensive and can consume significant CPU and I/O.
Performance tuning and common pitfalls
ReplacingMergeTree performance depends heavily on how you structure inserts and configure merge behavior.
Too many small parts
Frequent small inserts create many small data parts on disk. ClickHouse has to track and merge all of them, which adds overhead to both merge operations and queries. You can monitor part count with this query:
```sql
SELECT
    table,
    count() AS parts_count,
    sum(rows) AS total_rows
FROM system.parts
WHERE active AND table = 'user_activity'
GROUP BY table;
```
If you see hundreds or thousands of active parts, batch your inserts into larger groups. Aim for inserts of at least 10,000 rows, and preferably 100,000 or more for high-volume tables. For real-world production insights on managing parts and merges at scale, see our experience operating ClickHouse clusters.
Large partitions
The `PARTITION BY` clause affects merge efficiency and `FINAL` query performance. Large partitions with millions of rows take longer to merge and deduplicate at query time, while time-based partitioning lets ClickHouse prune irrelevant partitions and deduplicate each one independently. Monthly partitions work well for most use cases:
```sql
CREATE TABLE user_activity (
    user_id UInt64,
    activity_type String,
    event_time DateTime,
    version UInt64
) ENGINE = ReplacingMergeTree(version)
PARTITION BY toYYYYMM(event_time)
ORDER BY (user_id, activity_type);
```
This creates one partition per month, which keeps partition size manageable while still allowing efficient pruning for time-range queries.
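With monthly partitions, you can also restrict manual merges to recent data instead of rewriting the whole table (the partition value here is illustrative):

```sql
-- Merge and deduplicate only the January 2024 partition
OPTIMIZE TABLE user_activity PARTITION 202401 FINAL;
```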
Streaming and CDC patterns with ReplacingMergeTree
ReplacingMergeTree works particularly well for ingesting data from streaming sources and change data capture systems. The engine's deduplication behavior naturally handles updates and overwrites from streaming data.
Kafka engine ingestion
The Kafka table engine feeds data directly into ReplacingMergeTree tables for real-time deduplication:
```sql
CREATE TABLE user_activity_kafka (
    user_id UInt64,
    activity_type String,
    event_time DateTime,
    version UInt64
) ENGINE = Kafka()
SETTINGS
    kafka_broker_list = 'localhost:9092',
    kafka_topic_list = 'user_activity',
    kafka_group_name = 'clickhouse_consumer',
    kafka_format = 'JSONEachRow';

CREATE MATERIALIZED VIEW user_activity_mv TO user_activity AS
SELECT * FROM user_activity_kafka;
```
This pattern continuously ingests from Kafka and writes to your ReplacingMergeTree table. The materialized view handles the data flow automatically.
Upserts from PostgreSQL CDC
Change data capture from PostgreSQL or other OLTP databases produces a stream of inserts, updates, and deletes. ReplacingMergeTree handles this stream naturally when you use transaction IDs or LSNs (log sequence numbers) as the version column:
```sql
CREATE TABLE products (
    product_id UInt64,
    name String,
    price Decimal(10, 2),
    updated_at DateTime,
    postgres_lsn UInt64,
    is_deleted UInt8
) ENGINE = ReplacingMergeTree(postgres_lsn, is_deleted)
ORDER BY product_id;
```
Your CDC tool inserts each change event with the LSN as the version. Updates overwrite previous versions, and deletes insert a row with `is_deleted = 1`.
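To read the current catalog without tombstones, the explicit filter below works regardless of how your ClickHouse version treats deleted rows under `FINAL`:

```sql
SELECT product_id, name, price
FROM products FINAL
WHERE is_deleted = 0;
```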
Faster time to production with Tinybird
Tinybird handles the infrastructure complexity of running ReplacingMergeTree tables in production. The platform manages merge optimization, monitoring, and scaling automatically so you can focus on your application logic rather than cluster operations.
When you deploy a ReplacingMergeTree table to Tinybird, the service handles part management, merge scheduling, and resource allocation. The platform provides built-in observability for tracking merge activity, part counts, and deduplication status through the Tinybird console.
Sign up for a free Tinybird account to try ReplacingMergeTree without managing infrastructure. The platform includes managed ingestion from Kafka, PostgreSQL, and other sources, plus hosted API endpoints for querying your deduplicated data.
FAQs about ReplacingMergeTree
Can I query ReplacingMergeTree without using FINAL?
Yes, regular `SELECT` queries work but may return duplicate rows until background merges complete. Use `FINAL` only when you need guaranteed deduplication, keeping in mind the performance overhead.
How do I backfill historical data without breaking merges?
Insert historical data in chronological order with proper version numbers. Consider using `INSERT INTO ... SELECT` with `ORDER BY` to maintain version sequence, and batch large backfills to avoid creating too many small parts.
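For example, a hypothetical month-by-month backfill from a staging table (`user_activity_staging` is assumed, not part of the schemas above):

```sql
INSERT INTO user_activity
SELECT user_id, activity_type, event_time, session_id, version
FROM user_activity_staging
WHERE event_time >= '2024-01-01' AND event_time < '2024-02-01'
ORDER BY version;
```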