---
title: ReplacingMergeTree Engine reference
meta:
    description: Use the ReplacingMergeTree engine for deduplicating rows.
---

# ReplacingMergeTree engine

The engine differs from [MergeTree](/sql-reference/engines/mergetree) in that it removes duplicate entries with the same sorting key value.

Data deduplication occurs only during a merge. Merging occurs in the background at an unknown time, so you can’t plan for it. Some data may remain unprocessed.

`ReplacingMergeTree` is suitable for clearing out duplicate data in the background to save space, but it doesn't guarantee the absence of duplicates. Check [Deduplication strategies](/work-with-data/strategies/deduplication-strategies) for more information about how to handle duplicates.

## Creating a ReplacingMergeTree Data Source

```tb {% title="post_views_rmt.datasource" %}
DESCRIPTION >
    Data Source to save latest version of post info. ReplacingMergeTree Engine.

SCHEMA >
    `post_id` Int32 `json:$.post_id`,
    `views` Int32 `json:$.views`,
    `likes` Int32 `json:$.likes`,
    `tag` String `json:$.tag`,
    `timestamp` DateTime `json:$.timestamp`,
    `_is_deleted` UInt8 `json:$._is_deleted` DEFAULT 0

ENGINE "ReplacingMergeTree"
ENGINE_PARTITION_KEY ""
ENGINE_SORTING_KEY "post_id"
ENGINE_VER "timestamp"
ENGINE_IS_DELETED "_is_deleted"
```

## ReplacingMergeTree parameters

### ENGINE_VER

`ENGINE_VER` is the column with the version number. Type `UInt*`, `Date`, `DateTime` or `DateTime64`. Optional parameter.

When merging, `ReplacingMergeTree` from all the rows with the same `ENGINE_SORTING_KEY` leaves only one:

   - The last in the selection, if `ENGINE_VER` not set. A selection is a set of rows in a set of parts participating in the merge. The most recently created part, the last insert, is the last one in the selection. After deduplication, the last row from the most recent insert remains for each unique sorting key.
   - With the maximum version, if `ENGINE_VER` specified. If `ENGINE_VER` is the same for several rows, then it uses "if `ENGINE_VER` isn't specified" rule for them, i.e. the most recent inserted row remains.


### ENGINE_IS_DELETED

`ENGINE_IS_DELETED` is the name of the column used during a merge to determine whether the data in this row represents the state or is to be deleted. `1` is a "deleted" row, `0` is a "state" row.

`ENGINE_IS_DELETED` can only be enabled when `ENGINE_VER` is used. Note it's a mask, not an actual delete, so queries to the Data Source with `FINAL` are equivalent to queries with `WHERE <is_deleted_column> = 0`.

No matter the operation on the data, the version must be increased. If two inserted rows have the same version number, the last inserted row is the one kept.

### ENGINE_PARTITION_KEY

To improve performance, the partition key in ReplacingMergeTree data sources should contain only columns from sorting key expression to avoid deduplicating rows from different partitions.

You can read more about `ENGINE_PARTITION_KEY` in [MergeTree Engine Settings](/sql-reference/engines/mergetree#engine-settings).

## Query time de-duplication and FINAL

At merge time, the ReplacingMergeTree identifies duplicate rows, using the values of the `ENGINE_SORTING_KEY` columns as a unique identifier, and retains only the highest version. This, however, offers eventual correctness only - it doesn't guarantee rows are deduplicated. Queries can, therefore, produce incorrect answers due to update and delete rows being considered in queries.

To obtain correct answers, complement background merges with query time deduplication and deletion removal. You can do this by using the `FINAL` operator.

## Example:

```bash

echo '{ "timestamp": "2024-07-02T02:22:17", "post_id": 956, "views": 856875, "likes": 2321, "tag": "Sports" }' > post_views.ndjson
echo '{ "timestamp": "2024-08-02T03:22:17", "post_id": 956, "views": 956875, "likes": 3321, "tag": "Sports" }' >> post_views.ndjson
echo '{ "timestamp": "2024-08-01T00:00:00", "post_id": 1, "views": 56875, "likes": 321, "tag": "Music" }' >> post_views.ndjson
echo '{ "timestamp": "2024-09-01T00:00:00", "post_id": 1, "views": 56875, "likes": 321, "tag": "Music", "_is_deleted": 1 }' >> post_views.ndjson

tb datasource append post_views_rmt post_views.ndjson

tb sql "select * from post_views_rmt final"
-------------------------------------------------------------------------
| post_id |  views | likes | tag    | timestamp           | _is_deleted |
-------------------------------------------------------------------------
|     956 | 956875 |  3321 | Sports | 2024-08-02 03:22:17 |           0 |
-------------------------------------------------------------------------

```


## Settings

For a list of supported settings, see [Engine settings](/sql-reference/engines/engine-settings).
