---
title: Populate and copy data between Data Sources
meta:
  description: Learn how populating data within Tinybird works, including the details of reingesting data and important considerations for maintaining data integrity.
---

# Populate and copy data between Data Sources

You can use Tinybird to populate Data Sources using existing data through the Populate and Copy operations. Both serve similar scenarios; the main distinction is that Copy jobs can be scheduled, but also have more restrictions.

Read on to learn how populating data within Tinybird works, including the details of reingesting data and important considerations for maintaining data integrity.

## Populate by partition

Populating data by partition requires as many steps as there are partitions in the origin Data Source. This approach enables progress tracking and dynamic resource recalculation while the job is in progress. It also lets you retry individual steps after a memory limit error, which makes the operation safer and more reliable.

The following diagram shows a populate scenario that involves two Data Sources:

```mermaid
flowchart LR
    DSA(Data Source A) --> PIPEA
    PIPEA(Materialized Pipe A
    SELECT id, name
    FROM data_source_a
    ) --> DSB
    DSB(Data Source B)
```

Given that Data Source A has 3 partitions:

* The job processes the data partition by partition from the source, `Data Source A`
* After the data has been processed, it updates the data on the destination, `Data Source B`
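The partition-by-partition behavior described above can be sketched in Python. This is an illustrative simulation, not Tinybird code: the names `populate`, `transform`, and `MemoryLimitError` are hypothetical stand-ins for the job's internal step loop and retry logic.

```python
class MemoryLimitError(Exception):
    """Stand-in for a step exceeding its memory budget."""

def populate(partitions, transform, max_retries=2):
    """Process each origin partition as an independent, retryable step."""
    destination = []
    for partition in partitions:  # one step per partition in Data Source A
        for attempt in range(max_retries + 1):
            try:
                destination.extend(transform(partition))
                break  # step done; move on to the next partition
            except MemoryLimitError:
                if attempt == max_retries:
                    raise  # surface the failure once retries are exhausted
    return destination

# Three partitions in Data Source A, as in the diagram above.
rows = populate(
    partitions=[
        [{"id": 1, "name": "a"}],
        [{"id": 2, "name": "b"}],
        [{"id": 3, "name": "c"}],
    ],
    # The Materialized Pipe's SELECT id, name, applied per partition.
    transform=lambda p: [{"id": r["id"], "name": r["name"]} for r in p],
)
```

Because each partition is a separate step, a failure in one step can be retried without reprocessing partitions that already succeeded.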

## Understand the Data Flow

As a use case expands, it might develop a complex Data Flow. Here are three key points to consider:

* Data is processed by partition from the origin: each step handles data from a single partition of the origin Data Source.
* When more than one Materialized Pipe exists for the same Data Source, the execution order isn't deterministic.
* Destination Data Sources in the Data Flow only use the data from the specific partition being processed.

The following examples illustrate the behavior of Populate jobs in different scenarios.

### Case 1: Joining data from a Data Source that isn't a destination in the Data Flow

When a Materialized Pipe query uses a Data Source (Data Source C) that isn't the destination of any other Materialized Pipe in the Data Flow, the query uses all data available in Data Source C at that moment.

```mermaid
flowchart LR
    DSA(Data Source A) ---> |Data partition from A| PIPEA
    DSC(Data Source C) -.-> |All data from C| PIPEA
    PIPEA["Materialized Pipe
SELECT id, name
FROM data_source_a
LEFT JOIN data_source_c
USING id
    "] --> DSB
    DSB(Data Source B)
```
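This case can be sketched as follows. The code is an illustrative simulation, not Tinybird code: `left_join`, `data_source_c`, and `partition_from_a` are hypothetical names mimicking the query in the diagram.

```python
def left_join(left_rows, right_rows, key="id"):
    """Minimal LEFT JOIN ... USING id over lists of dicts."""
    right_by_key = {r[key]: r for r in right_rows}
    return [{**l, **right_by_key.get(l[key], {})} for l in left_rows]

# Data Source C isn't a destination in the Data Flow, so every populate
# step sees its full contents.
data_source_c = [{"id": 1, "country": "ES"}, {"id": 2, "country": "US"}]

# The step only processes one partition of Data Source A at a time...
partition_from_a = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

# ...but joins it against all rows currently in C.
data_source_b = left_join(partition_from_a, data_source_c)
```

Each partition of A is joined against the complete, current state of C, so the join behaves the same as it would during regular ingestion.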

### Case 2: Joining data from a Data Source that is a destination in the same Materialized View

When using the destination Data Source `Data Source B` in the Materialized Pipe query, `Data Source B` doesn't join any data. This occurs because the data is processed by partition, and the required partition isn't available in the destination at that time.


```mermaid
flowchart LR
    DSA(Data Source A) --> |Data partition from A| PIPE1
    PIPE1["Materialized Pipe
SELECT id, name
FROM data_source_a
LEFT JOIN data_source_b USING id
    "] --> DSB(Data Source B)
    DSB(Data Source B) -..-> |No data joined from B as data partition from A is being processed and isn't present yet on B| PIPE1
```
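A sketch of this case, using the same illustrative (non-Tinybird) join helper: because the partition being processed hasn't reached Data Source B yet, the join against B finds nothing.

```python
def left_join(left_rows, right_rows, key="id"):
    """Minimal LEFT JOIN ... USING id over lists of dicts."""
    right_by_key = {r[key]: r for r in right_rows}
    return [{**l, **right_by_key.get(l[key], {})} for l in left_rows]

# The partition from A is still being processed, so it isn't in B yet.
data_source_b = []

partition_from_a = [{"id": 1, "name": "a"}]

# The left side survives (it's a LEFT JOIN), but no columns are joined
# in from Data Source B.
result = left_join(partition_from_a, data_source_b)
```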


### Case 3: Joining data from a Data Source that is a destination in another Materialized View

When a Materialized Pipe (Materialized Pipe 3) query uses a Data Source (Data Source C) that is the destination of another Materialized Pipe (Materialized Pipe 2) in the Data Flow, the query retrieves only the data that has been ingested into Data Source C at that point in the process.

Whether Data Source C contains data before the view on Materialized Pipe 3 runs isn't deterministic. Because the order depends on the internal ID, you can't determine which Data Source is updated first.

```mermaid
flowchart TB
    DSA(Data Source A) --> PIPEA
    PIPEA["Materialized Pipe 1"
    SELECT id, name FROM data_source_a
    ] --> DSB
    DSB(Data Source B) --> |Data partition from A| PIPEB
    DSB(Data Source B) --> |Data partition from A| PIPEC
    DSC(Data Source C) -.- |Nondeterministic: might have processed the data or not| PIPEB
    PIPEB["Materialized Pipe 3
SELECT id, name
FROM data_source_b
LEFT JOIN data_source_c
USING id
    "] --> DSD(Data Source D)
    PIPEC["Materialized Pipe 2
SELECT id, name
FROM data_source_b
    "] --> DSC
```
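The nondeterminism can be sketched with an illustrative simulation (not Tinybird code; `run_pipe_2` and `run_pipe_3` are hypothetical stand-ins for the two Materialized Pipes). Depending on which pipe runs first, the join in Materialized Pipe 3 either matches rows in C or finds it empty.

```python
def run_pipe_2(partition, data_source_c):
    """Materialized Pipe 2: copies the partition from B into C."""
    data_source_c.extend(partition)

def run_pipe_3(partition, data_source_c):
    """Materialized Pipe 3: returns the rows of the partition that
    find a match in C (the joined rows)."""
    matched_ids = {r["id"] for r in data_source_c}
    return [r for r in partition if r["id"] in matched_ids]

partition_from_b = [{"id": 1, "name": "a"}]

# Order 1: Pipe 2 runs first, so C already holds the rows and the join matches.
c_first = []
run_pipe_2(partition_from_b, c_first)
joined_when_pipe2_first = run_pipe_3(partition_from_b, c_first)

# Order 2: Pipe 3 runs first, so C is still empty and nothing matches.
c_second = []
joined_when_pipe3_first = run_pipe_3(partition_from_b, c_second)
run_pipe_2(partition_from_b, c_second)
```

Because the execution order depends on an internal ID, either outcome is possible, which is why the steps in the next section run each populate separately to make the order explicit.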

To control the order of the Data Flow, run each populate operation separately:

1. Run a populate over Materialized Pipe 1 to populate the data from Data Source A to Data Source B. To prevent automatic data propagation through the rest of the Materialized Views, either unlink the views or truncate the dependent Data Sources if they are repopulated.
2. Perform separate populate operations on Materialized Pipe 2 and Materialized Pipe 3, instead of a single operation on Materialized Pipe 1.

## Learn more

Before you use Populate and Copy operations for different [backfill strategies](/classic/work-with-data/strategies/backfill-strategies), understand how they work within the Data Flow and what their limitations are.

Also read the [Materialized Views guide](/classic/work-with-data/process-and-copy/materialized-views/best-practices): populating data while ingestion continues into the origin Data Source might lead to duplicated data.
