---
title: Ingest data from files
meta:
  description: Learn how to ingest data from files to Tinybird.
---

# Ingest data from files

You can ingest data from files into Tinybird using the [Data Sources API](/api-reference/datasource-api), the [tb datasource](/forward/dev-reference/commands/tb-datasource) CLI command, the TypeScript SDK, or the Python SDK. [Ingestion limits](/forward/pricing/limits#ingestion-limits) apply.

## Supported file types

Tinybird supports these file types and compression formats at ingest time:

| File type | Method                       | Accepted extensions       | Compression formats supported |
| --------- | ---------------------------- | ------------------------- | ----------------------------- |
| CSV       | File upload, URL             | `.csv`, `.csv.gz`         | `gzip`                        |
| NDJSON    | File upload, URL, Events API | `.ndjson`, `.ndjson.gz`   | `gzip`                        |
| Parquet   | File upload, URL             | `.parquet`, `.parquet.gz` | `gzip`                        |
| Avro      | Kafka                        |                           | `gzip`                        |
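
As a quick pre-flight check, the extensions in the table can be turned into a small lookup. The following is a minimal sketch, not part of any Tinybird SDK: `ACCEPTED_EXTENSIONS` and `detect_format` are hypothetical names, and the match is a case-sensitive comparison on the file name only.

```python
from pathlib import Path
from typing import Optional

# Accepted extensions for file upload and URL ingestion, copied from the table above.
ACCEPTED_EXTENSIONS = {
    "csv": (".csv", ".csv.gz"),
    "ndjson": (".ndjson", ".ndjson.gz"),
    "parquet": (".parquet", ".parquet.gz"),
}


def detect_format(filename: str) -> Optional[str]:
    """Return "csv", "ndjson", or "parquet" if the name ends with an accepted extension."""
    name = Path(filename).name
    for file_format, extensions in ACCEPTED_EXTENSIONS.items():
        # str.endswith accepts a tuple and matches any of its entries.
        if name.endswith(extensions):
            return file_format
    return None


print(detect_format("events.ndjson.gz"))
print(detect_format("report.xlsx"))
```

A helper like this only guards against obviously unsupported files before an upload. Tinybird still validates the content at ingest time.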

## Analyze the schema of a file

Before you upload data from a file or create a data source, you can analyze the schema of the file. Tinybird infers column names, types, and JSONPaths, which helps you choose the most appropriate data types for your columns. See [Data types](/sql-reference/data-types).

The following examples show how to analyze a local NDJSON file.

{% tabs initial="Tinybird CLI" %}

{% tab label="Tinybird CLI" %}

```shell
tb datasource analyze local_file.ndjson
```

{% /tab %}

{% tab label="Analyze API" %}

```shell {% title="Analyze an NDJSON file to get a valid schema" %}
curl \
-H "Authorization: Bearer <DATASOURCES:CREATE token>" \
-X POST "{% user("apiHost") %}/v0/analyze" \
-F "ndjson=@local_file.ndjson"
```

{% /tab %}

{% /tabs %}

## Append data from a file

You can append data from a local or remote file to a data source in Tinybird Local or Tinybird Cloud.

Use `tb datasource append` in the CLI, `mode=append` with the Data Sources API, `append` in the TypeScript SDK, or `append` in the Python SDK.

### Append from a local file

{% tabs initial="Tinybird CLI" %}

{% tab label="Tinybird CLI" %}

```shell
tb --cloud datasource append <data_source_name> local_file.csv
```

{% /tab %}

{% tab label="cURL" %}

```shell
curl \
-H "Authorization: Bearer <your-token>" \
-X POST "https://api.europe-west2.gcp.tinybird.co/v0/datasources?mode=append&name=<data_source_name>" \
-F csv=@local_file.csv
```

{% /tab %}

{% tab label="TypeScript SDK" %}

```ts
import { tinybird } from "./tinybird";

await tinybird.client.datasources.append("<data_source_name>", {
  file: "./local_file.csv",
});
```

{% /tab %}

{% tab label="Python SDK" %}

```python
from src.tinybird.client import tinybird

tinybird.client.datasources.append(
    "<data_source_name>",
    {
        "file": "./local_file.csv",
    },
)
```

{% /tab %}

{% /tabs %}

### Append from a remote file

{% tabs initial="Tinybird CLI" %}

{% tab label="Tinybird CLI" %}

```shell
tb --cloud datasource append <data_source_name> http://example_url/file.csv
```

{% /tab %}

{% tab label="cURL" %}

```shell {% title="Appending data to a data source from a remote CSV file" %}
curl \
-H "Authorization: Bearer <DATASOURCES:APPEND token>" \
-X POST "https://api.europe-west2.gcp.tinybird.co/v0/datasources?mode=append&name=<data_source_name>" \
--data-urlencode "url=https://s3.amazonaws.com/nyc-tlc/trip+data/fhv_tripdata_2018-12.csv"
```

{% /tab %}

{% tab label="TypeScript SDK" %}

```ts
import { tinybird } from "./tinybird";

await tinybird.client.datasources.append("<data_source_name>", {
  url: "https://example_url/file.csv",
});
```

{% /tab %}

{% tab label="Python SDK" %}

```python
from src.tinybird.client import tinybird

tinybird.client.datasources.append(
    "<data_source_name>",
    {
        "url": "https://example_url/file.csv",
    },
)
```

{% /tab %}

{% /tabs %}

When appending CSV files, you can improve performance by excluding the CSV header line. In that case, make sure the columns in the file appear in the same order as the columns in the data source. If you can't guarantee the column order of your CSV, include the header.
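
If a file's column order differs from the data source, one option is to reorder the columns locally and drop the header before appending. The following is a minimal sketch using only the Python standard library. `reorder_and_strip_header` is a hypothetical helper, and `schema_columns` is assumed to list the data source columns in schema order.

```python
import csv
import io
from typing import Sequence


def reorder_and_strip_header(csv_text: str, schema_columns: Sequence[str]) -> str:
    """Return the CSV rows in schema order, without the header line."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Fail early if the file lacks a column that the data source expects.
    missing = [c for c in schema_columns if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing columns: {missing}")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        # Emit values in the data source's column order.
        writer.writerow([row[c] for c in schema_columns])
    return out.getvalue()


sample = "price,id,name\n9.5,1,apple\n3.0,2,pear\n"
print(reorder_and_strip_header(sample, ["id", "name", "price"]))
# 1,apple,9.5
# 2,pear,3.0
```

Columns that exist in the file but not in `schema_columns` are dropped by this sketch, so review the result before appending it.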

## Replace data from a file

You can replace all existing data, or a selection of it, in a data source with the contents of a file. The file can be local or remote.

{% callout type="warning" %}
When using `mode=replace` with an S3 URL via the API, you must use a pre-signed URL. Unlike `mode=append`, `mode=replace` is a multi-step process that passes the URL to a background worker with no access to your S3 Connector credentials. Generate a pre-signed URL programmatically using the AWS SDK or CLI before passing it to the API.
{% /callout %}

Use `tb datasource replace` in the CLI, `mode=replace` with the Data Sources API, `replace` in the TypeScript SDK, or `replace` in the Python SDK.

### Replace from a local file

{% tabs initial="Tinybird CLI" %}

{% tab label="Tinybird CLI" %}

```shell
tb --cloud datasource replace <data_source_name> local_file.csv
```

{% /tab %}

{% tab label="cURL" %}

```shell
curl \
-H "Authorization: Bearer <your-token>" \
-X POST "https://api.europe-west2.gcp.tinybird.co/v0/datasources?mode=replace&name=<data_source_name>&format=csv" \
-F csv=@local_file.csv
```

{% /tab %}

{% tab label="TypeScript SDK" %}

```ts
import { tinybird } from "./tinybird";

await tinybird.client.datasources.replace("<data_source_name>", {
  file: "./local_file.csv",
});
```

{% /tab %}

{% tab label="Python SDK" %}

```python
from src.tinybird.client import tinybird

tinybird.client.datasources.replace(
    "<data_source_name>",
    {
        "file": "./local_file.csv",
    },
)
```

{% /tab %}

{% /tabs %}

### Replace from a remote file

{% tabs initial="Tinybird CLI" %}

{% tab label="Tinybird CLI" %}

```shell
tb --cloud datasource replace <data_source_name> http://example_url/file.csv
```

{% /tab %}

{% tab label="cURL" %}

```shell {% title="Replacing a data source from a URL" %}
curl \
-H "Authorization: Bearer <your-token>" \
-X POST "https://api.europe-west2.gcp.tinybird.co/v0/datasources?mode=replace&name=<data_source_name>&format=csv" \
--data-urlencode "url=http://example_url/file.csv"
```

{% /tab %}

{% tab label="TypeScript SDK" %}

```ts
import { tinybird } from "./tinybird";

await tinybird.client.datasources.replace("<data_source_name>", {
  url: "https://example_url/file.csv",
});
```

{% /tab %}

{% tab label="Python SDK" %}

```python
from src.tinybird.client import tinybird

tinybird.client.datasources.replace(
    "<data_source_name>",
    {
        "url": "https://example_url/file.csv",
    },
)
```

{% /tab %}

{% /tabs %}

## Replace data based on conditions

Instead of replacing all data, you can replace only part of it. To do this, define an SQL condition that acts as the filter. Tinybird first deletes all existing rows that match the condition, then ingests only the rows from the new file that match it.

Replacements are made by partition, so make sure that the condition filters on the partition key of the data source. If the source file contains rows that don't match the filter, the rows are ignored.
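
The row-level outcome described above can be modeled in a few lines of Python. This is a conceptual sketch only: Tinybird performs the replacement on the server and by partition, and `replace_with_condition` is a hypothetical function written for illustration, not an SDK method.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]


def replace_with_condition(
    existing: List[Row],
    incoming: List[Row],
    condition: Callable[[Row], bool],
) -> List[Row]:
    """Model the row-level outcome of a conditional replace (illustration only)."""
    # Existing rows that match the condition are deleted.
    kept = [row for row in existing if not condition(row)]
    # Only incoming rows that match the condition are ingested.
    ingested = [row for row in incoming if condition(row)]
    return kept + ingested


existing_rows = [{"my_partition_key": 100}, {"my_partition_key": 124}, {"my_partition_key": 200}]
incoming_rows = [{"my_partition_key": 50}, {"my_partition_key": 150}]

result = replace_with_condition(
    existing_rows, incoming_rows, lambda row: row["my_partition_key"] > 123
)
print(result)  # [{'my_partition_key': 100}, {'my_partition_key': 150}]
```

With the condition `my_partition_key > 123`, the existing rows 124 and 200 are removed, the incoming row 150 is ingested, and the incoming row 50 is ignored because it doesn't match.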

{% callout type="caution" %}
Conditional replace is supported in the CLI and the Data Sources API. The TypeScript SDK and Python SDK `replace` methods don't currently support replace conditions.
{% /callout %}

### Replace from a local file with a condition

{% tabs initial="Tinybird CLI" %}

{% tab label="Tinybird CLI" %}

```shell
tb --cloud datasource replace <data_source_name> local_file.csv --sql-condition "my_partition_key > 123"
```

{% /tab %}

{% tab label="cURL" %}

```shell {% title="Replace filtered data in a data source with data from a local file" %}
curl \
-H "Authorization: Bearer <your-token>" \
-X POST "https://api.europe-west2.gcp.tinybird.co/v0/datasources?mode=replace&name=<data_source_name>&format=csv&replace_condition=my_partition_key%20%3E%20123" \
-F csv=@local_file.csv
```

{% /tab %}

{% /tabs %}
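
When the condition travels in the query string, as in the cURL example above, it must be percent-encoded. The following is a minimal sketch of building that URL with the Python standard library. `build_replace_url` is a hypothetical helper, and the host and data source name are placeholders.

```python
from urllib.parse import quote, urlencode


def build_replace_url(api_host: str, data_source_name: str, replace_condition: str) -> str:
    """Build a Data Sources API URL for a conditional replace of a CSV file."""
    query = urlencode(
        {
            "mode": "replace",
            "name": data_source_name,
            "format": "csv",
            "replace_condition": replace_condition,
        },
        # quote encodes spaces as %20 instead of the default "+".
        quote_via=quote,
    )
    return f"{api_host}/v0/datasources?{query}"


url = build_replace_url(
    "https://api.europe-west2.gcp.tinybird.co",
    "data_source_name",
    "my_partition_key > 123",
)
print(url)
```

Encoding the condition programmatically avoids hand-escaping mistakes when it contains quotes, spaces, or comparison operators.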

### Replace from a remote file with a condition

{% tabs initial="Tinybird CLI" %}

{% tab label="Tinybird CLI" %}

```shell
tb --cloud datasource replace <data_source_name> http://example_url/file.csv --sql-condition "my_partition_key > 123"
```

{% /tab %}

{% tab label="cURL" %}

```shell {% title="Replace filtered data in a data source with data from a remote file" %}
curl \
-H "Authorization: Bearer <your-token>" \
-X POST "https://api.europe-west2.gcp.tinybird.co/v0/datasources?mode=replace&name=<data_source_name>&format=csv" \
--data-urlencode "replace_condition=my_partition_key > 123" \
--data-urlencode "url=http://example.com/file.csv"
```

{% /tab %}

{% /tabs %}

All the dependencies of the data source are recalculated so that your data is consistent after the replacement. If you have n-level dependencies, they're also updated by this operation.

{% callout type="caution" %}
Although replacements are atomic, Tinybird can't guarantee data consistency if you keep appending data to any related data source while the replacement takes place. The new incoming data is discarded.
{% /callout %}
