---
title: Ingest data from files
meta:
  description: Learn how to ingest data from files to Tinybird.
---

# Ingest data from files

You can ingest data from files to Tinybird using the [Data sources API](/api-reference/datasource-api) or the [tb datasource](/forward/dev-reference/commands/tb-datasource) CLI command. [Ingestion limits](/forward/pricing/limits#ingestion-limits) apply.

## Supported file types

Tinybird supports these file types and compression formats at ingest time:

| File type | Method                       | Accepted extensions      | Compression formats supported |
| --------- | ---------------------------- | ------------------------ | ----------------------------- |
| CSV       | File upload, URL             | `.csv`, `.csv.gz`        | `gzip`                        |
| NDJSON    | File upload, URL, Events API | `.ndjson`, `.ndjson.gz`  | `gzip`                        |
| Parquet   | File upload, URL             | `.parquet`, `.parquet.gz`| `gzip`                        |
| Avro      | Kafka                        |                          | `gzip`                        |


## Analyze the schema of a file

Before you upload data from a file or create a data source, you can analyze the schema of the file. Tinybird infers column names, types, and JSONPaths, which helps you identify the most appropriate data types for your columns. See [Data types](/sql-reference/data-types).

The following examples show how to analyze a local CSV file.

{% tabs initial="tb datasource" %}

{% tab label="tb datasource" %}

```shell
tb datasource analyze local_file.csv
```

{% /tab %}

{% tab label="Analyze API" %}

```shell {% title="Analyze a CSV file to get a valid schema" %}
# Call the Tinybird Local endpoint
curl \
-H "Authorization: Bearer <your-token>" \
-X POST "http://localhost:7181/v0/analyze" \
-F "csv=@local_file.csv"
```

{% /tab %}

{% /tabs %}

## Append data from a file

You can append data from a local or remote file to a data source in Tinybird Local or Tinybird Cloud. 

The following examples show how to append data from a local file to a data source in Tinybird Cloud:

{% tabs initial="tb datasource (Local file)" %}

{% tab label="tb datasource (Local file)" %}

```shell
tb --cloud datasource append <data_source_name> local_file.csv
```

{% /tab %}

{% tab label="tb datasource (Remote file)" %}

```shell
tb --cloud datasource append <data_source_name> http://example_url/file.csv
```

{% /tab %}

{% tab label="Data source API" %}

```shell
curl \
-H "Authorization: Bearer <your-token>" \
-X POST "https://api.europe-west2.gcp.tinybird.co/v0/datasources?mode=append&name=<my_datasource_name>" \
-F csv=@local_file.csv
```
{% /tab %}

{% tab label="Remote file using the API" %}

```shell {% title="Appending data to a data source from a remote CSV file" %}
curl \
-H "Authorization: Bearer <DATASOURCES:APPEND token>" \
-X POST "https://api.europe-west2.gcp.tinybird.co/v0/datasources?mode=append&name=my_datasource_name" \
-d url='https://s3.amazonaws.com/nyc-tlc/trip+data/fhv_tripdata_2018-12.csv'
```
{% /tab %}

{% /tabs %}

When appending CSV files, you can improve performance by excluding the CSV header line. In that case, make sure the columns in the file appear in the same order as the columns in the data source schema. If you can't guarantee the column order, include the CSV header.
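For example, you could strip the header before appending, assuming the remaining columns already match the schema order. The data source name `events` and the file names here are placeholders:

```shell
# Drop the header line (line 1) to speed up ingestion.
# Columns in the remaining rows must match the data source schema order.
tail -n +2 local_file.csv > local_file_noheader.csv

# Append the headerless file to a hypothetical "events" data source
tb --cloud datasource append events local_file_noheader.csv
```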

## Replace data from a file

You can replace all existing data, or a selection of data, in a data source with the contents of a file. You can replace with data from local or remote files.

{% callout type="warning" %}
When using `mode=replace` with an S3 URL via the API, you must use a pre-signed URL. Unlike `mode=append`, `mode=replace` is a multi-step process that passes the URL to a background worker with no access to your S3 Connector credentials. Generate a pre-signed URL programmatically using the AWS SDK or CLI before passing it to the API.
{% /callout %}
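As a sketch of that workflow with the AWS CLI, you can generate a pre-signed URL and pass it to the replace call. The bucket, object key, and data source name below are placeholders:

```shell
# Generate a pre-signed URL valid for one hour (bucket and key are placeholders)
PRESIGNED_URL=$(aws s3 presign s3://my-bucket/data/file.csv --expires-in 3600)

# Pass the pre-signed URL to the replace call
curl \
-H "Authorization: Bearer <your-token>" \
-X POST "https://api.europe-west2.gcp.tinybird.co/v0/datasources?mode=replace&name=<data_source_name>&format=csv" \
--data-urlencode "url=$PRESIGNED_URL"
```

`--data-urlencode` ensures the query characters in the pre-signed URL are escaped correctly when sent as a form field.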

The following examples show how to replace data in Tinybird Cloud:

{% tabs initial="tb datasource (Local file)" %}

{% tab label="tb datasource (Local file)" %}

```shell
tb --cloud datasource replace <data_source_name> local_file.csv
```

{% /tab %}

{% tab label="tb datasource (Remote file)" %}

```shell
tb --cloud datasource replace <data_source_name> http://example_url/file.csv
```

{% /tab %}

{% tab label="Data source API" %}

```shell
curl \
-H "Authorization: Bearer <your-token>" \
-X POST "https://api.europe-west2.gcp.tinybird.co/v0/datasources?mode=replace&name=<data_source_name>&format=csv" \
-F csv=@local_file.csv
```

{% /tab %}

{% tab label="Remote file using the API" %}

```shell {% title="Replacing a data source from a URL" %}
curl \
-H "Authorization: Bearer <your-token>" \
-X POST "https://api.europe-west2.gcp.tinybird.co/v0/datasources?mode=replace&name=<data_source_name>&format=csv" \
--data-urlencode "url=http://example_url/file.csv"
```
{% /tab %}

{% /tabs %}

## Replace data based on conditions

Instead of replacing all data, you can replace specific partitions of data. To do this, define an SQL condition that describes the filter to apply: all existing rows matching the condition are deleted before the new file is ingested, and only the rows in the file that match the condition are ingested.

Replacements are made by partition, so make sure the condition filters on the partition key of the data source. Rows in the source file that don't match the filter are ignored.

The following examples show how to replace partial data using a condition:

{% tabs initial="tb datasource (Local file)" %}

{% tab label="tb datasource (Local file)" %}

```shell
tb --cloud datasource replace <data_source_name> local_file.csv --sql-condition "my_partition_key > 123"
```

{% /tab %}

{% tab label="tb datasource (Remote file)" %}

```shell
tb --cloud datasource replace <data_source_name> http://example_url/file.csv --sql-condition "my_partition_key > 123"
```

{% /tab %}

{% tab label="Data source API" %}

```shell {% title="Replace filtered data in a data source with data from a local file" %}
curl \
-H "Authorization: Bearer <your-token>" \
-X POST "https://api.europe-west2.gcp.tinybird.co/v0/datasources?mode=replace&name=data_source_name&format=csv&replace_condition=my_partition_key%20%3E%20123" \
-F csv=@local_file.csv
```

{% /tab %}

{% tab label="Remote file using the API" %}

```shell {% title="Replace filtered data in a data source with data from a remote file" %}
curl \
-H "Authorization: Bearer <your-token>" \
-X POST "https://api.europe-west2.gcp.tinybird.co/v0/datasources?mode=replace&name=data_source_name&format=csv" \
-d replace_condition='my_partition_key > 123' \
--data-urlencode "url=http://example.com/file.csv"
```

{% /tab %}

{% /tabs %}

All the dependencies of the data source are recalculated so that your data is consistent after the replacement. If you have n-level dependencies, they're also updated by this operation. 

{% callout type="caution" %}
Although replacements are atomic, Tinybird can't guarantee data consistency if you continue appending data to any related data source while the replacement takes place. Any new incoming data is discarded.
{% /callout %}



