---
title: S3 Sink
meta:
    description: Offload data to S3 on a batch-based schedule using Tinybird's fully managed S3 Sink Connector.
---

# S3 Sink

You can set up an S3 Sink to export your data from Tinybird to any S3 bucket in CSV, NDJSON, or Parquet format. The S3 Sink allows you to offload data on a batch-based schedule using Tinybird's fully managed connector.

Setting up the S3 Sink requires:

1. Configuring AWS [permissions](#aws-permissions) using [IAM roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html).
2. Creating a connection file in Tinybird.
3. Creating a Sink pipe that uses this connection.

{% callout type="info" %}
The S3 Sink feature is available for Developer and Enterprise plans. See [Plans](/forward/pricing).
{% /callout %}

## Environment considerations

Before setting up the S3 Sink, understand how it works in different environments.

### Cloud environment

In the Tinybird Cloud environment, Tinybird uses its own AWS account to assume the IAM role you create, allowing it to write to your S3 bucket.

### Local environment

When using the S3 Sink in the Tinybird Local environment, which runs in a container, you need to pass your local AWS credentials to the container. These credentials must have the [permissions described in the AWS permissions section](#aws-permissions), including S3 actions such as `s3:PutObject` and `s3:ListBucket`. This allows Tinybird Local to assume the IAM role you specify in your connection.

To pass your AWS credentials, use the `--use-aws-creds` flag when starting Tinybird Local:

```bash
tb local start --use-aws-creds
» Starting Tinybird Local...
✓ AWS credentials found and will be passed to Tinybird Local (region: us-east-1)
* Waiting for Tinybird Local to be ready...
✓ Tinybird Local is ready!
```

If you're using a specific AWS profile, you can specify it using the `AWS_PROFILE` environment variable:

```bash
AWS_PROFILE=my-profile tb local start --use-aws-creds
```

{% callout type="caution" %}
Scheduled sink operations aren't supported in the Local environment. You can only run on-demand sinks using `tb sink run <pipe_name>`. For scheduled sink operations, use the Cloud environment.
{% /callout %}

## AWS permissions

The S3 Sink requires an IAM Role with specific permissions to write objects to your Amazon S3 bucket:

- `s3:GetObject`
- `s3:PutObject`
- `s3:PutObjectAcl`
- `s3:ListBucket`
- `s3:GetBucketLocation`

{% callout type="tip" %}
**One IAM role can access multiple buckets**: You can update the access policy to include multiple buckets by adding their ARNs to the `Resource` array. This allows you to reuse the same IAM role across multiple S3 Sink Connections for different buckets.
{% /callout %}

You need to create both an access policy and a trust policy in AWS:

{% tabs variant="code" initial="AWS Access Policy" %}
{% tab label="AWS Access Policy" %}
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": "arn:aws:s3:::{bucket-name}/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketLocation",
                "s3:ListBucket"
            ],
            "Resource": "arn:aws:s3:::{bucket-name}"
        }
    ]
}
```
{% /tab %}
{% tab label="AWS Trust Policy" %}
```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "sts:AssumeRole",
            "Principal": {
                "AWS": "arn:aws:iam::{AWS_ACCOUNT_ID}:root"
            },
            "Condition": {
                "StringEquals": {
                    "sts:ExternalId": "{EXTERNAL_ID}"
                }
            }
        }
    ]
}
```

In the AWS Trust Policy, replace the placeholder values with values specific to your Tinybird environment:

- `{AWS_ACCOUNT_ID}`:
  - For Cloud environments: Tinybird's AWS account ID, which varies depending on your region and provider.
  - For Local environments: The AWS account ID of the credentials you pass to the Docker container with `--use-aws-creds`.
- `{EXTERNAL_ID}`: A unique identifier provided by Tinybird and generated from your Connection name.

To get the correct values for your Trust Policy:

1. Use the guided CLI process with `tb connection create s3`.
2. Or access the API endpoint `/v0/integrations/s3/policies/trust-policy?external_id_seed={CONNECTION_NAME}` for your Workspace.

{% callout type="tip" %}
To allow access from both Local and Cloud environments with a single IAM role, add both account IDs to the `Principal.AWS` in array format: `["arn:aws:iam::{LOCAL_ACCOUNT_ID}:root", "arn:aws:iam::{CLOUD_ACCOUNT_ID}:root"]` in the Trust Policy.
{% /callout %}
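
As a rough illustration of the tip above, the following Python sketch builds a trust policy document whose `Principal.AWS` lists more than one account. The `build_trust_policy` helper and the account IDs and external ID passed to it are hypothetical placeholders, not values provided by Tinybird. The sketch also assumes a single external ID applies to every listed account, so check the values Tinybird reports for each environment.

```python
import json


def build_trust_policy(account_ids, external_id):
    """Return a trust policy that lets each listed AWS account assume the role."""
    principals = [f"arn:aws:iam::{account_id}:root" for account_id in account_ids]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Principal": {"AWS": principals},
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values: replace them with the account IDs and external ID
# reported by `tb connection create s3` or the trust-policy endpoint.
trust_policy = build_trust_policy(
    ["111111111111", "222222222222"], "example-external-id"
)
print(json.dumps(trust_policy, indent=4))
```

You can paste the printed JSON into the role's trust relationship in the AWS console.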

{% /tab %}
{% /tabs %}
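
If you reuse one IAM role across several buckets, you can generate the access policy instead of editing the `Resource` entries by hand. The following Python sketch is one way to do it. The `build_access_policy` helper and the bucket names are hypothetical and only illustrate the shape of the document.

```python
import json


def build_access_policy(bucket_names):
    """Return an S3 access policy covering every bucket in bucket_names."""
    object_arns = [f"arn:aws:s3:::{name}/*" for name in bucket_names]
    bucket_arns = [f"arn:aws:s3:::{name}" for name in bucket_names]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level actions apply to the objects inside each bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:PutObjectAcl"],
                "Resource": object_arns,
            },
            {
                # Bucket-level actions apply to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
                "Resource": bucket_arns,
            },
        ],
    }


# Hypothetical bucket names, used only for illustration.
policy = build_access_policy(["tinybird-sinks", "tinybird-archive"])
print(json.dumps(policy, indent=4))
```

The object-level and bucket-level statements stay separate, as in the policy above, because they target different resource ARNs.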

## Set up the sink

{% steps %}

### Create an S3 connection

You can create an S3 connection in Tinybird using either the guided CLI process or by manually creating a connection file.

#### Option 1: Use the guided CLI process (recommended)

The Tinybird CLI provides a guided process that helps you set up the required AWS permissions and creates the connection file automatically:

```bash
tb connection create s3
```

When prompted, you'll need to:

1. Enter a name for your connection.
2. Specify whether you'll use this connection for sinking or ingesting data.
3. Enter the S3 bucket name.
4. Enter the AWS region where your bucket is located.
5. Copy the displayed AWS IAM policy to your clipboard (you'll need this to set up permissions in AWS).
6. Copy the displayed AWS IAM role trust policy for your Local environment, then enter the ARN of the role you create.
7. Copy the displayed AWS IAM role trust policy for your Cloud environment, then enter the ARN of the role you create.

The ARN values are stored securely using [tb secret](/forward/dev-reference/commands/tb-secret), which lets you use a different role for each environment.

#### Option 2: Define the Connection manually

You can also define the S3 Connection manually in your project. The recommended authentication method is an IAM role with secrets for the role ARN.

{% tabs initial="Tinybird CLI" %}
{% tab label="Tinybird CLI" %}

```tinybird {% title="connections/s3sample.connection" %}
TYPE s3
S3_REGION "<S3_REGION>"
S3_ARN {{ tb_secret("AWS_ROLE_ARN") }}
```

{% /tab %}

{% tab label="TypeScript SDK" %}

```typescript {% title="tinybird.ts" %}
import { defineS3Connection, secret } from "@tinybirdco/sdk";

export const s3sample = defineS3Connection("s3sample", {
  region: "<S3_REGION>",
  arn: secret("AWS_ROLE_ARN"),
});
```

{% /tab %}

{% tab label="Python SDK" %}

```python {% title="tinybird.py" %}
from tinybird_sdk import define_s3_connection, secret

s3sample = define_s3_connection("s3sample", {
    "region": "<S3_REGION>",
    "arn": secret("AWS_ROLE_ARN"),
})
```

{% /tab %}
{% /tabs %}

When creating your Connection manually, set up the required AWS IAM role with appropriate permissions. See the [AWS permissions](#aws-permissions) section for details on the required access policy and trust policy configurations.

{% callout type="info" %}
IAM role authentication is recommended over HMAC authentication. It uses temporary credentials and follows AWS security best practices.
{% /callout %}

See [Connection files](/forward/dev-reference/datafiles/connection-files) for more details on how to create a connection file and manage secrets.

{% callout type="caution" %}
You need to create separate connections for each environment you're working with, Local and Cloud.

For example, you can create:

- `my-s3-local` for your Local environment
- `my-s3-cloud` for your Cloud environment
{% /callout %}

### Create a Sink pipe

To create a Sink pipe, filter the data you want to export to your bucket in the SQL section as in any other pipe. Then, specify the pipe as a Sink and add the needed configuration.

{% tabs initial="Tinybird CLI" %}
{% tab label="Tinybird CLI" %}

```tinybird {% title="s3_export.pipe" %}
NODE node_0

SQL >
    SELECT *
    FROM events
    WHERE status = 'processed'

TYPE sink
EXPORT_CONNECTION_NAME "s3sample"
EXPORT_BUCKET_URI "s3://tinybird-sinks"
EXPORT_FILE_TEMPLATE "daily_prices" # Supports partitioning
EXPORT_SCHEDULE "*/5 * * * *" 
EXPORT_FORMAT "csv" # Optional
EXPORT_COMPRESSION "gz" # Optional
EXPORT_STRATEGY "create_new" # Optional
```

{% /tab %}

{% tab label="TypeScript SDK" %}

```typescript {% title="tinybird.ts" %}
import {
  defineDatasource,
  defineS3Connection,
  defineSinkPipe,
  engine,
  node,
  secret,
  t,
} from "@tinybirdco/sdk";

export const s3sample = defineS3Connection("s3sample", {
  region: "us-east-1",
  arn: secret("AWS_ROLE_ARN"),
});

export const events = defineDatasource("events", {
  schema: {
    timestamp: t.dateTime(),
    session_id: t.string(),
    status: t.string(),
  },
  engine: engine.mergeTree({
    sortingKey: ["timestamp"],
  }),
});

export const s3Export = defineSinkPipe("s3_export", {
  sink: {
    connection: s3sample,
    bucketUri: "s3://tinybird-sinks",
    fileTemplate: "daily_prices",
    schedule: "*/5 * * * *",
    format: "csv",
  },
  nodes: [
    node({
      name: "node_0",
      sql: `
        SELECT *
        FROM events
        WHERE status = 'processed'
      `,
    }),
  ],
});
```

{% /tab %}

{% tab label="Python SDK" %}

```python {% title="tinybird.py" %}
from tinybird_sdk import (
    define_datasource,
    define_s3_connection,
    define_sink_pipe,
    engine,
    node,
    secret,
    t,
)

s3sample = define_s3_connection("s3sample", {
    "region": "us-east-1",
    "arn": secret("AWS_ROLE_ARN"),
})

events = define_datasource("events", {
    "schema": {
        "timestamp": t.date_time(),
        "session_id": t.string(),
        "status": t.string(),
    },
    "engine": engine.merge_tree({
        "sorting_key": ["timestamp"],
    }),
})

s3_export = define_sink_pipe("s3_export", {
    "sink": {
        "connection": s3sample,
        "bucket_uri": "s3://tinybird-sinks",
        "file_template": "daily_prices",
        "schedule": "*/5 * * * *",
        "format": "csv",
    },
    "nodes": [
        node({
            "name": "node_0",
            "sql": """
                SELECT *
                FROM events
                WHERE status = 'processed'
            """,
        }),
    ],
})
```

{% /tab %}
{% /tabs %}

### Deploy the Sink pipe

After defining your Sink pipe and connection, test them by running a deploy check:

{% tabs initial="Tinybird CLI" %}
{% tab label="Tinybird CLI" %}

```shell
tb --cloud deploy --check
```

{% /tab %}
{% tab label="TypeScript SDK" %}

```shell
npx tinybird deploy --check
```

{% /tab %}
{% tab label="Python SDK" %}

```shell
uv run tinybird deploy --check
```

{% /tab %}
{% /tabs %}

This runs the connection locally and checks if the connection is valid. To see the connection details, run `tb --cloud connection ls`.

When ready, deploy the project to your Workspace to create the Sink pipe:

{% tabs initial="Tinybird CLI" %}
{% tab label="Tinybird CLI" %}

```shell
tb --cloud deploy
```

{% /tab %}
{% tab label="TypeScript SDK" %}

```shell
npx tinybird deploy
```

{% /tab %}
{% tab label="Python SDK" %}

```shell
uv run tinybird deploy
```

{% /tab %}
{% /tabs %}

This creates the Sink pipe in your workspace and makes it available for execution.

{% /steps %}

## .connection settings

The S3 connector uses the following settings in .connection files:

{% table %}
   * Instruction
   * Required
   * Description
   ---
   * `S3_REGION`
   * Yes
   * Region of the S3 bucket.
   ---
   * `S3_ARN`
   * No*
   * ARN of the IAM role with the required permissions. Required for IAM Role authentication.
   ---
   * `S3_ACCESS_KEY`
   * No*
   * AWS access key for HMAC authentication. Store as a [Tinybird secret](/forward/dev-reference/commands/tb-secret).
   ---
   * `S3_SECRET`
   * No*
   * AWS secret key for HMAC authentication. Store as a [Tinybird secret](/forward/dev-reference/commands/tb-secret).
{% /table %}

*Either `S3_ARN` (for IAM Role authentication) or both `S3_ACCESS_KEY` and `S3_SECRET` (for HMAC authentication) are required.


## .pipe settings

The S3 Sink pipe uses the following settings in .pipe files:

{% table %}
  * Key
  * Type
  * Description
  ---
  * `EXPORT_CONNECTION_NAME`
  * string
  * Required. The connection name to the destination service. This is the connection created in step 1.
  ---
  * `EXPORT_BUCKET_URI`
  * string
  * Required. The path to the destination bucket. Example: `s3://tinybird-export`
  ---
  * `EXPORT_FILE_TEMPLATE`
  * string
  * Required. The target file name. Can use parameters to dynamically name and partition the files. See the [File template](#file-template) section below. Example: `daily_prices_{customer_id}`
  ---
  * `EXPORT_SCHEDULE`
  * string
  * Required. A crontab expression that sets the frequency of the Sink operation, or the string `@on-demand`.
  ---
  * `EXPORT_FORMAT`
  * string
  * Optional. The output format of the file. Values: CSV, NDJSON, Parquet. Default value: CSV
  ---
  * `EXPORT_COMPRESSION`
  * string
  * Optional. Accepted values: `none`, `gz` for gzip, `br` for brotli, `xz` for LZMA, `zst` for zstd. Default: `none`
  ---
  * `EXPORT_STRATEGY`
  * string
  * Optional. Defines how to handle existing files. Values: `create_new` (default), `replace`. See [Write strategies](#write-strategies) section below.
{% /table %}


### Supported regions

The Tinybird S3 Sink feature only supports exporting data to the following AWS regions:

- `us-east-*`
- `us-west-*`
- `eu-central-*`
- `eu-west-*`
- `eu-south-*`
- `eu-north-*`
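
If you manage several buckets, a small pre-flight check can catch an unsupported region before you deploy. The following Python sketch matches a region name against the patterns listed above. The `is_supported_region` helper is hypothetical and isn't part of the Tinybird CLI or SDK.

```python
from fnmatch import fnmatchcase

# Region patterns copied from the list of supported regions above.
SUPPORTED_REGION_PATTERNS = [
    "us-east-*",
    "us-west-*",
    "eu-central-*",
    "eu-west-*",
    "eu-south-*",
    "eu-north-*",
]


def is_supported_region(region):
    """Return True if the region name matches one of the supported patterns."""
    return any(fnmatchcase(region, pattern) for pattern in SUPPORTED_REGION_PATTERNS)


print(is_supported_region("eu-west-1"))       # True
print(is_supported_region("ap-southeast-2"))  # False
```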

### Scheduling considerations

The schedule applied to a Sink pipe doesn't guarantee that the underlying job executes immediately at the configured time. The job is placed into a job queue when the configured time elapses. If the queue is busy, the job can be delayed and run after the scheduled time.

To reduce the chances of a busy queue affecting your Sink pipe execution schedule, distribute the jobs over a wider period of time rather than grouping them close together.
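
One way to spread jobs out is to give each Sink pipe a different minute offset. The following Python sketch generates staggered cron expressions, assuming every pipe runs once per hour. The `staggered_hourly_schedules` helper and the pipe names are hypothetical.

```python
def staggered_hourly_schedules(pipe_names):
    """Assign each pipe an hourly cron expression with its own minute offset."""
    if not pipe_names:
        return {}
    # Spread the pipes evenly over the 60 minutes of the hour.
    step = max(1, 60 // len(pipe_names))
    return {
        name: f"{(index * step) % 60} * * * *"
        for index, name in enumerate(pipe_names)
    }


# Hypothetical pipe names, used only for illustration.
schedules = staggered_hourly_schedules(
    ["orders_export", "invoices_export", "events_export", "users_export"]
)
for name, cron in schedules.items():
    print(f'{name}: EXPORT_SCHEDULE "{cron}"')
```

With four pipes, the offsets are 0, 15, 30, and 45 minutes past the hour, so no two jobs are queued at the same time.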

### Write strategies

The `EXPORT_STRATEGY` parameter determines how Tinybird handles existing files in your S3 bucket:

- **`create_new`** (default): Creates new files without overwriting existing ones. If a file with the same name already exists, Tinybird will append a suffix to make the filename unique.
- **`replace`**: Overwrites existing files with the same name. Use this when you want to replace previous exports entirely.


### Query parameters

You can add [query parameters](/forward/query-data/query-parameters) to your Sink pipes, the same way you do in API Endpoints or Copy pipes.

- For on-demand executions, you can set parameters when you trigger the Sink pipe to whatever values you wish.
- For scheduled executions, the default values for the parameters will be used when the Sink pipe runs.


## Execute the Sink pipe

### On-demand execution

You can trigger your Sink pipe manually using:

```bash
tb sink run <pipe_name>
```

{% callout type="tip" %}
When triggering a Sink pipe you have the option of overriding several of its settings, like format or compression. Refer to the [Sink pipes API spec](/api-reference/sink-pipes-api) for the full list of parameters.
{% /callout %}

### Scheduled execution

If you configured a schedule with `EXPORT_SCHEDULE`, the Sink pipe will run automatically according to the cron expression.

Once the Sink pipe is triggered, it creates a standard Tinybird job that can be followed via the `v0/jobs` API or using `tb job ls --kind=sink`.

## File template

The export process can partition the result into different files, which helps you organize your data and produce smaller files. Partitioning is defined in the file template and is based on the column values of the result set.

### Partition by column

Add a template variable like `{COLUMN_NAME}` to the filename. For instance, consider the following query schema and result for an export:

{% table %}
  * customer_id
  * invoice_id
  * amount
  ---
  * ACME
  * INV20230608
  * 23.45
  ---
  * ACME
  * 12345INV
  * 12.3
  ---
  * GLOBEX
  * INV-ABC-789
  * 35.34
  ---
  * OSCORP
  * INVOICE2023-06-08
  * 57
  ---
  * ACME
  * INV-XYZ-98765
  * 23.16
  ---
  * OSCORP
  * INV210608-001
  * 62.23
  ---
  * GLOBEX
  * 987INV654
  * 36.23
{% /table %}

With the given file template `invoice_summary_{customer_id}.csv` you'd get 3 files:

`invoice_summary_ACME.csv`
{% table %}
  * customer_id
  * invoice_id
  * amount
  ---
  * ACME
  * INV20230608
  * 23.45
  ---
  * ACME
  * 12345INV
  * 12.3
  ---
  * ACME
  * INV-XYZ-98765
  * 23.16
{% /table %}

`invoice_summary_OSCORP.csv`
{% table %}
  * customer_id
  * invoice_id
  * amount
  ---
  * OSCORP
  * INVOICE2023-06-08
  * 57
  ---
  * OSCORP
  * INV210608-001
  * 62.23
{% /table %}

`invoice_summary_GLOBEX.csv`
{% table %}
  * customer_id
  * invoice_id
  * amount
  ---
  * GLOBEX
  * INV-ABC-789
  * 35.34
  ---
  * GLOBEX
  * 987INV654
  * 36.23
{% /table %}
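
To make the grouping concrete, the following Python sketch reproduces the example above by rendering the file template for each row and collecting rows under the resulting file name. It only illustrates the outcome and isn't how Tinybird implements partitioning. The `partition_by_template` helper is hypothetical.

```python
from collections import defaultdict

# Rows from the example result set above.
rows = [
    {"customer_id": "ACME", "invoice_id": "INV20230608", "amount": 23.45},
    {"customer_id": "ACME", "invoice_id": "12345INV", "amount": 12.3},
    {"customer_id": "GLOBEX", "invoice_id": "INV-ABC-789", "amount": 35.34},
    {"customer_id": "OSCORP", "invoice_id": "INVOICE2023-06-08", "amount": 57},
    {"customer_id": "ACME", "invoice_id": "INV-XYZ-98765", "amount": 23.16},
    {"customer_id": "OSCORP", "invoice_id": "INV210608-001", "amount": 62.23},
    {"customer_id": "GLOBEX", "invoice_id": "987INV654", "amount": 36.23},
]


def partition_by_template(rows, template):
    """Group rows by the file name produced from each row's column values."""
    files = defaultdict(list)
    for row in rows:
        # str.format fills {customer_id} with the value from the row.
        files[template.format(**row)].append(row)
    return dict(files)


files = partition_by_template(rows, "invoice_summary_{customer_id}.csv")
for filename, file_rows in files.items():
    print(filename, len(file_rows))
```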

### Values format

{% callout type="caution" %}
Partitioning by a DateTime column alone can be risky. Each distinct value, down to the second, produces its own file, so a single hour of data can produce up to 3600 files.
{% /callout %}

To help partition in a sensible way, you can add a format string to the column name using the following placeholders:

{% table %}
  * Placeholder
  * Description
  * Example
  ---
  * %Y
  * Year
  * 2023
  ---
  * %m
  * Month as an integer number (01-12)
  * 06
  ---
  * %d
  * Day of the month, zero-padded (01-31)
  * 07
  ---
  * %H
  * Hour in 24h format (00-23)
  * 14
  ---
  * %i
  * Minute (00-59)
  * 45
{% /table %}

For instance, for a result like this:

{% table %}
  * timestamp
  * invoice_id
  * amount
  ---
  * 2023-07-07 09:07:05
  * INV20230608
  * 23.45
  ---
  * 2023-07-07 09:07:01
  * 12345INV
  * 12.3
  ---
  * 2023-07-07 09:06:45
  * INV-ABC-789
  * 35.34
  ---
  * 2023-07-07 09:05:35
  * INVOICE2023-06-08
  * 57
  ---
  * 2023-07-06 23:14:05
  * INV-XYZ-98765
  * 23.16
  ---
  * 2023-07-06 23:14:02
  * INV210608-001
  * 62.23
  ---
  * 2023-07-06 23:10:55
  * 987INV654
  * 36.23
{% /table %}

All 7 events have different values in the `timestamp` column, so a file template like `invoices_{timestamp}` would create 7 different files.

To write one file per hour instead, use a file template like `invoices_{timestamp, '%Y%m%d-%H'}`. You'd then get only two files for that dataset:

`invoices_20230707-09.csv`
{% table %}
  * timestamp
  * invoice_id
  * amount
  ---
  * 2023-07-07 09:07:05
  * INV20230608
  * 23.45
  ---
  * 2023-07-07 09:07:01
  * 12345INV
  * 12.3
  ---
  * 2023-07-07 09:06:45
  * INV-ABC-789
  * 35.34
  ---
  * 2023-07-07 09:05:35
  * INVOICE2023-06-08
  * 57
{% /table %}

`invoices_20230706-23.csv`
{% table %}
  * timestamp
  * invoice_id
  * amount
  ---
  * 2023-07-06 23:14:05
  * INV-XYZ-98765
  * 23.16
  ---
  * 2023-07-06 23:14:02
  * INV210608-001
  * 62.23
  ---
  * 2023-07-06 23:10:55
  * 987INV654
  * 36.23
{% /table %}
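
The following Python sketch approximates how a format string collapses timestamps into fewer file names. It's an illustration under stated assumptions, not Tinybird's implementation: the `render_template` helper is hypothetical, it translates the `%i` placeholder to Python's `%M`, and it doesn't append a file extension.

```python
import re
from datetime import datetime

# Matches {column} and {column, 'format'} template variables.
PLACEHOLDER = re.compile(r"\{(\w+)(?:,\s*'([^']*)')?\}")


def render_template(template, row):
    """Render a file template using the values of a single row."""

    def substitute(match):
        column, fmt = match.group(1), match.group(2)
        value = row[column]
        if fmt is None:
            return str(value)
        # Python's strftime uses %M for minutes, so translate %i.
        return value.strftime(fmt.replace("%i", "%M"))

    return PLACEHOLDER.sub(substitute, template)


# Timestamps from the example result set above.
timestamps = [
    "2023-07-07 09:07:05",
    "2023-07-07 09:07:01",
    "2023-07-07 09:06:45",
    "2023-07-07 09:05:35",
    "2023-07-06 23:14:05",
    "2023-07-06 23:14:02",
    "2023-07-06 23:10:55",
]
rows = [{"timestamp": datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")} for ts in timestamps]

per_second = {render_template("invoices_{timestamp}", row) for row in rows}
per_hour = sorted({render_template("invoices_{timestamp, '%Y%m%d-%H'}", row) for row in rows})
print(len(per_second))  # 7
print(per_hour)         # ['invoices_20230706-23', 'invoices_20230707-09']
```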

### By number of files

You can also write the result into a fixed number of files. Instead of a column name, put an integer between curly braces.

Example: `invoice_summary.{8}.csv`

This helps reduce the size of each file, especially when the files are consumed by other services, such as Snowflake, where uploading big files is discouraged.

The results are written in random order. The rows are spread across the requested number of files, but you can't count on any specific order.

The maximum is 16 files.
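
The following Python sketch shows why consumers shouldn't rely on row order when the result is split by number of files. How Tinybird distributes rows isn't specified here, so the shuffle-then-deal approach is an assumption made for illustration. The `split_into_files` helper is hypothetical.

```python
import random

MAX_FILES = 16  # Maximum number of files stated above.


def split_into_files(rows, file_count, seed=None):
    """Spread rows across file_count lists in no particular order."""
    if not 1 <= file_count <= MAX_FILES:
        raise ValueError(f"file_count must be between 1 and {MAX_FILES}")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    files = [[] for _ in range(file_count)]
    # Deal the shuffled rows out one file at a time.
    for index, row in enumerate(shuffled):
        files[index % file_count].append(row)
    return files


# 100 hypothetical rows split into 8 files.
files = split_into_files(range(100), 8, seed=42)
print([len(file_rows) for file_rows in files])  # [13, 13, 13, 13, 12, 12, 12, 12]
```

Every row ends up in exactly one file, but which file holds a given row, and in what position, can change from run to run.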

### Combining different partitions

It's possible to add more than one partitioning parameter in the file template. This is useful, for instance, when you do a daily dump of data, but want to export one file per hour.

Setting the file template as `invoices/dt={timestamp, '%Y-%m-%d'}/H{timestamp, '%H'}.csv` would create the following file structure across different days and executions:

```text
invoices
├── dt=2023-07-07
│   ├── H23.csv
│   ├── H22.csv
│   ├── H21.csv
│   └── ...
└── dt=2023-07-06
    ├── H23.csv
    └── H22.csv
```

You can also mix column names and number of files. For instance, setting the file template as `invoices/{customer_id}/dump_{4}.csv` would create the following file structure:

```text
invoices
├── ACME
│   ├── dump_0.csv
│   ├── dump_1.csv
│   ├── dump_2.csv
│   └── dump_3.csv
└── OSCORP
    ├── dump_0.csv
    ├── dump_1.csv
    ├── dump_2.csv
    └── dump_3.csv
```

{% callout type="caution" %}
Avoid excessive partitioning. The write process creates as many files as there are combinations of values of the partitioning columns in a given result set.
{% /callout %}
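
Before choosing a template, you can estimate how many files it would produce by counting distinct combinations of partition values on a sample of your data. The following Python sketch does that. The `count_output_files` helper and the sample rows are hypothetical.

```python
from datetime import datetime


def count_output_files(rows, key_functions):
    """Count the distinct combinations of partition values across rows."""
    return len({tuple(key(row) for key in key_functions) for row in rows})


# Hypothetical sample rows, used only for illustration.
rows = [
    {"customer_id": "ACME", "timestamp": datetime(2023, 7, 7, 9, 7, 5)},
    {"customer_id": "ACME", "timestamp": datetime(2023, 7, 7, 9, 7, 1)},
    {"customer_id": "GLOBEX", "timestamp": datetime(2023, 7, 7, 9, 6, 45)},
    {"customer_id": "OSCORP", "timestamp": datetime(2023, 7, 7, 9, 5, 35)},
    {"customer_id": "ACME", "timestamp": datetime(2023, 7, 6, 23, 14, 5)},
    {"customer_id": "OSCORP", "timestamp": datetime(2023, 7, 6, 23, 14, 2)},
    {"customer_id": "GLOBEX", "timestamp": datetime(2023, 7, 6, 23, 10, 55)},
]


def by_customer(row):
    return row["customer_id"]


def by_hour(row):
    return row["timestamp"].strftime("%Y%m%d-%H")


def by_second(row):
    return row["timestamp"]


print(count_output_files(rows, [by_customer]))             # 3
print(count_output_files(rows, [by_customer, by_hour]))    # 6
print(count_output_files(rows, [by_customer, by_second]))  # 7
```

Each extra partitioning column multiplies the potential number of files, so check the count against a realistic sample before scheduling the Sink pipe.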

## Supported file types

The S3 Sink supports exporting data in the following file formats:

{% table %}
* File type
* Accepted extensions
* Compression formats supported
---
* CSV
* `.csv`, `.csv.gz`
* `gzip`
---
* NDJSON
* `.ndjson`, `.ndjson.gz`, `.jsonl`, `.jsonl.gz`, `.json`, `.json.gz`
* `gzip`
---
* Parquet
* `.parquet`, `.parquet.gz`
* `snappy`, `gzip`, `lzo`, `brotli`, `lz4`, `zstd`
{% /table %}

You can optionally configure the export format using the `EXPORT_FORMAT` parameter (defaults to CSV) and compression using the `EXPORT_COMPRESSION` parameter in your Sink pipe configuration.

## Observability

Sink pipe operations are logged in the [tinybird.jobs_log](/forward/monitoring/service-datasources#tinybird-jobs-log) Service Data Source. You can filter by `job_type = 'sink'` to see only Sink pipe executions.

For more detailed Sink-specific information, you can also use [tinybird.sinks_ops_log](/forward/monitoring/service-datasources#tinybird-sinks-ops-log).

Data Transfer incurred by Sink pipes is tracked in the [tinybird.data_transfer](/forward/monitoring/service-datasources#tinybird-data-transfer) Service Data Source.

## Limits & quotas

{% snippet title="forward-limits-reminder" /%}

## Billing

Tinybird bills Sink pipes based on Data Transfer. When a Sink pipe executes, it uses your plan's included compute resources (vCPUs and active minutes) to run the query, then writes the result to a bucket (Data Transfer). If the resulting files are compressed, Tinybird accounts for the compressed size.

### Data Transfer

Data Transfer depends on your environment. There are two scenarios:

- The destination bucket is in the **same** cloud provider and region as your Tinybird Workspace: $0.01 / GB
- The destination bucket is in a **different** cloud provider or region from your Tinybird Workspace: $0.10 / GB
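
As a quick worked example of the rates above, the following Python sketch estimates the Data Transfer cost of an export. The `estimate_transfer_cost` helper and the 250 GB figure are hypothetical. For compressed exports, use the compressed size.

```python
SAME_REGION_RATE = 0.01   # USD per GB, same cloud provider and region
CROSS_REGION_RATE = 0.10  # USD per GB, different cloud provider or region


def estimate_transfer_cost(gigabytes, same_provider_and_region):
    """Return the estimated Data Transfer cost in USD."""
    rate = SAME_REGION_RATE if same_provider_and_region else CROSS_REGION_RATE
    return gigabytes * rate


# Hypothetical export of 250 GB written to the bucket.
print(f"${estimate_transfer_cost(250, True):.2f}")   # $2.50
print(f"${estimate_transfer_cost(250, False):.2f}")  # $25.00
```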

## Next steps

- Get familiar with the [Service Data Sources](/forward/monitoring/service-datasources) and see what's going on in your account
- Deep dive on Tinybird's [pipes concept](/forward/core-concepts/pipes)