---
title: GCS Connector
meta:
   description: Learn how to configure the GCS connector for Tinybird.
---

# GCS connector

You can set up a GCS connector to load your CSV, NDJSON, or Parquet files into Tinybird from any GCS bucket. Tinybird automatically ingests matching files on the first deployment, but does **not** detect new files afterwards. You must trigger subsequent ingestion manually.

Setting up the GCS connector requires:

1. Configuring a [Service Account](https://cloud.google.com/iam/docs/service-accounts-create) with these [permissions](#gcs-permissions) in GCP.
2. Defining a GCS Connection in your Tinybird project.
3. Defining a Data Source that uses this Connection.

## Environment considerations

In the Tinybird Cloud environment, Tinybird uses the Service Account credentials you provide to access your GCS bucket. When you deploy to your main Cloud Workspace, use `tb --cloud deploy` as usual.

When you test GCS connector Data Sources in a Cloud Branch, include `--with-connections` so Tinybird creates the connector data linkers in the branch:

{% tabs initial="Tinybird CLI" %}
{% tab label="Tinybird CLI" %}

```shell
tb build --with-connections
```

{% /tab %}
{% tab label="TypeScript SDK" %}

```shell
npx tinybird build --with-connections
```

{% /tab %}
{% tab label="Python SDK" %}

```shell
uv run tinybird build --with-connections
```

{% /tab %}
{% /tabs %}

In branches and Tinybird Local, use sample imports to validate schemas and pipelines without syncing every matching file. See [Import sample data](#import-sample-data).

## GCS permissions

To authenticate Tinybird with GCS, you need a GCP service account key in JSON format with the **Storage Object Viewer** role.

1. In the Google Cloud Console, create or use an existing service account.
2. Assign the `roles/storage.objectViewer` role.
3. Generate a JSON key file and download it.
4. To work locally, store the key as a Tinybird secret in a `.env.local` file:

```bash
GCS_KEY='<your-json-key-content>'
```

5. Store the key in Cloud as a Tinybird secret:

```bash
tb --cloud secret set GCS_KEY '<your-json-key-content>'
```
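
A downloaded key file is usually pretty-printed across several lines, while the entries above are written on a single line. The following is a minimal sketch for collapsing the key before storing it. The helper name `flatten_key` and the file name `gcs-key.json` are hypothetical, and the sketch assumes the key contains no single quotes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the contents of a JSON key file on one line.
# In a pretty-printed key, raw newlines only separate fields (the private
# key stores its line breaks as escaped "\n" sequences), so removing them
# keeps the JSON valid.
flatten_key() {
  tr -d '\r\n' < "$1"
}

# Example usage (assumes ./gcs-key.json exists):
#   printf "GCS_KEY='%s'\n" "$(flatten_key ./gcs-key.json)" >> .env.local
#   tb --cloud secret set GCS_KEY "$(flatten_key ./gcs-key.json)"
```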

## Set up the connector

{% steps %}

### Create a GCS connection

Define the GCS Connection in your project. For Tinybird CLI datafile projects, `tb connection create gcs` is a useful helper for generating a `.connection` file you can edit.

Run the following command to create a connection:

```bash
tb connection create gcs
```

You will be prompted to enter:

1. A name for your Connection.
2. The GCS bucket name.
3. The service account credentials (JSON key file). See the [Google Cloud docs](https://cloud.google.com/iam/docs/keys-create-delete) for more details.
4. Whether to create the connection for your Cloud environment.

You can also define the Connection manually:

{% snippet title="gcs-connection-examples" /%}

Ensure your GCP Service Account has the `roles/storage.objectViewer` role.

{% callout type="caution" %}
Use a different Service Account key for each environment, and manage the keys with [Tinybird Secrets](/forward/dev-reference/commands/tb-secret).
{% /callout %}

### Create a GCS Data Source

After setting up the Connection, create a Data Source that uses it. For Tinybird CLI datafile projects, `tb datasource create --gcs` is a useful helper for generating the `.datasource` file.

```bash
tb datasource create --gcs
```

Define the Data Source schema as with any other Data Source, then attach the GCS Connection. The connection name or object must match the Connection you created in the previous step.

{% snippet title="gcs-datasource-examples" /%}

### Sync data

{% callout type="info" %}
On the first deployment, Tinybird automatically ingests all files that match the `IMPORT_BUCKET_URI` pattern. `@auto` mode is not supported, so you must manually trigger subsequent syncs to ingest new files.
{% /callout %}

To trigger a manual sync, use the API or the CLI.

#### Using the API

```sh
curl -X POST "{% user("apiHost") %}/v0/datasources/<datasource_name>/scheduling/runs" \
  -H "Authorization: Bearer <your-tinybird-token>"
```

#### Using the CLI

```sh
tb datasource sync <datasource_name>
```
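
If you trigger syncs from a script or a CI job, you may want the call to fail loudly on HTTP errors. The following sketch wraps the API request shown above. The function names and the `TB_HOST` and `TB_TOKEN` environment variables are assumptions for illustration, not part of the Tinybird CLI.

```bash
#!/usr/bin/env bash
# Build the sync endpoint for a Data Source.
# $1: API host of your Workspace (hypothetical input, trailing slash allowed)
# $2: Data Source name
sync_endpoint() {
  printf '%s/v0/datasources/%s/scheduling/runs\n' "${1%/}" "$2"
}

# Trigger a sync and return a non-zero exit code on HTTP errors.
# Assumes TB_HOST and TB_TOKEN are exported in the environment.
trigger_sync() {
  curl --fail --silent --show-error -X POST \
    "$(sync_endpoint "$TB_HOST" "$1")" \
    -H "Authorization: Bearer $TB_TOKEN"
}

# Example usage:
#   trigger_sync my_datasource
```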

## .connection settings

The GCS connector uses the following settings in `.connection` files:

{% table %}
   * Instruction
   * Required
   * Description
   ---
   * `GCS_SERVICE_ACCOUNT_CREDENTIALS_JSON`
   * Yes
   * Service Account Key in JSON format, inlined. We recommend using [Tinybird Secrets](/forward/dev-reference/commands/tb-secret).
{% /table %}

{% callout type="warning" %}
Once a Connection is used in a Data Source, you can't change the Service Account Key. To modify it, you must:

1. Remove the Connection from the Data Source.
2. Deploy the changes.
3. Add the Connection again with the new values.
{% /callout %}

{% /steps %}

## .datasource settings

The GCS connector uses the following settings in `.datasource` files:

{% table %}
   * Instruction
   * Required
   * Description
   ---
   * `IMPORT_CONNECTION_NAME`
   * Yes
   * Name given to the Connection inside Tinybird. For example, `'my_connection'`. It must match the name of the `.connection` file you created for the GCS Connection.
   ---
   * `IMPORT_BUCKET_URI`
   * Yes
   * Full bucket path, including the `gs://` protocol, bucket name, object path, and an optional pattern to match against object keys. For example, `gs://my-bucket/my-path` discovers all files in the bucket `my-bucket` under the prefix `/my-path`. You can use patterns in the path to filter objects, for example, ending the path with `*.csv` matches all objects that end with the `.csv` suffix.
   ---
   * `IMPORT_SCHEDULE`
   * Yes
   * Use `@on-demand` to sync new files as needed. On the first deployment, Tinybird automatically ingests all matching files. After the initial ingestion, when you manually trigger a sync, Tinybird appends only the files added since the last execution. You can also use `@once`, which behaves the same as `@on-demand`. `@auto` mode is not supported; if you use this option, Tinybird only executes the initial sync.
   ---
   * `IMPORT_FROM_TIMESTAMP`
   * No
   * Sets the date and time from which to start ingesting files in a GCS bucket. The format is `YYYY-MM-DDTHH:MM:SSZ`.
{% /table %}
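
To produce a value in the `YYYY-MM-DDTHH:MM:SSZ` format that `IMPORT_FROM_TIMESTAMP` expects, you can format a UTC time with `date`. This is a small sketch; the function name `utc_timestamp` is hypothetical.

```bash
#!/usr/bin/env bash
# Print the current UTC time as YYYY-MM-DDTHH:MM:SSZ.
# The trailing "Z" marks UTC, so the -u flag is required.
utc_timestamp() {
  date -u +"%Y-%m-%dT%H:%M:%SZ"
}

utc_timestamp
```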

{% callout type="warning" %}
You can't change these settings after the Data Source is created. To modify them, you must:

1. Remove the Connection from the Data Source.
2. Deploy the changes.
3. Add the Connection again with the new values.
4. Deploy again.
{% /callout %}

## Import sample data

In branches and Tinybird Local, you can import a sample of files from GCS using the API. This is useful for validating schemas and testing pipelines without syncing all files from the bucket.

```bash
curl -X POST "{% user("apiHost") %}/v0/datasources/my_datasource/sample" \
  -H "Authorization: Bearer $TB_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"max_files": 1}'
```

The sample import starts an asynchronous job that imports up to `max_files` files (maximum 10). The response includes a `job_id` that you can use to track progress:

```bash
curl "{% user("apiHost") %}/v0/jobs/{job_id}?token=$TB_TOKEN"
```

{% callout type="info" %}
The sample import runs as a separate job and doesn't affect production sync state or offsets.
{% /callout %}
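
To chain the two requests above in a script, you need to pull the `job_id` out of the first response. The following sketch does this with `sed` so it doesn't depend on extra tools. It assumes `job_id` is a plain string field in the JSON response; the function name and the `TB_HOST` variable are hypothetical.

```bash
#!/usr/bin/env bash
# Read a JSON response on stdin and print the value of its "job_id" field.
# Assumes the value is a plain string without escaped quotes.
extract_job_id() {
  sed -n 's/.*"job_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Example usage (assumes TB_HOST and TB_TOKEN are set):
#   job_id="$(curl -s -X POST "$TB_HOST/v0/datasources/my_datasource/sample" \
#     -H "Authorization: Bearer $TB_TOKEN" \
#     -H "Content-Type: application/json" \
#     -d '{"max_files": 1}' | extract_job_id)"
#   curl -s "$TB_HOST/v0/jobs/$job_id?token=$TB_TOKEN"
```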

## GCS file URI

Use GCS wildcards to match multiple files:

- `*` (single asterisk): Matches files at one directory level.
  - Example: `gs://bucket-name/*.ndjson` (matches all `.ndjson` files in the root directory, but not in subdirectories).
- `**` (double asterisk): Recursively matches files across multiple directory levels.
  - Example: `gs://bucket-name/**/*.ndjson` (matches all `.ndjson` files anywhere in the bucket).
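
The difference between the two wildcards can be illustrated with a short script. This sketch only mirrors the two `.ndjson` examples above, using object keys relative to the bucket root. It is not Tinybird's matcher, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Mirrors gs://bucket-name/*.ndjson: the key must end in .ndjson and
# must not contain a "/" (that is, it sits in the root directory).
matches_single_level() {
  case "$1" in
    */*) return 1 ;;
    *.ndjson) return 0 ;;
    *) return 1 ;;
  esac
}

# Mirrors gs://bucket-name/**/*.ndjson: any key ending in .ndjson,
# at any depth.
matches_recursive() {
  case "$1" in
    *.ndjson) return 0 ;;
    *) return 1 ;;
  esac
}

# Report how each sample key is classified by the two rules.
for key in events.ndjson 2024/01/events.ndjson events.csv; do
  single=no; recursive=no
  matches_single_level "$key" && single=yes
  matches_recursive "$key" && recursive=yes
  printf '%-24s single=%s recursive=%s\n' "$key" "$single" "$recursive"
done
```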

{% callout type="caution" %}
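The GCS connector does not allow overlapping ingestion paths. For example, you cannot have both of the following, because the first pattern also matches every file the second one matches: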
- `gs://my_bucket/**/*.csv`
- `gs://my_bucket/transactions/*.csv`
{% /callout %}

## Supported file types

The GCS Connector supports the following formats:

{% table %}
   * File type
   * Accepted extensions
   * Supported compression
   ---
   * CSV
   * `.csv`, `.csv.gz`
   * `gzip`
   ---
   * NDJSON
   * `.ndjson`, `.ndjson.gz`, `.jsonl`, `.jsonl.gz`
   * `gzip`
   ---
   * Parquet
   * `.parquet`, `.parquet.gz`
   * `snappy`, `gzip`, `lzo`, `brotli`, `lz4`, `zstd`
{% /table %}

{% callout type="info" %}
JSON files must follow the **Newline Delimited JSON (NDJSON)** format. Each line must be a valid JSON object and must end with a `\n` character.
{% /callout %}
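
Before you upload files, you can check object names against the accepted extensions listed above. The following is a minimal sketch; the function name `gcs_file_type` is hypothetical, and treating the match as case-sensitive is an assumption.

```bash
#!/usr/bin/env bash
# Print the file type for an object key based on the accepted extensions,
# or return 1 if the extension is not in the supported list.
gcs_file_type() {
  case "$1" in
    *.csv|*.csv.gz) echo csv ;;
    *.ndjson|*.ndjson.gz|*.jsonl|*.jsonl.gz) echo ndjson ;;
    *.parquet|*.parquet.gz) echo parquet ;;
    *) return 1 ;;
  esac
}

# Example usage:
#   gcs_file_type data/2024/events.jsonl.gz || echo "unsupported file"
```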

## Limitations

- **No `@auto` mode**: After the initial ingestion on first deployment, you must trigger subsequent ingestion manually.
- **File format support**: Only CSV, NDJSON, and Parquet are supported.
- **Permissions**: Ensure your service account has the correct role assigned.