---
title: GCS Connector
meta:
   description: Learn how to configure the GCS connector for Tinybird.
---

# GCS connector

You can set up a GCS connector to load your CSV, NDJSON, or Parquet files into Tinybird from any GCS bucket. Tinybird does **not** automatically detect new files; ingestion must be triggered manually.

Setting up the GCS connector requires:

1. Configuring a [Service Account](https://cloud.google.com/iam/docs/service-accounts-create) with these [permissions](#gcs-permissions) in GCP.
2. Creating a connection file in Tinybird.
3. Creating a data source that uses this connection.

## Set up the connector

{% steps %}

### Create a GCS connection

You can create a GCS connection in Tinybird either by using the CLI or by manually creating a connection file.

#### Option 1: Use the CLI (recommended)

Run the following command to create a connection:

```bash
tb connection create gcs
```

You will be prompted to enter:

1. A name for your connection.
2. The GCS bucket name.
3. The service account credentials (JSON key file). See the [Google Cloud docs](https://cloud.google.com/iam/docs/keys-create-delete) for more details.
4. Whether to create the connection for your Cloud environment.

#### Option 2: Manually create a connection file

Create a `.connection` file with the required credentials:

```tb {% title="gcs_sample.connection" %}
TYPE gcs
GCS_SERVICE_ACCOUNT_CREDENTIALS_JSON {{ tb_secret("GCS_KEY") }}
```

Ensure your GCP Service Account has the `roles/storage.objectViewer` role.

{% callout type="caution" %}
Use a different Service Account key for each environment by leveraging [Tinybird Secrets](/forward/dev-reference/commands/tb-secret).
{% /callout %}

### Create a GCS data source

After setting up the connection, create a data source.

Create a [.datasource](/forward/dev-reference/datafiles/datasource-files) file using `tb datasource create --gcs` or manually:

```tb {% title="gcs_sample.datasource" %}
DESCRIPTION >
    Analytics events landing data source

SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `session_id` String `json:$.session_id`,
    `action` LowCardinality(String) `json:$.action`,
    `version` LowCardinality(String) `json:$.version`,
    `payload` String `json:$.payload`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp"
ENGINE_TTL "timestamp + toIntervalDay(60)"

IMPORT_CONNECTION_NAME gcs_sample
IMPORT_BUCKET_URI gs://my-bucket/*.csv
IMPORT_SCHEDULE '@on-demand'
```

The `IMPORT_CONNECTION_NAME` setting must match the name of your `.connection` file.

### Sync data

Since **automatic ingestion (`@auto` mode) is not supported**, you must manually sync data when new files are available.

#### Using the API

```sh
curl -X POST "https://api.tinybird.co/v0/datasources/<datasource_name>/scheduling/runs" \
  -H "Authorization: Bearer <your-tinybird-token>"
```

#### Using the CLI

```sh
tb datasource sync <datasource_name>
```

## .connection settings

The GCS connector uses the following settings in .connection files:

{% table %}
   * Instruction
   * Required
   * Description
   ---
   * `GCS_SERVICE_ACCOUNT_CREDENTIALS_JSON`
   * Yes
   * Service Account Key in JSON format, inlined. We recommend using [Tinybird Secrets](/forward/dev-reference/commands/tb-secret).
{% /table %}

{% callout type="warning" %}
Once a connection is used in a data source, you can't change the Service Account Key. To modify it, you must:

1. Remove the connection from the data source.
2. Deploy the changes.
3. Add the connection again with the new values.
{% /callout %}

{% /steps %}

## .datasource settings

The GCS connector uses the following settings in .datasource files:

{% table %}
   * Instruction
   * Required
   * Description
   ---
   * `IMPORT_CONNECTION_NAME`
   * Yes
   * Name given to the connection inside Tinybird. For example, `'my_connection'`. This is the name of the connection file you created in the previous step.
   ---
   * `IMPORT_BUCKET_URI`
   * Yes
   * Full bucket path, including the `gs://` protocol, bucket name, object path, and an optional pattern to match against object keys. For example, `gs://my-bucket/my-path` discovers all files in the bucket `my-bucket` under the prefix `/my-path`. You can use patterns in the path to filter objects, for example, ending the path with `*.csv` matches all objects that end with the `.csv` suffix.
   ---
   * `IMPORT_SCHEDULE`
   * Yes
   * Use `@on-demand` to sync new files as needed; only files added to the bucket since the last execution are appended to the data source. You can also use `@once`, which behaves the same as `@on-demand`. `@auto` mode is not supported yet; if you set it, only the initial sync runs.
   ---
   * `IMPORT_FROM_TIMESTAMP`
   * No
   * Sets the date and time from which to start ingesting files in a GCS bucket. The format is `YYYY-MM-DDTHH:MM:SSZ`.
{% /table %}

{% callout type="warning" %}
We don't support changing these settings after the data source is created. If you need to do that, you must:

1. Remove the connection from the data source.
2. Deploy the changes.
3. Add the connection again with the new values.
4. Deploy again.
{% /callout %}

## Import sample data

In branches and Tinybird Local, you can import a sample of files from GCS using the API. This is useful for validating schemas and testing pipelines without syncing all files from the bucket.

```bash
curl -X POST "https://api.tinybird.co/v0/datasources/my_datasource/sample" \
  -H "Authorization: Bearer $TB_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"max_files": 1}'
```

The sample import starts an asynchronous job that imports up to `max_files` files (maximum 10). The response includes a `job_id` that you can use to track progress:

```bash
curl "https://api.tinybird.co/v0/jobs/{job_id}?token=$TB_TOKEN"
```
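If you want to block until the job finishes, you can poll the Jobs API until the `status` field reaches a terminal state. The sketch below assumes `done` and `error` are the terminal values and takes the status-fetching command as an argument so you can swap in the real `curl` call:

```shell
# Hedged sketch: wait until a job reaches a terminal state.
# The argument is any command that prints the current job status;
# against the real API that would be something like:
#   curl -s "https://api.tinybird.co/v0/jobs/$JOB_ID?token=$TB_TOKEN" | jq -r '.status'
wait_for_job() {
  local fetch_status=$1 status
  while :; do
    status=$($fetch_status)
    case $status in
      done|error) echo "$status"; return ;;
    esac
    sleep 5
  done
}
```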

{% callout type="info" %}
The sample import runs as a separate job and doesn't affect production sync state or offsets.
{% /callout %}

## GCS file URI

Use GCS wildcards to match multiple files:

- `*` (single asterisk): Matches files at one directory level.
  - Example: `gs://bucket-name/*.ndjson` (matches all `.ndjson` files in the root directory, but not in subdirectories).
- `**` (double asterisk): Recursively matches files across multiple directory levels.
  - Example: `gs://bucket-name/**/*.ndjson` (matches all `.ndjson` files anywhere in the bucket).
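As a rough mental model (an illustrative sketch, not Tinybird's implementation), `*` behaves like a regex that cannot cross a `/`, while `**` can:

```shell
# Illustrative only: approximates '*' vs '**' matching on object keys
# by translating the glob into an extended regular expression.
glob_match() {
  local pattern=$1 path=$2 regex
  regex=${pattern//./\\.}          # escape literal dots
  regex=${regex//\*\*/__DS__}      # protect '**' with a placeholder
  regex=${regex//\*/[^/]*}         # '*' stays within one path segment
  regex=${regex//__DS__/.*}        # '**' crosses path segments
  [[ $path =~ ^${regex}$ ]]
}

# glob_match 'gs://bucket-name/*.ndjson'    'gs://bucket-name/a.ndjson'      -> match
# glob_match 'gs://bucket-name/*.ndjson'    'gs://bucket-name/dir/a.ndjson'  -> no match
# glob_match 'gs://bucket-name/**/*.ndjson' 'gs://bucket-name/dir/a.ndjson'  -> match
```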

{% callout type="caution" %}
GCS does not allow overlapping ingestion paths. For example, you cannot have:
- `gs://my_bucket/**/*.csv`
- `gs://my_bucket/transactions/*.csv`
{% /callout %}

## Supported file types

The GCS Connector supports the following formats:

{% table %}
   * File Type
   * Accepted Extensions
   * Supported Compression
   ---
   * CSV
   * `.csv`, `.csv.gz`
   * `gzip`
   ---
   * NDJSON
   * `.ndjson`, `.ndjson.gz`, `.jsonl`, `.jsonl.gz`
   * `gzip`
   ---
   * Parquet
   * `.parquet`, `.parquet.gz`
   * `snappy`, `gzip`, `lzo`, `brotli`, `lz4`, `zstd`
{% /table %}

{% callout type="info" %}
JSON files must follow the **Newline Delimited JSON (NDJSON)** format. Each line must be a valid JSON object and must end with a `\n` character.
{% /callout %}
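For reference, a valid NDJSON file for the sample schema shown earlier could be built like this (field values are made up for illustration):

```shell
# Each line is a complete JSON object terminated by a newline.
cat > events.ndjson <<'EOF'
{"timestamp":"2024-05-01 12:00:00","session_id":"abc123","action":"click","version":"1.0","payload":"{}"}
{"timestamp":"2024-05-01 12:00:05","session_id":"abc123","action":"view","version":"1.0","payload":"{}"}
EOF
```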

## GCS Permissions

To authenticate Tinybird with GCS, you need a GCP service account key in JSON format with the **Storage Object Viewer** role.

1. In the Google Cloud Console, create or use an existing service account.
2. Assign the `roles/storage.objectViewer` role.
3. Generate a JSON key file and download it.
4. Store the key as a Tinybird secret in a `.env.local` file for local development:

```bash
GCS_KEY='<your-json-key-content>'
```

5. Store the key as a Tinybird secret in Tinybird Cloud:

```bash
tb --cloud secret set GCS_KEY '<your-json-key-content>'
```
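Because the downloaded key file spans multiple lines, one hedged way to inline it into `.env.local` is to strip the newlines first. This is safe for the key JSON because newlines inside its string values are already escaped as `\n`; the helper and file name below are illustrative, not part of the Tinybird CLI:

```shell
# Illustrative helper: compact the multi-line JSON key onto a single
# line and append it to .env.local (the key-file name is a placeholder).
store_gcs_key() {
  printf "GCS_KEY='%s'\n" "$(tr -d '\n' < "$1")" >> .env.local
}
# usage: store_gcs_key sa-key.json
```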

## Limitations

- **No `@auto` mode**: Ingestion must be triggered manually.
- **File format support**: Only CSV, NDJSON, and Parquet are supported.
- **Permissions**: Ensure your service account has the correct role assigned.
