GCS connector¶
You can set up a GCS connector to load your CSV, NDJSON, or Parquet files into Tinybird from any GCS bucket. Tinybird does not automatically detect new files; ingestion must be triggered manually.
Setting up the GCS connector requires:
- Configuring a GCP Service Account with the required permissions.
- Creating a connection file in Tinybird.
- Creating a data source that uses this connection.
Set up the connector¶
Create a GCS connection¶
You can create a GCS connection in Tinybird either by using the CLI or by manually creating a connection file.
Option 1: Use the CLI (recommended)¶
Run the following command to create a connection:
tb connection create gcs
You will be prompted to enter:
- A name for your connection.
- The GCS bucket name.
- The service account credentials (JSON key file). You can check the Google Cloud docs for more details.
- Whether to create the connection for your Cloud environment.
Option 2: Manually create a connection file¶
Create a .connection file with the required credentials:
gcs_sample.connection
TYPE gcs
GCS_SERVICE_ACCOUNT_CREDENTIALS_JSON {{ tb_secret("GCS_KEY") }}
Ensure your GCP Service Account has the roles/storage.objectViewer role.
Use a different Service Account key for each environment by leveraging Tinybird secrets.
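The secret referenced by tb_secret("GCS_KEY") holds the standard JSON key that Google Cloud generates for a service account. Its shape looks roughly like this (values redacted; these are Google's standard key fields, not Tinybird-specific settings):

```json
{
  "type": "service_account",
  "project_id": "my-project",
  "private_key_id": "<redacted>",
  "private_key": "-----BEGIN PRIVATE KEY-----\n<redacted>\n-----END PRIVATE KEY-----\n",
  "client_email": "tinybird-reader@my-project.iam.gserviceaccount.com",
  "client_id": "<redacted>",
  "token_uri": "https://oauth2.googleapis.com/token"
}
```

Store the whole JSON document as the secret value; the connection file inlines it via the template function.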
Create a GCS data source¶
After setting up the connection, create a data source.
Create a .datasource file using tb datasource create --gcs or manually:
gcs_sample.datasource
DESCRIPTION >
    Analytics events landing data source

SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `session_id` String `json:$.session_id`,
    `action` LowCardinality(String) `json:$.action`,
    `version` LowCardinality(String) `json:$.version`,
    `payload` String `json:$.payload`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp"
ENGINE_TTL "timestamp + toIntervalDay(60)"

IMPORT_CONNECTION_NAME gcs_sample
IMPORT_BUCKET_URI gs://my-bucket/*.csv
IMPORT_SCHEDULE '@on-demand'
The IMPORT_CONNECTION_NAME setting must match the name of your .connection file.
Sync data¶
Since automatic ingestion (@auto mode) is not supported, you must manually sync data when new files are available.
Using the API¶
curl -X POST "https://api.tinybird.co/v0/datasources/<datasource_name>/scheduling/runs" \
  -H "Authorization: Bearer <your-tinybird-token>"
Using the CLI¶
tb datasource sync <datasource_name>
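If you trigger syncs from your own code, the same API call can be made from any HTTP client. A minimal Python sketch using only the standard library (the host and endpoint come from the curl example above; the data source name and token are placeholders):

```python
import urllib.request

API_HOST = "https://api.tinybird.co"  # adjust to your workspace's region

def build_sync_request(datasource: str, token: str) -> urllib.request.Request:
    """Build the POST request that triggers an on-demand sync run."""
    url = f"{API_HOST}/v0/datasources/{datasource}/scheduling/runs"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )

req = build_sync_request("gcs_sample", "<your-tinybird-token>")
# urllib.request.urlopen(req)  # uncomment to actually trigger the sync
```

A non-2xx response (for example, when no new files exist or the token lacks scope) raises urllib.error.HTTPError, so wrap the call accordingly in production code.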
.connection settings¶
The GCS connector uses the following settings in .connection files:
| Instruction | Required | Description |
|---|---|---|
GCS_SERVICE_ACCOUNT_CREDENTIALS_JSON | Yes | Service Account Key in JSON format, inlined. We recommend using Tinybird Secrets. |
Once a connection is used in a data source, you can't change the Service Account Key. To modify it, you must:
- Remove the connection from the data source.
- Deploy the changes.
- Add the connection again with the new values.
.datasource settings¶
The GCS connector uses the following settings in .datasource files:
| Instruction | Required | Description |
|---|---|---|
IMPORT_CONNECTION_NAME | Yes | Name given to the connection inside Tinybird. For example, 'my_connection'. This is the name of the connection file you created in the previous step. |
IMPORT_BUCKET_URI | Yes | Full bucket path, including the gs:// protocol, bucket name, object path, and an optional pattern to match against object keys. For example, gs://my-bucket/my-path discovers all files in the bucket my-bucket under the prefix /my-path. You can use patterns in the path to filter objects, for example, ending the path with *.csv matches all objects that end with the .csv suffix. |
IMPORT_SCHEDULE | Yes | Use @on-demand to sync new files as needed; only files added to the bucket since the last execution are appended to the data source. You can also use @once, which behaves the same as @on-demand. @auto mode is not supported yet; if you use this option, only the initial sync is executed. |
IMPORT_FROM_TIMESTAMP | No | Sets the date and time from which to start ingesting files in a GCS bucket. The format is YYYY-MM-DDTHH:MM:SSZ. |
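For instance, to skip files uploaded before a given date, the import settings from the example .datasource file could be extended like this (a sketch; the timestamp value is illustrative, and the quoting follows the IMPORT_SCHEDULE example above):

```
IMPORT_CONNECTION_NAME gcs_sample
IMPORT_BUCKET_URI gs://my-bucket/*.csv
IMPORT_SCHEDULE '@on-demand'
IMPORT_FROM_TIMESTAMP '2024-01-01T00:00:00Z'
```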
We don't support changing these settings after the data source is created. If you need to do that, you must:
- Remove the connection from the data source.
- Deploy the changes.
- Add the connection again with the new values.
- Deploy again.
GCS file URI¶
Use GCS wildcards to match multiple files:
- `*` (single asterisk): Matches files at one directory level.
  - Example: `gs://bucket-name/*.ndjson` matches all `.ndjson` files in the root directory, but not in subdirectories.
- `**` (double asterisk): Recursively matches files across multiple directory levels.
  - Example: `gs://bucket-name/**/*.ndjson` matches all `.ndjson` files anywhere in the bucket.

GCS does not allow overlapping ingestion paths. For example, you cannot have both:

- `gs://my_bucket/**/*.csv`
- `gs://my_bucket/transactions/*.csv`
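To make the difference between the two wildcards concrete, here is a minimal Python sketch that translates such globs into regular expressions, with `**` spanning directory levels and `*` stopping at them. This is an illustration of the matching rules, not Tinybird's actual matcher:

```python
import re

def gcs_glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a GCS-style glob: '**' crosses '/', '*' does not."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")       # '**' matches across directory levels
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")    # '*' stops at the next '/'
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(out) + "$")

root_only = gcs_glob_to_regex("gs://bucket-name/*.ndjson")
recursive = gcs_glob_to_regex("gs://bucket-name/**/*.ndjson")

print(bool(root_only.match("gs://bucket-name/events.ndjson")))          # True
print(bool(root_only.match("gs://bucket-name/2024/events.ndjson")))     # False
print(bool(recursive.match("gs://bucket-name/2024/01/events.ndjson")))  # True
```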
Supported file types¶
The GCS Connector supports the following formats:
| File Type | Accepted Extensions | Supported Compression |
|---|---|---|
| CSV | .csv, .csv.gz | gzip |
| NDJSON | .ndjson, .ndjson.gz, .jsonl, .jsonl.gz | gzip |
| Parquet | .parquet, .parquet.gz | snappy, gzip, lzo, brotli, lz4, zstd |
JSON files must follow the Newline Delimited JSON (NDJSON) format. Each line must be a valid JSON object and must end with a \n character.
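For example, a valid NDJSON file for the schema shown earlier could be produced like this (a sketch; field values are illustrative):

```python
import json

# Rows mirroring the schema from the example .datasource file above.
events = [
    {"timestamp": "2024-05-01 12:00:00", "session_id": "s1",
     "action": "click", "version": "1.0.0", "payload": "{}"},
    {"timestamp": "2024-05-01 12:00:05", "session_id": "s1",
     "action": "page_view", "version": "1.0.0", "payload": "{}"},
]

# NDJSON: one JSON object per line, each terminated by '\n' -- never a JSON array.
ndjson = "".join(json.dumps(event) + "\n" for event in events)

with open("events.ndjson", "w") as f:
    f.write(ndjson)
```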
GCS Permissions¶
To authenticate Tinybird with GCS, you need a GCP service account key in JSON format with the Storage Object Viewer (roles/storage.objectViewer) role.
- In the Google Cloud Console, create or use an existing service account.
- Assign the roles/storage.objectViewer role.
- Generate a JSON key file and download it.
- To work locally, store the key as a Tinybird secret in a .env.local file:
GCS_KEY='<your-json-key-content>'
- Store the key in Cloud as a Tinybird secret:
tb --cloud secret set GCS_KEY '<your-json-key-content>'
Limitations¶
- No @auto mode: Ingestion must be triggered manually.
- File format support: Only CSV, NDJSON, and Parquet are supported.
- Permissions: Ensure your service account has the correct role assigned.