---
title: Ingest from BigQuery using Google Cloud Storage
meta:
  description: Learn how to send data from BigQuery to Tinybird using Google Cloud Storage.
---

# Ingest data from BigQuery using Google Cloud Storage

Read on to learn how to send data from BigQuery to Tinybird, for example when you need to periodically run full replaces of a table or do a one-off ingest.

This process relies on [BigQuery's exporting capabilities](https://cloud.google.com/bigquery/docs/exporting-data#sql): you bulk export data as CSV, NDJSON, or Parquet files and then ingest those files using the [GCS Connector](/forward/get-data-in/connectors/gcs).

## Prerequisites

You need a Tinybird account and access to BigQuery and Google Cloud Storage.

{% steps %}

## Export the BigQuery table

The first step is exporting the BigQuery table, or a query result set, to a GCS bucket.

### 1. Grant the required permissions in Google Cloud Console

Make sure the GCS bucket allows BigQuery to write files. The service account used by BigQuery needs the `roles/storage.objectCreator` role on the bucket. You can grant this permission in the Google Cloud Console, or from the command line as shown after these steps:

1. Navigate to your GCS bucket in the Cloud Console
2. Go to the **Permissions** tab
3. Click **Grant Access**
4. Add the BigQuery service account (format: `service-<project-number>@gcp-sa-bigquery.iam.gserviceaccount.com`) with the **Storage Object Creator** role
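
A minimal sketch of the same grant using the gcloud CLI; the bucket name and project number are placeholders:

```sh
# Grant the BigQuery service account permission to write objects to the bucket
gcloud storage buckets add-iam-policy-binding gs://your-bucket-name \
  --member="serviceAccount:service-<project-number>@gcp-sa-bigquery.iam.gserviceaccount.com" \
  --role="roles/storage.objectCreator"
```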

### 2. Export the data

Run the following SQL statement to export the data:

```sql
EXPORT DATA OPTIONS(
  uri = 'gs://your-bucket-name/path/orders/*.csv.gz',
  format = 'CSV',
  compression = 'GZIP',
  overwrite = true,
  header = true,
  field_delimiter = ','
) AS (
  SELECT
    order_id,
    customer_id,
    order_status,
    total_price,
    order_date,
    order_priority,
    clerk
  FROM `your-project.your_dataset.orders`
  ORDER BY order_date
);
```

Replace `your-bucket-name`, `path`, and the table reference with your actual values. BigQuery writes files with exactly the extension you specify, so the URI uses the `.csv.gz` suffix to reflect the `GZIP` compression and let the GCS Connector detect it later.
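
After the statement finishes, you can confirm the files landed in the bucket, for example with the gcloud CLI:

```sh
# List the exported files; expect one or more numbered .csv.gz parts
gcloud storage ls gs://your-bucket-name/path/orders/
```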

{% callout type="warning" %}
**CSV format limitations**: CSV does not support nested or repeated fields (arrays, structs). If your BigQuery table contains arrays or nested data types, you must use JSON or Parquet format instead. Attempting to export arrays or nested structures to CSV will result in an error.
{% /callout %}

### 3. Export arrays and nested data

For tables with arrays or nested structures, use JSON format. BigQuery writes newline-delimited JSON, so the example names the files with the `.ndjson.gz` suffix to match the extensions the GCS Connector accepts:

```sql
EXPORT DATA OPTIONS(
  uri = 'gs://your-bucket-name/path/orders/*.ndjson.gz',
  format = 'JSON',
  compression = 'GZIP',
  overwrite = true
) AS (
  SELECT
    order_id,
    customer_id,
    items,  -- Array field
    shipping_address,  -- Nested struct
    order_date
  FROM `your-project.your_dataset.orders`
  ORDER BY order_date
);
```

{% callout type="info" %}
**JSON format considerations**: When exporting to JSON format, BigQuery encodes `INT64` values as JSON strings to maintain precision. You may need to handle this in your Tinybird schema by parsing these strings as integers.
{% /callout %}

### 4. Export using Parquet format

Parquet format also supports nested and repeated data, and can be more efficient for large datasets:

```sql
EXPORT DATA OPTIONS(
  uri = 'gs://your-bucket-name/path/orders/*.parquet',
  format = 'PARQUET',
  compression = 'GZIP',
  overwrite = true
) AS (
  SELECT
    order_id,
    customer_id,
    items,
    shipping_address,
    order_date
  FROM `your-project.your_dataset.orders`
  ORDER BY order_date
);
```

### 5. Automate the export

To automate the export, you can create a BigQuery Scheduled Query:

1. In the BigQuery Console, click **Scheduled queries**
2. Click **Create scheduled query**
3. Enter your `EXPORT DATA` statement
4. Set the schedule (e.g., daily, hourly)
5. Configure the destination dataset and table if needed

Follow the [Google Cloud documentation for Scheduled Queries](https://cloud.google.com/bigquery/docs/scheduling-queries) for more details.
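
Scheduled queries can also be created from the command line. The following is a sketch using the bq CLI; the display name and schedule are illustrative, and depending on your setup you may need additional flags such as `--project_id` or a location:

```sh
# Create a scheduled query that re-runs the export every 24 hours
bq mk \
  --transfer_config \
  --data_source=scheduled_query \
  --display_name='Daily orders export to GCS' \
  --schedule='every 24 hours' \
  --params='{"query": "EXPORT DATA OPTIONS(uri=\"gs://your-bucket-name/path/orders/*.csv.gz\", format=\"CSV\", compression=\"GZIP\", overwrite=true, header=true) AS SELECT * FROM `your-project.your_dataset.orders`"}'
```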

## Ingest data into Tinybird

Before ingesting your BigQuery data from the GCS bucket, you need to create the GCS connection. For more details and advanced use cases, see the [GCS Connector](/forward/get-data-in/connectors/gcs) documentation.
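
If you haven't created the connection yet, it's defined as a `.connection` datafile in your project. The sketch below assumes the connector authenticates with a service account credentials JSON stored as a secret; check the GCS Connector documentation for the exact setting names:

```tb
TYPE gcs
GCS_SERVICE_ACCOUNT_CREDENTIALS_JSON {{ tb_secret("gcs_service_account_credentials_json") }}
```

Save the file as `tb-gcs.connection` so its name matches the `IMPORT_CONNECTION_NAME` used in the data sources below.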

### Supported file formats

The GCS Connector supports the following formats exported from BigQuery:

| File Type | Accepted Extensions | Supported Compression | Notes |
|-----------|---------------------|----------------------|-------|
| CSV | `.csv`, `.csv.gz` | `gzip` | Does not support arrays or nested data. Use for simple, flat schemas. |
| NDJSON | `.ndjson`, `.ndjson.gz`, `.jsonl`, `.jsonl.gz` | `gzip` | Supports arrays and nested data. Each line must be a valid JSON object. |
| Parquet | `.parquet`, `.parquet.gz` | `snappy`, `gzip`, `lzo`, `brotli`, `lz4`, `zstd` | Supports arrays and nested data. More efficient for large datasets. |

{% callout type="info" %}
**NDJSON format**: JSON files must follow the **Newline Delimited JSON (NDJSON)** format. Each line must be a valid JSON object and must end with a `\n` character. BigQuery's `EXPORT DATA` with `format='JSON'` produces NDJSON format.
{% /callout %}

### Create the data source

You can use the following schema for the data source:

```tb
SCHEMA >
    `order_id` Int64,
    `customer_id` Int64,
    `order_status` String,
    `total_price` Float32,
    `order_date` DateTime,
    `order_priority` String,
    `clerk` String

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(order_date)"
ENGINE_SORTING_KEY "order_date, order_priority"

IMPORT_CONNECTION_NAME 'tb-gcs'
IMPORT_BUCKET_URI 'gs://your-bucket-name/path/orders/*.csv.gz'
IMPORT_SCHEDULE '@on-demand'
```

For JSON exports with arrays, you'll need to handle the nested structure:

```tb
SCHEMA >
    `order_id` Int64 `json:$.order_id`,
    `customer_id` Int64 `json:$.customer_id`,
    `items` Array(String) `json:$.items[:]`,
    `shipping_address` String `json:$.shipping_address`,
    `order_date` DateTime `json:$.order_date`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(order_date)"
ENGINE_SORTING_KEY "order_date"

IMPORT_CONNECTION_NAME 'tb-gcs'
IMPORT_BUCKET_URI 'gs://your-bucket-name/path/orders/*.ndjson.gz'
IMPORT_SCHEDULE '@on-demand'
```

{% callout type="caution" %}
**Handling BigQuery arrays and nested data**: When exporting arrays or nested structures from BigQuery to JSON, they are stored as JSON strings in Tinybird. You can parse them in your Pipes using JSON functions like `JSONExtractArrayRaw()` or `JSONExtractString()` to work with the nested data.
{% /callout %}
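
For example, a minimal sketch of extracting one field from the stored `shipping_address` string in a Pipe node, assuming the data source is named `orders` and the JSON contains a `city` key:

```sql
SELECT
    order_id,
    -- Pull a single field out of the raw JSON string
    JSONExtractString(shipping_address, 'city') AS shipping_city
FROM orders
```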

Deploy the data source. Since the GCS Connector uses `@on-demand` mode, you'll need to manually sync data when new files are available.
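
For example, from your project directory:

```sh
# Push the new data source and connection to your Tinybird workspace
tb --cloud deploy
```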

### Sync data

After BigQuery exports new files to GCS, trigger a sync in Tinybird:

#### Using the API

```sh
curl -X POST "https://api.tinybird.co/v0/datasources/<datasource_name>/scheduling/runs" \
  -H "Authorization: Bearer <your-tinybird-token>"
```

#### Using the CLI

```sh
tb datasource sync <datasource_name>
```

{% callout type="caution" %}
Each file is appended to the Tinybird data source. As records might be duplicated if BigQuery re-exports the same data, consider using a materialized view to consolidate a stateful set of your BigQuery table.
{% /callout %}
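
A minimal sketch of that pattern: materialize the appended rows into a `ReplacingMergeTree` data source so re-exported duplicates collapse into a single version on merge. It assumes `order_id` uniquely identifies a row, and the column list is illustrative:

```tb
SCHEMA >
    `order_id` Int64,
    `customer_id` Int64,
    `order_status` String,
    `total_price` Float32,
    `order_date` DateTime

ENGINE "ReplacingMergeTree"
ENGINE_SORTING_KEY "order_id"
```

Query the consolidated data source with the `FINAL` modifier, or an `argMax`-based aggregation, to read the deduplicated state before merges complete.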

{% /steps %}

## Data type considerations

When exporting from BigQuery to Tinybird, be aware of these data type quirks:

{% callout type="warning" %}
**INT64 in JSON**: BigQuery exports `INT64` values as JSON strings in JSON format to preserve precision. In your Tinybird schema, you may need to parse these strings. For example, if BigQuery exports `"1234567890123456789"` as a string, use `toInt64(column_name)` in your Pipes.
{% /callout %}

## Limits

Because you're using the GCS Connector, its [limits](/forward/get-data-in/connectors/gcs#limitations) apply.

To stay within your Tinybird plan's limits, you might also need to [limit the size of the exported files](https://cloud.google.com/bigquery/docs/exporting-data#limit_the_exported_file_size). When you use a wildcard in the URI (e.g., `gs://bucket/path/*.csv`), BigQuery automatically splits large exports into multiple files of up to 1 GB each.

## Next steps

See the following resources:

- [GCS Connector](/forward/get-data-in/connectors/gcs)
- [Ingest from Snowflake using AWS S3](./ingest-from-snowflake-using-aws-s3)
- [Ingest from Snowflake using Azure Blob Storage](./ingest-from-snowflake-using-azure-blob-storage)
