---
title: Apache Iceberg table function
meta:
    description: Documentation for the Tinybird Iceberg table function.
---

# Iceberg table function

The Tinybird `iceberg()` table function reads data from your existing Apache Iceberg tables in S3 into Tinybird. Pair it with a scheduled copy pipe to keep a data source synchronized. A common pattern is to load the full table and replace the contents of the data source on every run.

To use it, define a node using standard SQL and the `iceberg` function keyword, then publish the node as a copy pipe that does a sync on every run. See [Table functions](../table-functions) for general information and tips.

You can also use the `iceberg` table function in API endpoints.
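
As a rough illustration of endpoint usage, the following sketch defines a pipe that queries the Iceberg table directly at request time instead of copying it first. The file name, node name, bucket path, and the `status` and `amount` columns are hypothetical, and the two secrets are assumed to exist already (see Setting secrets):

```tb {% title="endpoints/iceberg_orders_by_status.pipe" %}
NODE orders_by_status
SQL >
    %
    -- Hypothetical table path and columns, for illustration only
    SELECT
        status,
        count() AS orders,
        sum(amount) AS total_amount
    FROM iceberg(
        's3://my-lakehouse/warehouse/orders',
        {{ tb_secret("AWS_ACCESS_KEY_ID") }},
        {{ tb_secret("AWS_SECRET_ACCESS_KEY") }}
    )
    GROUP BY status

TYPE endpoint
```

Because a query like this reads from S3 on each request, it will likely be slower than querying a data source populated by a copy pipe, so it tends to suit low-traffic or exploratory endpoints.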

## Setting secrets

Table functions require authentication credentials that must be stored securely. In Tinybird Forward, manage those credentials with `tb secret`:

```shell
tb secret set AWS_ACCESS_KEY_ID <access_key>
tb secret set AWS_SECRET_ACCESS_KEY <secret_key>
```

Set the secret in each environment where the copy pipe runs. For example, use `tb --cloud secret set` for Cloud and `tb --branch=<branch_name> secret set` for a branch. Then reference the secrets in SQL with `tb_secret()`.

In Tinybird Classic, use the [Environment Variables API](/api-reference/environment-variables-api) to create the same secret values.

For more details, see [tb secret](/forward/dev-reference/commands/tb-secret).

## Syntax

Create a new pipe node. Call the `iceberg` table function and pass the AWS access key and secret as Tinybird secrets:

```sql {% title="Example query logic" %}
SELECT *
FROM iceberg(
  's3://your_bucket/iceberg/db/table',
  {{ tb_secret("AWS_ACCESS_KEY_ID") }},
  {{ tb_secret("AWS_SECRET_ACCESS_KEY") }}
)
```

Publish this node as a copy pipe. You can choose to append only new data or replace all data.

For a full working example, see this [GitHub repository](https://github.com/tinybirdco/iceberg-tinybird).

## Example: sync an Iceberg table from S3

The following example copies an Apache Iceberg table stored in S3 into a Tinybird Data Source every hour.

First, define the target Data Source. Then define the copy pipe that reads from Iceberg.

{% tabs initial="Tinybird CLI" %}
{% tab label="Tinybird CLI" %}

```tb {% title="datasources/iceberg_orders.datasource" %}
SCHEMA >
    `order_id` UInt64,
    `customer_id` UInt64,
    `status` String,
    `amount` Float64,
    `updated_at` DateTime

ENGINE "ReplacingMergeTree(updated_at)"
ENGINE_SORTING_KEY "order_id"
```

```tb {% title="pipes/iceberg_orders_sync.pipe" %}
NODE iceberg_orders
SQL >
    %
    SELECT
        order_id,
        customer_id,
        status,
        amount,
        updated_at
    FROM iceberg(
        's3://my-lakehouse/warehouse/orders',
        {{ tb_secret("AWS_ACCESS_KEY_ID") }},
        {{ tb_secret("AWS_SECRET_ACCESS_KEY") }}
    )

TYPE copy
TARGET_DATASOURCE iceberg_orders
COPY_MODE replace
COPY_SCHEDULE 0 * * * *
```

{% /tab %}

{% tab label="TypeScript SDK" %}

```ts {% title="tinybird.ts" %}
import { defineCopyPipe, defineDatasource, engine, node, t } from "@tinybirdco/sdk";

export const icebergOrders = defineDatasource("iceberg_orders", {
  schema: {
    order_id: t.uint64(),
    customer_id: t.uint64(),
    status: t.string(),
    amount: t.float64(),
    updated_at: t.dateTime(),
  },
  engine: engine.replacingMergeTree({
    sortingKey: ["order_id"],
    ver: "updated_at",
  }),
  jsonPaths: false,
});

export const icebergOrdersSync = defineCopyPipe("iceberg_orders_sync", {
  datasource: icebergOrders,
  schedule: "0 * * * *",
  mode: "replace",
  nodes: [
    node({
      name: "iceberg_orders",
      sql: `
        SELECT
          order_id,
          customer_id,
          status,
          amount,
          updated_at
        FROM iceberg(
          's3://my-lakehouse/warehouse/orders',
          {{ tb_secret("AWS_ACCESS_KEY_ID") }},
          {{ tb_secret("AWS_SECRET_ACCESS_KEY") }}
        )
      `,
    }),
  ],
});
```

{% /tab %}

{% tab label="Python SDK" %}

```python {% title="tinybird.py" %}
from tinybird_sdk import define_copy_pipe, define_datasource, engine, node, t

iceberg_orders = define_datasource("iceberg_orders", {
    "schema": {
        "order_id": t.uint64(),
        "customer_id": t.uint64(),
        "status": t.string(),
        "amount": t.float64(),
        "updated_at": t.date_time(),
    },
    "engine": engine.replacing_merge_tree({
        "sorting_key": ["order_id"],
        "ver": "updated_at",
    }),
    "json_paths": False,
})

iceberg_orders_sync = define_copy_pipe("iceberg_orders_sync", {
    "datasource": iceberg_orders,
    "copy_schedule": "0 * * * *",
    "copy_mode": "replace",
    "nodes": [
        node({
            "name": "iceberg_orders",
            "sql": """
                SELECT
                    order_id,
                    customer_id,
                    status,
                    amount,
                    updated_at
                FROM iceberg(
                    's3://my-lakehouse/warehouse/orders',
                    {{ tb_secret("AWS_ACCESS_KEY_ID") }},
                    {{ tb_secret("AWS_SECRET_ACCESS_KEY") }}
                )
            """,
        }),
    ],
})
```

{% /tab %}
{% /tabs %}

Use `COPY_MODE replace` when the Iceberg table is the source of truth and the full table is small enough to refresh on the schedule. For larger tables, filter by an update timestamp and use `COPY_MODE append` with a deduplicating engine.
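
The incremental approach could look something like the sketch below. It reuses the `iceberg_orders` Data Source from the example above and assumes `updated_at` increases whenever a row changes in Iceberg. The pipe file name and node name are hypothetical:

```tb {% title="pipes/iceberg_orders_incremental.pipe" %}
NODE iceberg_orders_incremental
SQL >
    %
    SELECT
        order_id,
        customer_id,
        status,
        amount,
        updated_at
    FROM iceberg(
        's3://my-lakehouse/warehouse/orders',
        {{ tb_secret("AWS_ACCESS_KEY_ID") }},
        {{ tb_secret("AWS_SECRET_ACCESS_KEY") }}
    )
    -- Only copy rows newer than what the target already holds.
    -- Assumption: updated_at is set on every insert and update in the source.
    WHERE updated_at > (SELECT max(updated_at) FROM iceberg_orders)

TYPE copy
TARGET_DATASOURCE iceberg_orders
COPY_MODE append
COPY_SCHEDULE 0 * * * *
```

With `ReplacingMergeTree(updated_at)` on the target, appended rows that share an `order_id` are deduplicated during background merges, keeping the row with the highest `updated_at`. Merges are asynchronous, so queries that need exactly one row per order may still have to deduplicate at read time, for example with `FINAL`. The strict `>` comparison can skip rows that arrive later with a timestamp equal to the current maximum, so consider `>=` if that can happen in your source.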

## See also

- [How to effectively use table functions](../table-functions)
- [Copy pipes](/forward/core-concepts/copy-pipes)
- [Secrets](/forward/dev-reference/commands/tb-secret)
