---
title: DynamoDB connector
description: Ingest data from an Amazon DynamoDB table into Tinybird with an initial backfill and Change Data Capture.
---

# DynamoDB connector

Stream data from an Amazon DynamoDB table into a Tinybird data source. Tinybird backfills the table once through a Point-in-Time Recovery (PITR) export to S3, then continuously ingests changes through DynamoDB Streams (Change Data Capture).

Use the DynamoDB connector when you want to mirror an operational DynamoDB table into Tinybird for analytics, while keeping it up to date in near real time.

## How it works

When you deploy a DynamoDB data source, Tinybird does two things:

1. **Initial export**: triggers an on-demand PITR export of your table to an S3 bucket you own, then loads that snapshot into the data source. AWS exports can take several minutes, so Tinybird keeps polling until AWS marks the export as `COMPLETED`.
2. **Change Data Capture (CDC)**: starts a worker on Tinybird's infrastructure that reads from DynamoDB Streams and appends inserts, updates, and deletes to the same data source. Each row in the data source represents a *change* to your table, not the current state. To keep its size under control, DynamoDB data sources use the [`ReplacingMergeTree` engine](/sql-reference/engines/replacingmergetree). See [Query the data](#query-the-data) for considerations.

## Requirements

Before you create the connection, make sure your DynamoDB table meets these requirements:

- **Point-in-Time Recovery (PITR)** is enabled on the table.
- **DynamoDB Streams** is enabled, with a stream view type of `NEW_IMAGE` or `NEW_AND_OLD_IMAGES`.
- The table is no larger than **500 GB** and writes no more than **250 WCUs** (write capacity units, ≈ 250 KB/s of writes). If you need higher limits, contact [Tinybird support](/forward/support).
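
To check a table against these requirements before you create the connection, you could run something like the sketch below. It is not Tinybird's validation code. It assumes you already fetched the `Table` dict from `DescribeTable` and the `ContinuousBackupsDescription` dict from `DescribeContinuousBackups`. The function name, the messages, and the reading of 500 GB as 500 × 10^9 bytes are all assumptions.

```python
# Hypothetical pre-flight check; the real validation runs inside Tinybird.
SUPPORTED_VIEW_TYPES = {"NEW_IMAGE", "NEW_AND_OLD_IMAGES"}
MAX_TABLE_BYTES = 500 * 10**9  # assumption: 1 GB = 10^9 bytes
MAX_WCU = 250


def check_requirements(table: dict, backups: dict) -> list[str]:
    """Return the unmet requirements; an empty list means the table qualifies."""
    problems = []

    pitr = backups.get("PointInTimeRecoveryDescription", {})
    if pitr.get("PointInTimeRecoveryStatus") != "ENABLED":
        problems.append("PITR is not enabled")

    stream = table.get("StreamSpecification", {})
    if not stream.get("StreamEnabled", False):
        problems.append("DynamoDB Streams is not enabled")
    elif stream.get("StreamViewType") not in SUPPORTED_VIEW_TYPES:
        problems.append("unsupported stream view type")

    if table.get("TableSizeBytes", 0) > MAX_TABLE_BYTES:
        problems.append("table is larger than 500 GB")

    # Only provisioned capacity is visible here. On-demand tables typically
    # report 0, so this check cannot judge their actual write rate.
    wcu = table.get("ProvisionedThroughput", {}).get("WriteCapacityUnits", 0)
    if wcu > MAX_WCU:
        problems.append("write capacity is above 250 WCU")

    return problems


table = {
    "TableSizeBytes": 42 * 10**9,
    "StreamSpecification": {"StreamEnabled": True, "StreamViewType": "KEYS_ONLY"},
    "ProvisionedThroughput": {"WriteCapacityUnits": 100},
}
backups = {"PointInTimeRecoveryDescription": {"PointInTimeRecoveryStatus": "ENABLED"}}
print(check_requirements(table, backups))  # ['unsupported stream view type']
```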

## AWS permissions

Tinybird ingests from DynamoDB by assuming an IAM role in *your* AWS account via `sts:AssumeRole` with an external ID. The role needs two policies: an **access policy** (what Tinybird may do) and a **trust policy** (who may assume it). You need to create both policies in AWS.

{% tabs variant="code" initial="AWS Access Policy" %}
{% tab label="AWS Access Policy" %}
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:Scan",
        "dynamodb:DescribeStream",
        "dynamodb:DescribeExport",
        "dynamodb:GetRecords",
        "dynamodb:GetShardIterator",
        "dynamodb:DescribeTable",
        "dynamodb:DescribeContinuousBackups",
        "dynamodb:ExportTableToPointInTime",
        "dynamodb:UpdateTable",
        "dynamodb:UpdateContinuousBackups"
      ],
      "Resource": [
        "arn:aws:dynamodb:us-east-1:123456789012:table/orders",
        "arn:aws:dynamodb:us-east-1:123456789012:table/orders/stream/*",
        "arn:aws:dynamodb:us-east-1:123456789012:table/orders/export/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::my-orders-exports",
        "arn:aws:s3:::my-orders-exports/*"
      ]
    }
  ]
}
```
{% /tab %}
{% tab label="AWS Trust Policy" %}
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::<TINYBIRD_CONNECTOR_ACCOUNT>:root" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": { "sts:ExternalId": "<EXTERNAL_ID>" }
      }
    }
  ]
}
```
{% /tab %}
{% /tabs %}

The access policy grants read on the table, its stream, and its exports, plus read-write on the export bucket. Scope the resources to your table and bucket. `dynamodb:UpdateTable` and `dynamodb:UpdateContinuousBackups` let the connector enable PITR and Streams (`NEW_AND_OLD_IMAGES`) on the table if they aren't already on.
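
If you manage several tables, you may prefer to generate the scoped access policy rather than edit it by hand. The sketch below rebuilds the policy document shown above from a table ARN and a bucket name. `build_access_policy` is a hypothetical helper, not part of the Tinybird CLI.

```python
import json

# Actions copied from the access policy shown above.
DYNAMODB_ACTIONS = [
    "dynamodb:Scan",
    "dynamodb:DescribeStream",
    "dynamodb:DescribeExport",
    "dynamodb:GetRecords",
    "dynamodb:GetShardIterator",
    "dynamodb:DescribeTable",
    "dynamodb:DescribeContinuousBackups",
    "dynamodb:ExportTableToPointInTime",
    "dynamodb:UpdateTable",
    "dynamodb:UpdateContinuousBackups",
]
S3_ACTIONS = ["s3:PutObject", "s3:GetObject", "s3:ListBucket"]


def build_access_policy(table_arn: str, bucket: str) -> dict:
    """Scope the DynamoDB statement to one table and the S3 statement to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": DYNAMODB_ACTIONS,
                # The table itself, plus its streams and exports.
                "Resource": [
                    table_arn,
                    f"{table_arn}/stream/*",
                    f"{table_arn}/export/*",
                ],
            },
            {
                "Effect": "Allow",
                "Action": S3_ACTIONS,
                # The bucket (for ListBucket) and the objects inside it.
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


policy = build_access_policy(
    "arn:aws:dynamodb:us-east-1:123456789012:table/orders", "my-orders-exports"
)
print(json.dumps(policy, indent=2))
```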

The trust policy must name Tinybird's connector account and the Workspace's external ID. Both values differ per region and environment. See [Set up the connector](#set-up-the-connector) for how to get the `<TINYBIRD_CONNECTOR_ACCOUNT>` and `<EXTERNAL_ID>` values.

{% callout type="caution" %}
A `403 ... include the following external ID` error means the trust policy's external ID or `Principal` account doesn't match what Tinybird presents when it assumes the role. If you defined the connection in code without running `tb connection create dynamodb`, the trust policy is likely missing the Workspace-specific external ID entirely. If the external ID is already there, the `Principal` account is wrong for this environment: Tinybird assumes the role from a different account per region and environment. Either way, the `tb connection create dynamodb` output is the source of truth.
{% /callout %}

{% callout type="info" %}
One role can serve many Workspaces. `sts:ExternalId` accepts a list, so you can add more Workspaces' external IDs to the same role: `"sts:ExternalId": ["<workspace-a-id>", "<workspace-b-id>"]`.
{% /callout %}
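
As a rough illustration, a trust policy that serves several Workspaces could be assembled as follows. `build_trust_policy` is a hypothetical helper, and the account and external IDs are placeholders. Take the real values from the `tb connection create dynamodb` output.

```python
import json


def build_trust_policy(account_ids: list[str], external_ids: list[str]) -> dict:
    """Build a trust policy that lets the given accounts assume the role."""
    if not account_ids or not external_ids:
        raise ValueError("at least one account ID and one external ID are required")

    principals = [f"arn:aws:iam::{account}:root" for account in account_ids]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # IAM accepts a single string or a list in both places;
                # a single value is emitted as a plain string.
                "Principal": {
                    "AWS": principals[0] if len(principals) == 1 else principals
                },
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {
                        "sts:ExternalId": external_ids[0]
                        if len(external_ids) == 1
                        else list(external_ids)
                    }
                },
            }
        ],
    }


# Placeholder values, not real Tinybird accounts or external IDs.
policy = build_trust_policy(["111111111111"], ["workspace-a-id", "workspace-b-id"])
print(json.dumps(policy, indent=2))
```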

## Environment considerations

The DynamoDB connector behaves differently across the Cloud, Branch, and Local environments. PITR exports and stream reads run in *your* AWS account, but the AWS account that assumes your IAM role changes from one environment to the next — so the trust policy you write depends on where the connector runs.

### Cloud environment

In Tinybird Cloud, Tinybird uses its own AWS account to assume the IAM role you create. When you deploy to your main Cloud Workspace, use `tb deploy` as usual.

### Branch environment

When you test a data source using the DynamoDB connector in a Cloud Branch, include `--with-connections` so Tinybird sets up the DynamoDB connections in the branch:

```bash
tb build --with-connections
```

A Cloud Branch reuses the same connection (and therefore the same IAM role) as the parent Workspace, so no extra AWS setup is needed. To avoid duplicate exports and CDC workers competing over the same DynamoDB stream, point branch data sources at a separate test table.

PITR exports and CDC run on Cloud branches and in Local, not just main. A fresh PITR export is triggered whenever the connection file becomes active in a new context:

- Declaring the connection in a branch triggers an export in that branch.
- Checking out another branch triggers another export for that branch.
- Moving a connection declared in a branch up to main triggers an export in main.

### Local environment

Tinybird Local runs in a container. Because PITR exports run in your AWS account, Tinybird Local needs your local AWS credentials to assume the role:

```bash
tb local restart --use-aws-creds
```

The trust policy differs per environment: Cloud is assumed by Tinybird's AWS account, while Local is assumed by the AWS account of the credentials you pass with `--use-aws-creds`. When you create the connection, choose **Local**, **Cloud**, or **Both** so the generated trust policy lists the right account IDs. If local credentials aren't available, the CLI warns you and continues Cloud-only — the connection stays valid for `tb --cloud deploy`, but `tb build` and `tb deploy` against Local skip the DynamoDB resource.

## Set up the connector

The Tinybird CLI includes a wizard that walks you through the whole flow: creating the IAM role, generating the `.connection` and `.datasource` files, and validating the table.

```bash
tb connection create dynamodb
```

{% callout type="tip" %}
Working in the TypeScript or Python SDK? Run the Tinybird CLI wizard anyway to handle the IAM role and external ID, then convert the generated `.connection` and `.datasource` files to their SDK equivalents (see the [TypeScript SDK](#3-define-the-connection-file) and [Python SDK](#3-define-the-connection-file) tabs below). The IAM role and secret carry over unchanged.
{% /callout %}

You'll be asked for:

1. A name for the connection.
2. The DynamoDB **table name** and **export bucket name** (used to scope the IAM policy — use `*` for unrestricted).
3. The AWS **region** of your table.
4. Which environments will use the connection: **Local**, **Cloud**, or **Both**. Tinybird builds a trust policy containing the AWS account IDs of the selected environments.

The wizard prints a managed IAM **access policy** and **trust policy** with the correct values for you to paste into AWS. After you create the role, paste its ARN back into the CLI. Tinybird then validates the table and writes the connection file.

Finally, the wizard asks for:

- The **DynamoDB table ARN** (e.g. `arn:aws:dynamodb:us-east-1:123456789012:table/my-table`).
- The **S3 export bucket** (just the bucket name, no `s3://` prefix).

It generates `connections/<name>.connection` and `datasources/<name>.datasource`, ready to deploy. The generated `.datasource` file includes the table's partition key (pk) and sort key (sk) as typed columns, extracted from the change record with `json:` paths and set as the engine sorting key, so you can query and filter on them without writing `JSONExtract*` expressions yourself.

Build the project locally or on a Tinybird Cloud Branch to validate the generated datafiles. Include the `--with-connections` flag so the DynamoDB connections are set up:

```bash
tb build --with-connections
```

When the build succeeds, deploy to Tinybird Cloud:

```bash
tb --cloud deploy
```

### Manual setup

To write the `.connection` and `.datasource` files manually instead of using the wizard, follow these steps.

#### 1. Create the IAM role

Create the IAM role Tinybird assumes to read your table, its stream, and the S3 export bucket. The role needs an **access policy** and a **trust policy**. See [AWS permissions](#aws-permissions) for both policy documents and the per-environment placeholders.

The trust policy's `Principal` account and `ExternalId` differ per region and environment.

To create the role in the AWS IAM console:

1. Go to **Policies** → **Create policy**, paste the access policy JSON from above, and name it (for example, `tinybird-dynamodb-orders`).
2. Go to **Roles** → **Create role** → **Custom trust policy**, and paste the trust policy JSON from above.
3. Attach the access policy from step 1, then name the role (for example, `TinybirdRole-dynamo`).
4. Copy the role ARN and paste it back into the wizard, or store it as a secret (see [Add the role ARN as a secret](#2-add-the-role-arn-as-a-secret)).

To get the `<TINYBIRD_CONNECTOR_ACCOUNT>` and `<EXTERNAL_ID>` values for your region and environment, run the CLI wizard, `tb connection create dynamodb`.

#### 2. Add the role ARN as a secret

Store the role ARN as a Tinybird secret so it isn't checked into your repo. When you create the secret manually, its name **must** follow the format `dynamodb_role_arn_<connection_name>`, where `<connection_name>` matches the name of your `.connection` file — Tinybird looks up the secret by this exact name:

```bash
tb secret set dynamodb_role_arn_<connection_name> "arn:aws:iam::123456789012:role/tb-my-dynamodb-role"
```

The wizard does this automatically in Local and Cloud when it creates the connection.
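
If you script the manual setup, a small helper can derive the secret name from the connection file so the two never drift apart. This is only a sketch, and `role_arn_secret_name` is a hypothetical function, not a Tinybird API.

```python
from pathlib import Path


def role_arn_secret_name(connection_file: str) -> str:
    """Derive the secret name Tinybird expects from a .connection file path."""
    path = Path(connection_file)
    if path.suffix != ".connection":
        raise ValueError(f"not a .connection file: {connection_file}")
    # The connection name is the file name without its extension.
    return f"dynamodb_role_arn_{path.stem}"


print(role_arn_secret_name("connections/my_ddb.connection"))
# dynamodb_role_arn_my_ddb
```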

#### 3. Define the `.connection` file

{% tabs initial="Tinybird CLI" %}
{% tab label="Tinybird CLI" %}

```tinybird {% title="connections/my_ddb.connection" %}
TYPE dynamodb
DYNAMODB_ARN {{ tb_secret("dynamodb_role_arn_my_ddb") }}
DYNAMODB_REGION us-east-1
```

{% /tab %}
{% tab label="TypeScript SDK" %}
```typescript {% title="connections.ts" %}
import { defineDynamoDBConnection, secret } from "@tinybirdco/sdk";

export const ordersDynamo = defineDynamoDBConnection("my_ddb", {
  region: "us-east-1",
  arn: secret("dynamodb_role_arn_my_ddb"),
});
```

{% /tab %}
{% tab label="Python SDK" %}
```python {% title="connections.py" %}
from tinybird_sdk import define_dynamodb_connection, secret

orders_dynamo = define_dynamodb_connection("my_ddb", {
    "region": "us-east-1",
    "arn": secret("dynamodb_role_arn_my_ddb"),
})
```

{% /tab %}
{% /tabs %}

{% table %}
   * Instruction
   * Required
   * Description
   ---
   * `TYPE`
   * Yes
   * Must be `dynamodb`.
   ---
   * `DYNAMODB_ARN`
   * Yes
   * The IAM role ARN. Reference via `tb_secret(...)` so it stays out of git.
   ---
   * `DYNAMODB_REGION`
   * Yes
   * The AWS region the DynamoDB table lives in. Must match the region in `IMPORT_TABLE_ARN`. See [AWS service endpoints](https://docs.aws.amazon.com/general/latest/gr/rande.html) for valid region codes.
{% /table %}

#### 4. Define the `.datasource` file

{% tabs initial="Tinybird CLI" %}
{% tab label="Tinybird CLI" %}

```tinybird {% title="datasources/orders.datasource" %}
SCHEMA >
    `<partition_key>` String `json:$.Item.<partition_key>`,
    `<sort_key>` String `json:$.Item.<sort_key>`,
    `_record` String `json:$.NewImage`,
    `_old_record` Nullable(String) `json:$.OldImage`,
    `_timestamp` DateTime64(3) `json:$.ApproximateCreationDateTime`,
    `_event_name` LowCardinality(String) `json:$.eventName`,
    `_is_deleted` UInt8 `json:$._is_deleted`

ENGINE "ReplacingMergeTree"
ENGINE_SORTING_KEY <partition_key>, <sort_key>
ENGINE_VER _timestamp
ENGINE_IS_DELETED _is_deleted

IMPORT_CONNECTION_NAME 'my_ddb'
IMPORT_TABLE_ARN 'arn:aws:dynamodb:us-east-1:123456789012:table/orders'
IMPORT_EXPORT_BUCKET 'my-orders-exports'
```

{% /tab %}
{% tab label="TypeScript SDK" %}
```typescript {% title="datasources.ts" %}
import { defineDatasource, t, engine } from "@tinybirdco/sdk";
import { ordersDynamo } from "./connections";

export const orders = defineDatasource("orders", {
  schema: {
    pk: t.string().jsonPath("$.Item.pk"),
    sk: t.string().jsonPath("$.Item.sk"),
    _record: t.string().jsonPath("$.NewImage"),
    _old_record: t.string().nullable().jsonPath("$.OldImage"),
    _timestamp: t.dateTime64(3).jsonPath("$.ApproximateCreationDateTime"),
    _event_name: t.string().lowCardinality().jsonPath("$.eventName"),
    _is_deleted: t.uint8().jsonPath("$._is_deleted"),
  },
  engine: engine.replacingMergeTree({
    sortingKey: ["pk", "sk"],
    ver: "_timestamp",
    isDeleted: "_is_deleted",
  }),
  dynamodb: {
    connection: ordersDynamo,
    tableArn: "arn:aws:dynamodb:us-east-1:123456789012:table/orders",
    exportBucket: "my-orders-exports",
  },
});
```

{% /tab %}
{% tab label="Python SDK" %}
```python {% title="datasources.py" %}
from tinybird_sdk import column, define_datasource, engine, t
from connections import orders_dynamo

orders = define_datasource("orders", {
    "schema": {
        "pk": column(t.string(), {"json_path": "$.Item.pk"}),
        "sk": column(t.string(), {"json_path": "$.Item.sk"}),
        "_record": column(t.string(), {"json_path": "$.NewImage"}),
        "_old_record": column(t.string().nullable(), {"json_path": "$.OldImage"}),
        "_timestamp": column(t.date_time64(3), {"json_path": "$.ApproximateCreationDateTime"}),
        "_event_name": column(t.string().low_cardinality(), {"json_path": "$.eventName"}),
        "_is_deleted": column(t.uint8(), {"json_path": "$._is_deleted"}),
    },
    "engine": engine.replacing_merge_tree({
        "sorting_key": ["pk", "sk"],
        "ver": "_timestamp",
        "is_deleted": "_is_deleted",
    }),
    "dynamodb": {
        "connection": orders_dynamo,
        "table_arn": "arn:aws:dynamodb:us-east-1:123456789012:table/orders",
        "export_bucket": "my-orders-exports",
    },
})
```

{% /tab %}
{% /tabs %}

The first columns are the table's partition key (pk) and sort key (sk), named after your table's key attributes. `tb connection create dynamodb` adds them automatically, pulls them from the item with `json:` paths, and uses them as the `ENGINE_SORTING_KEY`.

DynamoDB data sources **must** use the `ReplacingMergeTree` engine. Other engines are rejected at build time.

{% table %}
   * Instruction
   * Required
   * Description
   ---
   * `IMPORT_CONNECTION_NAME`
   * Yes
   * Name of the `.connection` file (without the extension).
   ---
   * `IMPORT_TABLE_ARN`
   * Yes
   * Full ARN of the DynamoDB table to mirror. Must start with `arn:aws:dynamodb:`.
   ---
   * `IMPORT_EXPORT_BUCKET`
   * Yes
   * Name of the S3 bucket where PITR exports will be written. Bucket name only — no `s3://` prefix.
{% /table %}
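
You can catch the most common mistakes in these settings before running a build. The sketch below checks the ARN prefix, the region match with `DYNAMODB_REGION`, and the bucket format. `check_import_settings` and its messages are hypothetical. Tinybird's own validation may word or order things differently.

```python
def check_import_settings(
    table_arn: str, export_bucket: str, connection_region: str
) -> list[str]:
    """Return a list of problems found in the IMPORT_* settings."""
    problems = []

    # Expected shape: arn:aws:dynamodb:<region>:<account>:table/<name>
    parts = table_arn.split(":", 5)
    if not table_arn.startswith("arn:aws:dynamodb:") or len(parts) != 6:
        problems.append("IMPORT_TABLE_ARN must start with 'arn:aws:dynamodb:'")
    elif parts[3] != connection_region:
        problems.append(
            f"table region {parts[3]!r} does not match DYNAMODB_REGION {connection_region!r}"
        )

    # The bucket setting takes a bare name: no scheme and no key prefix.
    if export_bucket.startswith("s3://") or "/" in export_bucket:
        problems.append("IMPORT_EXPORT_BUCKET must be a bucket name only")

    return problems


print(
    check_import_settings(
        "arn:aws:dynamodb:us-east-1:123456789012:table/orders",
        "s3://my-orders-exports",
        "eu-west-1",
    )
)
```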

##### Schema columns

Alongside the key columns described above, every DynamoDB data source has these system columns, each populated from the change record with a `json:` path:

{% table %}
   * Column
   * Type
   * `json:` path
   * Description
   ---
   * `_record`
   * `String`
   * `$.NewImage`
   * JSON-encoded current item image after the change.
   ---
   * `_old_record`
   * `Nullable(String)`
   * `$.OldImage`
   * JSON-encoded previous item image. Only present when the stream view type is `NEW_AND_OLD_IMAGES`.
   ---
   * `_timestamp`
   * `DateTime64(3)`
   * `$.ApproximateCreationDateTime`
   * Approximate time the change happened in DynamoDB. Used as the `ReplacingMergeTree` version column.
   ---
   * `_event_name`
   * `LowCardinality(String)`
   * `$.eventName`
   * `INSERT`, `MODIFY`, or `REMOVE` for stream changes, and `EXPORT` for initial backfill rows.
   ---
   * `_is_deleted`
   * `UInt8`
   * `$._is_deleted`
   * `1` for deletes, `0` otherwise. Drives `ReplacingMergeTree`'s deleted-row semantics.
{% /table %}

To extract any other typed columns from your items, query `_record` with `JSONExtract*` functions in a pipe rather than adding more columns to the data source. The connector maps columns from the change-record envelope (`$.NewImage`, `$.eventName`, and so on), not from the attributes inside your item, so item fields aren't available as top-level columns. Keeping the full item in `_record` also means the mirror keeps working when DynamoDB attributes are added, renamed, or retyped, since there's no fixed item schema to migrate. If a field is read often and you want it as a typed, pre-computed column, extract it in a downstream materialized view instead.

## Query the data

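Because the data source captures *every* change, querying it directly can return multiple rows per item. Use `FINAL` to get the current state. Background `ReplacingMergeTree` merges also collapse older versions, but they run asynchronously, so a query without `FINAL` can still return several versions of the same item: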

```sql
-- FINAL keeps only the latest version of each item
SELECT
    JSONExtractString(_record, 'id')     AS id,
    JSONExtractString(_record, 'status') AS status,
    JSONExtractFloat(_record, 'amount')  AS amount
FROM orders FINAL
```
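
To see why `FINAL` matters, here is a simplified Python model of how a change log collapses into current state. It is not how the engine is implemented. It assumes the newest `_timestamp` wins for each sorting key, that a later row wins a tie, and that items whose newest version has `_is_deleted = 1` are left out. The `current_state` function and the sample rows are hypothetical.

```python
from typing import Iterable


def current_state(rows: Iterable[dict], key_columns: tuple) -> list:
    """Keep the newest version per key, then drop keys whose newest version is a delete."""
    latest = {}
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        seen = latest.get(key)
        # >= lets a later row win when two versions share a timestamp,
        # which also absorbs redelivered (duplicate) change events.
        if seen is None or row["_timestamp"] >= seen["_timestamp"]:
            latest[key] = row
    return [row for row in latest.values() if not row["_is_deleted"]]


changes = [
    {"pk": "a", "sk": "1", "_timestamp": 1, "_is_deleted": 0, "_event_name": "INSERT"},
    {"pk": "b", "sk": "1", "_timestamp": 1, "_is_deleted": 0, "_event_name": "INSERT"},
    {"pk": "a", "sk": "1", "_timestamp": 2, "_is_deleted": 0, "_event_name": "MODIFY"},
    {"pk": "a", "sk": "1", "_timestamp": 2, "_is_deleted": 0, "_event_name": "MODIFY"},  # redelivered
    {"pk": "b", "sk": "1", "_timestamp": 3, "_is_deleted": 1, "_event_name": "REMOVE"},
]
print([(r["pk"], r["_event_name"]) for r in current_state(changes, ("pk", "sk"))])
# [('a', 'MODIFY')]
```

Five change rows collapse to a single live item: `a` keeps its latest `MODIFY`, and `b` disappears because its newest version is a delete.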

## Deploying

Deploy to Tinybird Cloud:

{% tabs initial="Tinybird CLI" %}
{% tab label="Tinybird CLI" %}

```bash
tb --cloud deploy
```

{% /tab %}
{% tab label="TypeScript SDK" %}

```bash
npx tinybird deploy
```

{% /tab %}
{% tab label="Python SDK" %}

```bash
uv run tinybird deploy
```

{% /tab %}
{% /tabs %}

On deploy, Tinybird:

1. Validates the table (PITR enabled, streams enabled with a supported view type, within size and WCU limits).
2. Triggers the PITR export to your S3 bucket.
3. Streams the export into the data source.
4. Starts the CDC worker.

You'll see a message like:

```text
△ DynamoDB initial export backfill started for datasource 'orders'.
  AWS exports can stay in progress for several minutes; Tinybird Local will keep
  retrying the import until AWS marks the export as completed.
  Export ARN: arn:aws:dynamodb:us-east-1:123456789012:export/...
```
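
The retry behavior in that message amounts to a polling loop. The sketch below shows the general shape, not Tinybird's actual code. `wait_for_export`, the 30-second interval, and the one-hour timeout are assumptions. `get_status` stands in for whatever reads `ExportStatus` from `DescribeExport`, which reports `IN_PROGRESS`, `COMPLETED`, or `FAILED`.

```python
import time
from typing import Callable


def wait_for_export(
    get_status: Callable[[], str],
    interval_s: float = 30.0,    # assumed polling interval
    timeout_s: float = 3600.0,   # assumed upper bound on the wait
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Block until the export is COMPLETED; raise on FAILED or on timeout."""
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status == "COMPLETED":
            return
        if status == "FAILED":
            raise RuntimeError("DynamoDB export failed")
        if clock() >= deadline:
            raise TimeoutError(f"export still {status} after {timeout_s} seconds")
        sleep(interval_s)


# Simulated status sequence so the example runs without AWS access.
statuses = iter(["IN_PROGRESS", "IN_PROGRESS", "COMPLETED"])
wait_for_export(lambda: next(statuses), sleep=lambda seconds: None)
print("export completed")
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting.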

## Validation errors

`tb deploy` runs the same validation as `tb connection create dynamodb`. Common errors:

{% table %}
   * Error
   * What to do
   ---
   * `The DynamoDB table was not found.`
   * Check the table ARN and that the region in the `.connection` file matches the ARN's region.
   ---
   * `Point-in-Time Recovery (PITR) must be enabled.`
   * Enable PITR on the table in the DynamoDB console.
   ---
   * `DynamoDB Streams must be enabled.`
   * Enable streams on the table.
   ---
   * `DynamoDB Streams must use NEW_IMAGE or NEW_AND_OLD_IMAGES.`
   * Change the stream view type — `KEYS_ONLY` and `OLD_IMAGE` are not supported.
   ---
   * `The DynamoDB table exceeds the current size limit.`
   * The table is over 500 GB. Contact support to raise the limit.
   ---
   * `The DynamoDB table exceeds the current write-capacity limit.`
   * The table writes more than 250 WCU. Contact support to raise the limit.
{% /table %}

## Limitations

- One CDC worker per data source. Throughput is bounded by ~250 WCU.
- Stream records have a 24-hour retention in DynamoDB. If CDC is paused for more than 24 hours (for example, a broken IAM role), some changes will be missed and you'll need to re-backfill.
- CDC delivery is **at-least-once** — duplicate change events can appear in recovery scenarios. `ReplacingMergeTree` with `_timestamp` as the version column collapses them on read with `FINAL`.
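
To decide whether a CDC outage calls for a new backfill, compare the length of the gap with the stream retention window. The sketch below assumes you track the timestamp of the last change that was ingested. `needs_rebackfill` is a hypothetical helper, not a Tinybird API.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# DynamoDB Streams keeps records for 24 hours.
STREAM_RETENTION = timedelta(hours=24)


def needs_rebackfill(last_ingested_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True when the gap is longer than the stream retention window."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_ingested_at > STREAM_RETENTION


last_change = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(needs_rebackfill(last_change, now=last_change + timedelta(hours=30)))  # True
print(needs_rebackfill(last_change, now=last_change + timedelta(hours=3)))   # False
```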
