Connect Amazon DynamoDB to Tinybird

In this guide, you'll learn how to ingest data into Tinybird from Amazon DynamoDB.

You'll use DynamoDB S3 Exports and DynamoDB Streams to capture historical and change data from DynamoDB. A custom Lambda function will forward historical snapshots and change events to Tinybird's Events API.

This guide is based on the code in the DynamoDBExporter GitHub repo.

Tinybird is actively developing a native DynamoDB connector. Contact us at support@tinybird.co to register your interest and receive a notification when it's live.

Architecture

AWS provides two out-of-the-box features for DynamoDB: DynamoDB Streams and point-in-time recovery (PITR).

  • DynamoDB Streams captures change events for a given DynamoDB table and provides an API to access events as a stream. This enables CDC-like access to the table for continuous updates.
  • PITR allows you to take snapshots of your entire DynamoDB table at a point in time and save the export to S3. This enables historical access to table data for batch uploads.

This solution uses both of these services, combined with AWS Lambda functions, to send DynamoDB data to Tinybird:

Connecting DynamoDB to Tinybird architecture

When a PITR snapshot is written to S3, a Lambda function is triggered to send the file to Tinybird as a bulk import.

When changes are made to individual items in DynamoDB, a Lambda function is triggered to capture 5-second windows of change events and push them to Tinybird's Events API.
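
Because one Lambda function receives both kinds of trigger, it has to tell them apart. The sketch below shows one way to do that, using the eventSource field that AWS sets on each record ("aws:s3" for S3 notifications, "aws:dynamodb" for stream records). The function name classify_records is hypothetical and is not taken from the DynamoDBExporter repo.

```python
def classify_records(event: dict) -> dict:
    """Group the records of a Lambda event by the AWS service that emitted them."""
    grouped = {"s3": [], "dynamodb": [], "other": []}
    for record in event.get("Records", []):
        source = record.get("eventSource", "")
        if source == "aws:s3":
            grouped["s3"].append(record)        # export file landed in the bucket
        elif source == "aws:dynamodb":
            grouped["dynamodb"].append(record)  # item-level change from the stream
        else:
            grouped["other"].append(record)     # anything unexpected is kept aside
    return grouped
```

A handler could then route grouped["s3"] to the bulk import path and grouped["dynamodb"] to the change-event path.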

The future DynamoDB Connector for Tinybird will follow a similar pattern but replace the need for Lambdas. You will still need to enable DynamoDB Streams and PITR, but Tinybird will manage the infrastructure to pull these into Tinybird.

Prerequisites

This guide assumes you have:

  • An existing Tinybird account & Workspace.
  • An existing AWS account & DynamoDB table.

1. Gather Tinybird details

Later in this guide, you'll need the following details:

  • Your regional API URL, e.g. https://api.tinybird.co/v0
  • Admin Token
  • Desired Data Source name

You can get the regional API URL and Admin Token from your Tinybird Workspace.

The API hostname depends on the region of your Workspace, so use the API URL that matches your Workspace's region rather than the example above. You can find your Admin Token in the Workspace under "Tokens".
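
To show how these three details fit together, here is a minimal sketch that builds (but does not send) a request to the Events API, which accepts newline-delimited JSON in a POST to <api_url>/events?name=<data_source> with a Bearer token. The function name build_events_request and the sample values are hypothetical.

```python
import urllib.parse
import urllib.request


def build_events_request(api_url: str, token: str, datasource: str,
                         ndjson_body: bytes) -> urllib.request.Request:
    """Prepare a POST to the Tinybird Events API without sending it."""
    query = urllib.parse.urlencode({"name": datasource})
    url = f"{api_url.rstrip('/')}/events?{query}"
    return urllib.request.Request(
        url,
        data=ndjson_body,  # one JSON object per line
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


# Example values only; substitute your own regional URL, Token and name.
request = build_events_request(
    "https://api.tinybird.co/v0", "<admin_token>", "ddb_my_table",
    b'{"id": "a1", "price": 9.5}\n',
)
# urllib.request.urlopen(request) would send it.
```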

2. Enable DynamoDB Streams & PITR

Go to the DynamoDB service in the AWS console.

Open the table you want to sync and go to the Exports and streams tab. Scroll down and under DynamoDB stream details select the Turn on button.

Then, go to the Backups tab. Next to Point-in-time recovery (PITR) click Edit. Tick Turn on point-in-time recovery and click Save changes.
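
If you prefer to script this step, the same two settings can be applied with boto3. The sketch below is an alternative to the console steps, not part of the original guide: the helper names are hypothetical, and the stream view type NEW_AND_OLD_IMAGES is an assumption, so check which view type the Lambda code expects before using it.

```python
def stream_settings(table_name: str, view_type: str = "NEW_AND_OLD_IMAGES") -> dict:
    """Arguments for DynamoDB update_table that turn the stream on.

    The default view type is an assumption, not a value from the guide.
    """
    return {
        "TableName": table_name,
        "StreamSpecification": {"StreamEnabled": True, "StreamViewType": view_type},
    }


def pitr_settings(table_name: str) -> dict:
    """Arguments for DynamoDB update_continuous_backups that turn PITR on."""
    return {
        "TableName": table_name,
        "PointInTimeRecoverySpecification": {"PointInTimeRecoveryEnabled": True},
    }


def enable_streams_and_pitr(table_name: str) -> None:
    """Apply both settings; needs boto3 and AWS credentials at call time."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("dynamodb")
    client.update_table(**stream_settings(table_name))
    client.update_continuous_backups(**pitr_settings(table_name))
```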

3. Create S3 bucket for backups

In the AWS console, go to the S3 service. Create a new bucket to use for your DynamoDB snapshots. Make a note of the bucket name as you'll need it in the next step.

4. Create the IAM policy

In the AWS console, go to the IAM service. Create a new policy and select the JSON policy editor. Copy the policy template from the DynamoDBExporter GitHub repo and paste it into the policy editor.

In the policy, there are two lines that need updating. Replace <bucket_name> with your S3 bucket name. The lines are:

"Resource": [
  "arn:aws:s3:::<bucket_name>",
  "arn:aws:s3:::<bucket_name>/*"
]

Once updated, name and save the policy.
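
The placeholder substitution can also be done programmatically, which helps avoid missing one of the two lines. This is a minimal sketch: fill_policy is a hypothetical helper, and the template text passed to it is whatever you copied from the repo.

```python
import json


def fill_policy(template_text: str, bucket_name: str) -> dict:
    """Replace every <bucket_name> placeholder and parse the result as JSON."""
    filled = template_text.replace("<bucket_name>", bucket_name)
    if "<bucket_name>" in filled:
        raise ValueError("placeholder still present after substitution")
    return json.loads(filled)  # raises if the edited policy is not valid JSON


# Tiny stand-in for the real template, showing only the Resource lines.
sample_template = """
{"Resource": ["arn:aws:s3:::<bucket_name>", "arn:aws:s3:::<bucket_name>/*"]}
"""
policy = fill_policy(sample_template, "my-ddb-exports")
```

The first ARN covers the bucket itself and the second covers the objects inside it, which is why both lines need the bucket name.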

5. Create the IAM Role

In the IAM service, create a new role.

Select AWS service as the trusted entity type and Lambda as the use case. Attach the policy you created in step 4 to the role.

Give the role a name and complete the creation wizard.
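
For reference, choosing AWS service and Lambda in the wizard gives the role a trust policy that lets the Lambda service assume it. The sketch below reproduces that standard trust policy as a Python dictionary; the constant and function names are hypothetical.

```python
import json

# Standard trust relationship for a role assumed by AWS Lambda.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def trust_policy_json() -> str:
    """Serialize the trust policy, e.g. for IAM create_role's AssumeRolePolicyDocument."""
    return json.dumps(LAMBDA_TRUST_POLICY, indent=2)
```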

6. Create AWS Lambda

In the AWS console, go to the Lambda service. Create a new Lambda function.

Select the Author from scratch option and give the function a name.

Use the following settings:

  • Runtime: Python 3.10
  • Architecture: x86_64
  • Change default execution role -> Existing role: Use the role you created in step 5

All other settings can be left as default or configured as desired.

Finish creating the Lambda.

Once created, use the AWS console to edit the lambda_function.py code in the Code tab. Replace the contents of the file with the function code from the DynamoDBExporter GitHub repo. Save the file and click Deploy.

Go to the Configuration tab and select Environment variables.

Create the following environment variables:

  1. TB_CREATE_DS_TOKEN: Your Admin Token from step 1
  2. TB_DS_PREFIX: A table prefix string e.g. ddb_
  3. TB_API_ENDPOINT: Your regional API URL from step 1
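
As an illustration of how a Lambda function can consume these variables, here is a minimal sketch that reads and validates them at startup. The load_config helper and the returned key names are hypothetical; the code in the DynamoDBExporter repo may read them differently.

```python
import os

REQUIRED_VARS = ("TB_CREATE_DS_TOKEN", "TB_DS_PREFIX", "TB_API_ENDPOINT")


def load_config(env=None) -> dict:
    """Read the three Tinybird settings, failing early if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    return {
        "token": env["TB_CREATE_DS_TOKEN"],
        "prefix": env["TB_DS_PREFIX"],
        # Drop a trailing slash so paths can be appended consistently.
        "api_endpoint": env["TB_API_ENDPOINT"].rstrip("/"),
    }
```

Failing at startup makes a misconfigured variable visible in the first log line rather than as a rejected request later.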

Finally, configure the Lambda triggers.

On the Configuration tab, select Triggers.

Add a new trigger and choose S3 as the Source. Select the bucket you created in step 3. Leave Event types as All object create events. Set the Prefix to exports/ and the Suffix to .gz. Tick the Recursive invocation acknowledgement checkbox and click Add.
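
The prefix and suffix act as a filter, so only compressed files under exports/ invoke the function. The sketch below applies the same filter in code, which can be useful as a defensive check inside the handler. The export_keys helper and the sample keys are hypothetical; note that object keys in S3 event notifications arrive URL-encoded, so they are decoded first.

```python
from urllib.parse import unquote_plus


def export_keys(event: dict, prefix: str = "exports/", suffix: str = ".gz") -> list:
    """Return the decoded S3 object keys that match the trigger's filter."""
    keys = []
    for record in event.get("Records", []):
        key = unquote_plus(record["s3"]["object"]["key"])  # undo URL encoding
        if key.startswith(prefix) and key.endswith(suffix):
            keys.append(key)
    return keys
```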

Add a second trigger. Choose DynamoDB as the Source. Select your DynamoDB table.

Use the following settings:

  • Batch size: 100
  • Starting Position: Trim Horizon
  • Batch window: 5
  • Additional settings -> Retry attempts: 5
  • Additional settings -> Split batch on error: Yes

Select Add.
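
For readers who script their infrastructure, the console settings above map onto the parameters of the Lambda create_event_source_mapping API. The sketch below only builds the arguments; the helper name and the sample ARN are hypothetical.

```python
def stream_trigger_settings(stream_arn: str, function_name: str) -> dict:
    """Arguments for Lambda create_event_source_mapping, mirroring the console values."""
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "BatchSize": 100,                      # Batch size
        "StartingPosition": "TRIM_HORIZON",    # Starting Position
        "MaximumBatchingWindowInSeconds": 5,   # Batch window
        "MaximumRetryAttempts": 5,             # Retry attempts
        "BisectBatchOnFunctionError": True,    # Split batch on error
    }


# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("lambda").create_event_source_mapping(**settings)
settings = stream_trigger_settings(
    "arn:aws:dynamodb:us-east-1:123456789012:table/my_table/stream/example",
    "my-exporter-function",
)
```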

7. Start a DynamoDB S3 Export

Go to the DynamoDB service in the AWS console. Open the table you want to sync and go to the Exports and streams tab.

Under Exports to S3 select Export to S3. In Destination S3 bucket, select the bucket you created in step 3, and add the prefix used when configuring the S3 Trigger in step 6, e.g. s3://<bucket_name>/exports/. In Export settings leave all settings as default. Finish by selecting Export.
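
The export can also be started through the DynamoDB export_table_to_point_in_time API instead of the console. The following sketch splits a destination such as s3://<bucket_name>/exports/ into the bucket and prefix that the API expects. The helper name is hypothetical, and DYNAMODB_JSON is assumed to correspond to the console's default export format.

```python
from urllib.parse import urlparse


def export_settings(table_arn: str, destination: str) -> dict:
    """Arguments for DynamoDB export_table_to_point_in_time."""
    parsed = urlparse(destination)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {destination!r}")
    return {
        "TableArn": table_arn,
        "S3Bucket": parsed.netloc,
        "S3Prefix": parsed.path.lstrip("/"),  # e.g. "exports/"
        "ExportFormat": "DYNAMODB_JSON",      # assumed to be the console default
    }


# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("dynamodb").export_table_to_point_in_time(**export_settings(...))
```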

If the table is empty, the Lambda will fail as there is nothing to push to Tinybird. If you are testing with a new table, add some items to the table before starting the export.

You will see an in-progress export in the Exports to S3 table. This export can take ~25 minutes regardless of the table size.

When the export completes, the Lambda function will be triggered by the file landing in S3. The file will be sent to Tinybird to load the historical table data.
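
To make the file handling concrete: a DynamoDB JSON export data file is gzip-compressed, with one JSON object per line, and each object wraps the item under an "Item" key using DynamoDB's typed attribute format ({"S": ...}, {"N": ...} and so on). The sketch below converts such a file into plain newline-delimited JSON. It is an illustration under those assumptions, with hypothetical function names, and is not the repo's implementation.

```python
import gzip
import json


def unwrap(value: dict):
    """Convert one DynamoDB-typed attribute value into a plain Python value."""
    type_tag, inner = next(iter(value.items()))
    if type_tag == "N":
        return float(inner) if any(c in inner for c in ".eE") else int(inner)
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {k: unwrap(v) for k, v in inner.items()}
    if type_tag == "L":
        return [unwrap(v) for v in inner]
    if type_tag == "NS":
        return [unwrap({"N": n}) for n in inner]
    return inner  # S, BOOL, SS, and base64-encoded B/BS pass through unchanged


def export_to_ndjson(gz_bytes: bytes) -> str:
    """Turn one export data file into NDJSON with untyped values."""
    rows = []
    for line in gzip.decompress(gz_bytes).decode("utf-8").splitlines():
        if line.strip():
            item = json.loads(line)["Item"]
            rows.append(json.dumps({k: unwrap(v) for k, v in item.items()}))
    return "\n".join(rows)
```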

8. Verify Tinybird Data Source

When the DynamoDB export status shows as completed, the Lambda function should trigger within seconds.

When the Lambda sends data, Tinybird will automatically create a Data Source if one does not already exist. Tinybird will infer the schema from the incoming data. It is also possible to pre-create a Data Source with a manually defined schema.

Go to your Tinybird Workspace and verify that a new Data Source has been created. The name of the Data Source combines the TB_DS_PREFIX value with the table name, e.g. ddb_my_table for the prefix ddb_ and a table named my_table.

9. Test change capture

To test that change capture is working, make a change to your DynamoDB table by creating, updating, or deleting an item. DynamoDB Streams captures every item change, which triggers the Lambda and pushes the change event to Tinybird's Events API. Because the batch window in step 6 was set to 5 seconds, change events should appear in your Tinybird Data Source within 5-10 seconds.
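
To illustrate what a change event can look like by the time it reaches Tinybird, the sketch below flattens DynamoDB Stream records into newline-delimited JSON. The record fields it reads (eventName, dynamodb.Keys, dynamodb.NewImage, dynamodb.ApproximateCreationDateTime) are part of the stream record format; the output field names and function names are hypothetical, and the images are kept as DynamoDB-typed JSON strings for simplicity.

```python
import json


def to_change_event(record: dict) -> dict:
    """Flatten one DynamoDB Stream record into a single-level event."""
    change = record["dynamodb"]
    new_image = change.get("NewImage")  # absent for REMOVE events
    return {
        "event_name": record["eventName"],  # INSERT, MODIFY or REMOVE
        "event_time": change.get("ApproximateCreationDateTime"),
        "keys": json.dumps(change["Keys"], sort_keys=True),
        "new_image": json.dumps(new_image, sort_keys=True) if new_image else None,
    }


def to_ndjson(records: list) -> str:
    """Serialize a batch of stream records as one JSON object per line."""
    return "\n".join(json.dumps(to_change_event(r)) for r in records)
```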

Next steps

At this stage, you have DynamoDB change events arriving in a Tinybird Data Source. However, note that this is an append log of change events, and not the final deduplicated/upserted table. You will need to create a Materialized View with a ReplacingMergeTree engine to create the finalized table.
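
To show what the finalized table should contain, the sketch below emulates the outcome in plain Python: for each key, keep only the row with the highest version, which is the behavior ReplacingMergeTree converges to after merges. The field names id, event_time and event_name are hypothetical, and dropping rows whose latest event is REMOVE is an extra filtering step assumed here, since ReplacingMergeTree on its own only replaces rows and does not delete them.

```python
def latest_per_key(events: list, key_field: str = "id",
                   version_field: str = "event_time") -> list:
    """Keep the newest event per key, then drop keys whose newest event is a delete."""
    latest = {}
    for event in events:
        key = event[key_field]
        current = latest.get(key)
        # On a version tie the later row wins, as with the last inserted row.
        if current is None or event[version_field] >= current[version_field]:
            latest[key] = event
    return [e for e in latest.values() if e.get("event_name") != "REMOVE"]
```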