Our friends at Estuary have just released Dekaf, a Kafka protocol-compatible interface to Estuary Flow. This is great news for Tinybird users; you can now use Tinybird's existing Kafka Connector to consume CDC streams from the many, many data sources that Estuary integrates with!
Estuary's data sources cover a huge range of the data ecosystem, from the world's most popular database, PostgreSQL, to NoSQL stores like MongoDB and warehouses like Snowflake and BigQuery. There are too many to list here, so check out Estuary's connector page to find the connector you need!
Together, Estuary and Tinybird offer an incredible combination of price, performance, and productivity. Both platforms solve the developer experience problems common across the data ecosystem without compromising the computational efficiency of the underlying tech. In Tinybird's case, that's because we use the best-in-breed, open source database ClickHouse®. For Estuary, it's their open source Flow.
Like many, we had a warehouse-centric data platform that just couldn't support the latency and concurrency requirements for user-facing analytics at any reasonable cost. When we moved those use cases to Tinybird, we shipped them 5x faster and at a 10x lower cost than our prior approach.
Guy Needham, Staff Engineer at Canva
You can sign up for Tinybird and sign up for Estuary for free. No credit card is required. Need help? Join the Tinybird Slack Community and the Estuary Slack Community to get support with both products and the integration.
Continue reading to learn more about using Tinybird and Estuary together.
How it works
Estuary has released Dekaf, a new way to integrate with Estuary Flow that presents a Kafka protocol-compatible API, allowing any existing Kafka consumer to consume messages from Estuary Flow as if it were a Kafka broker. Thanks to Tinybird's existing Kafka Connector, that means Tinybird can now connect to Estuary Flow directly.
Tinybird provides various out-of-the-box Connectors to ingest data, with the most popular being Kafka, S3, and HTTP. While Tinybird also supports first-party integrations with tools like DynamoDB, Postgres, and Snowflake, it doesn't currently support the same breadth of data source integrations offered by Estuary. That's why we're so excited not only about the integration with Estuary Flow, but also about how it's built.
Dekaf is yet another example of the Kafka protocol becoming bigger than Kafka itself (something similar can also be said about the S3 API and the glut of S3-compatible storage systems available today). This has some pretty nice knock-on benefits. For you, it makes your data pipelines more consistent and easier to integrate with your existing stack, and it reduces vendor lock-in, as you can take advantage of the Kafka API's wide support in data tooling.
For Tinybird and Estuary, it means neither of our engineering teams needs to spend time building vendor-specific API integrations. Instead, we can focus on building the best real-time analytics platform, and Estuary can focus on solving real-time data capture.
We're huge fans of how Estuary is solving that problem. Real-time ETLs allow you to maintain data freshness in your pipeline and eliminate stale data with diminished value. When data arrives faster, it gets processed faster. When it gets processed faster, it provides answers faster. Fast data begets fast and responsive features and services that increase customer trust and adoption, turning your data into a revenue generator (instead of a cost center).
Get started with CDC using Tinybird and Estuary Flow's Dekaf
This example will walk you through setting up a PostgreSQL Change Data Capture pipeline that streams data into Tinybird in real time using Estuary Flow and Dekaf. You can also read more on the Tinybird docs, or see Estuary's docs.
Step 1: Create PostgreSQL Capture in Estuary Flow
With your PostgreSQL running and configured for CDC, create a capture in Estuary Flow.
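Estuary's PostgreSQL connector docs cover the exact prerequisites, but the database-side prep typically looks something like the sketch below (the role, password, and publication names are illustrative, and managed services like RDS expose wal_level through parameter groups instead of ALTER SYSTEM):

```sql
-- Enable logical replication (requires a server restart on self-hosted Postgres)
ALTER SYSTEM SET wal_level = logical;

-- A dedicated capture role with replication privileges (name is illustrative)
CREATE USER flow_capture WITH REPLICATION PASSWORD 'secret';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO flow_capture;

-- A publication covering the tables you want to sync
CREATE PUBLICATION flow_publication FOR TABLE public.users;
```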
Configure the Endpoint by entering your Postgres server address, user, password, and database.
Select the tables that you want to sync by defining collections.
Press Save & Publish to initialize the connector. This will kick off a backfill first, then automatically switch into an incremental continuous change data capture mode.
Step 2: Create Dekaf Materialization in Estuary Flow
In Estuary, create a new Dekaf materialization to use for the Tinybird connection. You can create it from the Estuary destinations tab. For detailed instructions, see the Tinybird Dekaf Estuary docs page.
When creating the Dekaf materialization, Estuary will provide you with an auth token. Save this token along with your materialization task name (in the format YOUR-ORG/YOUR-PREFIX/YOUR-MATERIALIZATION), as you'll need both for the Tinybird connection.
Step 3: Install and Set up Tinybird
Before connecting to Estuary Dekaf, you need to install the Tinybird CLI and authenticate with your account. This guide uses the CLI for a hands-on technical workflow.
Install Tinybird CLI
First, install the Tinybird CLI on your machine:
```bash
curl -L tinybird.co | sh
```

This installs the Tinybird CLI tool and sets up Tinybird Local for local development. For more installation options, see the Tinybird installation guide.
Authenticate with Tinybird
Next, authenticate with your Tinybird account:
```bash
tb login
```

This command opens a browser window where you can sign in to Tinybird Cloud. If you don't have an account yet, you can create one during this process. After signing in, create a new workspace or select an existing one.
For a complete quick start guide, see Get started with Tinybird.
Step 4: Connect Tinybird to Estuary Dekaf
With CDC events being captured in Estuary Flow and exposed via Dekaf, your next step is connecting Tinybird to Estuary Dekaf. This is quite simple using the Tinybird Kafka Connector, which enables Tinybird to securely consume messages from Estuary Dekaf as if it were a Kafka broker.
The Kafka Connector is fully managed and requires no additional tooling. Simply connect Tinybird to Estuary Dekaf, choose a collection (topic), and Tinybird will automatically begin consuming messages. As part of the ingestion process, Tinybird will extract JSON event objects with attributes that are parsed and stored in its underlying real-time database.
Create a Kafka connection
First, create a connection to Estuary Dekaf using the Tinybird CLI. You'll need the materialization task name and auth token from Step 2.
Run the following command to start the interactive wizard:
```bash
tb connection create kafka
```

The wizard will prompt you to enter:

- A name for your connection (e.g., `estuary_dekaf`)
- The bootstrap server: `dekaf.estuary-data.com`
- Security protocol: `SASL_SSL`
- SASL mechanism: `PLAIN`
- SASL username: Your materialization task name (e.g., `YOUR-ORG/YOUR-PREFIX/YOUR-MATERIALIZATION`)
- SASL password: The auth token from your Estuary Dekaf materialization
If you're using Schema Registry (recommended for Avro messages), the wizard will also prompt you for:
- Schema Registry URL: `https://dekaf.estuary-data.com`
- Schema Registry username: Same as your materialization task name
- Schema Registry password: Same as your auth token
Create a Kafka Data Source
Now, create a Data Source that will consume messages from your Estuary collection. You can use the guided CLI process or create the files manually.
Option 1: Use the guided CLI process (recommended)
Run the following command to start the guided process:
```bash
tb datasource create --kafka
```

The CLI will prompt you to:

- Select or enter the connection name (use the name you created above, e.g., `estuary_dekaf`)
- Enter the Kafka topic name (this is the Estuary collection name you want to ingest)
- Enter a consumer group ID (use a unique name, e.g., `estuary_cdc_consumer`)
- Choose the offset reset behavior (`earliest` to read from the beginning, or `latest` to read only new messages)
Option 2: Manually create the Data Source files
Alternatively, you can create the connection and Data Source files manually. First, if you haven't already, create the connection file connections/estuary_dekaf.connection.
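The exact contents depend on your project, but a minimal sketch using Tinybird's Kafka connection settings with the Dekaf values from Step 2 might look like this (the :9092 port and the ESTUARY_AUTH_TOKEN secret name are assumptions; store the token as a Tinybird secret rather than hardcoding it):

```tinybird
TYPE kafka

KAFKA_BOOTSTRAP_SERVERS dekaf.estuary-data.com:9092
KAFKA_SECURITY_PROTOCOL SASL_SSL
KAFKA_SASL_MECHANISM PLAIN
KAFKA_KEY YOUR-ORG/YOUR-PREFIX/YOUR-MATERIALIZATION
KAFKA_SECRET {{ tb_secret("ESTUARY_AUTH_TOKEN") }}
```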
If you're using Schema Registry for Avro messages, add these settings to your connection file:
```tinybird
KAFKA_SCHEMA_REGISTRY_URL https://dekaf.estuary-data.com
KAFKA_SCHEMA_REGISTRY_USERNAME YOUR-ORG/YOUR-PREFIX/YOUR-MATERIALIZATION
KAFKA_SCHEMA_REGISTRY_PASSWORD {{ tb_secret("ESTUARY_AUTH_TOKEN") }}
```

Then, create a Data Source file (e.g., datasources/postgres_cdc.datasource) that references this connection. Here's an example that defines a Tinybird Data Source to hold the change events from your PostgreSQL table. In your case, the SCHEMA should match the data in your Estuary collection. Use JSONPath expressions to extract specific fields from the CDC events into separate columns:
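As a sketch (the columns are illustrative and mirror the example response later in this post; the KAFKA_* settings reuse the connection, collection, and consumer group names from the guided flow above):

```tinybird
SCHEMA >
    `id` Int64 `json:$.id`,
    `name` String `json:$.name`,
    `email` String `json:$.email`,
    `created_at` DateTime `json:$.created_at`,
    `updated_at` DateTime `json:$.updated_at`

ENGINE "MergeTree"
ENGINE_SORTING_KEY "id, updated_at"

KAFKA_CONNECTION_NAME estuary_dekaf
KAFKA_TOPIC your_estuary_collection_name
KAFKA_GROUP_ID estuary_cdc_consumer
KAFKA_AUTO_OFFSET_RESET earliest
```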
Replace your_estuary_collection_name with the actual Estuary collection name. Adjust the schema fields to match the structure of your source data. The __timestamp column is automatically added by Tinybird and represents when the event was ingested.
Deploy the Data Source
After creating your connection and Data Source files, deploy them to Tinybird Cloud:
```bash
tb --cloud deploy
```

You can also validate the setup before deploying by running:

```bash
tb --cloud deploy --check
```

This will verify that Tinybird can connect to Estuary Dekaf with the provided credentials.
Once deployed, Tinybird will automatically begin consuming messages from your Estuary collection, and you'll start seeing CDC events stream into your Data Source as changes are made to the source data system.
Step 5: Handle Deduplication for CDC at Scale
When implementing CDC at scale, deduplication is essential. CDC streams can produce duplicate events due to network retries, connector restarts, or Kafka consumer rebalancing. Without proper deduplication, you'll have inconsistent analytics and incorrect aggregations.
There are several strategies for handling deduplication in Tinybird:
- ReplacingMergeTree Engine: Use Tinybird's `ReplacingMergeTree` engine to automatically deduplicate rows based on a primary key. This is ideal when you have a unique identifier (like a document ID) and a timestamp or version field that indicates the latest state.
- Lambda Architecture: Implement a Lambda Architecture pattern where you maintain both a real-time stream and a batch layer. The batch layer provides the source of truth, while the stream layer provides low-latency updates. This approach is particularly effective for handling late-arriving data and ensuring eventual consistency.
- Query-time Deduplication: Use SQL functions like `argMax` to deduplicate at query time (see the sketch below). This approach is flexible but can impact query performance on large datasets.
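As a rough illustration of the query-time approach, and assuming the postgres_cdc Data Source sketched above, an argMax query that collapses CDC events to the latest state per id might look like this:

```sql
-- Keep only the latest version of each row, using updated_at as the version column
SELECT
    id,
    argMax(name, updated_at) AS name,
    argMax(email, updated_at) AS email,
    max(updated_at) AS updated_at
FROM postgres_cdc
GROUP BY id
```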
For detailed guidance on implementing these strategies, see:
- Deduplication Strategies: Comprehensive guide on deduplication techniques in Tinybird
- Lambda Architecture: Guide on implementing Lambda Architecture patterns for real-time analytics
Step 6: Start building real-time analytics with Tinybird
Now your CDC data pipeline should be up and running, capturing changes from your source database, streaming them through Estuary Flow and Dekaf, and sinking them into an analytical datastore on Tinybird's real-time data platform.
You can now query, shape, join, and enrich your CDC data with SQL Pipes and instantly publish your transformations as high-concurrency, low-latency APIs to power your next use case.
For example, create a Pipe file (e.g., pipes/get_cdc_changes.pipe) to query your CDC data:
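A minimal sketch, reusing the postgres_cdc Data Source and columns assumed earlier (the node name is illustrative, and TYPE endpoint publishes the node as an API Endpoint):

```tinybird
NODE get_changes
SQL >
    SELECT id, name, email, created_at, updated_at
    FROM postgres_cdc
    ORDER BY updated_at DESC
    LIMIT 100

TYPE endpoint
```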
Deploy your Pipe to Tinybird Cloud:
```bash
tb --cloud deploy
```

After deployment, Tinybird automatically creates API endpoints for your Pipes. You can access your endpoint using a Tinybird auth token. Here's an example of how to call the endpoint:
```bash
curl "https://api.tinybird.co/v0/pipes/get_cdc_changes.json?token=YOUR_TOKEN"
```

The endpoint returns data in JSON format by default. You can also request other formats:

- `.csv` for CSV format
- `.ndjson` for newline-delimited JSON
- `.parquet` for Parquet format
Example response:
```json
{
  "data": [
    {
      "id": 1,
      "name": "John Doe",
      "email": "john.doe@example.com",
      "created_at": "2024-09-23 10:30:00",
      "updated_at": "2024-09-23 10:30:00"
    }
  ],
  "rows": 1,
  "statistics": {
    "elapsed": 0.001,
    "rows_read": 1,
    "bytes_read": 256
  }
}
```

A simpler architecture for CDC and real-time analytics
In the last year, Kafka-compatible layers like Estuary's Dekaf have quietly changed how teams think about CDC pipelines. Instead of standing up and operating full Kafka clusters, you can treat Estuary Flow as a Kafka-compatible endpoint and let it fan in changes from hundreds of databases and SaaS tools into a single stream that Tinybird can consume. This pattern keeps the familiar Kafka API for your downstream tools while offloading the hard parts of capture, transformation, and delivery to Flow.
On the Tinybird side, this fits perfectly with its connector and real-time analytics model. Estuary provides a low-latency CDC feed from sources like PostgreSQL, MySQL, MongoDB, BigQuery, and many others, and Tinybird turns that feed into queryable tables, SQL-based transformations, and HTTP APIs that can power dashboards or user-facing product features with fresh data in milliseconds.
The result is a simpler architecture for real-time analytics: no custom ETL, no manual schema juggling, and far less infra to run, but still the ability to ship things like personalization, fraud checks, or cart abandonment alerts backed directly by CDC events.
Get started with Tinybird
Tinybird is the real-time platform for user-facing analytics. Unify your data from hundreds of sources (thanks, Estuary 😉), query it with SQL, and integrate it into your application via highly scalable, hosted REST API Endpoints.
You can sign up for Tinybird for free, with no time limit or credit card required. Need help? Join our Slack Community or check out the Tinybird docs for more resources.
