Kafka Connector¶
The Kafka Connector allows you to ingest data from your existing Kafka cluster into Tinybird so that you can quickly turn it into high-concurrency, low-latency REST APIs.
The Kafka Connector is fully managed and requires no additional tooling. Connect Tinybird to your Kafka cluster, choose a topic, and Tinybird automatically begins consuming messages from Kafka.
The Kafka Connector is:
- Easy to use. Connect to Kafka and start building APIs right away. Choose a topic, select the fields you are interested in, and ingest millions of rows per second.
- SQL-based. Transform or enrich your Kafka topics with JOINs using our serverless Data Pipes.
- Secure. Use Auth tokens to control access to API endpoints. Implement access policies as you need. Support for row-level security.
Prerequisites¶
You'll need to grant READ permissions to both the Topic and the Consumer Group to ingest data from Kafka into Tinybird.
Your Kafka brokers must be secured with SSL/TLS and SASL. Tinybird always uses SASL_SSL as the security.protocol for the Kafka consumer. Connections are rejected if the brokers only support PLAINTEXT or SASL_PLAINTEXT.
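A pre-flight check on the client configuration you already use against the cluster can catch an incompatible listener before you create the connection. The following is a minimal sketch, assuming a librdkafka-style configuration dictionary; the helper name is_tinybird_compatible is hypothetical and is not part of any Tinybird tooling.

```python
# Hypothetical pre-flight helper; not part of Tinybird or any Kafka client.
REQUIRED_PROTOCOL = "SASL_SSL"


def is_tinybird_compatible(client_config: dict) -> bool:
    """Return True if the config uses the security protocol Tinybird requires."""
    # Kafka clients default to PLAINTEXT when security.protocol is not set.
    protocol = client_config.get("security.protocol", "PLAINTEXT")
    return protocol.upper() == REQUIRED_PROTOCOL


if __name__ == "__main__":
    print(is_tinybird_compatible({"security.protocol": "SASL_SSL"}))
    print(is_tinybird_compatible({"security.protocol": "SASL_PLAINTEXT"}))
```

The check only inspects the client side. The brokers must still expose a SASL_SSL listener for the connection to succeed.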
Remember that connections created using the UI flow are only created in the main Workspace, so if you create a new Branch from a Workspace with existing Kafka Data Sources, the Branch Data Sources won't receive that streaming data automatically. You'll need to use the CLI to re-create the Kafka Data Source.
For testing purposes, it's also recommended to use different Kafka connections in the main Workspace vs. any Branches.
Add a Kafka connection¶
Create the connection using the UI or CLI.
Using the CLI¶
Adding a Kafka connection in the main Workspace
tb auth # use the main Workspace admin Token
tb connection create kafka --bootstrap-servers <server> --key <key> --secret <secret> --connection-name <name>
Using the UI¶
Alternatively, using the UI, navigate to the left-hand nav ("Data Project"), select the + icon, then "Data Source". Select "Kafka" and configure the connection.
Update a Kafka connection¶
Updating your credentials or cluster details can only be done in the Tinybird web UI. Navigate to the left-hand nav ("Data Project"), select the + icon, then "Data Source". Select "Kafka" and the connection you want to update. Edit (or delete) the connection details using the three-dot menu.
Any Data Source that depends on this connection will be affected by these updates, so be sure before you save your changes.
Use .datasource files¶
If you are managing your Tinybird resources in files, there are several settings available to configure the Kafka Connector in .datasource files.
See the datafiles docs for more information.
Use INCLUDE to store connection settings¶
To avoid configuring the same connection settings across many files, or to prevent leaking sensitive information, you can store connection details in an external file and use INCLUDE to import them into one or more .datasource files.
You can find more information about INCLUDE in the Advanced Templates documentation.
As an example, you may have two Kafka .datasource files that reuse the same Kafka connection. You can create an include file that stores the Kafka connection details.
The Tinybird project may use the following structure:
Tinybird data project file structure
ecommerce_data_project/
    datasources/
        connections/
            my_connector_name.incl
        my_kafka_datasource.datasource
        another_datasource.datasource
    endpoints/
    pipes/
Where the file my_connector_name.incl has the following content:
Include file containing Kafka connection details
KAFKA_CONNECTION_NAME my_connection_name
KAFKA_BOOTSTRAP_SERVERS my_server:9092
KAFKA_KEY my_username
KAFKA_SECRET my_password
And the Kafka .datasource files look like the following:
Data Source using includes for Kafka connection details
SCHEMA >
    `__value` String,
    `__topic` LowCardinality(String),
    `__partition` Int16,
    `__offset` Int64,
    `__timestamp` DateTime,
    `__key` String

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(__timestamp)"
ENGINE_SORTING_KEY "__timestamp"

INCLUDE "connections/my_connector_name.incl"

KAFKA_TOPIC my_topic
KAFKA_GROUP_ID my_group_id
When using tb pull to pull a Kafka Data Source using the CLI, the KAFKA_KEY and KAFKA_SECRET settings are not included in the file, to avoid exposing credentials.
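If you commit .datasource files to version control, a small check can confirm that credentials live only in the include file. The sketch below assumes that each setting sits on its own line in the form NAME value; the helper find_inline_secrets is hypothetical and is not part of the Tinybird CLI.

```python
from typing import List

# Hypothetical lint helper for a CI step; not part of the Tinybird CLI.
SENSITIVE_SETTINGS = ("KAFKA_KEY", "KAFKA_SECRET")


def find_inline_secrets(datafile_text: str) -> List[str]:
    """Return the sensitive setting names that appear in a datafile's text."""
    found: List[str] = []
    for line in datafile_text.splitlines():
        parts = line.split()
        # A setting line starts with the setting name, followed by its value.
        if parts and parts[0] in SENSITIVE_SETTINGS and parts[0] not in found:
            found.append(parts[0])
    return found
```

Run against a .datasource file that only uses INCLUDE, the function returns an empty list. Run against the include file itself, it reports both settings, which is a hint to keep that file out of shared repositories.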
Iterate a Kafka Data Source¶
This section uses Branches. Be sure you're familiar with the behavior of Branches in Tinybird when using the Kafka Connector - see Prerequisites.
Update a Kafka Data Source¶
When you create a Branch that has existing Kafka Data Sources, the Data Sources in the Branch won't be connected to Kafka.
Therefore, if you want to update the schema, you need to re-create the Kafka Data Source in the Branch.
In Branches, Tinybird automatically appends _{BRANCH} to the Kafka group ID to prevent collisions. It also forces the consumers in Branches to always consume the latest messages, to reduce the performance impact.
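Knowing the effective group ID can help when you look for the Branch consumer in your Kafka cluster. The sketch below illustrates the suffix rule described above, assuming that {BRANCH} stands for the Branch name; the function name is hypothetical.

```python
def branch_group_id(group_id: str, branch: str) -> str:
    """Illustrate the suffix rule: <group_id>_<branch>.

    Assumption: {BRANCH} is the Branch name as shown in Tinybird.
    """
    return f"{group_id}_{branch}"


if __name__ == "__main__":
    print(branch_group_id("my_group_id", "my_branch"))
```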
Add a new Kafka Data Source¶
To create and test a Kafka Data Source in a Branch, start by using an existing connection. It's possible to create and use existing connections from the Branch via UI, but remember that these connections will always be created in the main Workspace.
You can create a Kafka Data Source in a Branch as you would in production, but this Data Source won't have any connection details internally. This is useful for testing, but in the end you should always define the connection and the Kafka parameters that will be used in production in the .datasource file.
To move the Data Source to production, include the connection settings in the Data Source's .datasource file, as explained in the datafiles docs.
Delete a Kafka Data Source¶
If a Data Source is created in a Branch, it remains active until it is removed from the Branch or the entire Branch is removed.
If you delete an existing Kafka Data Source in a Branch, it won't be deleted in the main Workspace. To delete it there, run the deletion explicitly against the main Workspace. You can use the CLI for this and include it in your CI/CD workflows as necessary.
Limits¶
The limits for the Kafka connector are:
- Minimum flush time: 4 seconds
- Throughput (uncompressed): 20 MB/s
- Up to 3 connections per Workspace
If you're hitting these limits, contact support@tinybird.co for support.
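To get a rough sense of what the throughput limit means for your topic, you can convert it into messages per second. This is a back-of-the-envelope sketch that assumes 1 MB = 10**6 bytes and a known average uncompressed message size; the function name is hypothetical.

```python
# Assumption: the 20 MB/s limit is read as 20 * 10**6 uncompressed bytes per second.
THROUGHPUT_LIMIT_BYTES_PER_S = 20 * 1_000_000


def max_messages_per_second(avg_message_bytes: int) -> int:
    """Messages per second that fit within the uncompressed throughput limit."""
    if avg_message_bytes <= 0:
        raise ValueError("avg_message_bytes must be positive")
    return THROUGHPUT_LIMIT_BYTES_PER_S // avg_message_bytes


if __name__ == "__main__":
    print(max_messages_per_second(1_000))
```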
Troubleshooting¶
If you aren't receiving data¶
When a consumer group commits an offset for a topic, Kafka always resumes sending data from the latest committed offset. In Tinybird, each Kafka Data Source receives data from a topic and uses a group id, and this combination of topic + group id must be unique: Tinybird won't allow you to create a Kafka Data Source using an existing topic + group id combination.
However, if you remove a Kafka Data Source and re-create it with the same settings after it has received data, you'll only get data from the latest committed offset, even if KAFKA_AUTO_OFFSET_RESET is set to earliest.
This happens both in the main Workspace and in Branches (if you're using them), since connections are always created in the main Workspace and are shared across Branches.
Recommended next steps:
- Always use a different group id when testing Kafka Data Sources.
- Check the tinybird.kafka_ops_log Service Data Source to see if you've already used a group id to ingest data from a topic.
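If you manage Data Sources as files, you can also catch a reused topic + group id combination locally before deploying. The following is a minimal sketch, assuming KAFKA_TOPIC and KAFKA_GROUP_ID are written directly in each .datasource file (it does not follow INCLUDE); both helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def kafka_setting(datafile_text: str, name: str) -> Optional[str]:
    """Return the value of a 'NAME value' setting line, or None if absent."""
    for line in datafile_text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2 and parts[0] == name:
            return parts[1].strip()
    return None


def duplicate_topic_group_pairs(
    datafiles: Dict[str, str],
) -> Dict[Tuple[str, str], List[str]]:
    """Map each (topic, group id) pair used by more than one file to those files."""
    users: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for filename, text in datafiles.items():
        topic = kafka_setting(text, "KAFKA_TOPIC")
        group = kafka_setting(text, "KAFKA_GROUP_ID")
        if topic is not None and group is not None:
            users[(topic, group)].append(filename)
    # Keep only the combinations that more than one file claims.
    return {pair: files for pair, files in users.items() if len(files) > 1}
```

The input is a mapping of file name to file contents, so the check can run in CI without touching Kafka or Tinybird. It only sees the files in your project, not group ids used in the past, so it complements the kafka_ops_log check rather than replacing it.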
FAQs¶
Is the Kafka Schema Registry supported?¶
Yes, for decoding Avro messages. You can choose to enable Schema Registry support when connecting Tinybird to Kafka. You will be prompted to add your Schema Registry connection details, e.g. https://<SCHEMA_REGISTRY_API_KEY>:<SCHEMA_REGISTRY_API_SECRET>@<SCHEMA_REGISTRY_API_URL>. However, the Kafka Data Source schema will not be defined using the Schema Registry; the Schema Registry is simply used to decode the messages.
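API secrets often contain characters such as / or @ that are not valid inside the credentials part of a URL. The sketch below builds the connection string in the format shown above, assuming that standard percent-encoding of the key and secret is accepted; the function name and the example host are hypothetical.

```python
from urllib.parse import quote


def schema_registry_url(api_key: str, api_secret: str, host: str) -> str:
    """Build https://<key>:<secret>@<host>, percent-encoding the credentials."""
    # safe="" also encodes "/", ":" and "@", which would otherwise break the URL.
    encoded_key = quote(api_key, safe="")
    encoded_secret = quote(api_secret, safe="")
    return f"https://{encoded_key}:{encoded_secret}@{host}"


if __name__ == "__main__":
    # registry.example.com is a placeholder host, not a real endpoint.
    print(schema_registry_url("my_key", "s/e@c:ret", "registry.example.com"))
```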
Can Tinybird ingest compressed messages?¶
Yes, Tinybird can consume from Kafka topics where Kafka compression has been enabled, as decompressing the message is a standard function of the Kafka Consumer.
However, if you compressed the message before passing it through the Kafka Producer, then Tinybird cannot do post-Consumer processing to decompress the message.
For example, if you compressed a JSON message with gzip and produced it to a Kafka topic as a bytes message, it would be ingested by Tinybird as bytes. If you produced a JSON message to a Kafka topic with the Kafka Producer setting compression.type=gzip, it would be stored in Kafka as compressed bytes, but it would be decompressed on ingestion and arrive in Tinybird as JSON.
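The difference between the two cases can be reproduced without a Kafka cluster. The sketch below uses only the Python standard library to show what the message value looks like in each case; the event contents and the is_json helper are illustrative, and no Kafka client is involved.

```python
import gzip
import json

event = {"user_id": 42, "action": "click"}

# Case 1: the application gzips the payload before handing it to the producer.
# Kafka, and therefore any consumer, sees opaque bytes rather than JSON.
precompressed_value = gzip.compress(json.dumps(event).encode("utf-8"))

# Case 2: the application sends plain JSON and lets the producer compress it
# (compression.type=gzip). The consumer decompresses transparently, so the
# value it delivers is the original JSON bytes.
plain_value = json.dumps(event).encode("utf-8")


def is_json(value: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 JSON."""
    try:
        json.loads(value.decode("utf-8"))
        return True
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False


if __name__ == "__main__":
    print(is_json(plain_value), is_json(precompressed_value))
```

If you control the producer, sending plain JSON and enabling producer-level compression keeps the payload readable on ingestion while still saving bandwidth and broker storage.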
What are the __<field> fields stored in the Kafka Data Source?¶
Those fields represent the raw data received from Kafka:
- __value: A String representing the whole Kafka record inserted
- __topic: The Kafka topic that the message belongs to
- __partition: The Kafka partition that the message belongs to
- __offset: The Kafka offset of the message
- __timestamp: The timestamp stored in the Kafka message received by Tinybird
- __key: The key of the Kafka message
How do I connect to Aiven Kafka?¶
Aiven for Apache Kafka service instances expose multiple SASL ports with two different kinds of SASL certificates: Private CA (self-signed) and Public CA (signed by Let's Encrypt). To connect to Aiven Kafka, you need to enable the Public CA port, which is disabled by default.