Kafka connector troubleshooting guide¶
This guide helps you diagnose and resolve common issues with Tinybird's Kafka connector. Use the tinybird.kafka_ops_log Service Data Source to monitor errors and warnings in real time.
For setup instructions and configuration details, see the Kafka connector documentation.
Quick error lookup¶
Use this table to quickly find errors and their solutions. Errors may appear in kafka_ops_log (Kafka connector operations) or datasources_ops_log (Data Source ingestion operations).
| Error message / symptom | Category | Log source | Solution link |
|---|---|---|---|
| Connection timeout or broker unreachable | Connectivity | kafka_ops_log | Connection timeout |
| Authentication failed | Authentication | kafka_ops_log, datasources_ops_log | Authentication failed |
| SSL handshake failed | SSL/TLS | kafka_ops_log | SSL certificate validation |
| Schema Registry connection failed | Deserialization | kafka_ops_log | Schema Registry |
| Deserialization failed - Avro | Deserialization | kafka_ops_log | Avro deserialization |
| Deserialization failed - JSON | Deserialization | kafka_ops_log | JSON deserialization |
| Offset commit failed | Consumer group | kafka_ops_log | Offset commit |
| Consumer lag continuously increasing | Performance | kafka_ops_log | Consumer lag |
| Schema mismatch or type conversion failed | Schema | kafka_ops_log | Schema mismatch |
| Materialized View errors | Schema | kafka_ops_log, datasources_ops_log | Materialized View errors |
| Low throughput or processing stall | Performance | kafka_ops_log | Low throughput |
| Uneven partition processing | Performance | kafka_ops_log | Uneven partitions |
| Message too large | Message size | kafka_ops_log | Message size |
| Compressed message handling | Message format | kafka_ops_log | Compression |
| Unknown topic or partition | Kafka | datasources_ops_log | Unknown topic |
| Group authorization failed | Authorization | datasources_ops_log | Group authorization |
| Topic authorization failed | Authorization | datasources_ops_log | Topic authorization |
| Unknown partition | Kafka | datasources_ops_log | Unknown partition |
| Table in readonly mode | Data Source | datasources_ops_log | Readonly mode |
| Timeout or memory limit exceeded | Resource | datasources_ops_log | Timeout |
How to diagnose errors¶
Use both kafka_ops_log and datasources_ops_log to diagnose Kafka connector issues:
Check Kafka connector operations¶
Query recent errors and warnings from kafka_ops_log:
```sql
SELECT
    timestamp,
    datasource_id,
    topic,
    partition,
    msg_type,
    msg,
    lag
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 1 hour
    AND msg_type IN ('warning', 'error')
ORDER BY timestamp DESC
```
Check Data Source ingestion errors¶
Query errors from datasources_ops_log to see issues during data ingestion:
```sql
SELECT
    timestamp,
    datasource_id,
    event_type,
    result,
    error,
    elapsed_time
FROM tinybird.datasources_ops_log
WHERE timestamp > now() - INTERVAL 1 hour
    AND result = 'error'
    AND event_type LIKE '%kafka%'
ORDER BY timestamp DESC
```
This shows errors that occur during the actual data processing phase, even when the Kafka connection itself might be working.
Set up automated monitoring: Connect these diagnostic queries to your monitoring and alerting tools. Query the ClickHouse® HTTP interface directly from tools like Grafana, Datadog, PagerDuty, and Slack. Alternatively, create API endpoints from these queries, or export them in Prometheus format for Prometheus-compatible tools. Configure your tools to poll these queries periodically and trigger alerts when errors are detected.
For detailed monitoring queries, see Monitor Kafka connectors.
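For example, an alerting tool could poll a query like the following and fire whenever a connector has logged errors recently. This is a minimal sketch, assuming a 15-minute polling window and an alert on any non-zero error count; adjust the window and threshold to your workload:

```sql
-- errors and warnings per connector in the last 15 minutes
SELECT
    datasource_id,
    topic,
    countIf(msg_type = 'error') AS errors,
    countIf(msg_type = 'warning') AS warnings
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 15 minute
GROUP BY datasource_id, topic
HAVING errors > 0
```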
Connectivity errors¶
Error: Connection timeout or broker unreachable¶
Symptoms:
- No messages are being processed
- Errors in kafka_ops_log with messages like "Connection timeout" or "Broker unreachable"
- High lag values that continue to increase
Root causes:
- Incorrect KAFKA_BOOTSTRAP_SERVERS configuration
- Network connectivity issues between Tinybird and your Kafka cluster
- Firewall or security group rules blocking access
- Kafka broker is down or unreachable
Solutions:
Verify bootstrap servers configuration:
- Check that KAFKA_BOOTSTRAP_SERVERS in your .connection file includes the correct host and port
- Ensure you're using the advertised listeners address, not the internal broker address
- For multiple brokers, use comma-separated values: broker1:9092,broker2:9092,broker3:9092
- For cloud providers, verify you're using the public endpoint provided by your Kafka service
Test connectivity:
```
tb connection data <connection_name>
```
This command allows you to select a topic and consumer group ID, then returns preview data. This validates that Tinybird can reach your Kafka broker, authenticate, and consume messages.
Check network configuration:
- Verify firewall rules allow outbound connections from Tinybird to your Kafka cluster
- For AWS MSK, ensure security groups allow inbound traffic on the Kafka port
- For Confluent Cloud, verify network access settings
- For PrivateLink setups (Enterprise), verify the PrivateLink connection is active
Verify security protocol:
- Ensure KAFKA_SECURITY_PROTOCOL matches your Kafka cluster configuration
- For most cloud providers, use SASL_SSL
- For local development, you may use PLAINTEXT
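To confirm whether the connector is currently hitting connectivity problems, you can filter kafka_ops_log for connection-related messages. This is a minimal sketch, assuming the error text appears in the msg column; adjust the patterns to the exact messages you see:

```sql
SELECT
    timestamp,
    datasource_id,
    topic,
    partition,
    msg
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 1 hour
    AND msg_type = 'error'
    AND (msg ILIKE '%timeout%' OR msg ILIKE '%unreachable%' OR msg ILIKE '%connect%')
ORDER BY timestamp DESC
LIMIT 50
```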
For vendor-specific network configuration help, see the setup guide for your Kafka provider, such as AWS MSK or Confluent Cloud.
Error: Authentication failed¶
Symptoms:
- Errors in kafka_ops_log with "Authentication failed" or "SASL authentication error"
- Connection check fails with authentication errors
Root causes:
- Incorrect KAFKA_KEY or KAFKA_SECRET credentials
- Wrong KAFKA_SASL_MECHANISM configuration
- Expired credentials or tokens
- For AWS MSK with OAuthBearer, incorrect IAM role configuration
Solutions:
- Verify credentials:

```
tb [--cloud] secret get KAFKA_KEY
tb [--cloud] secret get KAFKA_SECRET
```

Ensure the secrets match your Kafka cluster credentials.
Check SASL mechanism:
- Verify KAFKA_SASL_MECHANISM matches your Kafka cluster (PLAIN, SCRAM-SHA-256, SCRAM-SHA-512, or OAUTHBEARER)
- For Confluent Cloud, typically use PLAIN
- For AWS MSK with IAM, use OAUTHBEARER with KAFKA_SASL_OAUTHBEARER_METHOD AWS
- For Redpanda, check your cluster's configured SASL mechanism
For AWS MSK OAuthBearer:
- Verify the IAM role ARN is correct: tb [--cloud] secret get AWS_ROLE_ARN
- Check that the IAM role has the correct trust policy allowing Tinybird to assume the role
- Verify the external ID matches between your connection configuration and IAM trust policy
- Ensure the IAM role has the required Kafka cluster permissions (see AWS IAM permissions)
Rotate credentials if needed:
- If credentials have expired, update them using tb secret set
- Redeploy your connection after updating secrets
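To check whether authentication failures are still occurring, or stopped after a credential rotation, you can filter kafka_ops_log for authentication-related messages. A minimal sketch, assuming the failure text appears in the msg column:

```sql
SELECT
    timestamp,
    datasource_id,
    topic,
    msg
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 6 hour
    AND msg_type = 'error'
    AND (msg ILIKE '%auth%' OR msg ILIKE '%sasl%')
ORDER BY timestamp DESC
```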
For detailed authentication setup, see:
- AWS MSK setup guide for IAM authentication
- Confluent Cloud setup guide for API key authentication
Error: SSL/TLS certificate validation failed¶
Symptoms:
- Errors mentioning "SSL handshake failed" or "certificate validation error"
- Connection failures when using the SASL_SSL security protocol
Root causes:
- Missing or incorrect CA certificate
- Self-signed certificate not provided
- Certificate expired or invalid
Solutions:
- Provide CA certificate:
```
tb [--cloud] secret set --multiline KAFKA_SSL_CA_PEM
```
Paste your CA certificate in PEM format.
- Add certificate to connection file:
```
KAFKA_SSL_CA_PEM >
    {{ tb_secret("KAFKA_SSL_CA_PEM") }}
```
Note: This is a multiline setting.
- Verify certificate format:
- Ensure the certificate is in PEM format (starts with -----BEGIN CERTIFICATE-----)
- Include the full certificate chain if required
- For Aiven Kafka, download the CA certificate from the Aiven console
Deserialization errors¶
Error: Schema Registry connection failed¶
Symptoms:
- Errors in kafka_ops_log mentioning "Schema Registry" or "Failed to fetch schema"
- Messages not being ingested when using Avro or JSON with schema
Root causes:
- Incorrect KAFKA_SCHEMA_REGISTRY_URL configuration
- Missing or incorrect Schema Registry credentials
- Schema Registry is unreachable
- Schema not found in Schema Registry
Solutions:
Verify Schema Registry URL:
- Check that KAFKA_SCHEMA_REGISTRY_URL in your .connection file is correct
- For Basic Auth, use the format https://<username>:<password>@<registry_host>
- Ensure the URL is accessible from Tinybird's network
Check schema exists:
- Verify the schema exists in your Schema Registry for the topic
- Ensure the schema subject name matches your topic naming convention
- For Confluent Schema Registry, check subject names like {topic-name}-value or {topic-name}-key
Test Schema Registry access:
- Use curl or similar tool to verify Schema Registry is reachable
- Verify credentials work with Schema Registry API
For more information on schema management, see the schema management guide.
Error: Deserialization failed - Avro¶
Symptoms:
- Warnings in kafka_ops_log with "Deserialization failed" or "Avro parsing error"
- Messages sent to the Quarantine Data Source
- processed_messages > committed_messages in monitoring queries
Root causes:
- Schema mismatch between message and Schema Registry
- Schema evolution incompatibility
- Incorrect KAFKA_VALUE_FORMAT or KAFKA_KEY_FORMAT configuration
- Corrupted message data
Solutions:
Verify format configuration:
- Ensure KAFKA_VALUE_FORMAT is set to avro for Avro messages
- Ensure KAFKA_KEY_FORMAT is set to avro if keys are Avro-encoded
- Verify KAFKA_SCHEMA_REGISTRY_URL is configured
Check schema compatibility:
- Verify the message schema matches the schema in Schema Registry
- Check for schema evolution issues (backward/forward compatibility)
- Review Quarantine Data Source to see the actual message that failed
Inspect quarantined messages:
```sql
SELECT *
FROM your_datasource_quarantine
WHERE timestamp > now() - INTERVAL 1 hour
ORDER BY timestamp DESC
LIMIT 100
```
This helps you see the actual message content and identify the issue.
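To spot patterns instead of reading rows one by one, you can group quarantined rows by their error. This is a minimal sketch, assuming your quarantine Data Source exposes the standard c__error and insertion_date columns:

```sql
SELECT
    arrayJoin(c__error) AS error,   -- assumes the standard quarantine c__error column
    count() AS occurrences
FROM your_datasource_quarantine
WHERE insertion_date > now() - INTERVAL 1 hour
GROUP BY error
ORDER BY occurrences DESC
LIMIT 20
```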
- Schema evolution:
- Ensure schema changes are backward compatible
- Consider using schema versioning strategies
- Test schema changes in a development environment first
For detailed schema evolution guidance, see the schema management guide.
Error: Deserialization failed - JSON¶
Symptoms:
- Warnings in kafka_ops_log with "JSON parsing error" or "Invalid JSON"
- Messages in the Quarantine Data Source
- Low success rate in throughput monitoring
Root causes:
- Invalid JSON format in message payload
- Schema mismatch with JSONPath expressions
- Missing required fields in JSON
- Incorrect KAFKA_VALUE_FORMAT configuration
Solutions:
Verify JSON format:
- Check that messages are valid JSON
- Use a JSON validator to test sample messages
- Review Quarantine Data Source for examples of failed messages
Check JSONPath expressions:
- Verify JSONPath expressions in your Data Source schema match the message structure
- Test JSONPath expressions with sample messages
- Use json:$ to store the entire message if you're unsure of the structure
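Before changing JSONPath expressions, you can test how a sample message parses using tb sql. This is a minimal sketch; the payload and field names are hypothetical placeholders for your own message structure:

```sql
-- hypothetical sample message; replace with a real payload from your topic
WITH '{"user_id": "abc-123", "amount": 42.5}' AS raw
SELECT
    JSONHas(raw, 'user_id') AS has_user_id,        -- confirm the field exists before mapping it
    JSONExtractString(raw, 'user_id') AS user_id,
    JSONExtractFloat(raw, 'amount') AS amount
```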
Handle missing fields:
- Use nullable types for optional fields: Nullable(String)
- Provide default values in JSONPath: json:$.field DEFAULT ''
- Consider using a schemaless approach with data String json:$ and extract fields later
Verify format configuration:
- Use json_without_schema for plain JSON messages
- Use json_with_schema only if you're using Schema Registry for JSON schemas
Offset and consumer group errors¶
Error: Offset commit failed or consumer group conflict¶
Symptoms:
- Data Source only receives messages from the last committed offset
- Multiple Data Sources competing for the same consumer group
- Errors about offset commit failures
Root causes:
- Multiple Data Sources using the same KAFKA_TOPIC and KAFKA_GROUP_ID combination
- Consumer group already in use by another app
- Offset reset behavior not working as expected
Solutions:
Use unique consumer group IDs:
- Each Data Source must use a unique KAFKA_GROUP_ID for the same topic
- Use environment-specific group IDs: {{ tb_secret("KAFKA_GROUP_ID", "prod-group") }}
- For testing, use unique group IDs to avoid conflicts
Check for duplicate configurations:
```sql
SELECT
    topic,
    count(DISTINCT datasource_id) AS datasource_count
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 24 hour
GROUP BY topic
HAVING datasource_count > 1
```
This helps identify if multiple Data Sources are consuming from the same topic.
Reset offset behavior:
- KAFKA_AUTO_OFFSET_RESET=earliest only works for new consumer groups
- If a consumer group already has committed offsets, it resumes from the last committed offset
- To start from the beginning, use a new KAFKA_GROUP_ID or reset offsets in your Kafka cluster
Best practices:
- Use different KAFKA_GROUP_ID values for development, staging, and production
- Document which consumer groups are in use
- Monitor consumer group activity in your Kafka cluster
For managing consumer groups across environments, see the CI/CD and version control guide.
Error: Consumer lag continuously increasing¶
Symptoms:
- Lag values in kafka_ops_log keep growing
- Messages are not being processed fast enough
- Throughput is lower than message production rate
Root causes:
- Message production rate exceeds processing capacity
- Consumer autoscaling not keeping up with load
- Network latency or connectivity issues
- Data Source schema or Materialized View performance issues
Solutions:
- Monitor lag trends:
```sql
SELECT
    datasource_id,
    topic,
    partition,
    max(lag) AS current_lag,
    avg(lag) AS avg_lag
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 1 hour
    AND partition >= 0
GROUP BY datasource_id, topic, partition
ORDER BY current_lag DESC
```
Verify autoscaling:
- Tinybird's serverless Kafka connector automatically scales consumers
- Monitor kafka_ops_log to see partition assignment changes
- If lag continues to increase, there may be a bottleneck in your Data Source or Materialized Views
Check Data Source performance:
- Review Materialized View queries that trigger on append
- Optimize complex Materialized View queries that may slow down ingestion
- Check for schema issues causing slow parsing
Analyze throughput:
```sql
SELECT
    datasource_id,
    topic,
    sum(processed_messages) AS processed,
    sum(committed_messages) AS committed,
    (sum(committed_messages) * 100.0 / sum(processed_messages)) AS success_rate
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 1 hour
GROUP BY datasource_id, topic
```
Low success rates indicate processing issues.
Review partitioning strategy:
- Check if partition distribution is even
- Review partition key design if lag is uneven across partitions
- See the partitioning strategies guide for optimization tips
Contact support:
- If lag continues to increase despite autoscaling, contact Tinybird support
- Provide kafka_ops_log queries showing the issue
- Include information about message production rates
For performance optimization strategies, see the performance optimization guide.
Data quality and schema errors¶
Error: Schema mismatch or type conversion failed¶
Symptoms:
- Warnings in kafka_ops_log about type mismatches
- Messages in the Quarantine Data Source
- Low committed_messages compared to processed_messages
Root causes:
- Data type mismatch between message and Data Source schema
- Missing required fields
- Invalid data formats (for example, date strings that can't be parsed)
- JSONPath expressions not matching message structure
Solutions:
Review schema definition:
- Verify column types match the data in messages
- Use appropriate ClickHouse® types (for example, DateTime for timestamps, Int64 for large integers)
- Check for nullable vs non-nullable field requirements
Test with sample messages:
- Use tb sql to test JSONPath expressions with sample data
- Verify date/time formats can be parsed correctly
- Check numeric formats and precision
Handle data quality issues:
- Use nullable types for fields that may be missing: Nullable(String)
- Provide default values: json:$.field DEFAULT 0
- Use type conversion functions if needed: toDateTime(JSONExtractString(data, 'timestamp'))
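As a quick sanity check before changing the schema, you can try the conversions against a sample value with tb sql. This is a minimal sketch with a hypothetical payload; the OrNull/OrZero variants return a fallback instead of failing the row:

```sql
-- hypothetical payload; replace with a value copied from the quarantine Data Source
WITH '{"ts": "2024-05-01 10:30:00", "count": "oops"}' AS raw
SELECT
    parseDateTimeBestEffortOrNull(JSONExtractString(raw, 'ts')) AS parsed_ts,
    toInt64OrZero(JSONExtractString(raw, 'count')) AS safe_count
```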
Inspect quarantined data:
- Regularly check Quarantine Data Source for patterns
- Identify common data quality issues
- Update schema or data producers to fix root causes
For detailed schema management guidance, see the schema management guide.
Error: Materialized View errors¶
Symptoms:
- Warnings in kafka_ops_log mentioning Materialized View errors
- Data ingested but Materialized Views not updating
- Errors in Materialized View queries affecting ingestion
Root causes:
- Materialized View query errors
- Schema changes breaking Materialized View queries
- Resource constraints (memory, CPU)
- Circular dependencies between Materialized Views
Solutions:
Check Materialized View queries:
- Review Materialized View pipe definitions
- Test Materialized View queries independently
- Verify queries work with the current Data Source schema
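To test the transformation logic outside the Materialized View, you can run an equivalent aggregation manually against the landing Data Source. A minimal sketch under assumptions: kafka_events is a hypothetical landing Data Source name, and the aggregation stands in for your own pipe's SELECT:

```sql
-- hypothetical landing Data Source and aggregation; substitute your pipe's query
SELECT
    toStartOfMinute(timestamp) AS minute,
    count() AS events
FROM kafka_events
WHERE timestamp > now() - INTERVAL 10 minute
GROUP BY minute
ORDER BY minute DESC
```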
Monitor Materialized View impact:
- Monitor overall ingestion throughput in kafka_ops_log to see if Materialized Views are slowing down ingestion
- Check for errors in Materialized View queries that may be blocking ingestion
- Review Materialized View query complexity and execution time
Optimize Materialized View queries:
- Simplify complex aggregations
- Add appropriate filters to reduce data volume
- Consider breaking complex Materialized Views into multiple steps
Handle schema evolution:
- Update Materialized View queries when Data Source schema changes
- Test Materialized View changes in development first
- Use FORWARD_QUERY to provide default values for new columns
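To see whether Materialized Views are adding latency to Kafka ingestion, you can chart how long each ingestion operation takes over time. A minimal sketch against datasources_ops_log, assuming Kafka appends carry a kafka-related event_type as in the earlier diagnostic query:

```sql
SELECT
    toStartOfHour(timestamp) AS hour,
    datasource_id,
    avg(elapsed_time) AS avg_elapsed,
    max(elapsed_time) AS max_elapsed
FROM tinybird.datasources_ops_log
WHERE timestamp > now() - INTERVAL 24 hour
    AND event_type LIKE '%kafka%'
GROUP BY hour, datasource_id
ORDER BY hour DESC, avg_elapsed DESC
```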
Data Source operation errors¶
These errors occur during the data ingestion phase and are logged in datasources_ops_log. They represent problems that happen during actual data processing, even when the Kafka connection itself might be working.
Error: Unknown topic or partition¶
Error message:
- "KafkaError[UNKNOWN_TOPIC_OR_PART]: Broker: Unknown topic or partition"
Symptoms:
- Errors in datasources_ops_log with "Unknown topic or partition"
- No messages being ingested
- Topic name errors
Root causes:
- Topic doesn't exist in Kafka cluster
- Topic was deleted
- Topic name typo in configuration
- Topic retention policies caused data deletion
Solutions:
Verify topic exists:
- Check your Kafka cluster to confirm the topic exists
- Use Kafka tools: kafka-topics.sh --list --bootstrap-server <server>
- Verify the topic name matches exactly (case-sensitive)
Check topic configuration:
- Ensure topic hasn't been deleted
- Verify topic retention policies haven't removed all data
- Check if topic was renamed
Verify Data Source configuration:
```
cat <datasource_name>.datasource
```
Check that KAFKA_TOPIC matches the actual topic name.
- Create topic if needed:
- If topic doesn't exist, create it in your Kafka cluster
- Ensure proper replication factor and partitions
- Redeploy the Data Source after creating the topic
Note: If the error specifically mentions a partition (not the topic), see Unknown partition in the following section for partition-specific troubleshooting.
Monitor topic errors:
```sql
SELECT
    toStartOfHour(timestamp) AS hour,
    datasource_id,
    error,
    count(*) AS error_count
FROM tinybird.datasources_ops_log
WHERE timestamp > now() - INTERVAL 24 hour
    AND result = 'error'
    AND error LIKE '%UNKNOWN_TOPIC%'
GROUP BY hour, datasource_id, error
ORDER BY error_count DESC
```
Error: Authentication failure (during ingestion)¶
Error message:
- "KafkaError[_AUTHENTICATION]: Local: Authentication failure"
Symptoms:
- Errors in datasources_ops_log with authentication failures
- Connection works initially but fails during ingestion
- Credentials expired or rotated during operation
Root causes:
- SASL credentials expired or invalid during ingestion
- SSL certificates expired
- Authentication settings changed on Kafka broker
- Credentials rotated but not updated in Tinybird
- Token-based authentication expired mid-operation
Solutions:
- Verify credentials:
```
tb [--cloud] secret get KAFKA_KEY
tb [--cloud] secret get KAFKA_SECRET
```
Ensure secrets match your Kafka cluster credentials.
Check for credential expiration:
- Some credentials have expiration dates
- Rotate credentials if they've expired
- Update secrets and redeploy connection
- For token-based auth, ensure tokens are refreshed before expiration
Verify SSL certificates:
- Check certificate expiration dates
- Update certificates if expired
- Verify certificate format is correct
Test connection:
```
tb connection data <connection_name>
```
This validates authentication is working.
- Check for intermittent auth failures:
- Monitor datasources_ops_log for authentication error patterns
- If errors occur periodically, credentials may be expiring
- Set up credential rotation before expiration
Note: This error occurs during data ingestion, not during initial connection. If you see authentication errors during connection setup, see Authentication failed in the Connectivity errors section.
Monitor authentication errors:
```sql
SELECT
    toStartOfHour(timestamp) AS hour,
    datasource_id,
    error,
    count(*) AS error_count
FROM tinybird.datasources_ops_log
WHERE timestamp > now() - INTERVAL 24 hour
    AND result = 'error'
    AND error LIKE '%AUTHENTICATION%'
GROUP BY hour, datasource_id, error
ORDER BY error_count DESC
```
Error: Group authorization failed¶
Error message:
- "KafkaError[GROUP_AUTHORIZATION_FAILED]: Broker: Group authorization failed"
Symptoms:
- Errors in datasources_ops_log with "Group authorization failed"
- Consumer group lacks permissions
- ACLs not configured correctly
Root causes:
- Consumer group lacks proper authorization
- Kafka ACLs not configured for the consumer group
- Consumer group name doesn't match ACL configuration
- Permissions changed on Kafka cluster
Solutions:
Check Kafka ACLs:
- Verify consumer group has read permissions
- Check ACLs for the specific consumer group ID
- Ensure ACLs allow operations on the topic
Verify consumer group ID:
- Check the KAFKA_GROUP_ID in your Data Source configuration
- Ensure it matches what's configured in Kafka ACLs
- Use consistent naming across environments
Update ACLs:
- Grant necessary permissions to the consumer group
- Ensure group has access to read from the topic
- Verify group can commit offsets
Test with different group ID:
- Try a different consumer group ID temporarily
- If it works, the issue is with ACLs for the original group
- Update ACLs for the original group
Monitor authorization errors:
```sql
SELECT
    toStartOfHour(timestamp) AS hour,
    datasource_id,
    error,
    count(*) AS error_count
FROM tinybird.datasources_ops_log
WHERE timestamp > now() - INTERVAL 24 hour
    AND result = 'error'
    AND error LIKE '%GROUP_AUTHORIZATION%'
GROUP BY hour, datasource_id, error
ORDER BY error_count DESC
```
Error: Topic authorization failed¶
Error message:
- "KafkaError[TOPIC_AUTHORIZATION_FAILED]: Broker: Topic authorization failed"
Symptoms:
- Errors in datasources_ops_log with "Topic authorization failed"
- Cannot read from the topic
- ACLs not configured for topic access
Root causes:
- Kafka client lacks permission to read from topic
- Topic ACLs not configured
- Permissions changed on Kafka cluster
- Credentials don't have topic access
Solutions:
Check topic ACLs:
- Verify credentials have read permissions on the topic
- Check ACLs for the specific topic name
- Ensure ACLs allow consumer operations
Verify credentials:
- Ensure credentials have proper topic access
- Check if topic permissions have changed
- Update credentials if needed
Update ACLs:
- Grant read permissions to the topic
- Ensure consumer group has topic access
- Verify ACLs are applied correctly
Test connection:
```
tb connection data <connection_name>
```
Select the topic to verify access.
Monitor topic authorization errors:
```sql
SELECT
    toStartOfHour(timestamp) AS hour,
    datasource_id,
    error,
    count(*) AS error_count
FROM tinybird.datasources_ops_log
WHERE timestamp > now() - INTERVAL 24 hour
    AND result = 'error'
    AND error LIKE '%TOPIC_AUTHORIZATION%'
GROUP BY hour, datasource_id, error
ORDER BY error_count DESC
```
Error: Unknown partition¶
Error message:
- "KafkaError[_UNKNOWN_PARTITION]: Local: Unknown partition"
Symptoms:
- Errors in datasources_ops_log with "Unknown partition" (note: different from "Unknown topic or partition")
- Specific partition no longer available
- Topic reconfiguration issues
- Partition-specific errors
Root causes:
- Partition no longer exists in topic (topic was reconfigured)
- Topic reconfiguration changed partition count
- Broker failures affecting specific partition availability
- Partition replication issues
- Partition was deleted or reassigned
Solutions:
Check topic configuration:
- Verify current partition count for the topic
- Check if topic was reconfigured (partitions added/removed)
- Ensure partition assignments are correct
- Compare current partition count with what the connector expects
Check broker health:
- Verify all brokers are healthy
- Check for broker failures that might affect specific partitions
- Ensure partition replication is working
- Review partition leader assignments
Review partition assignments:
- Check if partition assignments changed
- Verify replication factors are correct
- Consider rebalancing if needed
- Check if partitions were reassigned to different brokers
Monitor partition availability:
- Use kafka_ops_log to see which partitions are being accessed
- Check for partition-specific errors
- Identify which specific partition is causing issues
- Contact support if partitions are consistently unavailable
Note: This error is different from "Unknown topic or partition" - this specifically indicates a partition issue when the topic exists. If you see "UNKNOWN_TOPIC_OR_PART", see Unknown topic or partition in the preceding section.
Monitor partition errors:
```sql
SELECT
    toStartOfHour(timestamp) AS hour,
    datasource_id,
    error,
    count(*) AS error_count
FROM tinybird.datasources_ops_log
WHERE timestamp > now() - INTERVAL 24 hour
    AND result = 'error'
    AND error LIKE '%UNKNOWN_PARTITION%'
    AND error NOT LIKE '%UNKNOWN_TOPIC%'
GROUP BY hour, datasource_id, error
ORDER BY error_count DESC
```
Error: Table in readonly mode¶
Error message:
- "Table is in readonly mode: replica_path=..."
Symptoms:
- Errors in datasources_ops_log with "readonly mode"
- Data Source temporarily unavailable for writes
- Replication or maintenance in progress
Root causes:
- ClickHouse® table in readonly mode during replication
- Ongoing maintenance operations
- ClickHouse® cluster issues
- Replication lag or issues
Solutions:
Wait for table to become writable:
- This is often a transient state
- Wait a few minutes and check again
- Monitor datasources_ops_log for resolution
Check ClickHouse® cluster:
- Verify cluster health
- Check for ongoing maintenance
Monitor for resolution:
```sql
SELECT
    toStartOfHour(timestamp) AS hour,
    datasource_id,
    error,
    count(*) AS occurrence_count
FROM tinybird.datasources_ops_log
WHERE timestamp > now() - INTERVAL 1 hour
    AND result = 'error'
    AND error LIKE '%readonly%'
GROUP BY hour, datasource_id, error
ORDER BY hour DESC
```
- Contact support:
- If the issue persists for an extended period
- Provide datasources_ops_log queries showing the issue
- Include timestamps and Data Source IDs
Note: Readonly mode errors are typically transient and resolve automatically. If they persist, contact Tinybird support.
Error: Timeout or memory limit exceeded¶
Error message:
- "memory limit exceeded: would use ... GiB"
- "Waiting timeout for memo"
- Timeout errors during ingestion
Symptoms:
- Errors in datasources_ops_log with timeout or memory errors
- Large messages or complex transformations
- Resource constraints
Root causes:
- Message size too large
- Complex Materialized View queries consuming too much memory
- High message throughput
- Resource constraints
Solutions:
Reduce message size:
- Use Kafka compression
- Split large messages into smaller chunks
- Move large data to external storage
Optimize Materialized View queries:
- Simplify complex aggregations
- Add filters to reduce data volume
- Break complex transformations into multiple steps
Monitor memory usage:
```sql
SELECT
    timestamp,
    datasource_id,
    error,
    elapsed_time
FROM tinybird.datasources_ops_log
WHERE timestamp > now() - INTERVAL 24 hour
    AND result = 'error'
    AND (error LIKE '%memory%' OR error LIKE '%timeout%')
ORDER BY timestamp DESC
```
Optimize transformations:
- Reduce data processed per operation
- Use more efficient query patterns
- Consider batching operations
Contact support:
- If memory issues persist
- Discuss resource requirements
- Consider plan upgrades if needed
For more information on handling large messages, see the message size handling guide.
Performance and throughput issues¶
Error: Low throughput or processing stall¶
Symptoms:
- processed_messages is zero or low
- No recent activity in kafka_ops_log
- Data Source not receiving new messages
Root causes:
- Kafka topic has no new messages
- Consumer has stopped or crashed
- Network connectivity issues
- Configuration errors preventing consumption
Solutions:
Verify topic has messages:
- Check your Kafka cluster to verify messages are being produced
- Use Kafka tools to verify topic has new messages
- Check producer metrics
Check connector activity:
```sql
SELECT
    datasource_id,
    topic,
    max(timestamp) AS last_activity,
    dateDiff('minute', max(timestamp), now()) AS minutes_since_activity
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 7 day
GROUP BY datasource_id, topic
HAVING minutes_since_activity > 60
```
Verify configuration:
- Run tb connection data <connection_name> to test the connection and preview data
- Verify all required settings are present
- Check for typos in topic names or connection names
Check for errors:
- Review recent errors in kafka_ops_log
- Check the Quarantine Data Source for issues
- Review datasources_ops_log for Data Source operation errors
Error: Uneven partition processing¶
Symptoms:
- Some partitions have high lag while others have low lag
- Uneven message distribution across partitions
- Some partitions processing faster than others
Root causes:
- Uneven message distribution in Kafka topic
- Partition key design causing hot partitions
- Consumer assignment imbalance
- Different message sizes across partitions
Solutions:
- Analyze partition distribution:
```sql
SELECT
    datasource_id,
    topic,
    partition,
    max(lag) AS max_lag,
    avg(lag) AS avg_lag,
    sum(processed_messages) AS total_messages
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 24 hour
    AND partition >= 0
GROUP BY datasource_id, topic, partition
ORDER BY max_lag DESC
```
Review partition key strategy:
- Ensure partition keys distribute messages evenly
- Avoid using keys that create hot partitions
- Consider using random keys if even distribution is needed
Monitor autoscaling:
- Tinybird's connector automatically balances partition assignment
- Monitor kafka_ops_log to see partition assignment changes
- High lag should trigger additional consumer instances
Optimize at producer level:
- Review Kafka producer configuration
- Adjust partition key strategy if needed
- Consider increasing topic partitions if needed
For detailed partitioning strategies, see the partitioning strategies guide.
Compression and message format errors¶
Error: Compressed message handling¶
Symptoms:
- Messages ingested as raw bytes instead of decompressed content
- Warnings about message format
Root causes:
- Messages compressed before being sent to Kafka producer
- Kafka compression not configured correctly
- Message format not recognized
Solutions:
Understand compression types:
- Kafka compression (configured in producer): Automatically decompressed by Kafka consumer
- App-level compression (compressed before producing): Not automatically decompressed
Use Kafka compression:
- Configure the Kafka producer with compression.type=gzip (or snappy, lz4)
- The Kafka consumer automatically decompresses these messages
- Messages arrive in Tinybird already decompressed
Handle app-level compression:
- If you compress messages before sending to Kafka, you need to handle decompression
- Consider storing compressed messages and decompressing in Materialized Views
- Or change producer to use Kafka compression instead
Verify message format:
- Check that KAFKA_VALUE_FORMAT matches your message format
- For JSON, use json_without_schema or json_with_schema
- For Avro, use avro with Schema Registry configured
Message size errors¶
Error: Message too large or quarantined due to size¶
Symptoms:
- Messages sent to Quarantine Data Source
- Errors about message size limits
- Large messages not being ingested
Root causes:
- Message exceeds Tinybird's 10 MB default limit
- Large payloads causing memory issues
- Compression not reducing message size effectively
Solutions:
Check message size:
- Review Quarantine Data Source for size-related errors
- Verify message sizes in your Kafka topic
- Use Kafka tools to inspect message sizes
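If your Data Source stores the raw payload in a single column (for example, a schemaless data String json:$ column), you can also estimate message sizes from ingested rows. A minimal sketch, assuming hypothetical data and timestamp columns:

```sql
-- assumes a hypothetical raw "data" column and a "timestamp" column in your Data Source
SELECT
    max(length(data)) AS max_bytes,
    quantile(0.99)(length(data)) AS p99_bytes
FROM your_datasource
WHERE timestamp > now() - INTERVAL 1 hour
```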
Implement compression:
- Use Kafka compression to reduce message size
- Consider compressing large payloads before producing to Kafka
Split large messages:
- Break large messages into smaller chunks
- Use message headers to track message parts
- Reassemble in Materialized Views if needed
Alternative approaches:
- Store large payloads in object storage (S3, GCS) and reference them in Kafka messages
- Use external storage for large binary data
For detailed guidance on handling large messages, see the message size handling guide.
Quarantine Data Source issues¶
Understanding Quarantine Data Source¶
When messages fail to ingest into your main Data Source, they are automatically sent to a Quarantine Data Source. This prevents data loss and allows you to inspect problematic messages.
Common reasons for quarantine:
- Schema mismatches
- Invalid data formats
- Type conversion errors
- Missing required fields
- Deserialization failures
- Message size limits exceeded
How to inspect quarantined messages:
```sql
SELECT *
FROM your_datasource_quarantine
WHERE timestamp > now() - INTERVAL 24 hour
ORDER BY timestamp DESC
LIMIT 100
```
How to resolve:
- Identify patterns in quarantined messages
- Fix schema or data quality issues
- Update Data Source schema if needed
- Fix data producers to send correct formats
- Consider reprocessing quarantined messages after fixes
For more information, see Quarantine Data Sources.
Getting help¶
If you've tried the preceding solutions and still experience issues:
Collect diagnostic information:
- Recent errors from kafka_ops_log
- Recent errors from datasources_ops_log
- Connection configuration (without secrets)
- Data Source schema
- Sample of problematic messages (if available)
Check monitoring:
- Review Kafka monitoring guide
- Check Service Data Sources for additional context
- Query both kafka_ops_log and datasources_ops_log for the complete picture
Review related guides:
- Performance optimization guide for throughput issues
- Schema management guide for schema-related problems
Contact support:
- Provide error messages and timestamps from both logs
- Include relevant queries from kafka_ops_log and datasources_ops_log
- Share configuration details (sanitized)
- Describe steps to reproduce the issue
Prevention best practices¶
- Use unique consumer group IDs for each Data Source and environment
- Test schema changes in development before deploying to production
- Monitor both kafka_ops_log and datasources_ops_log regularly to catch issues early
- Set up automated alerts for high lag or error rates in both logs using monitoring tools (see the lag-threshold sketch after this list)
- Review Quarantine Data Source periodically to identify data quality issues
- Test connections using tb connection data <connection_name> to preview data before deploying
- Document consumer group usage to avoid conflicts
- Test with sample messages before connecting production topics
- Use environment-specific configurations for development, staging, and production
- Keep credentials secure using Tinybird secrets, never hardcode them
- Regularly review Kafka ACLs to ensure proper permissions
- Monitor for missing tables and recreate Data Sources if accidentally deleted
- Verify topic and partition availability before deploying connectors
- Optimize Materialized View queries to prevent timeout and memory errors
- Set up monitoring for authorization errors to catch permission issues early
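For the lag alert mentioned above, a monitoring tool could poll a query like the following and fire when any topic exceeds a threshold. This is a minimal sketch; the 15-minute window and 100000-message threshold are arbitrary examples to adjust for your workload:

```sql
SELECT
    datasource_id,
    topic,
    max(lag) AS max_lag
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 15 minute
    AND partition >= 0
GROUP BY datasource_id, topic
HAVING max_lag > 100000
```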
Integrate with your monitoring stack: Connect the monitoring queries in this guide to your existing monitoring tools. Query the ClickHouse® HTTP interface directly from Grafana, Datadog, PagerDuty, Slack, and other alerting systems. You can also create API endpoints from these queries, or export them in Prometheus format for Prometheus-compatible tools. This enables proactive monitoring and automated alerting for your Kafka connectors.
For comprehensive monitoring queries and alerts, see Monitor Kafka connectors.