Kafka connector troubleshooting guide

This guide helps you diagnose and resolve common issues with Tinybird's Kafka connector. Use the tinybird.kafka_ops_log Service Data Source to monitor errors and warnings in real time.

For setup instructions and configuration details, see the Kafka connector documentation.

Quick error lookup

Use this table to quickly find errors and their solutions. Errors may appear in kafka_ops_log (Kafka connector operations) or datasources_ops_log (Data Source ingestion operations).

Error message / symptom | Category | Log source | Solution link
Connection timeout or broker unreachable | Connectivity | kafka_ops_log | Connection timeout
Authentication failed | Authentication | kafka_ops_log, datasources_ops_log | Authentication failed
SSL handshake failed | SSL/TLS | kafka_ops_log | SSL certificate validation
Schema Registry connection failed | Deserialization | kafka_ops_log | Schema Registry
Deserialization failed - Avro | Deserialization | kafka_ops_log | Avro deserialization
Deserialization failed - JSON | Deserialization | kafka_ops_log | JSON deserialization
Offset commit failed | Consumer group | kafka_ops_log | Offset commit
Consumer lag continuously increasing | Performance | kafka_ops_log | Consumer lag
Schema mismatch or type conversion failed | Schema | kafka_ops_log | Schema mismatch
Materialized View errors | Schema | kafka_ops_log, datasources_ops_log | Materialized View errors
Low throughput or processing stall | Performance | kafka_ops_log | Low throughput
Uneven partition processing | Performance | kafka_ops_log | Uneven partitions
Message too large | Message size | kafka_ops_log | Message size
Compressed message handling | Message format | kafka_ops_log | Compression
Unknown topic or partition | Kafka | datasources_ops_log | Unknown topic
Group authorization failed | Authorization | datasources_ops_log | Group authorization
Topic authorization failed | Authorization | datasources_ops_log | Topic authorization
Unknown partition | Kafka | datasources_ops_log | Unknown partition
Table in readonly mode | Data Source | datasources_ops_log | Readonly mode
Timeout or memory limit exceeded | Resource | datasources_ops_log | Timeout

How to diagnose errors

Use both kafka_ops_log and datasources_ops_log to diagnose Kafka connector issues:

Check Kafka connector operations

Query recent errors and warnings from kafka_ops_log:

SELECT
    timestamp,
    datasource_id,
    topic,
    partition,
    msg_type,
    msg,
    lag
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 1 hour
  AND msg_type IN ('warning', 'error')
ORDER BY timestamp DESC

Check Data Source ingestion errors

Query errors from datasources_ops_log to see issues during data ingestion:

SELECT
    timestamp,
    datasource_id,
    event_type,
    result,
    error,
    elapsed_time
FROM tinybird.datasources_ops_log
WHERE timestamp > now() - INTERVAL 1 hour
  AND result = 'error'
  AND event_type LIKE '%kafka%'
ORDER BY timestamp DESC

This shows errors that occur during the actual data processing phase, even when the Kafka connection itself might be working.

Set up automated monitoring: Connect these diagnostic queries to your monitoring and alerting tools. Query the ClickHouse® HTTP interface directly from tools like Grafana, Datadog, PagerDuty, and Slack. Alternatively, create API endpoints from these queries, or export them in Prometheus format for Prometheus-compatible tools. Configure your tools to poll these queries periodically and trigger alerts when errors are detected.
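
For example, one way to let external tools poll the kafka_ops_log query above is to publish it as an API endpoint. This is a minimal sketch that assumes the .pipe datafile syntax with a single node published as an endpoint; the pipe and node names are illustrative:

DESCRIPTION Kafka connector errors and warnings in the last hour, for external alerting

NODE kafka_errors_last_hour
SQL >
    SELECT timestamp, datasource_id, topic, msg_type, msg
    FROM tinybird.kafka_ops_log
    WHERE timestamp > now() - INTERVAL 1 hour
      AND msg_type IN ('warning', 'error')
    ORDER BY timestamp DESC

TYPE endpoint

Your monitoring tool can then poll the published endpoint and raise an alert whenever the response is non-empty.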

For detailed monitoring queries, see Monitor Kafka connectors.

Connectivity errors

Error: Connection timeout or broker unreachable

Symptoms:

  • No messages are being processed
  • Errors in kafka_ops_log with messages like "Connection timeout" or "Broker unreachable"
  • High lag values that continue to increase

Root causes:

  1. Incorrect KAFKA_BOOTSTRAP_SERVERS configuration
  2. Network connectivity issues between Tinybird and your Kafka cluster
  3. Firewall or security group rules blocking access
  4. Kafka broker is down or unreachable

Solutions:

  1. Verify bootstrap servers configuration:

    • Check that KAFKA_BOOTSTRAP_SERVERS in your .connection file includes the correct host and port
    • Ensure you're using the advertised listeners address, not the internal broker address
    • For multiple brokers, use comma-separated values: broker1:9092,broker2:9092,broker3:9092
    • For cloud providers, verify you're using the public endpoint provided by your Kafka service
  2. Test connectivity:

tb connection data <connection_name>

This command allows you to select a topic and consumer group ID, then returns preview data. This validates that Tinybird can reach your Kafka broker, authenticate, and consume messages.

  3. Check network configuration:

    • Verify firewall rules allow outbound connections from Tinybird to your Kafka cluster
    • For AWS MSK, ensure security groups allow inbound traffic on the Kafka port
    • For Confluent Cloud, verify network access settings
    • For PrivateLink setups (Enterprise), verify the PrivateLink connection is active
  4. Verify security protocol (see the connection-file sketch after this list):

    • Ensure KAFKA_SECURITY_PROTOCOL matches your Kafka cluster configuration
    • For most cloud providers, use SASL_SSL
    • For local development, you may use PLAINTEXT
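
As a reference, the relevant settings in a .connection file for a SASL_SSL cluster typically look like the following sketch; the connection type line, broker addresses, and secret names are placeholders to adapt to your cluster:

TYPE kafka

KAFKA_BOOTSTRAP_SERVERS broker1:9092,broker2:9092,broker3:9092
KAFKA_SECURITY_PROTOCOL SASL_SSL
KAFKA_SASL_MECHANISM PLAIN
KAFKA_KEY {{ tb_secret("KAFKA_KEY") }}
KAFKA_SECRET {{ tb_secret("KAFKA_SECRET") }}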

For vendor-specific network configuration help, see the Kafka connector documentation for your provider, such as Confluent Cloud, AWS MSK, Redpanda, or Aiven.

Error: Authentication failed

Symptoms:

  • Errors in kafka_ops_log with "Authentication failed" or "SASL authentication error"
  • Connection check fails with authentication errors

Root causes:

  1. Incorrect KAFKA_KEY or KAFKA_SECRET credentials
  2. Wrong KAFKA_SASL_MECHANISM configuration
  3. Expired credentials or tokens
  4. For AWS MSK with OAuthBearer, incorrect IAM role configuration

Solutions:

  1. Verify credentials:
tb [--cloud] secret get KAFKA_KEY
tb [--cloud] secret get KAFKA_SECRET

Ensure the secrets match your Kafka cluster credentials.

  2. Check SASL mechanism:

    • Verify KAFKA_SASL_MECHANISM matches your Kafka cluster (PLAIN, SCRAM-SHA-256, SCRAM-SHA-512, or OAUTHBEARER)
    • For Confluent Cloud, typically use PLAIN
    • For AWS MSK with IAM, use OAUTHBEARER with KAFKA_SASL_OAUTHBEARER_METHOD AWS
    • For Redpanda, check your cluster's configured SASL mechanism
  3. For AWS MSK OAuthBearer:

    • Verify the IAM role ARN is correct: tb [--cloud] secret get AWS_ROLE_ARN
    • Check that the IAM role has the correct trust policy allowing Tinybird to assume the role
    • Verify the external ID matches between your connection configuration and IAM trust policy
    • Ensure the IAM role has the required Kafka cluster permissions (see AWS IAM permissions)
  4. Rotate credentials if needed (see the monitoring query after this list):

    • If credentials have expired, update them using tb secret set
    • Redeploy your connection after updating secrets
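
To confirm whether authentication failures are recurring, for example because credentials keep expiring, you can filter kafka_ops_log for authentication-related messages. This query is a sketch that assumes the error text contains the word "authentication":

SELECT
    timestamp,
    datasource_id,
    topic,
    msg
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 24 hour
  AND msg_type = 'error'
  AND msg ILIKE '%authentication%'
ORDER BY timestamp DESC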

For detailed authentication setup, see the Kafka connector documentation for your provider.

Error: SSL/TLS certificate validation failed

Symptoms:

  • Errors mentioning "SSL handshake failed" or "certificate validation error"
  • Connection failures when using SASL_SSL security protocol

Root causes:

  1. Missing or incorrect CA certificate
  2. Self-signed certificate not provided
  3. Certificate expired or invalid

Solutions:

  1. Provide CA certificate:
tb [--cloud] secret set --multiline KAFKA_SSL_CA_PEM

Paste your CA certificate in PEM format.

  2. Add certificate to connection file:
KAFKA_SSL_CA_PEM >
   {{ tb_secret("KAFKA_SSL_CA_PEM") }}

Note: This is a multiline setting.

  3. Verify certificate format (see the openssl check after this list):
    • Ensure the certificate is in PEM format (starts with -----BEGIN CERTIFICATE-----)
    • Include the full certificate chain if required
    • For Aiven Kafka, download the CA certificate from the Aiven console
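
To rule out an expired or malformed certificate before uploading it as a secret, you can inspect the PEM file locally. This assumes the certificate is saved as ca.pem:

# Print the subject and expiration date of the CA certificate
openssl x509 -in ca.pem -noout -subject -enddate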

Deserialization errors

Error: Schema Registry connection failed

Symptoms:

  • Errors in kafka_ops_log mentioning "Schema Registry" or "Failed to fetch schema"
  • Messages not being ingested when using Avro or JSON with schema

Root causes:

  1. Incorrect KAFKA_SCHEMA_REGISTRY_URL configuration
  2. Missing or incorrect Schema Registry credentials
  3. Schema Registry is unreachable
  4. Schema not found in Schema Registry

Solutions:

  1. Verify Schema Registry URL:

    • Check that KAFKA_SCHEMA_REGISTRY_URL in your .connection file is correct
    • For Basic Auth, use format: https://<username>:<password>@<registry_host>
    • Ensure the URL is accessible from Tinybird's network
  2. Check schema exists:

    • Verify the schema exists in your Schema Registry for the topic
    • Ensure the schema subject name matches your topic naming convention
    • For Confluent Schema Registry, check subject names like {topic-name}-value or {topic-name}-key
  3. Test Schema Registry access:

    • Use curl or similar tool to verify Schema Registry is reachable
    • Verify credentials work with Schema Registry API
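
For example, with Confluent Schema Registry you can list registered subjects and fetch the latest schema for a topic's value over the REST API; replace the host, credentials, and topic name with your own:

# List all registered subjects
curl -u <username>:<password> https://<registry_host>/subjects

# Fetch the latest schema registered for the topic's value
curl -u <username>:<password> https://<registry_host>/subjects/<topic-name>-value/versions/latest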

For more information on schema management, see the schema management guide.

Error: Deserialization failed - Avro

Symptoms:

  • Warnings in kafka_ops_log with "Deserialization failed" or "Avro parsing error"
  • Messages sent to Quarantine Data Source
  • processed_messages > committed_messages in monitoring queries

Root causes:

  1. Schema mismatch between message and Schema Registry
  2. Schema evolution incompatibility
  3. Incorrect KAFKA_VALUE_FORMAT or KAFKA_KEY_FORMAT configuration
  4. Corrupted message data

Solutions:

  1. Verify format configuration:

    • Ensure KAFKA_VALUE_FORMAT is set to avro for Avro messages
    • Ensure KAFKA_KEY_FORMAT is set to avro if keys are Avro-encoded
    • Verify KAFKA_SCHEMA_REGISTRY_URL is configured
  2. Check schema compatibility:

    • Verify the message schema matches the schema in Schema Registry
    • Check for schema evolution issues (backward/forward compatibility)
    • Review Quarantine Data Source to see the actual message that failed
  3. Inspect quarantined messages:

SELECT *
FROM your_datasource_quarantine
WHERE timestamp > now() - INTERVAL 1 hour
ORDER BY timestamp DESC
LIMIT 100

This helps you see the actual message content and identify the issue.

  4. Schema evolution:
    • Ensure schema changes are backward compatible
    • Consider using schema versioning strategies
    • Test schema changes in a development environment first

For detailed schema evolution guidance, see the schema management guide.

Error: Deserialization failed - JSON

Symptoms:

  • Warnings in kafka_ops_log with "JSON parsing error" or "Invalid JSON"
  • Messages in Quarantine Data Source
  • Low success rate in throughput monitoring

Root causes:

  1. Invalid JSON format in message payload
  2. Schema mismatch with JSONPath expressions
  3. Missing required fields in JSON
  4. Incorrect KAFKA_VALUE_FORMAT configuration

Solutions:

  1. Verify JSON format:

    • Check that messages are valid JSON
    • Use a JSON validator to test sample messages
    • Review Quarantine Data Source for examples of failed messages
  2. Check JSONPath expressions:

    • Verify JSONPath expressions in your Data Source schema match the message structure
    • Test JSONPath expressions with sample messages
    • Use json:$ to store the entire message if you're unsure of the structure
  3. Handle missing fields:

    • Use nullable types for optional fields: Nullable(String)
    • Provide default values in JSONPath: json:$.field DEFAULT ''
    • Consider using a schemaless approach with data String json:$ and extract fields later (see the schema sketch after this list)
  4. Verify format configuration:

    • Use json_without_schema for plain JSON messages
    • Use json_with_schema only if you're using Schema Registry for JSON schemas
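
Putting these points together, the schema section of a .datasource file for plain JSON messages might look like the following sketch. The column names and JSONPaths are illustrative, KAFKA_CONNECTION_NAME is assumed to reference your existing connection, and the catch-all data column stores the full message so you can extract more fields later:

SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `user_id` Nullable(String) `json:$.user_id`,
    `amount` Float64 `json:$.amount` DEFAULT 0,
    `data` String `json:$`

KAFKA_CONNECTION_NAME my_kafka_connection
KAFKA_TOPIC my_topic
KAFKA_GROUP_ID my_consumer_group
KAFKA_VALUE_FORMAT json_without_schema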

Offset and consumer group errors

Error: Offset commit failed or consumer group conflict

Symptoms:

  • Data Source only receives messages from the last committed offset
  • Multiple Data Sources competing for the same consumer group
  • Errors about offset commit failures

Root causes:

  1. Multiple Data Sources using the same KAFKA_TOPIC and KAFKA_GROUP_ID combination
  2. Consumer group already in use by another app
  3. Offset reset behavior not working as expected

Solutions:

  1. Use unique consumer group IDs:

    • Each Data Source must use a unique KAFKA_GROUP_ID for the same topic
    • Use environment-specific group IDs: {{ tb_secret("KAFKA_GROUP_ID", "prod-group") }}
    • For testing, use unique group IDs to avoid conflicts
  2. Check for duplicate configurations:

SELECT
      topic,
      count(DISTINCT datasource_id) as datasource_count
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 24 hour
GROUP BY topic
HAVING datasource_count > 1

This helps identify if multiple Data Sources are consuming from the same topic.

  3. Reset offset behavior:

    • KAFKA_AUTO_OFFSET_RESET=earliest only works for new consumer groups
    • If a consumer group already has committed offsets, it resumes from the last committed offset
    • To start from the beginning, use a new KAFKA_GROUP_ID or reset offsets in your Kafka cluster
  4. Best practices:

    • Use different KAFKA_GROUP_ID values for development, staging, and production (see the snippet after this list)
    • Document which consumer groups are in use
    • Monitor consumer group activity in your Kafka cluster
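
As a sketch, the Kafka settings of a .datasource file can read the consumer group from a secret with an environment-specific default, so each environment consumes independently; the names below are placeholders:

KAFKA_CONNECTION_NAME my_kafka_connection
KAFKA_TOPIC my_topic
KAFKA_GROUP_ID {{ tb_secret("KAFKA_GROUP_ID", "prod-group") }}
KAFKA_AUTO_OFFSET_RESET earliest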

For managing consumer groups across environments, see the CI/CD and version control guide.

Error: Consumer lag continuously increasing

Symptoms:

  • Lag values in kafka_ops_log keep growing
  • Messages are not being processed fast enough
  • Throughput is lower than message production rate

Root causes:

  1. Message production rate exceeds processing capacity
  2. Consumer autoscaling not keeping up with load
  3. Network latency or connectivity issues
  4. Data Source schema or Materialized View performance issues

Solutions:

  1. Monitor lag trends (see also the time-bucketed query after this list):
SELECT
      datasource_id,
      topic,
      partition,
      max(lag) as current_lag,
      avg(lag) as avg_lag
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 1 hour
   AND partition >= 0
GROUP BY datasource_id, topic, partition
ORDER BY current_lag DESC
  2. Verify autoscaling:

    • Tinybird's serverless Kafka connector automatically scales consumers
    • Monitor kafka_ops_log to see partition assignment changes
    • If lag continues to increase, there may be a bottleneck in your Data Source or Materialized Views
  3. Check Data Source performance:

    • Review Materialized View queries that trigger on append
    • Optimize complex Materialized View queries that may slow down ingestion
    • Check for schema issues causing slow parsing
  4. Analyze throughput:

SELECT
      datasource_id,
      topic,
      sum(processed_messages) as processed,
      sum(committed_messages) as committed,
      (sum(committed_messages) * 100.0 / sum(processed_messages)) as success_rate
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 1 hour
GROUP BY datasource_id, topic

Low success rates indicate processing issues.

  5. Review partitioning strategy:

    • Check if partition distribution is even
    • Review partition key design if lag is uneven across partitions
    • See the partitioning strategies guide for optimization tips
  6. Contact support:

    • If lag continues to increase despite autoscaling, contact Tinybird support
    • Provide kafka_ops_log queries showing the issue
    • Include information about message production rates
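
A time-bucketed variant of the lag query in step 1 makes the trend easier to read, so you can distinguish a temporary spike from lag that grows without bound:

SELECT
    toStartOfFifteenMinutes(timestamp) AS bucket,
    topic,
    partition,
    max(lag) AS max_lag
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 6 hour
  AND partition >= 0
GROUP BY bucket, topic, partition
ORDER BY bucket, topic, partition

If max_lag keeps rising bucket after bucket while throughput stays flat, the consumer is falling behind rather than working through a one-off backlog.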

For performance optimization strategies, see the performance optimization guide.

Data quality and schema errors

Error: Schema mismatch or type conversion failed

Symptoms:

  • Warnings in kafka_ops_log about type mismatches
  • Messages in Quarantine Data Source
  • Low committed_messages compared to processed_messages

Root causes:

  1. Data type mismatch between message and Data Source schema
  2. Missing required fields
  3. Invalid data formats (for example, date strings that can't be parsed)
  4. JSONPath expressions not matching message structure

Solutions:

  1. Review schema definition:

    • Verify column types match the data in messages
    • Use appropriate ClickHouse® types (for example, DateTime for timestamps, Int64 for large integers)
    • Check for nullable vs non-nullable field requirements
  2. Test with sample messages:

    • Use tb sql to test JSONPath expressions with sample data
    • Verify date/time formats can be parsed correctly
    • Check numeric formats and precision
  3. Handle data quality issues:

    • Use nullable types for fields that may be missing: Nullable(String)
    • Provide default values: json:$.field DEFAULT 0
    • Use type conversion functions if needed: toDateTime(JSONExtractString(data, 'timestamp')) (see the example after this list)
  4. Inspect quarantined data:

    • Regularly check Quarantine Data Source for patterns
    • Identify common data quality issues
    • Update schema or data producers to fix root causes
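
You can validate conversions against a sample payload with tb sql before changing the schema. This sketch uses a hard-coded JSON string and standard ClickHouse® functions; swap in a message copied from the Quarantine Data Source:

SELECT
    toDateTime(JSONExtractString(raw, 'timestamp')) AS parsed_timestamp,
    toInt64OrNull(JSONExtractString(raw, 'user_id')) AS user_id,
    JSONExtractFloat(raw, 'amount') AS amount
FROM (SELECT '{"timestamp":"2024-01-01 12:00:00","user_id":"42","amount":19.9}' AS raw)

If a conversion throws or returns NULL here, the same row will fail or be quarantined during ingestion.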

For detailed schema management guidance, see the schema management guide.

Error: Materialized View errors

Symptoms:

  • Warnings in kafka_ops_log mentioning Materialized View errors
  • Data ingested but Materialized Views not updating
  • Errors in Materialized View queries, in some cases affecting ingestion

Root causes:

  1. Materialized View query errors
  2. Schema changes breaking Materialized View queries
  3. Resource constraints (memory, CPU)
  4. Circular dependencies between Materialized Views

Solutions:

  1. Check Materialized View queries:

    • Review Materialized View pipe definitions
    • Test Materialized View queries independently
    • Verify queries work with the current Data Source schema
  2. Monitor Materialized View impact (see the throughput query after this list):

    • Monitor overall ingestion throughput in kafka_ops_log to see if Materialized Views are slowing down ingestion
    • Check for errors in Materialized View queries that may be blocking ingestion
    • Review Materialized View query complexity and execution time
  3. Optimize Materialized View queries:

    • Simplify complex aggregations
    • Add appropriate filters to reduce data volume
    • Consider breaking complex Materialized Views into multiple steps
  4. Handle schema evolution:

    • Update Materialized View queries when Data Source schema changes
    • Test Materialized View changes in development first
    • Use FORWARD_QUERY to provide default values for new columns
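
To check whether ingestion throughput drops when a Materialized View is deployed or changed, compare processed and committed messages over time in kafka_ops_log:

SELECT
    toStartOfHour(timestamp) AS hour,
    datasource_id,
    sum(processed_messages) AS processed,
    sum(committed_messages) AS committed
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 48 hour
GROUP BY hour, datasource_id
ORDER BY hour DESC, datasource_id

A sustained drop in committed messages that coincides with a Materialized View deployment points to the view as the bottleneck.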

Data Source operation errors

These errors occur during the data ingestion phase and are logged in datasources_ops_log. They represent problems that happen during actual data processing, even when the Kafka connection itself might be working.

Error: Unknown topic or partition

Error message:

  • "KafkaError[UNKNOWN_TOPIC_OR_PART]: Broker: Unknown topic or partition"

Symptoms:

  • Errors in datasources_ops_log with "Unknown topic or partition"
  • No messages being ingested
  • Topic name errors

Root causes:

  1. Topic doesn't exist in Kafka cluster
  2. Topic was deleted
  3. Topic name typo in configuration
  4. Topic retention policies caused data deletion

Solutions:

  1. Verify topic exists:

    • Check your Kafka cluster to confirm the topic exists
    • Use Kafka tools: kafka-topics.sh --list --bootstrap-server <server>
    • Verify topic name matches exactly (case-sensitive)
  2. Check topic configuration:

    • Ensure topic hasn't been deleted
    • Verify topic retention policies haven't removed all data
    • Check if topic was renamed
  3. Verify Data Source configuration:

cat <datasource_name>.datasource

Check that KAFKA_TOPIC matches the actual topic name.

  4. Create topic if needed:
    • If topic doesn't exist, create it in your Kafka cluster
    • Ensure proper replication factor and partitions
    • Redeploy the Data Source after creating the topic

Note: If the error specifically mentions a partition (not the topic), see Unknown partition in the following section for partition-specific troubleshooting.

Monitor topic errors:

SELECT
    datasource_id,
    error,
    count(*) as error_count,
    max(timestamp) as last_seen
FROM tinybird.datasources_ops_log
WHERE timestamp > now() - INTERVAL 24 hour
  AND result = 'error'
  AND error LIKE '%UNKNOWN_TOPIC%'
GROUP BY datasource_id, error
ORDER BY error_count DESC

Error: Authentication failure (during ingestion)

Error message:

  • "KafkaError[_AUTHENTICATION]: Local: Authentication failure"

Symptoms:

  • Errors in datasources_ops_log with authentication failures
  • Connection works initially but fails during ingestion
  • Credentials expired or rotated during operation

Root causes:

  1. SASL credentials expired or invalid during ingestion
  2. SSL certificates expired
  3. Authentication settings changed on Kafka broker
  4. Credentials rotated but not updated in Tinybird
  5. Token-based authentication expired mid-operation

Solutions:

  1. Verify credentials:
tb [--cloud] secret get KAFKA_KEY
tb [--cloud] secret get KAFKA_SECRET

Ensure secrets match your Kafka cluster credentials.

  2. Check for credential expiration:

    • Some credentials have expiration dates
    • Rotate credentials if they've expired
    • Update secrets and redeploy connection
    • For token-based auth, ensure tokens are refreshed before expiration
  3. Verify SSL certificates:

    • Check certificate expiration dates
    • Update certificates if expired
    • Verify certificate format is correct
  4. Test connection:

tb connection data <connection_name>

This validates authentication is working.

  5. Check for intermittent auth failures:
    • Monitor datasources_ops_log for authentication error patterns
    • If errors occur periodically, credentials may be expiring
    • Set up credential rotation before expiration

Note: This error occurs during data ingestion, not during initial connection. If you see authentication errors during connection setup, see Authentication failed in the Connectivity errors section.

Monitor authentication errors:

SELECT
    datasource_id,
    error,
    count(*) as error_count,
    max(timestamp) as last_seen
FROM tinybird.datasources_ops_log
WHERE timestamp > now() - INTERVAL 24 hour
  AND result = 'error'
  AND error LIKE '%AUTHENTICATION%'
GROUP BY datasource_id, error
ORDER BY error_count DESC

Error: Group authorization failed

Error message:

  • "KafkaError[GROUP_AUTHORIZATION_FAILED]: Broker: Group authorization failed"

Symptoms:

  • Errors in datasources_ops_log with "Group authorization failed"
  • Consumer group lacks permissions
  • ACLs not configured correctly

Root causes:

  1. Consumer group lacks proper authorization
  2. Kafka ACLs not configured for the consumer group
  3. Consumer group name doesn't match ACL configuration
  4. Permissions changed on Kafka cluster

Solutions:

  1. Check Kafka ACLs:

    • Verify consumer group has read permissions
    • Check ACLs for the specific consumer group ID
    • Ensure ACLs allow operations on the topic
  2. Verify consumer group ID:

    • Check the KAFKA_GROUP_ID in your Data Source configuration
    • Ensure it matches what's configured in Kafka ACLs
    • Use consistent naming across environments
  3. Update ACLs:

    • Grant necessary permissions to the consumer group
    • Ensure group has access to read from the topic
    • Verify group can commit offsets
  4. Test with different group ID:

    • Try a different consumer group ID temporarily
    • If it works, the issue is with ACLs for the original group
    • Update ACLs for the original group

Monitor authorization errors:

SELECT
    datasource_id,
    error,
    count(*) as error_count,
    max(timestamp) as last_seen
FROM tinybird.datasources_ops_log
WHERE timestamp > now() - INTERVAL 24 hour
  AND result = 'error'
  AND error LIKE '%GROUP_AUTHORIZATION%'
GROUP BY datasource_id, error
ORDER BY error_count DESC

Error: Topic authorization failed

Error message:

  • "KafkaError[TOPIC_AUTHORIZATION_FAILED]: Broker: Topic authorization failed"

Symptoms:

  • Errors in datasources_ops_log with "Topic authorization failed"
  • Cannot read from topic
  • ACLs not configured for topic access

Root causes:

  1. Kafka client lacks permission to read from topic
  2. Topic ACLs not configured
  3. Permissions changed on Kafka cluster
  4. Credentials don't have topic access

Solutions:

  1. Check topic ACLs:

    • Verify credentials have read permissions on the topic
    • Check ACLs for the specific topic name
    • Ensure ACLs allow consumer operations
  2. Verify credentials:

    • Ensure credentials have proper topic access
    • Check if topic permissions have changed
    • Update credentials if needed
  3. Update ACLs:

    • Grant read permissions to the topic
    • Ensure consumer group has topic access
    • Verify ACLs are applied correctly
  4. Test connection:

tb connection data <connection_name>

Select the topic to verify access.

Monitor topic authorization errors:

SELECT
    datasource_id,
    error,
    count(*) as error_count,
    max(timestamp) as last_seen
FROM tinybird.datasources_ops_log
WHERE timestamp > now() - INTERVAL 24 hour
  AND result = 'error'
  AND error LIKE '%TOPIC_AUTHORIZATION%'
GROUP BY datasource_id, error
ORDER BY error_count DESC

Error: Unknown partition

Error message:

  • "KafkaError[_UNKNOWN_PARTITION]: Local: Unknown partition"

Symptoms:

  • Errors in datasources_ops_log with "Unknown partition" (note: different from "Unknown topic or partition")
  • Specific partition no longer available
  • Topic reconfiguration issues
  • Partition-specific errors

Root causes:

  1. Partition no longer exists in topic (topic was reconfigured)
  2. Topic reconfiguration changed partition count
  3. Broker failures affecting specific partition availability
  4. Partition replication issues
  5. Partition was deleted or reassigned

Solutions:

  1. Check topic configuration:

    • Verify current partition count for the topic
    • Check if topic was reconfigured (partitions added/removed)
    • Ensure partition assignments are correct
    • Compare current partition count with what the connector expects
  2. Check broker health:

    • Verify all brokers are healthy
    • Check for broker failures that might affect specific partitions
    • Ensure partition replication is working
    • Review partition leader assignments
  3. Review partition assignments:

    • Check if partition assignments changed
    • Verify replication factors are correct
    • Consider rebalancing if needed
    • Check if partitions were reassigned to different brokers
  4. Monitor partition availability:

    • Use kafka_ops_log to see which partitions are being accessed
    • Check for partition-specific errors
    • Identify which specific partition is causing issues
    • Contact support if partitions are consistently unavailable

Note: This error is different from "Unknown topic or partition" - this specifically indicates a partition issue when the topic exists. If you see "UNKNOWN_TOPIC_OR_PART", see Unknown topic or partition in the preceding section.

Monitor partition errors:

SELECT
    datasource_id,
    error,
    count(*) as error_count,
    max(timestamp) as last_seen
FROM tinybird.datasources_ops_log
WHERE timestamp > now() - INTERVAL 24 hour
  AND result = 'error'
  AND error LIKE '%UNKNOWN_PARTITION%'
  AND error NOT LIKE '%UNKNOWN_TOPIC%'
GROUP BY datasource_id, error
ORDER BY error_count DESC

Error: Table in readonly mode

Error message:

  • "Table is in readonly mode: replica_path=..."

Symptoms:

  • Errors in datasources_ops_log with "readonly mode"
  • Data Source temporarily unavailable for writes
  • Replication or maintenance in progress

Root causes:

  1. ClickHouse® table in readonly mode during replication
  2. Ongoing maintenance operations
  3. ClickHouse® cluster issues
  4. Replication lag or issues

Solutions:

  1. Wait for table to become writable:

    • This is often a transient state
    • Wait a few minutes and check again
    • Monitor datasources_ops_log for resolution
  2. Check ClickHouse® cluster:

    • Verify cluster health
    • Check for ongoing maintenance
  3. Monitor for resolution:

SELECT
      datasource_id,
      error,
      count(*) as occurrence_count,
      max(timestamp) as last_seen
FROM tinybird.datasources_ops_log
WHERE timestamp > now() - INTERVAL 1 hour
  AND result = 'error'
  AND error LIKE '%readonly%'
GROUP BY datasource_id, error
ORDER BY last_seen DESC
  4. Contact support:
    • If issue persists for extended period
    • Provide datasources_ops_log queries showing the issue
    • Include timestamps and Data Source IDs

Note: Readonly mode errors are typically transient and resolve automatically. If they persist, contact Tinybird support.

Error: Timeout or memory limit exceeded

Error message:

  • "memory limit exceeded: would use ... GiB"
  • "Waiting timeout for memo"
  • Timeout errors during ingestion

Symptoms:

  • Errors in datasources_ops_log with timeout or memory errors
  • Large messages or complex transformations
  • Resource constraints

Root causes:

  1. Message size too large
  2. Complex Materialized View queries consuming too much memory
  3. High message throughput
  4. Resource constraints

Solutions:

  1. Reduce message size:

    • Use Kafka compression
    • Split large messages into smaller chunks
    • Move large data to external storage
  2. Optimize Materialized View queries:

    • Simplify complex aggregations
    • Add filters to reduce data volume
    • Break complex transformations into multiple steps
  3. Monitor memory usage:

SELECT
      timestamp,
      datasource_id,
      error,
      elapsed_time
FROM tinybird.datasources_ops_log
WHERE timestamp > now() - INTERVAL 24 hour
   AND result = 'error'
   AND (error LIKE '%memory%' OR error LIKE '%timeout%')
ORDER BY timestamp DESC
  4. Optimize transformations:

    • Reduce data processed per operation
    • Use more efficient query patterns
    • Consider batching operations
  5. Contact support:

    • If memory issues persist
    • Discuss resource requirements
    • Consider plan upgrades if needed

For more information on handling large messages, see the message size handling guide.

Performance and throughput issues

Error: Low throughput or processing stall

Symptoms:

  • processed_messages is zero or low
  • No recent activity in kafka_ops_log
  • Data Source not receiving new messages

Root causes:

  1. Kafka topic has no new messages
  2. Consumer has stopped or crashed
  3. Network connectivity issues
  4. Configuration errors preventing consumption

Solutions:

  1. Verify topic has messages:

    • Check your Kafka cluster to verify messages are being produced
    • Use Kafka tools to verify topic has new messages
    • Check producer metrics
  2. Check connector activity:

SELECT
      datasource_id,
      topic,
      max(timestamp) as last_activity,
      now() - max(timestamp) as seconds_since_activity
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 7 day
GROUP BY datasource_id, topic
HAVING last_activity < now() - INTERVAL 1 hour
  3. Verify configuration:

    • Run tb connection data <connection_name> to test the connection and preview data
    • Verify all required settings are present
    • Check for typos in topic names or connection names
  4. Check for errors:

    • Review recent errors in kafka_ops_log
    • Check Quarantine Data Source for issues
    • Review datasources_ops_log for Data Source operation errors

Error: Uneven partition processing

Symptoms:

  • Some partitions have high lag while others have low lag
  • Uneven message distribution across partitions
  • Some partitions processing faster than others

Root causes:

  1. Uneven message distribution in Kafka topic
  2. Partition key design causing hot partitions
  3. Consumer assignment imbalance
  4. Different message sizes across partitions

Solutions:

  1. Analyze partition distribution:
SELECT
      datasource_id,
      topic,
      partition,
      max(lag) as max_lag,
      avg(lag) as avg_lag,
      sum(processed_messages) as total_messages
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 24 hour
   AND partition >= 0
GROUP BY datasource_id, topic, partition
ORDER BY max_lag DESC
  2. Review partition key strategy:

    • Ensure partition keys distribute messages evenly
    • Avoid using keys that create hot partitions
    • Consider using random keys if even distribution is needed
  3. Monitor autoscaling:

    • Tinybird's connector automatically balances partition assignment
    • Monitor kafka_ops_log to see partition assignment changes
    • High lag should trigger additional consumer instances
  4. Optimize at producer level:

    • Review Kafka producer configuration
    • Adjust partition key strategy if needed
    • Consider increasing topic partitions if needed

For detailed partitioning strategies, see the partitioning strategies guide.

Compression and message format errors

Error: Compressed message handling

Symptoms:

  • Messages ingested as raw bytes instead of decompressed content
  • Warnings about message format

Root causes:

  1. Messages compressed before being sent to Kafka producer
  2. Kafka compression not configured correctly
  3. Message format not recognized

Solutions:

  1. Understand compression types:

    • Kafka compression (configured in producer): Automatically decompressed by Kafka consumer
    • App-level compression (compressed before producing): Not automatically decompressed
  2. Use Kafka compression:

    • Configure Kafka producer with compression.type=gzip (or snappy, lz4)
    • Kafka consumer automatically decompresses these messages
    • Messages arrive in Tinybird already decompressed
  3. Handle app-level compression:

    • If you compress messages before sending to Kafka, you need to handle decompression
    • Consider storing compressed messages and decompressing in Materialized Views
    • Or change producer to use Kafka compression instead
  4. Verify message format:

    • Check that KAFKA_VALUE_FORMAT matches your message format
    • For JSON, use json_without_schema or json_with_schema
    • For Avro, use avro with Schema Registry configured

Message size errors

Error: Message too large or quarantined due to size

Symptoms:

  • Messages sent to Quarantine Data Source
  • Errors about message size limits
  • Large messages not being ingested

Root causes:

  1. Message exceeds Tinybird's 10 MB default limit
  2. Large payloads causing memory issues
  3. Compression not reducing message size effectively

Solutions:

  1. Check message size:

    • Review Quarantine Data Source for size-related errors
    • Verify message sizes in your Kafka topic
    • Use Kafka tools to inspect message sizes
  2. Implement compression:

    • Use Kafka compression to reduce message size
    • Consider compressing large payloads before producing to Kafka
  3. Split large messages:

    • Break large messages into smaller chunks
    • Use message headers to track message parts
    • Reassemble in Materialized Views if needed
  4. Alternative approaches:

    • Store large payloads in object storage (S3, GCS) and reference them in Kafka messages
    • Use external storage for large binary data

For detailed guidance on handling large messages, see the message size handling guide.

Quarantine Data Source issues

Understanding Quarantine Data Source

When messages fail to ingest into your main Data Source, they are automatically sent to a Quarantine Data Source. This prevents data loss and allows you to inspect problematic messages.

Common reasons for quarantine:

  • Schema mismatches
  • Invalid data formats
  • Type conversion errors
  • Missing required fields
  • Deserialization failures
  • Message size limits exceeded

How to inspect quarantined messages:

SELECT *
FROM your_datasource_quarantine
WHERE timestamp > now() - INTERVAL 24 hour
ORDER BY timestamp DESC
LIMIT 100
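
To find the most common failure reasons instead of scrolling through individual rows, you can aggregate the quarantine error metadata. This sketch assumes the standard quarantine columns c__error and insertion_date:

SELECT
    arrayJoin(c__error) AS error,
    count() AS occurrences
FROM your_datasource_quarantine
WHERE insertion_date > now() - INTERVAL 24 hour
GROUP BY error
ORDER BY occurrences DESC
LIMIT 20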

How to resolve:

  1. Identify patterns in quarantined messages
  2. Fix schema or data quality issues
  3. Update Data Source schema if needed
  4. Fix data producers to send correct formats
  5. Consider reprocessing quarantined messages after fixes

For more information, see Quarantine Data Sources.

Getting help

If you've tried the preceding solutions and still experience issues:

  1. Collect diagnostic information:

    • Recent errors from kafka_ops_log
    • Recent errors from datasources_ops_log
    • Connection configuration (without secrets)
    • Data Source schema
    • Sample of problematic messages (if available)
  2. Check monitoring:

    • Review Kafka monitoring guide
    • Check Service Data Sources for additional context
    • Query both kafka_ops_log and datasources_ops_log for complete picture
  3. Review related guides:

    • Schema management guide
    • Partitioning strategies guide
    • Performance optimization guide
    • Message size handling guide
    • CI/CD and version control guide
  4. Contact support:

    • Provide error messages and timestamps from both logs
    • Include relevant queries from kafka_ops_log and datasources_ops_log
    • Share configuration details (sanitized)
    • Describe steps to reproduce the issue

Prevention best practices

  1. Use unique consumer group IDs for each Data Source and environment
  2. Test schema changes in development before deploying to production
  3. Monitor both kafka_ops_log and datasources_ops_log regularly to catch issues early
  4. Set up automated alerts for high lag or error rates in both logs using monitoring tools
  5. Review Quarantine Data Source periodically to identify data quality issues
  6. Test connections using tb connection data <connection_name> to preview data before deploying
  7. Document consumer group usage to avoid conflicts
  8. Test with sample messages before connecting production topics
  9. Use environment-specific configurations for development, staging, and production
  10. Keep credentials secure using Tinybird secrets, never hardcode them
  11. Regularly review Kafka ACLs to ensure proper permissions
  12. Monitor for missing tables and recreate Data Sources if accidentally deleted
  13. Verify topic and partition availability before deploying connectors
  14. Optimize Materialized View queries to prevent timeout and memory errors
  15. Set up monitoring for authorization errors to catch permission issues early

Integrate with your monitoring stack: Connect the monitoring queries in this guide to your existing monitoring tools. Query the ClickHouse® HTTP interface directly from Grafana, Datadog, PagerDuty, Slack, and other alerting systems. You can also create API endpoints from these queries, or export them in Prometheus format for Prometheus-compatible tools. This activates proactive monitoring and automated alerting for your Kafka connectors.

For comprehensive monitoring queries and alerts, see Monitor Kafka connectors.
