Performance optimization
This guide covers strategies for optimizing the performance of your Kafka connectors, focusing on schema design, Materialized View optimization, partition distribution, and common bottlenecks.
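Schema optimization
Use explicit schemas
Explicit schemas are faster and more efficient than schemaless ones. Fields are extracted into typed columns at ingestion instead of being parsed from raw JSON at query time: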
Recommended:
SCHEMA >
    `user_id` String `json:$.user_id`,
    `event_type` LowCardinality(String) `json:$.event_type`,
    `timestamp` DateTime `json:$.timestamp`
Avoid (slower):
SCHEMA >
    `data` String `json:$` -- Requires parsing at query time
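The following minimal Python sketch makes the difference concrete. It models what an explicit schema does conceptually: each column's JSONPath is resolved once per message at ingestion and produces a typed value. In the schemaless layout, only the raw string is kept and every query has to parse it again. The `extract_row` and `resolve` helpers and the `COLUMNS` list are hypothetical. The path handling covers only simple `$.field` paths and is not Tinybird's JSONPath implementation.

```python
import json
from datetime import datetime

# Hypothetical column definitions mirroring the recommended schema:
# (column name, simple JSONPath, converter to the column's type).
COLUMNS = [
    ("user_id", "$.user_id", str),
    ("event_type", "$.event_type", str),
    ("timestamp", "$.timestamp", lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S")),
]


def resolve(doc, path):
    """Resolve a simple JSONPath such as '$.a.b' against a parsed JSON object."""
    if path == "$":
        return doc
    value = doc
    for key in path[2:].split("."):
        value = value[key]
    return value


def extract_row(raw_message):
    """Parse the message once and return typed column values (explicit schema)."""
    doc = json.loads(raw_message)
    return {name: convert(resolve(doc, path)) for name, path, convert in COLUMNS}


message = '{"user_id": "u1", "event_type": "click", "timestamp": "2024-01-01 12:00:00"}'
row = extract_row(message)
print(row["event_type"], row["timestamp"].year)

# Schemaless: only the raw string is stored, so every query must re-parse it.
raw_row = {"data": message}
print(json.loads(raw_row["data"])["event_type"])
```

With an explicit schema, the parsing cost is paid once per message. With the schemaless layout, it is paid again by every query that reads `data`.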
Optimize data types
- Use LowCardinality(String) for enum-like fields
- Use the smallest integer type that fits your values (for example, Int32 instead of Int64)
- Use DateTime for timestamps (not String)
- Use Nullable() only when needed
Example:
SCHEMA >
    `user_id` String `json:$.user_id`,
    `event_type` LowCardinality(String) `json:$.event_type`,
    `timestamp` DateTime `json:$.timestamp`,
    `count` Int32 `json:$.count`,
    `metadata` Nullable(String) `json:$.metadata` -- Only if needed
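The following Python sketch illustrates two of the ideas above. `smallest_int_type` is a hypothetical helper that picks the narrowest ClickHouse integer type whose standard range covers the given values, from UInt8/Int8 through UInt64/Int64. `dictionary_encode` shows the general idea behind `LowCardinality`: each distinct string is stored once in a dictionary, and each row keeps only a small integer code. This is a simplified model, not ClickHouse's actual storage format.

```python
# Ranges of ClickHouse integer types, from narrowest to widest.
INT_TYPES = [
    ("UInt8", 0, 2**8 - 1),
    ("Int8", -(2**7), 2**7 - 1),
    ("UInt16", 0, 2**16 - 1),
    ("Int16", -(2**15), 2**15 - 1),
    ("UInt32", 0, 2**32 - 1),
    ("Int32", -(2**31), 2**31 - 1),
    ("UInt64", 0, 2**64 - 1),
    ("Int64", -(2**63), 2**63 - 1),
]


def smallest_int_type(values):
    """Return the narrowest integer type whose range covers every value."""
    lo, hi = min(values), max(values)
    for name, type_min, type_max in INT_TYPES:
        if type_min <= lo and hi <= type_max:
            return name
    raise ValueError("values exceed the 64-bit integer range")


def dictionary_encode(values):
    """Store each distinct string once; rows keep small integer codes."""
    dictionary = {}
    codes = []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)
        codes.append(dictionary[v])
    return list(dictionary), codes


events = ["click", "view", "click", "purchase", "view", "click"]
keys, codes = dictionary_encode(events)
print(smallest_int_type([42, 1_000_000]))  # UInt32
print(keys, codes)  # ['click', 'view', 'purchase'] [0, 1, 0, 2, 1, 0]
```

When choosing a type, base it on the full range a column can ever take, not only on a sample of current data, because a sample can understate future values. Dictionary encoding pays off when the number of distinct values is small relative to the number of rows, which is the enum-like case described above.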
Materialized View optimization
Materialized Views run on each append to their source data source. As a result, complex Materialized Views on a Kafka data source can slow down ingestion, especially if they perform expensive aggregations or joins.
Optimization strategies
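- Simplify aggregations - Compute only the aggregates you actually query
- Add filters - Filter rows as early as possible to reduce the data volume each Materialized View processes
- Optimize joins - Use appropriate join strategies and keep joined tables small
- Avoid cascading MVs - Don't chain Materialized Views (one MV writing to a data source that triggers another MV) downstream of a Kafka data source, as each step increases ingestion latency
- Limit MVs per data source - Every Materialized View reading from the same Kafka data source adds work to each append, so too many of them slow down ingestion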
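A Materialized View runs on each batch of rows appended to its source, so its cost grows with the size of every batch it sees. The Python sketch below is a simplified model of that behavior, not Tinybird's implementation. `mv_on_append` and `WANTED_EVENTS` are hypothetical names. The function filters each appended batch before aggregating it into a running count per `event_type`, so discarded rows never reach the aggregation step.

```python
from collections import Counter

# Hypothetical filter: only these event types are needed downstream.
WANTED_EVENTS = {"purchase", "signup"}


def mv_on_append(batch, target):
    """Model of a Materialized View triggered by one append.

    Filters the new rows first, then folds the survivors into the
    target aggregate (a running count per event_type).
    Returns how many rows reached the aggregation step.
    """
    filtered = [row for row in batch if row["event_type"] in WANTED_EVENTS]
    target.update(row["event_type"] for row in filtered)
    return len(filtered)


target = Counter()
batches = [
    [{"event_type": "view"}, {"event_type": "purchase"}, {"event_type": "view"}],
    [{"event_type": "signup"}, {"event_type": "click"}, {"event_type": "purchase"}],
]
processed = [mv_on_append(b, target) for b in batches]
print(processed, dict(target))  # [1, 2] {'purchase': 2, 'signup': 1}
```

Only 3 of the 6 appended rows reach the aggregation. In a real Materialized View, the equivalent is a `WHERE` clause in the view's query.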
Partition distribution
Distribute messages evenly across partitions to maximize throughput, and monitor lag per partition to detect imbalance:
SELECT
partition,
max(lag) as max_lag,
avg(lag) as avg_lag,
sum(processed_messages) as total_processed
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 1 hour
AND partition >= 0
GROUP BY partition
ORDER BY max_lag DESC
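Because the query returns one row per partition, a simple ratio makes imbalance easy to spot. The Python sketch below takes hypothetical `(partition, max_lag)` results shaped like the query's output. It flags partitions whose lag is well above the median. `find_hot_partitions` is a hypothetical helper, and the threshold of three times the median is an arbitrary assumption to tune for your workload, not a Tinybird recommendation.

```python
from statistics import median


def find_hot_partitions(rows, threshold=3.0):
    """Return partitions whose max_lag exceeds `threshold` times the median.

    `rows` is a list of (partition, max_lag) tuples, for example taken
    from the kafka_ops_log query above.
    """
    lags = [lag for _, lag in rows]
    typical = median(lags)
    if typical == 0:
        # With a median of zero, treat any partition with lag as an outlier.
        return [p for p, lag in rows if lag > 0]
    return [p for p, lag in rows if lag > threshold * typical]


# Hypothetical query results: partition 2 is far behind the others.
results = [(0, 120), (1, 95), (2, 4800), (3, 110)]
print(find_hot_partitions(results))  # [2]
```

Comparing against the median rather than the mean keeps a single lagging partition from inflating the baseline it is measured against.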
Uneven distribution may indicate:
- Poor partition key design
- Hot partitions
- Need for more partitions
See the partitioning strategies guide for detailed guidance.
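Poor key design and hot partitions often go together. When a few keys carry most of the traffic, every message with those keys lands on the same partition. The Python sketch below demonstrates this with hash-mod-N partitioning. It uses `zlib.crc32` as a stand-in hash. Kafka's default partitioner hashes keys with murmur2, so the actual partition numbers differ, but the skew effect is the same. `partition_for`, `distribution`, and the key names are hypothetical.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition with hash(key) mod N (crc32 as a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def distribution(keys):
    """Count messages per partition for a stream of message keys."""
    counts = Counter(partition_for(k) for k in keys)
    return [counts.get(p, 0) for p in range(NUM_PARTITIONS)]


# Many distinct keys spread out; one dominant key piles onto a single partition.
balanced = [f"user_{i}" for i in range(1000)]
skewed = ["tenant_big"] * 900 + [f"user_{i}" for i in range(100)]

print(distribution(balanced))
print(distribution(skewed))
```

In the skewed case, at least 900 of the 1,000 messages share one partition, whatever the other partitions receive. Adding partitions does not help here, because a single key always hashes to a single partition.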
Common performance bottlenecks
Schema parsing
Symptoms:
- High CPU usage
- Slow message processing
- Low throughput
Solutions:
- Use explicit schemas instead of schemaless
- Optimize JSONPath expressions
- Reduce schema complexity
- Use appropriate data types
Materialized Views
Symptoms:
- Slow ingestion
- High memory usage
- Timeouts in Materialized Views
Solutions:
- Simplify Materialized View queries
- Add filters to reduce data volume
- Avoid cascade MVs or multiple MVs from the same Kafka data source
- Optimize aggregations
Partition imbalance
Symptoms:
- Uneven lag across partitions
- Some partitions slow
- Overall throughput limited
Solutions:
- Review partition key strategy
- Redistribute messages more evenly
- Increase partitions if needed
- Monitor partition distribution
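Key salting is one common way to redistribute a hot key. It is offered here as an illustration, not as a step this guide prescribes. The idea is to append a small suffix to the hot key so its messages spread over several sub-keys, and usually over several partitions. Hash collisions can still place some sub-keys on the same partition. The Python sketch below assumes a hypothetical `salted_key` helper, a hypothetical `HOT_KEYS` set identified from monitoring, and `zlib.crc32` as a stand-in for Kafka's murmur2 hash.

Salting has a trade-off. Kafka preserves ordering only within a partition, so messages for a salted key are no longer strictly ordered, and anything that groups by the key must strip the suffix first.

```python
import zlib
from collections import Counter
from itertools import count

NUM_PARTITIONS = 4
SALT_BUCKETS = 4  # hypothetical: how many sub-keys a hot key is split into
HOT_KEYS = {"tenant_big"}  # hypothetical: keys identified as hot from monitoring

_sequence = count()


def salted_key(key: str) -> str:
    """Spread hot keys over SALT_BUCKETS sub-keys in round-robin order."""
    if key in HOT_KEYS:
        return f"{key}#{next(_sequence) % SALT_BUCKETS}"
    return key


def original_key(key: str) -> str:
    """Strip the salt suffix so downstream grouping still uses the real key."""
    return key.split("#", 1)[0]


def partition_for(key: str) -> int:
    """hash(key) mod N, with crc32 standing in for Kafka's murmur2."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


keys = ["tenant_big"] * 800 + [f"user_{i}" for i in range(200)]
unsalted = Counter(partition_for(k) for k in keys)
salted_keys = [salted_key(k) for k in keys]
salted = Counter(partition_for(k) for k in salted_keys)

# Partitions the hot key's sub-keys land on, and the busiest partition before/after.
print(sorted({partition_for(f"tenant_big#{i}") for i in range(SALT_BUCKETS)}))
print(max(unsalted.values()), max(salted.values()))
```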
Best practices
- Use explicit schemas - Faster parsing and better performance
- Optimize data types - Use the smallest types needed and LowCardinality for enum-like fields
- Simplify Materialized Views - Keep MVs efficient to avoid slowing ingestion
- Ensure even partition distribution - Monitor and optimize partition keys
- Monitor performance - Track lag, throughput, and error rates regularly
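Tracking lag over time tells you more than a single reading. Lag that keeps rising means the connector consumes more slowly than messages arrive. The Python sketch below fits a least-squares slope to hypothetical lag samples taken at equal intervals, for example total lag per minute from `tinybird.kafka_ops_log`. A consistently positive slope is a signal to revisit the bottlenecks described above. `lag_slope`, the sample data, and the sampling interval are assumptions.

```python
def lag_slope(samples):
    """Least-squares slope of lag samples taken at equal intervals.

    Positive: lag is growing (consumers falling behind).
    Near zero or negative: consumers are keeping up or catching up.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Hypothetical per-minute total lag readings.
growing = [1000, 1400, 1900, 2300, 2800]
steady = [500, 520, 480, 510, 490]
print(lag_slope(growing), lag_slope(steady))  # 450.0 -3.0
```

A slope is less sensitive to a single noisy reading than comparing the first and last samples, which helps avoid alerting on short bursts.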
Related documentation
- Monitor Kafka connectors - Comprehensive monitoring queries and metrics
- Partitioning strategies guide - Optimize partition distribution
- Troubleshooting guide - Resolve performance issues