Performance optimization

This guide covers strategies for optimizing Kafka connector performance: schema design, Materialized View optimization, partition distribution, common bottlenecks, and best practices.

Schema optimization

Use explicit schemas

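Explicit schemas are faster and more efficient than schemaless ingestion, because each field is extracted into a typed column once instead of being parsed from raw JSON at query time: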

Recommended:

SCHEMA >
    `user_id` String `json:$.user_id`,
    `event_type` LowCardinality(String) `json:$.event_type`,
    `timestamp` DateTime `json:$.timestamp`

Avoid (slower):

SCHEMA >
    `data` String `json:$`  -- Requires parsing at query time
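
To make the query-time cost concrete, here is a minimal sketch of what reading a schemaless column tends to look like. It assumes a hypothetical data source named `events_raw` whose only payload column is the `data` string shown above; `JSONExtractString` and `parseDateTimeBestEffort` are standard ClickHouse functions, but the table name and the filter value are illustrative only.

```sql
-- Assumption: `events_raw` is a hypothetical data source whose only payload
-- column is `data` (the raw JSON string), as in the schemaless schema above.
SELECT
    JSONExtractString(data, 'user_id') AS user_id,
    JSONExtractString(data, 'event_type') AS event_type,
    parseDateTimeBestEffort(JSONExtractString(data, 'timestamp')) AS timestamp
FROM events_raw
-- The filter also has to parse the JSON for every row it scans.
WHERE JSONExtractString(data, 'event_type') = 'purchase'
```

With the explicit schema, the same query reads `user_id`, `event_type`, and `timestamp` directly as typed columns, with no JSON parsing per row.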

Optimize data types

  • Use LowCardinality(String) for enum-like fields with a small set of repeated values
  • Use the smallest integer type that fits the data (for example, Int32 instead of Int64)
  • Use DateTime for timestamps (not String)
  • Use Nullable() only when a field can actually be missing

Example:

SCHEMA >
    `user_id` String `json:$.user_id`,
    `event_type` LowCardinality(String) `json:$.event_type`,
    `timestamp` DateTime `json:$.timestamp`,
    `count` Int32 `json:$.count`,
    `metadata` Nullable(String) `json:$.metadata`  -- Only if needed

Materialized View optimization

Materialized Views that trigger on append operations from a Kafka data source can slow down ingestion, especially if they perform expensive aggregations or joins.

Optimization strategies

  1. Simplify aggregations - Keep aggregations efficient
  2. Add filters - Reduce data volume processed
  3. Optimize joins - Use appropriate join strategies
  4. Avoid cascading MVs - Don't chain Materialized Views off the same Kafka data source, as this increases ingestion latency
  5. Limit MVs per data source - Too many Materialized Views reading from the same Kafka data source can slow down ingestion
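
As an illustration of strategies 1 and 2, the following is a sketch of a Materialized View query that filters early and keeps the aggregation to a single grouped pass. The data source name `kafka_events` and the event type values are hypothetical; the columns come from the example schema earlier in this guide. It also assumes the target data source uses an AggregatingMergeTree engine with an `AggregateFunction(count)` column, which is why the query uses `countState()`.

```sql
-- Assumption: `kafka_events` is a hypothetical Kafka data source with the
-- `event_type` and `timestamp` columns from the example schema.
SELECT
    toStartOfMinute(timestamp) AS minute,
    event_type,
    countState() AS events  -- intermediate state, finalized later with countMerge()
FROM kafka_events
-- Filter first so the view only processes the rows it needs.
WHERE event_type IN ('purchase', 'signup')
GROUP BY minute, event_type
```

Queries against the target data source would then use `countMerge(events)` grouped by `minute` and `event_type` to get the final counts.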

Partition distribution

Ensure even partition distribution to maximize throughput. Monitor partition lag:

SELECT
    partition,
    max(lag) as max_lag,
    avg(lag) as avg_lag,
    sum(processed_messages) as total_processed
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 1 hour
  AND partition >= 0
GROUP BY partition
ORDER BY max_lag DESC

Uneven distribution may indicate:

  • Poor partition key design
  • Hot partitions
  • Need for more partitions

See the partitioning strategies guide for detailed guidance.
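
To put a number on uneven distribution, one option is to compare each partition's processed volume against the mean across partitions. The sketch below uses only the `tinybird.kafka_ops_log` columns from the query above; the `share_vs_mean` name is made up for this example, and the query assumes window functions are available in your workspace.

```sql
SELECT
    partition,
    total_processed,
    -- 1.0 means the partition handled exactly the average volume;
    -- values well above 1.0 point to a hot partition.
    round(total_processed / avg(total_processed) OVER (), 2) AS share_vs_mean,
    max_lag
FROM
(
    SELECT
        partition,
        sum(processed_messages) AS total_processed,
        max(lag) AS max_lag
    FROM tinybird.kafka_ops_log
    WHERE timestamp > now() - INTERVAL 1 hour
      AND partition >= 0
    GROUP BY partition
)
ORDER BY share_vs_mean DESC
```

A partition that shows both a high `share_vs_mean` and a high `max_lag` is a likely candidate for a partition key review.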

Common performance bottlenecks

Schema parsing

Symptoms:

  • High CPU usage
  • Slow message processing
  • Low throughput

Solutions:

  1. Use explicit schemas instead of schemaless
  2. Optimize JSONPath expressions
  3. Reduce schema complexity
  4. Use appropriate data types

Materialized Views

Symptoms:

  • Slow ingestion
  • High memory usage
  • Timeouts in Materialized Views

Solutions:

  1. Simplify Materialized View queries
  2. Add filters to reduce data volume
  3. Avoid cascading MVs and limit the number of MVs reading from the same Kafka data source
  4. Optimize aggregations

Partition imbalance

Symptoms:

  • Uneven lag across partitions
  • Some partitions consistently slower than others
  • Overall throughput limited by the slowest partitions

Solutions:

  1. Review partition key strategy
  2. Redistribute messages more evenly
  3. Increase partitions if needed
  4. Monitor partition distribution

Best practices

  1. Use explicit schemas - Faster parsing and better performance
  2. Optimize data types - Use the smallest types needed and LowCardinality for enum-like fields
  3. Simplify Materialized Views - Keep MVs efficient to avoid slowing ingestion
  4. Ensure even partition distribution - Monitor and optimize partition keys
  5. Monitor performance - Track lag, throughput, and error rates regularly
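
For the monitoring practice, a simple starting point is a time-bucketed view of lag and throughput. This sketch again relies only on the `tinybird.kafka_ops_log` columns used earlier in this guide; the 5-minute bucket and 1-day window are arbitrary choices, and error-rate tracking is left out because the relevant columns are not shown in this guide.

```sql
SELECT
    toStartOfInterval(timestamp, INTERVAL 5 minute) AS bucket,
    max(lag) AS max_lag,                      -- worst lag seen in the bucket
    sum(processed_messages) AS processed      -- throughput per bucket
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 1 day
  AND partition >= 0
GROUP BY bucket
ORDER BY bucket
```

Rising `max_lag` alongside flat or falling `processed` suggests ingestion is not keeping up with the topic.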