Performance optimization¶

This guide covers strategies for optimizing Kafka connector performance: schema design, Materialized View optimization, partition distribution, and general best practices.

Schema optimization¶

Use explicit schemas¶

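Explicit schemas are faster and more efficient than schemaless ingestion, because each field is extracted once at ingestion time instead of being parsed from raw JSON on every query: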

Recommended:

SCHEMA >
    `user_id` String `json:$.user_id`,
    `event_type` LowCardinality(String) `json:$.event_type`,
    `timestamp` DateTime `json:$.timestamp`

Avoid (slower):

SCHEMA >
    `data` String `json:$`  -- Requires parsing at query time
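To make the query-time cost concrete, here is a minimal sketch of the same filter written against each layout. The data source names `events_schemaless` and `events_explicit` are hypothetical, and the queries are meant to be run separately; `JSONExtractString` and `parseDateTimeBestEffort` are standard ClickHouse functions.

```sql
-- Schemaless layout: the raw JSON in `data` is parsed for every row, on every query.
SELECT count() AS purchases
FROM events_schemaless  -- hypothetical data source with a single `data` String column
WHERE JSONExtractString(data, 'event_type') = 'purchase'
  AND parseDateTimeBestEffort(JSONExtractString(data, 'timestamp')) > now() - INTERVAL 1 hour;

-- Explicit layout: fields were extracted at ingestion, so the filter reads typed columns.
SELECT count() AS purchases
FROM events_explicit  -- hypothetical data source using the recommended schema
WHERE event_type = 'purchase'
  AND timestamp > now() - INTERVAL 1 hour;
```

The second query also lets the engine read only the `event_type` and `timestamp` columns instead of the full JSON payload.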

Optimize data types¶

Use LowCardinality(String) for enum-like fields with few distinct values
Use the smallest integer type that fits the data (for example, Int32 instead of Int64)
Use DateTime for timestamps instead of String
Use Nullable() only when a field can actually be missing

Example:

SCHEMA >
    `user_id` String `json:$.user_id`,
    `event_type` LowCardinality(String) `json:$.event_type`,
    `timestamp` DateTime `json:$.timestamp`,
    `count` Int32 `json:$.count`,
    `metadata` Nullable(String) `json:$.metadata`  -- Only if needed
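Before settling on types, it can help to profile a sample of data that has already been ingested. The query below is a sketch under assumptions: `kafka_events` is a hypothetical data source name, and the columns follow the example schema above.

```sql
SELECT
    uniq(event_type) AS distinct_event_types,           -- approximate distinct count
    min(count) AS min_count,                            -- observed range of the integer field
    max(count) AS max_count,
    countIf(metadata IS NULL) AS null_metadata_rows,    -- zero suggests Nullable() is unnecessary
    count() AS total_rows
FROM kafka_events  -- hypothetical data source name
WHERE timestamp > now() - INTERVAL 1 day
```

As a rough guide, ClickHouse documentation indicates LowCardinality tends to pay off below about 10,000 distinct values, and Int32 is sufficient while values stay within roughly ±2.1 billion.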

Materialized View optimization¶

Complex Materialized Views can slow down ingestion. Materialized Views that trigger on appends from a Kafka data source run on every ingested batch, so expensive aggregations or joins directly reduce ingestion performance.

Optimization strategies¶

Simplify aggregations - Aggregate only the columns and functions you need
Add filters - Filter rows early to reduce the data volume each view processes
Optimize joins - Use a join strategy appropriate to the size of the joined data
Avoid cascading MVs - Chaining Materialized Views off the same Kafka data source increases ingestion latency
Limit MVs per data source - Too many Materialized Views reading from the same Kafka data source slow down ingestion
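As an illustration of the first two strategies, the following is a minimal sketch of a Materialized View query that filters early and keeps the aggregation small. The data source name `kafka_events` and the filter values are hypothetical; the columns follow the schema examples earlier in this guide.

```sql
SELECT
    toStartOfMinute(timestamp) AS minute,
    event_type,
    countState() AS events,            -- one inexpensive aggregate state per group
    sumState(count) AS total_count
FROM kafka_events                      -- hypothetical Kafka data source name
WHERE event_type IN ('purchase', 'refund')  -- hypothetical filter: drop rows the view never needs
GROUP BY minute, event_type
```

This sketch assumes the target data source stores aggregate states (for example, an AggregatingMergeTree with matching AggregateFunction columns), and that readers query it with the corresponding `-Merge` combinators such as `countMerge(events)`.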

Partition distribution¶

Ensure even partition distribution to maximize throughput. Monitor partition lag:

SELECT
    partition,
    max(lag) as max_lag,
    avg(lag) as avg_lag,
    sum(processed_messages) as total_processed
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 1 hour
  AND partition >= 0
GROUP BY partition
ORDER BY max_lag DESC

Uneven distribution may indicate:

Poor partition key design
Hot partitions
Need for more partitions

See the partitioning strategies guide for detailed guidance.
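The per-partition totals can also be condensed into a single skew indicator. The query below is a sketch that reuses only the columns from the monitoring query above; the `skew_ratio` name and its interpretation are illustrative rather than an official metric.

```sql
SELECT
    count() AS partitions,
    max(total_processed) AS busiest_partition_messages,
    avg(total_processed) AS mean_messages_per_partition,
    -- Ratio of the busiest partition to the mean; values near 1 indicate an even spread
    round(busiest_partition_messages / mean_messages_per_partition, 2) AS skew_ratio
FROM
(
    SELECT
        partition,
        sum(processed_messages) AS total_processed
    FROM tinybird.kafka_ops_log
    WHERE timestamp > now() - INTERVAL 1 hour
      AND partition >= 0
    GROUP BY partition
)
```

Partitions with no log entries in the time window do not appear in the subquery, so a completely idle partition is not reflected in the ratio.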

Common performance bottlenecks¶

Schema parsing¶

Symptoms:

High CPU usage
Slow message processing
Low throughput

Solutions:

Use explicit schemas instead of schemaless
Optimize JSONPath expressions
Reduce schema complexity
Use appropriate data types

Materialized Views¶

Symptoms:

Slow ingestion
High memory usage
Timeouts in Materialized Views

Solutions:

Simplify Materialized View queries
Add filters to reduce data volume
Avoid cascading MVs and limit the number of MVs reading from the same Kafka data source
Optimize aggregations

Partition imbalance¶

Symptoms:

Uneven lag across partitions
Some partitions lagging behind others
Limited overall throughput

Solutions:

Review partition key strategy
Redistribute messages more evenly
Increase partitions if needed
Monitor partition distribution

Best practices¶

Use explicit schemas - Faster parsing and better performance
Optimize data types - Use the smallest types needed and LowCardinality for enum-like fields
Simplify Materialized Views - Keep MVs efficient to avoid slowing ingestion
Ensure even partition distribution - Monitor and optimize partition keys
Monitor performance - Track lag, throughput, and error rates regularly
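For the monitoring practice above, a per-minute summary is one possible starting point. This sketch assumes `tinybird.kafka_ops_log` exposes a `msg_type` column with `'warning'` and `'error'` values; verify the column and its values against your workspace before relying on it.

```sql
SELECT
    toStartOfMinute(timestamp) AS minute,
    max(lag) AS max_lag,                        -- worst lag observed in the minute
    sum(processed_messages) AS processed,       -- throughput in messages per minute
    countIf(msg_type IN ('warning', 'error')) AS problem_entries  -- assumes a `msg_type` column
FROM tinybird.kafka_ops_log
WHERE timestamp > now() - INTERVAL 1 hour
GROUP BY minute
ORDER BY minute
```

A rising `max_lag` alongside flat `processed` values suggests ingestion is not keeping up with the topic.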

Related documentation¶

Monitor Kafka connectors - Comprehensive monitoring queries and metrics
Partitioning strategies guide - Optimize partition distribution
Troubleshooting guide - Resolve performance issues