Dec 09, 2021

Performance and Kafka compression

The **unmodified** support message we sent to one of our clients outlining potential performance gains through Kafka compression
David Manzanares
Software Engineer

This is the unmodified message (well, I removed some references and names) that we sent to one of our clients (who uses Kafka heavily) via support after migrating them to our newest version of the Kafka connector. In this case, the exchange took place through Slack, which is becoming the premium support channel for developers.

I wanted to share it because we spent quite a lot of time researching this, and it’s easy to forget how hard these things are and how much effort goes into providing outstanding support. Here it is:

Regarding performance, today’s migration includes optimizations that will allow us to sustain much higher loads with Kafka.

However, benchmarking with your topic using the tinybird-XXXXXX groupID has shown that the optimizations on our system won’t be able to deliver significant improvements right now, as your Kafka cluster reading throughput is the limiting factor.

Nevertheless, we have tested the behavior with all Kafka compression codecs (snappy, lz4, gzip and zstd) and with different compression levels, using a significant sample of your data. We have seen a massive improvement when setting the producer’s compression to zstd with a high compression level.

Max throughput:

  • current: ~9 million records / minute
  • zstd with compression level 12: ~19 million records / minute
  • zstd with compression level 8: ~18 million records / minute
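To give a sense of how such a comparison can be reproduced offline, here is a minimal sketch that compresses a sample of records with the standalone codec libraries and reports ratio and speed. It assumes a newline-delimited sample file (`sample_records.ndjson` is a placeholder name) and the `zstandard` Python package, and it compresses the whole sample at once rather than per Kafka batch, so treat the numbers as a rough guide rather than what the connector measures.

```python
# Rough offline comparison of compression codecs/levels on a data sample.
# gzip is in the standard library; zstd needs `pip install zstandard`.
import gzip
import time

import zstandard


def measure(name, compress, data):
    start = time.perf_counter()
    compressed = compress(data)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(compressed)
    speed_mb_s = len(data) / elapsed / 1e6
    print(f"{name:>8}: ratio {ratio:5.2f}, {speed_mb_s:7.1f} MB/s")


data = open("sample_records.ndjson", "rb").read()  # placeholder sample file

measure("gzip -6", lambda d: gzip.compress(d, compresslevel=6), data)
measure("zstd -8", zstandard.ZstdCompressor(level=8).compress, data)
measure("zstd -12", zstandard.ZstdCompressor(level=12).compress, data)
```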

Of course, level 12 imposes a higher CPU load than level 8. However, zstd is pretty fast: my laptop is able to produce (and compress) at a rate of 4.3 million records per minute using level 12. With level 8, the producing rate improves to 5.3 million records per minute.
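For reference, on librdkafka-based clients (for example confluent-kafka-python) this is purely a configuration change: the producer exposes both `compression.type` and `compression.level`. Here is a minimal sketch with placeholder broker and topic names; check your own client library, since not every Kafka client lets you set the compression level.

```python
# Minimal producer sketch with zstd at a high compression level (librdkafka-based client).
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "BROKER:9092",  # placeholder
    "compression.type": "zstd",          # codec applied to each produced batch
    "compression.level": 12,             # higher level = better ratio, more producer CPU
})

producer.produce("your-topic", value=b'{"example": "record"}')  # placeholder topic
producer.flush()
```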

On top of that, there are other configuration parameters that can increase the CPU load on your Kafka brokers and thus reduce reading performance. We can only test the maximum reading throughput while you are producing at your regular rate; a real peak will put a higher load on your Kafka cluster servers. Confluent has an optimization guide that could improve things: https://docs.confluent.io/cloud/current/client-apps/optimizing/throughput.html
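The settings that guide covers are mostly batching and latency trade-offs on the producer. As an illustration of the kind of parameters involved (librdkafka names; the values below are examples, not recommendations for your cluster):

```python
# Illustrative producer settings of the kind the Confluent throughput guide discusses.
producer_config = {
    "bootstrap.servers": "BROKER:9092",  # placeholder
    "compression.type": "zstd",
    "linger.ms": 100,          # wait up to 100 ms so batches fill up before sending
    "batch.size": 1_000_000,   # allow larger batches (bytes), which also compress better
    "acks": "all",             # durability setting; interacts with throughput
}
```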

Lastly, we keep seeing degraded performance on 3 of the 24 partitions: partitions 3, 8, and 10. Although they are able to keep up during regular loads, under heavy loads they lag significantly behind the other partitions.
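As a reference for following up on this, per-partition lag can be checked from the consumer side by comparing each partition’s committed offset against its high watermark. A minimal sketch using confluent-kafka-python, with a placeholder topic name and the same group ID:

```python
# Sketch: per-partition lag = high watermark - committed offset.
from confluent_kafka import Consumer, TopicPartition

TOPIC = "your-topic"  # placeholder

consumer = Consumer({
    "bootstrap.servers": "BROKER:9092",  # placeholder
    "group.id": "tinybird-XXXXXX",
})

metadata = consumer.list_topics(TOPIC, timeout=10)
partitions = [TopicPartition(TOPIC, p) for p in metadata.topics[TOPIC].partitions]

for tp in consumer.committed(partitions, timeout=10):
    low, high = consumer.get_watermark_offsets(tp, timeout=10)
    lag = high - tp.offset if tp.offset >= 0 else high - low  # no commit yet -> full range
    print(f"partition {tp.partition}: lag {lag}")

consumer.close()
```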

In summary:

  • We highly recommend compressing on the producers using zstd with a level between 8 and 12.
  • We recommend tuning other configuration parameters following https://docs.confluent.io/cloud/current/client-apps/optimizing/throughput.html
  • We should keep investigating the problem with the 3 partitions.