Dec 09, 2021

Performance and Kafka compression

The **unmodified** support message we sent to one of our clients outlining potential performance gains through Kafka compression
David Manzanares, Software Engineer

This is the unmodified (well, I removed some references and names) message we sent to one of our clients (who uses Kafka heavily) via support after migrating them to our newest version of the Kafka connector. In this case the exchange took place through Slack, which is becoming the premium support channel for developers.

I wanted to share it because we spend quite a lot of time on this kind of research, and it’s easy to forget how hard these things are and how much effort goes into providing outstanding support. Here it is:

Regarding performance, today’s migration includes optimizations that will allow us to sustain much higher loads with Kafka.

However, benchmarking with your topic using the tinybird-XXXXXX groupID has shown that the optimizations on our system won’t be able to deliver significant improvements right now, as your Kafka cluster’s read throughput is the limiting factor.

Nevertheless, we have tested the behavior with all Kafka compression codecs (snappy, lz4, gzip and zstd) and with different compression levels, using a significant sample of your data. We have seen a massive improvement when setting the producer’s compression to zstd and high compression levels.

Max throughput:

  • current: ~9 million records / minute
  • zstd with compression level 12: ~19 million records / minute
  • zstd with compression level 8: ~18 million records / minute

Of course, level 12 puts a higher CPU load on the producers than level 8. However, zstd is pretty fast: my laptop is able to produce (and compress) at a rate of 4.3 million records per minute using level 12. With level 8, the producing rate improves to 5.3 million records per minute.
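To put the figures quoted above side by side, here is a small sketch of the arithmetic. It only uses the numbers from this message; the variable and function names are made up for illustration.

```python
# Figures quoted in the message, in millions of records per minute.
READ_CURRENT = 9.0                    # current max read throughput
READ_ZSTD = {12: 19.0, 8: 18.0}       # max read throughput per zstd level
LAPTOP_PRODUCE = {12: 4.3, 8: 5.3}    # producing rate on one laptop per zstd level


def speedup(new: float, old: float) -> float:
    """How many times faster `new` is compared to `old`."""
    return new / old


def relative_change(new: float, old: float) -> float:
    """Fractional change going from `old` to `new` (negative means slower)."""
    return (new - old) / old


for level in (8, 12):
    factor = speedup(READ_ZSTD[level], READ_CURRENT)
    print(f"zstd level {level}: {factor:.2f}x the current read throughput")

# Going from level 8 to level 12: what is gained on reads, what is lost on produce.
read_gain = relative_change(READ_ZSTD[12], READ_ZSTD[8])
produce_cost = relative_change(LAPTOP_PRODUCE[12], LAPTOP_PRODUCE[8])
print(f"level 8 -> 12: reads {read_gain:+.1%}, laptop producing rate {produce_cost:+.1%}")
```

In other words, both levels roughly double the read throughput, while moving from level 8 to level 12 adds about 6% on the read side and costs about 19% of the producing rate measured on that one laptop.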

On top of that, other configuration parameters can put extra load on your brokers’ CPUs and thus reduce read performance. We were able to test the maximum read throughput, but only while your producers were running at their regular rate. A real peak will put a higher load on your Kafka cluster servers. Confluent has an optimization guide that could improve things: https://docs.confluent.io/cloud/current/client-apps/optimizing/throughput.html

Lastly, we keep seeing degraded performance on 3 of the 24 partitions: partitions 3, 8, and 10. They keep up during regular loads, but under heavy loads they fall significantly behind the other partitions.
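As an illustration of how lagging partitions like these could be flagged, here is a minimal sketch that compares each partition's consumer lag to the median lag. This is not the tooling we used; the function name, the threshold factor, and the lag values are all hypothetical.

```python
from statistics import median
from typing import Dict, List


def lagging_partitions(lag_by_partition: Dict[int, int], factor: float = 3.0) -> List[int]:
    """Return the partitions whose lag exceeds `factor` times the median lag."""
    typical = median(lag_by_partition.values())
    # Guard against a zero median so that any non-zero lag is not flagged by itself.
    threshold = max(typical, 1) * factor
    return sorted(p for p, lag in lag_by_partition.items() if lag > threshold)


# Made-up lag values (in records) for a 24-partition topic, NOT measured data.
example_lag = {partition: 1_000 for partition in range(24)}
example_lag.update({3: 50_000, 8: 42_000, 10: 61_000})

print(lagging_partitions(example_lag))  # [3, 8, 10]
```

Comparing against the median rather than the mean keeps a few badly lagging partitions from raising the baseline they are measured against.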

In summary:

  • We highly recommend compressing on the producers using zstd with a level between 8 and 12.
  • We recommend tuning other configuration parameters following https://docs.confluent.io/cloud/current/client-apps/optimizing/throughput.html
  • We should keep investigating the problem with the 3 partitions.
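For reference, the first recommendation could look roughly like this on the producer side. This is a sketch that assumes a librdkafka-based client (for example confluent-kafka for Python), where the relevant properties are `compression.type` and `compression.level`. The message does not say which client is in use, other clients name the level setting differently, and the broker address and helper function below are hypothetical.

```python
from typing import Dict, Union

ConfigValue = Union[str, int]


def zstd_producer_config(bootstrap_servers: str, level: int = 8) -> Dict[str, ConfigValue]:
    """Build a librdkafka-style producer config that compresses with zstd."""
    # Only accept the range recommended in the summary above.
    if not 8 <= level <= 12:
        raise ValueError("level should be between 8 and 12")
    return {
        "bootstrap.servers": bootstrap_servers,
        "compression.type": "zstd",    # codec used by the producer
        "compression.level": level,    # higher means smaller batches, more CPU
    }


config = zstd_producer_config("broker-1:9092", level=8)  # hypothetical address
print(config)

# With confluent-kafka installed, the dict could be passed straight to the client:
#   from confluent_kafka import Producer
#   producer = Producer(config)
```

Starting at level 8 and moving towards 12 only if the producers have CPU to spare matches the trade-off described above.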