Tinybird Customer Story
Marc González, Director of Data at Factorial
18.853 TB processed per month
7,675 requests per day
Founded in 2016, Factorial took aim at the stagnant HR software market with their own unique twist. Factorial is present in over 65 countries, having raised $220 million in venture funding, and employs over 950 people. The company's goal is to bring modern HR solutions to more than 8,000 businesses worldwide, automating mundane HR tasks so that People leaders can focus on people, not paperwork.
Using Tinybird, Factorial has improved its data freshness and reduced query latency, which has let the team launch user-facing features significantly faster. That decision enabled Factorial to accelerate its time to market and build great new customer experiences like Job Catalog, Audit Log, and Attendance without sacrificing reliability.
Like many companies, Factorial began with a traditional batch pipeline built on MySQL, Parquet, AWS Glue, Amazon S3, and Amazon Athena. It was easy to set up in the early days and was guided by the team's previous experience building scalable data architectures.
This was a simple and effective setup, and at the time, their data team consisted of a single engineer. However, as the product evolved, developers increasingly needed to use this data to build user-facing features, and this architecture did not satisfy two non-negotiable requirements: data freshness and low query latency.
Although the lake house architecture proved easy to implement and served internal reporting use cases perfectly, the schedule-driven pipelines that loaded the data made it difficult for developers to work with. Users interacting with user-facing analytical features demand up-to-date data, but the data available in the lake house was often stale: hours old at best, and in some cases days old. This dramatically reduced the value of the data to an end user and made it unattractive for developers to use it to power new product features.
The developers made their requirements clear: they needed fresh data.
And they needed access to their data with low latency.
That's what brought them to Tinybird.
To solve for data freshness, the team decided to capture changes from MySQL in real time rather than rely on a batch process such as a scheduled snapshot. They used the MySQL CDC Source (Debezium) Connector for Confluent Cloud to implement Change Data Capture (CDC) over their production MySQL database. The connector captures changes from the database and writes them to Apache Kafka®.
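To make the idea of a change stream concrete, here is a minimal sketch of how a consumer might replay Debezium-style change events to rebuild the current state of a table. It assumes a simplified event envelope with only the `op`, `before`, and `after` fields (real events also carry metadata such as `source` and `ts_ms`), and the `employees` table with its `id`, `name`, and `team` columns is hypothetical, not Factorial's schema.

```python
# Simplified Debezium-style change events. The table and its columns are
# hypothetical and only serve to illustrate the replay logic.
def apply_change(table, event):
    """Apply one change event to an in-memory replica keyed by primary key."""
    op = event["op"]
    if op in ("c", "r", "u"):      # create, snapshot read, update
        row = event["after"]
        table[row["id"]] = row
    elif op == "d":                # delete: only "before" is populated
        table.pop(event["before"]["id"], None)
    else:
        raise ValueError(f"unknown op: {op!r}")
    return table


events = [
    {"op": "c", "before": None,
     "after": {"id": 1, "name": "Ana", "team": "Data"}},
    {"op": "c", "before": None,
     "after": {"id": 2, "name": "Luis", "team": "Sales"}},
    {"op": "u", "before": {"id": 1, "name": "Ana", "team": "Data"},
     "after": {"id": 1, "name": "Ana", "team": "Product"}},
    {"op": "d", "before": {"id": 2, "name": "Luis", "team": "Sales"},
     "after": None},
]

employees = {}
for event in events:
    apply_change(employees, event)
```

Because every insert, update, and delete arrives as its own event, the replica reflects the source table as soon as the event is consumed, instead of waiting for the next scheduled snapshot.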
Additionally, Factorial's data team saw that Kafka could serve as a reliable buffer for data: if anything failed downstream, data could accumulate in Kafka and be retried. They assessed alternative tools, such as Amazon Kinesis and Google Pub/Sub, but found that Kafka's offset semantics were more flexible, making it easier to resume data consumers from a previous message in the stream.
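The buffering and resume behavior described above can be illustrated with a toy model. The sketch below is plain Python, not a Kafka client: `InMemoryLog` is a hypothetical stand-in for a single partition, and `consume` is a hypothetical helper that only advances its position past messages that were handled successfully.

```python
class InMemoryLog:
    """Toy stand-in for one partition: an append-only list of messages."""

    def __init__(self):
        self.messages = []

    def append(self, message):
        self.messages.append(message)
        return len(self.messages) - 1   # offset of the appended message

    def read_from(self, offset):
        for i in range(offset, len(self.messages)):
            yield i, self.messages[i]


def consume(log, committed_offset, handler):
    """Process messages starting at committed_offset.

    Stops at the first failure and returns the next offset to read, so the
    failed message is retried on the next call.
    """
    next_offset = committed_offset
    for offset, message in log.read_from(committed_offset):
        try:
            handler(message)
        except Exception:
            break
        next_offset = offset + 1
    return next_offset


log = InMemoryLog()
for message in ["a", "b", "c", "d"]:
    log.append(message)

processed = []
downstream_up = {"value": False}   # simulates a downstream outage


def handler(message):
    if message == "c" and not downstream_up["value"]:
        raise RuntimeError("downstream unavailable")
    processed.append(message)


first_commit = consume(log, 0, handler)               # stops before "c"
downstream_up["value"] = True                         # outage is over
final_commit = consume(log, first_commit, handler)    # resumes at "c"
```

Nothing is lost during the simulated outage: the messages stay in the log, and the consumer picks up from the offset it last recorded.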
Ultimately, the Factorial team chose to use Confluent Cloud, a fully managed Kafka implementation built and operated by the original creators of Kafka.
By sending the MySQL CDC stream to Kafka, Factorial eliminated the batch process that introduced significant latency to their data. To complement their streaming pipeline, they also needed a system that would let their developers combine fresh Kafka streams with historical data to power user-facing applications, and that could answer analytical queries in milliseconds.
To make real-time analytical queries over data streams available to developers, Factorial chose Tinybird. Tinybird took away all of the operational overhead of managing real-time analytics data infrastructure at scale in production while fulfilling the strict requirements for latency and freshness. Tinybird allows Factorial to ingest from Confluent Cloud in real time, using Tinybird's native Confluent connector.
Data streams from Confluent arrive in Tinybird and are enriched with historical data that lives in the platform, avoiding the limitations of stateful stream processors. This means that nearly all data processing can be performed at the time of ingestion, with the result being materialized and ready for developers to push into production features. With only two data engineers, Factorial reduced the average query time for production queries from minutes to sub-50 milliseconds. Tinybird APIs are integrated directly into Factorial's user-facing product, greatly simplifying their application architecture and saving on further infrastructure and tooling costs.
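The general pattern of doing the work at ingestion time can be sketched in a few lines of plain Python. This is an illustration of the idea only, not Tinybird's implementation: the `employee_team` lookup, the event fields, and the `ingest` and `query` functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical historical data already stored alongside the stream.
employee_team = {1: "Data", 2: "Sales", 3: "Data"}

# Materialized result: total hours per (team, day), updated on every event.
hours_by_team_day = defaultdict(float)


def ingest(event):
    """Enrich one streamed event with its team and fold it into the total."""
    team = employee_team.get(event["employee_id"], "unknown")
    hours_by_team_day[(team, event["day"])] += event["hours"]


def query(team, day):
    """Read path: a single lookup, with no scan over raw events."""
    return hours_by_team_day.get((team, day), 0.0)


stream = [
    {"employee_id": 1, "day": "2024-01-08", "hours": 8.0},
    {"employee_id": 2, "day": "2024-01-08", "hours": 6.5},
    {"employee_id": 3, "day": "2024-01-08", "hours": 7.5},
]
for event in stream:
    ingest(event)
```

Since the join and the aggregation happen once per incoming event, a request from the product only has to read a precomputed value, which is what keeps query latency low.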
Factorial completed a POC and launched their first production feature in one month, and over the next six months launched more than 12 user-facing product features that are powered by this real-time pipeline, with many more to come.
Copyright © 2024 Tinybird. All rights reserved