
Tinybird Customer Story

Factorial builds real-time data products with Tinybird

Factorial turned to Tinybird to build 12 new user-facing analytics features in just a few months.


"For Factorial, it is key that our customers can use the data that we process for them to gain live insights about how they are running their business and empower fast decision making. Tinybird provides exactly the set of tools we need to very quickly deliver new user-facing data products over the data investments we’ve already made."

Marc Gonzalez - Director of Data at Factorial HR

~6.6 TB processed per month

+80 requests per day

Founded in 2016, Factorial took aim at the stagnant HR software market with its own unique twist. Factorial is present in over 65 countries, has raised $220 million in venture funding, and employs over 950 people. The company’s goal is to bring modern HR solutions to more than 8,000 businesses worldwide, automating mundane HR tasks so that People leaders can focus on people, not paperwork.

1. Enabling new scenarios with real-time data

Using Tinybird, Factorial has improved data freshness and reduced query latency, which has made launching user-facing features significantly faster. The switch let Factorial accelerate its time to market and build great new customer experiences such as Job Catalog, Audit Log, and Attendance without sacrificing reliability.

2. Real-time analytics and more use cases

Like many companies, Factorial began with a traditional batch pipeline built on MySQL, Parquet, AWS Glue, Amazon S3, and Amazon Athena. Designed with the benefit of the team’s previous experience building scalable data architectures, it was easy to set up in Factorial’s early days.

This was a simple and effective setup, and at the time, their data team consisted of a single engineer. However, as the product evolved, developers increasingly needed to use this data to build user-facing features, and this architecture did not satisfy two non-negotiable requirements: data freshness and low query latency.

Although the lakehouse architecture was easy to implement and served internal reporting use cases perfectly, the schedule-driven pipelines that loaded the data made it difficult for developers to work with. When people use analytical features inside a product, they expect up-to-date data, but the data in the lakehouse was often stale: hours old at best, and sometimes days old. This dramatically reduced its value to end users and made it unattractive for developers to build new product features on top of it.

The developers made their requirements clear: they needed fresh data.

And they needed access to their data with low latency.

That’s what brought them to Tinybird.

"Our existing data engineering team is relatively small given the size of our company. We don’t have the time or manpower to worry about setting up and maintaining a complex streaming data mesh architecture. Tinybird handles our real-time infrastructure at scale and allows us to build new real-time applications using our existing skills."

Marc Gonzalez - Director of Data at Factorial HR

3. Unifying batch and streaming data using Tinybird

To improve data freshness, the team switched from batch processes, such as running a snapshot on a schedule, to capturing changes from MySQL in real time. They used the MySQL CDC Source (Debezium) Connector for Confluent Cloud to implement Change Data Capture (CDC) over their production MySQL database. The connector captures row-level changes from the database and writes them to Apache Kafka®.
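To make the CDC step more concrete, here is a minimal Python sketch that flattens a single Debezium-style change event into a row suitable for an append-only table. It assumes the message value is the bare Debezium envelope (`before`, `after`, `op`, `ts_ms`) serialized as JSON with no schema wrapper; the actual shape depends on how the connector's output format is configured, and the `employees` columns and the `_op`/`_ts_ms`/`_deleted` column names are hypothetical.

```python
import json
from typing import Any, Optional

# Debezium operation codes: c = create, u = update, d = delete, r = snapshot read.
OPS = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot"}


def flatten_change_event(raw: Optional[str]) -> Optional[dict[str, Any]]:
    """Flatten one Debezium-style change event into a row for an append-only table.

    Assumes the Kafka message value is the bare change envelope (no schema
    wrapper). Returns None for tombstones, i.e. messages with a null value.
    """
    if raw is None:
        return None
    event = json.loads(raw)
    if event is None:  # the JSON literal "null"
        return None
    op = event["op"]
    if op not in OPS:
        return None  # e.g. "t" (truncate) events are ignored in this sketch
    # Deletes carry the last known row in `before`; all other ops use `after`.
    state = event["before"] if op == "d" else event["after"]
    row = dict(state)
    row["_op"] = OPS[op]
    row["_ts_ms"] = event["ts_ms"]
    row["_deleted"] = op == "d"
    return row


# Example: an insert event for a hypothetical `employees` table.
msg = '{"before": null, "after": {"id": 7, "name": "Ana", "team_id": 2}, "op": "c", "ts_ms": 1700000000000}'
print(flatten_change_event(msg))
# {'id': 7, 'name': 'Ana', 'team_id': 2, '_op': 'insert', '_ts_ms': 1700000000000, '_deleted': False}
```

Keeping deletes as flagged rows, rather than dropping them, preserves the full change history so that downstream queries can decide how to treat removed records.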

Factorial’s data team also saw that Kafka could serve as a reliable buffer: if anything failed downstream, data would remain in Kafka and could be retried. They assessed alternatives such as Amazon Kinesis and Google Pub/Sub, but found Kafka’s offset semantics more flexible, making it easier to resume consumers from an earlier message in the stream.
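The buffering and replay behaviour described above can be illustrated without a broker. The sketch below models one Kafka partition as an append-only list and a consumer that advances its committed offset only after a downstream write succeeds, so a failed batch is simply re-read from the same position. `PartitionLog`, `Consumer`, and `flaky_sink` are hypothetical stand-ins for illustration, not the Kafka client API; a real consumer achieves the same effect by committing offsets and seeking to a chosen offset.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PartitionLog:
    """Stand-in for one Kafka partition: a message's offset is its list index."""
    messages: list = field(default_factory=list)

    def append(self, msg: dict) -> int:
        self.messages.append(msg)
        return len(self.messages) - 1

    def read(self, offset: int, max_count: int) -> list:
        return self.messages[offset:offset + max_count]


@dataclass
class Consumer:
    """Toy consumer that commits its offset only after the sink succeeds."""
    log: PartitionLog
    committed: int = 0  # next offset to read, e.g. after a restart

    def poll_and_process(self, sink: Callable[[list], None], batch: int = 2) -> int:
        msgs = self.log.read(self.committed, batch)
        if not msgs:
            return 0
        sink(msgs)  # if this raises, the committed offset is not advanced
        self.committed += len(msgs)
        return len(msgs)

    def seek(self, offset: int) -> None:
        """Move the read position, e.g. back to replay earlier messages."""
        self.committed = offset


# Usage: the sink fails once, and the batch is retried from the same offset.
log = PartitionLog()
for i in range(5):
    log.append({"id": i})

delivered = []
failures_left = [1]


def flaky_sink(batch: list) -> None:
    if failures_left[0]:
        failures_left[0] -= 1
        raise ConnectionError("downstream unavailable")
    delivered.extend(batch)


consumer = Consumer(log)
for _ in range(10):  # bounded retry loop for the example
    try:
        if consumer.poll_and_process(flaky_sink) == 0:
            break
    except ConnectionError:
        pass  # messages stay in the log; the next poll re-reads them
```

Because the log itself is never mutated by consumption, `seek` can rewind to any earlier offset and reprocess messages, which is the flexibility the team valued.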

Ultimately, the Factorial team chose to use Confluent Cloud, a fully managed Kafka implementation built and operated by the original creators of Kafka.

By sending the MySQL CDC stream to Kafka, Factorial eliminated the batch process that had introduced significant latency into their data. To complement this streaming pipeline, they also needed a system that would let developers combine fresh Kafka streams with historical data to power user-facing applications, and that could answer analytical queries on the order of milliseconds.

To make real-time analytical queries over data streams available to developers, Factorial chose Tinybird. Tinybird removed the operational overhead of managing real-time analytics infrastructure at scale in production while meeting the strict requirements for latency and freshness. Tinybird’s native Confluent connector lets Factorial ingest data from Confluent Cloud in real time.

Data streams from Confluent arrive in Tinybird and are enriched with historical data already stored in the platform, avoiding the limitations of stateful stream processors. This means nearly all data processing can happen at ingestion time, with the results materialized and ready for developers to use in production features. With only two data engineers, Factorial reduced the average query time for production queries from minutes to under 50 milliseconds. Tinybird APIs are integrated directly into Factorial’s user-facing product, greatly simplifying its application architecture and saving on additional infrastructure and tooling costs.
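The ingestion-time pattern described above, where each incoming row is joined with historical data and folded into a pre-aggregated result so that reads stay cheap, can be sketched in plain Python. This is a conceptual model only, not how Tinybird implements materialization: `teams` stands in for historical dimension data already stored in the platform, and the attendance events and per-team hour totals are hypothetical.

```python
from collections import defaultdict


class MaterializedAttendance:
    """Keeps a per-(company, team) aggregate up to date as rows arrive."""

    def __init__(self, teams: dict[int, dict]):
        # Historical dimension data: team_id -> {"company_id": ..., "name": ...}
        self.teams = teams
        # The materialized result: (company_id, team_name) -> total hours
        self.hours = defaultdict(float)

    def ingest(self, event: dict) -> None:
        # Enrich the streaming row with historical data at write time...
        team = self.teams.get(event["team_id"])
        if team is None:
            return  # this sketch skips events for unknown teams
        key = (team["company_id"], team["name"])
        # ...and fold it into the aggregate, so no work is left for read time.
        self.hours[key] += event["hours"]

    def query(self, company_id: int) -> dict[str, float]:
        # Reads only scan the small precomputed table, not the raw events.
        return {name: h for (cid, name), h in self.hours.items() if cid == company_id}


view = MaterializedAttendance({
    1: {"company_id": 10, "name": "Sales"},
    2: {"company_id": 10, "name": "Support"},
})
for e in [{"team_id": 1, "hours": 8.0}, {"team_id": 2, "hours": 7.5}, {"team_id": 1, "hours": 4.0}]:
    view.ingest(e)
print(view.query(10))  # {'Sales': 12.0, 'Support': 7.5}
```

The trade-off is that the shape of the aggregate must be decided up front; in exchange, query cost depends on the size of the materialized result rather than on the volume of raw events.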

4. Speed wins: Tinybird fuels time-to-market, a key differentiator for Factorial

Factorial completed a proof of concept (PoC) and launched its first production feature in one month. Over the next six months, it launched more than 12 user-facing product features powered by this real-time pipeline, with many more to come.

“When we switched to Tinybird, we ran a PoC and shipped our first feature to production in a month. Since then, we’ve shipped 12 new user-facing features in just a few months. There’s no way we could have done this without Tinybird.”

Marc Gonzalez - Director of Data at Factorial HR

For developers

1. How is data ingested?

Data is ingested from MySQL using Confluent Cloud's MySQL CDC Source (Debezium) connector, then forwarded to Tinybird via Tinybird's Confluent connector.

2. How is data consumed?

Factorial engineers build real-time, SQL-based API endpoints in Tinybird that they consume within the frontend application to enable user-facing analytics.
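As a rough illustration of the consuming side, the sketch below builds a request for a published Tinybird endpoint and extracts the result rows. It assumes the `/v0/pipes/<pipe>.json` URL pattern on the default `api.tinybird.co` host (other regions use different hosts), a read token sent as a bearer header, and a JSON response whose `data` field holds the rows. The pipe name `team_attendance` and the `company_id` parameter are hypothetical, and a browser frontend would make the equivalent call from its own language.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

API_BASE = "https://api.tinybird.co"  # assumption: default region host


def endpoint_url(pipe: str, **params) -> str:
    """Build the URL for a published pipe endpoint that returns JSON."""
    query = urlencode(params)
    return f"{API_BASE}/v0/pipes/{quote(pipe)}.json" + (f"?{query}" if query else "")


def parse_rows(body: bytes) -> list[dict]:
    """Extract result rows from the JSON response body (assumed `data` field)."""
    return json.loads(body)["data"]


def fetch_rows(pipe: str, token: str, **params) -> list[dict]:
    # Send the read token as a bearer header rather than in the query string.
    req = Request(endpoint_url(pipe, **params), headers={"Authorization": f"Bearer {token}"})
    with urlopen(req, timeout=5) as resp:
        return parse_rows(resp.read())


print(endpoint_url("team_attendance", company_id=10))
# https://api.tinybird.co/v0/pipes/team_attendance.json?company_id=10
```

Passing query parameters such as `company_id` lets one SQL endpoint serve every customer's view, with the filtering done server-side.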

3. How does Factorial manage data pipelines?

The data engineering team controls data ingestion and creates "source-of-truth" views from which domain experts and developers can build SQL-based APIs within Tinybird.
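One common way to build a "source-of-truth" view over an append-only CDC table is to keep only the most recent version of each row and hide rows whose latest change was a delete. The Python sketch below shows that logic under stated assumptions: each row has a primary key `id`, a `_ts_ms` change timestamp, and a `_deleted` flag, all illustrative column names. Inside Tinybird, this kind of view would be expressed in SQL rather than Python.

```python
from typing import Iterable


def latest_state(rows: Iterable[dict], key: str = "id") -> dict:
    """Collapse a CDC change log into the current state per primary key.

    Ties on _ts_ms are broken by arrival order (the later row wins).
    """
    latest: dict = {}
    for row in rows:
        k = row[key]
        current = latest.get(k)
        # Keep the newest version even if events arrive out of order.
        if current is None or row["_ts_ms"] >= current["_ts_ms"]:
            latest[k] = row
    # Drop keys whose most recent change was a delete.
    return {k: r for k, r in latest.items() if not r["_deleted"]}


changes = [
    {"id": 1, "name": "Ana", "_ts_ms": 1, "_deleted": False},
    {"id": 2, "name": "Bo", "_ts_ms": 2, "_deleted": False},
    {"id": 1, "name": "Ana M.", "_ts_ms": 3, "_deleted": False},
    {"id": 2, "name": "Bo", "_ts_ms": 4, "_deleted": True},
]
print(sorted(latest_state(changes)))  # [1]
```

Publishing a deduplicated view like this once, at the data engineering layer, means developers building endpoints on top of it never have to reason about CDC operations themselves.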