Real-time data analytics, a definitive guide
We are well into the data-driven era of business, where nearly all decisions are informed in some measure by accurate, quantitative information about how the business is currently performing. The fresher the data, the more effective decision-making becomes. The days of generating dashboards in the morning based on data processed overnight are over. Now, we look to get information and make decisions in real time.
Real-time data analytics is the discipline of capturing data as it happens and immediately using it to make better decisions, optimize operations, improve customer experiences, drive revenue growth, and more. Software creators need access to fresher data when producing dashboards for business analysts or building dynamic user-facing applications. Real-time data analytics can often be the game-changer and competitive differentiator that they seek.
Businesses that can harness real-time data and build low-latency analytics into their applications stand to gain much in the coming years. But, real-time data analytics can come at a cost if approached the wrong way, something today’s cost-conscious engineering teams are forced to grapple with. Everybody likes the idea of real-time data analytics, but many carry the belief that the costs will outweigh the benefits. How can you use real-time data analytics while staying within your people, hardware, and cloud budgets?
Tinybird is the real-time data platform that helps you build amazing user-facing applications based on the freshest possible data. With Tinybird, you can ingest relentless amounts of real-time data at scale from multiple sources, query and shape it using SQL, and publish your queries as high-concurrency, low-latency HTTP APIs. Use Tinybird to build dynamic new experiences, improve decision-making, and - perhaps most importantly - save time, money, and headaches as you build real-time analytics.
This is the definitive guide to real-time data analytics. If you want to learn more about real-time data analytics - why it’s helpful, where it’s being used, what technology to choose, and how to maintain control of your costs - then you’ve come to the right place.
This guide will answer the following questions:
- What is real-time data?
- What is the state of real-time data analytics in 2023?
- What are the benefits of real-time data platforms?
- What are some example use cases for real-time data analytics?
- What challenges will you face when building real-time data analytics?
- What are the essential tools for building real-time data applications?
- How do you reduce costs when building real-time data analytics?
- What to look for in a real-time data platform?
- How do you get started with real-time data analytics?
What is real-time data?
Real-time data is data that is available for analysis as soon as it is generated. With real-time analytics, people and software can make immediate decisions based on real-time data. Data is most valuable when it’s fresh, and real-time data analytics maximizes data freshness when generating insight.
Real-time data analytics demands a shift from traditional ways of approaching business analytics. In the past decade, data warehouses were brilliant for building long-running analytics that power business intelligence reports, and the data engineers that have built and maintained them have become prized members of engineering teams across many industries.
Real-time data analytics demands a shift away from traditional ways of thinking about data processing.
In more recent times, however, the rise of real-time data analytics has been influenced by the growing desire to embed the kinds of analytics pipelines that data engineers have created into user-facing applications. Now, data engineers and software engineers must come together to build real-time data infrastructure that not only generates insights but infuses them into user experiences.
It is in these use cases that the long-dominant data warehouse and batch processing models have not kept pace, owing to their technical limitations.
What’s the difference between real-time data analytics and batch processing?
In contrast to real-time data analytics, batch processing and batch data analytics are functionally designed to answer queries made repeatedly and on a schedule.
Batch analytics was borne out of the “Big Data” movement and is useful in long-range business decision-making, measuring performance against goals across time horizons like months, quarters, and years. Batch analytics looks at the past to make decisions about the long-term future. It answers questions like “Will we hit our quarterly revenue numbers?” or “What was the product we sold the most of in Mexico last month?”
Likewise, batch analytics has proven tremendously useful alongside the adoption of data science methodologies, as it can be used to train models that need to crunch and re-crunch large amounts of data over time.
Batch processing takes advantage of many technical approaches - most notably data warehousing - that are built around its functional requirement: informing long-term business decision-making, most often at the executive and management levels.
Real-time data analytics, on the other hand, relies on real-time data processing to help with the tangible, day-to-day, hour-to-hour, and minute-to-minute decisions that materially impact how a business operates. Where batch focuses on measuring the past to predict or inform the future, real-time data analytics focuses on the present. It answers questions like “Do we need to order or redistribute stock today?” or “Which offer should we show this customer right now?”
Real-time data is increasingly used to generate insights that don't only inform humans, but also automate software.
Real-time data analytics can inform decisions made by humans (via real-time data visualizations), but increasingly it’s used to automate decision-making within applications and services, driving second-to-second course corrections that previously took weeks or months to make. New real-time data streaming architectures and real-time data platforms have arisen to help developers and data teams meet this need.
Is real-time analytics the same as streaming analytics?
Sometimes real-time data analytics is confused with streaming analytics. There are several streaming analytics products available today. They work great for some streaming use cases, but they all fall short when handling the high-concurrency, low-latency demands of real-time applications.
That's because they don't leverage a full OLAP database - ClickHouse, for example - that enables queries over arbitrary time spans (vs. fixed streaming windows), advanced joins for complex use cases, managed materialized views for rollups, and many other real-time data requirements.
Streaming analytics answers questions about a particular thing at a particular moment. Questions like “Is this thing A or B?” or “Does this piece of data have A in it?” as data streams through. Streaming data analytics allows you to ask simple questions about a few things very close together in time. It can offer very low latency, but it comes with a catch: it has limited “memory.”
Real-time data analytics, in contrast, has a long memory. It focuses on very quickly inserting data - and retaining all historical data - to answer questions about current and historical events.
By retaining historical data, and updating the full record very quickly, real-time data analytics lets you ask questions about data that is happening right now compared to data that happened in the past.
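This distinction can be sketched in a few lines of Python. The sketch below is illustrative only (the values are made up): a streaming-style fixed window forgets everything outside its bounds, while retained history lets you compare the present against the full past.

```python
from collections import deque

# Streaming-style analytics: a fixed-size window with limited "memory".
window = deque(maxlen=5)  # only the 5 most recent events survive

# Real-time analytics: every event is retained, so queries can span
# arbitrary time ranges and compare "now" against history.
history = []

for value in [10, 12, 11, 95, 13, 12, 14, 11, 10, 12]:
    window.append(value)
    history.append(value)

# The streaming window can only answer questions about the last 5 events...
recent_avg = sum(window) / len(window)

# ...while the retained history can compare the present to the full past.
overall_avg = sum(history) / len(history)
is_recent_elevated = recent_avg > overall_avg

print(round(recent_avg, 1), round(overall_avg, 1), is_recent_elevated)
# → 11.8 20.0 False
```

A question like "is current activity elevated versus the historical baseline?" is simply unanswerable from the window alone once the relevant events have aged out of it.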
For example, consider an online retailer. They want to show a visitor the best possible offer so that they’ll buy something. With real-time data, the retailer can compare that visitor’s current browsing behavior during the session with historical browsing behavior and conversion metrics by past visitors within the same cohort.
The result is a personalized offer based on real-time information that boosts conversion rates and increases average order value.
This is the power of real-time data: influencing things that are happening right now based on the deep insights available from analyzing historical and current data.
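As a minimal sketch of the retailer example above: pick an offer by combining historical cohort conversion rates with a real-time signal from the current session. The cohort names, offers, rates, and the page-view rule are all hypothetical, purely to show the shape of the decision.

```python
# Hypothetical historical data: conversion rates per offer, broken down
# by visitor cohort (e.g., returning visitors on mobile devices).
historical_conversion = {
    "returning_mobile": {"free_shipping": 0.042, "10_pct_off": 0.061},
    "new_desktop":      {"free_shipping": 0.055, "10_pct_off": 0.031},
}

def best_offer(cohort: str, session_page_views: int) -> str:
    """Pick the offer with the best historical conversion for this cohort.

    Highly engaged sessions (many page views) get the discount regardless:
    an illustrative real-time rule layered on top of historical insight.
    """
    if session_page_views >= 8:
        return "10_pct_off"
    rates = historical_conversion[cohort]
    return max(rates, key=rates.get)

print(best_offer("returning_mobile", session_page_views=3))  # → 10_pct_off
print(best_offer("new_desktop", session_page_views=3))       # → free_shipping
```

The key point is that both inputs - the cohort's historical behavior and the visitor's in-flight session - must be fresh and queryable at the moment the page renders.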
The state of real-time data analytics in 2023
Modern real-time data analytics applications power more than just dashboards. While faster dashboards certainly are a tangible byproduct of the modern movement towards real time, they don’t capture the complete state of the art for real-time data analytics in 2023.
In addition to powering real-time dashboards, real-time data applications are often - and increasingly - directly connected to other user-facing apps.
Today, a new class of real-time data platforms enables such use cases.
These platforms are built around three core tenets:
- High-frequency ingestion of events and dimensions from multiple data sources. Real-time data analytics demands a database that can handle writes on the order of hundreds of megabytes per second with very low latency. In general, this requires real-time databases optimized for high-frequency ingestion. ClickHouse, for example, claims insert throughput of 50-200 MB/s and 1M+ rows per second in common use cases, so it fits the bill for real-time analytics use cases.
- Real-time data processing and transformation. As a class of data analytics, real-time data analytics involves real-time data processing: aggregating (almost always), filtering (usually), and enriching (sometimes) data as it streams in. In some real-time architectures, these transformations happen at query time, but in other cases, transformations must happen as data is ingested, using materialized views or snapshots, so that the transformations themselves are persisted into storage.
- Low-latency, high-concurrency publication layer. Finally, real-time data analytics offers an API or query language that exposes analytics metrics to dashboards and user-facing applications. Request latency should be measured in hundreds of milliseconds or less to avoid a subpar user experience.
Real-time data platforms have 3 core tenets: high-frequency database inserts, real-time data transformations, and a low-latency publication layer.
Real-time data platforms enable a whole new class of applications and use cases, such as:
- Cybersecurity applications that can intelligently detect patterns in real time and take automated action, such as adjusting DNS or updating firewall deny-lists.
- Personalized travel booking experiences that put the best offer in front of potential hotel patrons based on their current session data.
- Centralized cryptocurrency trading platforms that optimize trades for crypto market makers.
- Conversion-optimized eCommerce stores that display products most likely to be purchased by a specific visitor, and track the immediate performance of single-day flash sales as they’re happening.
- In-product analytics for content creators that shows them up-to-date data on how users are interacting with what they’ve made.
- High-precision, privacy-focused web analytics applications that track user behavior across a website.
- Stock management systems that identify when inventory needs to be diverted to alternative warehouses based on current user purchasing trends.
The benefits of real-time data platforms
Here are a handful of the advantages of real-time data platforms:
- Faster decision making. Real-time data analytics answers complex questions within milliseconds, a feat that batch processing cannot achieve. In doing so, it allows for time-sensitive reactions and interventions (for example, in healthcare, manufacturing, or retail settings) made by humans who can interpret data more quickly to spur faster decisions.
- Automated, intelligent software. Real-time data doesn’t just boost human decision making, but increasingly enables automated decisions within software. Software applications and services can interact with the outputs of real-time analytics systems to automate functions based on real-time metrics.
- Improved user experiences. Real-time data can provide insights into customer behavior, preferences, and sentiment as they use products and services. Applications can then provide interactive tools that respond to customer usage, share information with customers through transparent in-product visualizations, or personalize their product experience within an active session.
- Better cost and process efficiencies. Real-time data can be used to optimize business processes, reducing costs and improving efficiency. This could include identifying and acting on cost-saving opportunities, such as reducing energy consumption in manufacturing processes. Real-time data analytics can also help identify performance bottlenecks or identify testing problems early, enabling developers to quickly optimize application performance both before and after moving systems to production.
- Powerful differentiation. Real-time data can create a competitive moat for businesses that build it well. It gives a two-pronged speed advantage: Faster time to market and faster response times to customer needs. These two things make real-time data analytics a powerful differentiator.
Real-time data enables faster human decision making, intelligent software automation, better cost efficiencies, and powerful differentiation.
Use cases for real-time data analytics
Here are several examples of real-time data analytics use cases that can improve customer experiences, unlock new business value, and optimize systems:
- Sports betting and gaming. Real-time data can help sports betting and gaming companies reduce time-to-first-bet, improve the customer experience through real-time personalization, maintain leaderboards in real time based on in-game events, segment users in real time for personalized marketing campaigns, and reduce the risk of fraud. (learn more about Sports Betting and Gaming Solutions)
- Inventory and stock management. Real-time data can help online retailers optimize their fulfillment centers and retail location inventory to reduce costs, provide a modern customer experience with real-time inventory and availability, improve operational efficiency by streamlining supply chains, and make better decisions using real-time data and insights into trends and customer behavior. (learn more about Smart Inventory Management)
- Website analytics. Real-time data can help website owners monitor user behavior as it happens, enabling them to make data-driven decisions that can improve user engagement and conversion rates even during active sessions. (related: How an eCommerce giant replaced Google Analytics)
- Personalization. Real-time data can help companies personalize user experiences as a customer is using a product or service, based on up-to-the-second user behavior, preferences, history, cohort analysis, and much more. (learn more about Real-time Personalization)
- In-product analytics. Real-time data can give product owners the power to inform their end users with up-to-date and relevant metrics related to product usage and adoption, which can help users understand the value of the product and reduce churn. (learn more about In-Product Analytics)
- Operational intelligence. Real-time data can help companies monitor and optimize operational performance, enabling them to detect and remediate issues the moment they happen and improve overall efficiency. (learn more about Operational Intelligence)
- Anomaly detection and alerts. Real-time data can be used to detect real-time anomalies, for example from Internet of Things (IoT) sensors, and not only trigger alerts but build self-healing infrastructure. (learn more about Anomaly Detection & Alerts)
- Software log analytics. Real-time data can help software developers build solutions over application logs, enabling them to increase their development velocity, identify issues, and remediate them before they impact end users. (Related: Cutting CI pipeline execution time by 60%)
- Trend forecasting. Across broad industry categories, real-time data analytics can be used as predictive analytics to forecast trends based on the most recent data available. (Related: Using SQL and Python to create alerts from predictions)
- Usage-based pricing. Real-time data can help companies implement usage-based pricing models, enabling them to offer personalized pricing based on real-time usage data. (learn more about Usage-Based Pricing)
- Logistics management. Real-time data can help logistics companies optimize routing and scheduling, enabling them to improve delivery times and reduce costs.
- Security information and event management. Real-time data can help companies detect security threats and trigger automated responses, enabling them to mitigate risk and protect sensitive data.
- Financial services. Real-time data can be used for real-time fraud detection. Fraudulent transactions can be compared to historical trends so that such transactions can be stopped before they go through.
- Customer 360s. Real-time data can help companies build a comprehensive and up-to-date view of their customers, enabling them to offer personalized experiences and improve customer satisfaction.
- Artificial intelligence and machine learning. Real-time data can power online feature stores to help AI and ML models learn from accurate, fresh data, enabling them to improve accuracy and predictive performance in data science projects over time.
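To make one of these use cases concrete, here is a minimal z-score anomaly detector of the kind the anomaly detection and alerting bullet describes. The readings and the 3-sigma threshold are illustrative assumptions; a production system would compute these aggregates inside the real-time database rather than in application code.

```python
import statistics

def is_anomaly(history: list, value: float, threshold: float = 3.0) -> bool:
    """Flag a reading whose z-score against recent history exceeds threshold.

    A deliberately simple sketch; real systems would maintain per-sensor
    baselines as streaming aggregates instead of a Python list.
    """
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > threshold

readings = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2]  # e.g., IoT temperature sensor
print(is_anomaly(readings, 20.4))  # normal fluctuation → False
print(is_anomaly(readings, 35.0))  # sensor spike → True
```

The same detection result could trigger an alert or, as the bullet suggests, feed self-healing automation directly.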
The challenges you’ll face when building real-time data analytics
Building a real-time data application can feel daunting. In particular, 7 key challenges arise when building real-time analytics:
- Using the right tools for the job
- Adopting a real-time mindset
- Managing cross-team collaboration
- Handling scale
- Enabling real-time observability
- Evolving data projects in production
- Controlling costs
Using the right tools for real-time data
Real-time data analytics demands a different toolset than do traditional data pipelines or app development. Instead of data warehouses, batch ETLs, DAGs, and OLTP or document-store app databases, engineers building real-time analytics need to use streaming technologies, real-time databases, and API layers effectively.
And because speed is so critical in real-time analytics, engineers must bridge these components with minimal latency, or turn to a real-time data platform that integrates each function.
Either way, developers and data teams must adopt new tools when building real-time applications.
Adopting a real-time mindset
Of course, using new tools won’t help if you’re stuck in a batch mindset.
Batch processing (and batch tooling like dbt or Airflow) often involves running the same query on a regular basis over data to constantly recalculate certain results based on new data. In effect, much of the same data gets processed many times.
But if you need access to those results in real time (that is, over fresh data), that way of thinking does not help you.
Engineers comfortable with batch processes need to think differently when building real-time data analytics.
A real-time mindset focuses on minimizing data processing - optimizing to process raw data only once - to both improve performance and keep costs low.
In order to minimize query latencies and process data at scale while it’s still fresh, you have to:
- Filter out and avoid processing anything that’s not absolutely essential to your use case, to keep things light and fast.
- Consider materializing or enriching data at ingestion time rather than query time, so that you make your downstream queries more performant (and avoid constantly scanning the same data).
- Keep an optimization mindset at all times: the less data you have to scan or process, the lower the latency you’ll be able to provide within your applications, and the more queries you’ll be able to push through each CPU core.
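The "process raw data only once" idea can be sketched in a few lines of Python. The event shapes below are hypothetical, and in practice a materialized view inside the database plays the role of the dictionary: each raw event is filtered and folded into a rollup exactly once, at ingest, so reads never rescan raw data.

```python
from collections import defaultdict

# Query-time aggregation would rescan every raw event on each request.
# Materializing at ingest processes each raw event exactly once and keeps
# only the rollup, so reads stay fast no matter how much data arrives.
pageviews_by_path = defaultdict(int)  # stands in for a materialized view

def ingest(event: dict) -> None:
    # Filter first: skip anything not essential to the use case.
    if event.get("type") != "pageview":
        return
    pageviews_by_path[event["path"]] += 1  # update the rollup once, at ingest

for event in [
    {"type": "pageview", "path": "/pricing"},
    {"type": "click", "path": "/pricing"},  # filtered out, never processed again
    {"type": "pageview", "path": "/pricing"},
    {"type": "pageview", "path": "/docs"},
]:
    ingest(event)

# Query time is now a constant-time lookup instead of a scan.
print(pageviews_by_path["/pricing"])  # → 2
```

Contrast this with the batch habit of re-running the same aggregation query over all raw events on a schedule: here each event costs one increment, ever.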
Handling scale
Real-time data analytics combines the scale of “Big Data” with the performance and uptime requirements of user-facing applications.
Batch processes are also less prone to the negative effects caused by spikes in data production. Like a dam, they can control the flow of data. But real-time applications must be able to handle and process ingestion peaks in real time. Consider an eCommerce store on Black Friday. To support use cases like in-session personalization during traffic surges, your real-time infrastructure must respond to and scale with massive data spikes.
To succeed with real-time data, engineers need to be able to manage and maintain data projects at scale and in production. This can be difficult without additional tooling and resources.
Enabling real-time observability
Failures in real-time infrastructure happen fast. Detecting and remediating scenarios that can negatively impact production requires real-time observability that can keep up with real-time infrastructure.
If you’re building real-time data analytics in applications, it’s not enough for those applications to serve low-latency APIs. Your observability and alerting tools need to have similarly fast response times so that you can detect user-affecting problems quickly.
Evolving data projects in production
In a batch context, schema migrations and failed data pipelines might only affect internal consumers, and the effects appear more slowly. But in real-time applications, these changes will have immediate and often external ramifications.
For example, changing a schema in a dbt pipeline that runs every hour gives you up to an hour to deploy and test new changes without affecting any business process.
Schema migrations in real-time systems have zero margin for error.
Changes in real-time infrastructure, on the other hand, only offer milliseconds before downstream processes are affected. In real-time applications, schema evolutions and business logic changes are more akin to changes in software backend applications, where an introduced bug will have an immediate and user-facing effect.
In other words, changing a schema while you are writing and querying over 200,000 records per second is challenging, so a good migration strategy and tooling around deployments are critical.
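One widely used strategy is to make only additive, backward-compatible changes while writers and readers stay live. The sketch below illustrates the pattern with SQLite purely for convenience; a real deployment would apply the same idea to a real-time OLAP database, with migration tooling rather than hand-written DDL.

```python
import sqlite3

# SQLite stands in for the real-time database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts INTEGER, user_id TEXT)")

# Writers using the old schema are still running...
conn.execute("INSERT INTO events (ts, user_id) VALUES (1, 'a')")

# ...so the migration only ADDS a nullable column; it never renames or
# drops one, which would break in-flight writes and live queries.
conn.execute("ALTER TABLE events ADD COLUMN country TEXT")

# Old-style writes keep working (country is NULL); new writes fill it in.
conn.execute("INSERT INTO events (ts, user_id) VALUES (2, 'b')")
conn.execute("INSERT INTO events (ts, user_id, country) VALUES (3, 'c', 'ES')")

rows = conn.execute("SELECT user_id, country FROM events ORDER BY ts").fetchall()
print(rows)  # → [('a', None), ('b', None), ('c', 'ES')]
```

Destructive changes (renames, drops, type changes) are then performed later, as a separate step, once no running code depends on the old shape.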
Managing cross-team collaboration
Up until recently, data engineers and software developers often focused on different objectives. Data engineers and data platform teams built infrastructure and pipelines to serve business intelligence needs. Software developers and product teams designed and built applications for external users.
With real-time data analytics, these two functions must come together. Companies pursuing real-time analytics must lean on data engineers and platform teams to build real-time infrastructure or APIs that developers can easily discover and build with. Developers must understand how to use these APIs to build real-time data applications.
As you and your data grow, managing this collaboration becomes critical. You need systems and workflows in place that let developers and engineers “flow” in their work while still enabling effective cross-team work.
This shift in workflows may feel unfamiliar and slow. Still, data engineers and software developers will have to work closely to succeed with real-time data analytics.
Controlling the cost of real-time data
This final challenge is ultimately a culmination of the prior six. New tools, new ways of working, increased collaboration, added scale, and complex deployment models all introduce new dependencies and requirements that, depending on your design, can either create massive cost savings or - if you get it wrong - serious cost sinks.
If you’re not careful, added costs can appear anywhere and in many ways: more infrastructure and maintenance, more SREs, slower time to market, added tooling. Many are concerned that the cost of real-time data analytics will outweigh the benefits.
There is always a cost associated with change, but if you do it right, you can achieve an impressive ROI. With the right tools, mindset, and architecture, real-time data can simultaneously cut the cost of building new user-facing features while boosting revenue through powerful differentiation.
Despite its challenges, real-time data analytics not only increases cost efficiency but also boosts revenue - if approached the right way.
The essential components of a real-time data platform
Real-time data architectures consist of 3 core components:
- Streaming technology
- Real-time OLAP databases
- Real-time API layers
Streaming technology for real-time data
Since real-time data analytics requires high-frequency ingestion of events data, you’ll need a reliable way to capture streams of data generated by applications and other systems.
The most commonly used technology is Apache Kafka, an open-source distributed event streaming platform used by many. Within the Kafka ecosystem exist many “flavors” of Kafka offered as a service or with alternative client-side libraries. Notable options here include:
- Amazon MSK
- Confluent Cloud
- Redpanda
While Kafka and its offshoots are broadly favored in this space, a few alternatives have been widely adopted, for example:
- Google Pub/Sub
- Amazon Kinesis
- Tinybird Events API
Regardless of which streaming platform you choose, the ability to capture streaming data is fundamental to the real-time data stack.
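In miniature, every one of these platforms implements the same shape: producers append events the moment they occur, and consumers process the stream continuously and independently. The sketch below shows that shape with a stdlib queue and thread; it is a teaching stand-in, not a substitute for Kafka or its peers.

```python
import queue
import threading

# A miniature stand-in for a streaming platform: producers append events
# as they happen; a consumer processes them continuously and independently.
stream = queue.Queue()
counts = {}

def consumer() -> None:
    while True:
        event = stream.get()
        if event is None:  # sentinel: stream closed
            break
        counts[event["type"]] = counts.get(event["type"], 0) + 1

worker = threading.Thread(target=consumer)
worker.start()

# The application emits events the moment they occur -- it never batches.
for event_type in ["click", "pageview", "click"]:
    stream.put({"type": event_type})
stream.put(None)
worker.join()

print(counts)  # → {'click': 2, 'pageview': 1}
```

Real streaming platforms add the properties this toy lacks: durability, partitioning, replay, and horizontal scale across many producers and consumers.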
Streaming technology is fundamental to real-time data analytics, capturing and transporting data as soon as it's generated.
Real-time OLAP databases
Real-time data architectures include a columnar, OLAP database that can store incoming and historical events data and make it available for low-latency querying.
Real-time databases should offer high throughput on inserts, columnar storage for compression and low-latency reads, and functional integrations with publication layers.
Critically, most standard transactional and document-store databases are not suitable for real-time analytics, so a column-oriented OLAP should be the database of choice.
The following databases have emerged as the most popular open-source real-time databases:
- ClickHouse
- Apache Druid
- Apache Pinot
Real-time databases are built for high-frequency inserts, complex analytics over large amounts of data, and low-latency querying.
Real-time API layers
To make use of data that has been stored in real-time databases, developers need a publication layer to expose queries made on that database to external applications, services, or real-time data visualizations. This often takes the form of an ORM or an API framework.
One particular challenge with building real-time data architectures is that analytical application databases tend to have less robust ecosystems than their OLTP counterparts, so there are often fewer options to choose from here, and those that exist tend to be less mature, with smaller communities.
So, publication layers for real-time data analytics generally require that you build your own custom backend to meet the needs of your application. This means building yet another HTTP API using tools like:
- FastAPI (Python)
- Hyper (Rust)
- Gin (Go)
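The publication-layer pattern itself is simple: a parameterized SQL query exposed as a function that returns JSON, which an HTTP handler in FastAPI, Hyper, or Gin would then wrap. The sketch below shows that pattern with SQLite standing in for the OLAP database; the table, endpoint name, and data are hypothetical.

```python
import json
import sqlite3

# SQLite stands in for the real-time OLAP database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("shoes", 59.0), ("shoes", 41.0), ("hat", 15.0)],
)

def top_products_endpoint(limit: int = 10) -> str:
    """What a hypothetical GET /top_products?limit=N handler would return."""
    rows = conn.execute(
        "SELECT product, SUM(amount) AS revenue FROM sales "
        "GROUP BY product ORDER BY revenue DESC LIMIT ?",
        (limit,),
    ).fetchall()
    return json.dumps([{"product": p, "revenue": r} for p, r in rows])

print(top_products_endpoint(limit=1))
# → [{"product": "shoes", "revenue": 100.0}]
```

The hard part in production is not this wrapper but keeping it fast under high concurrency, which is exactly where the database and connection handling underneath it matter.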
A real-time analytics publication layer turns database queries into low-latency APIs to be consumed by user-facing applications.
Each of the 3 core components - streaming technology, OLAP database, and publication layer - matters when building the ideal real-time data architecture, and while such an architecture can be constructed piecemeal, beware of technical handoffs that inevitably introduce latency and complexity.
Tinybird is the leading real-time data platform in 2023
The next wave of real-time applications and systems requires extraordinary processing speed and storage, and such systems have historically been difficult and expensive to build.
But that changes with Tinybird.
Tinybird is the industry-leading real-time data platform. With Tinybird, developers and data teams can harness the power of real-time data to quickly and cost-effectively build real-time data analytics and the applications they power.
Tinybird combines the 3 components of real-time data architectures discussed above - streaming ingestion, OLAP database, and publication layer - into a single platform with a delightful developer experience.
Tinybird combines streaming, OLAP, and publication in a single, integrated platform with a delightful developer experience.
Use Tinybird to ingest data from multiple sources at streaming throughput, query and shape that data using the 100% pure SQL you already know and love, and publish your queries as low-latency, high-concurrency REST APIs to consume in your applications.
Put simply, with Tinybird, developers can create fast APIs, faster, over real-time data at scale. What used to take hours or days now takes minutes. Tinybird is the indispensable tool data engineers and software developers have been waiting for.
With Tinybird, developers can build fast APIs, faster, over streaming data at scale.
What makes Tinybird the top real-time data platform?
Tinybird is a force multiplier for data teams and developers building real-time data analytics. Here are the factors that influence Tinybird’s position as the top real-time data platform in 2023.
- Performance. Performance is critical when it comes to real-time data platforms. Tinybird is a serverless platform built on top of ClickHouse, the world’s fastest OLAP database for real-time analytics. Tinybird can handle massive volumes of data at streaming scale, supports a wide array of SQL joins, provides materialized views and the performance advantages they bring, and still maintains API latencies under a second even for very complex queries and use cases.
- Developer experience. Developer experience is critical when building analytics into user experiences. Developers need to be able to build quickly, fail fast, and safely maintain production systems. Tinybird’s developer experience is unparalleled amongst real-time data platforms. Tinybird offers a straightforward, flexible, and familiar experience, with both UI and CLI workflows, SQL-based queries, and familiar CI/CD workflows when working with OLAP databases. Things like schema migrations and database branching can all be managed in code with version control. These things combined reduce the learning curve for developers, streamline the development process, and enhance the overall developer experience.
- Faster speed to market. Of course, developer comfort is only a part of the story. Perhaps more critically, a better developer experience shortens the time to market. Tinybird empowers developers to push enterprise-grade systems into production more quickly and with more confidence. Through a combination of intuitive interfaces, built-in observability, and enterprise support, Tinybird enables developers to ship production code much faster than alternatives - which shortens the time between scoping and monetization.
- Fewer moving parts. Simplicity is key when it comes to real-time data platforms. Whereas other real-time analytics solutions require piecemeal integrations to bridge ingestion, querying, and publication layers, Tinybird integrates the entire real-time data stack into a single platform. This eliminates the need for multiple tools and components and provides a simple, integrated, and performant approach to help you reduce costs and improve efficiency.
- Works well with everything you already have. Interoperability is also important when evaluating a real-time data platform. Tinybird easily integrates with existing tools and systems with open APIs, first-party data connectors, and plug-ins that enable you to integrate with popular tools such as databases, data warehouses, streaming platforms, data lakes, observability platforms, and business intelligence tools.
- Serverless scale. As a serverless real-time data platform, Tinybird allows product teams to focus on product development and release cycles, rather than scaling infrastructure. This not only improves product velocity, it also minimizes resource constraints. With Tinybird, engineering teams can hire fewer of the SREs that would otherwise be needed to manage, monitor, and scale databases and streaming infrastructure.
Tinybird is a high-performance, serverless real-time data platform that helps developers ship real-time data pipelines to production faster and with more confidence.
The economics of real-time data: how Tinybird saves you money
The operative phrase of 2023 is “do more with less.” You’re under pressure to reduce costs but maintain the same, if not greater, level of service. Fortunately, a real-time data platform like Tinybird can help you capture new value at a fraction of the cost.
Here are a few ways that Tinybird enables cost-effective development:
- Deliver impact across your business. Tinybird can ingest data from multiple sources and replace many existing solutions, from web analytics to inventory management to website personalization and much, much more.
- Developer productivity where it counts. Tinybird has a beautiful and intuitive developer experience. What used to take weeks can be accomplished in minutes. What used to take entire teams of engineers to build, debug, and maintain can now be accomplished by one industrious individual. Efficiency is the 2023 superpower, and Tinybird delivers.
- Reduced infrastructure and moving parts. Tinybird combines real-time ingestion, real-time data processing, querying, and publishing, enabling more use cases with less infrastructure to manage and fewer people required to manage it. This means fewer hand-offs between teams and systems, faster time-to-market, and reduced costs.
- Automate your insights. React faster to opportunities and problems. No more waiting for daily reports: you’ll know in real time how your business is behaving and can automate and operationalize your responses.
Tinybird reduces the cost of real-time data analytics by consolidating many tools into a single platform, reducing resource dependencies, and increasing developer velocity.
Getting started: Build real-time data analytics into your next project
So how do you begin to build real-time data analytics into your next development project? As this guide has demonstrated, there are three core steps to building real-time data analytics:
- Ingesting real-time data at streaming scale
- Querying the data to build analytics metrics
- Publishing the metrics to integrate into your apps
Tinybird is a real-time data platform that makes all of this possible. Below you’ll find practical steps on ingesting data from streaming platforms (and other sources), querying that data with SQL, and publishing low-latency, high-concurrency APIs for consumption within your applications.
If you’re new to Tinybird, you can try it out by signing up for a free-forever Build Plan, with no credit card required, no time restrictions, and generous free limits.
Ingesting real-time data into Tinybird
Tinybird supports ingestion from multiple sources, including streaming platforms, files, databases, and data warehouses. Here’s how to code ingestion from various sources using Tinybird.
Ingest real-time data from Kafka
Tinybird enables real-time data ingestion from Kafka using the native Kafka connector. You can use the Tinybird UI to set up your Kafka connection, choose your topics, and define your ingestion schema in a few clicks. Or, you can use the Tinybird CLI to develop Kafka ingestion pipelines from your terminal.
To learn more about building real-time data analytics on top of Kafka data, check out these resources:
- Docs - Tinybird Kafka Connector
- Screencast - Create REST APIs from Kafka streams in minutes
- Blog - From Kafka streams to data products
- Live Coding Session - Build low-latency analytics APIs on top of Kafka data
Note that this applies to any Kafka-compatible platform such as Confluent, Redpanda, Upstash, Aiven, or Amazon MSK.
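As an illustration, here is a sketch of what a CLI-managed Kafka Data Source definition might look like as a Tinybird datafile. The connection name, topic, consumer group, and JSON fields below are placeholder assumptions, not taken from this guide; see the Kafka Connector docs for the authoritative syntax.

```
# kafka_events.datasource — hypothetical example; connection, topic,
# group, and column names are placeholders

SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `user_id` String `json:$.user_id`,
    `event` String `json:$.event`

ENGINE "MergeTree"
ENGINE_SORTING_KEY "timestamp"

KAFKA_CONNECTION_NAME my_kafka_connection
KAFKA_TOPIC web_events
KAFKA_GROUP_ID tinybird_consumer_1
```

Pushing a file like this with the Tinybird CLI creates a Data Source that consumes the topic continuously, so the schema and connection live in version control alongside the rest of your project.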
Ingest real-time data from your Data Warehouse
Tinybird works as a real-time publication layer for data stored in data warehouses. With Tinybird, you can synchronize tables in your data warehouses - such as BigQuery, Redshift, or Snowflake - develop metrics in SQL, and publish those metrics as low-latency, high-concurrency APIs.
Tinybird’s Connector Development Kit has made it possible to quickly ingest data from many data warehouses into Tinybird.
Check out these resources below to learn how to build real-time data analytics on top of your data warehouse:
- Docs - Tinybird Snowflake Connector
- Docs - Tinybird BigQuery Connector
- Screencast - Sync BigQuery tables to Tinybird with the BigQuery Connector
- Blog - Transforming real-time applications with Tinybird and BigQuery
- Live Coding Session - Building real-time analytics with BigQuery
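To make the shape of a warehouse sync concrete, here is a hedged sketch of a BigQuery-backed Data Source datafile. The `IMPORT_*` settings follow the pattern used by Tinybird’s warehouse connectors, but the project, dataset, table, schedule, and columns below are all placeholder assumptions; consult the BigQuery Connector docs for the exact options.

```
# bigquery_orders.datasource — hypothetical sketch; project, dataset,
# table, schedule, and columns are placeholders

SCHEMA >
    `order_id` Int64,
    `amount` Float64,
    `created_at` DateTime

IMPORT_SERVICE bigquery
IMPORT_SCHEDULE */5 * * * *
IMPORT_EXTERNAL_DATASOURCE my_project.my_dataset.orders
IMPORT_STRATEGY REPLACE
```

On each scheduled run, Tinybird re-imports the warehouse table so the published APIs on top of it serve reasonably fresh data without you writing any sync code.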
Ingest real-time data from CSV, NDJSON, and Parquet files
Tinybird enables data ingestion from CSV, NDJSON, and Parquet files, either locally on your machine or remotely in cloud storage such as Google Cloud Storage or Amazon S3 buckets. While data stored in files is often not generated in real time, it can be beneficial as dimensional data to join with data ingested through streaming platforms. Tinybird has wide coverage of SQL joins to make this possible.
You can ingest data from files using the Tinybird UI, the CLI, or the Data Sources API.
Here are some resources to learn how to ingest data from local or remote files:
- Docs - Ingest data from CSV files into Tinybird
- Docs - How to ingest NDJSON data into Tinybird
- Docs - The Tinybird Data Sources API
- Blog - Querying large CSVs online with SQL
- Screencast - Ingest data from a file into Tinybird
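For a concrete starting point, the sketch below builds (but does not send) the HTTP request that appends a remote CSV file to a Data Source via the Data Sources API. The token, Data Source name, and file URL are placeholders, and the `mode=append` usage is an assumption based on the Data Sources API docs; adjust the host to your Tinybird region.

```python
# Hypothetical sketch: append a remote CSV file to a Tinybird Data Source
# via the Data Sources API. Token, Data Source name, and file URL are
# placeholders.
from urllib.parse import urlencode
from urllib.request import Request

TINYBIRD_API = "https://api.tinybird.co/v0/datasources"

def build_append_request(token: str, datasource: str, file_url: str) -> Request:
    """Build (but do not send) the POST request that asks Tinybird to
    fetch a remote file and append its rows to an existing Data Source."""
    params = urlencode({
        "name": datasource,   # target Data Source
        "mode": "append",     # add rows; "replace" would swap the contents
        "url": file_url,      # remote file for Tinybird to fetch
    })
    return Request(
        f"{TINYBIRD_API}?{params}",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )

req = build_append_request("<YOUR_TOKEN>", "events", "https://example.com/events.csv")
print(req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Because it is a plain REST call, the same request is easy to issue from cron jobs, CI pipelines, or any backend language.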
Ingest from your applications via HTTP
Perhaps the simplest way to capture real-time data into Tinybird is using the Events API, a simple HTTP endpoint that enables high-frequency ingestion of JSON records into Tinybird.
Because it’s just an HTTP endpoint, you can invoke the API from any application code. The Events API can handle ingestion at up to 1,000 requests per second and 20+ MB per second, making it scalable enough for most streaming use cases.
Check out the code snippets below for example usage in your favorite language.
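As one example, the Python sketch below batches events as NDJSON (one JSON object per line) and builds the POST request the Events API expects. The token and Data Source name are placeholders; adjust the host to your Tinybird region.

```python
# Hypothetical sketch: send JSON events to the Tinybird Events API.
# The token and Data Source name are placeholders.
import json
from urllib.request import Request

EVENTS_URL = "https://api.tinybird.co/v0/events"

def build_events_request(token: str, datasource: str, events: list) -> Request:
    """Encode a batch of events as NDJSON (one JSON object per line)
    and build the POST request for the Events API."""
    ndjson = "\n".join(json.dumps(e) for e in events).encode("utf-8")
    return Request(
        f"{EVENTS_URL}?name={datasource}",
        data=ndjson,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )

req = build_events_request(
    "<YOUR_TOKEN>",
    "page_views",
    [{"timestamp": "2023-06-01T12:00:00Z", "path": "/pricing"}],
)
# To actually send it: urllib.request.urlopen(req)
```

Batching several events per request, as above, is generally kinder to both your application and the ingestion endpoint than posting one event at a time.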
For more info on building real-time data analytics on top of application data using the Events API, check out these resources:
- Docs - The Tinybird Events API
- Guide - Ingest data into Tinybird with an HTTP request
- Screencast - Stream data with the Tinybird Events API
Query and shape real-time data with SQL
Tinybird offers a delightful interface for building real-time analytics metrics using the SQL you know and love.
With Tinybird Pipes you can break complex queries into chained, composable nodes of SQL. This simplifies the development flow and makes it easy to identify queries that impede performance or increase latency.
Tinybird Pipes also include a robust templating language to extend your query logic beyond SQL and publish dynamic, parameterized endpoints from your queries.
Below are some example code snippets of SQL queries written in Tinybird Pipes for simple real-time analytics use cases.
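For instance, a Pipe datafile chains nodes where each node’s SQL can select from the node before it. The Data Source, column names, and metric below are placeholder assumptions for illustration:

```
# top_pages.pipe — hypothetical example; "page_views" and its columns
# are placeholders

NODE filter_recent
SQL >
    SELECT timestamp, path
    FROM page_views
    WHERE timestamp >= now() - INTERVAL 1 DAY

NODE top_pages
SQL >
    SELECT path, count() AS views
    FROM filter_recent
    GROUP BY path
    ORDER BY views DESC
    LIMIT 10
```

Each node stays small and testable on its own, which is what makes slow or expensive steps easy to spot.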
For more info on building real-time data analytics metrics with Tinybird Pipes, check out these resources:
- Docs - What are Tinybird Pipes?
- Guide - Best practices for faster SQL queries
- Screencast - Create a Tinybird Pipe
- Blog - To the limits of SQL… and beyond
Publish real-time data APIs
Tinybird shines in its publication layer. Whereas other real-time data platforms or technologies may still demand that you build a custom backend to support user-facing applications, Tinybird massively simplifies application development with instant REST API publication from SQL queries.
Every API published from a Tinybird Pipe includes automatically generated, OpenAPI-compatible documentation, security through auth token management, and built-in observability dashboards and APIs to monitor endpoint performance and usage.
Furthermore, Tinybird APIs can be parameterized using a simple templating language. By utilizing the templating language in your SQL queries, you can build robust logic for dynamic API endpoints.
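As a sketch of what that looks like, the templated node below accepts `start_date` and `limit` query parameters with defaults; the Data Source and columns are placeholders, and the `%` marker follows the convention Tinybird uses to flag templated SQL nodes.

```
%
SELECT path, count() AS views
FROM page_views
WHERE timestamp >= {{DateTime(start_date, '2023-01-01 00:00:00')}}
GROUP BY path
ORDER BY views DESC
LIMIT {{Int32(limit, 10)}}
```

Callers can then pass `?start_date=...&limit=...` on the published endpoint, and omitted parameters fall back to the defaults declared in the template.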
To learn more about how to build real-time data analytics APIs with Tinybird, check out these resources:
- Docs - Create APIs in Tinybird
- Screencast - Publish an API from an SQL Pipe
- Guide - Add advanced features to your Tinybird APIs
- Blog - Publish SQL-based API endpoints on NGINX log analytics
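Consuming a published endpoint is then a single HTTP GET. The helper below builds the URL for a published Pipe, returning JSON; the pipe name, token, and parameters are placeholders, and the `/v0/pipes/<name>.json` path follows the pattern in Tinybird’s API docs.

```python
# Hypothetical sketch: build the URL for a published Tinybird API
# Endpoint. Pipe name, token, and parameters are placeholders.
from urllib.parse import urlencode

def endpoint_url(pipe: str, token: str, **params) -> str:
    """Build the URL of a published Pipe endpoint in JSON format,
    passing the auth token and any template parameters as a query string."""
    query = urlencode({"token": token, **params})
    return f"https://api.tinybird.co/v0/pipes/{pipe}.json?{query}"

url = endpoint_url("top_pages", "<YOUR_TOKEN>", limit=5)
print(url)
# Fetch with urllib.request.urlopen(url), requests, fetch(), etc.
```

Because the result is a plain REST URL, the same endpoint is consumable from dashboards, mobile apps, or server-side code without any extra backend.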
Ready to experience the industry-leading real-time data platform? Try Tinybird today, for free. Get started with the Build Plan - which is more than enough for most simple projects and has no time limit - and upgrade as you scale.