---
title: "How to do real-time data processing for modern analytics 2026"
excerpt: "Discover what real-time data processing is, how it works, and which tools you need to analyze and act on data instantly."
authors: "Cameron Archer"
categories: "Scalable Analytics Architecture"
createdOn: "2023-09-25 00:00:00"
publishedOn: "2023-09-07 00:00:00"
updatedOn: "2025-11-10 00:00:00"
status: "published"
---

<p>Data analytics is changing. Batch is out. Real-time is in. And with this shift comes a new mindset, new tools, and new terminology that data engineers have to master.</p><p>Real-time data processing is growing in both importance and adoption among data teams. Its value can’t be overstated, and data engineering and data platform teams are turning to tech and tools that can help them achieve it.</p><p>In this post, I’ll explain what real-time data processing is, why it isn’t what you <em>think</em> it is, and show you some useful reference architectures to help you plan, manage, and build a real-time data processing engine.</p><h2 id="what-is-real-time-data-processing">What is Real-Time Data Processing?</h2><p>Real-time data processing is the practice of filtering, aggregating, enriching, and otherwise transforming real-time data as quickly as it is generated. It follows <a href="https://www.tinybird.co/blog-posts/event-driven-architecture-best-practices-for-databases-and-files" rel="noreferrer">event-driven architecture</a> principles to initiate data processing rules upon event creation. </p><p>Real-time data processing is but one gear in the machinery of real-time data and <a href="https://www.tinybird.co/blog-posts/real-time-analytics-a-definitive-guide">real-time analytics</a>. 
Sitting squarely between <a href="https://www.tinybird.co/blog-posts/real-time-data-ingestion">real-time data ingestion</a> and <a href="https://www.tinybird.co/blog-posts/real-time-data-visualization">real-time visualization</a> (or real-time data automation!), real-time data processing links the engine and the caboose of a real-time data train.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://tinybird-blog.ghost.io/content/images/2023/09/64f8dc49d49bb4f291074277_w05vO8lDmKQPZ_aPmq0AjJ5OlH6oVMds0TE49SftWSx-unZsqFiD2phaz4llNCr1qGoFA8_NehAa3YPqgvvE8vEx4u55T_50Wji_GP7ADs7wH_94g2lpZ2PS0fPBPhLt2vHqEfkRztalia5Wo0x1G4s-7.png" class="kg-image" alt="A 3-car train. The engine is labeled &quot;real-time ingestion&quot;, the middle car is labeled &quot;real-time processing&quot;, and the caboose is labeled &quot;real-time visualization&quot;" loading="lazy" width="1600" height="585" srcset="https://tinybird-blog.ghost.io/content/images/size/w600/2023/09/64f8dc49d49bb4f291074277_w05vO8lDmKQPZ_aPmq0AjJ5OlH6oVMds0TE49SftWSx-unZsqFiD2phaz4llNCr1qGoFA8_NehAa3YPqgvvE8vEx4u55T_50Wji_GP7ADs7wH_94g2lpZ2PS0fPBPhLt2vHqEfkRztalia5Wo0x1G4s-7.png 600w, https://tinybird-blog.ghost.io/content/images/size/w1000/2023/09/64f8dc49d49bb4f291074277_w05vO8lDmKQPZ_aPmq0AjJ5OlH6oVMds0TE49SftWSx-unZsqFiD2phaz4llNCr1qGoFA8_NehAa3YPqgvvE8vEx4u55T_50Wji_GP7ADs7wH_94g2lpZ2PS0fPBPhLt2vHqEfkRztalia5Wo0x1G4s-7.png 1000w, https://tinybird-blog.ghost.io/content/images/2023/09/64f8dc49d49bb4f291074277_w05vO8lDmKQPZ_aPmq0AjJ5OlH6oVMds0TE49SftWSx-unZsqFiD2phaz4llNCr1qGoFA8_NehAa3YPqgvvE8vEx4u55T_50Wji_GP7ADs7wH_94g2lpZ2PS0fPBPhLt2vHqEfkRztalia5Wo0x1G4s-7.png 1600w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">All aboard the real-time train!</span></figcaption></figure><p>But don’t assume that just because it’s the middle child of the real-time data family it should go unnoticed by you and your team.</p><p>Mixed metaphors aside, the systems and 
tools that enable real-time data processing within <a href="https://www.tinybird.co/blog-posts/real-time-streaming-data-architectures-that-scale">real-time streaming data architectures</a> can quickly become bottlenecks. They’re tasked with maintaining data freshness on incoming data, ultra-low query latency on outgoing data, and high user concurrency while processing bigger and bigger “big data”.</p><p>If you’re <a href="https://www.tinybird.co/blog-posts/real-time-dashboard-step-by-step">building a real-time analytics dashboard</a> that needs to display milliseconds-old data with millisecond query latency for thousands of concurrent users, your real-time data processing infrastructure must be able to scale.</p><h3 id="what-is-real-time-data">What is real-time data?</h3><p>Real-time data has 3 qualities:</p><ol><li><strong>It’s fresh. </strong>Real-time data should be made available to downstream use cases and consumers within seconds (if not milliseconds) of its creation. This is sometimes referred to as “end-to-end latency”.</li><li><strong>It’s fast. </strong>Real-time data queries must have a “query response latency” in milliseconds regardless of complexity. Filters, aggregates, and joins are all on the table when you’re building <a href="https://www.tinybird.co/blog-posts/real-time-analytics-a-definitive-guide">real-time analytics</a>, and complex queries can’t slow you down. Why? Because real-time analytics often integrate with user-facing products, and queries that take seconds or more will dramatically degrade the user experience.</li><li><strong>It’s highly concurrent. </strong>Real-time data will almost always be accessed by many users at once. We aren’t building data pipelines for a handful of executives browsing Looker dashboards. 
We’re building in-product analytics, <a href="https://www.tinybird.co/blog-posts/real-time-personalization">real-time personalization</a>, <a href="https://www.tinybird.co/blog-posts/how-to-build-a-real-time-fraud-detection-system">real-time fraud detection</a>, and many more user-facing features. Real-time data is meant for the masses, so it needs to scale.</li></ol><h3 id="real-time-vs-batch-processing">Real-time vs. Batch Processing</h3><p>Real-time data processing and batch processing are fundamentally different ways of handling data. Real-time data processing handles data as soon as possible, ingesting, transforming, and exposing data products as soon as new data events are generated.</p><p>In contrast, batch processing handles data on some periodic schedule, using ETL/ELT workflows to occasionally extract data from source systems, transform it, and load it into things like <a href="https://www.tinybird.co/blog-posts/why-data-warehouses">cloud data warehouses</a>.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://tinybird-blog.ghost.io/content/images/2023/09/64f8dc493a185fef0e6e948f_mIz4fmcfBlP9vpvQ0QPIbdJ4Snto51yaNq6IuiPiUoL0AcJy8AviytoX0QE1zHTr4wbwFKhKqaxeEN53qpytTXVvwfY4qxSkHLAZctBjUrIaTpMyWz9UNFsSttpuQWX28TKRSp3hZvvDbYuBlTV2Xbk-8.png" class="kg-image" alt="A diagram showing how data processing is different for real-time analytics vs batch analytics." 
loading="lazy" width="1600" height="920" srcset="https://tinybird-blog.ghost.io/content/images/size/w600/2023/09/64f8dc493a185fef0e6e948f_mIz4fmcfBlP9vpvQ0QPIbdJ4Snto51yaNq6IuiPiUoL0AcJy8AviytoX0QE1zHTr4wbwFKhKqaxeEN53qpytTXVvwfY4qxSkHLAZctBjUrIaTpMyWz9UNFsSttpuQWX28TKRSp3hZvvDbYuBlTV2Xbk-8.png 600w, https://tinybird-blog.ghost.io/content/images/size/w1000/2023/09/64f8dc493a185fef0e6e948f_mIz4fmcfBlP9vpvQ0QPIbdJ4Snto51yaNq6IuiPiUoL0AcJy8AviytoX0QE1zHTr4wbwFKhKqaxeEN53qpytTXVvwfY4qxSkHLAZctBjUrIaTpMyWz9UNFsSttpuQWX28TKRSp3hZvvDbYuBlTV2Xbk-8.png 1000w, https://tinybird-blog.ghost.io/content/images/2023/09/64f8dc493a185fef0e6e948f_mIz4fmcfBlP9vpvQ0QPIbdJ4Snto51yaNq6IuiPiUoL0AcJy8AviytoX0QE1zHTr4wbwFKhKqaxeEN53qpytTXVvwfY4qxSkHLAZctBjUrIaTpMyWz9UNFsSttpuQWX28TKRSp3hZvvDbYuBlTV2Xbk-8.png 1600w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Real-time analytics and batch analytics use different methods of data processing.</span></figcaption></figure><p>The differences between real-time data processing and batch data processing should be apparent simply by name: One happens in “real-time”, and the other happens in “batches”. Beyond their names, some important points draw a thick line between the two:</p><ol><li><strong>Data Ingestion. </strong>Real-time data processing and batch processing use very different data ingestion patterns. <br><br>Real-time data processing follows <a href="https://www.tinybird.co/blog-posts/event-driven-architecture-best-practices-for-databases-and-files">event-driven architectural patterns</a>. Data processing is triggered as soon as events are generated. As much as possible, real-time data processing systems avoid data ingestion patterns that temporarily store event data in upstream systems (though <a href="https://www.tinybird.co/blog-posts/real-time-change-data-capture">real-time change data capture</a> workflows can sometimes be a unique exception). 
<br><br>Batch data processing, on the other hand, generally requires that events be placed in an upstream database, data warehouse, or object storage system. This data is occasionally retrieved and processed on a schedule.</li><li><strong>Data Tooling. </strong>Real-time data processing and batch processing use very different toolsets to achieve their aims.<br><br>Real-time data processing relies on event streaming platforms like <a href="https://github.com/apache/kafka">Apache Kafka</a> and <a href="https://www.confluent.io/">Confluent</a>, <a href="https://www.tinybird.co/blog-posts/how-to-set-up-event-based-ingestion-of-files-in-s3-for-free#creating-event-driven-file-ingestion-with-s3-and-lambda">serverless functions like AWS Lambda</a>, or <a href="https://www.tinybird.co/blog-posts/real-time-change-data-capture">real-time change data capture</a> for data ingestion and to trigger processing workflows. These systems use stream processing engines like <a href="https://github.com/apache/flink">Apache Flink</a> (or derivatives thereof, e.g., <a href="https://www.decodable.co/">Decodable</a>) to process data in motion and/or real-time databases (specifically those that support <a href="https://www.tinybird.co/blog-posts/what-are-materialized-views-and-why-do-they-matter-for-realtime">real-time materialized views</a>) like <a href="https://github.com/clickhouse/clickhouse">ClickHouse®</a> to perform complex filters, aggregations, and enrichments with minimal latency.<br><br>Batch data processing, on the other hand, uses orchestrators or schedulers like <a href="https://github.com/apache/airflow">Airflow</a>, <a href="https://dagster.io/">Dagster</a>, or <a href="https://www.prefect.io/">Prefect</a> to occasionally run Python code or Spark jobs that retrieve data from a source system (sometimes <a href="https://www.tinybird.co/blog-posts/event-driven-architecture-best-practices-for-databases-and-files">inefficiently</a>), load the data into a cloud data warehouse, and 
transform it within the warehouse using tools like <a href="https://github.com/dbt-labs/dbt-core">dbt</a>.</li><li><strong>Access Modalities. </strong>Real-time data processing and batch data processing serve different purposes for different users, and the way they expose data to downstream consumers is quite different.<br><br>Real-time data processing generally supports user-facing features that demand low-latency data access for many concurrent users. It’s designed for operational decision-making, <a href="https://www.tinybird.co/blog-posts/real-time-data-visualization">real-time visualizations</a>, <a href="https://www.tinybird.co/blog-posts/real-time-analytics-a-definitive-guide">real-time analytics</a>, and automation.<br><br>Batch processing supports long-running analytical queries that don’t require low latency and that serve only a few business intelligence or data science consumers. It’s designed for strategic decision-making and long-term forecasting.<br></li></ol><h3 id="real-time-data-processing-vs-stream-processing">Real-Time Data Processing vs. Stream Processing</h3><p>Real-time data processing and stream processing are not the same. Stream processing is a subset of real-time data processing that deals with limited state and short time windows. Real-time data processing encompasses data processing with large state over unbounded time windows using <a href="https://www.tinybird.co/blog-posts/real-time-databases-what-developers-need-to-know">real-time databases</a> that support <a href="https://www.tinybird.co/blog-posts/real-time-data-ingestion">high-frequency ingestion</a>, <a href="https://www.tinybird.co/blog-posts/what-are-materialized-views-and-why-do-they-matter-for-realtime">incremental materialized views</a>, and low-latency queries.</p><p>The core difference between real-time data processing and stream processing is that real-time data processing is optimized for large volumes of data stored over long periods. 
</p><p>Stream processing engines like Apache Flink or ksqlDB <a href="https://www.tinybird.co/blog-posts/ksqldb-alternative">struggle to transform data over unbounded time windows or with high cardinality</a>. Real-time data processing leverages a full OLAP to run transformations over unbounded time windows on data with many fields that have potentially high cardinality.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://tinybird-blog.ghost.io/content/images/2023/09/64f8dc4ae1401b7d47ad84b2_xgch4y-j5xzoL_0exywb0o-AjrweWH2ubvAWDJfBvdX8K0zf6z7q0bhqPIXhEL96WssA74Rbngt2OQQCc5I1BceA_tjs15F_3Dem4b48T-aWhz1poOpA3zeyBGnkOqF-inY-Je_-6MDoideCXpOXNR0-8.png" class="kg-image" alt="" loading="lazy" width="1600" height="1130" srcset="https://tinybird-blog.ghost.io/content/images/size/w600/2023/09/64f8dc4ae1401b7d47ad84b2_xgch4y-j5xzoL_0exywb0o-AjrweWH2ubvAWDJfBvdX8K0zf6z7q0bhqPIXhEL96WssA74Rbngt2OQQCc5I1BceA_tjs15F_3Dem4b48T-aWhz1poOpA3zeyBGnkOqF-inY-Je_-6MDoideCXpOXNR0-8.png 600w, https://tinybird-blog.ghost.io/content/images/size/w1000/2023/09/64f8dc4ae1401b7d47ad84b2_xgch4y-j5xzoL_0exywb0o-AjrweWH2ubvAWDJfBvdX8K0zf6z7q0bhqPIXhEL96WssA74Rbngt2OQQCc5I1BceA_tjs15F_3Dem4b48T-aWhz1poOpA3zeyBGnkOqF-inY-Je_-6MDoideCXpOXNR0-8.png 1000w, https://tinybird-blog.ghost.io/content/images/2023/09/64f8dc4ae1401b7d47ad84b2_xgch4y-j5xzoL_0exywb0o-AjrweWH2ubvAWDJfBvdX8K0zf6z7q0bhqPIXhEL96WssA74Rbngt2OQQCc5I1BceA_tjs15F_3Dem4b48T-aWhz1poOpA3zeyBGnkOqF-inY-Je_-6MDoideCXpOXNR0-8.png 1600w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Streaming analytics utilizes stream processing engines, whereas real-time analytics leverages a full OLAP.</span></figcaption></figure><p>Real-time data processing utilizes a <a href="https://www.tinybird.co/blog-posts/real-time-databases-what-developers-need-to-know">real-time database</a> to either store transformations in real-time within <a 
href="https://www.tinybird.co/blog-posts/what-are-materialized-views-and-why-do-they-matter-for-realtime">materialized views</a> or maintain long histories of raw data sets that can be accessed at query time. The choice of a highly optimized, <a href="https://www.tinybird.co/blog-posts/what-is-a-columnar-database" rel="noreferrer">columnar, OLAP storage</a> enables low query latency even for complex analytics over large amounts of data.</p><h2 dir="ltr"><span>How Real-Time Data Processing Actually Works</span></h2>
<p dir="ltr">Real-time data processing isn’t magic. It’s a sequence of fast,
  predictable steps that happen the moment new data appears.&nbsp;</p>
<p dir="ltr">Every millisecond matters, so each component in the chain has to be
  tuned for speed and scale.</p>
<h3 dir="ltr">From event capture to action</h3>
<p dir="ltr">At its core, a real-time pipeline starts when an event is created.
  A user clicks a link, a sensor sends a reading, or a payment is
  approved.&nbsp;</p>
<p dir="ltr">Efficient <a
    href="https://www.tinybird.co/blog-posts/real-time-data-ingestion">real-time
    data ingestion</a> ensures that these inputs are captured and
  normalized the instant they occur.</p>
<p dir="ltr">That event enters the system through an event bus or a connector
  and is instantly normalized—timestamped, structured, and ready for processing.
</p>
<p dir="ltr">Once standardized, events are enriched with context: metadata like
  user ID, location, or device type. This makes the data meaningful, not just
  fast.</p>
<p dir="ltr">From there, processing engines apply transformation logic in
  motion. They aggregate, filter, or join incoming streams before the data ever
  touches disk. The result is an end-to-end workflow that runs continuously, not
  periodically.</p>
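<p dir="ltr">The capture, normalize, and aggregate steps above can be sketched in
  a few lines of Python. This is a minimal illustration, not a production
  pipeline: the event fields (<code>user</code>, <code>action</code>,
  <code>ts</code>) and the function names are assumptions made up for the
  example, and a plain list stands in for the event bus.</p>

```python
from collections import Counter
from datetime import datetime, timezone


def normalize(raw: dict) -> dict:
    """Coerce a raw event into one consistent, timestamped shape."""
    ts = datetime.fromtimestamp(raw["ts"], tz=timezone.utc)
    return {
        "user_id": str(raw["user"]),
        "action": raw["action"].strip().lower(),
        "ts": ts.isoformat(),
    }


def running_counts(raw_events):
    """Yield an updated per-action count after every event (no batch step)."""
    counts = Counter()
    for raw in raw_events:
        event = normalize(raw)
        counts[event["action"]] += 1
        yield event["action"], counts[event["action"]]


# A list stands in for the stream; a real system would read from an event bus.
raw_events = [
    {"user": 1, "action": "Click ", "ts": 0},
    {"user": 2, "action": "view", "ts": 1},
    {"user": 1, "action": "click", "ts": 2},
]
updates = list(running_counts(raw_events))
```

<p dir="ltr">Because <code>running_counts</code> is a generator, each result is
  available as soon as its event has been handled, which is the property that
  separates continuous processing from a scheduled job.</p>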
<h3 dir="ltr">In-memory computation and ultra-low latency</h3>
<p dir="ltr">Real-time systems keep as much of the hot path in memory as they
  can, so processing doesn’t stall on disk writes, index maintenance, or stored
  procedures.&nbsp;</p>
<p dir="ltr">They use RAM as the active compute layer, which keeps processing
  latency in the millisecond range or below instead of seconds. Understanding <a
    href="https://www.cisco.com/site/us/en/learn/topics/cloud-networking/what-is-low-latency.html">what
    low latency means</a> is key to meeting these real-time performance
  goals.</p>
<p dir="ltr">In-memory processing lets you evaluate millions of events per
  second while still supporting complex operations like time-windowed
  aggregations or anomaly detection. Hot data lives in memory for immediate
  analytics, while colder data moves to storage for historical queries.&nbsp;
</p>
<p dir="ltr">This “hot/cold” split is how modern systems keep both speed and
  cost under control.</p>
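<p dir="ltr">One way to picture the hot/cold split is a store that keeps a
  bounded time window in memory and evicts older events to a cheaper tier. The
  sketch below is illustrative only: the class name, the 60-second window, and
  the Python list standing in for cold storage are all assumptions, and it
  expects events to arrive in timestamp order.</p>

```python
from collections import deque


class HotColdStore:
    """Keep recent events in memory and move older ones to a cold tier."""

    def __init__(self, hot_seconds: float):
        self.hot_seconds = hot_seconds
        self.hot = deque()   # (timestamp, value) pairs, oldest first
        self.cold = []       # stand-in for disk or object storage
        self.hot_sum = 0.0   # running aggregate over the hot window

    def add(self, ts: float, value: float) -> None:
        self.hot.append((ts, value))
        self.hot_sum += value
        # Evict everything that has aged out of the hot window.
        while self.hot and self.hot[0][0] <= ts - self.hot_seconds:
            old = self.hot.popleft()
            self.hot_sum -= old[1]
            self.cold.append(old)


store = HotColdStore(hot_seconds=60)
for ts, value in [(0, 5.0), (30, 7.0), (61, 1.0), (95, 2.0)]:
    store.add(ts, value)
```

<p dir="ltr">The running sum is adjusted on insert and on eviction, so reading
  the current window aggregate never requires rescanning the window.</p>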
<h3 dir="ltr">Stateful correlation and composite events</h3>
<p dir="ltr">Events rarely matter on their own. What matters is how they relate
  to one another.</p>
<p dir="ltr">A single login attempt is harmless. Ten failed logins followed by a
  large transfer might not be. Real-time processors keep track of state so they
  can correlate related events over time.&nbsp;</p>
<p dir="ltr">When multiple signals combine into a pattern, the system emits a
  composite event—a higher-level insight that other services can act on
  instantly.</p>
<p dir="ltr">The same principle applies to missing data. If an expected
  heartbeat from a sensor doesn’t arrive, that absence itself becomes an event.
  Real-time pipelines can detect what happened, and what didn’t, equally fast.
</p>
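<p dir="ltr">Both ideas, correlating related events and treating a missing
  event as a signal, can be sketched with a small amount of per-key state. The
  thresholds, event shapes, and names below are hypothetical and chosen only to
  mirror the login and heartbeat examples above.</p>

```python
from collections import defaultdict

FAILED_LOGIN_THRESHOLD = 10   # assumed threshold for this example
LARGE_TRANSFER = 10_000       # assumed amount for this example


class Correlator:
    """Track failed logins per user and flag a large transfer that follows."""

    def __init__(self):
        self.failed_logins = defaultdict(int)  # per-user state

    def on_event(self, event: dict):
        """Return a composite event when the pattern completes, else None."""
        user = event["user"]
        if event["type"] == "login_failed":
            self.failed_logins[user] += 1
        elif event["type"] == "transfer":
            if (self.failed_logins[user] >= FAILED_LOGIN_THRESHOLD
                    and event["amount"] >= LARGE_TRANSFER):
                self.failed_logins[user] = 0  # reset once the pattern fires
                return {"type": "suspicious_transfer", "user": user}
        return None


def missing_heartbeats(last_seen: dict, now: float, timeout: float) -> list:
    """Turn the absence of a heartbeat into an event for each silent sensor."""
    return [
        {"type": "heartbeat_missed", "sensor": sensor}
        for sensor, ts in sorted(last_seen.items())
        if now - ts > timeout
    ]
```

<p dir="ltr">A real deployment would also expire old state, for example by
  only counting failed logins within a recent time window.</p>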
<h3 dir="ltr">Filtering noise and enriching meaning</h3>
<p dir="ltr">Not all data deserves attention. Filtering reduces volume and
  noise, allowing critical signals to pass through while irrelevant ones drop
  away.</p>
<p dir="ltr">At the same time, enrichment adds context that raw streams
  lack.&nbsp;</p>
<p dir="ltr">Real-time systems can look up reference data, map IDs to names, or
  attach risk scores in motion. Together, filtering and enrichment keep
  pipelines both fast and intelligent.</p>
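<p dir="ltr">As a rough sketch of filtering and enrichment working together,
  the generator below drops low-value events and attaches reference data to the
  rest. The lookup tables, field names, and default risk score are invented for
  illustration; in practice the reference data might live in a dimension table
  or a cache.</p>

```python
# Hypothetical reference data used for enrichment.
MERCHANT_NAMES = {"m1": "Acme Books", "m2": "Globex Travel"}
MERCHANT_RISK = {"m1": 0.1, "m2": 0.7}


def filter_and_enrich(events, min_amount: float = 1.0):
    """Drop low-value noise, then attach merchant name and risk score."""
    for event in events:
        if event["amount"] < min_amount:
            continue  # filtered out before any enrichment work is done
        merchant = event["merchant_id"]
        yield {
            **event,
            "merchant_name": MERCHANT_NAMES.get(merchant, "unknown"),
            "risk_score": MERCHANT_RISK.get(merchant, 0.5),
        }


events = [
    {"merchant_id": "m1", "amount": 0.5},
    {"merchant_id": "m2", "amount": 120.0},
    {"merchant_id": "m9", "amount": 15.0},
]
enriched = list(filter_and_enrich(events))
```

<p dir="ltr">Filtering first means the lookups only run for events that will
  actually be kept.</p>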
<h3 dir="ltr">Continuous decisions and instant reactions</h3>
<p dir="ltr">Once a rule or pattern triggers, action follows immediately. It
  might be an alert, an API call, or an automated workflow. These reactions are
  often programmatic, defined by business logic that can change on the fly
  without redeploying code.</p>
<p dir="ltr">Every processed event feeds into the next decision. The output of
  one rule can become the input of another, forming a self-reinforcing network
  of automation.&nbsp;</p>
<p dir="ltr">This is the heartbeat of modern data products: sense, decide, and
  act—all before the user notices.</p>
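<p dir="ltr">The idea that one rule’s output becomes another rule’s input can
  be sketched as a small loop that feeds derived events back through the rule
  set. The rules, event types, and the <code>max_events</code> safety cap are
  assumptions for this example, not a reference to any particular rules
  engine.</p>

```python
from collections import deque


def high_temp(event):
    """Emit an overheat event for hot sensor readings."""
    if event["type"] == "reading" and event["celsius"] > 90:
        return {"type": "overheat", "device": event["device"]}
    return None


def shutdown(event):
    """React to an overheat event by requesting a shutdown."""
    if event["type"] == "overheat":
        return {"type": "shutdown_requested", "device": event["device"]}
    return None


RULES = [high_temp, shutdown]


def run_rules(event, rules=RULES, max_events=100):
    """Feed each derived event back through the rules until nothing new fires."""
    queue, emitted = deque([event]), []
    while queue and len(emitted) < max_events:
        current = queue.popleft()
        for rule in rules:
            derived = rule(current)
            if derived is not None:
                emitted.append(derived)
                queue.append(derived)
    return emitted
```

<p dir="ltr">Because the rules are plain entries in a list, they could be
  loaded from configuration and swapped at runtime. The cap on emitted events
  guards against rules that trigger each other indefinitely.</p>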
<h3 dir="ltr">Blending live and historical data</h3>
<p dir="ltr">Real-time doesn’t mean ignoring the past. Some of the most powerful
  systems combine streaming data with historical baselines.&nbsp;</p>
<p dir="ltr">They compare current activity against weeks or months of history to
  detect deviations or predict outcomes.</p>
<p dir="ltr">This hybrid view—fresh plus historical—enables use cases like live
  dashboards, demand forecasting, and fraud scoring.&nbsp;</p>
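<p dir="ltr">A simple way to compare live activity against a historical
  baseline is a z-score: how many standard deviations the current value sits
  from the historical mean. The sketch below assumes the history fits in a list
  and uses a threshold of 3, both of which are illustrative choices rather than
  recommendations.</p>

```python
from statistics import mean, stdev


def deviation_score(history: list, current: float) -> float:
    """Standard deviations between the current value and the baseline."""
    baseline, spread = mean(history), stdev(history)
    if spread == 0:
        # A flat history: any change at all is an infinite deviation.
        return 0.0 if current == baseline else float("inf")
    return (current - baseline) / spread


def is_anomalous(history: list, current: float, threshold: float = 3.0) -> bool:
    """Flag values that deviate from the baseline by more than the threshold."""
    return abs(deviation_score(history, current)) > threshold


history = [100, 102, 98, 101, 99]   # e.g. requests per minute in past weeks
```

<p dir="ltr">In a real-time database, the mean and spread would typically be
  maintained incrementally rather than recomputed from raw history on every
  event.</p>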
<p dir="ltr">Tools for <a
    href="https://www.tinybird.co/blog-posts/real-time-data-visualization">real-time
    data visualization</a> make these insights accessible and
  actionable.</p>
<p dir="ltr">It’s the difference between seeing what’s happening and
  understanding why it’s happening.</p>
<p dir="ltr">To efficiently manage stored event data alongside streaming
  workloads, many teams rely on modern databases like <a
    href="https://www.ibm.com/think/topics/postgresql">PostgreSQL</a>, which
  offer strong consistency, extensibility, and integration with real-time
  analytics frameworks.</p>
<hr>
<h2 dir="ltr">Architectural Principles for Reliable Real-Time Platforms</h2>
<p dir="ltr">Real-time data systems don’t just need to be fast. They need to be
  reliable, consistent, and scalable under unpredictable workloads. Here’s what
  makes that possible.</p>
<h3 dir="ltr">From “store–analyze–act” to “analyze–act–store”</h3>
<p dir="ltr">Batch systems follow a simple order: collect, store, analyze, act.
  Real-time pipelines flip it around. They analyze and act first, then store
  results afterward for compliance or retrospection.</p>
<p dir="ltr">This inversion eliminates decision latency. Actions happen while
  the data is still in flight, not minutes or hours later. It’s the shift from
  after-the-fact insight to immediate response.</p>
<h3 dir="ltr">Elastic scaling and fault tolerance</h3>
<p dir="ltr">Traffic spikes. Sensors flood. Campaigns go viral. Real-time
  platforms absorb all of it through elastic scaling—adding compute and memory
  on demand as volume grows.</p>
<p dir="ltr">Designing <a
    href="https://www.tinybird.co/blog-posts/real-time-streaming-data-architectures-that-scale">real-time
    streaming data architectures that scale</a> is essential to
  maintain consistent performance under variable workloads.</p>
<p dir="ltr">Failures are inevitable, but downtime isn’t. Distributed
  coordination, automatic retries, and checkpointing keep streams alive even
  when nodes fail. Exactly-once processing semantics, usually built from
  at-least-once delivery plus deduplication or transactional writes, ensure
  that each event affects results only once and that none are lost along the
  way.</p>
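<p dir="ltr">Retries mean the same event can arrive more than once, so the
  consumer has to make duplicates harmless. One common approach is to
  deduplicate on an event ID. The sketch below is a simplified illustration:
  the class, the <code>id</code> and <code>amount</code> fields, and the
  in-memory set are assumptions, and a real system would keep that set durable
  and bounded.</p>

```python
class IdempotentAccount:
    """Apply each event at most once, even if it is delivered repeatedly."""

    def __init__(self):
        self.balance = 0
        self.applied_ids = set()  # IDs of events already applied

    def apply(self, event: dict) -> bool:
        """Return True if the event changed state, False if it was a duplicate."""
        if event["id"] in self.applied_ids:
            return False  # redelivered by a retry: ignore it
        self.balance += event["amount"]
        self.applied_ids.add(event["id"])
        return True


account = IdempotentAccount()
deliveries = [
    {"id": "e1", "amount": 50},
    {"id": "e2", "amount": 25},
    {"id": "e1", "amount": 50},   # same event, redelivered after a retry
]
applied = [account.apply(e) for e in deliveries]
```

<p dir="ltr">The balance ends up the same no matter how many times
  <code>e1</code> is redelivered, which is what makes retrying safe.</p>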
<h3 dir="ltr">Ordering guarantees and consistency</h3>
<p dir="ltr">Order matters. Processing events out of sequence can destroy
  meaning. Real-time systems preserve temporal ordering through sequence IDs,
  partitioned topics, and consistent clocks.</p>
<p dir="ltr">Combined with idempotent operations, this guarantees that results
  stay accurate even during retries or restarts.</p>
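<p dir="ltr">To show how sequence IDs restore order, here is a minimal reorder
  buffer that holds back early arrivals until the gap before them is filled.
  It’s a sketch under simple assumptions: one producer, sequence numbers
  starting at zero with no permanent gaps, and names invented for the
  example.</p>

```python
import heapq


class ReorderBuffer:
    """Release events strictly in sequence order, holding back early arrivals."""

    def __init__(self, first_seq: int = 0):
        self.next_seq = first_seq
        self.pending = []  # min-heap of (seq, payload)

    def push(self, seq: int, payload: str) -> list:
        """Accept one event and return any payloads that are now in order."""
        if seq < self.next_seq:
            return []  # already released: a retry that is safe to drop
        heapq.heappush(self.pending, (seq, payload))
        released = []
        while self.pending and self.pending[0][0] <= self.next_seq:
            s, p = heapq.heappop(self.pending)
            if s == self.next_seq:  # skip duplicates still sitting in the heap
                released.append(p)
                self.next_seq += 1
        return released


buf = ReorderBuffer()
out = [buf.push(1, "b"), buf.push(0, "a"), buf.push(1, "b"), buf.push(2, "c")]
```

<p dir="ltr">Production systems usually add a timeout so that one lost event
  can’t hold back everything behind it forever.</p>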
<h3 dir="ltr">Observability and replay</h3>
<p dir="ltr">You can’t fix what you can’t see. Observability gives engineers
  real-time visibility into throughput, latency, and stream health. Tracing
  tools follow a single event through every hop in the pipeline, making
  debugging fast and precise.</p>
<p dir="ltr">Sometimes you need to go back in time. Replay mechanisms let teams
  reprocess historical streams to apply new logic or recover lost state.
  Checkpointing ensures the system can resume exactly where it left off after a
  failure.</p>
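<p dir="ltr">Checkpointing and replay both rely on the same idea: the stream
  is an append-only log, and the consumer remembers an offset into it. The
  following sketch uses a Python list as the log and hypothetical names
  throughout; a real consumer would persist the checkpoint somewhere
  durable.</p>

```python
class Consumer:
    """Process an append-only log starting from a checkpointed offset."""

    def __init__(self, log: list, transform):
        self.log = log
        self.transform = transform
        self.checkpoint = 0   # offset of the next unprocessed event
        self.results = []

    def run(self, max_events=None) -> None:
        """Process from the checkpoint, optionally stopping after max_events."""
        end = len(self.log)
        if max_events is not None:
            end = min(end, self.checkpoint + max_events)
        for offset in range(self.checkpoint, end):
            self.results.append(self.transform(self.log[offset]))
            self.checkpoint = offset + 1  # a real system would persist this

    def replay(self, transform, from_offset: int = 0) -> None:
        """Reprocess history with new logic, starting from an earlier offset."""
        self.transform = transform
        self.checkpoint = from_offset
        self.results = self.results[:from_offset]
        self.run()


log = [1, 2, 3, 4]
consumer = Consumer(log, transform=lambda x: x * 2)
consumer.run(max_events=2)     # simulate stopping partway through
partial = list(consumer.results)
consumer.run()                 # resume from the checkpoint
resumed = list(consumer.results)
consumer.replay(lambda x: x + 1)   # apply new logic to the full history
```

<p dir="ltr">Resuming picks up at the saved offset without reprocessing
  earlier events, while replay deliberately rewinds the offset to apply new
  logic.</p>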
<h3 dir="ltr">Correlation and pattern detection at scale</h3>
<p dir="ltr">Real-time pipelines can detect patterns that span thousands of
  concurrent streams. They don’t just count or average—they understand context.
</p>
<p dir="ltr">By correlating events across time and source, they can surface
  anomalies, detect trends, and trigger actions automatically. This is how
  systems identify DDoS attacks, sudden traffic drops, or behavioral shifts as
  they happen, not after.</p>
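<p dir="ltr">As one concrete example of pattern detection, a sliding-window
  counter per source can flag a sudden burst of requests. This is only a
  sketch: real DDoS detection looks at many more signals, and the window size,
  limit, and class name here are arbitrary assumptions.</p>

```python
from collections import defaultdict, deque


class RateDetector:
    """Flag any source that exceeds max_requests within a sliding time window."""

    def __init__(self, window_seconds: float, max_requests: int):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self.seen = defaultdict(deque)  # source -> timestamps inside the window

    def observe(self, source: str, ts: float) -> bool:
        """Record one request and return True if the source is over the limit."""
        window = self.seen[source]
        window.append(ts)
        # Drop timestamps that have fallen out of the window.
        while window and window[0] <= ts - self.window_seconds:
            window.popleft()
        return len(window) > self.max_requests


detector = RateDetector(window_seconds=10, max_requests=3)
flags = [detector.observe("1.2.3.4", t) for t in [0, 1, 2, 3, 20]]
```

<p dir="ltr">State is kept per source, so thousands of streams can be tracked
  independently, and each source’s count drops again once its burst ages out
  of the window.</p>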
<h3 dir="ltr">Event-driven automation loops</h3>
<p dir="ltr">Every event can lead to another. A system alert can trigger a
  scaling operation; a fraud flag can block a transaction; a new data point can
  refresh a dashboard.</p>
<p dir="ltr">These event-driven loops connect analytics directly to automation.
  Over time, they evolve into self-optimizing systems where outcomes
  continuously adjust to live feedback.</p>
<h3 dir="ltr">Security and governance in motion</h3>
<p dir="ltr">Processing live data requires strict control. Encryption,
  authentication, and fine-grained access permissions must operate as fast as
  the data itself.</p>
<p dir="ltr">Compliance demands auditability—the ability to trace how every
  event was used and when. Real-time doesn’t mean reckless; it means secure by
  design.</p>
<h3 dir="ltr">Predictive and proactive intelligence</h3>
<p dir="ltr">The logical next step after reacting in real time is predicting
  what comes next. By combining streaming data with online machine learning,
  systems can detect trends early and automate preemptive actions.</p>
<p dir="ltr">Predictive fraud detection, demand forecasting, and maintenance
  alerts are all outcomes of this approach. It’s the point where “real-time”
  becomes “ahead of time.”</p>
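<p dir="ltr">Online learning means the model updates with every new
  observation instead of being retrained in batches. As a deliberately simple
  stand-in for an online machine learning model, the sketch below uses an
  exponentially weighted moving average as a one-step-ahead forecast. The class
  name and the smoothing factor are assumptions for illustration.</p>

```python
class OnlineForecaster:
    """Exponentially weighted moving average, updated one observation at a time."""

    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha      # weight given to the newest observation
        self.estimate = None    # current forecast for the next value

    def update(self, value: float) -> float:
        """Fold one new observation into the forecast and return it."""
        if self.estimate is None:
            self.estimate = value
        else:
            self.estimate = self.alpha * value + (1 - self.alpha) * self.estimate
        return self.estimate


forecaster = OnlineForecaster(alpha=0.5)
forecasts = [forecaster.update(v) for v in [10.0, 20.0, 30.0]]
```

<p dir="ltr">Each update is constant-time and needs no stored history, which
  is why this style of model fits naturally into a streaming pipeline.</p>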
<h3 dir="ltr">Building trust in milliseconds</h3>
<p dir="ltr">Reliable real-time architectures build trust through consistency.
  Every query, every event, every user interaction happens fast and
  correctly—every single time.</p>
<p dir="ltr">This is how modern data teams deliver experiences that feel
  instantaneous and dependable, whether it’s a dashboard updating live or an API
  returning data that’s only milliseconds old.</p><br><h3 id="examples-of-real-time-data-processing">Examples of Real-Time Data Processing</h3><p>Some examples of real-time data processing include:</p><ul><li>Real-time fraud detection</li><li>Real-time personalization</li><li>Real-time marketing campaign optimization</li><li>Real-time anomaly detection</li></ul><p>Real-time fraud detection is a great example of real-time data processing. A <a href="https://www.tinybird.co/blog-posts/how-to-build-a-real-time-fraud-detection-system">real-time fraud detection engine</a> ingests a financial transaction event, compares transaction metadata against historical data (sometimes using online machine learning) to make a fraud determination, and exposes its determination to the point of sale or to ATMs within milliseconds.</p>
<!--kg-card-begin: html-->
<blockquote>Real-time fraud detection is a great example of real-time data processing at work.</blockquote>
<!--kg-card-end: html-->
<p>This is distinctly a real-time data processing example because of its requirement for maintaining state. Real-time fraud detection systems must maintain long histories of transaction data which are used to train and update <a href="https://www.tinybird.co/blog-posts/using-tinybird-as-a-serverless-online-feature-store">online feature stores</a> or <a href="https://www.tinybird.co/blog-posts/anomaly-detection">heuristical models</a> that perform the real-time data processing as transactions stream in. They support <a href="https://www.tinybird.co/blog-posts/real-time-data-ingestion">streaming data ingestion</a> and low-latency, high-concurrency access.</p><h2 id="real-use-cases-for-real-time-data-processing">Real use cases for real-time data processing</h2><p>Real-time data processing has wide adoption across many industries. Some examples of real-time data processing can be found in:</p><ul><li>Real-time personalization on e-commerce websites</li><li>Real-time operational analytics dashboards for logistics companies</li><li>User-facing analytics dashboards in SaaS</li><li>Smart inventory management in retail</li><li>Anomaly detection in server management</li></ul><p>… and many more such use cases. Let’s dig into more detail on how real-time data processing plays a role in these industries.</p><h3 id="real-time-personalization-in-e-commerce">Real-time personalization in e-commerce</h3><p><a href="https://www.tinybird.co/blog-posts/real-time-personalization">Real-time personalization</a> is a means of customizing a user’s online experiences based on data that is collected in real time, including data from the current browsing session.</p><p>A classic real-time personalization example involves eCommerce websites placing customized offers in front of a visitor during browsing or checkout. 
The customized offers might include products that closely match items that the visitor viewed or added to their cart.</p><p>Achieving this use case requires real-time data processing. The data from the user’s browsing session must be captured and processed in real-time to determine what kinds of personalized offers to place back into the application during their active session.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://tinybird-blog.ghost.io/content/images/2023/09/64f8dc4a4121207a80d3162f_IafX6Yjn1uXoX2UqiV2rl1jcGEU1sgJaRzEYcc6gxcZiiml9q3rhbnAd7V6_gNU96_Y0WBQMcoSFdEE9WVCzTCzWHqYKXYkwasfPHrz21HJGeGJGVVJazK60foX3r9MMNKVzqSU0ioO1Y_3NEgwBEAw-8.png" class="kg-image" alt="A diagram showing the real-time personalization process" loading="lazy" width="1600" height="885" srcset="https://tinybird-blog.ghost.io/content/images/size/w600/2023/09/64f8dc4a4121207a80d3162f_IafX6Yjn1uXoX2UqiV2rl1jcGEU1sgJaRzEYcc6gxcZiiml9q3rhbnAd7V6_gNU96_Y0WBQMcoSFdEE9WVCzTCzWHqYKXYkwasfPHrz21HJGeGJGVVJazK60foX3r9MMNKVzqSU0ioO1Y_3NEgwBEAw-8.png 600w, https://tinybird-blog.ghost.io/content/images/size/w1000/2023/09/64f8dc4a4121207a80d3162f_IafX6Yjn1uXoX2UqiV2rl1jcGEU1sgJaRzEYcc6gxcZiiml9q3rhbnAd7V6_gNU96_Y0WBQMcoSFdEE9WVCzTCzWHqYKXYkwasfPHrz21HJGeGJGVVJazK60foX3r9MMNKVzqSU0ioO1Y_3NEgwBEAw-8.png 1000w, https://tinybird-blog.ghost.io/content/images/2023/09/64f8dc4a4121207a80d3162f_IafX6Yjn1uXoX2UqiV2rl1jcGEU1sgJaRzEYcc6gxcZiiml9q3rhbnAd7V6_gNU96_Y0WBQMcoSFdEE9WVCzTCzWHqYKXYkwasfPHrz21HJGeGJGVVJazK60foX3r9MMNKVzqSU0ioO1Y_3NEgwBEAw-8.png 1600w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">The entire real-time personalization process should take only a few seconds at most.</span></figcaption></figure><p>The entire process, from ingestion to personalization, can take only a few seconds, at most.</p><h3 id="operational-analytics-in-logistics">Operational analytics in logistics</h3><p>Consider an airline company seeking to track 
luggage across a network of airplanes, airports, and everywhere in between. A <a href="https://www.tinybird.co/blog-posts/real-time-data-platforms">real-time data platform</a> can use real-time data processing to analyze streaming data from IoT sensors and “tags on bags”. This data is used to update <a href="https://www.tinybird.co/blog-posts/real-time-dashboard-step-by-step">real-time dashboards</a> that help airline staff keep track of baggage, handle passenger offloading events or gate changes, and other last-minute requirements in a busy airport terminal.</p><h3 id="user-facing-analytics-in-saas">User-facing analytics in SaaS</h3><p>One of the most common applications for real-time data processing is <a href="https://www.tinybird.co/blog-posts/user-facing-analytics" rel="noreferrer">user-facing analytics</a>, sometimes called “in-product analytics” or "embedded analytics".</p><p>SaaS users often want analytics on how they and their teams are utilizing the software, and user-facing analytics dashboards can provide that information.</p><p>However, if the data is out of date, then the user experience suffers. Real-time data processing makes it possible for SaaS users to get up-to-date analysis on their usage in real time.</p><h3 id="smart-inventory-management-in-retail">Smart inventory management in retail</h3><p>Related to real-time personalization, smart inventory management involves making faster, better decisions (perhaps even automated decisions) about how to appropriately route product inventory to meet consumer demand.</p><p>Retailers want to avoid stockout scenarios - where the product a user wants to buy is unavailable. 
Using real-time data processing, retailers can monitor demand and supply in real time and help forecast where new inventory is required.</p><p>This can result in a better shopping experience (for example, by only showing products that are <em>definitely</em> in stock) while also allowing retailers to reroute inventory to regional distribution hubs where demand is highest.</p><h3 id="anomaly-detection-in-server-management">Anomaly detection in server management</h3><p>Distributed Denial of Service (DDoS) attacks can bring down a server in a heartbeat. Companies that operate hosted services, ranging from cloud providers to SaaS builders, need to be able to detect and resolve DDoS attacks as quickly as possible to avoid the collapse of their application servers.</p><p>With real-time data processing, they can monitor server resources and real-time requests, quickly identify DDoS attacks (and other problems) through <a href="https://www.tinybird.co/blog-posts/anomaly-detection">real-time anomaly detection</a>, and shut down would-be attackers through automated control systems.</p><h2 id="real-time-data-processing-reference-architectures">Real-time Data Processing Reference Architectures</h2><p>If you’re trying to build a <a href="https://www.tinybird.co/blog-posts/real-time-data-platforms">real-time data platform</a>, real-time data processing will serve an essential role.</p><p>Every real-time data processing architecture will invariably include <a href="https://www.tinybird.co/blog-posts/real-time-data-ingestion">real-time data ingestion</a>, often through the use of an event streaming platform like Kafka. In addition, some form of stream processing engine or <a href="https://www.tinybird.co/blog-posts/real-time-databases-what-developers-need-to-know">real-time database</a> will be used to transform the data in real time. 
Finally, a low-latency API layer will be used to expose data in various formats and power user-facing analytics, automation, and <a href="https://www.tinybird.co/blog-posts/real-time-data-visualization">real-time visualization</a>.</p><p>Below you’ll find some common reference architectures for real-time data processing.</p><h3 id="the-user-facing-analytics-architecture">The user-facing analytics architecture</h3><p>In this reference architecture, events are captured through an event bus such as Apache Kafka and ingested into a <a href="https://www.tinybird.co/blog-posts/real-time-databases-what-developers-need-to-know">real-time database</a>, which is responsible for real-time data processing. Application users interact with <a href="https://www.tinybird.co/blog-posts/real-time-analytics-a-definitive-guide">real-time analytics</a> produced by the database using a low-latency, high-concurrency API.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://tinybird-blog.ghost.io/content/images/2023/09/64f8dc49f672666330fcb500_3tu_g6OVXCH2FwbumkBlCk513uGvDiKCctfmqpOyJqFIAZfMxstobFUHyAAXxyk1YaWfDWVQkMPYi22vg3K1atzOsE5dpQVDqpR5GacWDwBXkyCCm8-XbvTOCoODzkhCn4Hvyh4GDqFN2ngY9mDvZw0-8.png" class="kg-image" alt="A diagram showing a user-facing real-time analytics use case." 
loading="lazy" width="1600" height="885" srcset="https://tinybird-blog.ghost.io/content/images/size/w600/2023/09/64f8dc49f672666330fcb500_3tu_g6OVXCH2FwbumkBlCk513uGvDiKCctfmqpOyJqFIAZfMxstobFUHyAAXxyk1YaWfDWVQkMPYi22vg3K1atzOsE5dpQVDqpR5GacWDwBXkyCCm8-XbvTOCoODzkhCn4Hvyh4GDqFN2ngY9mDvZw0-8.png 600w, https://tinybird-blog.ghost.io/content/images/size/w1000/2023/09/64f8dc49f672666330fcb500_3tu_g6OVXCH2FwbumkBlCk513uGvDiKCctfmqpOyJqFIAZfMxstobFUHyAAXxyk1YaWfDWVQkMPYi22vg3K1atzOsE5dpQVDqpR5GacWDwBXkyCCm8-XbvTOCoODzkhCn4Hvyh4GDqFN2ngY9mDvZw0-8.png 1000w, https://tinybird-blog.ghost.io/content/images/2023/09/64f8dc49f672666330fcb500_3tu_g6OVXCH2FwbumkBlCk513uGvDiKCctfmqpOyJqFIAZfMxstobFUHyAAXxyk1YaWfDWVQkMPYi22vg3K1atzOsE5dpQVDqpR5GacWDwBXkyCCm8-XbvTOCoODzkhCn4Hvyh4GDqFN2ngY9mDvZw0-8.png 1600w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">User-facing analytics rely on real-time data processing in a real-time database.</span></figcaption></figure><h3 id="the-operational-analytics-architecture">The operational analytics architecture</h3><p>In this reference architecture, events are captured and stored in a <a href="https://www.tinybird.co/blog-posts/real-time-databases-what-developers-need-to-know">real-time database</a> by the same method as above. In this example, however, it is not users who access <a href="https://www.tinybird.co/blog-posts/real-time-analytics-a-definitive-guide">real-time analytics</a> through an API, but rather operational automation systems that utilize real-time data processing to initiate software functions without human intervention. 
The real-time database is also used in a way that is functionally similar to (though not the same as) stream processing, preparing data for batch processing in a data warehouse.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://tinybird-blog.ghost.io/content/images/2023/09/64f8dc4aa727aeaaa5767d99_vtmvH3kp0zocwhpzgJxaARNhRvWZH0rDBGkcokinUgR2wtYjh16IRYtDFE7bDPIthge5l0as4LKkOpq67J1GtHvDrsnmtCt3zutnUv8u5ozjsGDeamWM25a3UoFlv5D5986xz8AHIQ1UAQ5AwxThNQw-8.png" class="kg-image" alt="A diagram showing how operational automation can be powered by real-time data processing tools." loading="lazy" width="1600" height="1075" srcset="https://tinybird-blog.ghost.io/content/images/size/w600/2023/09/64f8dc4aa727aeaaa5767d99_vtmvH3kp0zocwhpzgJxaARNhRvWZH0rDBGkcokinUgR2wtYjh16IRYtDFE7bDPIthge5l0as4LKkOpq67J1GtHvDrsnmtCt3zutnUv8u5ozjsGDeamWM25a3UoFlv5D5986xz8AHIQ1UAQ5AwxThNQw-8.png 600w, https://tinybird-blog.ghost.io/content/images/size/w1000/2023/09/64f8dc4aa727aeaaa5767d99_vtmvH3kp0zocwhpzgJxaARNhRvWZH0rDBGkcokinUgR2wtYjh16IRYtDFE7bDPIthge5l0as4LKkOpq67J1GtHvDrsnmtCt3zutnUv8u5ozjsGDeamWM25a3UoFlv5D5986xz8AHIQ1UAQ5AwxThNQw-8.png 1000w, https://tinybird-blog.ghost.io/content/images/2023/09/64f8dc4aa727aeaaa5767d99_vtmvH3kp0zocwhpzgJxaARNhRvWZH0rDBGkcokinUgR2wtYjh16IRYtDFE7bDPIthge5l0as4LKkOpq67J1GtHvDrsnmtCt3zutnUv8u5ozjsGDeamWM25a3UoFlv5D5986xz8AHIQ1UAQ5AwxThNQw-8.png 1600w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Sometimes real-time data processing serves automation, not visualization.</span></figcaption></figure><h3 id="the-real-time-data-platform-architecture">The real-time data platform architecture</h3><p>In this architecture event streams and fact tables are ingested into a <a href="https://www.tinybird.co/blog-posts/real-time-data-platforms">real-time data platform</a>, which integrates real-time data ingestion (through native data connectors), a real-time database, and a real-time API layer into a single 
functional interface.</p><p>In parallel, the data warehouse supports batch data processing for business intelligence and data science workloads.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://tinybird-blog.ghost.io/content/images/2023/09/64c1826c79a44a99c431e668_GslVAjXyeF2cUnixx-mUtM0MJNaC3aDUqhv85gsv1pKqrX-3SU3Bju9hOOCmDWunQgDhRS7NhZBvtSuaUCtM5MBbpXjuJMIK-Yw5S4vckqBCaqpwhtYgyycdCecMthdFU1o6QNJt-rju9OyQngxkqgY-18.png" class="kg-image" alt="Real-time data processing diagram containing a real-time data platform." loading="lazy" width="1600" height="627" srcset="https://tinybird-blog.ghost.io/content/images/size/w600/2023/09/64c1826c79a44a99c431e668_GslVAjXyeF2cUnixx-mUtM0MJNaC3aDUqhv85gsv1pKqrX-3SU3Bju9hOOCmDWunQgDhRS7NhZBvtSuaUCtM5MBbpXjuJMIK-Yw5S4vckqBCaqpwhtYgyycdCecMthdFU1o6QNJt-rju9OyQngxkqgY-18.png 600w, https://tinybird-blog.ghost.io/content/images/size/w1000/2023/09/64c1826c79a44a99c431e668_GslVAjXyeF2cUnixx-mUtM0MJNaC3aDUqhv85gsv1pKqrX-3SU3Bju9hOOCmDWunQgDhRS7NhZBvtSuaUCtM5MBbpXjuJMIK-Yw5S4vckqBCaqpwhtYgyycdCecMthdFU1o6QNJt-rju9OyQngxkqgY-18.png 1000w, https://tinybird-blog.ghost.io/content/images/2023/09/64c1826c79a44a99c431e668_GslVAjXyeF2cUnixx-mUtM0MJNaC3aDUqhv85gsv1pKqrX-3SU3Bju9hOOCmDWunQgDhRS7NhZBvtSuaUCtM5MBbpXjuJMIK-Yw5S4vckqBCaqpwhtYgyycdCecMthdFU1o6QNJt-rju9OyQngxkqgY-18.png 1600w" sizes="(min-width: 720px) 720px"><figcaption><span style="white-space: pre-wrap;">Real-time data platforms can simplify the real-time data processing workload.</span></figcaption></figure><h2 id="real-time-data-processing-tools">Real-Time Data Processing Tools</h2><p>As mentioned above, real-time data processing involves some combination of event streaming, stream processing, real-time databases, real-time APIs, and real-time data platforms.</p><h4 id="event-streaming-platform">Event Streaming Platform</h4><p>You can’t have real-time data processing without <a href="https://www.tinybird.co/blog-posts/real-time-data-ingestion">real-time 
data ingestion</a>, and event streaming platforms are the go-to technology here.</p><p>Some examples of event streaming platforms include:</p><ul><li><a href="https://github.com/apache/kafka">Apache Kafka</a></li><li><a href="https://www.confluent.io/">Confluent Cloud</a></li><li><a href="https://redpanda.com/">Redpanda</a></li><li><a href="https://cloud.google.com/pubsub">Google Pub/Sub</a></li><li><a href="https://aws.amazon.com/kinesis/">Amazon Kinesis</a></li></ul><h4 id="stream-processing-engines">Stream Processing Engines</h4><p>Stream processing engines may or may not be utilized in real-time data processing. In the absence of a <a href="https://www.tinybird.co/blog-posts/real-time-databases-what-developers-need-to-know">real-time database</a>, a stream processing engine can be used to transform and process data in motion.</p><p>Stream processing engines are able to maintain some amount of state, but they can’t leverage a full OLAP engine the way real-time databases can.</p><p>Still, they can be an important part of real-time data processing implementations both with and without real-time databases.</p><p>Some examples of stream processing engines include:</p><ul><li><a href="https://github.com/apache/flink">Apache Flink</a></li><li><a href="https://github.com/apache/spark">Apache Spark</a></li><li><a href="https://kafka.apache.org/documentation/streams/">Kafka Streams</a></li><li><a href="https://ksqldb.io/">ksqlDB</a></li><li><a href="https://www.decodable.co/">Decodable</a></li></ul><h4 id="real-time-databases">Real-time databases</h4><p><a href="https://www.tinybird.co/blog-posts/real-time-databases-what-developers-need-to-know">Real-time databases</a> enable real-time data processing over unbounded time windows. They often support <a href="https://www.tinybird.co/blog-posts/what-are-materialized-views-and-why-do-they-matter-for-realtime">incremental materialized views</a>. 
They support high write frequency, low-latency reads on filtered aggregates, and joins (to varying degrees of complexity).</p><p>Some examples of real-time databases include:</p><ul><li><a href="https://github.com/ClickHouse/ClickHouse">ClickHouse®</a></li><li><a href="https://github.com/apache/druid">Apache Druid</a></li><li><a href="https://github.com/apache/pinot">Apache Pinot</a></li></ul><h4 id="real-time-api-layer">Real-time API layer</h4><p>Real-time data processors must expose transformations to downstream consumers. In an ideal world, this happens through a real-time API layer that enables a wide variety of consumers to access and utilize the data concurrently.</p><h4 id="real-time-data-platforms">Real-time data platforms</h4><p><a href="https://www.tinybird.co/blog-posts/real-time-data-platforms">Real-time data platforms</a> generally combine some or all of the components of a real-time data processing engine.</p><p>For example, <a href="https://www.tinybird.co">Tinybird</a> is a <a href="https://www.tinybird.co/product">real-time data platform</a> that supports <a href="https://www.tinybird.co/blog-posts/real-time-data-ingestion">real-time data ingestion</a> through its <a href="https://www.tinybird.co/docs/ingest/events-api.html">HTTP streaming endpoint</a> or <a href="https://www.tinybird.co/docs/ingest/kafka.html">native Kafka connector</a>, real-time data processing through its <a href="https://www.tinybird.co/clickhouse">optimized ClickHouse® implementation</a>, and a <a href="https://www.tinybird.co/docs/concepts/apis.html">real-time, SQL-based API layer</a> for exposing real-time data products to consumers.</p><h2 id="how-to-get-started-with-real-time-data-processing">How to get started with real-time data processing</h2><p>If you’re looking for a platform to handle real-time data processing integrated with streaming ingestion and a real-time API layer, then consider <a href="https://www.tinybird.co" rel="noreferrer">Tinybird</a>.</p><p>Tinybird 
is a real-time data platform that handles real-time data ingestion and real-time data processing. With Tinybird, you can unify both streaming and batch data sources through native data connectors, process data in real time using SQL, and expose your processed data as scalable, dynamic APIs that can integrate with myriad downstream systems.</p><p>For more information about Tinybird, dig into the <a href="https://www.tinybird.co/product">product</a>, check out the <a href="https://www.tinybird.co/docs">documentation</a>, or <a href="https://www.tinybird.co/signup?referrer=https%3A%2F%2Fwww.tinybird.co%2Fblog-posts%2Freal-time-data-processing">try it out</a>. It’s free to start, with no time limit or credit card needed. If you get stuck along the way, you can join our <a href="https://www.tinybird.co/community">active Slack community</a> to ask questions and get answers about real-time data processing and analytics.</p>
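<p>If you’d like a feel for the core idea before you try a platform, here is a minimal Python sketch of the incremental aggregation that real-time databases perform with incremental materialized views: each event updates a stored aggregate at write time, so reads never have to rescan raw events. This is an illustration only, not how Tinybird or ClickHouse® implement it. The <code>IncrementalView</code> class and the <code>key</code> and <code>value</code> event fields are hypothetical names chosen for this example.</p><pre><code class="language-python">from collections import defaultdict


class IncrementalView:
    """Keeps per-key aggregates up to date as each event arrives."""

    def __init__(self):
        # Maps each key to [event_count, running_total].
        self._state = defaultdict(lambda: [0, 0.0])

    def ingest(self, event):
        # Update the aggregate at write time instead of at query time.
        entry = self._state[event["key"]]
        entry[0] += 1
        entry[1] += event["value"]

    def query(self, key):
        # Read the precomputed aggregate; returns None for unseen keys.
        if key not in self._state:
            return None
        count, total = self._state[key]
        return {"count": count, "sum": total, "avg": total / count}


# Hypothetical events, standing in for messages read from an event stream.
view = IncrementalView()
for event in [
    {"key": "checkout", "value": 20.0},
    {"key": "checkout", "value": 30.0},
    {"key": "search", "value": 5.0},
]:
    view.ingest(event)

print(view.query("checkout"))
</code></pre><p>The trade-off in this sketch is a little extra work on every write in exchange for reads that cost the same no matter how many events have been ingested, which is the property that low-latency, high-concurrency APIs depend on.</p>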
