Analyze the performance of your endpoints with pipe_stats

Intermediate

Tinybird is all about speed. We give you tools to make real-time queries really quickly, and then we give you even more tools to optimize those queries to make your endpoints faster.

Of course, before you optimize, you need to know what to optimize. That’s where the Tinybird {% code-line %}pipe_stats{% code-line-end %} and {% code-line %}pipe_stats_rt{% code-line-end %} Data Sources come in handy. Whether you’re trying to speed up your endpoints, track error rates, or reduce scan size (and subsequent usage costs), {% code-line %}pipe_stats{% code-line-end %} and {% code-line %}pipe_stats_rt{% code-line-end %} let you see how your endpoints are performing, so you can find performance offenders and get them up to speed.

These Service Data Sources provide performance data and consumption data for every single request, plus you can filter and sort results by tokens to see who is accessing your endpoints and how often.

This guide explains how to use {% code-line %}pipe_stats{% code-line-end %} and {% code-line %}pipe_stats_rt{% code-line-end %}, giving several practical examples that show what you can do with these Service Data Sources.

{% tip-box title="NOTE" %}Confused about the difference between {% code-line %}pipe_stats_rt{% code-line-end %} and {% code-line %}pipe_stats{% code-line-end %}? It’s simple: {% code-line %}pipe_stats{% code-line-end %} provides aggregate stats - like average request duration and total read bytes - per day, whereas {% code-line %}pipe_stats_rt{% code-line-end %} offers the same information but without aggregation. Every single request is stored in {% code-line %}pipe_stats_rt{% code-line-end %}. The examples in this guide use {% code-line %}pipe_stats_rt{% code-line-end %}, but you can use the same logic with {% code-line %}pipe_stats{% code-line-end %} if you need more than 7 days of lookback.{% tip-box-end %}

Understanding the core stats

In this guide, we will focus on the following fields in the {% code-line %}pipe_stats_rt{% code-line-end %} Service Data Source:

  • {% code-line %}pipe_name{% code-line-end %} (String): The name of the Pipe, as returned by the Pipes API ({% code-line %}query_api{% code-line-end %} if the request was made through the Query API).
  • {% code-line %}duration{% code-line-end %} (Float): The duration of the request, in seconds.
  • {% code-line %}read_bytes{% code-line-end %} (UInt64): How much data was scanned by the request.
  • {% code-line %}read_rows{% code-line-end %} (UInt64): How many rows were scanned by the request.
  • {% code-line %}token_name{% code-line-end %} (String): The name of the token used in the request.
  • {% code-line %}status_code{% code-line-end %} (Int32): The HTTP status code returned for the request.

You can find the full schema for {% code-line %}pipe_stats_rt{% code-line-end %} in the API docs.

Example 1: Detecting errors in your API endpoints

If you want to monitor the number of errors per endpoint over the last hour, you could do the following:
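A query along these lines does the job. This is a sketch that assumes the documented {% code-line %}pipe_stats_rt{% code-line-end %} columns ({% code-line %}start_datetime{% code-line-end %}, {% code-line %}status_code{% code-line-end %}); any HTTP status of 400 or above is treated as an error:

```sql
SELECT
    pipe_name,
    status_code,
    count() AS error_count
FROM tinybird.pipe_stats_rt
WHERE start_datetime > now() - INTERVAL 1 HOUR
  AND status_code >= 400
GROUP BY pipe_name, status_code
ORDER BY error_count DESC
```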

If any of your endpoints have returned errors, the result shows each failing Pipe along with its status codes and error counts.

Just like that, you can see in real time if your endpoints are experiencing errors, and investigate further if they are.

Example 2: Analyzing the performance of API Endpoints over time

You can also use {% code-line %}pipe_stats_rt{% code-line-end %} to track how long API calls take using the {% code-line %}duration{% code-line-end %} field, and see how that changes over time. API performance is directly related to how much data you read per request, so if your endpoint is dynamic (for instance, if it receives start and end date parameters that change how much of a time period is read), request duration will vary.
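As a sketch, you can bucket requests by minute and compute average and p95 duration per endpoint (the bucket size and percentile are illustrative choices, not prescribed by the Data Source):

```sql
SELECT
    toStartOfMinute(start_datetime) AS minute,
    pipe_name,
    avg(duration) AS avg_duration_s,
    quantile(0.95)(duration) AS p95_duration_s
FROM tinybird.pipe_stats_rt
WHERE start_datetime > now() - INTERVAL 1 HOUR
GROUP BY minute, pipe_name
ORDER BY minute, pipe_name
```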

Example 3: Finding the endpoints that process the most data

Commonly, you’ll want to find endpoints that repeatedly scan large amounts of data. These are your best candidates for optimization to reduce time and spend.

Here’s an example of using {% code-line %}pipe_stats_rt{% code-line-end %} to find the endpoints that have processed the most data, as a percentage of all data processed in the last 24 hours.
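One way to write this (a sketch; the subquery computes the total scanned across all Pipes so each endpoint's share can be expressed as a percentage):

```sql
SELECT
    pipe_name,
    formatReadableSize(sum(read_bytes)) AS total_read,
    round(sum(read_bytes) * 100 / (
        SELECT sum(read_bytes)
        FROM tinybird.pipe_stats_rt
        WHERE start_datetime > now() - INTERVAL 1 DAY
    ), 2) AS pct_of_total
FROM tinybird.pipe_stats_rt
WHERE start_datetime > now() - INTERVAL 1 DAY
GROUP BY pipe_name
ORDER BY pct_of_total DESC
```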

Modifying to include consumption of the Query API

If you use Tinybird’s Query API to query your Data Sources directly, you probably want your analysis to also show which of those queries consume the most.

Whenever you use the Query API, the {% code-line %}pipe_name{% code-line-end %} field contains the value {% code-line %}query_api{% code-line-end %}, and the actual query is included as part of the {% code-line %}q{% code-line-end %} parameter in the {% code-line %}url{% code-line-end %} field. You can modify the query in the previous section to extract the actual SQL query that is processing the data.
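For example, a sketch using ClickHouse URL functions to pull the SQL text out of the {% code-line %}q{% code-line-end %} parameter (here {% code-line %}normalizeQuery{% code-line-end %} collapses literals so similar queries group together; drop it if you want the raw text):

```sql
SELECT
    normalizeQuery(decodeURLComponent(extractURLParameter(url, 'q'))) AS query,
    formatReadableSize(sum(read_bytes)) AS total_read
FROM tinybird.pipe_stats_rt
WHERE pipe_name = 'query_api'
  AND start_datetime > now() - INTERVAL 1 DAY
GROUP BY query
ORDER BY sum(read_bytes) DESC
```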

Example 4: Monitoring usage of tokens

If you use different tokens with your API Endpoints (to let different customers access their own data, for example), you can track and control which tokens are being used to access those endpoints.

Here’s an example that shows, for the last 24 hours, the number and size of requests per token.
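A sketch of such a query, grouping by the {% code-line %}token_name{% code-line-end %} column:

```sql
SELECT
    token_name,
    count() AS request_count,
    formatReadableSize(sum(read_bytes)) AS total_read
FROM tinybird.pipe_stats_rt
WHERE start_datetime > now() - INTERVAL 1 DAY
GROUP BY token_name
ORDER BY request_count DESC
```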

To obtain this information, you can request the token name (the {% code-line %}token_name{% code-line-end %} column) or its id (the {% code-line %}token{% code-line-end %} column).

Visualizing stats with Karman

With {% code-line %}pipe_stats_rt{% code-line-end %} and {% code-line %}pipe_stats{% code-line-end %}, you have visibility and control over all the endpoint usage in your Workspace. To help you analyze the results of the various examples (and any more that you develop on your own), we offer a small visualization library called Karman. After you set your admin token (as you would with the CLI), you can visualize any time-series Data Source in your Workspace, including Service Data Sources, like {% code-line %}pipe_stats_rt{% code-line-end %}.
