Consume APIs in a Notebook


Notebooks are a great resource for exploring data and generating plots. Here we look at how to consume Tinybird APIs in a notebook. This Colab notebook uses a Data Source of updates to Wikipedia to show how to consume data from queries using the Query API and from API endpoints using the Pipes API.

The preliminary step of creating a Data Source in your workspace from this 120 MB CSV file of updates to Wikipedia is detailed in the notebook.

If the result is under 100 MB, you can fetch it all with a single call to the Query API or to an API endpoint, using parameters if you wish. With more data than that you can't fetch everything in one go: each API call is limited to 100 MB, so you have to query the data in batches. The way to do this is to batch on the Data Source sorting keys, since selecting data by the columns of the sorting key keeps each query fast.

In this example the Data Source is sorted by the timestamp column, so we fetch batches covering a fixed window of time. In general, time is a good dimension to batch on.

The functions fetch_table_streaming_query and fetch_table_streaming_endpoint in the notebook work as generators. They should always be used in a for loop or as the input for another generator.

Process each batch as it arrives, keep only the data you need, and discard the rest. The idea is not to recreate the Data Source in the notebook, but to reduce each batch as it comes in so that you write far less data to your DataFrame.
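As a minimal sketch of this pattern, assuming pandas DataFrames and the server_name column (both illustrative; the real generators are defined in the notebook):

```python
import pandas as pd

def process_batches(batches):
    """Reduce each incoming batch, keeping only the results.

    `batches` is any iterable of DataFrames, such as the generators
    fetch_table_streaming_query / fetch_table_streaming_endpoint
    defined in the notebook.
    """
    partial = []
    for batch in batches:
        # Keep only what the analysis needs and discard the raw rows.
        partial.append(batch.groupby("server_name").size())
    return pd.concat(partial)
```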

Fetch data with the Query API

Here we use the Python requests library. The SQL query pulls in one hour less of data than the full Data Source. A DataFrame is created from the text part of the response.
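A minimal sketch of that call, assuming the Data Source is named wiki and using placeholder host and token values (the notebook holds the real query):

```python
import io

import pandas as pd
import requests

TB_HOST = "https://api.tinybird.co"   # adjust for your region
TB_TOKEN = "<YOUR_READ_TOKEN>"

# Select everything except the last hour of the Data Source, returning
# CSV with a header row so pandas can parse it directly.
sql = """
SELECT *
FROM wiki
WHERE timestamp < (SELECT max(timestamp) FROM wiki) - INTERVAL 1 HOUR
FORMAT CSVWithNames
"""

r = requests.get(
    f"{TB_HOST}/v0/sql",
    params={"q": sql},
    headers={"Authorization": f"Bearer {TB_TOKEN}"},
)
r.raise_for_status()

# Build the DataFrame from the text part of the response.
df = pd.read_csv(io.StringIO(r.text))
```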

Fetch data from an API Endpoint with Parameters

The endpoint node in the pipe endpoint_wiki selects rows from the Data Source within a range of dates, using the date_start and date_end parameters.

These parameters are passed in the call to the API endpoint to select only the data within the range. A DataFrame is created from the text part of the response.
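A sketch of the call, with illustrative dates, host, and token (the pipe name endpoint_wiki and the date_start and date_end parameters are the ones described above):

```python
import io

import pandas as pd
import requests

TB_HOST = "https://api.tinybird.co"   # adjust for your region
TB_TOKEN = "<YOUR_READ_TOKEN>"

# date_start and date_end are the parameters defined in the endpoint_wiki
# pipe; the values here are placeholders.
params = {
    "date_start": "2022-02-10 00:00:00",
    "date_end": "2022-02-10 12:00:00",
}

r = requests.get(
    f"{TB_HOST}/v0/pipes/endpoint_wiki.csv",
    params=params,
    headers={"Authorization": f"Bearer {TB_TOKEN}"},
)
r.raise_for_status()

# Build the DataFrame from the text part of the response.
df = pd.read_csv(io.StringIO(r.text))
```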

Fetch batches of data using the Query API

The function fetch_table_streaming_query in the notebook accepts queries more complex than a date range: you choose what to filter and sort by. This example reads batches of 5 minutes to create a small DataFrame, which should then be processed and the results appended to the final DataFrame.

5-minute batches of data using the index
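To illustrate, here is a simplified stand-in for fetch_table_streaming_query, assuming a Data Source named wiki sorted by timestamp and placeholder host and token values; the notebook's real function accepts arbitrary filter and sort clauses:

```python
import io
from datetime import datetime, timedelta

import pandas as pd
import requests

def fetch_batches_query(date_start, date_end, batch_minutes=5):
    """Yield one small DataFrame per time batch via the Query API."""
    cursor = date_start
    step = timedelta(minutes=batch_minutes)
    while cursor < date_end:
        batch_end = min(cursor + step, date_end)
        # Filtering on timestamp, the sorting key, keeps each query fast.
        sql = f"""
        SELECT *
        FROM wiki
        WHERE timestamp >= '{cursor:%Y-%m-%d %H:%M:%S}'
          AND timestamp <  '{batch_end:%Y-%m-%d %H:%M:%S}'
        FORMAT CSVWithNames
        """
        r = requests.get(
            "https://api.tinybird.co/v0/sql",
            params={"q": sql},
            headers={"Authorization": "Bearer <YOUR_READ_TOKEN>"},
        )
        r.raise_for_status()
        yield pd.read_csv(io.StringIO(r.text))
        cursor = batch_end

# Example: one hour of data in 5-minute batches (placeholder dates).
for batch in fetch_batches_query(datetime(2022, 2, 10, 0), datetime(2022, 2, 10, 1)):
    ...  # process the batch, append reduced results
```

Because it is a generator, nothing is fetched until you iterate over it, one batch at a time.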

Fetch batches of data from an API Endpoint with Parameters

The function fetch_table_streaming_endpoint in the notebook calls the API endpoint with parameters for the batch size, the start and end dates, and, optionally, filters on the bot and server_name columns. As before, it reads batches of 5 minutes to create a small DataFrame, which should then be processed and the results appended to the final DataFrame.

The endpoint wiki_stream_example first selects data for the range of dates, then for the batch, and then applies the filters on column values.

These parameters are passed in the call to the API endpoint to select only the data for the batch. A DataFrame is created from the text part of the response.
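A parallel sketch for fetch_table_streaming_endpoint. Note one simplification: this sketch slices time client-side by narrowing date_start and date_end on each call, whereas the notebook's endpoint takes the batch size as its own parameter. The host, token, and parameter names other than the bot and server_name filters are placeholders:

```python
import io
from datetime import timedelta

import pandas as pd
import requests

def fetch_batches_endpoint(date_start, date_end, batch_minutes=5,
                           bot=None, server_name=None):
    """Yield one small DataFrame per batch from the wiki_stream_example endpoint."""
    cursor = date_start
    step = timedelta(minutes=batch_minutes)
    while cursor < date_end:
        batch_end = min(cursor + step, date_end)
        params = {
            "date_start": f"{cursor:%Y-%m-%d %H:%M:%S}",
            "date_end": f"{batch_end:%Y-%m-%d %H:%M:%S}",
        }
        # Optional filters on column values, applied by the endpoint.
        if bot is not None:
            params["bot"] = bot
        if server_name is not None:
            params["server_name"] = server_name
        r = requests.get(
            "https://api.tinybird.co/v0/pipes/wiki_stream_example.csv",
            params=params,
            headers={"Authorization": "Bearer <YOUR_READ_TOKEN>"},
        )
        r.raise_for_status()
        yield pd.read_csv(io.StringIO(r.text))
        cursor = batch_end
```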

The full code for these examples is in this Colab notebook.