Ingest from a notebook¶
Using the REST API¶
A straightforward approach is to write the contents of the DataFrame to a file and ingest the file using the REST API. Alternatively, the data can be kept in memory, converted to an array, and ingested using the ingest_from_array function, as shown in this notebook.
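As an illustration of the file-based route, a minimal sketch might look like the following. It assumes the Data Sources API at https://api.tinybird.co (adjust the host for your region) and that the file is sent as a multipart upload; the field name and parameters here are assumptions, so check the Data Sources API docs for the exact details.

import requests

token = "your token"  # a token with permission to create Data Sources

# Assumption: df_wiki is the DataFrame you want to ingest
df_wiki.to_csv("wiki_rest.csv", index=False)

# Create a Data Source from the local file via the Data Sources API
with open("wiki_rest.csv", "rb") as f:
    r = requests.post(
        "https://api.tinybird.co/v0/datasources",
        params={"name": "wiki_rest", "mode": "create"},
        headers={"Authorization": f"Bearer {token}"},
        files={"csv": f},  # multipart field name is an assumption; see the docs
    )
r.raise_for_status()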
Through the UI¶
Of course, the DataFrame can simply be written to a CSV, NDJSON, or Parquet file and uploaded through the UI as a local file.
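For instance, any of these one-liners produces a file that can be uploaded; the Parquet export assumes a Parquet engine such as pyarrow is installed.

df_wiki.to_csv("wiki_ui.csv", index=False)
df_wiki.to_json("wiki_ui.ndjson", orient="records", lines=True)
df_wiki.to_parquet("wiki_ui.parquet")  # assumes pyarrow or fastparquet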
From the CLI¶
From within the notebook, you can install the tinybird-cli (!pip install tinybird-cli) and run its commands. You can either add the token for your workspace using !tb auth, or set token = 'your token' and pass it to each command with $token.
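Put together, a setup cell might look like this; the datasource ls at the end is just an optional check that the token works.

!pip install tinybird-cli

# Either authenticate once for the workspace...
# !tb auth

# ...or keep the token in a Python variable and pass it to each command with $token
token = 'your token'
!tb --token=$token datasource ls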
Write the DataFrame to a CSV file.
df_wiki.to_csv("wiki_cli_csv.csv", index=False)
The schema for the Data Source can be generated from the CSV file
!tb --token=$token datasource generate wiki_cli_csv.csv
or defined directly in a file.
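If you define it yourself, a wiki_cli_csv.datasource file might look roughly like this; the column names and types below are illustrative placeholders rather than the exact schema of the example data.

SCHEMA >
    `id` Int64,
    `type` String,
    `title` String,
    `user` String,
    `server_name` String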
Then push the schema to Tinybird to create an empty Data Source
!tb --token=$token push wiki_cli_csv.datasource
and append the contents of the CSV file.
!tb --token=$token datasource append wiki_cli_csv wiki_cli_csv.csv
Following the same pattern, the DataFrame can be written to an NDJSON file.
df_wiki.to_json("wiki_cli_ndjson.ndjson", orient="records", lines=True, force_ascii=False)
The schema for the Data Source can be generated from the NDJSON file
!tb --token=$token datasource generate wiki_cli_ndjson.ndjson
or defined directly in a file.
The schema is then pushed to Tinybird to create an empty Data Source and the contents of the file appended.
!tb --token=$token push wiki_cli_ndjson.datasource
!tb --token=$token datasource append wiki_cli_ndjson wiki_cli_ndjson.ndjson
Streaming with the Tinybird Events API¶
Here events are streamed directly to the Data Source from the Wikipedia stream using the Tinybird Events API. The data is not first written to a pandas DataFrame.
With mode='create'
the data types are inferred. For this example data, a few more columns need to be Nullable than the inference suggests; otherwise rows end up in quarantine. Defining the schema directly, after exploring the automatically created Data Source in the UI, solves this issue.
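As an illustration, the relevant part of wiki_hfi.datasource could look something like this; the column names, JSONPaths, and choice of Nullable columns are placeholders to show the pattern, not the exact schema used in the notebook.

SCHEMA >
    `id` Nullable(Int64) `json:$.id`,
    `type` String `json:$.type`,
    `title` String `json:$.title`,
    `user` Nullable(String) `json:$.user`,
    `server_name` String `json:$.server_name`,
    `meta_domain` String `json:$.meta.domain`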
The schema is pushed to Tinybird.
!tb --token=$token push wiki_hfi.datasource
Note that here the 'meta' dictionary is split into many columns. In the code used to write the DataFrame in this example notebook, df[df.index=='domain']
selected a single row for each event, since the keys within 'meta' formed the index of the DataFrame for a single event.
This Python code sends each event individually to Tinybird but excludes events of type 'log'. In your own code, make sure you only write the data you need to Tinybird: if you don't need it, don't write it.
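A minimal sketch of that loop, assuming the sseclient package for reading the Wikipedia EventStreams feed, the token variable from above, and the wiki_hfi Data Source created earlier:

import json
import requests
from sseclient import SSEClient  # assumption: the sseclient package is installed

EVENTS_URL = "https://api.tinybird.co/v0/events?name=wiki_hfi"  # adjust the host for your region
WIKI_STREAM = "https://stream.wikimedia.org/v2/stream/recentchange"

for message in SSEClient(WIKI_STREAM):
    if not message.data:
        continue
    change = json.loads(message.data)
    if change.get("type") == "log":
        continue  # only write the data you need: skip 'log' events
    # One event per request; the JSONPaths in the schema pick out the columns, including $.meta.*
    requests.post(
        EVENTS_URL,
        headers={"Authorization": f"Bearer {token}"},
        data=json.dumps(change),
    )

For higher volumes you would batch several NDJSON lines into a single request to the Events API, but one event per request keeps the sketch simple.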
The full range of ingestion options is available from within a notebook. You can mix and match how you ingest data as your projects develop. The full code for these examples is in this Colab notebook.