How to ingest NDJSON data

Intermediate

In this guide you’ll learn how to ingest unstructured data, like NDJSON to Tinybird.

A typical scenario consists of having a document-based database, using nested records on your data warehouse or generatied events in JSON format from a web application

For cases like this, we used to export the JSON objects as if they were a String in a CSV file, ingest them to Tinybird and then use the builtin JSON functions to prepare the data for realtime analytics as it was being ingested.

But this is not needed anymore, Tinybird accepts JSON imports by default!

NDJSON and JSON support:

Although Tinybird allows you to ingest .json and .ndjson files, it only accepts the Newline Delimited JSON as content: each line must be a valid JSON object and every line has to end with `\n`. The APIwill return an error if each line isn’t a valid JSON value

Ingesting to Tinybird

As an example for this guide, we will make use of this 100k rows NDJSON file, which contains events from an ecommerce website with different properties.

With the API

Ingesting NDJSON files using the API is similar to the CSV way. There are only two differences that we need to manage in the query parameters:

  • format: it has to be “ndjson”

  • schema: we are used to providing the name and the type for every column but here we need to include another property, called the `jsonpath` (See the JSONPath syntax). Example: “schema=event_name String `json:$.event.name`”

You can guess the schema by first calling the Analyze API. It’s a very handy way to not have to remember the schema and jsonpath syntax, simply send a sample of your file and we’ll guess what’s inside: columns, types, schema, a preview, etc.

Just take the schema attribute in the response and either use it right away in the next API request to create the Data Source or modify as you wish: column names, types, remove any columns, etc.

Now that we’ve analyzed the file, let’s create the Data Source. In the example below, you will ingest the 100k rows NDJSON file only taking 3 columns from it: date, event, and product_id. The jsonpath allows Tinybird to match the Data Source column with the JSON property path.

With the Command Line Interface

There are no changes in the CLI in order to ingest an NDJSON file. Just run the command you are used to with CSV.

Once it is finished, it will automatically generate a .datasource file with all the columns with their proper types and jsonpaths.

Then you can push that .datasource to Tinybird and start using it in your Pipes or append new data to it.

With the User Interface

To create a new Data Source from an NDJSON file, you just need to go to your dashboard and click on the “”Add Data Source”” button.

The import modal will allow you to select an NDJSON formatted file (with either .ndjson or .json extension) from your computer or provide a URL such as the one we are using for this guide.

Import modal

Import modal

After selecting an NDJSON file or URL, Tinybird will analyze the content and provide a preview of your data. Apart from changing the name or the data type of a column, you can also discard it from being imported using the checkbox.

Preview step

Preview step

JSON tree view:

You can preview your JSON data with the tree view available in this step. Just click on the icon on the top right of the table.

Once your data is imported, you will have a Data Source with your JSON data structured in columns, which are easy to transform and consume in any Pipe.

Data Source created with data

Data Source created with data

Ingest just the columns you need:

After exploration of your data, always remember to create a Data Source that only has the columns needed for your analyses. That will help to make your ingestion, materialization and your realtime data project faster.

Dealing with new JSON fields

One of the features we have included with the new NDJSON import is to automatically detect if a new JSON property is being added when new data is coming in.

Using the Data Source we imported in the previous paragraph, we will include a new property to know the origin country of the event, complementing the city. Let’s append new JSON data with the extra property (example file).

After finishing the import, open the Data Source modal and check that a new blue banner appears, warning you about the new properties detected in the last ingestion.

Automatically suggesting new columns

Automatically suggesting new columns

Once you accept viewing those new columns, the application will allow you to add them, change the column types and the column names, as it did in the preview step in the import.

Accepting new columns

Accepting new columns

From now on, whenever you append new data where the new column is defined and has a value, it will appear in the Data Source and will be available to be consumed from your Pipes.

New column receiving data

New column receiving data

We automatically detect if there are new columns available. If you ingest data periodically into your NDJSON Data Source (from a file or a Kafka connection) and we see that new columns are coming in, you will see a blue dot in the Data Source icon that appears in the sidebar. Click on the Data Source to view the new columns and add them to the schema, following the steps above.

New columns detected, notified by a blue dot

New columns detected, notified by a blue dot