JSON data type
BETA
¶

The JSON data type is in private beta. If you are interested in using this type, contact Tinybird at support@tinybird.co or in the Community Slack.

The JSON data type stores JavaScript Object Notation (JSON) documents within a single column. It allows for flexible schema-on-read operations, where individual fields can be accessed as subcolumns.

Syntax¶

To declare a column of JSON type, use the following syntax:

<column_name> JSON([max_dynamic_paths=N, max_dynamic_types=M, some.path TypeName, SKIP path.to.skip, SKIP REGEXP 'paths_regexp']) [jsonPath]

Arguments¶

max_dynamic_paths: Optional. An integer indicating the maximum number of distinct paths that can be stored separately as subcolumns within a single data block (e.g., a single data part for a MergeTree table). If this limit is exceeded, additional paths will be grouped together in a single structure. Default value is 1024.
max_dynamic_types: Optional. An integer between 1 and 255 indicating the maximum number of different data types that can be stored within a single path column with a Dynamic type across a single data block. If this limit is exceeded, any new types will be converted to String. Default value is 32.
some.path TypeName: Optional. A type hint for a specific path within the JSON document. Paths specified with a type hint will always be stored as subcolumns with the given TypeName.
SKIP path.to.skip: Optional. A hint for a specific path that should be ignored during JSON parsing. Such paths will not be stored in the JSON column. If the specified path is a nested JSON object, the entire nested object will be skipped.
SKIP REGEXP 'paths_regexp': Optional. A regular expression hint used to skip paths during JSON parsing. Any path matching this regular expression will not be stored in the JSON column.

Returns¶

A column capable of storing JSON documents, allowing access to its fields as subcolumns.

Creating a Data Source with JSON columns¶

You can create a data source with JSON columns by using the JSON data type in your schema definition.

test.datasource

SCHEMA >
    `json` JSON `json:$`

ENGINE "MergeTree"
ENGINE_SORTING_KEY "tuple()"

tb push datasources/test.datasource
echo '{"a" : {"b" : 42}, "c" : [1, 2, 3]}\n{"f" : "Hello, World!"}\n{"a" : {"b" : 43, "e" : 10}, "c" : [4, 5, 6]}' > test.ndjson
tb datasource append test test.ndjson
tb sql 'select * from test'

Result:

---------------------------------------------
| json                                      |
---------------------------------------------
| {'a': {'b': 42}, 'c': [1, 2, 3]}          |
| {'f': 'Hello, World!'}                    |
| {'a': {'b': 43, 'e': 10}, 'c': [4, 5, 6]} |
---------------------------------------------

You can also specify type hints and paths to skip directly in the data source schema:

test.datasource

SCHEMA >
    `json` JSON(a.b UInt32, SKIP a.e) `json:$`

ENGINE "MergeTree"
ENGINE_SORTING_KEY "tuple()"

tb push datasources/test.datasource
echo '{"a" : {"b" : 42}, "c" : [1, 2, 3]}\n{"f" : "Hello, World!"}\n{"a" : {"b" : 43, "e" : 10}, "c" : [4, 5, 6]}' > test.ndjson
tb datasource append test test.ndjson
tb sql 'select * from test'

Result:

-----------------------------------------
| json                                  |
-----------------------------------------
| {'a': {'b': 42}, 'c': [1, 2, 3]}      |
| {'a': {'b': 0}, 'f': 'Hello, World!'} |
| {'a': {'b': 43}, 'c': [4, 5, 6]}      |
-----------------------------------------

Creating JSON in a Query¶

You can create a JSON value directly in a query by casting a String literal.

Example¶

SELECT '{"a" : {"b" : 42},"c" : [1, 2, 3], "d" : "Hello, World!"}'::JSON AS json

Result:

┌─json───────────────────────────────────────────┐
│ {"a":{"b":42},"c":[1,2,3],"d":"Hello, World!"} │
└────────────────────────────────────────────────┘

To cast from Tuple or Map to JSON, you first need to cast the column to a String containing the JSON object, and then deserialize it back to the JSON type.

Reading JSON paths as subcolumns¶

The JSON type allows you to access individual fields as subcolumns using dot notation. If a path's type was not explicitly specified in the JSON type declaration, its subcolumn will have a Dynamic type.

First, create and populate a data source for the examples:

test.datasource

SCHEMA >
    `json` JSON(a.b UInt32, SKIP a.e) `json:$`

ENGINE "MergeTree"
ENGINE_SORTING_KEY "tuple()"

tb push datasources/test.datasource
echo '{"a" : {"b" : 42, "g" : 42.42}, "c" : [1, 2, 3], "d" : "2020-01-01"}\n{"f" : "Hello, World!", "d" : "2020-01-02"}\n{"a" : {"b" : 43, "e" : 10, "g" : 43.43}, "c" : [4, 5, 6]}' > test.ndjson
tb datasource append test test.ndjson

Example: Accessing subcolumns with dot notation¶

SELECT json.a.b, json.a.g, json.c, json.d FROM test

Result:

┌─json.a.b─┬─json.a.g─┬─json.c──┬─json.d─────┐
│       42 │ 42.42    │ [1,2,3] │ 2020-01-01 │
│        0 │ ᴺᵁᴸᴸ     │ ᴺᵁᴸᴸ    │ 2020-01-02 │
│       43 │ 43.43    │ [4,5,6] │ ᴺᵁᴸᴸ       │
└──────────┴──────────┴─────────┴────────────┘

If you encounter issues with dot notation, you can use the getSubcolumn(json,'path') syntax as an alternative.

Example: Accessing subcolumns with `getSubcolumn()`¶

SELECT getSubcolumn(json, 'a.b'), getSubcolumn(json, 'a.g'), getSubcolumn(json, 'c'), getSubcolumn(json, 'd') FROM test

Result:

┌─getSubcolumn(json, 'a.b')─┬─getSubcolumn(json, 'a.g')─┬─getSubcolumn(json, 'c')─┬─getSubcolumn(json, 'd')─────┐
│                        42 │                     42.42 │                 [1,2,3] │                  2020-01-01 │
│                         0 │                  ᴺᵁᴸᴸ     │                 ᴺᵁᴸᴸ    │                  2020-01-02 │
│                        43 │                  43.43    │                 [4,5,6] │                  ᴺᵁᴸᴸ       │
└───────────────────────────┴───────────────────────────┴─────────────────────────┴─────────────────────────────┘

If a requested path is not found in the data, it will be filled with NULL values.

Example: Accessing a non-existent path¶

SELECT json.non.existing.path FROM test

Result:

┌─json.non.existing.path─┐
│ ᴺᵁᴸᴸ                   │
│ ᴺᵁᴸᴸ                   │
│ ᴺᵁᴸᴸ                   │
└────────────────────────┘

You can inspect the data types of the returned subcolumns.

Example: Checking subcolumn types¶

SELECT toTypeName(json.a.b), toTypeName(json.a.g), toTypeName(json.c), toTypeName(json.d) FROM test

Result:

┌─toTypeName(json.a.b)─┬─toTypeName(json.a.g)─┬─toTypeName(json.c)─┬─toTypeName(json.d)─┐
│ UInt32               │ Dynamic              │ Dynamic            │ Dynamic            │
│ UInt32               │ Dynamic              │ Dynamic            │ Dynamic            │
│ UInt32               │ Dynamic              │ Dynamic            │ Dynamic            │
└──────────────────────┴──────────────────────┴────────────────────┴────────────────────┘

In this example, json.a.b has type UInt32 as specified in the JSON type declaration, while other subcolumns are Dynamic.

You can also read Dynamic subcolumns by explicitly specifying a target type using the .:TypeName syntax.

Example: Reading `Dynamic` subcolumns with type hints¶

SELECT json.a.g.:Float64, dynamicType(json.a.g), json.d.:Date, dynamicType(json.d) FROM test

Result:

┌─json.a.g.:`Float64`─┬─dynamicType(json.a.g)─┬─json.d.:`Date`─┬─dynamicType(json.d)─┐
│               42.42 │ Float64               │     2020-01-01 │ Date                │
│                ᴺᵁᴸᴸ │ None                  │     2020-01-02 │ Date                │
│               43.43 │ Float64               │           ᴺᵁᴸᴸ │ None                │
└─────────────────────┴───────────────────────┴────────────────┴─────────────────────┘

Dynamic subcolumns can be cast to any data type. An exception will be thrown if the internal type within Dynamic cannot be cast to the requested type.

Example: Casting a `Dynamic` subcolumn¶

SELECT json.a.g::UInt64 as uint FROM test

Result:

┌─uint─┐
│   42 │
│    0 │
│   43 │
└──────┘

Reading JSON sub-objects as subcolumns¶

The JSON type supports reading nested objects as subcolumns of type JSON using the special syntax json.^some.path.

First, create and populate a data source for the examples:

test.datasource

SCHEMA >
    `json` JSON `json:$`

ENGINE "MergeTree"
ENGINE_SORTING_KEY "tuple()"

tb push datasources/test.datasource
echo '{"a" : {"b" : {"c" : 42, "g" : 42.42}}, "c" : [1, 2, 3], "d" : {"e" : {"f" : {"g" : "Hello, World", "h" : [1, 2, 3]}}}}\n{"f" : "Hello, World!", "d" : {"e" : {"f" : {"h" : [4, 5, 6]}}}}\n{"a" : {"b" : {"c" : 43, "e" : 10, "g" : 43.43}}, "c" : [4, 5, 6]}' > test.ndjson
tb datasource append test test.ndjson

Example¶

SELECT json.^a.b, json.^d.e.f FROM test

Result:

┌─json.^`a`.b───────────────┬─json.^`d`.e.f────────────────────┐
│ {"c":42,"g":42.42}        │ {"g":"Hello, World","h":[1,2,3]} │
│ {}                        │ {"h":[4,5,6]}                    │
│ {"c":43,"e":10,"g":43.43} │ {}                               │
└───────────────────────────┴──────────────────────────────────┘

Reading sub-objects as subcolumns can be less efficient, as it may require a near-full scan of the JSON data.

Handling arrays of JSON objects¶

JSON paths containing an array of objects are parsed as type Array(JSON) and inserted into a Dynamic column for that path. To read an array of objects, you can extract it from the Dynamic column as a subcolumn.

First, create and populate a data source for the examples:

test.datasource

SCHEMA >
    `json` JSON `json:$`

ENGINE "MergeTree"
ENGINE_SORTING_KEY "tuple()"

tb push datasources/test.datasource
echo '{"a" : {"b" : [{"c" : 42, "d" : "Hello", "f" : [[{"g" : 42.42}]], "k" : {"j" : 1000}}, {"c" : 43}, {"e" : [1, 2, 3], "d" : "My", "f" : [[{"g" : 43.43, "h" : "2020-01-01"}]],  "k" : {"j" : 2000}}]}}\n{"a" : {"b" : [1, 2, 3]}}\n{"a" : {"b" : [{"c" : 44, "f" : [[{"h" : "2020-01-02"}]]}, {"e" : [4, 5, 6], "d" : "World", "f" : [[{"g" : 44.44}]],  "k" : {"j" : 3000}}]}}' > test.ndjson
tb datasource append test test.ndjson

Example: Reading an array of JSON objects¶

SELECT json.a.b, dynamicType(json.a.b) FROM test

Result:

┌─json.a.b──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─dynamicType(json.a.b)────────────────────────────────────┐
│ ['{"c":"42","d":"Hello","f":[[{"g":42.42}]],"k":{"j":"1000"}}','{"c":"43"}','{"d":"My","e":["1","2","3"],"f":[[{"g":43.43,"h":"2020-01-01"}]],"k":{"j":"2000"}}'] │ Array(JSON(max_dynamic_types=16, max_dynamic_paths=256)) │
│ [1,2,3]                                                                                                                                                           │ Array(Nullable(Int64))                                   │
│ ['{"c":"44","f":[[{"h":"2020-01-02"}]]}','{"d":"World","e":["4","5","6"],"f":[[{"g":44.44}]],"k":{"j":"3000"}}']                                                  │ Array(JSON(max_dynamic_types=16, max_dynamic_paths=256)) │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────┘

Note that the max_dynamic_types and max_dynamic_paths parameters for nested JSON types are reduced from their default values to prevent an uncontrolled growth in the number of subcolumns for nested arrays of JSON objects.

Example: Reading subcolumns from a nested `JSON` array¶

SELECT json.a.b.:`Array(JSON)`.c, json.a.b.:`Array(JSON)`.f, json.a.b.:`Array(JSON)`.d FROM test

Result:

┌─json.a.b.:`Array(JSON)`.c─┬─json.a.b.:`Array(JSON)`.f───────────────────────────────────┬─json.a.b.:`Array(JSON)`.d─┐
│ [42,43,NULL]              │ [[['{"g":42.42}']],NULL,[['{"g":43.43,"h":"2020-01-01"}']]] │ ['Hello',NULL,'My']       │
│ []                        │ []                                                          │ []                        │
│ [44,NULL]                 │ [[['{"h":"2020-01-02"}']],[['{"g":44.44}']]]                │ [NULL,'World']            │
└───────────────────────────┴─────────────────────────────────────────────────────────────┴───────────────────────────┘

You can simplify the syntax by using [] after the path to indicate an array level, avoiding the explicit Array(JSON) subcolumn name.

Example: Simplified syntax for nested `JSON` arrays¶

SELECT json.a.b[].c, json.a.b[].f, json.a.b[].d FROM test

Result:

┌─json.a.b.:`Array(JSON)`.c─┬─json.a.b.:`Array(JSON)`.f───────────────────────────────────┬─json.a.b.:`Array(JSON)`.d─┐
│ [42,43,NULL]              │ [[['{"g":42.42}']],NULL,[['{"g":43.43,"h":"2020-01-01"}']]] │ ['Hello',NULL,'My']       │
│ []                        │ []                                                          │ []                        │
│ [44,NULL]                 │ [[['{"h":"2020-01-02"}']],[['{"g":44.44}']]]                │ [NULL,'World']            │
└───────────────────────────┴─────────────────────────────────────────────────────────────┴───────────────────────────┘

The number of [] after a path indicates the array level. For instance, json.path[][] is transformed to json.path.:Array(Array(JSON)).

Example: Reading subcolumns from deeply nested `JSON` arrays¶

SELECT json.a.b[].c.:Int64, json.a.b[].f[][].g.:Float64, json.a.b[].f[][].h.:Date FROM test

Result:

┌─json.a.b.:`Array(JSON)`.c.:`Int64`─┬─json.a.b.:`Array(JSON)`.f.:`Array(Array(JSON))`.g.:`Float64`─┬─json.a.b.:`Array(JSON)`.f.:`Array(Array(JSON))`.h.:`Date`─┐
│ [42,43,NULL]                       │ [[[42.42]],[],[[43.43]]]                                     │ [[[NULL]],[],[['2020-01-01']]]                            │
│ []                                 │ []                                                           │ []                                                        │
│ [44,NULL]                          │ [[[NULL]],[[44.44]]]                                         │ [[['2020-01-02']],[[NULL]]]                               │
└────────────────────────────────────┴──────────────────────────────────────────────────────────────┴───────────────────────────────────────────────────────────┘

You can also read sub-object subcolumns from a nested JSON array.

Example: Reading sub-objects from a nested `JSON` array¶

SELECT json.a.b[].^k FROM test

Result:

┌─json.a.b.:`Array(JSON)`.^`k`─────────┐
│ ['{"j":"1000"}','{}','{"j":"2000"}'] │
│ []                                   │
│ ['{}','{"j":"3000"}']                │
└──────────────────────────────────────┘

Data types

Table engines

Functions

ClickHouse errors

Function-specific issues

Performance issues

Schema and typing

Data ingestion

JSON data type
BETA
¶

Syntax¶

Arguments¶

Returns¶

Creating a Data Source with JSON columns¶

test.datasource

test.datasource

Creating JSON in a Query¶

Example¶

Reading JSON paths as subcolumns¶

test.datasource

Example: Accessing subcolumns with dot notation¶

Example: Accessing subcolumns with `getSubcolumn()`¶

Example: Accessing a non-existent path¶

Example: Checking subcolumn types¶

Example: Reading `Dynamic` subcolumns with type hints¶

Example: Casting a `Dynamic` subcolumn¶

Reading JSON sub-objects as subcolumns¶

test.datasource

Example¶

Handling arrays of JSON objects¶

test.datasource

Example: Reading an array of JSON objects¶

Example: Reading subcolumns from a nested `JSON` array¶

Example: Simplified syntax for nested `JSON` arrays¶

Example: Reading subcolumns from deeply nested `JSON` arrays¶

Example: Reading sub-objects from a nested `JSON` array¶

JSON data type BETA¶

Syntax¶

Arguments¶

Returns¶

Creating a Data Source with JSON columns¶

test.datasource

test.datasource

Creating JSON in a Query¶

Example¶

Reading JSON paths as subcolumns¶

test.datasource

Example: Accessing subcolumns with dot notation¶

Example: Accessing subcolumns with getSubcolumn()¶

Example: Accessing a non-existent path¶

Example: Checking subcolumn types¶

Example: Reading Dynamic subcolumns with type hints¶

Example: Casting a Dynamic subcolumn¶

Reading JSON sub-objects as subcolumns¶

test.datasource

Example¶

Handling arrays of JSON objects¶

test.datasource

Example: Reading an array of JSON objects¶

Example: Reading subcolumns from a nested JSON array¶

Example: Simplified syntax for nested JSON arrays¶

Example: Reading subcolumns from deeply nested JSON arrays¶

Example: Reading sub-objects from a nested JSON array¶

JSON data type
BETA
¶

Example: Accessing subcolumns with `getSubcolumn()`¶

Example: Reading `Dynamic` subcolumns with type hints¶

Example: Casting a `Dynamic` subcolumn¶

Example: Reading subcolumns from a nested `JSON` array¶

Example: Simplified syntax for nested `JSON` arrays¶

Example: Reading subcolumns from deeply nested `JSON` arrays¶

Example: Reading sub-objects from a nested `JSON` array¶