Arrays in ClickHouse store multiple values in a single column, letting you keep related data together without joins or separate tables. This approach works particularly well for event properties, time series measurements, and any data where the number of elements varies from row to row.
This guide covers how to create and manipulate arrays in ClickHouse, from basic operations like filtering and transforming elements to advanced patterns for analytics queries and real-time APIs.
Why use arrays in ClickHouse analytics
ClickHouse treats arrays as first-class data types, which means you can store multiple values in a single column without creating separate tables or complex joins. This approach works well when you have related data that naturally belongs together, like a list of product tags, a series of timestamps, or a collection of user actions.
Arrays can improve query performance because related data stays in the same row. Instead of joining a million-row events table to get a user's action history, you can store those actions as an array and retrieve everything in one read. This matters when you're running analytics queries that need to scan large datasets quickly.
You'll see arrays used for storing event properties, time series measurements, nested JSON attributes, and any data where the number of elements varies from row to row. For example, one user might have three tags while another has ten, and arrays handle this naturally without wasting storage space.
Declaring and inserting array data
Create table with array columns
When you create a table with array columns, you specify the type of data the array holds using Array(Type) syntax. The Type can be any ClickHouse data type like String, Int32, Float64, or even another array for nested structures.
CREATE TABLE user_events (
user_id String,
event_names Array(String),
event_timestamps Array(DateTime),
event_values Array(Float64)
) ENGINE = MergeTree()
ORDER BY user_id;
All elements in an array have to be the same type. ClickHouse enforces this when you insert data, so you can't mix strings and numbers in the same array. Nested arrays like Array(Array(String)) are possible but less common because they add complexity.
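If you need to collapse a nested array back into a flat one, arrayFlatten() does it:
SELECT arrayFlatten([['a', 'b'], ['c']]); -- Returns ['a', 'b', 'c']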
Insert rows with literals and JSONEachRow
You can insert array data using bracket notation [1, 2, 3] or the array() function; both do the same thing and work in INSERT statements and SELECT queries.
INSERT INTO user_events VALUES
('user_123', ['login', 'view_page', 'logout'],
[toDateTime('2024-01-15 10:00:00'), toDateTime('2024-01-15 10:05:00'), toDateTime('2024-01-15 10:15:00')],
[1.0, 2.5, 0.0]);
When you're ingesting JSON data with nested arrays, the JSONEachRow format automatically parses JSON arrays into ClickHouse array columns as long as your schema matches:
INSERT INTO user_events FORMAT JSONEachRow
{"user_id": "user_456", "event_names": ["signup", "purchase"], "event_timestamps": ["2024-01-16 14:30:00", "2024-01-16 14:35:00"], "event_values": [0.0, 99.99]}
Essential array functions every engineer needs to know
arrayElement and indexOf
The arrayElement(arr, index) function accesses elements by position using 1-based indexing, where the first element is at position 1. Most people use the bracket notation arr[index] instead because it's shorter and more familiar.
SELECT arrayElement(['apple', 'banana', 'orange'], 2); -- Returns 'banana'
SELECT ['apple', 'banana', 'orange'][1]; -- Returns 'apple'
Negative indices count backward from the end, so -1 gives you the last element and -2 gives you the second-to-last. The indexOf(arr, value) function returns the 1-based position of the first match, or 0 if the value isn't found.
SELECT ['red', 'green', 'blue'][-1]; -- Returns 'blue'
SELECT indexOf(['red', 'green', 'blue'], 'green'); -- Returns 2
length and arraySlice
The length(arr) function tells you how many elements are in an array. You can use it to filter rows by array size or calculate statistics about array dimensions.
SELECT length([10, 20, 30, 40]); -- Returns 4
SELECT user_id FROM user_events WHERE length(event_names) > 5;
The arraySlice(arr, offset, length) function extracts a portion of an array starting at a given position (1-based) and continuing for a specified number of elements. If you ask for more elements than exist, you get everything from the offset to the end.
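For example (the second query shows the clamping behavior when the requested length runs past the end):
SELECT arraySlice([10, 20, 30, 40, 50], 2, 3); -- Returns [20, 30, 40]
SELECT arraySlice([10, 20, 30], 2, 10); -- Returns [20, 30]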
arrayConcat and arrayDistinct
The arrayConcat(arr1, arr2, ...) function combines multiple arrays into one, keeping elements in order. You can pass as many arrays as you want, and they'll be joined sequentially.
SELECT arrayConcat([1, 2], [3, 4], [5]); -- Returns [1, 2, 3, 4, 5]
The arrayDistinct(arr) function removes duplicate elements. The order of elements in the result isn't guaranteed, so if order matters, combine it with arraySort().
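For example:
SELECT arrayDistinct([3, 1, 2, 3, 1]); -- Unique values, order not guaranteed
SELECT arraySort(arrayDistinct([3, 1, 2, 3, 1])); -- Returns [1, 2, 3]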
Filtering data with ClickHouse arrayFilter
The arrayFilter() function applies a boolean condition to each element and returns a new array with only the elements where the condition is true. The syntax uses lambda expressions: arrayFilter(x -> condition, arr), where x represents each element.
SELECT arrayFilter(x -> x > 10, [5, 15, 8, 20, 3]); -- Returns [15, 20]
SELECT arrayFilter(x -> x LIKE '%error%', ['info', 'error_log', 'warning', 'error_fatal']);
-- Returns ['error_log', 'error_fatal']
This comes in handy when you're filtering event arrays based on conditions, like extracting only high-value purchases or specific event types. The lambda variable can have any name, though x is what most people use.
Boolean predicates on nested events
When you're working with parallel arrays (like separate arrays for event names and values), you can filter based on conditions that reference multiple arrays at once. ClickHouse supports lambda functions with multiple parameters: arrayFilter((x, y) -> condition, arr1, arr2).
SELECT arrayFilter((event, value) -> value > 50,
['purchase', 'view', 'purchase', 'click'],
[25.0, 10.0, 75.0, 5.0]);
-- Returns ['purchase'] (the third event where value > 50)
Combining arrayFilter with arrayMap
You can chain array functions to filter and then transform elements in one query. The arrayMap() function applies a transformation to each element, and combining it with arrayFilter() gives you a complete data pipeline.
SELECT arrayMap(x -> x * 2, arrayFilter(x -> x > 10, [5, 15, 8, 20, 3]));
-- Returns [30, 40] (filters to [15, 20], then doubles each)
Unnesting arrays with ClickHouse array join
The ARRAY JOIN clause expands arrays into separate rows, creating one output row for each array element.
SELECT user_id, event_name
FROM user_events
ARRAY JOIN event_names AS event_name;
If a row has an array with three elements, ARRAY JOIN produces three output rows, each with one element. The LEFT ARRAY JOIN variant keeps rows with empty arrays, producing a single row with default values (an empty string, zero, and so on) for the array columns.
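A minimal sketch against the user_events table from earlier:
SELECT user_id, event_name
FROM user_events
LEFT ARRAY JOIN event_names AS event_name;
-- Users with an empty event_names array still appear once, with a default value for event_name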
Lateral view versus array join performance
The ARRAY JOIN clause in ClickHouse is optimized for expanding arrays and typically performs better than wrapping the arrayJoin() function in subqueries. Combined with properly configured indexes, it can cut query time dramatically on large datasets (in one reported case, from 0.720 to 0.012 seconds).
SELECT user_id, event_name, event_timestamp
FROM user_events
ARRAY JOIN event_names AS event_name, event_timestamps AS event_timestamp;
For queries that aggregate data after expansion, think about whether you actually need to unnest the arrays or whether array functions like arraySum() or arrayFilter() can produce the same result more efficiently.
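For example, both of these return a per-user total, but the first reads each row once without any expansion (note the ARRAY JOIN version drops users whose array is empty):
SELECT user_id, arraySum(event_values) AS total_value
FROM user_events;

SELECT user_id, sum(event_value) AS total_value
FROM user_events
ARRAY JOIN event_values AS event_value
GROUP BY user_id;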
Transforming arrays with map reduce and other higher-order helpers
arrayMap for element-wise transformation
The arrayMap() function applies a transformation to every element in an array, returning a new array with the transformed values. The lambda syntax is arrayMap(x -> expression, arr), where the expression defines how each element changes.
SELECT arrayMap(x -> x * 2, [1, 2, 3, 4]); -- Returns [2, 4, 6, 8]
SELECT arrayMap(x -> upper(x), ['hello', 'world']); -- Returns ['HELLO', 'WORLD']
This works with any data type and any transformation, including type conversions. You can convert an array of integers to strings or extract specific fields from complex types.
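For example, converting integers to strings:
SELECT arrayMap(x -> toString(x), [1, 2, 3]); -- Returns ['1', '2', '3']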
arrayReduce for aggregation in place
The arrayReduce() function applies any ClickHouse aggregate function to array elements without a GROUP BY clause. The syntax is arrayReduce('aggregate_function', arr), where the aggregate function name is passed as a string.
SELECT arrayReduce('sum', [10, 20, 30, 40]); -- Returns 100
SELECT arrayReduce('avg', [5.5, 10.5, 15.5]); -- Returns 10.5
This is useful when you want to aggregate array values within a single row rather than across multiple rows. Any aggregate function that ClickHouse supports works with arrayReduce(), including parametric ones like quantile(0.5).
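For example, including a parametric aggregate:
SELECT arrayReduce('max', [10, 20, 30]); -- Returns 30
SELECT arrayReduce('quantile(0.5)', [1, 2, 3, 4, 5]); -- Returns the median, 3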
arrayZip and arrayEnumerate
The arrayZip() function combines multiple arrays element-by-element, creating an array of tuples. Each tuple contains one element from each input array at the same position; all input arrays must have the same length, or the query throws an exception.
SELECT arrayZip(['a', 'b', 'c'], [1, 2, 3]);
-- Returns [('a', 1), ('b', 2), ('c', 3)]
The arrayEnumerate() function adds sequential index numbers to array elements, returning an array of integers from 1 to the array length. This helps when you need to track element positions during transformations, including scenarios like forward- and backfilling values using array functions.
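For example, pairing positions with values using arrayZip():
SELECT arrayEnumerate(['a', 'b', 'c']); -- Returns [1, 2, 3]
SELECT arrayZip(arrayEnumerate(['a', 'b', 'c']), ['a', 'b', 'c']);
-- Returns [(1, 'a'), (2, 'b'), (3, 'c')]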
Aggregating arrays for metrics and dashboards
groupArray and groupUniqArray
The groupArray() aggregate function collects values from multiple rows into a single array during aggregation. This is useful for creating lists of related items grouped by a common dimension.
SELECT user_id, groupArray(event_name) AS all_events
FROM user_events
ARRAY JOIN event_names AS event_name
GROUP BY user_id;
You can limit the array size by passing a maximum length parameter: groupArray(100)(column) collects up to 100 elements. The groupUniqArray() function works the same way but only includes distinct values, automatically removing duplicates.
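A sketch combining both, capping each user's list at 100 events:
SELECT user_id,
    groupArray(100)(event_name) AS capped_events,
    groupUniqArray(event_name) AS distinct_events
FROM user_events
ARRAY JOIN event_names AS event_name
GROUP BY user_id;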
arraySum arrayAvg arrayMin arrayMax
ClickHouse provides built-in functions that work directly on array columns without arrayReduce(). These functions are optimized for common operations and easier to read, which helps keep dashboard queries fast.
SELECT user_id,
arraySum(event_values) AS total_value,
arrayAvg(event_values) AS avg_value,
arrayMin(event_values) AS min_value,
arrayMax(event_values) AS max_value
FROM user_events;
These functions only work with numeric arrays and return a single numeric value per row. For non-numeric aggregations or custom logic, use arrayReduce() instead.
Performance tips for large arrays at scale
Skip indexes and sparse columns
When you filter on array contents using functions like has(arr, value), ClickHouse can use bloom filter skip indexes to avoid reading unnecessary data blocks. Create a bloom filter index on array columns that appear frequently in WHERE clauses.
CREATE TABLE events (
user_id String,
tags Array(String),
INDEX tags_idx tags TYPE bloom_filter GRANULARITY 1
) ENGINE = MergeTree()
ORDER BY user_id;
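With that index in place, a membership filter like this can skip granules that can't contain the value ('premium' here is a hypothetical tag):
SELECT count()
FROM events
WHERE has(tags, 'premium');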
Arrays where most rows hold empty (default) values can be stored efficiently using ClickHouse's sparse serialization, which compresses runs of default values. This reduces storage and improves query performance when most rows have empty arrays.
Avoiding arrayJoin on hot paths
The arrayJoin() function and ARRAY JOIN clause can multiply the number of rows processed, which impacts query performance on large datasets. When possible, use array functions like arrayFilter(), arrayMap(), or aggregate functions to work with arrays without expansion.
For frequently executed queries, consider pre-computing unnested data in a materialized view rather than performing ARRAY JOIN operations on every query. This trades storage space for query speed.
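As a sketch, assuming the user_events schema from earlier (and remembering that a materialized view only processes rows inserted after it's created):
CREATE MATERIALIZED VIEW user_events_flat
ENGINE = MergeTree()
ORDER BY (user_id, event_name)
AS SELECT user_id, event_name, event_timestamp
FROM user_events
ARRAY JOIN event_names AS event_name, event_timestamps AS event_timestamp;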
Create real-time array-processing analytics APIs with Tinybird
Tinybird provides a managed ClickHouse service with infrastructure and tooling for local development, streaming ingestion, and API hosting. You can use Tinybird to deploy ClickHouse queries as hosted and scalable REST APIs that integrate with your app.
Install the Tinybird CLI and start a local ClickHouse container for development:
1. Set up local workspace
curl -L tinybird.co | sh
tb login
tb local start
The local container runs Tinybird's managed ClickHouse on your machine, letting you develop and test queries before deploying to Tinybird Cloud.
2. Create data source with NDJSON events
Create a sample data file called events.ndjson with array columns:
{"user_id": "user_001", "event_names": ["login", "view_dashboard", "logout"], "event_values": [0.0, 1.5, 0.0]}
{"user_id": "user_002", "event_names": ["signup", "add_payment", "purchase"], "event_values": [0.0, 0.0, 99.99]}
{"user_id": "user_003", "event_names": ["login", "search", "view_product"], "event_values": [0.0, 0.0, 0.5]}
Create the data source and load the data:
tb datasource create events --file events.ndjson
tb build
tb datasource append events --file events.ndjson
3. Build pipe with arrayFilter and arrayJoin
Create a file called high_value_events.pipe that filters and expands arrays:
TOKEN high_value_events_read READ
NODE filter_events
SQL >
%
SELECT
user_id,
arrayFilter((name, value) -> value > {{Float64(min_value, 1.0)}},
event_names,
event_values) AS filtered_names,
arrayFilter(value -> value > {{Float64(min_value, 1.0)}},
event_values) AS filtered_values
FROM events
WHERE length(filtered_names) > 0
NODE expand_events
SQL >
%
SELECT
user_id,
event_name,
event_value
FROM filter_events
ARRAY JOIN filtered_names AS event_name, filtered_values AS event_value
ORDER BY event_value DESC
LIMIT {{Int32(limit, 10)}}
TYPE endpoint
This pipe filters events based on a minimum value parameter, then expands the results into individual rows.
4. Deploy and test API endpoint
Deploy your data source and pipe to Tinybird Cloud:
tb --cloud deploy
Then test the API endpoint with curl, passing the pipe's template parameters as query string arguments:
curl "https://api.us-east.tinybird.co/v0/pipes/high_value_events.json?min_value=50&limit=5&token=p.ey..."
The response includes filtered and expanded events as JSON with metadata about the query execution.
Ship analytics faster with Tinybird managed ClickHouse
Tinybird handles ClickHouse infrastructure so developers can focus on building features rather than managing databases. The platform automatically scales compute and storage, optimizes query performance, and provides built-in monitoring without dedicated DevOps resources.
For workloads with heavy array processing, Tinybird's managed service handles memory management and optimization that would otherwise require manual tuning. The platform also provides streaming ingestion from sources like Kafka and a simple HTTP streaming endpoint, making it easy to get array data into ClickHouse in real time.
To get started, sign up for a free Tinybird account and follow the documentation to create your first data source with array columns. The CLI supports local development with instant feedback, and deployment to production takes seconds.
FAQs about ClickHouse array functions
What is the maximum array length in ClickHouse?
ClickHouse arrays are limited by available memory rather than a hard element count. Large arrays may impact query performance and storage efficiency, so consider alternative approaches like separate tables for arrays with thousands of elements.
How do higher-order functions perform compared to normal loops?
Higher-order functions like arrayMap() and arrayFilter() are vectorized and typically faster than equivalent subqueries or joins for array processing. ClickHouse optimizes these operations at the query execution level.
Can I index elements inside an array column?
ClickHouse supports bloom filter skip indexes on array contents via the has() function, with a default false positive rate of 2.5%, but indexing individual elements requires unnesting the array. Bloom filters help skip data blocks that don't contain specific array values.
Does array join always process elements sequentially?
ARRAY JOIN operations can be parallelized across multiple threads when processing large datasets, though each individual array expands sequentially. Query performance depends on array sizes and the number of rows being processed.
How do I stream nested arrays from Kafka into ClickHouse?
Use the JSONEachRow format with properly typed array columns in your table schema, making sure the JSON structure matches the expected array types. Tinybird provides managed Kafka connectors that handle schema validation and streaming ingestion automatically.