---
title: Split and Merge Functions reference
meta:
    description: Functions for splitting and merging data.
headingMaxLevels: 2
---

# Functions for splitting strings

The following functions are used to split strings into substrings.

## splitByChar

Splits a string into an array of substrings using a single specified character as the delimiter. Empty substrings are included if the separator appears at the beginning, end, or multiple times consecutively.

### Syntax

```sql
splitByChar(separator, s[, max_substrings])
```

### Arguments

- `separator`: String. The single character used to divide the string.
- `s`: String. The input string to be split.
- `max_substrings`: Int64. An optional parameter. If greater than 0, the function returns at most this many substrings. Otherwise, it returns all possible substrings.

### Returns

An array of strings, representing the split substrings. Array(String).

### Example

```sql
SELECT splitByChar(',', '1,2,3,abcde')
```

Result:

```result
┌─splitByChar(',', '1,2,3,abcde')─┐
│ ['1','2','3','abcde']           │
└─────────────────────────────────┘
```

## splitByString

Divides a string into an array of substrings using a specified string as the delimiter. If the `separator` string is empty, the function splits the input string into individual characters.

### Syntax

```sql
splitByString(separator, s[, max_substrings])
```

### Arguments

- `separator`: String. The string used to divide the input string.
- `s`: String. The input string to be split.
- `max_substrings`: Int64. An optional parameter. If greater than 0, the function returns at most this many substrings. Otherwise, it returns all possible substrings.

### Returns

An array of strings, representing the split substrings. Array(String).

{% callout type="info" %}
Setting `splitby_max_substrings_includes_remaining_string` (default: 0) controls if the remaining string is included in the last element of the result array when argument `max_substrings` > 0.
{% /callout %}

### Example

```sql
SELECT splitByString(', ', '1, 2 3, 4,5, abcde')
```

Result:

```result
┌─splitByString(', ', '1, 2 3, 4,5, abcde')─┐
│ ['1','2 3','4,5','abcde']                 │
└───────────────────────────────────────────┘
```

```sql
SELECT splitByString('', 'abcde')
```

Result:

```result
┌─splitByString('', 'abcde')─┐
│ ['a','b','c','d','e']      │
└────────────────────────────┘
```

## splitByRegexp

Splits a string into an array of substrings based on matches to a regular expression. If the regular expression is empty, the string is split into individual characters. If no match is found, the string is not split.

### Syntax

```sql
splitByRegexp(regexp, s[, max_substrings])
```

### Arguments

- `regexp`: String or FixedString. The regular expression used as the delimiter. Must be a constant.
- `s`: String. The input string to be split.
- `max_substrings`: Int64. An optional parameter. If greater than 0, the function returns at most this many substrings. Otherwise, it returns all possible substrings.

### Returns

An array of strings, representing the split substrings. Array(String).

{% callout type="info" %}
Setting `splitby_max_substrings_includes_remaining_string` (default: 0) controls if the remaining string is included in the last element of the result array when argument `max_substrings` > 0.
{% /callout %}

### Example

```sql
SELECT splitByRegexp('\\d+', 'a12bc23de345f')
```

Result:

```result
┌─splitByRegexp('\\d+', 'a12bc23de345f')─┐
│ ['a','bc','de','f']                    │
└────────────────────────────────────────┘
```

```sql
SELECT splitByRegexp('', 'abcde')
```

Result:

```result
┌─splitByRegexp('', 'abcde')─┐
│ ['a','b','c','d','e']      │
└────────────────────────────┘
```

## splitByWhitespace

Splits a string into an array of substrings, using any sequence of whitespace characters as the delimiter.

### Syntax

```sql
splitByWhitespace(s[, max_substrings])
```

### Arguments

- `s`: String. The input string to be split.
- `max_substrings`: Int64. An optional parameter. If greater than 0, the function returns at most this many substrings. Otherwise, it returns all possible substrings.

### Returns

An array of strings, representing the split substrings. Array(String).

{% callout type="info" %}
Setting `splitby_max_substrings_includes_remaining_string` (default: 0) controls if the remaining string is included in the last element of the result array when argument `max_substrings` > 0.
{% /callout %}

### Example

```sql
SELECT splitByWhitespace('  1!  a,  b.  ')
```

Result:

```result
┌─splitByWhitespace('  1!  a,  b.  ')─┐
│ ['1!','a,','b.']                    │
└─────────────────────────────────────┘
```

## splitByNonAlpha

Splits a string into an array of substrings, using any non-alphabetic characters (whitespace or punctuation) as delimiters.

### Syntax

```sql
splitByNonAlpha(s[, max_substrings])
```

### Arguments

- `s`: String. The input string to be split.
- `max_substrings`: Int64. An optional parameter. If greater than 0, the function returns at most this many substrings. Otherwise, it returns all possible substrings.

### Returns

An array of strings, representing the split substrings. Array(String).

{% callout type="info" %}
Setting `splitby_max_substrings_includes_remaining_string` (default: 0) controls if the remaining string is included in the last element of the result array when argument `max_substrings` > 0.
{% /callout %}

### Example

```sql
SELECT splitByNonAlpha('  1!  a,  b.  ')
```

Result:

```result
┌─splitByNonAlpha('  1!  a,  b.  ')─┐
│ ['1','a','b']                     │
└───────────────────────────────────┘
```

## arrayStringConcat

Concatenates all elements of an array into a single string, optionally separated by a specified delimiter.

### Syntax

```sql
arrayStringConcat(arr[, separator])
```

### Arguments

- `arr`: Array. The array of values to concatenate.
- `separator`: String. An optional string to insert between array elements. Defaults to an empty string if not provided.

### Returns

A single string resulting from the concatenation. String.

### Example

```sql
SELECT arrayStringConcat(['12/05/2021', '12:50:00'], ' ') AS DateString
```

Result:

```result
┌─DateString──────────┐
│ 12/05/2021 12:50:00 │
└─────────────────────┘
```

## alphaTokens

Extracts sequences of consecutive alphabetic characters (a-z, A-Z) from a string into an array of substrings.

### Syntax

```sql
alphaTokens(s[, max_substrings])
```

Alias: `splitByAlpha`

### Arguments

- `s`: String. The input string to extract tokens from.
- `max_substrings`: Int64. An optional parameter. If greater than 0, the function returns at most this many substrings. Otherwise, it returns all possible substrings.

### Returns

An array of strings, containing only the alphabetic tokens. Array(String).

{% callout type="info" %}
Setting `splitby_max_substrings_includes_remaining_string` (default: 0) controls if the remaining string is included in the last element of the result array when argument `max_substrings` > 0.
{% /callout %}

### Example

```sql
SELECT alphaTokens('abca1abc')
```

Result:

```result
┌─alphaTokens('abca1abc')─┐
│ ['abca','abc']          │
└─────────────────────────┘
```

## extractAllGroups

Extracts all non-overlapping matches of a regular expression from a string, returning each capturing group as a separate element within nested arrays.

### Syntax

```sql
extractAllGroups(text, regexp)
```

### Arguments

- `text`: String or FixedString. The input string to search within.
- `regexp`: String or FixedString. The regular expression to match. Must be a constant.

### Returns

An array of arrays of strings. Each inner array contains the captured groups for a single match. If no matches are found, an empty array is returned. Array(Array(String)).

### Example

```sql
SELECT extractAllGroups('abc=123, 8="hkl"', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')
```

Result:

```result
┌─extractAllGroups('abc=123, 8="hkl"', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')─┐
│ [['abc','123'],['8','"hkl"']]                                         │
└───────────────────────────────────────────────────────────────────────┘
```

## ngrams

Splits a UTF-8 encoded string into an array of n-grams, where each n-gram is a substring of a specified length.

### Syntax

```sql
ngrams(string, ngramsize)
```

### Arguments

- `string`: String or FixedString. The input string to generate n-grams from.
- `ngramsize`: UInt. The desired length of each n-gram.

### Returns

An array of strings, where each string is an n-gram. Array(String).

### Example

```sql
SELECT ngrams('Tinybird', 3)
```

Result:

```result
┌─ngrams('Tinybird', 3)─────────────────┐
│ ['Tin','iny','nyb','ybi','bir','ird'] │
└───────────────────────────────────────┘
```

## tokens

Splits a string into an array of tokens, using any non-alphanumeric ASCII characters as delimiters.

### Syntax

```sql
tokens(input_string)
```

### Arguments

- `input_string`: String. The input string to tokenize.

### Returns

An array of strings, representing the extracted tokens. Array.

### Example

```sql
SELECT tokens('test1,;\\ test2,;\\ test3,;\\   test4') AS tokens
```

Result:

```result
┌─tokens────────────────────────────┐
│ ['test1','test2','test3','test4'] │
└───────────────────────────────────┘
```
