---
title: String Functions reference
meta:
    description: Functions for manipulating and analyzing string data.
headingMaxLevels: 2
---

# Functions for Working with Strings

Functions for [searching](string-search-functions) in strings and for [replacing](string-replace-functions) in strings are described separately.

## empty

Determines if a string is empty. A string is considered non-empty if it contains at least one byte, including spaces or null bytes. This function also works for arrays and UUIDs.

### Syntax

```sql
empty(x)
```

### Arguments

- `x`: The input value to check. [String](../data-types/string).

### Returns

`1` if the string is empty, `0` otherwise. [UInt8](../data-types/int-uint).

### Example

```sql
SELECT empty('')
```

Result:

```result
┌─empty('')─┐
│         1 │
└───────────┘
```

## notEmpty

Determines if a string is not empty. A string is considered non-empty if it contains at least one byte, including spaces or null bytes. This function also works for arrays and UUIDs.

### Syntax

```sql
notEmpty(x)
```

### Arguments

- `x`: The input value to check. [String](../data-types/string).

### Returns

`1` if the string is not empty, `0` otherwise. [UInt8](../data-types/int-uint).

### Example

```sql
SELECT notEmpty('text')
```

Result:

```result
┌─notEmpty('text')─┐
│                1 │
└──────────────────┘
```

## length

Calculates the length of a string in bytes. This function does not count characters or Unicode code points. It can also be used with arrays.

Alias: `OCTET_LENGTH`

### Syntax

```sql
length(s)
```

### Arguments

- `s`: The input string or array. [String](../data-types/string)/[Array](../data-types/array).

### Returns

The length of the string or array `s` in bytes. [UInt64](../data-types/int-uint).

### Example

```sql
SELECT length('Hello, world!')
```

Result:

```result
┌─length('Hello, world!')─┐
│                      13 │
└─────────────────────────┘
```

```sql
SELECT length([1, 2, 3, 4])
```

Result:

```result
┌─length([1, 2, 3, 4])─┐
│                    4 │
└──────────────────────┘
```

## lengthUTF8

Calculates the length of a string in Unicode code points, assuming the string is valid UTF-8 encoded text. If the string is not valid UTF-8, the result is undefined.

Aliases:
- `CHAR_LENGTH`
- `CHARACTER_LENGTH`

### Syntax

```sql
lengthUTF8(s)
```

### Arguments

- `s`: The input string, expected to be valid UTF-8. [String](../data-types/string).

### Returns

The length of the string `s` in Unicode code points. [UInt64](../data-types/int-uint).

### Example

```sql
SELECT lengthUTF8('Здравствуй, мир!')
```

Result:

```result
┌─lengthUTF8('Здравствуй, мир!')─┐
│                             16 │
└────────────────────────────────┘
```

## left

Extracts a substring from the beginning of a string, based on a specified byte offset.

### Syntax

```sql
left(s, offset)
```

### Arguments

- `s`: The source string. [String](../data-types/string) or [FixedString](../data-types/fixedstring).
- `offset`: The number of bytes to extract from the left. A negative value extracts `length(s) - |offset|` bytes. [(U)Int*](../data-types/int-uint).

### Returns

A substring of `s`. [String](../data-types/string).

### Example

```sql
SELECT left('Hello', 3)
```

Result:

```result
Hel
```

```sql
SELECT left('Hello', -3)
```

Result:

```result
He
```

## leftUTF8

Extracts a substring from the beginning of a UTF-8 encoded string, based on a specified Unicode code point offset.

### Syntax

```sql
leftUTF8(s, offset)
```

### Arguments

- `s`: The source UTF-8 encoded string. [String](../data-types/string) or [FixedString](../data-types/fixedstring).
- `offset`: The number of Unicode code points to extract from the left. A negative value extracts `length(s) - |offset|` code points. [(U)Int*](../data-types/int-uint).

### Returns

A substring of `s`. [String](../data-types/string).

### Example

```sql
SELECT leftUTF8('Привет', 4)
```

Result:

```result
Прив
```

```sql
SELECT leftUTF8('Привет', -4)
```

Result:

```result
Пр
```

## leftPad

Adds padding characters to the left side of a string until it reaches a specified total length. If the string is already longer than the target length, it will be truncated.

Alias: `LPAD`

### Syntax

```sql
leftPad(string, length[, pad_string])
```

### Arguments

- `string`: The input string to pad. [String](../data-types/string).
- `length`: The desired total length of the resulting string. If `length` is less than the input string's length, the input string is truncated. [UInt or Int](../data-types/int-uint).
- `pad_string`: The character(s) to use for padding. Optional. Defaults to a space character if not specified. [String](../data-types/string).

### Returns

The left-padded string. [String](../data-types/string).

### Example

```sql
SELECT leftPad('abc', 7, '*'), leftPad('def', 7)
```

Result:

```result
┌─leftPad('abc', 7, '*')─┬─leftPad('def', 7)─┐
│ ****abc                │     def           │
└────────────────────────┴───────────────────┘
```

## leftPadUTF8

Adds padding characters to the left side of a UTF-8 string until it reaches a specified total length, counting length in Unicode code points. If the string is already longer than the target length, it will be truncated.

### Syntax

```sql
leftPadUTF8(string, length[, pad_string])
```

### Arguments

- `string`: The input string to pad. [String](../data-types/string).
- `length`: The desired total length of the resulting string in Unicode code points. If `length` is less than the input string's length, the input string is truncated. [UInt or Int](../data-types/int-uint).
- `pad_string`: The character(s) to use for padding. Optional. Defaults to a space character if not specified. [String](../data-types/string).

### Returns

The left-padded string. [String](../data-types/string).

### Example

```sql
SELECT leftPadUTF8('абвг', 7, '*'), leftPadUTF8('дежз', 7)
```

Result:

```result
┌─leftPadUTF8('абвг', 7, '*')─┬─leftPadUTF8('дежз', 7)─┐
│ ***абвг                     │    дежз                │
└─────────────────────────────┴────────────────────────┘
```

## right

Extracts a substring from the end of a string, based on a specified byte offset.

### Syntax

```sql
right(s, offset)
```

### Arguments

- `s`: The source string. [String](../data-types/string) or [FixedString](../data-types/fixedstring).
- `offset`: The number of bytes to extract from the right. A negative value extracts `length(s) - |offset|` bytes. [(U)Int*](../data-types/int-uint).

### Returns

A substring of `s`. [String](../data-types/string).

### Example

```sql
SELECT right('Hello', 3)
```

Result:

```result
llo
```

```sql
SELECT right('Hello', -3)
```

Result:

```result
lo
```

## rightUTF8

Extracts a substring from the end of a UTF-8 encoded string, based on a specified Unicode code point offset.

### Syntax

```sql
rightUTF8(s, offset)
```

### Arguments

- `s`: The source UTF-8 encoded string. [String](../data-types/string) or [FixedString](../data-types/fixedstring).
- `offset`: The number of Unicode code points to extract from the right. A negative value extracts `length(s) - |offset|` code points. [(U)Int*](../data-types/int-uint).

### Returns

A substring of `s`. [String](../data-types/string).

### Example

```sql
SELECT rightUTF8('Привет', 4)
```

Result:

```result
ивет
```

```sql
SELECT rightUTF8('Привет', -4)
```

Result:

```result
ет
```

## rightPad

Adds padding characters to the right side of a string until it reaches a specified total length. If the string is already longer than the target length, it will be truncated.

Alias: `RPAD`

### Syntax

```sql
rightPad(string, length[, pad_string])
```

### Arguments

- `string`: The input string to pad. [String](../data-types/string).
- `length`: The desired total length of the resulting string. If `length` is less than the input string's length, the input string is truncated. [UInt or Int](../data-types/int-uint).
- `pad_string`: The character(s) to use for padding. Optional. Defaults to a space character if not specified. [String](../data-types/string).

### Returns

The right-padded string. [String](../data-types/string).

### Example

```sql
SELECT rightPad('abc', 7, '*'), rightPad('abc', 7)
```

Result:

```result
┌─rightPad('abc', 7, '*')─┬─rightPad('abc', 7)─┐
│ abc****                 │ abc                │
└─────────────────────────┴────────────────────┘
```

## rightPadUTF8

Adds padding characters to the right side of a UTF-8 string until it reaches a specified total length, counting length in Unicode code points. If the string is already longer than the target length, it will be truncated.

### Syntax

```sql
rightPadUTF8(string, length[, pad_string])
```

### Arguments

- `string`: The input string to pad. [String](../data-types/string).
- `length`: The desired total length of the resulting string in Unicode code points. If `length` is less than the input string's length, the input string is truncated. [UInt or Int](../data-types/int-uint).
- `pad_string`: The character(s) to use for padding. Optional. Defaults to a space character if not specified. [String](../data-types/string).

### Returns

The right-padded string. [String](../data-types/string).

### Example

```sql
SELECT rightPadUTF8('абвг', 7, '*'), rightPadUTF8('абвг', 7)
```

Result:

```result
┌─rightPadUTF8('абвг', 7, '*')─┬─rightPadUTF8('абвг', 7)─┐
│ абвг***                      │ абвг                    │
└──────────────────────────────┴─────────────────────────┘
```

## compareSubstrings

Compares two substrings lexicographically, starting from specified offsets and comparing a given number of bytes.

### Syntax

```sql
compareSubstrings(string1, string2, string1_offset, string2_offset, num_bytes)
```

### Arguments

- `string1`: The first string for comparison. [String](../data-types/string).
- `string2`: The second string for comparison. [String](../data-types/string).
- `string1_offset`: The zero-based starting position in `string1`. [UInt*](../data-types/int-uint).
- `string2_offset`: The zero-based starting position in `string2`. [UInt*](../data-types/int-uint).
- `num_bytes`: The maximum number of bytes to compare. If the specified range exceeds a string's length, `num_bytes` is adjusted. [UInt*](../data-types/int-uint).

### Returns

- `-1`: If the substring from `string1` is lexicographically less than the substring from `string2`.
- `0`: If the substrings are lexicographically equal.
- `1`: If the substring from `string1` is lexicographically greater than the substring from `string2`.

### Example

```sql
SELECT compareSubstrings('Saxony', 'Anglo-Saxon', 0, 6, 5) AS result
```

Result:

```result
┌─result─┐
│      0 │
└────────┘
```

## lower

Converts all ASCII Latin characters in a string to their lowercase equivalents.

Alias: `lcase`

### Syntax

```sql
lower(input)
```

### Arguments

- `input`: The string to convert. [String](../data-types/string).

### Returns

The input string with ASCII Latin characters converted to lowercase. [String](../data-types/string).

### Example

```sql
SELECT lower('TINYBIRD')
```

Result:

```result
┌─lower('TINYBIRD')─┐
│ tinybird           │
└───────────────────┘
```

## upper

Converts all ASCII Latin characters in a string to their uppercase equivalents.

Alias: `ucase`

### Syntax

```sql
upper(input)
```

### Arguments

- `input`: The string to convert. [String](../data-types/string).

### Returns

The input string with ASCII Latin characters converted to uppercase. [String](../data-types/string).

### Example

```sql
SELECT upper('tinybird')
```

Result:

```result
┌─upper('tinybird')─┐
│ TINYBIRD           │
└────────────────────┘
```

## lowerUTF8

Converts a string to lowercase, assuming it contains valid UTF-8 encoded text. If the UTF-8 encoding is invalid, the result is undefined.

{% callout type="info" %}
This function does not perform language-specific casing rules (e.g., for Turkish 'i' vs. 'İ'). If the byte length of a character changes between its uppercase and lowercase forms (e.g., German 'ẞ' to 'ß'), the result for that character might be incorrect.
{% /callout %}

### Syntax

```sql
lowerUTF8(input)
```

### Arguments

- `input`: The UTF-8 encoded string to convert. [String](../data-types/string).

### Returns

The input string with Unicode characters converted to lowercase. [String](../data-types/string).

### Example

```sql
SELECT lowerUTF8('MÜNCHEN') as Lowerutf8
```

Result:

```result
┌─Lowerutf8─┐
│ münchen   │
└───────────┘
```

## upperUTF8

Converts a string to uppercase, assuming it contains valid UTF-8 encoded text. If the UTF-8 encoding is invalid, the result is undefined.

{% callout type="info" %}
This function does not perform language-specific casing rules (e.g., for Turkish 'i' vs. 'İ'). If the byte length of a character changes between its uppercase and lowercase forms (e.g., German 'ẞ' to 'ß'), the result for that character might be incorrect.
{% /callout %}

### Syntax

```sql
upperUTF8(input)
```

### Arguments

- `input`: The UTF-8 encoded string to convert. [String](../data-types/string).

### Returns

The input string with Unicode characters converted to uppercase. [String](../data-types/string).

### Example

```sql
SELECT upperUTF8('München') as Upperutf8
```

Result:

```result
┌─Upperutf8─┐
│ MÜNCHEN   │
└───────────┘
```

## isValidUTF8

Checks if a byte sequence represents valid UTF-8 encoded text.

### Syntax

```sql
isValidUTF8(input)
```

### Arguments

- `input`: The string to validate. [String](../data-types/string).

### Returns

`1` if the string is valid UTF-8, `0` otherwise. [UInt8](../data-types/int-uint).

### Example

```sql
SELECT isValidUTF8('\xc3\xb1') AS valid, isValidUTF8('\xc3\x28') AS invalid
```

Result:

```result
┌─valid─┬─invalid─┐
│     1 │       0 │
└───────┴─────────┘
```

## toValidUTF8

Replaces any invalid UTF-8 sequences in a string with the Unicode replacement character `` (U+FFFD). Multiple consecutive invalid characters are collapsed into a single replacement character.

### Syntax

```sql
toValidUTF8(input_string)
```

### Arguments

- `input_string`: The string containing potentially invalid UTF-8 bytes. [String](../data-types/string).

### Returns

A string with all invalid UTF-8 sequences replaced. [String](../data-types/string).

### Example

```sql
SELECT toValidUTF8('\x61\xF0\x80\x80\x80b')
```

Result:

```result
┌─toValidUTF8('ab')─┐
│ ab                   │
└───────────────────────┘
```

## repeat

Generates a new string by concatenating an input string with itself a specified number of times.

Alias: `REPEAT`

### Syntax

```sql
repeat(s, n)
```

### Arguments

- `s`: The string to repeat. [String](../data-types/string).
- `n`: The number of times to repeat the string. If `n` is less than or equal to 0, an empty string is returned. [UInt* or Int*](../data-types/int-uint).

### Returns

A new string consisting of `s` repeated `n` times. [String](../data-types/string).

### Example

```sql
SELECT repeat('abc', 10)
```

Result:

```result
┌─repeat('abc', 10)──────────────┐
│ abcabcabcabcabcabcabcabcabcabc │
└────────────────────────────────┘
```

## space

Generates a string consisting of a specified number of space characters.

Alias: `SPACE`.

### Syntax

```sql
space(n)
```

### Arguments

- `n`: The number of space characters to include in the resulting string. If `n` is less than or equal to 0, an empty string is returned. [UInt* or Int*](../data-types/int-uint).

### Returns

A string containing `n` space characters. [String](../data-types/string).

### Example

```sql
SELECT space(3)
```

Result:

```text
┌─space(3) ────┐
│              │
└──────────────┘
```

## reverse

Reverses the order of bytes within a string.

### Syntax

```sql
reverse(s)
```

### Arguments

- `s`: The input string. [String](../data-types/string).

### Returns

The string `s` with its bytes in reverse order. [String](../data-types/string).

### Example

```sql
SELECT reverse('hello')
```

Result:

```result
┌─reverse('hello')─┐
│ olleh            │
└──────────────────┘
```

## reverseUTF8

Reverses the order of Unicode code points within a string, assuming the string is valid UTF-8 encoded text. If the UTF-8 encoding is invalid, the result is undefined.

### Syntax

```sql
reverseUTF8(s)
```

### Arguments

- `s`: The input UTF-8 encoded string. [String](../data-types/string).

### Returns

The string `s` with its Unicode code points in reverse order. [String](../data-types/string).

### Example

```sql
SELECT reverseUTF8('Привет')
```

Result:

```result
┌─reverseUTF8('Привет')─┐
│ тевирП                │
└───────────────────────┘
```

## concat

Combines multiple input values into a single string. Non-string arguments are converted to strings using their default serialization.

### Syntax

```sql
concat(s1, s2, ...)
```

### Arguments

- `s1, s2, ...`: Any number of values of arbitrary types. For optimal performance, use [String](../data-types/string) or [FixedString](../data-types/fixedstring) types.

### Returns

A single string formed by joining all arguments. If any argument is `NULL`, the function returns `NULL`. [String](../data-types/string).

### Example

```sql
SELECT concat('Hello, ', 'World!')
```

Result:

```result
┌─concat('Hello, ', 'World!')─┐
│ Hello, World!               │
└─────────────────────────────┘
```

```sql
SELECT concat(42, 144)
```

Result:

```result
┌─concat(42, 144)─┐
│ 42144           │
└─────────────────┘
```

{% callout type="info" %}
The `||` operator can be used as a concise alternative to `concat()` for string concatenation. For example, `'Hello, ' || 'World!'` is equivalent to `concat('Hello, ', 'World!')`.
{% /callout %}

## concatAssumeInjective

Concatenates arguments like `concat()`, but includes an assumption that the concatenation operation is injective. This means different input combinations will always produce different results, which can enable query optimizations, especially for `GROUP BY` clauses.

### Syntax

```sql
concatAssumeInjective(s1, s2, ...)
```

### Arguments

- `s1, s2, ...`: Values of type [String](../data-types/string) or [FixedString](../data-types/fixedstring).

### Returns

A single string formed by joining all arguments. If any argument is `NULL`, the function returns `NULL`. [String](../data-types/string).

### Example

Input table:

```tb {%title="key_val.datasource"%}
SCHEMA >
    key1 String,
    key2 String,
    value UInt32

ENGINE "MergeTree"
```

```bash
tb push datasources/key_val.datasource

echo "\
'Hello, ','World',1\n\
'Hello, ','World',2\n\
'Hello, ','World!',3\n\
'Hello',', World!',2\
" > key_val.csv

tb datasource append key_val key_val.csv

tb sql "select * from key_val"
------------------------------
| key1    | key2     | value |
------------------------------
| Hello   | , World! |     2 |
| Hello,  | World    |     1 |
| Hello,  | World    |     2 |
| Hello,  | World!   |     3 |
------------------------------
```

Query:

```sql
SELECT concatAssumeInjective(key1, key2), sum(value) FROM key_val GROUP BY concatAssumeInjective(key1, key2)
```

Result:

```result
--------------------------------------------------
| concatAssumeInjective(key1, key2) | sum(value) |
--------------------------------------------------
| Hello, World!                     |          3 |
| Hello, World!                     |          2 |
| Hello, World                      |          3 |
--------------------------------------------------
```

## concatWithSeparator

Combines multiple input values into a single string, inserting a specified separator between each value. Non-string arguments are converted to strings using their default serialization.

Alias: `concat_ws`

### Syntax

```sql
concatWithSeparator(sep, expr1, expr2, expr3...)
```

### Arguments

- `sep`: The separator string to insert between concatenated expressions. [String](../data-types/string) or [FixedString](../data-types/fixedstring).
- `exprN`: The expressions to concatenate. For optimal performance, use [String](../data-types/string) or [FixedString](../data-types/fixedstring) types.

### Returns

A single string formed by joining all expressions with the separator. If any argument is `NULL`, the function returns `NULL`. [String](../data-types/string).

### Example

```sql
SELECT concatWithSeparator('a', '1', '2', '3', '4')
```

Result:

```result
┌─concatWithSeparator('a', '1', '2', '3', '4')─┐
│ 1a2a3a4                                      │
└──────────────────────────────────────────────┘
```

## concatWithSeparatorAssumeInjective

Concatenates arguments with a separator like `concatWithSeparator()`, but includes an assumption that the operation is injective. This means different input combinations will always produce different results, which can enable query optimizations, especially for `GROUP BY` clauses.

### Syntax

```sql
concatWithSeparatorAssumeInjective(sep, expr1, expr2, expr3...)
```

### Arguments

- `sep`: The separator string to insert between concatenated expressions. [String](../data-types/string) or [FixedString](../data-types/fixedstring).
- `exprN`: The expressions to concatenate. [String](../data-types/string) or [FixedString](../data-types/fixedstring).

### Returns

A single string formed by joining all expressions with the separator. If any argument is `NULL`, the function returns `NULL`. [String](../data-types/string).

### Example

```sql
SELECT concatWithSeparatorAssumeInjective('-', 'apple', 'banana', 'cherry')
```

Result:

```result
┌─concatWithSeparatorAssumeInjective('-', 'apple', 'banana', 'cherry')─┐
│ apple-banana-cherry                                                  │
└──────────────────────────────────────────────────────────────────────┘
```

## substring

Extracts a portion of a string based on a starting byte index and an optional length. Byte indexing starts from 1.

Aliases:
- `substr`
- `mid`
- `byteSlice`

### Syntax

```sql
substring(s, offset[, length])
```

### Arguments

- `s`: The source string. [String](../data-types/string), [FixedString](../data-types/fixedstring) or [Enum](../data-types/enum).
- `offset`: The 1-based starting byte position of the substring. A negative `offset` counts from the end of the string. If `offset` is 0, an empty string is returned. [(U)Int*](../data-types/int-uint).
- `length`: The maximum number of bytes to include in the substring. Optional. [(U)Int*](../data-types/int-uint).

### Returns

The extracted substring. [String](../data-types/string).

### Example

```sql
SELECT 'database' AS db, substring(db, 5), substring(db, 5, 1)
```

Result:

```result
┌─db───────┬─substring('database', 5)─┬─substring('database', 5, 1)─┐
│ database │ base                     │ b                           │
└──────────┴──────────────────────────┴─────────────────────────────┘
```

## substringUTF8

Extracts a portion of a UTF-8 encoded string based on a starting Unicode code point index and an optional length. Code point indexing starts from 1. If the string is not valid UTF-8, the result is undefined.

### Syntax

```sql
substringUTF8(s, offset[, length])
```

### Arguments

- `s`: The source UTF-8 encoded string. [String](../data-types/string), [FixedString](../data-types/fixedstring) or [Enum](../data-types/enum).
- `offset`: The 1-based starting Unicode code point position of the substring. A negative `offset` counts from the end of the string. If `offset` is 0, an empty string is returned. [(U)Int*](../data-types/int-uint).
- `length`: The maximum number of Unicode code points to include in the substring. Optional. [(U)Int*](../data-types/int-uint).

### Returns

The extracted substring. [String](../data-types/string).

### Example

```sql
SELECT 'Täglich grüßt das Murmeltier.' AS str,
       substringUTF8(str, 9),
       substringUTF8(str, 9, 5)
```

Result:

```result
Täglich grüßt das Murmeltier.	grüßt das Murmeltier.	grüßt
```

## substringIndex

Extracts a substring from the beginning of a string up to a specified occurrence of a delimiter. The direction of counting (from left or right) depends on the `count` argument.

Alias: `SUBSTRING_INDEX`

### Syntax

```sql
substringIndex(s, delim, count)
```

### Arguments

- `s`: The string from which to extract the substring. [String](../data-types/string).
- `delim`: The delimiter character(s). [String](../data-types/string).
- `count`: The number of delimiter occurrences to count. If positive, it counts from the left. If negative, it counts from the right. [(U)Int or Int](../data-types/int-uint).

### Returns

The extracted substring. [String](../data-types/string).

### Example

```sql
SELECT substringIndex('www.tinybird.co', '.', 2)
```

Result:

```result
┌─substringIndex('www.tinybird.co', '.', 2)─┐
│ www.tinybird                              │
└───────────────────────────────────────────┘
```

## substringIndexUTF8

Extracts a substring from the beginning of a UTF-8 encoded string up to a specified occurrence of a delimiter, counting in Unicode code points. The direction of counting depends on the `count` argument. If the string is not valid UTF-8, the result is undefined.

### Syntax

```sql
substringIndexUTF8(s, delim, count)
```

### Arguments

- `s`: The UTF-8 encoded string from which to extract the substring. [String](../data-types/string).
- `delim`: The delimiter character(s). [String](../data-types/string).
- `count`: The number of delimiter occurrences to count. If positive, it counts from the left. If negative, it counts from the right. [(U)Int or Int](../data-types/int-uint).

### Returns

The extracted substring. [String](../data-types/string).

### Example

```sql
SELECT substringIndexUTF8('www.straßen-in-europa.de', '.', 2)
```

Result:

```result
www.straßen-in-europa
```

## appendTrailingCharIfAbsent

Appends a specified character to a string if the string is not empty and does not already end with that character.

### Syntax

```sql
appendTrailingCharIfAbsent(s, c)
```

### Arguments

- `s`: The input string. [String](../data-types/string).
- `c`: The character to append. [String](../data-types/string).

### Returns

The modified string with the character appended if necessary. [String](../data-types/string).

### Example

```sql
SELECT appendTrailingCharIfAbsent('hello', '/')
```

Result:

```result
┌─appendTrailingCharIfAbsent('hello', '/')─┐
│ hello/                                   │
└──────────────────────────────────────────┘
```

## convertCharset

Converts a string from one character encoding to another.

### Syntax

```sql
convertCharset(s, from, to)
```

### Arguments

- `s`: The input string to convert. [String](../data-types/string).
- `from`: The name of the source character encoding. [String](../data-types/string).
- `to`: The name of the target character encoding. [String](../data-types/string).

### Returns

The converted string in the target encoding. [String](../data-types/string).

### Example

```sql
SELECT convertCharset('hello', 'UTF-8', 'UTF-8')
```

Result:

```result
┌─convertCharset('hello', 'UTF-8', 'UTF-8')─┐
│ hello                                     │
└───────────────────────────────────────────┘
```

## base58Encode

Encodes a string using the Base58 encoding scheme, specifically with the "Bitcoin" alphabet.

### Syntax

```sql
base58Encode(plaintext)
```

### Arguments

- `plaintext`: The string to encode. [String](../data-types/string).

### Returns

A string containing the Base58 encoded value. [String](../data-types/string) or [FixedString](../data-types/fixedstring).

### Example

```sql
SELECT base58Encode('Encoded')
```

Result:

```result
┌─base58Encode('Encoded')─┐
│ 3dc8KtHrwM              │
└─────────────────────────┘
```

## base58Decode

Decodes a Base58 encoded string using the "Bitcoin" alphabet. An exception is thrown if the input is not a valid Base58 string.

### Syntax

```sql
base58Decode(encoded)
```

### Arguments

- `encoded`: The Base58 encoded string to decode. [String](../data-types/string) or [FixedString](../data-types/fixedstring).

### Returns

A string containing the decoded value. [String](../data-types/string).

### Example

```sql
SELECT base58Decode('3dc8KtHrwM')
```

Result:

```result
┌─base58Decode('3dc8KtHrwM')─┐
│ Encoded                    │
└────────────────────────────┘
```

## tryBase58Decode

Decodes a Base58 encoded string. If the input is not a valid Base58 string, it returns an empty string instead of throwing an exception.

### Syntax

```sql
tryBase58Decode(encoded)
```

### Arguments

- `encoded`: The Base58 encoded string to decode. [String](../data-types/string) or [FixedString](../data-types/fixedstring).

### Returns

A string containing the decoded value, or an empty string if decoding fails. [String](../data-types/string).

### Example

```sql
SELECT tryBase58Decode('3dc8KtHrwM') as res, tryBase58Decode('invalid') as res_invalid
```

Result:

```result
┌─res─────┬─res_invalid─┐
│ Encoded │             │
└─────────┴─────────────┘
```

## base64Encode

Encodes a string using the Base64 encoding scheme, as defined by [RFC 4648](https://datatracker.ietf.org/doc/html/rfc4648#section-4).

Alias: `TO_BASE64`.

### Syntax

```sql
base64Encode(plaintext)
```

### Arguments

- `plaintext`: The string to encode. [String](../data-types/string).

### Returns

A string containing the Base64 encoded value. [String](../data-types/string).

### Example

```sql
SELECT base64Encode('tinybird')
```

Result:

```result
┌─base64Encode('tinybird')--─┐
│ dGlueWJpcmQ=               │
└────────────────────────────┘
```

## base64URLEncode

Encodes a URL string using a Base64 variant with URL-specific modifications, as defined by [RFC 4648](https://datatracker.ietf.org/doc/html/rfc4648#section-5).

### Syntax

```sql
base64URLEncode(url)
```

### Arguments

- `url`: The URL string to encode. [String](../data-types/string).

### Returns

A string containing the URL-safe Base64 encoded value. [String](../data-types/string).

### Example

```sql
SELECT base64URLEncode('https://tinybird.com')
```

Result:

```result
┌─base64URLEncode('https://tinybird.com')--─┐
│ aHR0cHM6Ly90aW55YmlyZC5jb20               │
└───────────────────────────────────────────┘
```

## base64Decode

Decodes a Base64 encoded string according to [RFC 4648](https://datatracker.ietf.org/doc/html/rfc4648#section-4). Throws an exception if the input is not a valid Base64 string.

Alias: `FROM_BASE64`.

### Syntax

```sql
base64Decode(encoded)
```

### Arguments

- `encoded`: The Base64 encoded string to decode. [String](../data-types/string).

### Returns

A string containing the decoded value. [String](../data-types/string).

### Example

```sql
SELECT base64Decode('dGlueWJpcmQ=')
```

Result:

```result
┌─base64Decode('dGlueWJpcmQ=')─┐
│ tinybird                     │
└──────────────────────────────┘
```

## base64URLDecode

Decodes a URL-safe Base64 encoded string, applying URL-specific modifications as defined by [RFC 4648](https://datatracker.ietf.org/doc/html/rfc4648#section-5). Throws an exception if the input is not a valid URL-safe Base64 string.

### Syntax

```sql
base64URLDecode(encodedUrl)
```

### Arguments

- `encodedURL`: The URL-safe Base64 encoded string to decode. [String](../data-types/string).

### Returns

A string containing the decoded URL value. [String](../data-types/string).

### Example

```sql
SELECT base64URLDecode('aHR0cHM6Ly90aW55YmlyZC5jb20')
```

Result:

```result
┌─base64URLDecode('aHR0cHM6Ly90aW55YmlyZC5jb20')─┐
│ https://tinybird.com                          │
└───────────────────────────────────────────────┘
```

## tryBase64Decode

Decodes a Base64 encoded string. If the input is not a valid Base64 string, it returns an empty string instead of throwing an exception.

### Syntax

```sql
tryBase64Decode(encoded)
```

### Arguments

- `encoded`: The Base64 encoded string to decode. [String](../data-types/string).

### Returns

A string containing the decoded value, or an empty string if decoding fails. [String](../data-types/string).

### Example

```sql
SELECT tryBase64Decode('dGlueWJpcmQ=') as res, tryBase64Decode('invalid') as res_invalid
```

Result:

```result
┌─res────────┬─res_invalid─┐
│ tinybird   │             │
└────────────┴─────────────┘
```

## tryBase64URLDecode

Decodes a URL-safe Base64 encoded string. If the input is not a valid URL-safe Base64 string, it returns an empty string instead of throwing an exception.

### Syntax

```sql
tryBase64URLDecode(encodedUrl)
```

### Arguments

- `encodedURL`: The URL-safe Base64 encoded string to decode. [String](../data-types/string).

### Returns

A string containing the decoded URL value, or an empty string if decoding fails. [String](../data-types/string).

### Example

```sql
SELECT tryBase64URLDecode('aHR0cHM6Ly90aW55YmlyZC5jb20') as res, tryBase64URLDecode('aHR0cHM6Ly9jbGlja') as res_invalid
```

Result:

```result
┌─res────────────────────┬─res_invalid─┐
│ https://tinybird.com   │             │
└────────────────────────┴─────────────┘
```

## endsWith

Checks if a string ends with a specified suffix.

### Syntax

```sql
endsWith(str, suffix)
```

### Arguments

- `str`: The string to check. [String](../data-types/string).
- `suffix`: The suffix to look for. [String](../data-types/string).

### Returns

`1` if `str` ends with `suffix`, `0` otherwise. [UInt8](../data-types/int-uint).

### Example

```sql
SELECT endsWith('hello world', 'world')
```

Result:

```result
┌─endsWith('hello world', 'world')─┐
│                                1 │
└──────────────────────────────────┘
```

## endsWithUTF8

Checks if a UTF-8 encoded string ends with a specified suffix, performing the comparison based on Unicode characters rather than bytes.

### Syntax

```sql
endsWithUTF8(str, suffix)
```

### Arguments

- `str`: The UTF-8 encoded string to check. [String](../data-types/string).
- `suffix`: The UTF-8 encoded suffix to look for. [String](../data-types/string).

### Returns

`1` if `str` ends with `suffix`, `0` otherwise. [UInt8](../data-types/int-uint).

### Example

```sql
SELECT endsWithUTF8('中国', '\xbd'), endsWith('中国', '\xbd')
```

Result:

```result
┌─endsWithUTF8('中国', '½')─┬─endsWith('中国', '½')─┐
│                        0 │                    1 │
└──────────────────────────┴──────────────────────┘
```

## startsWith

Checks if a string begins with a specified prefix.

### Syntax

```sql
startsWith(str, prefix)
```

### Arguments

- `str`: The string to check. [String](../data-types/string).
- `prefix`: The prefix to look for. [String](../data-types/string).

### Returns

`1` if `str` starts with `prefix`, `0` otherwise. [UInt8](../data-types/int-uint).

### Example

```sql
SELECT startsWith('Spider-Man', 'Spi')
```

Result:

```result
┌─startsWith('Spider-Man', 'Spi')─┐
│                               1 │
└─────────────────────────────────┘
```

## startsWithUTF8

<VersionBadge minVersion='23.8' />

Checks if a UTF-8 encoded string begins with a specified prefix, performing the comparison based on Unicode characters rather than bytes.

### Syntax

```sql
startsWithUTF8(str, prefix)
```

### Arguments

- `str`: The UTF-8 encoded string to check. [String](../data-types/string).
- `prefix`: The UTF-8 encoded prefix to look for. [String](../data-types/string).

### Returns

`1` if `str` starts with `prefix`, `0` otherwise. [UInt8](../data-types/int-uint).

### Example

```sql
SELECT startsWithUTF8('中国', '\xe4'), startsWith('中国', '\xe4')
```

Result:

```result
┌─startsWithUTF8('中国', '⥩─┬─startsWith('中国', '⥩─┐
│                          0 │                      1 │
└────────────────────────────┴────────────────────────┘
```

## trim

Removes specified characters from the beginning, end, or both ends of a string. By default, it removes whitespace characters.

### Syntax

```sql
trim([[LEADING|TRAILING|BOTH] trim_character FROM] input_string)
```

### Arguments

- `trim_character`: The character(s) to remove. Optional. If omitted, whitespace (ASCII 32) is removed. [String](../data-types/string).
- `input_string`: The string to trim. [String](../data-types/string).

### Returns

The string with the specified characters removed from the designated ends. [String](../data-types/string).

### Example

```sql
SELECT trim(BOTH ' ()' FROM '(   Hello, world!   )')
```

Result:

```result
┌─trim(BOTH ' ()' FROM '(   Hello, world!   )')─┐
│ Hello, world!                                 │
└───────────────────────────────────────────────┘
```

## trimLeft

Removes leading occurrences of specified characters from a string. By default, it removes leading whitespace characters.

Alias: `ltrim`.

### Syntax

```sql
trimLeft(input_string[, trim_characters])
```

### Arguments

- `input_string`: The string to trim. [String](../data-types/string).
- `trim_characters`: The character(s) to remove from the left. Optional. Defaults to a single whitespace character if not specified. [String](../data-types/string).

### Returns

The string with leading specified characters removed. [String](../data-types/string).

### Example

```sql
SELECT trimLeft('     Hello, world!     ')
```

Result:

```result
┌─trimLeft('     Hello, world!     ')─┐
│ Hello, world!                       │
└─────────────────────────────────────┘
```

## trimRight

Removes trailing occurrences of specified characters from a string. By default, it removes trailing whitespace characters.

Alias: `rtrim`.

### Syntax

```sql
trimRight(input_string[, trim_characters])
```

### Arguments

- `input_string`: The string to trim. [String](../data-types/string).
- `trim_characters`: The character(s) to remove from the right. Optional. Defaults to a single whitespace character if not specified. [String](../data-types/string).

### Returns

The string with trailing specified characters removed. [String](../data-types/string).

### Example

```sql
SELECT trimRight('     Hello, world!     ')
```

Result:

```result
┌─trimRight('     Hello, world!     ')─┐
│      Hello, world!                   │
└──────────────────────────────────────┘
```

## trimBoth

Removes leading and trailing occurrences of specified characters from a string. By default, it removes leading and trailing whitespace characters.

Alias: `trim`.

### Syntax

```sql
trimBoth(input_string[, trim_characters])
```

### Arguments

- `input_string`: The string to trim. [String](../data-types/string).
- `trim_characters`: The character(s) to remove from both ends. Optional. Defaults to a single whitespace character if not specified. [String](../data-types/string).

### Returns

The string with leading and trailing specified characters removed. [String](../data-types/string).

### Example

```sql
SELECT trimBoth('     Hello, world!     ')
```

Result:

```result
┌─trimBoth('     Hello, world!     ')─┐
│ Hello, world!                       │
└─────────────────────────────────────┘
```

## CRC32

Calculates the CRC32 checksum of a string using the CRC-32-IEEE 802.3 polynomial and an initial value of `0xffffffff` (zlib implementation).

### Syntax

```sql
CRC32(s)
```

### Arguments

- `s`: The input string. [String](../data-types/string).

### Returns

The CRC32 checksum as a 32-bit unsigned integer. [UInt32](../data-types/int-uint).

### Example

```sql
SELECT CRC32('Tinybird')
```

Result:

```result
┌─CRC32('Tinybird')─┐
│        1009000000 │
└───────────────────┘
```

## CRC32IEEE

Calculates the CRC32 checksum of a string using the CRC-32-IEEE 802.3 polynomial.

### Syntax

```sql
CRC32IEEE(s)
```

### Arguments

- `s`: The input string. [String](../data-types/string).

### Returns

The CRC32 checksum as a 32-bit unsigned integer. [UInt32](../data-types/int-uint).

### Example

```sql
SELECT CRC32IEEE('Tinybird')
```

Result:

```result
┌─CRC32IEEE('Tinybird')─┐
│            1009000000 │
└───────────────────────┘
```

## CRC64

Calculates the CRC64 checksum of a string using the CRC-64-ECMA polynomial.

### Syntax

```sql
CRC64(s)
```

### Arguments

- `s`: The input string. [String](../data-types/string).

### Returns

The CRC64 checksum as a 64-bit unsigned integer. [UInt64](../data-types/int-uint).

### Example

```sql
SELECT CRC64('Tinybird')
```

Result:

```result
┌─CRC64('Tinybird')─┐
│ 1234567890123456789 │
└───────────────────┘
```

## normalizeUTF8NFC

Converts a UTF-8 encoded string to its Unicode Normalization Form C (NFC). This form ensures canonical equivalence by composing characters into their shortest possible representation.

### Syntax

```sql
normalizeUTF8NFC(words)
```

### Arguments

- `words`: The UTF-8 encoded input string. [String](../data-types/string).

### Returns

The string transformed into NFC normalization form. [String](../data-types/string).

### Example

```sql
SELECT length('â'), normalizeUTF8NFC('â') AS nfc, length(nfc) AS nfc_len
```

Result:

```result
┌─length('â')─┬─nfc─┬─nfc_len─┐
│           2 │ â   │       2 │
└─────────────┴─────┴─────────┘
```

## normalizeUTF8NFD

Converts a UTF-8 encoded string to its Unicode Normalization Form D (NFD). This form ensures canonical equivalence by decomposing characters into their base characters and combining diacritics.

### Syntax

```sql
normalizeUTF8NFD(words)
```

### Arguments

- `words`: The UTF-8 encoded input string. [String](../data-types/string).

### Returns

The string transformed into NFD normalization form. [String](../data-types/string).

### Example

```sql
SELECT length('â'), normalizeUTF8NFD('â') AS nfd, length(nfd) AS nfd_len
```

Result:

```result
┌─length('â')─┬─nfd─┬─nfd_len─┐
│           2 │ â   │       3 │
└─────────────┴─────┴─────────┘
```

## normalizeUTF8NFKC

Converts a UTF-8 encoded string to its Unicode Normalization Form KC (NFKC). This form ensures compatibility equivalence by decomposing characters and then recomposing them, often resulting in a loss of formatting distinctions.

### Syntax

```sql
normalizeUTF8NFKC(words)
```

### Arguments

- `words`: The UTF-8 encoded input string. [String](../data-types/string).

### Returns

The string transformed into NFKC normalization form. [String](../data-types/string).

### Example

```sql
SELECT length('â'), normalizeUTF8NFKC('â') AS nfkc, length(nfkc) AS nfkc_len
```

Result:

```result
┌─length('â')─┬─nfkc─┬─nfkc_len─┐
│           2 │ â    │        2 │
└─────────────┴──────┴──────────┘
```

## normalizeUTF8NFKD

Converts a UTF-8 encoded string to its Unicode Normalization Form KD (NFKD). This form ensures compatibility equivalence by decomposing characters into their base characters and combining diacritics, often resulting in a loss of formatting distinctions.

### Syntax

```sql
normalizeUTF8NFKD(words)
```

### Arguments

- `words`: The UTF-8 encoded input string. [String](../data-types/string).

### Returns

The string transformed into NFKD normalization form. [String](../data-types/string).

### Example

```sql
SELECT length('â'), normalizeUTF8NFKD('â') AS nfkd, length(nfkd) AS nfkd_len
```

Result:

```result
┌─length('â')─┬─nfkd─┬─nfkd_len─┐
│           2 │ â    │        3 │
└─────────────┴──────┴──────────┘
```

## encodeXMLComponent

Escapes characters in a string that have special meaning in XML, making the string safe to embed within XML text nodes or attributes. Specifically, `<`, `&`, `>`, `"`, and `'` are replaced with their corresponding XML entities.

### Syntax

```sql
encodeXMLComponent(x)
```

### Arguments

- `x`: The input string to escape. [String](../data-types/string).

### Returns

The escaped string. [String](../data-types/string).

### Example

```sql
SELECT encodeXMLComponent('Hello, "world"!')
SELECT encodeXMLComponent('<123>')
SELECT encodeXMLComponent('&tinybird')
SELECT encodeXMLComponent('\'foo\'')
```

Result:

```result
Hello, &quot;world&quot;!
&lt;123&gt;
&amp;tinybird
&apos;foo&apos;
```

## decodeXMLComponent

Un-escapes common XML entities (`&quot;`, `&amp;`, `&apos;`, `&gt;`, `&lt;`) and numeric character references (both decimal and hexadecimal) back into their original characters.

### Syntax

```sql
decodeXMLComponent(x)
```

### Arguments

- `x`: The input string containing XML entities. [String](../data-types/string).

### Returns

The un-escaped string. [String](../data-types/string).

### Example

```sql
SELECT decodeXMLComponent('&apos;foo&apos;')
SELECT decodeXMLComponent('&lt; &#x3A3; &gt;')
```

Result:

```result
'foo'
< Σ >
```

## decodeHTMLComponent

Un-escapes common HTML entities (e.g., `&hbar;`, `&gt;`, `&diamondsuit;`, `&heartsuit;`, `&lt;`) and numeric character references (both decimal and hexadecimal) back into their original characters.

### Syntax

```sql
decodeHTMLComponent(x)
```

### Arguments

- `x`: The input string containing HTML entities. [String](../data-types/string).

### Returns

The un-escaped string. [String](../data-types/string).

### Example

```sql
SELECT decodeHTMLComponent('I&heartsuit;Tinybird')
```

Result:

```result
┌─decodeHTMLComponent('I&heartsuit;Tinybird')─┐
│ I♥Tinybird                                  │
└─────────────────────────────────────────────┘
```

## extractTextFromHTML

Extracts plain text content from HTML or XHTML input, removing tags, comments, and script/style blocks. It also normalizes whitespace.

### Syntax

```sql
extractTextFromHTML(x)
```

### Arguments

- `x`: The input HTML or XHTML text. [String](../data-types/string).

### Returns

The extracted plain text. [String](../data-types/string).

### Example

The first example contains several tags and a comment and also shows whitespace processing.
The second example shows `CDATA` and `script` tag processing.
In the third example text is extracted from the full HTML response received by the url function.

```sql
SELECT extractTextFromHTML(' <p> A text <i>with</i><b>tags</b>. <!-- comments --> </p> ')
SELECT extractTextFromHTML('<![CDATA[The content within <b>CDATA</b>]]> <script>alert("Script");</script>')
SELECT extractTextFromHTML(html) FROM url('http://www.donothingfor2minutes.com/', RawBLOB, 'html String')
```

Result:

```result
A text with tags .
The content within <b>CDATA</b>
Do Nothing for 2 Minutes 2:00 &nbsp
```

## ascii

Returns the ASCII decimal value of the first character of a string.

### Syntax

```sql
ascii(s)
```

### Arguments

- `s`: The input string. [String](../data-types/string).

### Returns

The ASCII code point as an [Int32](../data-types/int-uint). Returns `0` if the string is empty. The result is undefined if the first character is not an ASCII character or not within the Latin-1 supplement range.

### Example

```sql
SELECT ascii('A')
```

Result:

```result
┌─ascii('A')─┐
│         65 │
└────────────┘
```

## soundex

Generates the Soundex code for a given string, which is a phonetic algorithm for indexing names by sound.

### Syntax

```sql
soundex(val)
```

### Arguments

- `val`: The input string. [String](../data-types/string).

### Returns

The Soundex code of the input string. [String](../data-types/string).

### Example

```sql
select soundex('aksel')
```

Result:

```result
┌─soundex('aksel')─┐
│ A240             │
└──────────────────┘
```

## punycodeEncode

Converts a Unicode string into its Punycode representation. The input string must be UTF-8 encoded.

### Syntax

```sql
punycodeEncode(val)
```

### Arguments

- `val`: The input UTF-8 encoded string. [String](../data-types/string).

### Returns

The Punycode representation of the input string. [String](../data-types/string).

### Example

```sql
select punycodeEncode('München')
```

Result:

```result
┌─punycodeEncode('München')─┐
│ Mnchen-3ya                │
└───────────────────────────┘
```

## punycodeDecode

Converts a Punycode-encoded string back to its UTF-8 encoded plaintext. An exception is thrown if the input is not a valid Punycode string.

### Syntax

```sql
punycodeDecode(val)
```

### Arguments

- `val`: The Punycode-encoded string. [String](../data-types/string).

### Returns

The UTF-8 encoded plaintext of the input string. [String](../data-types/string).

### Example

```sql
select punycodeDecode('Mnchen-3ya')
```

Result:

```result
┌─punycodeDecode('Mnchen-3ya')─┐
│ München                      │
└──────────────────────────────┘
```

## tryPunycodeDecode

Converts a Punycode-encoded string back to its UTF-8 encoded plaintext. If the input is not a valid Punycode string, it returns an empty string instead of throwing an exception.

### Syntax

```sql
tryPunycodeDecode(val)
```

### Arguments

- `val`: The Punycode-encoded string. [String](../data-types/string).

### Returns

The UTF-8 encoded plaintext, or an empty string if decoding fails. [String](../data-types/string).

### Example

```sql
SELECT tryPunycodeDecode('Mnchen-3ya') AS res, tryPunycodeDecode('invalid') AS res_invalid
```

Result:

```result
┌─res─────┬─res_invalid─┐
│ München │             │
└─────────┴─────────────┘
```

## idnaEncode

Converts a domain name to its ASCII representation using the ToASCII algorithm from the Internationalized Domain Names in Applications (IDNA) mechanism. The input string must be UTF-8 encoded and translatable to ASCII.

### Syntax

```sql
idnaEncode(val)
```

### Arguments

- `val`: The input domain name string. [String](../data-types/string).

### Returns

The ASCII representation of the domain name according to IDNA. [String](../data-types/string).

### Example

```sql
select idnaEncode('straße.münchen.de')
```

Result:

```result
┌─idnaEncode('straße.münchen.de')─────┐
│ xn--strae-oqa.xn--mnchen-3ya.de     │
└─────────────────────────────────────┘
```

## tryIdnaEncode

Converts a domain name to its ASCII representation using the IDNA ToASCII algorithm. If an error occurs during conversion, it returns an empty string instead of throwing an exception.

### Syntax

```sql
tryIdnaEncode(val)
```

### Arguments

- `val`: The input domain name string. [String](../data-types/string).

### Returns

The ASCII representation of the domain name, or an empty string if conversion fails. [String](../data-types/string).

### Example

```sql
SELECT tryIdnaEncode('straße.münchen.de') AS res, tryIdnaEncode('invalid.domain.😊') AS res_invalid
```

Result:

```result
┌─res───────────────────────────┬─res_invalid─┐
│ xn--strae-oqa.xn--mnchen-3ya.de │             │
└───────────────────────────────┴─────────────┘
```

## idnaDecode

Converts an IDNA-encoded domain name back to its Unicode (UTF-8) representation using the ToUnicode algorithm. If an error occurs, the original input string is returned.

### Syntax

```sql
idnaDecode(val)
```

### Arguments

- `val`: The IDNA-encoded domain name string. [String](../data-types/string).

### Returns

The Unicode (UTF-8) representation of the domain name. [String](../data-types/string).

### Example

```sql
select idnaDecode('xn--strae-oqa.xn--mnchen-3ya.de')
```

Result:

```result
┌─idnaDecode('xn--strae-oqa.xn--mnchen-3ya.de')─┐
│ straße.münchen.de                             │
└───────────────────────────────────────────────┘
```

## byteHammingDistance

Calculates the Hamming distance between two byte strings, which is the number of positions at which the corresponding bytes are different. The strings must be of equal length.

Alias: `mismatches`

### Syntax

```sql
byteHammingDistance(string1, string2)
```

### Arguments

- `string1`: The first byte string. [String](../data-types/string).
- `string2`: The second byte string. [String](../data-types/string).

### Returns

The Hamming distance as an integer. [UInt64](../data-types/int-uint).

### Example

```sql
SELECT byteHammingDistance('karolin', 'kathrin')
```

Result:

```text
┌─byteHammingDistance('karolin', 'kathrin')─┐
│                                         3 │
└───────────────────────────────────────────┘
```

## stringJaccardIndex

Calculates the Jaccard similarity index between two byte strings. This index measures the similarity between finite sample sets, defined as the size of the intersection divided by the size of the union of the sample sets.

### Syntax

```sql
stringJaccardIndex(string1, string2)
```

### Arguments

- `string1`: The first byte string. [String](../data-types/string).
- `string2`: The second byte string. [String](../data-types/string).

### Returns

The Jaccard similarity index as a floating-point number between 0 and 1. [Float64](../data-types/float).

### Example

```sql
SELECT stringJaccardIndex('Tinybird', 'mouse')
```

Result:

```text
┌─stringJaccardIndex('Tinybird', 'mouse')─┐
│                                       0.4 │
└───────────────────────────────────────────┘
```

## stringJaccardIndexUTF8

Calculates the Jaccard similarity index between two UTF-8 encoded strings, performing the comparison based on Unicode characters.

### Syntax

```sql
stringJaccardIndexUTF8(string1, string2)
```

### Arguments

- `string1`: The first UTF-8 encoded string. [String](../data-types/string).
- `string2`: The second UTF-8 encoded string. [String](../data-types/string).

### Returns

The Jaccard similarity index as a floating-point number between 0 and 1. [Float64](../data-types/float).

### Example

```sql
SELECT stringJaccardIndexUTF8('Tinybird', 'Тайнибёрд')
```

Result:

```result
┌─stringJaccardIndexUTF8('Tinybird', 'Тайнибёрд')─┐
│                                            0.125 │
└──────────────────────────────────────────────────┘
```

## editDistance

Calculates the Edit distance (also known as Levenshtein distance) between two byte strings. This measures the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into the other.

Alias: `levenshteinDistance`

### Syntax

```sql
editDistance(string1, string2)
```

### Arguments

- `string1`: The first byte string. [String](../data-types/string).
- `string2`: The second byte string. [String](../data-types/string).

### Returns

The Edit distance as an integer. [UInt64](../data-types/int-uint).

### Example

```sql
SELECT editDistance('Tinybird', 'mouse')
```

Result:

```text
┌─editDistance('Tinybird', 'mouse')─┐
│                                   6 │
└─────────────────────────────────────┘
```

## editDistanceUTF8

Calculates the Edit distance (Levenshtein distance) between two UTF-8 encoded strings, performing the comparison based on Unicode characters.

Alias: `levenshteinDistanceUTF8`

### Syntax

```sql
editDistanceUTF8(string1, string2)
```

### Arguments

- `string1`: The first UTF-8 encoded string. [String](../data-types/string).
- `string2`: The second UTF-8 encoded string. [String](../data-types/string).

### Returns

The Edit distance as an integer. [UInt64](../data-types/int-uint).

### Example

```sql
SELECT editDistanceUTF8('我是谁', '我是我')
```

Result:

```text
┌─editDistanceUTF8('我是谁', '我是我')──┐
│                                   1 │
└─────────────────────────────────────┘
```

## damerauLevenshteinDistance

Calculates the Damerau-Levenshtein distance between two byte strings. This metric extends the Levenshtein distance by including transpositions (swapping two adjacent characters) as a single edit operation.

### Syntax

```sql
damerauLevenshteinDistance(string1, string2)
```

### Arguments

- `string1`: The first byte string. [String](../data-types/string).
- `string2`: The second byte string. [String](../data-types/string).

### Returns

The Damerau-Levenshtein distance as an integer. [UInt64](../data-types/int-uint).

### Example

```sql
SELECT damerauLevenshteinDistance('Tinybird', 'mouse')
```

Result:

```text
┌─damerauLevenshteinDistance('Tinybird', 'mouse')─┐
│                                                 6 │
└───────────────────────────────────────────────────┘
```

## jaroSimilarity

Calculates the Jaro similarity between two byte strings. This metric measures the similarity between two strings, with a higher value indicating greater similarity.

### Syntax

```sql
jaroSimilarity(string1, string2)
```

### Arguments

- `string1`: The first byte string. [String](../data-types/string).
- `string2`: The second byte string. [String](../data-types/string).

### Returns

The Jaro similarity as a floating-point number between 0 and 1. [Float64](../data-types/float).

### Example

```sql
SELECT jaroSimilarity('Tinybird', 'click')
```

Result:

```text
┌─jaroSimilarity('Tinybird', 'click')─┐
│                    0.8333333333333333 │
└───────────────────────────────────────┘
```

## jaroWinklerSimilarity

Calculates the Jaro-Winkler similarity between two byte strings. This is a refinement of the Jaro similarity that gives more favorable ratings to strings that match from the beginning.

### Syntax

```sql
jaroWinklerSimilarity(string1, string2)
```

### Arguments

- `string1`: The first byte string. [String](../data-types/string).
- `string2`: The second byte string. [String](../data-types/string).

### Returns

The Jaro-Winkler similarity as a floating-point number between 0 and 1. [Float64](../data-types/float).

### Example

```sql
SELECT jaroWinklerSimilarity('Tinybird', 'click')
```

Result:

```text
┌─jaroWinklerSimilarity('Tinybird', 'click')─┐
│                           0.8999999999999999 │
└──────────────────────────────────────────────┘
```

## initcap

Converts the first letter of each word in a string to uppercase and the remaining letters to lowercase. Words are identified as sequences of alphanumeric characters separated by non-alphanumeric characters.

{% callout type="info" %}
Because `initCap` converts only the first letter of each word to upper case you may observe unexpected behaviour for words containing apostrophes or capital letters. For example:

```sql
SELECT initCap('mother''s daughter'), initCap('joe McAdam')
```

will return

```result
┌─initCap('mother\'s daughter')─┬─initCap('joe McAdam')─┐
│ Mother'S Daughter             │ Joe Mcadam            │
└───────────────────────────────┴───────────────────────┘
```

This is a known behaviour, with no plans currently to fix it.
{% /callout %}

### Syntax

```sql
initcap(val)
```

### Arguments

- `val`: The input string. [String](../data-types/string).

### Returns

The string with the first letter of each word capitalized. [String](../data-types/string).

### Example

```sql
SELECT initcap('building for fast')
```

Result:

```text
┌─initcap('building for fast')─┐
│ Building For Fast            │
└──────────────────────────────┘
```

## initcapUTF8

Converts the first letter of each word in a UTF-8 encoded string to uppercase and the remaining letters to lowercase, based on Unicode characters. If the string is not valid UTF-8, the result is undefined.

{% callout type="info" %}
This function does not perform language-specific casing rules (e.g., for Turkish 'i' vs. 'İ'). If the byte length of a character changes between its uppercase and lowercase forms, the result for that character might be incorrect.
{% /callout %}

### Syntax

```sql
initcapUTF8(val)
```

### Arguments

- `val`: The input UTF-8 encoded string. [String](../data-types/string).

### Returns

The string with the first Unicode character of each word capitalized. [String](../data-types/string).

### Example

```sql
SELECT initcapUTF8('не тормозит')
```

Result:

```text
┌─initcapUTF8('не тормозит')─┐
│ Не Тормозит                │
└────────────────────────────┘
```

## firstLine

Extracts and returns the first line from a multi-line string. If the string contains no line separators, the entire string is returned.

### Syntax

```sql
firstLine(val)
```

### Arguments

- `val`: The input string, potentially containing multiple lines. [String](../data-types/string).

### Returns

The first line of the string. [String](../data-types/string).

### Example

```sql
select firstLine('foo\nbar\nbaz')
```

Result:

```result
┌─firstLine('foo\nbar\nbaz')─┐
│ foo                        │
└────────────────────────────┘
```

## stringCompare

Compares two strings lexicographically. It can optionally compare specific substrings by providing offsets and a maximum number of bytes.

### Syntax

```sql
stringCompare(string1, string2[, str1_off, string2_offset, num_bytes])
```

### Arguments

- `string1`: The first string for comparison. [String](../data-types/string).
- `string2`: The second string for comparison. [String](../data-types/string).
- `str1_off`: The zero-based starting position in `string1`. Optional. [UInt*](../data-types/int-uint).
- `string2_offset`: The zero-based starting position in `string2`. Optional. [UInt*](../data-types/int-uint).
- `num_bytes`: The maximum number of bytes to compare. If the specified range exceeds a string's length, `num_bytes` is adjusted. Optional. [UInt*](../data-types/int-uint).

### Returns

- `-1`: If `string1` (or its specified substring) is lexicographically less than `string2` (or its specified substring).
- `0`: If the strings (or their specified substrings) are lexicographically equal.
- `1`: If `string1` (or its specified substring) is lexicographically greater than `string2` (or its specified substring).

### Example

```sql
SELECT
    stringCompare('alice', 'bob', 0, 0, 3) as result1,
    stringCompare('alice', 'alicia', 0, 0, 3) as result2,
    stringCompare('bob', 'alice', 0, 0, 3) as result3
```
Result:
```result
   ┌─result1─┬─result2─┬─result3─┐
1. │      -1 │       0 │       1 │
   └─────────┴─────────┴─────────┘
```

```sql
SELECT
    stringCompare('alice', 'alicia') as result2,
    stringCompare('alice', 'alice') as result1,
    stringCompare('bob', 'alice') as result3
```
Result:
```result
   ┌─result2─┬─result1─┬─result3─┐
1. │      -1 │       0 │       1 │
   └─────────┴─────────┴─────────┘
```

## normalizeQuery

Replaces literals, sequences of literals, and complex aliases (those containing whitespace, more than two digits, or at least 36 bytes long like UUIDs) within a query string with a placeholder `?`.

### Syntax

```sql
normalizeQuery(x)
```

### Arguments

- `x`: The query string. [String](../data-types/string).

### Returns

The query string with literals and complex aliases replaced by placeholders. [String](../data-types/string).

### Example

```sql
SELECT normalizeQuery('[1, 2, 3, x]') AS query
```

Result:

```result
┌─query────┐
│ [?.., x] │
└──────────┘
```

## normalizeQueryKeepNames

Replaces literals and sequences of literals within a query string with a placeholder `?`, but preserves complex aliases (those containing whitespace, more than two digits, or at least 36 bytes long like UUIDs). This can be useful for analyzing query logs while retaining meaningful alias names.

### Syntax

```sql
normalizeQueryKeepNames(x)
```

### Arguments

- `x`: The query string. [String](../data-types/string).

### Returns

The query string with literals replaced by placeholders, but complex aliases kept intact. [String](../data-types/string).

### Example

```sql
SELECT normalizeQuery('SELECT 1 AS aComplexName123'), normalizeQueryKeepNames('SELECT 1 AS aComplexName123')
```

Result:

```result
┌─normalizeQuery('SELECT 1 AS aComplexName123')─┬─normalizeQueryKeepNames('SELECT 1 AS aComplexName123')─┐
│ SELECT ? AS `?`                               │ SELECT ? AS aComplexName123                            │
└───────────────────────────────────────────────┴────────────────────────────────────────────────────────┘
```

## normalizedQueryHash

Generates a 64-bit hash value for a query string, where the hash is identical for similar queries regardless of the specific literal values used. This is useful for grouping and analyzing query patterns in logs.

### Syntax

```sql
normalizedQueryHash(x)
```

### Arguments

- `x`: The query string. [String](../data-types/string).

### Returns

A 64-bit hash value. [UInt64](../data-types/int-uint).

### Example

```sql
SELECT normalizedQueryHash('SELECT 1 AS `xyz`') != normalizedQueryHash('SELECT 1 AS `abc`') AS res
```

Result:

```result
┌─res─┐
│   1 │
└─────┘
```

## normalizedQueryHashKeepNames

Generates a 64-bit hash value for a query string, similar to `normalizedQueryHash()`, but it preserves complex aliases (those containing whitespace, more than two digits, or at least 36 bytes long like UUIDs) before hashing. This allows for more granular analysis of query patterns where alias names are significant.

### Syntax

```sql
normalizedQueryHashKeepNames(x)
```

### Arguments

- `x`: The query string. [String](../data-types/string).

### Returns

A 64-bit hash value. [UInt64](../data-types/int-uint).

### Example

```sql
SELECT normalizedQueryHash('SELECT 1 AS `xyz123`') != normalizedQueryHash('SELECT 1 AS `abc123`') AS normalizedQueryHash
SELECT normalizedQueryHashKeepNames('SELECT 1 AS `xyz123`') != normalizedQueryHashKeepNames('SELECT 1 AS `abc123`') AS normalizedQueryHashKeepNames
```

Result:

```result
┌─normalizedQueryHash─┐
│                   0 │
└─────────────────────┘
┌─normalizedQueryHashKeepNames─┐
│                            1 │
└──────────────────────────────┘
```
