String manipulation troubleshooting

Common issues and solutions when working with string manipulation functions in ClickHouse and Tinybird.

Common errors

ILLEGAL_TYPE_OF_ARGUMENT

Error: ILLEGAL_TYPE_OF_ARGUMENT: Argument 1 for function length must be String, but got UInt32

Cause: Passing non-string data to string functions

Solution: Convert to string first

-- Wrong
SELECT length(numeric_column)

-- Correct
SELECT length(toString(numeric_column))

BAD_ARGUMENTS

Error: BAD_ARGUMENTS: String index is out of bounds

Cause: Using invalid index in string functions like substring

Solution: Validate string length before operations

-- Safe substring with bounds checking
SELECT 
  CASE 
    WHEN length(string_column) >= 5 THEN substring(string_column, 1, 5)
    ELSE string_column
  END as safe_substring

Type edge cases

Working with nullable strings

Issue: String functions on nullable columns return unexpected results

Solution: Handle nulls explicitly

-- Handle null strings
SELECT 
  CASE 
    WHEN string_column IS NULL THEN ''
    ELSE trim(string_column)
  END as safe_trim

UTF-8 encoding issues

Issue: String functions not working with special characters

Solution: Use UTF-8 aware functions

-- Use UTF-8 length instead of byte length
SELECT lengthUTF8(string_column) as char_count

Usage patterns that break Pipes

Multiple string operations

Issue: Chaining multiple string functions can cause performance issues

Solution: Use intermediate columns

-- Instead of chaining
SELECT upper(trim(replace(string_column, 'old', 'new')))

-- Use intermediate steps
WITH cleaned AS (
  SELECT replace(string_column, 'old', 'new') as cleaned_string
  FROM events
)
SELECT upper(trim(cleaned_string)) as final_string
FROM cleaned

Regular expressions in aggregations

Issue: Complex regex in aggregations can slow down queries

Solution: Pre-process strings in data sources

-- Pre-extract patterns
SELECT 
  string_column,
  extractAll(string_column, 'pattern') as extracted_patterns
FROM events

Sample fixes

Fixing string concatenation issues

-- Problem: Concatenating different types
SELECT 'prefix_' || numeric_column FROM events

-- Solution: Convert types properly
SELECT concat('prefix_', toString(numeric_column)) as combined_string
FROM events

Working with JSON strings

-- Problem: Extracting from JSON strings
SELECT substring(json_string, 10, 20) FROM events

-- Solution: Use JSON functions
SELECT JSONExtractString(json_string, 'key') as extracted_value
FROM events

Handling empty strings

-- Problem: Empty strings causing issues
SELECT substring(string_column, 1, 5) FROM events

-- Solution: Check for empty strings
SELECT 
  CASE 
    WHEN string_column = '' THEN ''
    ELSE substring(string_column, 1, 5)
  END as safe_substring
FROM events

Best practices

  1. Always convert types - Use toString() for non-string inputs
  2. Handle nulls explicitly - Check for null values before string operations
  3. Use UTF-8 functions - Use lengthUTF8() for character count
  4. Validate string length - Check bounds before using substring()
  5. Pre-process complex operations - Do heavy string processing in data sources