---
title: "How to decode URL-encoded strings in ClickHouse®"
excerpt: "Learn how to decode URL-encoded strings in ClickHouse® using decodeURLComponent, with performance tips, edge cases, and production deployment strategies."
authors: "Cameron Archer"
categories: "AI Resources"
createdOn: "2025-09-30 17:16:32"
publishedOn: "2025-09-30 17:16:32"
updatedOn: "2025-09-30 17:16:32"
status: "published"
---

URL-encoded strings appear everywhere in web data, from search parameters to API logs. In URL encoding, special characters are converted to a percent sign followed by a hexadecimal code, like `%20` for a space. ClickHouse{% sup %}®{% /sup %}'s `decodeURLComponent` function reverses this encoding, transforming strings like `search%3Fq%3Danalytics%26type%3Ddocs` back into readable text.

This guide covers the function's syntax, performance optimization strategies for large datasets, and how to handle edge cases like double encoding and Unicode sequences.

## What decodeURLComponent does

ClickHouse{% sup %}®{% /sup %} includes a built-in `decodeURLComponent` function that reverses URL encoding, also called percent-encoding. When browsers send data through URLs, they convert special characters into percent signs followed by hexadecimal values—spaces become `%20`, forward slashes become `%2F`, and ampersands become `%26`.

The function takes encoded strings and converts them back to readable text. You'll encounter this frequently when working with web analytics data, API logs, or any dataset containing URLs with encoded parameters.

## decodeURLComponent syntax and return type

The function accepts a single String argument and returns a String with the decoded result:

```sql
decodeURLComponent(encoded_string)
```

ClickHouse{% sup %}®{% /sup %} handles standard percent-encoding sequences and leaves invalid ones (such as a `%` not followed by two hexadecimal digits) as-is rather than raising an error. This fail-safe behavior prevents query failures when processing mixed or malformed data.
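As a quick sketch of that behavior (the `%ZZ` here is an intentionally invalid sequence; exact passthrough behavior may vary slightly between ClickHouse versions):

```sql
SELECT decodeURLComponent('50%25 off %ZZ deal') AS decoded;
-- %25 decodes to a literal %, while the invalid %ZZ sequence
-- is expected to pass through unchanged rather than raise an error
```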

## Quick example on a single string

Here's how `decodeURLComponent` works with a simple encoded string:

```sql
SELECT decodeURLComponent('Hello%20World%21') AS decoded_text;
```

This returns `Hello World!` where `%20` becomes a space and `%21` becomes an exclamation mark. You can also decode more complex URL components:

```sql
SELECT decodeURLComponent('search%3Fq%3Dclickhouse%26type%3Ddocs') AS decoded_url;
-- Returns: search?q=clickhouse&type=docs
```

## Decoding an entire column efficiently

When working with tables containing URL-encoded data, you'll often decode entire columns rather than individual strings.

### 1. Selecting a decoded virtual column

The most straightforward approach creates a computed column in your SELECT statement:

```sql
SELECT
    original_url,
    decodeURLComponent(original_url) AS decoded_url
FROM web_logs
WHERE decoded_url LIKE '%search%';
```

This approach works well for ad-hoc queries but recalculates the decoding operation every time you run the query.

### 2. Creating a materialized view to precompute results

For frequently accessed decoded data, a [materialized view](https://www.tinybird.co/docs/forward/work-with-data/optimize/materialized-views) pre-computes and stores the results:

```sql
CREATE MATERIALIZED VIEW decoded_urls_mv
ENGINE = MergeTree()
ORDER BY timestamp
AS SELECT
    timestamp,
    original_url,
    decodeURLComponent(original_url) AS decoded_url
FROM web_logs;
```

This approach trades storage space for query performance, making decoded URL searches nearly instantaneous.

## Handling edge cases and double encoding

Real-world URL data often contains encoding quirks that require special attention.

### Plus signs vs percent-20

Web forms sometimes encode spaces as plus signs (`+`) instead of `%20`, but `decodeURLComponent` doesn't automatically convert plus signs to spaces. You'll handle this separately:

```sql
SELECT
    decodeURLComponent(replace(url_param, '+', ' ')) AS properly_decoded
FROM form_submissions;
```

This pattern first replaces plus signs with spaces, then applies URL decoding to handle other encoded characters. Recent ClickHouse versions also provide `decodeURLFormComponent`, which decodes `+` as a space in a single step.

### Unicode multibyte sequences

URLs containing non-ASCII characters get encoded as multiple percent sequences. For example, the character `é` becomes `%C3%A9` in UTF-8 encoding:

```sql
SELECT decodeURLComponent('caf%C3%A9') AS decoded_text;
-- Returns: café
```

The function automatically handles multibyte sequences, reconstructing the original Unicode characters.

### Detecting double encoding

Sometimes URLs get encoded twice, creating sequences like `%2520` (which represents `%20`). You can detect and handle double encoding:

```sql
SELECT
    CASE
        WHEN position(url, '%25') > 0
        THEN decodeURLComponent(decodeURLComponent(url))
        ELSE decodeURLComponent(url)
    END AS final_decoded_url
FROM suspicious_data;
```

Note that `position()` is used here because `%` is a wildcard in `LIKE` patterns, so `LIKE '%25%'` would match any string containing `25`, not the literal sequence `%25`.

## Performance tips at billions of rows

URL decoding operations can become expensive at scale, but several optimization strategies help maintain query performance.

### Avoiding repeated decoding in nested subqueries

When your query decodes the same column multiple times, use a WITH clause to decode once and reuse the result:

```sql
WITH decoded_data AS (
    SELECT
        user_id,
        decodeURLComponent(search_query) AS clean_query
    FROM search_logs
    WHERE timestamp > now() - INTERVAL 1 DAY
)
SELECT
    user_id,
    clean_query,
    length(clean_query) AS query_length
FROM decoded_data
WHERE clean_query LIKE '%analytics%';
```

This pattern eliminates redundant decoding operations and often improves query performance significantly.

### Using LowCardinality for decoded text

When decoded URLs contain repeated patterns or limited unique values, wrap the result in [LowCardinality](https://www.tinybird.co/docs/sql-reference/data-types/lowcardinality) for better compression:

```sql
CREATE TABLE decoded_search_logs (
    timestamp DateTime,
    user_id UInt64,
    decoded_query LowCardinality(String)
) ENGINE = MergeTree()
ORDER BY timestamp;
```

LowCardinality works particularly well for decoded search terms, category names, or other URL parameters with high repetition rates.

## When to decode on ingest using materialized views

For high-throughput applications, decoding URLs during data insertion rather than query time often provides better performance.

### Step 1: Define the target schema

Create a table that stores both raw and decoded URL data:

```sql
CREATE TABLE web_events (
    timestamp DateTime,
    raw_url String,
    decoded_url String,
    user_agent String
) ENGINE = MergeTree()
ORDER BY timestamp;
```

### Step 2: Create the materialized view with decodeURLComponent

Set up a materialized view that automatically decodes incoming URL data:

```sql
CREATE MATERIALIZED VIEW decode_urls_mv TO web_events AS
SELECT
    timestamp,
    url AS raw_url,
    decodeURLComponent(url) AS decoded_url,
    user_agent
FROM raw_web_logs;
```

### Step 3: Backfill existing data

Use INSERT SELECT to decode and populate existing raw URL data:

```sql
INSERT INTO web_events
SELECT
    timestamp,
    url,
    decodeURLComponent(url),
    user_agent
FROM raw_web_logs
WHERE timestamp < (SELECT min(timestamp) FROM web_events);
```

This approach moves the computational cost to ingestion time, keeping query response times consistently fast.

## Building ClickHouse{% sup %}®{% /sup %}-based web analytics APIs with Tinybird

[Tinybird](https://www.tinybird.co) is a managed service for ClickHouse{% sup %}®{% /sup %} that makes it simple to build real-time APIs over web clickstream data. Here's how to create a complete web/URL analytics pipeline with data sources, materialized views, and API endpoints.

### Step 1: Create a data source for encoded URL strings

First, create a data source to store raw web analytics events with encoded URLs. Create a `web_events.datasource` file:

```tinybird
SCHEMA >
    `timestamp` DateTime64(3, 'UTC'),
    `session_id` String,
    `user_id` String,
    `encoded_url` String,
    `referrer` String,
    `user_agent` String,
    `country` LowCardinality(String),
    `device_type` LowCardinality(String)

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp, session_id"
```

Deploy the data source:

```bash
tb --cloud deploy
```

Ingest some clickstream data using the [Events API](https://www.tinybird.co/docs/forward/get-data-in/events-api). Here's an example:

```bash
curl -X POST \
  -H "Authorization: Bearer $TINYBIRD_TOKEN" \
  -H "Content-Type: application/json" \
  "https://api.tinybird.co/v0/events?name=web_events" \
  -d '[
    {"timestamp": "2024-12-01 14:30:00", "session_id": "sess_001", "user_id": "usr_123", "encoded_url": "/search%3Fq%3Danalytics%26category%3Dtools", "referrer": "https://google.com", "user_agent": "Mozilla/5.0", "country": "US", "device_type": "desktop"},
    {"timestamp": "2024-12-01 14:31:00", "session_id": "sess_002", "user_id": "usr_456", "encoded_url": "/products%2Fdatabase%2Fclickhouse%3Fref%3Dhomepage", "referrer": "https://tinybird.co", "user_agent": "Mozilla/5.0", "country": "UK", "device_type": "mobile"},
    {"timestamp": "2024-12-01 14:32:00", "session_id": "sess_003", "user_id": "usr_789", "encoded_url": "/blog%2Freal-time%20analytics%3Futm_source%3Dnewsletter", "referrer": "https://newsletter.com", "user_agent": "Mozilla/5.0", "country": "DE", "device_type": "tablet"}
  ]'
```

### Step 2: Create a materialized view with URL decoding and aggregation

Create a materialized view that decodes URLs in real-time and performs aggregations. Create a `decoded_page_views.datasource` file:

```tinybird
SCHEMA >
    `page_path` String,
    `country` LowCardinality(String),
    `device_type` LowCardinality(String),
    `hour` DateTime,
    `page_views` UInt64,
    `unique_sessions` UInt64,
    `unique_users` UInt64

ENGINE "SummingMergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(hour)"
ENGINE_SORTING_KEY "hour, page_path, country, device_type"
```

Create a materialized view pipe `decode_and_aggregate.pipe`:

```tinybird
NODE decode_and_aggregate
SQL >
    SELECT
        decodeURLComponent(encoded_url) AS page_path,
        country,
        device_type,
        toStartOfHour(timestamp) AS hour,
        1 AS page_views,
        1 AS unique_sessions,
        1 AS unique_users
    FROM web_events
    WHERE encoded_url != ''

TYPE MATERIALIZED
DATASOURCE decoded_page_views
```

Deploy the materialized view:

```bash
tb --cloud deploy
```

This materialized view automatically:

- Decodes URL-encoded strings in real time during ingestion
- Aggregates page views by hour, country, and device type
- Stores results in an optimized SummingMergeTree for fast queries

Note that summing `1` per row produces event counts rather than distinct counts, so `unique_sessions` and `unique_users` here are effectively row counts; true uniques would require `uniqState` aggregate states with an AggregatingMergeTree.

### Step 3: Create an API endpoint to query the materialized view

Create a pipe to query the materialized view and expose it as an API. Create `page_analytics_api.pipe`:

```tinybird
TOKEN "page_analytics_read" READ

NODE page_analytics
SQL >
    %
    SELECT
        page_path,
        country,
        device_type,
        toDate(hour) AS date,
        toHour(hour) AS hour_of_day,
        sum(page_views) AS total_page_views,
        sum(unique_sessions) AS total_sessions,
        sum(unique_users) AS total_users,
        round(sum(page_views) / sum(unique_sessions), 2) AS pages_per_session
    FROM decoded_page_views
    WHERE hour >= toStartOfDay({{DateTime(start_date, '2024-12-01 00:00:00')}})
      AND hour < toStartOfDay({{DateTime(end_date, '2024-12-02 00:00:00')}}) + INTERVAL 1 DAY
    {% if defined(country_filter) %}
      AND country = {{String(country_filter)}}
    {% end %}
    {% if defined(device_filter) %}
      AND device_type = {{String(device_filter)}}
    {% end %}
    {% if defined(path_filter) %}
      AND page_path LIKE {{String(path_filter, '%')}}
    {% end %}
    GROUP BY page_path, country, device_type, date, hour_of_day
    ORDER BY total_page_views DESC
    {% if defined(limit) %}
    LIMIT {{Int32(limit, 100)}}
    {% end %}

TYPE ENDPOINT
```

Deploy the API endpoint:

```bash
tb --cloud deploy
```

### Step 4: Query the API endpoint

Your decoded URL analytics API is now available. Query it using curl:

```bash
# Get top pages for a specific date range
curl "https://api.tinybird.co/v0/pipes/page_analytics_api.json?start_date=2024-12-01&end_date=2024-12-01" \
  -H "Authorization: Bearer $TINYBIRD_TOKEN"

# Filter by country and device type
curl "https://api.tinybird.co/v0/pipes/page_analytics_api.json?start_date=2024-12-01&end_date=2024-12-01&country_filter=US&device_filter=mobile" \
  -H "Authorization: Bearer $TINYBIRD_TOKEN"

# Search for specific pages (the literal % wildcards must be URL-encoded as %25)
curl "https://api.tinybird.co/v0/pipes/page_analytics_api.json?start_date=2024-12-01&end_date=2024-12-01&path_filter=%25search%25" \
  -H "Authorization: Bearer $TINYBIRD_TOKEN"
```

The API returns properly decoded URLs and pre-computed analytics:

```json
{
    "meta": [
        {"name": "page_path", "type": "String"},
        {"name": "country", "type": "String"},
        {"name": "device_type", "type": "String"},
        {"name": "date", "type": "Date"},
        {"name": "hour_of_day", "type": "UInt8"},
        {"name": "total_page_views", "type": "UInt64"},
        {"name": "total_sessions", "type": "UInt64"},
        {"name": "total_users", "type": "UInt64"},
        {"name": "pages_per_session", "type": "Float64"}
    ],
    "data": [
        {
            "page_path": "/search?q=analytics&category=tools",
            "country": "US",
            "device_type": "desktop",
            "date": "2024-12-01",
            "hour_of_day": 14,
            "total_page_views": 1,
            "total_sessions": 1,
            "total_users": 1,
            "pages_per_session": 1.0
        },
        {
            "page_path": "/products/database/clickhouse?ref=homepage",
            "country": "UK",
            "device_type": "mobile",
            "date": "2024-12-01",
            "hour_of_day": 14,
            "total_page_views": 1,
            "total_sessions": 1,
            "total_users": 1,
            "pages_per_session": 1.0
        }
    ],
    "rows": 2,
    "statistics": {
        "elapsed": 0.002,
        "rows_read": 3,
        "bytes_read": 1024
    }
}
```

This approach provides:

- **Real-time URL decoding** during data ingestion via materialized views
- **Pre-computed aggregations** for sub-100ms API response times
- **Flexible filtering** by country, device, time range, and page path patterns
- **Scalable architecture** that handles millions of events per second

Your decoded URL data becomes available as a production-ready API endpoint designed for high throughput and low latency. [Start building with Tinybird's free plan](https://cloud.tinybird.co/signup) to create your own URL analytics APIs.

## Related URL and string functions in ClickHouse{% sup %}®{% /sup %}

ClickHouse{% sup %}®{% /sup %} provides several complementary functions for comprehensive URL manipulation workflows.

### encodeURLComponent

The opposite of `decodeURLComponent`, this function URL-encodes strings for safe transmission:

```sql
SELECT encodeURLComponent('search query with spaces') AS encoded;
-- Returns: search%20query%20with%20spaces
```

### extractURLParameter

After decoding URLs, you often extract specific parameter values:

```sql
SELECT
    decodeURLComponent(extractURLParameter(full_url, 'q')) AS search_term
FROM search_logs;
```

### substring and length helpers

[String functions](https://www.tinybird.co/docs/sql-reference/functions/string-functions) work well alongside URL decoding for text analysis:

- **substring()**: Truncates decoded URLs to specific lengths for display or storage optimization
- **length()**: Measures decoded string lengths for validation or analytics
- **position()**: Finds specific patterns within decoded URL components
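A minimal sketch combining these helpers, assuming the `web_logs` table with an `original_url` column used in the earlier examples:

```sql
SELECT
    substring(decodeURLComponent(original_url), 1, 50) AS url_preview,   -- truncate for display
    length(decodeURLComponent(original_url)) AS decoded_bytes,           -- length in bytes
    position(decodeURLComponent(original_url), 'utm_') AS utm_offset     -- 0 when not found
FROM web_logs;
```

Note that `length()` counts bytes, not characters; for decoded text containing multibyte Unicode, use `lengthUTF8()` instead.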

## Next steps with Tinybird's managed ClickHouse{% sup %}®{% /sup %}

Tinybird provides a [managed ClickHouse{% sup %}®{% /sup %} service](https://www.tinybird.co/clickhouse) that handles operational complexity while offering the same SQL functions and performance characteristics.

With Tinybird, you get fast deployments, scalable infrastructure, and built-in API generation so you can focus on building features rather than managing databases. The platform includes live schema migrations, built-in observability, and data-as-code workflows that make URL processing pipelines easier to develop and maintain.

## Additional resources

- [Functions for Working with URLs](https://clickhouse.com/docs/en/sql-reference/functions/url-functions)
- [String Functions](https://clickhouse.com/docs/en/sql-reference/functions/string-functions)
- [Materialized Views](https://clickhouse.com/docs/en/engines/table-engines/special/materializedview)
- [Use Materialized Views](https://clickhouse.com/docs/en/guides/best-practices/use-materialized-views)
- [Tinybird Web Analytics Starter Kit](https://www.tinybird.co/templates/web-analytics-starter-kit)

## FAQs about URL decoding in ClickHouse{% sup %}®{% /sup %}

### Does decodeURLComponent work on arrays of strings?

The function only accepts single String arguments, but you can use `arrayMap(x -> decodeURLComponent(x), array_column)` to decode string arrays element by element.
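For example:

```sql
SELECT arrayMap(x -> decodeURLComponent(x), ['caf%C3%A9', 'a%20b']) AS decoded;
-- Returns: ['café', 'a b']
```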

### Can decodeURLComponent handle malformed percent sequences?

ClickHouse{% sup %}®{% /sup %} leaves invalid percent-encoding sequences in place rather than throwing errors, making the function safe to use on mixed or dirty data.

### Does decoding impact query performance significantly?

String decoding adds computational overhead, so consider using materialized views or pre-computed columns for frequently accessed decoded data, especially when processing billions of rows.
