---
title: "How to extract URL query strings without ? or # in ClickHouse®"
excerpt: "Learn how to extract clean URL query strings in ClickHouse® using queryString(), cutFragment(), and other built-in functions with practical examples and performance tips."
authors: "Cameron Archer"
categories: "AI Resources"
createdOn: "2025-10-03 17:22:12"
publishedOn: "2025-10-03 17:22:12"
updatedOn: "2025-10-03 17:22:12"
status: "published"
---

When you're analyzing web traffic or building APIs that process URLs, extracting the query string without the leading `?` or a trailing `#` fragment is a frequent requirement. ClickHouse®'s built-in URL functions handle this parsing automatically, eliminating the need for complex regex patterns or manual string manipulation.

This guide covers the primary methods for extracting query strings using `queryString()` and related functions, along with performance optimization strategies for large datasets and real-time ingestion patterns.

## ClickHouse® URL query functions explained

The `queryString()` function extracts the query string portion from a URL while automatically removing the leading question mark and any fragment identifiers. A query string contains the parameters that follow the `?` in a URL, like `param1=value1&param2=value2` in `https://example.com/path?param1=value1&param2=value2#section`.

You might be wondering why you'd want to extract just the query string without the `?` symbol. When you're analyzing web traffic or building APIs that process URL parameters, you often want the clean parameter string for logging, forwarding to other services, or parsing individual values.

ClickHouse® provides several built-in functions specifically designed for URL manipulation. The main ones you'll work with are `queryString()`, `cutFragment()`, and `cutQueryString()`. Each handles different aspects of URL parsing without requiring manual string manipulation or regular expressions.
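As a quick preview, the three functions compose naturally in a single query. A sketch against a literal URL:

```sql
SELECT
    queryString(u)                 AS query_string,  -- param1=value1&param2=value2
    cutFragment(u)                 AS no_fragment,   -- https://example.com/path?param1=value1&param2=value2
    cutQueryString(cutFragment(u)) AS base_url       -- https://example.com/path
FROM (SELECT 'https://example.com/path?param1=value1&param2=value2#section' AS u);
```

ClickHouse® also provides `cutQueryStringAndFragment()` if you want both the query string and the fragment removed in one call.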

## Syntax to extract the query string without ? or hash

The primary method uses ClickHouse®'s `queryString()` function, which returns a clean query string without the leading question mark or trailing fragments. This function handles the most common use case in a single operation.

### 1. queryString

The `queryString()` function takes a URL string as input and returns the query parameters as a clean string:

```sql
SELECT queryString('https://example.com/path?param1=value1&param2=value2#fragment') AS query_string;
-- Returns: param1=value1&param2=value2
```

This function automatically handles both the question mark removal and fragment stripping in a single operation. You can apply it directly to URL columns in your tables without additional preprocessing.

### 2. cutFragment

When working with URLs that contain fragments but you want to preserve other URL components, `cutFragment()` removes everything after the `#` symbol:

```sql
SELECT
    cutFragment('https://example.com/path?param1=value1#section') AS clean_url,
    queryString(cutFragment('https://example.com/path?param1=value1#section')) AS query_string;
-- Returns: https://example.com/path?param1=value1, param1=value1
```

Though `queryString()` already handles fragments internally, `cutFragment()` becomes useful when you need the clean URL for other operations. You'll find this pattern helpful when building URL normalization pipelines.

### 3. cutQueryString

The `cutQueryString()` function removes the query string entirely, returning only the base URL:

```sql
SELECT cutQueryString('https://example.com/path?param1=value1&param2=value2') AS base_url;
-- Returns: https://example.com/path
```

This function serves as the complement to `queryString()` when you want URL normalization or need to separate the base path from parameters.

## Handling missing or empty query strings

ClickHouse® functions behave predictably when URLs lack query strings, though the specific return values vary between functions. Understanding this behavior helps you write more robust queries.

### 1. Returning NULL versus empty string

The `queryString()` function returns an empty string when no query parameters exist:

```sql
SELECT queryString('https://example.com/path') AS result;
-- Returns: '' (empty string)

SELECT queryString('https://example.com/path?') AS result;
-- Returns: '' (empty string)
```

This consistent behavior simplifies downstream processing since you don't need to handle NULL values in most cases. However, you might want to distinguish between URLs with empty query strings versus URLs with no query strings at all.
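If you do need to tell those two cases apart, checking for a literal `?` with `position()` works. A sketch, using the `values()` table function just to supply inline test rows:

```sql
SELECT
    url,
    multiIf(
        position(url, '?') = 0, 'no query string',
        queryString(url) = '',  'empty query string',
        'has parameters'
    ) AS query_state
FROM values('url String',
    'https://example.com/path',
    'https://example.com/path?',
    'https://example.com/path?a=1');
```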

### 2. Using ifNull for safe defaults

When you want explicit default values for missing query strings, combine `queryString()` with conditional functions:

```sql
SELECT
    url,
    CASE
        WHEN queryString(url) = '' THEN 'no-params'
        ELSE queryString(url)
    END AS query_or_default
FROM urls_table;
```

This pattern proves especially useful when building analytics queries where empty query strings might indicate specific user behaviors.

## queryString versus extractURLParameter and regex

Different URL parsing approaches serve different use cases, from extracting complete query strings to isolating individual parameters. The choice between methods affects both performance and code complexity.

### 1. Function comparison

Here's how the main URL parsing functions compare:

- **queryString():** Extracts complete query string, fast performance, returns string
- **extractURLParameter():** Gets specific parameter value, fast performance, returns string
- **extractURLParameterNames():** Lists all parameter names, medium performance, returns array
- **Regex patterns:** Handles complex URL parsing, slow performance, varies by pattern

Choose `queryString()` when you want the entire parameter string for logging or forwarding. Use `extractURLParameter()` when targeting specific parameters like tracking codes or user IDs.
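For instance, isolating a single tracking parameter looks like this (the `utm_source` parameter name is just an illustration):

```sql
SELECT extractURLParameter('https://example.com/?utm_source=google&utm_medium=cpc', 'utm_source') AS source;
-- Returns: google
```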

### 2. When regex is still useful

Regex patterns become necessary for non-standard URL formats or when you want to extract multiple URL components simultaneously:

```sql
SELECT extractAllGroupsVertical(url, '([^?]+)\\?([^#]+)') AS url_parts
FROM urls_table;
-- Returns: [['https://example.com/path', 'param1=value1&param2=value2']]
```

However, regex parsing typically performs 3-5x slower than dedicated URL functions on large datasets. You'll notice this performance difference becomes significant when processing millions of URLs.

## Version compatibility and behavior changes

The `queryString()` function has been available since ClickHouse® 1.1.54388 with consistent behavior across versions. Recent versions include performance optimizations but maintain the same API contract.

ClickHouse® 23.x introduced additional URL functions like `extractURLParameterNames()` but the core `queryString()` function remains unchanged. If you're running older versions, verify function availability with `SELECT version()` before deployment.
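For example, `extractURLParameterNames()` returns every parameter name as an array, which is handy for auditing which keys actually appear in your traffic:

```sql
SELECT extractURLParameterNames('https://example.com/?utm_source=google&page=2') AS param_names;
-- Returns: ['utm_source','page']
```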

## Optimizing performance on billion-row tables

URL parsing performance becomes critical when processing large datasets, particularly for real-time analytics workloads. The difference between optimized and unoptimized approaches can mean the difference between sub-second and multi-minute query times.

### 1. Projections or materialized views

Pre-compute query string extraction using [materialized views](https://tinybird.co/docs/forward/work-with-data/optimize/materialized-views) to avoid repeated parsing:

```sql
CREATE MATERIALIZED VIEW url_parsed_mv
ENGINE = MergeTree()
ORDER BY (timestamp, domain)
AS SELECT
    timestamp,
    domain,
    queryString(full_url) AS query_string,
    extractURLParameter(full_url, 'utm_source') AS utm_source
FROM raw_logs;
```

This approach reduces query latency from seconds to milliseconds on billion-row tables since the parsing happens once during ingestion. The trade-off is additional storage space for the parsed results.

### 2. Avoiding on-the-fly regex

Replace regex-based URL parsing with ClickHouse®'s native functions wherever possible:

```sql
-- Slow: regex parsing
SELECT extractAllGroupsVertical(url, '\\?([^#]+)') FROM logs;

-- Fast: native function
SELECT queryString(url) FROM logs;
```

Native functions leverage ClickHouse®'s optimized C++ implementations and often include SIMD optimizations for string processing.

## Streaming ingestion and on-the-fly parsing

Processing query strings during data ingestion rather than at query time reduces storage requirements and improves query performance. This approach works particularly well for high-throughput scenarios.

### 1. Kafka connector pattern

Extract query strings while ingesting from [Kafka](https://tinybird.co/docs/forward/get-data-in/connectors/kafka) streams using materialized views:

```sql
CREATE TABLE kafka_raw (
    timestamp DateTime,
    raw_url String
) ENGINE = Kafka()
SETTINGS kafka_broker_list = 'localhost:9092',
         kafka_topic_list = 'web_logs';

-- assumes a MergeTree target table `parsed_urls` with matching columns
CREATE MATERIALIZED VIEW parsed_urls_mv TO parsed_urls AS
SELECT
    timestamp,
    raw_url,
    queryString(raw_url) AS query_params
FROM kafka_raw;
```

This pattern processes millions of URLs per second while maintaining low memory overhead. The parsing happens automatically as data flows through the system.

### 2. Materialized view pattern

Automatically parse URLs on insert using materialized views attached to your main table:

```sql
CREATE MATERIALIZED VIEW url_analytics_mv
ENGINE = SummingMergeTree()
ORDER BY (date, domain, query_string)
AS SELECT
    toDate(timestamp) AS date,
    domain(url) AS domain,
    queryString(url) AS query_string,
    count() AS page_views
FROM web_logs
GROUP BY date, domain, query_string;
```

This approach enables real-time URL analytics without impacting insert performance.

## Parsing individual parameters into key-value maps

Beyond extracting the complete query string, you can parse individual parameters into ClickHouse®'s native Map data type for structured analysis. This becomes especially useful when you want to analyze specific parameters across millions of URLs.

### 1. extractKeyValuePairs for key-value pairs

Convert query strings into Map types for easier parameter access:

```sql
SELECT
    url,
    extractKeyValuePairs(queryString(url), '=', '&') AS params_map
FROM urls_table;
-- Returns: {'utm_source':'google','utm_medium':'cpc'}
```

The `extractKeyValuePairs()` function takes the key-value delimiter (`=`) as its second argument and the pair delimiter (`&`) as its third, producing a `Map(String, String)`. This creates a structured format for parameter analysis.

### 2. Accessing parameters with map subscripts

Extract specific parameter values from the parsed map using bracket syntax:

```sql
SELECT
    url,
    extractKeyValuePairs(queryString(url), '=', '&')['utm_source'] AS traffic_source,
    extractKeyValuePairs(queryString(url), '=', '&')['campaign_id'] AS campaign
FROM marketing_data;
```

This pattern works particularly well for marketing attribution analysis where you want to isolate specific tracking parameters.

## Building a real-time, ClickHouse®-based URL analytics API with Tinybird

Let's create a complete example that ingests web traffic data and exposes an [API endpoint](https://www.tinybird.co/docs/forward/work-with-data/publish-data/endpoints) returning query parameter analytics using `queryString`.

This walkthrough shows how URL query parsing integrates into real-world analytics workflows, from data ingestion through API deployment. You'll see exactly how the function works in practice.

[Tinybird](https://www.tinybird.co) is a managed service that makes it easy to build real-time analytics APIs powered by ClickHouse®. It handles the infrastructure, scaling, and API generation so you can focus on writing SQL and building analytics features.

In this example, we'll use Tinybird to:

- Create a data source for ingesting web traffic events
- Write a pipe that processes and aggregates query parameters using `queryString`
- Deploy an API endpoint that serves query parameter analytics

The workflow demonstrates how to combine ClickHouse®'s powerful URL processing with Tinybird's API capabilities to build production-ready analytics services.

### 1. Create the data source

First, create a [data source](https://www.tinybird.co/docs/forward/get-data-in/data-sources) to store your web traffic events:

```tinybird
SCHEMA >
    `timestamp` DateTime64(3, 'UTC') `json:$.timestamp`,
    `url` String `json:$.url`,
    `user_id` UInt32 `json:$.user_id`,
    `session_id` String `json:$.session_id`,
    `page_title` String `json:$.page_title`,
    `referrer` String `json:$.referrer`

ENGINE "MergeTree"
ENGINE_SORTING_KEY "timestamp"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
```

Now build the data source locally:

```bash
tb dev
```

This builds the project on [Tinybird Local](https://www.tinybird.co/docs/forward/install-tinybird/local) (which you'll need to have running) and creates your data source. Then ingest some sample data using the Events API:

```bash
curl -X POST "http://localhost:7181/v0/events?name=web_traffic" \
  -H "Content-Type: application/json" \
  -d '{"timestamp": "2025-10-03 14:30:15.123", "url": "https://www.example.com/products?utm_source=google&utm_medium=cpc&campaign_id=123", "user_id": 1001, "session_id": "sess_abc123", "page_title": "Products Page", "referrer": "https://www.google.com/search"}
{"timestamp": "2025-10-03 14:30:16.456", "url": "https://blog.example.com/article?id=456&source=newsletter", "user_id": 1002, "session_id": "sess_def456", "page_title": "Blog Article", "referrer": "https://www.facebook.com"}
{"timestamp": "2025-10-03 14:30:17.789", "url": "https://www.github.com/tinybird?tab=repositories&sort=stars", "user_id": 1003, "session_id": "sess_ghi789", "page_title": "GitHub Profile", "referrer": "https://www.twitter.com"}'
```

### 2. Write the pipe query

Create a [pipe](https://www.tinybird.co/docs/forward/work-with-data/pipes) that analyzes query parameters:

```tinybird
NODE traffic_events
SQL >
    SELECT
        timestamp,
        url,
        queryString(url) AS query_string,
        domain(url) AS domain,
        extractURLParameter(url, 'utm_source') AS utm_source,
        extractURLParameter(url, 'utm_medium') AS utm_medium,
        extractURLParameter(url, 'campaign_id') AS campaign_id,
        user_id,
        session_id,
        page_title
    FROM web_traffic
    WHERE timestamp >= now() - INTERVAL 7 DAY
      AND queryString(url) != ''

NODE query_analytics
SQL >
    %
    SELECT
        domain,
        utm_source,
        utm_medium,
        campaign_id,
        COUNT(*) AS total_events,
        COUNT(DISTINCT user_id) AS unique_users,
        COUNT(DISTINCT session_id) AS unique_sessions,
        COUNT(DISTINCT query_string) AS unique_query_strings,
        MIN(timestamp) AS first_seen,
        MAX(timestamp) AS last_seen
    FROM traffic_events
    WHERE 1 = 1
        {% if defined(domain) %}
            AND domain = {{ String(domain) }}
        {% end %}
        {% if defined(utm_source) %}
            AND utm_source = {{ String(utm_source) }}
        {% end %}
        {% if defined(date_from) %}
            AND timestamp >= {{ DateTime(date_from) }}
        {% end %}
        {% if defined(date_to) %}
            AND timestamp <= {{ DateTime(date_to) }}
        {% end %}
    GROUP BY domain, utm_source, utm_medium, campaign_id
    ORDER BY total_events DESC
    LIMIT {{ Int32(limit, 100) }}

TYPE endpoint
```

The `queryString` function ensures consistent query parameter extraction regardless of URL complexity.

### 3. Deploy and call the endpoint

Deploy your pipe to create the API endpoint:

```bash
tb --cloud deploy
```

Then call your new endpoint:

```bash
curl -H "Authorization: Bearer <your_token>" \
  "https://api.tinybird.co/v0/pipes/query_analytics.json?domain=example.com&limit=5"
```

The response returns neatly aggregated data with query parameter insights:

```json
{
  "meta": [
    {
      "name": "domain",
      "type": "String"
    },
    {
      "name": "utm_source",
      "type": "Nullable(String)"
    },
    {
      "name": "utm_medium",
      "type": "Nullable(String)"
    },
    {
      "name": "campaign_id",
      "type": "Nullable(String)"
    },
    {
      "name": "total_events",
      "type": "UInt64"
    },
    {
      "name": "unique_users",
      "type": "UInt64"
    },
    {
      "name": "unique_sessions",
      "type": "UInt64"
    },
    {
      "name": "unique_query_strings",
      "type": "UInt64"
    },
    {
      "name": "first_seen",
      "type": "DateTime"
    },
    {
      "name": "last_seen",
      "type": "DateTime"
    }
  ],
  "data": [
    {
      "domain": "example.com",
      "utm_source": "google",
      "utm_medium": "cpc",
      "campaign_id": "123",
      "total_events": 1247,
      "unique_users": 892,
      "unique_sessions": 1156,
      "unique_query_strings": 23,
      "first_seen": "2025-10-03 14:30:15",
      "last_seen": "2025-10-03 15:45:22"
    },
    {
      "domain": "example.com",
      "utm_source": null,
      "utm_medium": null,
      "campaign_id": null,
      "total_events": 456,
      "unique_users": 234,
      "unique_sessions": 345,
      "unique_query_strings": 12,
      "first_seen": "2025-10-03 14:30:16",
      "last_seen": "2025-10-03 15:30:18"
    }
  ],
  "rows": 2,
  "statistics": {
    "elapsed": 0.045,
    "rows_read": 2000,
    "bytes_read": 89000
  }
}
```

## Next steps for fast URL analytics with ClickHouse®

While ClickHouse®'s URL functions provide excellent performance for URL parsing, managing ClickHouse® clusters at scale introduces operational complexity. Tinybird's [managed ClickHouse® platform](https://www.tinybird.co/clickhouse) significantly reduces infrastructure overhead and delivers ClickHouse®-level performance in a managed environment.

The platform includes built-in API generation, real-time ingestion, and automatic scaling. You can deploy URL parsing pipelines in minutes rather than weeks.

[Sign up for a free Tinybird account](https://cloud.tinybird.co/signup) to build and test your first ClickHouse®-based API in just a few minutes.

## Additional resources

- [queryString function](https://clickhouse.com/docs/en/sql-reference/functions/url-functions#querystring)
- [extractURLParameter function](https://clickhouse.com/docs/en/sql-reference/functions/url-functions#extracturlparameter)
- [cutFragment function](https://clickhouse.com/docs/en/sql-reference/functions/url-functions#cutfragment)
- [URL functions in ClickHouse®](https://clickhouse.com/docs/en/sql-reference/functions/url-functions)
- [String functions in ClickHouse®](https://clickhouse.com/docs/en/sql-reference/functions/string-functions)
- [How to extract URL fragments without hash symbols (#) in ClickHouse®](https://www.tinybird.co/blog-posts/extract-url-fragment-clickhouse)
- [How to extract the protocol of a URL in ClickHouse®](https://www.tinybird.co/blog-posts/extract-url-protocol-clickhouse)
- [How to extract domains without www from URLs in ClickHouse®](https://www.tinybird.co/blog-posts/extract-domain-without-www-clickhouse)

## Frequently asked questions about extracting ClickHouse® query strings

### Did queryString always strip the leading question mark?

Yes, the `queryString()` function has consistently returned clean query strings without the `?` prefix across all ClickHouse® versions since its introduction. This behavior remains unchanged in current versions, ensuring backward compatibility for existing queries.

### What happens if the URL contains encoded "?" or "#" characters?

ClickHouse® handles URL-encoded characters like `%3F` (encoded `?`) and `%23` (encoded `#`) as literal characters within the query string. The function only recognizes actual `?` and `#` characters as delimiters, not their encoded equivalents.
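If you need the decoded values, apply `decodeURLComponent()` after extraction, so that encoded delimiters can't be mistaken for real ones:

```sql
SELECT
    queryString('https://example.com/search?q=a%3Fb%23c') AS raw_qs,
    decodeURLComponent(queryString('https://example.com/search?q=a%3Fb%23c')) AS decoded;
-- raw_qs: q=a%3Fb%23c, decoded: q=a?b#c
```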

### Can I extract multiple query strings from a single text blob?

Yes, use `extractAll()` with a URL-matching regex to find the URLs in a text blob, then apply `queryString()` to each match with `arrayMap()`:

```sql
SELECT arrayMap(x -> queryString(x), extractAll(text, 'https?://[^\\s]+')) AS all_query_strings
FROM text_data;
```

This approach works well for parsing log files or documents containing multiple URLs.

### How do I remove the query string entirely from a URL column?

Use the `cutQueryString()` function to get clean URLs without parameters:

```sql
SELECT cutQueryString(url) AS clean_url FROM urls_table;
-- Converts: https://example.com/path?param=value
-- To: https://example.com/path
```

This function proves useful for URL normalization in analytics where you want to group pages regardless of their parameters.
