Web analytics URLs often arrive cluttered with tracking parameters (those `?utm_source=google&utm_medium=cpc` strings) that make URL analysis messy and grouping difficult. Query parameters serve their purpose for campaign tracking and session management, but they create noise when you're trying to analyze page performance or user behavior patterns.
ClickHouse provides the `cutQueryString()` function for exactly this problem, letting you strip everything after the question mark with a single function call. This guide covers the function's syntax, performance considerations, and production patterns for cleaning URLs at scale, plus how to build selective parameter removal when you need more control than the all-or-nothing approach.
Quick answer with cutQueryString
When you're working with URLs in ClickHouse, you'll often find yourself needing to clean them up. Query parameters (those bits after the question mark, like `?utm_source=google&utm_medium=cpc`) can clutter your analytics and make URL grouping difficult.
ClickHouse gives you the `cutQueryString()` function to handle this exact problem. It removes everything after the question mark, including the question mark itself, leaving you with just the clean base URL.
SELECT cutQueryString('https://example.com/page?utm_source=google&utm_medium=cpc') AS clean_url;
-- Returns: https://example.com/page
Run SELECT cutQueryString(url)
The most straightforward approach takes your URL column and applies the function directly:
SELECT
original_url,
cutQueryString(original_url) AS clean_url
FROM your_table;
This works whether your URLs have query parameters or not—URLs without parameters come back unchanged.
Function syntax and return type
The `cutQueryString()` function is purpose-built for URL manipulation, which makes it more reliable than cobbling together a solution from generic string functions. It takes one argument and returns the cleaned URL.
Arguments
You pass in a single String argument representing the URL you want to clean. The URL can be absolute, like `https://example.com/page?param=value`, or relative, like `/page?param=value`.
Return value
The function returns a String with the original URL minus everything from the first question mark onward. If there's no question mark in your input, you get back the original string exactly as it was.
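A quick check of both behaviors in one query:

```sql
SELECT
    cutQueryString('/page?param=value') AS with_params,  -- '/page'
    cutQueryString('/page') AS without_params;           -- '/page' (unchanged)
```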
Supported versions
`cutQueryString()` is available in ClickHouse 19.14 and later. The function has been around long enough that most production deployments will have access to it.
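You can confirm what your server is running with:

```sql
SELECT version();
```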
Working example on a sample table
Let's walk through a realistic scenario using web analytics data. You're tracking user behavior, but your URLs are cluttered with tracking parameters that make analysis harder than it needs to be.
Create test data
First, we'll set up a table with URLs that mirror what you'd see in real web analytics:
CREATE TABLE web_events (
timestamp DateTime,
user_id UInt32,
page_url String,
referrer String
) ENGINE = MergeTree()
ORDER BY timestamp;
INSERT INTO web_events VALUES
('2024-01-15 10:30:00', 1001, 'https://shop.example.com/products?utm_source=google&utm_medium=cpc&utm_campaign=winter_sale', 'https://google.com'),
('2024-01-15 10:31:00', 1002, 'https://shop.example.com/checkout?session_id=abc123&ref=homepage', 'https://shop.example.com'),
('2024-01-15 10:32:00', 1003, 'https://shop.example.com/about', ''),
('2024-01-15 10:33:00', 1004, 'https://shop.example.com/contact?source=newsletter&campaign=jan2024', 'https://newsletter.example.com');
Compare before and after
Now you can see exactly how `cutQueryString()` transforms your URLs:
SELECT
page_url AS original,
cutQueryString(page_url) AS cleaned
FROM web_events
ORDER BY timestamp;
The transformation becomes clear when you see the results side by side:
- Original: `https://shop.example.com/products?utm_source=google&utm_medium=cpc&utm_campaign=winter_sale` → Cleaned: `https://shop.example.com/products`
- Original: `https://shop.example.com/checkout?session_id=abc123&ref=homepage` → Cleaned: `https://shop.example.com/checkout`
- Original: `https://shop.example.com/about` → Cleaned: `https://shop.example.com/about` (unchanged)
Does cutQueryString remove URL fragments?
Here's something that trips people up: `cutQueryString()` only removes query parameters (everything after `?`) and leaves URL fragments (everything after `#`) alone. This distinction matters when you're working with single-page applications or anchor links.
Behavior with #fragment
Watch what happens when your URL has both query parameters and fragments:
SELECT cutQueryString('https://example.com/page?param=value#section') AS result;
-- Returns: https://example.com/page#section
The query parameters disappear, but the fragment stays put. If you want to remove fragments too, combine `cutQueryString()` with `cutFragment()`:
SELECT cutFragment(cutQueryString('https://example.com/page?param=value#section')) AS clean_url;
-- Returns: https://example.com/page
Performance versus regular expressions
Native ClickHouse URL functions like `cutQueryString()` perform better than regex-based alternatives. The built-in function is optimized specifically for URL parsing and skips the overhead of pattern matching.
Performance comparison
While exact performance depends on your data and hardware, the differences are significant enough to matter:
| Method | Readability | Performance | Use case |
| --- | --- | --- | --- |
| `cutQueryString` | High | Fast | Remove all parameters |
| `replaceRegexpAll` | Medium | Slower | Complex pattern matching |
| `substring` + `position` | Low | Medium | Custom logic needed |
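You can sanity-check the gap on your own hardware with a synthetic comparison over generated URLs; timings depend entirely on your setup, so compare the elapsed time your client reports for each query:

```sql
-- Native function over one million synthetic URLs
SELECT count()
FROM numbers(1000000)
WHERE cutQueryString(concat('https://example.com/p?id=', toString(number)))
      = 'https://example.com/p';

-- Regex equivalent of the same operation
SELECT count()
FROM numbers(1000000)
WHERE replaceRegexpAll(concat('https://example.com/p?id=', toString(number)), '\\?.*$', '')
      = 'https://example.com/p';
```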
For straightforward query parameter removal, `cutQueryString()` gives you the best combination of speed and simplicity. When you need more surgical manipulation, such as removing only specific parameters, `cutURLParameter()` or regex functions become necessary.
Keeping some parameters while dropping others
Sometimes you want to keep certain query parameters while removing others. `cutQueryString()` is an all-or-nothing function: it removes everything after the question mark. For selective parameter removal, you'll need different approaches.
Using replaceRegexpAll
To remove specific parameters while preserving others, a regex pattern works, as long as it consumes only one of the surrounding delimiters:
SELECT replaceRegexpAll(
    'https://example.com/page?keep=yes&remove=no&also_keep=maybe',
    '[?&]remove=[^&]*',
    ''
) AS selective_clean;
-- Returns: https://example.com/page?keep=yes&also_keep=maybe
Note that if the removed parameter happens to be first, this pattern also eats the `?` and leaves a stray `&`, so test it against your real data.
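ClickHouse also ships a dedicated function for this, `cutURLParameter()`, which removes a single named parameter and handles the delimiter edge cases that trip up hand-written regexes:

```sql
SELECT cutURLParameter(
    'https://example.com/page?keep=yes&remove=no&also_keep=maybe',
    'remove'
) AS selective_clean;
-- Returns: https://example.com/page?keep=yes&also_keep=maybe
```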
Regex fallback
For more complex scenarios, you might combine multiple regex operations, or use `extractURLParameter()` to rebuild URLs with only the parameters you want:
SELECT
cutQueryString(url) || '?keep=' || extractURLParameter(url, 'keep') AS rebuilt_url
FROM your_table
WHERE extractURLParameter(url, 'keep') != '';
Automating cleanup with a materialized view
In production, you typically want to clean URLs automatically as data arrives rather than processing them during every query. Materialized views give you an elegant solution for this pattern—they do the work once when data comes in, not every time you query it.
View DDL
Create a materialized view that automatically strips query parameters from incoming URLs:
CREATE MATERIALIZED VIEW clean_web_events
ENGINE = MergeTree()
ORDER BY timestamp
AS SELECT
timestamp,
user_id,
cutQueryString(page_url) AS clean_page_url,
cutQueryString(referrer) AS clean_referrer
FROM web_events;
Real-time verification
After creating the materialized view, verify it's working correctly with new data:
-- Insert new test data
INSERT INTO web_events VALUES
('2024-01-15 11:00:00', 1005, 'https://example.com/new-page?test=123', '');
-- Check the materialized view
SELECT * FROM clean_web_events WHERE user_id = 1005;
-- Shows clean_page_url as 'https://example.com/new-page'
The materialized view processes new data automatically, giving you cleaned URLs without any query-time overhead.
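With the view in place, downstream queries can aggregate directly on the cleaned column. For example, a top-pages report:

```sql
SELECT
    clean_page_url,
    count() AS page_views
FROM clean_web_events
GROUP BY clean_page_url
ORDER BY page_views DESC
LIMIT 10;
```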
Building a URL cleaning API with Tinybird
Let's create a complete example that demonstrates how to build a real-time API for URL cleaning using Tinybird's managed ClickHouse platform.
This walkthrough shows how to ingest web analytics data, process URLs with ClickHouse functions, and expose clean URL analytics through a real-time API endpoint.
1. Create the data source
First, create a data source to store web analytics events:
SCHEMA >
`timestamp` DateTime64(3) `json:$.timestamp`,
`user_id` String `json:$.user_id`,
`page_url` String `json:$.page_url`,
`referrer` String `json:$.referrer`,
`utm_source` String `json:$.utm_source`,
`utm_medium` String `json:$.utm_medium`
ENGINE MergeTree
ENGINE_SORTING_KEY timestamp
ENGINE_PARTITION_KEY toYYYYMM(timestamp)
Now build the project locally:
tb dev
This builds the project on Tinybird Local and creates your data source. Then ingest some sample data using the Events API:
curl -X POST "http://localhost:7181/v0/events?name=web_analytics" \
-H "Content-Type: application/json" \
-d '{"timestamp": "2025-01-15 14:30:15.123", "user_id": "user123", "page_url": "https://shop.example.com/products?utm_source=google&utm_medium=cpc&utm_campaign=winter_sale", "referrer": "https://google.com", "utm_source": "google", "utm_medium": "cpc"}
{"timestamp": "2025-01-15 14:30:16.456", "user_id": "user456", "page_url": "https://shop.example.com/checkout?session_id=abc123&ref=homepage", "referrer": "https://shop.example.com", "utm_source": "", "utm_medium": ""}
{"timestamp": "2025-01-15 14:30:17.789", "user_id": "user789", "page_url": "https://shop.example.com/about", "referrer": "", "utm_source": "", "utm_medium": ""}'
2. Create the pipe files
Create a pipe that processes URL data and provides analytics:
NODE web_events
SQL >
SELECT
timestamp,
user_id,
page_url,
referrer,
utm_source,
utm_medium,
cutQueryString(page_url) AS clean_page_url,
cutQueryString(referrer) AS clean_referrer,
domain(page_url) AS domain,
extractURLParameter(page_url, 'utm_campaign') AS utm_campaign
FROM web_analytics
WHERE timestamp >= now64(3) - INTERVAL 1 HOUR
NODE url_analytics
SQL >
%
SELECT
clean_page_url,
domain,
utm_source,
utm_medium,
COUNT(*) AS page_views,
COUNT(DISTINCT user_id) AS unique_users,
MIN(timestamp) AS first_seen,
MAX(timestamp) AS last_seen
FROM web_events
WHERE
    1 = 1
    {% if defined(domain) %}
    AND domain = {{ String(domain) }}
    {% end %}
    {% if defined(utm_source) %}
    AND utm_source = {{ String(utm_source) }}
    {% end %}
    {% if defined(hours_back) %}
    AND timestamp >= now64(3) - INTERVAL {{ Int32(hours_back) }} HOUR
    {% end %}
GROUP BY clean_page_url, domain, utm_source, utm_medium
ORDER BY page_views DESC
LIMIT {{ Int32(limit, 100) }}
TYPE endpoint
3. Deploy and test the API
Deploy your project to Tinybird Cloud:
tb --cloud deploy
Then test your API endpoint:
# Get all page analytics from the last hour
curl -H "Authorization: Bearer <your_token>" \
"https://api.tinybird.co/v0/pipes/url_analytics.json"
# Filter by domain
curl -H "Authorization: Bearer <your_token>" \
"https://api.tinybird.co/v0/pipes/url_analytics.json?domain=shop.example.com"
# Filter by UTM source and time range
curl -H "Authorization: Bearer <your_token>" \
"https://api.tinybird.co/v0/pipes/url_analytics.json?utm_source=google&hours_back=2"
4. Sample API response
The API returns structured data with clean URL analytics:
{
"meta": [
{
"name": "clean_page_url",
"type": "String"
},
{
"name": "domain",
"type": "String"
},
{
"name": "utm_source",
"type": "String"
},
{
"name": "utm_medium",
"type": "String"
},
{
"name": "page_views",
"type": "UInt64"
},
{
"name": "unique_users",
"type": "UInt64"
},
{
"name": "first_seen",
"type": "DateTime64(3)"
},
{
"name": "last_seen",
"type": "DateTime64(3)"
}
],
"data": [
{
"clean_page_url": "https://shop.example.com/products",
"domain": "shop.example.com",
"utm_source": "google",
"utm_medium": "cpc",
"page_views": 1247,
"unique_users": 892,
"first_seen": "2025-01-15 14:30:15.123",
"last_seen": "2025-01-15 15:45:22.789"
},
{
"clean_page_url": "https://shop.example.com/checkout",
"domain": "shop.example.com",
"utm_source": "",
"utm_medium": "",
"page_views": 456,
"unique_users": 234,
"first_seen": "2025-01-15 14:30:16.456",
"last_seen": "2025-01-15 15:30:18.345"
}
],
"rows": 2,
"statistics": {
"elapsed": 0.045,
"rows_read": 2000,
"bytes_read": 89000
}
}
Other useful URL functions
ClickHouse provides several other URL manipulation functions that work well alongside `cutQueryString()` for comprehensive URL processing workflows.
domain
Extract the hostname from a URL using the `domain()` function:
SELECT domain('https://shop.example.com/products?sale=true') AS host;
-- Returns: shop.example.com
cutToFirstSignificantSubdomain
Get the registrable domain, dropping subdomains, with `cutToFirstSignificantSubdomain()`:
SELECT cutToFirstSignificantSubdomain('https://shop.example.com/products') AS base_domain;
-- Returns: example.com
extractURLParameter
Retrieve specific query parameter values before removing them:
SELECT
extractURLParameter(page_url, 'utm_source') AS traffic_source,
cutQueryString(page_url) AS clean_url
FROM web_events;
This combination lets you capture important parameter values while still cleaning the URLs for analysis.
Build real-time web analytics APIs with ClickHouse
Managing ClickHouse infrastructure for URL processing involves complexity—from optimizing storage and query performance to handling URL parsing edge cases and maintaining consistent data quality across distributed systems.
Tinybird's managed ClickHouse service reduces that overhead by abstracting optimization, scaling, and maintenance. You get ClickHouse features such as `cutQueryString()` and the rest of the URL function family without managing database clusters, performance monitoring, or version upgrades.
This lets teams focus on application development rather than database operations or configuring ClickHouse for production workloads.
Sign up for a free Tinybird account to build and test your first ClickHouse-based API in just a few minutes.
Additional resources
- cutQueryString function
- cutFragment function
- cutURLParameter function
- extractURLParameter function
- URL functions overview
- How to build URL hierarchy in ClickHouse
- Extract query parameter in ClickHouse
- Extract URL fragment in ClickHouse
FAQs about stripping query parameters in ClickHouse
How do I backfill an existing table with cleaned URLs without downtime?
Use `ALTER TABLE ... UPDATE` with `cutQueryString()` to modify existing data in place. ClickHouse processes these mutations asynchronously without blocking queries, though large tables may take a while to complete.
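A minimal sketch of such a backfill, using the `web_events` table from earlier (scoping the mutation with a `WHERE` clause keeps it cheap on large tables):

```sql
ALTER TABLE web_events
UPDATE page_url = cutQueryString(page_url)
WHERE page_url LIKE '%?%';

-- Track mutation progress
SELECT mutation_id, is_done
FROM system.mutations
WHERE table = 'web_events' AND NOT is_done;
```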
Can I strip only dynamic tracking params like gclid but keep others?
Use `replaceRegexpAll()` with specific parameter patterns, or `cutURLParameter()` to drop a single named parameter while keeping the rest. `cutQueryString()` itself removes all parameters, so selective removal requires one of those approaches or custom rebuilding logic.
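For a known list of tracking parameters like `gclid`, chaining `cutURLParameter()` calls is the simplest approach:

```sql
SELECT cutURLParameter(
    cutURLParameter('https://example.com/p?gclid=abc&fbclid=def&page=2', 'gclid'),
    'fbclid'
) AS cleaned;
-- Returns: https://example.com/p?page=2
```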
Does cutQueryString handle non-UTF-8 URLs correctly?
ClickHouse's `cutQueryString()` works with any valid URL string encoding. Strings without a question mark come back unchanged rather than raising errors, which makes the function safe for messy real-world data.