Name: Tinybird
Brand: Tinybird
Rating: 5.0 (10 reviews)

URL fragments—the text after the # symbol in web addresses—help browsers navigate to specific sections within pages, but extracting them for analysis requires the right approach. ClickHouse's fragment() function strips away the hash symbol and returns just the meaningful anchor text, making it straightforward to analyze user navigation patterns and page engagement.

This guide covers the syntax and performance considerations for fragment extraction, along with edge cases where regex patterns work better than native functions. You'll also see how to transform fragment analysis into production APIs using managed ClickHouse platforms.

How fragment extraction works in ClickHouse

ClickHouse provides the fragment() function to extract the part of a URL that comes after the hash symbol (#) without including the hash itself. When you call fragment('https://example.com/page#section1'), it returns just section1, not #section1.

This might seem straightforward, but there's a key distinction here. URL fragments serve as anchor points within web pages—they tell browsers where to jump on a page. Unlike query parameters that get sent to servers, fragments stay in the browser and guide navigation to specific sections.

The fragment() function handles this extraction cleanly, which matters when you're analyzing user behavior or building navigation systems. You get the meaningful part without having to strip characters manually.

Fragment vs path vs query string

Understanding URL structure helps clarify what the fragment() function actually extracts:

Fragment: The anchor portion after # (section1 in https://example.com/page#section1)
Path: The directory structure (/page/article in https://example.com/page/article)
Query string: Parameters after ? (id=123&type=news in https://example.com?id=123&type=news)

Each piece serves a different purpose. Paths define resources on servers, query strings pass data to applications, and fragments control client-side navigation.

Syntax and minimal example using fragment()

The fragment() function takes a single string argument containing a URL and returns the fragment portion as a string. If no fragment exists, it returns an empty string rather than NULL.

SELECT fragment(url_column) FROM your_table;

1. Create sample table

Start by creating a test table with various URL examples to see how fragment extraction works:

CREATE TABLE url_examples (
    id UInt32,
    url String
) ENGINE = MergeTree()
ORDER BY id;

INSERT INTO url_examples VALUES
    (1, 'https://example.com/page#intro'),
    (2, 'https://docs.site.com/guide#getting-started'),
    (3, 'https://blog.example.com/post'),
    (4, 'https://shop.example.com/product#reviews');

Notice how some URLs have fragments while others don't. This mix helps demonstrate different scenarios you'll encounter with real data.

2. Run a select using fragment()

Extract fragments from your sample URLs using the fragment() function:

SELECT 
    url,
    fragment(url) AS extracted_fragment
FROM url_examples;

The query processes each URL and pulls out whatever comes after the hash symbol. For URLs without fragments, you get empty strings.

3. Verify output without the hash

The query returns clean fragment values without the hash symbol:

┌─url─────────────────────────────────────┬─extracted_fragment─┐
│ https://example.com/page#intro          │ intro              │
│ https://docs.site.com/guide#getting-started │ getting-started    │
│ https://blog.example.com/post           │                    │
│ https://shop.example.com/product#reviews │ reviews            │
└─────────────────────────────────────────┴────────────────────┘

You can see how the function strips the # character completely. This saves you from having to clean the data later.

Handling empty or missing fragments

URLs without fragments return empty strings rather than NULL values, but you might want different behavior depending on your use case. The fragment() function consistently handles missing fragments by returning empty strings, which works well for most analytics scenarios.

However, sometimes you need to distinguish between URLs that have empty fragments (#) and URLs with no fragment at all. The function treats both cases the same way.

Coalesce to empty string

When you want consistent string output for downstream processing, wrap the function with COALESCE():

SELECT 
    url,
    COALESCE(fragment(url), 'no-fragment') AS fragment_or_default
FROM url_examples;

This approach works well when you're building reports or dashboards where empty values might cause display issues.

Preserve nulls for upstream apps

For applications that distinguish between empty fragments and missing fragments, use conditional logic:

SELECT 
    url,
    CASE 
        WHEN position(url, '#') = 0 THEN NULL
        ELSE fragment(url)
    END AS fragment_with_nulls
FROM url_examples;

This pattern helps when your application logic needs to handle the two cases differently.

Performance and storage considerations at scale

Processing millions of URLs requires optimization strategies beyond basic function calls. ClickHouse offers several approaches to improve performance when working with fragments at scale, especially when you're running the same fragment extraction repeatedly.

The key insight here is that fragment extraction can become expensive when you're doing it on every query. Pre-computing and storing fragments often makes more sense.

Add a materialized column

Create a computed column that stores fragment values automatically:

ALTER TABLE url_examples 
ADD COLUMN url_fragment String MATERIALIZED fragment(url);

Storage trade-off: You use more disk space but gain query speed
Automatic updates: New rows get fragment values computed on insert
Query optimization: Filtering and grouping by fragments becomes much faster

This approach works particularly well when you frequently filter or group by fragments in your queries.

Use LowCardinality for group by

When fragments have limited unique values, wrap them in LowCardinality for better compression and performance:

SELECT 
    toLowCardinality(fragment(url)) AS fragment_category,
    count() AS page_views
FROM url_examples
GROUP BY fragment_category;

LowCardinality works best when you have many repeated fragment values—think navigation sections that appear across many pages.

Benchmark native functions vs regex

The fragment() function typically outperforms regex alternatives, but you can verify with your data:

-- Native function (recommended)
SELECT fragment(url) FROM large_table;

-- Regex alternative (slower)
SELECT extractAll(url, '#(.*)$')[1] FROM large_table;

We've found the native function runs about 3x faster on typical URL datasets, but your mileage may vary depending on URL complexity.

When to use regex instead of native URL functions

While fragment() handles standard URLs effectively, certain scenarios require regex patterns for more flexible extraction. The native function assumes well-formed URLs, which isn't always the case with real-world data.

You'll encounter edge cases where the built-in function doesn't work as expected. Malformed URLs, custom protocols, or legacy data often require more flexible approaches.

Pre-20.3 ClickHouse versions

Earlier ClickHouse versions lack the fragment() function, requiring regex-based extraction:

-- For ClickHouse versions before 20.3
SELECT 
    url,
    extractAll(url, '#(.*)')[1] AS fragment_extracted
FROM url_examples;

If you're stuck on an older version, this regex pattern captures everything after the first hash symbol. It's not as fast as the native function, but it works reliably.

Non-standard URI schemes

Custom protocols or malformed URLs might need regex handling:

-- Extract fragments from custom schemes
SELECT 
    url,
    extractAll(url, '#([^?&]*)')[1] AS custom_fragment
FROM url_examples
WHERE url LIKE 'custom://%';

This pattern stops at query parameters or additional hash symbols, which can be useful for cleaning up messy data.

Related URL functions developers confuse with fragment()

ClickHouse provides several URL manipulation functions that serve different purposes than fragment extraction. The naming similarities often lead to confusion, especially when you're working quickly or learning the functions for the first time.

Getting these mixed up can lead to subtle bugs that are hard to track down. Let's clarify what each function actually does.

urlCutFragment

This function removes fragments entirely rather than extracting them:

-- Removes fragment, returns URL without #section
SELECT cutFragment('https://example.com/page#section'); 
-- Result: 'https://example.com/page'

-- Extracts fragment, returns just the fragment part
SELECT fragment('https://example.com/page#section');
-- Result: 'section'

Use cutFragment() when you want clean URLs for canonical purposes or when fragments interfere with your analysis.

urlCutQueryString

Query string manipulation affects the portion after the ? symbol, not fragments:

SELECT 
    cutQueryString('https://example.com/page?id=123#section') AS no_query,
    fragment('https://example.com/page?id=123#section') AS fragment_part;
-- Results: 'https://example.com/page#section', 'section'

Notice how cutting the query string leaves the fragment intact. These functions work on different parts of the URL structure.

extractURLParameter

Parameter extraction works with query strings, while fragments represent page anchors:

SELECT 
    extractURLParameter('https://example.com/page?id=123&type=news', 'id') AS param_value,
    fragment('https://example.com/page?id=123&type=news#comments') AS fragment_value;
-- Results: '123', 'comments'

Parameters carry data to servers, fragments control browser navigation. The distinction matters when you're building analytics that track both user interactions and page sections.

Building ClickHouse-based user navigation analytics APIs with Tinybird

You can transform your fragment extraction into a comprehensive user navigation analytics API using Tinybird's managed ClickHouse platform.

Instead of managing ClickHouse infrastructure, you define your analytics logic as code and Tinybird handles the operational complexity. The platform provides sub-second query performance for millions of navigation events while abstracting away database administration.

1. Create a data source for user navigation events

Define your navigation analytics data structure in a data source file:

SCHEMA >
    `timestamp` DateTime64(3, 'UTC'),
    `user_id` String,
    `session_id` String,
    `page_url` String,
    `referrer_url` String,
    `user_agent` String,
    `country` LowCardinality(String),
    `device_type` LowCardinality(String),
    `page_title` String,
    `time_on_page` UInt32,
    `is_bounce` UInt8

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp, user_id"

This schema captures comprehensive user navigation data including fragment-based navigation patterns. The partitioning by month ensures fast queries as your analytics data grows.

2. Build navigation analytics pipes

Create pipes that analyze fragment-based user journeys and section engagement:

NODE section_engagement_analysis
SQL >
    %
    WITH navigation_events AS (
        SELECT 
            timestamp,
            user_id,
            session_id,
            page_url,
            page_title,
            country,
            device_type,
            time_on_page,
            is_bounce,
            fragment(page_url) AS section_fragment
        FROM user_navigation
        WHERE timestamp >= {{DateTime(start_date, '2024-01-01 00:00:00')}}
          AND timestamp <= {{DateTime(end_date, '2024-12-31 23:59:59')}}
        {\% if defined(country_filter) %}
          AND country = {{String(country_filter)}}
        {\% end %}
        {\% if defined(device_filter) %}
          AND device_type = {{String(device_filter)}}
        {\% end %}
    )
    SELECT 
        section_fragment,
        page_title,
        count() AS section_views,
        uniq(user_id) AS unique_users,
        uniq(session_id) AS unique_sessions,
        countIf(is_bounce = 1) AS bounces,
        round(countIf(is_bounce = 1) * 100.0 / count(), 2) AS bounce_rate_percent,
        round(avg(time_on_page), 2) AS avg_time_on_section,
        -- Navigation flow analysis
        round(count() * 100.0 / sum(count()) OVER (), 2) AS traffic_share_percent,
        -- Top referrers to this section
        topK(3)(referrer_url) AS top_referrers
    FROM navigation_events
    WHERE section_fragment != ''
    GROUP BY section_fragment, page_title
    ORDER BY section_views DESC
    {\% if defined(limit) %}
    LIMIT {{Int32(limit, 50)}}
    {\% end %}

TYPE ENDPOINT

Create a user journey analysis pipe:

NODE user_journey_analysis
SQL >
    %
    WITH user_sessions AS (
        SELECT 
            user_id,
            session_id,
            timestamp,
            page_url,
            fragment(page_url) AS section_fragment,
            time_on_page,
            row_number() OVER (PARTITION BY session_id ORDER BY timestamp) AS navigation_step
        FROM user_navigation
        WHERE timestamp >= {{DateTime(start_date, '2024-01-01 00:00:00')}}
          AND timestamp <= {{DateTime(end_date, '2024-12-31 23:59:59')}}
        {\% if defined(user_id_filter) %}
          AND user_id = {{String(user_id_filter)}}
        {\% end %}
    ),
    journey_flows AS (
        SELECT 
            section_fragment,
            navigation_step,
            count() AS step_occurrences,
            uniq(session_id) AS unique_sessions,
            round(avg(time_on_page), 2) AS avg_time_at_step,
            -- Calculate drop-off rate
            round((count() - lead(count()) OVER (ORDER BY navigation_step)) * 100.0 / count(), 2) AS drop_off_rate
        FROM user_sessions
        WHERE section_fragment != ''
        GROUP BY section_fragment, navigation_step
    )
    SELECT 
        section_fragment,
        navigation_step,
        step_occurrences,
        unique_sessions,
        avg_time_at_step,
        drop_off_rate,
        -- Identify critical drop-off points
        CASE 
            WHEN drop_off_rate > 50 THEN 'high_dropoff'
            WHEN drop_off_rate > 25 THEN 'medium_dropoff'
            ELSE 'low_dropoff'
        END AS dropoff_category
    FROM journey_flows
    ORDER BY navigation_step ASC, step_occurrences DESC
    {\% if defined(limit) %}
    LIMIT {{Int32(limit, 100)}}
    {\% end %}

TYPE ENDPOINT

3. Deploy and test your analytics APIs

Deploy your pipes and test the generated API endpoints:

tb --cloud deploy

Test your navigation analytics APIs:

# Section engagement analysis
curl "https://api.tinybird.co/v0/pipes/section_engagement_analysis.json?start_date=2024-01-01&end_date=2024-01-31&country_filter=US&limit=20" \
  -H "Authorization: Bearer $TINYBIRD_TOKEN"

# User journey analysis
curl "https://api.tinybird.co/v0/pipes/user_journey_analysis.json?start_date=2024-01-01&end_date=2024-01-31&user_id_filter=user_123" \
  -H "Authorization: Bearer $TINYBIRD_TOKEN"

The APIs return comprehensive navigation analytics including section engagement metrics, user journey flows, and drop-off analysis. This enables real-time insights into how users navigate through complex interfaces, identify content bottlenecks, and optimize user experience based on fragment-based navigation patterns.

Next steps with Tinybird for real-time web analytics

Building production-ready URL analytics requires reliable ingestion, low-latency queries, and scalable APIs. Managing your own ClickHouse infrastructure means dealing with replication, backups, monitoring, and scaling challenges that can distract from building features.

Tinybird handles the operational complexity while providing extensive ClickHouse capabilities for analytics workloads.

Whether you're tracking user navigation patterns, analyzing content engagement, or building real-time dashboards, the managed approach allows you to focus on analytics logic rather than database administration.

If you want to try Tinybird, you can sign up for free to build and test your first API in a few minutes.

FAQs about URL fragment extraction in ClickHouse

What happens if the URL has multiple "#" characters?

The fragment() function returns everything after the first # symbol, treating subsequent # characters as literal text within the fragment. For example, fragment('https://example.com#section#subsection') returns section#subsection.

Does fragment() validate percent-encoding?

The function extracts fragments as-is without decoding percent-encoded characters, so you'll need additional processing for decoded values. Use decodeURLComponent() if you need to decode the extracted fragment.

Can I update existing rows to add a fragment column?

Yes, use ALTER TABLE to add a materialized column with fragment(url_column) in standard ClickHouse, or compute fragments dynamically in Tinybird queries. Materialized columns automatically compute values for new inserts while requiring manual updates for existing data.

Skip the infra work. Deploy your first ClickHouse
project now.

Our Columns:

Skip the infra work. Deploy your first ClickHouse
project now.

Skip the infra work. Deploy your first ClickHouse
project now.

How to extract URL fragments without hash symbols (#) in ClickHouse

How fragment extraction works in ClickHouse

Fragment vs path vs query string

Syntax and minimal example using fragment()

1. Create sample table

2. Run a select using fragment()

3. Verify output without the hash

Handling empty or missing fragments

Coalesce to empty string

Preserve nulls for upstream apps

Performance and storage considerations at scale

Add a materialized column

Use LowCardinality for group by

Benchmark native functions vs regex

When to use regex instead of native URL functions

Pre-20.3 ClickHouse versions

Non-standard URI schemes

Related URL functions developers confuse with fragment()

urlCutFragment

urlCutQueryString

extractURLParameter

Building ClickHouse-based user navigation analytics APIs with Tinybird

1. Create a data source for user navigation events

2. Build navigation analytics pipes

3. Deploy and test your analytics APIs

Next steps with Tinybird for real-time web analytics

FAQs about URL fragment extraction in ClickHouse

What happens if the URL has multiple "#" characters?

Does fragment() validate percent-encoding?

Can I update existing rows to add a fragment column?

Additional resources

ClickHouse documentation

Skip the infra work. Deploy your first ClickHouse
project now.

Skip the infra work. Deploy your first ClickHouse project now.

Our Columns:

Skip the infra work. Deploy your first ClickHouse project now.

Skip the infra work. Deploy your first ClickHouse project now.

How to extract URL fragments without hash symbols (#) in ClickHouse

How fragment extraction works in ClickHouse

Fragment vs path vs query string

Syntax and minimal example using fragment()

1. Create sample table

2. Run a select using fragment()

3. Verify output without the hash

Handling empty or missing fragments

Coalesce to empty string

Preserve nulls for upstream apps

Performance and storage considerations at scale

Add a materialized column

Use LowCardinality for group by

Benchmark native functions vs regex

When to use regex instead of native URL functions

Pre-20.3 ClickHouse versions

Non-standard URI schemes

Related URL functions developers confuse with fragment()

urlCutFragment

urlCutQueryString

extractURLParameter

Building ClickHouse-based user navigation analytics APIs with Tinybird

1. Create a data source for user navigation events

2. Build navigation analytics pipes

3. Deploy and test your analytics APIs

Next steps with Tinybird for real-time web analytics

FAQs about URL fragment extraction in ClickHouse

What happens if the URL has multiple "#" characters?

Does fragment() validate percent-encoding?

Can I update existing rows to add a fragment column?

Additional resources

ClickHouse documentation

Skip the infra work. Deploy your first ClickHouse project now.

Skip the infra work. Deploy your first ClickHouse
project now.

Skip the infra work. Deploy your first ClickHouse
project now.

Skip the infra work. Deploy your first ClickHouse
project now.

Skip the infra work. Deploy your first ClickHouse
project now.