---
title: "How to extract URL fragments without hash (#) in ClickHouse®"
excerpt: "Extract URL fragments in ClickHouse without regex complexity. This function handles edge cases cleanly and runs fast."
authors: "Cameron Archer"
categories: "AI Resources"
createdOn: "2025-01-27 17:23:02"
publishedOn: "2025-10-02 17:23:02"
updatedOn: "2025-10-02 17:23:02"
status: "published"
---

URL fragments—the text after the # symbol in web addresses—help browsers navigate to specific sections within pages, but extracting them for analysis requires the right approach. ClickHouse{% sup %}®{% /sup %}'s `fragment()` function strips away the hash symbol and returns just the meaningful anchor text, making it straightforward to analyze user navigation patterns and page engagement.

This guide covers the syntax and performance considerations for fragment extraction, along with edge cases where regex patterns work better than native functions. You'll also see how to transform fragment analysis into production APIs using managed ClickHouse{% sup %}®{% /sup %} platforms.

## How fragment extraction works in ClickHouse{% sup %}®{% /sup %}

ClickHouse{% sup %}®{% /sup %} provides the `fragment()` function to extract the part of a URL that comes after the hash symbol (#) without including the hash itself. When you call `fragment('https://example.com/page#section1')`, it returns just `section1`, not `#section1`.

This might seem straightforward, but there's a key distinction here. URL fragments serve as anchor points within web pages—they tell browsers where to jump on a page. Unlike query parameters that get sent to servers, fragments stay in the browser and guide navigation to specific sections.

The `fragment()` function handles this extraction cleanly, which matters when you're analyzing user behavior or building navigation systems. You get the meaningful part without having to strip characters manually.

### Fragment vs path vs query string

Understanding URL structure helps clarify what the `fragment()` function actually extracts:

- **Fragment:** The anchor portion after # (`section1` in `https://example.com/page#section1`)
- **Path:** The directory structure (`/page/article` in `https://example.com/page/article`)
- **Query string:** Parameters after ? (`id=123&type=news` in `https://example.com?id=123&type=news`)

Each piece serves a different purpose. Paths define resources on servers, query strings pass data to applications, and fragments control client-side navigation.

## Syntax and minimal example using fragment()

The `fragment()` function takes a single string argument containing a URL and returns the fragment portion as a string. If no fragment exists, it returns an empty string rather than NULL.

```sql
SELECT fragment(url_column) FROM your_table;
```

### 1. Create sample table

Start by creating a test table with various URL examples to see how fragment extraction works:

```sql
CREATE TABLE url_examples (
    id UInt32,
    url String
) ENGINE = MergeTree()
ORDER BY id;

INSERT INTO url_examples VALUES
    (1, 'https://example.com/page#intro'),
    (2, 'https://docs.site.com/guide#getting-started'),
    (3, 'https://blog.example.com/post'),
    (4, 'https://shop.example.com/product#reviews');
```

Notice how some URLs have fragments while others don't. This mix helps demonstrate different scenarios you'll encounter with real data.

### 2. Run a select using fragment()

Extract fragments from your sample URLs using the `fragment()` function:

```sql
SELECT
    url,
    fragment(url) AS extracted_fragment
FROM url_examples;
```

The query processes each URL and pulls out whatever comes after the hash symbol. For URLs without fragments, you get empty strings.

### 3. Verify output without the hash

The query returns clean fragment values without the hash symbol:

```plaintext
┌─url─────────────────────────────────────┬─extracted_fragment─┐
│ https://example.com/page#intro          │ intro              │
│ https://docs.site.com/guide#getting-started │ getting-started    │
│ https://blog.example.com/post           │                    │
│ https://shop.example.com/product#reviews │ reviews            │
└─────────────────────────────────────────┴────────────────────┘
```

You can see how the function strips the # character completely. This saves you from having to clean the data later.

## Handling empty or missing fragments

URLs without fragments return empty strings rather than NULL values, but you might want different behavior depending on your use case. The `fragment()` function consistently handles missing fragments by returning empty strings, which works well for most analytics scenarios.

However, sometimes you need to distinguish between URLs that have empty fragments (`#`) and URLs with no fragment at all. The function treats both cases the same way.

### Coalesce to empty string

When you want consistent string output for downstream processing, wrap the function with `COALESCE()`:

```sql
SELECT
    url,
    COALESCE(fragment(url), 'no-fragment') AS fragment_or_default
FROM url_examples;
```

This approach works well when you're building reports or dashboards where empty values might cause display issues.

### Preserve nulls for upstream apps

For applications that distinguish between empty fragments and missing fragments, use conditional logic:

```sql
SELECT
    url,
    CASE
        WHEN position(url, '#') = 0 THEN NULL
        ELSE fragment(url)
    END AS fragment_with_nulls
FROM url_examples;
```

This pattern helps when your application logic needs to handle the two cases differently.

## Performance and storage considerations at scale

Processing millions of URLs requires optimization strategies beyond basic function calls. ClickHouse{% sup %}®{% /sup %} offers several approaches to improve performance when working with fragments at scale, especially when you're running the same fragment extraction repeatedly.

The key insight here is that fragment extraction can become expensive when you're doing it on every query. Pre-computing and storing fragments often makes more sense.

### Add a materialized column

Create a computed column that stores fragment values automatically:

```sql
ALTER TABLE url_examples
ADD COLUMN url_fragment String MATERIALIZED fragment(url);
```

- **Storage trade-off:** You use more disk space but gain query speed
- **Automatic updates:** New rows get fragment values computed on insert
- **Query optimization:** Filtering and grouping by fragments becomes much faster

This approach works particularly well when you frequently filter or group by fragments in your queries.

### Use LowCardinality for group by

When fragments have limited unique values, wrap them in `LowCardinality` for better compression and performance:

```sql
SELECT
    toLowCardinality(fragment(url)) AS fragment_category,
    count() AS page_views
FROM url_examples
GROUP BY fragment_category;
```

[LowCardinality](https://tinybird.co/docs/sql-reference/data-types/lowcardinality) works best when you have many repeated fragment values—think navigation sections that appear across many pages.

### Benchmark native functions vs regex

The `fragment()` function typically outperforms regex alternatives, but you can verify with your data:

```sql
-- Native function (recommended)
SELECT fragment(url) FROM large_table;

-- Regex alternative (slower)
SELECT extractAll(url, '#(.*)$')[1] FROM large_table;
```

We've found the native function runs about 3x faster on typical URL datasets, but your mileage may vary depending on URL complexity.

## When to use regex instead of native URL functions

While `fragment()` handles standard URLs effectively, certain scenarios require regex patterns for more flexible extraction. The native function assumes well-formed URLs, which isn't always the case with real-world data.

You'll encounter edge cases where the built-in function doesn't work as expected. Malformed URLs, custom protocols, or legacy data often require more flexible approaches.

### Pre-20.3 ClickHouse{% sup %}®{% /sup %} versions

Earlier ClickHouse{% sup %}®{% /sup %} versions lack the `fragment()` function, requiring regex-based extraction:

```sql
-- For ClickHouse® versions before 20.3
SELECT
    url,
    extractAll(url, '#(.*)')[1] AS fragment_extracted
FROM url_examples;
```

If you're stuck on an older version, this regex pattern captures everything after the first hash symbol. It's not as fast as the native function, but it works reliably.

### Non-standard URI schemes

Custom protocols or malformed URLs might need regex handling:

```sql
-- Extract fragments from custom schemes
SELECT
    url,
    extractAll(url, '#([^?&]*)')[1] AS custom_fragment
FROM url_examples
WHERE url LIKE 'custom://%';
```

This pattern stops at query parameters or additional hash symbols, which can be useful for cleaning up messy data.

## Related URL functions developers confuse with fragment()

ClickHouse{% sup %}®{% /sup %} provides several URL manipulation functions that serve different purposes than fragment extraction. The naming similarities often lead to confusion, especially when you're working quickly or learning the functions for the first time.

Getting these mixed up can lead to subtle bugs that are hard to track down. Let's clarify what each function actually does.

### urlCutFragment

This function removes fragments entirely rather than extracting them:

```sql
-- Removes fragment, returns URL without #section
SELECT cutFragment('https://example.com/page#section');
-- Result: 'https://example.com/page'

-- Extracts fragment, returns just the fragment part
SELECT fragment('https://example.com/page#section');
-- Result: 'section'
```

Use `cutFragment()` when you want clean URLs for canonical purposes or when fragments interfere with your analysis.

### urlCutQueryString

Query string manipulation affects the portion after the ? symbol, not fragments:

```sql
SELECT
    cutQueryString('https://example.com/page?id=123#section') AS no_query,
    fragment('https://example.com/page?id=123#section') AS fragment_part;
-- Results: 'https://example.com/page#section', 'section'
```

Notice how cutting the query string leaves the fragment intact. These functions work on different parts of the URL structure.

### extractURLParameter

Parameter extraction works with query strings, while fragments represent page anchors:

```sql
SELECT
    extractURLParameter('https://example.com/page?id=123&type=news', 'id') AS param_value,
    fragment('https://example.com/page?id=123&type=news#comments') AS fragment_value;
-- Results: '123', 'comments'
```

Parameters carry data to servers, fragments control browser navigation. The distinction matters when you're building analytics that track both user interactions and page sections.

## Building ClickHouse{% sup %}®{% /sup %}-based user navigation analytics APIs with Tinybird

You can transform your fragment extraction into a comprehensive user navigation analytics API using Tinybird's [managed ClickHouse{% sup %}®{% /sup %} platform](https://www.tinybird.co/clickhouse).

Instead of managing ClickHouse{% sup %}®{% /sup %} infrastructure, you define your analytics logic as code and [Tinybird](https://www.tinybird.co) handles the operational complexity. The platform provides sub-second query performance for millions of navigation events while abstracting away database administration.

### 1. Create a data source for user navigation events

Define your navigation analytics data structure in a [data source](https://www.tinybird.co/docs/forward/get-data-in/data-sources) file:

```tinybird
SCHEMA >
    `timestamp` DateTime64(3, 'UTC'),
    `user_id` String,
    `session_id` String,
    `page_url` String,
    `referrer_url` String,
    `user_agent` String,
    `country` LowCardinality(String),
    `device_type` LowCardinality(String),
    `page_title` String,
    `time_on_page` UInt32,
    `is_bounce` UInt8

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp, user_id"
```

This schema captures comprehensive user navigation data including fragment-based navigation patterns. The partitioning by month ensures fast queries as your analytics data grows.

### 2. Build navigation analytics pipes

Create [pipes](https://www.tinybird.co/docs/forward/work-with-data/pipes) that analyze fragment-based user journeys and section engagement:

```tinybird
NODE section_engagement_analysis
SQL >
    %
    WITH navigation_events AS (
        SELECT
            timestamp,
            user_id,
            session_id,
            page_url,
            page_title,
            country,
            device_type,
            time_on_page,
            is_bounce,
            fragment(page_url) AS section_fragment
        FROM user_navigation
        WHERE timestamp >= {{DateTime(start_date, '2024-01-01 00:00:00')}}
          AND timestamp <= {{DateTime(end_date, '2024-12-31 23:59:59')}}
        {\% if defined(country_filter) %}
          AND country = {{String(country_filter)}}
        {\% end %}
        {\% if defined(device_filter) %}
          AND device_type = {{String(device_filter)}}
        {\% end %}
    )
    SELECT
        section_fragment,
        page_title,
        count() AS section_views,
        uniq(user_id) AS unique_users,
        uniq(session_id) AS unique_sessions,
        countIf(is_bounce = 1) AS bounces,
        round(countIf(is_bounce = 1) * 100.0 / count(), 2) AS bounce_rate_percent,
        round(avg(time_on_page), 2) AS avg_time_on_section,
        -- Navigation flow analysis
        round(count() * 100.0 / sum(count()) OVER (), 2) AS traffic_share_percent,
        -- Top referrers to this section
        topK(3)(referrer_url) AS top_referrers
    FROM navigation_events
    WHERE section_fragment != ''
    GROUP BY section_fragment, page_title
    ORDER BY section_views DESC
    {\% if defined(limit) %}
    LIMIT {{Int32(limit, 50)}}
    {\% end %}

TYPE ENDPOINT
```

Create a user journey analysis pipe:

```tinybird
NODE user_journey_analysis
SQL >
    %
    WITH user_sessions AS (
        SELECT
            user_id,
            session_id,
            timestamp,
            page_url,
            fragment(page_url) AS section_fragment,
            time_on_page,
            row_number() OVER (PARTITION BY session_id ORDER BY timestamp) AS navigation_step
        FROM user_navigation
        WHERE timestamp >= {{DateTime(start_date, '2024-01-01 00:00:00')}}
          AND timestamp <= {{DateTime(end_date, '2024-12-31 23:59:59')}}
        {\% if defined(user_id_filter) %}
          AND user_id = {{String(user_id_filter)}}
        {\% end %}
    ),
    journey_flows AS (
        SELECT
            section_fragment,
            navigation_step,
            count() AS step_occurrences,
            uniq(session_id) AS unique_sessions,
            round(avg(time_on_page), 2) AS avg_time_at_step,
            -- Calculate drop-off rate
            round((count() - lead(count()) OVER (ORDER BY navigation_step)) * 100.0 / count(), 2) AS drop_off_rate
        FROM user_sessions
        WHERE section_fragment != ''
        GROUP BY section_fragment, navigation_step
    )
    SELECT
        section_fragment,
        navigation_step,
        step_occurrences,
        unique_sessions,
        avg_time_at_step,
        drop_off_rate,
        -- Identify critical drop-off points
        CASE
            WHEN drop_off_rate > 50 THEN 'high_dropoff'
            WHEN drop_off_rate > 25 THEN 'medium_dropoff'
            ELSE 'low_dropoff'
        END AS dropoff_category
    FROM journey_flows
    ORDER BY navigation_step ASC, step_occurrences DESC
    {\% if defined(limit) %}
    LIMIT {{Int32(limit, 100)}}
    {\% end %}

TYPE ENDPOINT
```

### 3. Deploy and test your analytics APIs

Deploy your pipes and test the generated [API endpoints](https://www.tinybird.co/docs/forward/work-with-data/publish-data/endpoints):

```bash
tb --cloud deploy
```

Test your navigation analytics APIs:

```bash
# Section engagement analysis
curl "https://api.tinybird.co/v0/pipes/section_engagement_analysis.json?start_date=2024-01-01&end_date=2024-01-31&country_filter=US&limit=20" \
  -H "Authorization: Bearer $TINYBIRD_TOKEN"

# User journey analysis
curl "https://api.tinybird.co/v0/pipes/user_journey_analysis.json?start_date=2024-01-01&end_date=2024-01-31&user_id_filter=user_123" \
  -H "Authorization: Bearer $TINYBIRD_TOKEN"
```

The APIs return comprehensive navigation analytics including section engagement metrics, user journey flows, and drop-off analysis. This enables real-time insights into how users navigate through complex interfaces, identify content bottlenecks, and optimize user experience based on fragment-based navigation patterns.

## Next steps with Tinybird for real-time web analytics

Building production-ready URL analytics requires reliable ingestion, low-latency queries, and scalable APIs. Managing your own ClickHouse{% sup %}®{% /sup %} infrastructure means dealing with replication, backups, monitoring, and scaling challenges that can distract from building features.

Tinybird handles the operational complexity while providing extensive ClickHouse{% sup %}®{% /sup %} capabilities for analytics workloads.

Whether you're tracking user navigation patterns, analyzing content engagement, or building real-time dashboards, the managed approach allows you to focus on analytics logic rather than database administration.

If you want to try Tinybird, you can [sign up for free](https://cloud.tinybird.co/signup) to build and test your first API in a few minutes.

## FAQs about URL fragment extraction in ClickHouse{% sup %}®{% /sup %}

### What happens if the URL has multiple "#" characters?

The `fragment()` function returns everything after the first # symbol, treating subsequent # characters as literal text within the fragment. For example, `fragment('https://example.com#section#subsection')` returns `section#subsection`.

### Does fragment() validate percent-encoding?

The function extracts fragments as-is without decoding percent-encoded characters, so you'll need additional processing for decoded values. Use `decodeURLComponent()` if you need to decode the extracted fragment.

### Can I update existing rows to add a fragment column?

Yes, use `ALTER TABLE` to add a materialized column with `fragment(url_column)` in standard ClickHouse{% sup %}®{% /sup %}, or compute fragments dynamically in Tinybird queries. Materialized columns automatically compute values for new inserts while requiring manual updates for existing data.

## Additional resources

### ClickHouse{% sup %}®{% /sup %} documentation

- [URL Functions](https://clickhouse.com/docs/en/sql-reference/functions/url-functions)
- [String Functions](https://clickhouse.com/docs/en/sql-reference/functions/string-functions)
- [Array Functions](https://clickhouse.com/docs/en/sql-reference/functions/array-functions)
- [Materialized Views](https://clickhouse.com/docs/en/engines/table-engines/special/materializedview)
- [Tinybird vs ClickHouse{% sup %}®{% /sup %}](https://www.tinybird.co/blog-posts/tinybird-vs-clickhouse)
- [How to build URL hierarchy in ClickHouse{% sup %}®{% /sup %}](https://www.tinybird.co/blog-posts/how-to-build-url-hierarchy-clickhouse)
- [Extract query parameters in ClickHouse{% sup %}®{% /sup %}](https://www.tinybird.co/blog-posts/extract-query-parameter-clickhouse)
- [Strip URL query strings in ClickHouse{% sup %}®{% /sup %}](https://www.tinybird.co/blog-posts/strip-url-query-string-clickhouse)
- [Web Analytics Starter Kit Template](https://github.com/tinybirdco/web-analytics-starter-kit)
