---
title: "How to extract port numbers from URLs in ClickHouse®"
excerpt: "Learn how to extract port numbers from URLs in ClickHouse® using urlExtractPort() function, handle default ports for HTTP/HTTPS, and build reusable queries."
authors: "Cameron Archer"
categories: "AI Resources"
createdOn: "2025-09-30 17:17:02"
publishedOn: "2025-09-30 17:17:02"
updatedOn: "2025-09-30 17:17:02"
status: "published"
---

When you need to extract a port number from a URL in ClickHouse{% sup %}®{% /sup %}, you use ClickHouse{% sup %}®{% /sup %}'s built-in `port()` function.

Keep in mind that `port()` returns 0 when no explicit port exists, which means HTTP and HTTPS URLs without ports need special handling to return standard values of 80 and 443 respectively.

This guide covers ClickHouse{% sup %}®{% /sup %}'s URL parsing functions, conditional logic for default ports, edge case handling, and performance optimization techniques for large-scale URL analysis.

## Extracting port numbers with built-in functions

ClickHouse{% sup %}®{% /sup %} provides the `port()` function to directly extract port numbers from URLs, returning the port as a UInt16 value when explicitly specified, or 0 when no port is present. This built-in function handles URL parsing efficiently and covers most standard URL formats you'll encounter in log analysis or web traffic data.

The function works consistently across different URL types, though you'll want to understand its behavior with missing ports before building production queries.

### urlExtractPort syntax

The `port()` function (also available as `urlExtractPort()`) takes a single string parameter containing the URL and returns the port number as UInt16:

```sql
SELECT port(url_string) AS extracted_port
```

Here's how it works with different URL formats:

```sql
SELECT
    port('http://example.com:8080/path') AS explicit_port,
    port('https://example.com/secure') AS no_port,
    port('http://localhost:3000') AS localhost_port;

-- Results:
-- explicit_port: 8080
-- no_port: 0
-- localhost_port: 3000
```

Notice that URLs without explicit ports return 0, not NULL. This behavior becomes important when you're building conditional logic around port values.

### Quick example query

When working with real URL data, you'll often have a mix of URLs with and without explicit ports. The function consistently returns 0 for URLs without ports, making it predictable for downstream logic:

```sql
WITH sample_urls AS (
    SELECT arrayJoin([
        'https://api.example.com:443/v1/data',
        'http://localhost:8080/admin',
        'https://secure.site.com/login',
        'http://dev.internal:9000/metrics'
    ]) AS url
)
SELECT
    url,
    port(url) AS extracted_port
FROM sample_urls;
```

This query demonstrates the function's behavior across different scenarios—explicit HTTPS ports, development servers, production URLs without ports, and internal services with custom ports.

## Returning default ports when none are present

Since `port()` returns 0 when no explicit port exists, you'll often want to substitute standard default ports based on the protocol. HTTP typically uses port 80, while HTTPS uses port 443, and you can implement this logic using conditional statements.

The key challenge here is determining which protocol you're dealing with so you can apply the right default.

### Case for http and https

The most reliable approach uses a `CASE` statement that checks both the extracted port and the URL protocol to determine the appropriate default:

```sql
SELECT
    url,
    CASE
        WHEN port(url) = 0 AND protocol(url) = 'https' THEN 443
        WHEN port(url) = 0 AND protocol(url) = 'http' THEN 80
        ELSE port(url)
    END AS port_with_default
FROM your_table;
```

Alternatively, you can use ClickHouse{% sup %}®{% /sup %}'s `multiIf()` function for a more compact syntax:

```sql
SELECT
    url,
    multiIf(
        port(url) != 0, port(url),
        protocol(url) = 'https', 443,
        protocol(url) = 'http', 80,
        port(url)
    ) AS port_with_default
FROM your_table;
```

Both approaches give you the same result, though `multiIf()` can be easier to read when you have multiple conditions to check.

### Generic fallback using coalesce pattern

For more complex scenarios where you want custom defaults or handle additional protocols, you can combine `nullIf()` with `coalesce()` to create flexible fallback logic:

```sql
SELECT
    url,
    coalesce(
        nullIf(port(url), 0),
        multiIf(
            protocol(url) = 'https', 443,
            protocol(url) = 'http', 80,
            protocol(url) = 'ftp', 21,
            8080  -- Generic fallback
        )
    ) AS port_with_custom_default
FROM your_table;
```

This pattern works well when you're dealing with diverse URL types and want different default behaviors for different protocols. The `nullIf(port(url), 0)` converts zero values to NULL, allowing `coalesce()` to fall through to your default logic.

## Handling edge cases and invalid URLs

URL parsing can encounter various edge cases that affect port extraction, particularly with IPv6 addresses, malformed URLs, or URLs containing authentication information. Understanding these scenarios helps you build robust queries that handle real-world data.

Most of these edge cases won't break your queries, but they might return unexpected results if you're not prepared for them.

### IPv6 and user info in authority

IPv6 addresses in URLs require square brackets and can complicate port extraction, but ClickHouse{% sup %}®{% /sup %}'s `port()` function handles them correctly:

```sql
SELECT
    port('http://[2001:db8::1]:8080/path') AS ipv6_port,
    port('https://user:pass@example.com:9000/api') AS userinfo_port,
    port('http://[::1]/local') AS ipv6_no_port;

-- Results:
-- ipv6_port: 8080
-- userinfo_port: 9000
-- ipv6_no_port: 0
```

The function correctly identifies ports even when userinfo (username:password) precedes the hostname, and it properly handles IPv6 literal addresses enclosed in brackets.

### Malformed or empty strings

When working with log data or user input, you'll encounter invalid URLs that could cause issues. The `port()` function returns 0 for malformed URLs, but you might want explicit error handling:

```sql
SELECT
    url,
    CASE
        WHEN url = '' OR url IS NULL THEN NULL
        WHEN protocol(url) = '' THEN 0  -- Likely malformed
        ELSE port(url)
    END AS safe_port_extraction
FROM (
    SELECT arrayJoin(['', 'not-a-url', 'http://valid.com:8080', NULL]) AS url
);
```

This approach distinguishes between legitimately missing ports and malformed input, which can be valuable for data quality monitoring.

## Performance tips and anti-patterns

When extracting ports from large datasets, the choice of extraction method significantly impacts query performance. ClickHouse{% sup %}®{% /sup %}'s built-in URL functions are optimized for speed and correctness, while alternatives like regular expressions carry performance penalties.

The performance difference becomes especially pronounced when you're processing millions of URLs in a single query.

### Avoid regexpExtract for simple ports

While you might be tempted to use regular expressions for port extraction, the built-in `port()` function offers substantial performance advantages:

```sql
-- Slow: regex-based extraction
SELECT regexpExtract(url, ':(\d+)', 1) AS regex_port FROM large_table;

-- Fast: built-in function
SELECT port(url) AS builtin_port FROM large_table;
```

The performance difference becomes pronounced with large datasets because of several factors:

- **Built-in optimization:** Native URL parsing leverages optimized C++ code paths
- **Error handling:** Better handling of edge cases without regex complexity
- **Type safety:** Returns proper UInt16 values instead of strings

### Materialized view on ingestion

For frequently queried port information, consider pre-computing port values during data ingestion rather than extracting them repeatedly at query time:

```sql
CREATE MATERIALIZED VIEW url_analysis_mv
ENGINE = MergeTree()
ORDER BY timestamp
AS SELECT
    timestamp,
    url,
    port(url) AS extracted_port,
    multiIf(
        port(url) != 0, port(url),
        protocol(url) = 'https', 443,
        protocol(url) = 'http', 80,
        port(url)
    ) AS port_with_default
FROM raw_logs;
```

This pattern works particularly well for analytics workloads where the same URL analysis happens repeatedly across different queries. You pay the extraction cost once during ingestion instead of every time you query the data.

## Building a reusable port-extraction view or API endpoint

Creating reusable components for port extraction with default handling eliminates code duplication and ensures consistent logic across your analytics. You can implement this as a standard ClickHouse{% sup %}®{% /sup %} view or leverage [Tinybird's](https://www.tinybird.co) [pipe](https://www.tinybird.co/docs/forward/work-with-data/pipes) architecture for more flexible parameterization.

The choice between views and pipes often depends on whether you need parameterization and how you plan to expose the logic to other systems.

### Creating a helper view in plain ClickHouse{% sup %}®{% /sup %}

A well-designed view encapsulates the port extraction logic and default handling, making it easy to reuse across different queries:

```sql
CREATE VIEW url_port_extractor AS
SELECT
    url,
    port(url) AS raw_port,
    protocol(url) AS url_protocol,
    multiIf(
        port(url) != 0, port(url),
        protocol(url) = 'https', 443,
        protocol(url) = 'http', 80,
        protocol(url) = 'ftp', 21,
        protocol(url) = 'ssh', 22,
        8080
    ) AS port_with_defaults,
    CASE
        WHEN port(url) = 0 THEN 'default'
        ELSE 'explicit'
    END AS port_type
FROM (
    SELECT column_name AS url FROM your_source_table
);
```

This view provides multiple port-related fields that different queries can use as needed, while centralizing the extraction logic. You can then query this view like any other table, and ClickHouse{% sup %}®{% /sup %} will optimize the underlying operations.

### Defining a Tinybird pipe for the same logic

Tinybird pipes offer more flexibility for parameterized port extraction, allowing you to customize default values based on [query parameters](https://www.tinybird.co/docs/forward/work-with-data/query-parameters):

```tinybird
NODE url_port_analysis
SQL >
    %
    SELECT
        url,
        port(url) AS raw_port,
        protocol(url) AS url_protocol,
        multiIf(
            port(url) != 0, port(url),
            protocol(url) = 'https', {{ Int16(https_default, 443) }},
            protocol(url) = 'http', {{ Int16(http_default, 80) }},
            {{ Int16(fallback_default, 8080) }}
        ) AS port_with_defaults
    FROM {{ String(source_table, 'access_logs') }}
    WHERE url IS NOT NULL
    {\% if defined(protocol_filter) %}
        AND protocol(url) = {{ String(protocol_filter) }}
    {\% end %}
```

This pipe accepts parameters for custom default ports and optional protocol filtering, making it adaptable to different use cases while maintaining the core extraction logic. The parameterization means you can reuse the same pipe for different scenarios without duplicating code.

## Exposing the query as an API endpoint with Tinybird

You can deploy your port extraction logic (plus any additional web analytics queries) into a production-ready [API endpoint](https://www.tinybird.co/docs/forward/work-with-data/publish-data/endpoints) that applications can consume directly. Tinybird abstracts infrastructure management for ClickHouse{% sup %}®{% /sup %}, allowing developers to focus on query logic and API design.

Once you have your pipe working locally, getting it deployed as an API takes just a few commands.

### Deploy and test the endpoint

To create an API endpoint from your pipe, you'll need to define it in a `.pipe` file with the `TYPE ENDPOINT` configuration. First, create your pipe file with the endpoint type:

```tinybird
NODE url_port_analysis
SQL >
    %
    SELECT
        url,
        port(url) AS raw_port,
        protocol(url) AS url_protocol,
        multiIf(
            port(url) != 0, port(url),
            protocol(url) = 'https', {{ Int16(https_default, 443) }},
            protocol(url) = 'http', {{ Int16(http_default, 80) }},
            {{ Int16(fallback_default, 8080) }}
        ) AS port_with_defaults
    FROM {{ String(source_table, 'access_logs') }}
    WHERE url IS NOT NULL
    {\% if defined(protocol_filter) %}
        AND protocol(url) = {{ String(protocol_filter) }}
    {\% end %}

TYPE ENDPOINT
```

Then deploy your pipe to Tinybird:

```bash
# Deploy the pipe as an API endpoint
tb --cloud deploy
```

After deployment, you can test the endpoint with curl to verify it handles different URL formats correctly:

```bash
# Test with various URL types
curl "https://api.tinybird.co/v0/pipes/port_extractor_api.json" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -d "source_table=access_logs&https_default=8443"
```

Tinybird's [managed service for ClickHouse{% sup %}®{% /sup %}](https://www.tinybird.co/clickhouse) reduces the need for manual infrastructure management. You can monitor API usage and performance through Tinybird's built-in observability features.

## FAQs about extracting port numbers in ClickHouse{% sup %}®{% /sup %}

### What value does urlExtractPort return when the URL has no port?

The `port()` function returns 0 (zero) as a UInt16 value when no explicit port is specified in the URL. This differs from returning NULL, which means you can safely use the result in mathematical operations or comparisons without additional null checking.

### Does urlExtractPort support IPv6 literals?

Yes, the function correctly handles IPv6 addresses enclosed in square brackets within URLs. For example, `port('http://[2001:db8::1]:8080/path')` returns 8080, while `port('http://[::1]/local')` returns 0 since no port is specified.

### Can I filter rows using urlExtractPort in a where clause?

Yes, you can use the `port()` function in WHERE clauses, but consider the performance implications for large datasets. For frequently filtered port values, consider creating a materialized column or using a materialized view to pre-compute port information during data ingestion rather than extracting it repeatedly at query time.
