---
title: "ClickHouse cybersecurity logs"
excerpt: "Security logs arrive fast, in volume, and need to be queryable the moment they land. ClickHouse handles the ingestion throughput, stores compressed log data efficiently, and runs correlation queries that would take minutes elsewhere in seconds."
authors: "Tinybird"
categories: "AI Resources"
createdOn: "2026-05-28 00:00:00"
publishedOn: "2026-05-28 00:00:00"
updatedOn: "2026-05-28 00:00:00"
status: "published"
---

Security operations run on log data. Firewall events, authentication attempts, endpoint telemetry, cloud audit trails, all of it needs to be ingested fast, stored for months or years, and queried interactively during investigations. The volume is constant and unforgiving.

Traditional SIEM platforms solve this with expensive proprietary storage and query engines. ClickHouse® solves it with open-source columnar storage optimized for exactly this workload. As described on the [ClickHouse cybersecurity page](https://clickhouse.com/industries/cybersecurity), it can "ingest millions of rows per second, handle the most heavily concurrent workloads, and deliver unbeatable query performance for mission-critical and time-sensitive applications." IBM deploys hundreds of ClickHouse servers across Instana and QRadar. Exabeam uses it for scalable, high-performance search across advanced security analytics.

This post covers the schema patterns, detection queries, log enrichment, and operational setup for using ClickHouse as a security log backend.

## Security event schema

The core table needs to handle high insert volume, diverse log sources, and fast time-range and IP-based lookups:

```sql
CREATE TABLE security_events
(
    event_time      DateTime,
    source_system   LowCardinality(String),  -- 'firewall', 'auth', 'endpoint', 'cloud'
    event_type      LowCardinality(String),  -- 'auth_failure', 'port_scan', 'dns_query'
    severity        LowCardinality(String),  -- 'info', 'warning', 'critical'
    source_ip       IPv4,
    dest_ip         Nullable(IPv4),
    dest_port       Nullable(UInt16),
    username        Nullable(String),
    hostname        Nullable(String),
    raw_message     String,
    bytes           Nullable(UInt64),
    action          LowCardinality(String)   -- 'allow', 'deny', 'alert'
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(event_time)
ORDER BY (source_system, event_type, event_time, source_ip)
TTL event_time + INTERVAL 90 DAY;
```

`LowCardinality` on `source_system`, `event_type`, `severity`, and `action` is important here. These columns have a bounded set of values and are used heavily in WHERE filters and GROUP BY. Dictionary encoding makes filtering and aggregating on them significantly faster with much better compression.

`TTL event_time + INTERVAL 90 DAY` automatically deletes events older than 90 days, keeping storage bounded without manual cleanup jobs. Adjust to your retention requirements, and see the tiered storage section below for keeping older data on cheaper object storage before deletion.

`IPv4` as the native type (not String) enables IP-range functions like `isIPAddressInRange()` and compresses much better than storing IPs as strings. For IPv6, use the `IPv6` type. ClickHouse stores both native IP types in 4 and 16 bytes respectively, vs. up to 45 bytes for a string representation.

## Firewall log analysis

Firewall logs are high-volume and repetitive, which means ClickHouse's columnar compression handles them exceptionally well. Standard threat queries:

```sql
-- Top blocked source IPs, last 24 hours
SELECT
    source_ip,
    count()                     AS blocked_attempts,
    uniq(dest_ip)               AS distinct_targets,
    uniq(dest_port)             AS distinct_ports,
    round(sum(bytes) / 1e6, 2)  AS total_mb
FROM security_events
WHERE source_system = 'firewall'
  AND action = 'deny'
  AND event_time >= now() - INTERVAL 24 HOUR
GROUP BY source_ip
ORDER BY blocked_attempts DESC
LIMIT 20;

-- Port scan detection: single source hitting many ports in 1 hour
SELECT
    source_ip,
    count()           AS connection_attempts,
    uniq(dest_port)   AS unique_ports,
    min(event_time)   AS first_seen,
    max(event_time)   AS last_seen
FROM security_events
WHERE source_system = 'firewall'
  AND event_time >= now() - INTERVAL 1 HOUR
GROUP BY source_ip
HAVING unique_ports > 100
ORDER BY unique_ports DESC;
```

Both queries run in milliseconds against hundreds of millions of firewall events because `source_system` is first in the sort key, and `event_time` follows immediately. ClickHouse prunes partitions to the relevant day and then skips granules outside the time range.

## Authentication event correlation

Brute force detection, account takeover attempts, and credential stuffing all show up in authentication logs as statistical anomalies:

```sql
-- Accounts under brute force: >10 failures from same IP in 5 minutes
SELECT
    source_ip,
    username,
    count()                 AS failed_attempts,
    min(event_time)         AS first_attempt,
    max(event_time)         AS last_attempt
FROM security_events
WHERE event_type = 'auth_failure'
  AND event_time >= now() - INTERVAL 5 MINUTE
GROUP BY source_ip, username
HAVING failed_attempts > 10
ORDER BY failed_attempts DESC;

-- Impossible travel: same user auth from 2+ distinct IPs within 1 hour
SELECT
    username,
    uniq(source_ip)         AS distinct_ips,
    count()                 AS auth_count
FROM security_events
WHERE event_type = 'auth_success'
  AND event_time >= now() - INTERVAL 1 HOUR
GROUP BY username
HAVING distinct_ips > 1;
```

For production use, run these as materialized views that continuously update a summary table. The detection query then reads from the summary rather than scanning raw events on every SOC dashboard refresh.

## Multi-stage attack detection with `windowFunnel`

Security correlation rules detect sequences of suspicious events from the same source. ClickHouse's `windowFunnel` function finds ordered sequences within a time window without requiring a separate correlation engine:

```sql
-- Detect port_scan -> auth_failure -> auth_success attack sequence
SELECT
    source_ip,
    windowFunnel(3600)(
        event_time,
        event_type = 'port_scan',
        event_type = 'auth_failure',
        event_type = 'auth_success'
    ) AS attack_stage
FROM security_events
WHERE event_time >= today()
GROUP BY source_ip
HAVING attack_stage >= 2
ORDER BY attack_stage DESC;
```

`attack_stage = 3` means all three steps were observed in order within 1 hour from the same source IP. `attack_stage = 2` means port scan followed by auth failure, a recon-and-probe pattern. This runs in milliseconds across millions of events, without any graph processing layer or external correlation engine.

## Privilege escalation correlation

Multi-step attacks that involve lateral movement require correlating events across different event types. ClickHouse handles this with a self-join:

```sql
SELECT
    e1.username,
    e1.source_ip,
    e1.event_time   AS escalation_time,
    e2.event_time   AS data_access_time,
    e2.raw_message
FROM security_events AS e1
JOIN security_events AS e2
    ON e1.username = e2.username
    AND e2.event_time BETWEEN e1.event_time
        AND e1.event_time + INTERVAL 10 MINUTE
WHERE e1.event_type = 'privilege_escalation'
  AND e2.event_type = 'sensitive_data_access'
  AND e1.event_time >= now() - INTERVAL 24 HOUR;
```

Keep the right-side table (e2) small by filtering aggressively before the join. For very high-volume correlation, pre-aggregate suspicious events into a summary table and join against that rather than the raw event table.

## Log enrichment with GeoIP

Raw security logs contain IP addresses. Investigations need country, ASN, and threat reputation context. ClickHouse supports dictionary-based lookups for enriching log data at query time without denormalizing the base table:

```sql
-- Define a dictionary backed by a GeoIP CSV
CREATE DICTIONARY geoip_dict (
    ip_prefix IPv4,
    country_code String,
    asn UInt32,
    asn_name String
)
PRIMARY KEY ip_prefix
SOURCE(FILE(path '/data/geoip.csv' format 'CSV'))
LAYOUT(IP_TRIE)
LIFETIME(3600);

-- Enrich at query time
SELECT
    source_ip,
    dictGet('geoip_dict', 'country_code', source_ip) AS country,
    dictGet('geoip_dict', 'asn_name', source_ip)     AS asn,
    count() AS events
FROM security_events
WHERE event_time >= now() - INTERVAL 1 HOUR
  AND action = 'deny'
GROUP BY source_ip, country, asn
ORDER BY events DESC;
```

Dictionaries load once into memory and serve lookups in microseconds, making enrichment at query time essentially free. The same pattern works for threat intel IP reputation lists: load a feed of malicious IPs as a dictionary and filter by whether the source IP appears in it.

## Real-time SIEM dashboard via materialized view

For a security operations dashboard that needs to refresh continuously without scanning raw events on every load:

```sql
CREATE TABLE siem_hourly
(
    hour            DateTime,
    source_system   LowCardinality(String),
    event_type      LowCardinality(String),
    severity        LowCardinality(String),
    events          AggregateFunction(count),
    unique_ips      AggregateFunction(uniq, IPv4)
)
ENGINE = AggregatingMergeTree
ORDER BY (hour, source_system, event_type, severity);

CREATE MATERIALIZED VIEW siem_hourly_mv TO siem_hourly AS
SELECT
    toStartOfHour(event_time)   AS hour,
    source_system,
    event_type,
    severity,
    countState()                AS events,
    uniqState(source_ip)        AS unique_ips
FROM security_events
GROUP BY hour, source_system, event_type, severity;
```

Dashboard queries against `siem_hourly` return instantly. Investigation queries that need raw events hit the base table with appropriate time and IP filters. Both paths stay fast. The materialized view updates automatically as new events arrive, so the dashboard always shows current data without polling the raw table.

## Log retention and tiered storage

Security log storage costs are significant at scale. ClickHouse's columnar compression is well-suited to log data, which is repetitive and highly compressible. Typical security log compression ratios are 5-15x depending on field cardinality. A terabyte of raw firewall logs commonly compresses to 70-150GB in ClickHouse.

Use TTL to tier storage across hot, warm, and cold:

```sql
ALTER TABLE security_events MODIFY TTL
    event_time + INTERVAL 30 DAY TO VOLUME 'warm',
    event_time + INTERVAL 1 YEAR TO VOLUME 'cold',
    event_time + INTERVAL 3 YEAR DELETE;
```

Recent events stay on fast NVMe for interactive investigation. Older events move to object storage for compliance retention. Data past your retention policy is deleted automatically. This three-tier approach typically reduces storage cost by 60-80% compared to keeping all data on fast NVMe indefinitely.

## Connecting to Tinybird

For security teams who want ClickHouse-backed log analytics without running a cluster, Tinybird is managed ClickHouse with a [real-time data ingestion](https://www.tinybird.co/blog/real-time-data-ingestion) API and SQL endpoint publishing.

Ship security events via the Events API directly from your log forwarder:

```bash
curl -X POST "https://api.tinybird.co/v0/events?name=security_events" \
  -H "Authorization: Bearer $TB_TOKEN" \
  -d '{"event_time":"2026-05-28T10:00:00Z","source_system":"firewall","event_type":"deny","severity":"warning","source_ip":"192.168.1.100","dest_port":443}'
```

Publish your threat detection queries as HTTP endpoints your SOC dashboard calls directly, with `severity` and `time_range` as parameters. No SIEM license, no separate query layer, no cluster to maintain.

{% cta
  title="Security log analytics without the SIEM price tag"
  text="Tinybird is managed ClickHouse with real-time ingestion and SQL endpoints. Ingest millions of events per second, query in milliseconds."
  button={href: "https://cloud.tinybird.co/signup", target: "_blank", text: "Try Tinybird free"}
/%}
