Financial applications live and die on data latency. A fraud signal that arrives 10 seconds after a transaction clears is a missed catch. A risk score that takes 2 seconds to compute is a UX problem on a payment flow. A compliance query that takes 20 minutes to run against 3 years of transaction history is a regulatory risk.
ClickHouse® is built for exactly these workloads: high-throughput ingestion, sub-second queries over billions of rows, and time-series analysis at the scale financial data generates. Neobanks, payments processors, and trading platforms use it as their analytical backbone because it handles both the volume and the latency requirements that traditional warehouses can't meet.
This post covers the schema patterns, fraud detection queries, compliance analytics, payment metrics, and risk scoring patterns that matter for fintech applications.
Transaction event schema
Financial event data is append-only by design. Transactions don't change after they settle, they produce new events (authorizations, settlements, chargebacks, reversals). ClickHouse's MergeTree engine is the right fit:
CREATE TABLE transactions
(
event_time DateTime64(3),
event_type LowCardinality(String), -- 'auth', 'capture', 'refund', 'chargeback'
transaction_id String,
user_id String,
merchant_id String,
merchant_category LowCardinality(String), -- MCC code or category
amount Decimal(18, 4),
currency LowCardinality(String),
status LowCardinality(String), -- 'approved', 'declined', 'pending'
decline_reason LowCardinality(Nullable(String)),
country LowCardinality(String),
device_fingerprint String,
ip_address IPv4,
risk_score Nullable(Float32)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (user_id, event_time, event_type);
Partition by month keeps historical queries fast even across years of data. The sort key puts user_id first because most fraud and risk queries filter by user, then time. For merchant-level analytics, a second table sorted by (merchant_id, event_time) avoids full scans.
Fraud detection queries
Fraud detection in ClickHouse combines velocity checks, behavioral anomaly detection, and sequence analysis. All of these run in-database at query time, fast enough to use as signals in a real-time decisioning pipeline.
Card velocity checks: flag users with unusual transaction volume:
SELECT
user_id,
count() AS tx_count_1h,
sum(amount) AS total_amount_1h,
uniq(merchant_id) AS distinct_merchants,
uniq(ip_address) AS distinct_ips,
uniq(country) AS distinct_countries
FROM transactions
WHERE event_type = 'auth'
AND event_time >= now() - INTERVAL 1 HOUR
GROUP BY user_id
HAVING tx_count_1h > 10
OR distinct_countries > 2
OR distinct_ips > 5
ORDER BY tx_count_1h DESC;
Card testing detection: small-amount transactions probing for valid cards before a large purchase:
SELECT
device_fingerprint,
count() AS probe_count,
countIf(amount < 1.00) AS micro_transactions,
countIf(status = 'declined') AS declines,
round(
countIf(status = 'declined') / count() * 100, 1
) AS decline_rate,
sum(amount) AS total_amount
FROM transactions
WHERE event_time >= now() - INTERVAL 30 MINUTE
GROUP BY device_fingerprint
HAVING micro_transactions > 3
OR decline_rate > 50
ORDER BY decline_rate DESC;
Multi-step fraud sequence with windowFunnel. For real-time fraud detection, detecting attack sequences (account takeover attempt, then profile change, then large transfer) in one query:
SELECT
user_id,
windowFunnel(3600)(
event_time,
event_type = 'login_new_device',
event_type = 'profile_change',
event_type = 'large_transfer'
) AS attack_stage
FROM transactions
WHERE event_time >= today()
GROUP BY user_id
HAVING attack_stage >= 2
ORDER BY attack_stage DESC;
attack_stage = 3 is a high-confidence account takeover signal. Stage 2 (new device + profile change) is worth a step-up authentication challenge.
Real-time anomaly detection
For real-time anomaly detection on payment flows, a Z-score approach flags transactions that deviate significantly from a user's historical behavior:
WITH user_stats AS (
SELECT
user_id,
avg(amount) AS avg_amount,
stddevPop(amount) AS stddev_amount
FROM transactions
WHERE event_time >= now() - INTERVAL 90 DAY
AND event_type = 'auth'
GROUP BY user_id
)
SELECT
t.user_id,
t.transaction_id,
t.amount,
s.avg_amount,
round((t.amount - s.avg_amount) / nullIf(s.stddev_amount, 0), 2) AS z_score
FROM transactions AS t
INNER JOIN user_stats AS s USING (user_id)
WHERE t.event_time >= now() - INTERVAL 1 HOUR
AND t.event_type = 'auth'
AND abs((t.amount - s.avg_amount) / nullIf(s.stddev_amount, 0)) > 3
ORDER BY abs(z_score) DESC;
A Z-score above 3 means the transaction amount is more than 3 standard deviations above the user's 90-day average, a meaningful outlier. Combine with velocity and sequence signals for a composite risk score.
Payment analytics
For payment operations teams monitoring authorization rates, latency, and decline reasons in real time:
SELECT
toStartOfMinute(event_time) AS minute,
merchant_category,
count() AS total_auths,
countIf(status = 'approved') AS approved,
countIf(status = 'declined') AS declined,
round(countIf(status = 'approved') / count() * 100, 2) AS auth_rate_pct,
avg(risk_score) AS avg_risk_score
FROM transactions
WHERE event_type = 'auth'
AND event_time >= now() - INTERVAL 1 HOUR
GROUP BY minute, merchant_category
ORDER BY minute DESC, total_auths DESC;
For decline reason breakdown, which informs issuer negotiation and routing decisions:
SELECT
decline_reason,
count() AS count,
round(count() / sum(count()) OVER () * 100, 1) AS pct_of_declines,
sum(amount) AS declined_volume
FROM transactions
WHERE status = 'declined'
AND event_time >= today()
GROUP BY decline_reason
ORDER BY count DESC;
Compliance and audit queries
Compliance reporting requires querying full transaction history, often years of it, under time pressure. ClickHouse's columnar storage handles this well because compliance queries typically access a small subset of columns (timestamp, amount, counterparty) across a wide time range.
AML transaction pattern detection: frequent small transactions to the same counterparty (structuring):
SELECT
user_id,
merchant_id,
count() AS tx_count_30d,
sum(amount) AS total_amount_30d,
avg(amount) AS avg_amount,
max(amount) AS max_single_tx
FROM transactions
WHERE event_time >= today() - 30
AND event_type = 'capture'
GROUP BY user_id, merchant_id
HAVING tx_count_30d >= 10
AND total_amount_30d >= 9000
AND max_single_tx < 1000
ORDER BY total_amount_30d DESC;
The HAVING clause targets structuring behavior: many small transactions summing to just under reporting thresholds.
GDPR data subject access request: all events for a given user across the full history:
SELECT
event_time,
event_type,
transaction_id,
merchant_id,
amount,
currency,
status
FROM transactions
WHERE user_id = {{ String(user_id, required=True) }}
ORDER BY event_time DESC;
Because user_id is first in the sort key, this query reads only the granules for that user regardless of table size. A user with 5 years of transaction history returns in milliseconds even when the table contains billions of rows.
Risk scoring at scale
For portfolio-level risk analysis, ClickHouse computes aggregate risk metrics across millions of accounts in one pass. See real-time data processing for the ingestion patterns that keep this data fresh:
SELECT
toStartOfMonth(event_time) AS month,
merchant_category,
count() AS total_transactions,
sum(amount) AS total_volume,
countIf(status = 'chargeback') AS chargebacks,
round(
sumIf(amount, status = 'chargeback') /
sum(amount) * 100, 4
) AS chargeback_rate_pct,
avg(risk_score) AS avg_risk_score,
quantile(0.95)(risk_score) AS p95_risk_score
FROM transactions
WHERE event_time >= today() - 365
GROUP BY month, merchant_category
ORDER BY month DESC, total_volume DESC;
For real-time risk exposure on open authorizations, a materialized view computes per-merchant exposure continuously:
CREATE MATERIALIZED VIEW merchant_exposure_mv TO merchant_exposure AS
SELECT
toStartOfHour(event_time) AS hour,
merchant_id,
merchant_category,
sumIf(amount, status = 'approved') AS approved_volume,
countIf(status = 'approved') AS approved_count,
sumIf(amount, status = 'chargeback') AS chargeback_volume
FROM transactions
GROUP BY hour, merchant_id, merchant_category;
Data retention and compliance storage
Financial data has regulatory retention requirements that vary by jurisdiction: PSD2 mandates 5 years, SOX requires 7, and some AML regulations require 10 years or more. ClickHouse's TTL rules handle multi-tier retention automatically, keeping hot data on fast storage and cold data on cheap object storage:
ALTER TABLE transactions MODIFY TTL
event_time + INTERVAL 90 DAY TO VOLUME 'warm', -- move to S3-backed warm tier
event_time + INTERVAL 7 YEAR DELETE; -- delete after retention window
For GDPR right-to-erasure requests, ClickHouse supports selective deletion by partition. If your partition key includes user ID (or you use a separate per-user partitioning table), individual user data can be dropped without rebuilding the full table:
-- Drop the partition containing this user's data
-- Works when partitioned by (user_id, toYYYYMM(event_time))
ALTER TABLE transactions DROP PARTITION ('user_123', 202401);
For tables not partitioned by user ID, use lightweight deletes (ClickHouse 22.8+):
DELETE FROM transactions WHERE user_id = 'user_123';
Lightweight deletes mark rows as deleted without immediately removing them from disk. Background cleanup happens asynchronously during merges, so the deletion is instantaneous from the query perspective but storage is reclaimed over time.
ClickHouse's columnar storage compresses data significantly. LZ4 is the default codec, optimized for decompression speed. For long-retention compliance data where storage cost matters over read speed, ZSTD reduces storage further at the cost of slightly higher CPU on reads. In ClickHouse's own benchmark with a 1B-row weather dataset, the default ZSTD(1) configuration in ClickHouse Cloud achieved 32x compression across the full schema:
CREATE TABLE transactions_archive (...)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY (user_id, event_time)
SETTINGS storage_policy = 'cold_storage'
CODEC(ZSTD(3));
Tinybird for fintech
Building enterprise-grade real-time analytics for financial applications means two things simultaneously: the latency that compliance and fraud detection demand, and the reliability and certification posture that financial regulators and enterprise buyers require.
Tinybird is managed ClickHouse and is SOC 2 Type II certified. The fraud detection queries in this post become parameterized HTTP endpoints your decisioning service calls via build real-time APIs on ClickHouse. Resend, an email platform processing 100TB per month on Tinybird, measured 62ms p90 query latency in production without caching.
The ingestion path for fintech workloads: your payment processor emits events to Kafka, the Kafka to ClickHouse connector consumes the topic and writes to Tinybird datasources, and transactions are queryable within seconds of settlement. No custom consumer code, no batching logic to maintain, no separate stream processing layer.
On the serving side, each fraud or risk query becomes a Tinybird Pipe, published as a JSON endpoint with {{ String(user_id, required=True) }} parameter enforcement. Your decisioning service calls the endpoint with the user ID and gets the velocity check, anomaly score, or sequence analysis result in under 100ms, fast enough to block a transaction before it clears. The SQL enforces that each request only sees data for the requested user, with no application-level row filtering required.
For compliance queries, the same pattern applies: write the AML structuring query once as a Pipe, expose it as an endpoint your compliance team's tooling calls, and the results are always current without manual report generation. Tinybird handles cluster scaling, replication, rolling upgrades, and retention policy execution. Your team writes SQL and ships.
