The simplest way to count 100B unique IDs: Part 2

How to make a simple counter scale to trillions by using the right count functions paired with pre-aggregations
Scalable Analytics Architecture

Ariel Pérez, Head of Product & Technology

In my last post, I showed how to build a simple, powerful system to count unique viewers, inspired by Reddit's implementation. But what happens when you start hitting real scale (trillions of views)? At this point, the simple solution must evolve.

Let's jump into these scaling challenges and how to address them.

When does the number of views become problematic?

The simple implementation I shared in Part 1 stores view events sorted by post_id, and our counter filters by post_id, meaning the main scaling challenge comes from the number of views per post. Endpoint performance might degrade due to:

  • Too many events to scan per post (billions of rows)
  • Concurrent queries on popular posts
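
As a refresher, the Part 1 counter is essentially a filtered uniqExact over the raw events. A minimal sketch, assuming Part 1's views data source with post_id and viewer_id columns (names are illustrative):

```sql
-- Baseline endpoint from Part 1: exact count over raw view events
SELECT uniqExact(viewer_id) AS unique_viewers
FROM views
WHERE post_id = {{ String(post_id) }}
```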

For example, let's look at some real numbers for posts with different view counts:

  • 10M views = ~57MB of compressed data
  • 100M views = ~565MB of compressed data
  • 1B views = ~5.5GB of compressed data

Even with compression, great indexing, and filtering on post_id, scanning this many views starts adding up:

  • 10M views = ~20 ms
  • 100M views = ~200-400 ms
  • 1B views = ~2-4 seconds

Every 10x increase in views means roughly a 10x increase in query time, since the scan is linear. And that's just for a single query; with multiple users checking view counts simultaneously, these times get worse.

When does uniqExact start showing its limitations?

Scanning millions of views per post isn't even your biggest headache. The real bottleneck happens when you're dealing with large numbers of unique viewers. That's when the uniqExact function starts to crumble. Yes, it gives perfect accuracy, but boy, does it make you pay for it.

The query time is compounded by two factors beyond just scanning rows:

  1. Hash set insertions (scales linearly with unique values)
  2. Memory allocation (also linear, but hits performance cliffs)

As unique viewers increase, the hash set grows and needs to resize more often, causing CPU stalls. But that's just the beginning. The real pain comes when your hash set overflows the L3 cache and spills into RAM. And heaven help you if memory pressure forces swapping to disk. As if that weren't enough, you'll see more hash collisions as your dataset grows, adding yet another tax on performance.

Some real-world numbers

I've seen this so many times with customers. Here's how unique viewer counting actually scales:

Memory Requirements vs. Performance (64-bit viewer_ids)

| Unique Viewers | Memory Usage | Storage Location | Query Time (10% uniqueness) |
|---|---|---|---|
| 1M | ~16MB | CPU L3 cache | 10-20ms (zippy performance) |
| 10M | ~160MB | RAM | ~20-60ms (feeling those cache misses) |
| 100M | ~1.6GB | RAM | ~2-5s (heavy memory access) |
| 1B | ~16GB | RAM + potential swap | ~15-20s (database crying) |

I think of this as three distinct performance zones:

  1. The L3 Cache Zone (<1M uniques), where everything is lightning fast
  2. The Memory Zone (1M-100M uniques), the "it's fine" zone with gradually declining performance
  3. The Danger Zone (>100M uniques), where performance falls off a cliff

Even with decent server hardware (32GB RAM), you'll start feeling real pain at around 500M unique viewers. Queries will take forever (well, more than a second), memory errors will wake you up at 3AM, and infrastructure costs will make your finance team ask uncomfortable questions. And this is all before considering concurrent queries.

Two paths to optimization

Ok, so how do you count billions of unique viewers per post on a table with trillions of views without breaking the bank? 

1. Approximate counting with uniqCombined64

The simplest optimization is switching from uniqExact to uniqCombined64:
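
A sketch of what that swap looks like in the counting endpoint (same assumed views data source and columns as above; only the aggregate function changes):

```sql
-- Same endpoint query as before, with the aggregate function swapped
SELECT
    post_id,
    uniqCombined64(viewer_id) AS unique_viewers
FROM views
WHERE post_id = {{ String(post_id) }}
GROUP BY post_id
```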

I prefer uniqCombined64 over uniqHLL12. It's not just more modern—it's smarter about how it uses memory. The uniqCombined64 function actually switches between three different counting methods based on your data scale:

  • Array mode (for small cardinalities): Uses a simple array when you have few unique values
  • Hash mode (for medium cardinalities): Switches to a sparse hash set as unique values grow
  • HyperLogLog mode (for large cardinalities): Finally moves to full HLL when dealing with massive scale (if you recall, this is what Reddit implemented in Redis to count views efficiently)

This adaptive behavior means you get better accuracy at lower cardinalities without sacrificing the ability to scale to billions of unique values. By contrast, uniqHLL12 is built for large-scale use cases: it uses HyperLogLog from the start, which costs it accuracy on smaller datasets.

Why uniqCombined64 wins in most real-world scenarios

  • Higher accuracy at smaller scales without premature approximation
  • Relatively small error ~0.8%, which is surprisingly good for analytics use
  • Constant memory usage (~80KB per aggregation), keeping RAM consumption predictable
  • Scales efficiently up to billions of unique values with little loss of precision
  • MUCH faster than uniqExact (250ms vs 10s for 1B unique values), avoiding expensive hash set memory overhead

Many teams default to uniqExact for perfect accuracy, only to realize that 99.2% accuracy with uniqCombined64 is more than enough. And they sleep better at night knowing their queries won't OOM the database.
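
If you want to see the tradeoff on your own data, it's a quick comparison (illustrative query; the post_id value is made up):

```sql
-- Measure the approximation error for a single post
SELECT
    uniqExact(viewer_id) AS exact_count,
    uniqCombined64(viewer_id) AS approx_count,
    abs(exact_count - approx_count) / exact_count AS relative_error
FROM views
WHERE post_id = 'post_123'
```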

While this solves the memory problem (and improves query performance as a side effect), we may still have to scan billions of raw views per query.

2. Pre-aggregation with materialized views

When you need exact counts but, more importantly, faster queries, pre-aggregation is your friend:

Materialized view data source:
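
A sketch of the .datasource file for the rollup target, using AggregatingMergeTree to store partial uniqExact states per post and day (all names here are illustrative):

```
SCHEMA >
    `post_id` String,
    `day` Date,
    `unique_viewers` AggregateFunction(uniqExact, String)

ENGINE "AggregatingMergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(day)"
ENGINE_SORTING_KEY "post_id, day"
```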

Materialized view pipe:
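
And a sketch of the pipe that materializes into it, rolling raw views up by post and day:

```
NODE daily_uniques
SQL >
    SELECT
        post_id,
        toDate(timestamp) AS day,
        uniqExactState(viewer_id) AS unique_viewers
    FROM views
    GROUP BY post_id, day

TYPE materialized
DATASOURCE daily_post_uniques
```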

Then your API endpoint becomes:
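
Roughly, a query that merges the pre-aggregated states instead of scanning raw events:

```sql
%
SELECT
    post_id,
    uniqExactMerge(unique_viewers) AS unique_viewers
FROM daily_post_uniques
WHERE post_id = {{ String(post_id) }}
GROUP BY post_id
```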

This approach:

  • Maintains exact counting
  • Drastically reduces query time
  • Uses less memory per query due to only counting daily uniques
  • Updates in real-time
  • Trades off some flexibility in time ranges

Combining approaches for maximum scalability

For truly massive scale, you can mix and match these approaches:

  1. Pre-aggregate uniqExact using a materialized view for common time ranges (daily, monthly). You can find out more on how to create rollups with materialized views in this blog post.
  2. Aggregate viewers with uniqCombined64 at query time for arbitrary ranges.
  3. Pre-aggregate with uniqCombined64 on common time ranges (daily, monthly) and fill in the gaps by aggregating at query time with uniqCombined64 over the raw views.

Here's an example of the combined approach that still keeps it relatively simple:

Materialized view data source:
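
The data source now stores uniqCombined64 states instead of exact ones (again, a sketch with illustrative names):

```
SCHEMA >
    `post_id` String,
    `day` Date,
    `unique_viewers` AggregateFunction(uniqCombined64, String)

ENGINE "AggregatingMergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(day)"
ENGINE_SORTING_KEY "post_id, day"
```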

Materialized view pipe:
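
The materializing pipe has the same shape as before, swapping in uniqCombined64State:

```
NODE daily_uniques_hll
SQL >
    SELECT
        post_id,
        toDate(timestamp) AS day,
        uniqCombined64State(viewer_id) AS unique_viewers
    FROM views
    GROUP BY post_id, day

TYPE materialized
DATASOURCE daily_post_uniques_hll
```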

Endpoint:
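
And the endpoint merges pre-aggregated states for whole days with a query-time state over today's raw views; this works because HLL sketches merge cleanly (a sketch under the same assumed names):

```sql
%
SELECT uniqCombined64Merge(state) AS unique_viewers
FROM
(
    -- Whole days come pre-aggregated from the materialized view
    SELECT unique_viewers AS state
    FROM daily_post_uniques_hll
    WHERE post_id = {{ String(post_id) }} AND day < today()

    UNION ALL

    -- The current, partial day is aggregated from raw views at query time
    SELECT uniqCombined64State(viewer_id) AS state
    FROM views
    WHERE post_id = {{ String(post_id) }} AND toDate(timestamp) = today()
)
```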

When to use what

  1. Start Simple: Begin with uniqExact until you hit performance issues
  2. Quick Fix: Switch to uniqCombined64 when memory becomes a problem
  3. Scale Up: Add pre-aggregation when query performance matters
  4. Go Hybrid: Combine approaches for maximum flexibility

Try it yourself

Want to try these optimizations yourself? Check out the Tinybird documentation to get started.
