---
title: "The simplest way to count 100B unique IDs: Part 2"
excerpt: "How to make a simple counter scale to trillions by using the right count functions paired with pre-aggregations"
authors: "Ariel Perez"
categories: "Scalable Analytics Architecture"
createdOn: "2025-04-10 00:00:00"
publishedOn: "2025-04-11 00:00:00"
updatedOn: "2025-04-24 00:00:00"
status: "published"
---

<p>In my <a href="https://www.tinybird.co/blog-posts/the-simplest-way-to-count-100-billion-unique-ids-part-1"><u>last post</u></a>, I showed how to build a simple, powerful system to count unique viewers, inspired by <a href="https://www.linkedin.com/posts/stanislavkozlovski_kafka-activity-7301266372326518785-DZFw"><u>Reddit's implementation</u></a>. But what happens when you start hitting <em>real</em> scale (trillions of views)? At this point, the simple solution must evolve.</p><p>Let's jump into these scaling challenges and how to address them.</p><h2 id="when-does-the-number-of-views-become-problematic">When does the number of views become problematic?</h2><p>The simple implementation I shared in Part 1 stores view events sorted by <code>post_id</code>, and our counter filters by <code>post_id</code>, meaning the main scaling challenge comes from the number of views per post. Endpoint performance might degrade due to:</p><ul><li>Too many events to scan per post (billions of rows)</li><li>Concurrent queries on popular posts</li></ul><p>For example, let's look at some real numbers for posts with different view counts:</p><ul><li><strong>10M</strong> views = <strong>~57MB</strong> of compressed data</li><li><strong>100M</strong> views = <strong>~565MB</strong> of compressed data</li><li><strong>1B</strong> views = <strong>~5.5GB</strong> of compressed data</li></ul><p>Even with compression, great indexing, and filtering on <code>post_id</code>, scanning this many views starts adding up:</p><ul><li><strong>10M</strong> views = <strong>~20</strong> ms</li><li><strong>100M</strong> views = <strong>~200-400</strong> ms</li><li><strong>1B</strong> views = <strong>~2-4</strong> seconds</li></ul><p>For every <strong>10x</strong> increase in views, you can expect query time to increase by <em>10x</em>. And this is just for a single query. With multiple users checking view counts simultaneously, these times will further increase.</p><h2 id="when-does-uniqexact-start-showing-its-limitations">When does <code>uniqExact</code> start showing its limitations?</h2><p>Scanning millions of views per post isn't even your biggest headache. The real bottleneck happens when you're dealing with large numbers of unique viewers. That's when the <a href="https://www.tinybird.co/docs/sql-reference/functions/aggregate-functions#uniqexact"><u><code>uniqExact</code></u></a> function starts to crumble. Yes, it gives perfect accuracy, but boy, does it make you pay for it.</p><p>The query time is compounded by two factors beyond just scanning rows:</p><ol><li>Hash set insertions (scales linearly with unique values)</li><li>Memory allocation (also linear, but hits performance cliffs)</li></ol><p>As unique viewers increase, the hash set grows and needs to resize more often, causing CPU stalls. But that's just the beginning. The real pain comes when your hash set overflows the L3 cache and spills into RAM. And heaven help you if memory pressure forces swapping to disk. As if that weren't enough, you'll see more hash collisions as your dataset grows, adding yet another tax on performance.</p><h3 id="some-real-world-numbers">Some real-world numbers</h3><p>I've seen this so many times with customers. Here's how unique viewer counting actually scales:</p><p><strong>Memory Requirements vs. Performance (64-bit <code>viewer_ids</code>)</strong></p>
<!--kg-card-begin: html-->
<table>
  <tbody>
    <tr>
        <th>Unique Viewers</th>
        <th>Memory Usage</th>
        <th>Storage Location</th>
        <th>Query Time (10% uniqueness)</th>
    </tr>
    <tr>
        <td>1M</td>
        <td>~16MB</td>
        <td>CPU L3 Cache</td>
        <td>10-20ms (zippy performance)</td>
    </tr>
    <tr>
        <td>10M</td>
        <td>~160MB</td>
        <td>RAM</td>
        <td>~20-60ms (feeling those cache misses)</td>
    </tr>
    <tr>
        <td>100M</td>
        <td>~1.6GB</td>
        <td>RAM</td>
        <td>~2s-5s (heavy memory access)</td>
    </tr>
    <tr>
        <td>1B</td>
        <td>~16GB</td>
        <td>RAM + Potential Swap</td>
        <td>~15-20s (database crying)</td>
    </tr>
    </tbody>
</table>
<!--kg-card-end: html-->
<p>I think of this as three distinct performance zones: </p><ol><li><strong>The L3 Cache Zone (&lt;1M uniques)</strong> where everything's fast as lightning</li><li><strong>The Memory Zone (1M-100M uniques)</strong> which is the "it's fine" zone with gradually declining performance</li><li><strong>The Danger Zone (&gt;100M uniques)</strong> where performance falls off a cliff.</li></ol><p>Even with decent server hardware (32GB RAM), you'll start feeling real pain at around 500M unique viewers. Queries will take forever (well, more than a second), memory errors will wake you up at 3AM, and infrastructure costs make your finance team ask uncomfortable questions. And this is all before considering concurrent queries.</p><h2 id="two-paths-to-optimization">Two paths to optimization</h2><p>Ok, so how do you count billions of unique viewers per post on a table with trillions of views without breaking the bank?&nbsp;</p><h3 id="1-approximate-counting-with-uniqcombined64">1. Approximate counting with <code>uniqCombined64</code></h3><p>The simplest optimization is switching from <code>uniqExact</code> to <a href="https://www.tinybird.co/docs/sql-reference/functions/aggregate-functions#uniqcombined64"><u><code>uniqCombined64</code></u></a>:</p>
<!--kg-card-begin: html-->
<iframe width="100%" src="https://snippets.tinybird.co/XQAAAALsAAAAAAAAAABBKUqGk9nLKvRke-EHmX2ZBOjSvy-ymtTDvCPeQX7Tu8Q5Z08JvjVZ5_19qOHFCuWiseeYWczPopvz-K0uTntZyIofdyzP7ILgu8i0oVjye9BlF-Cgofbc4FkYKJ8usBitKN0uWGG089GLVKvlD_FEZR9mkY0DlAKTrRWA_h9B_elilpB24Ov2VpBzz3K7HQA7EoOEH3Vubav9UCx9flCJqmmFxIdtb-F7QNB8JAan3zQeZJ1xjTl_VOX6LccA/embed"></iframe>
<!--kg-card-end: html-->
<p>I prefer <code>uniqCombined64</code> over <a href="https://www.tinybird.co/docs/sql-reference/functions/aggregate-functions#uniqhll12"><u><code>uniqHLL12</code></u></a>. It's not just more modern—it's smarter about how it uses memory. The <code>uniqCombined64</code> function actually switches between three different counting methods based on your data scale:</p><ul><li><strong>Array mode (for small cardinalities)</strong>: Uses a simple array when you have few unique values</li><li><strong>Hash mode (for medium cardinalities)</strong>: Switches to a sparse hash set as unique values grow</li><li><strong>HyperLogLog mode (for large cardinalities)</strong>: Finally moves to full HLL when dealing with massive scale (if you recall, this is what Reddit implemented in Redis to count views efficiently)</li></ul><p>This adaptive behavior means you get better accuracy at lower cardinalities without sacrificing the ability to scale to billions of unique values. By contrast, <code>uniqHLL12</code> is optimized for large-scale use cases but transitions to HLL sooner, potentially losing some accuracy for smaller datasets.</p><h4 id="why-uniqcombined64-wins-in-most-real-world-scenarios">Why <code>uniqCombined64</code> wins in most real-world scenarios</h4><ul><li>Higher accuracy at smaller scales without premature approximation</li><li>Relatively small error <strong>~0.8%</strong>, which is surprisingly good for analytics use</li><li>Constant memory usage (<strong>~80KB</strong> per aggregation), keeping RAM consumption predictable</li><li>Scales efficiently up to billions of unique values with little loss of precision</li><li>MUCH faster than <code>uniqExact</code> (<strong><em>250</em>ms</strong> vs <strong><em>10</em>s</strong> for <strong>1B</strong> unique values), avoiding expensive hash set memory overhead</li></ul><p>Many teams default to <code>uniqExact</code> for perfect accuracy, only to realize that 99.2% accuracy with <code>uniqCombined64</code> is more than enough.&nbsp; And they sleep better at night knowing their queries won’t OOM the database.</p><p>While this solves the memory problem and query performance improves as a side effect, we might still have the problem of having to scan through billions of views</p><h3 id="2-pre-aggregation-with-materialized-views">2. Pre-aggregation with materialized views</h3><p>When you need exact counts but, more importantly, faster queries, pre-aggregation is your friend:</p><p><strong>Materialized view data source:</strong></p>
<!--kg-card-begin: html-->
<iframe width="100%" src="https://snippets.tinybird.co/XQAAAAJ5AQAAAAAAAABBKUqGk9nLKwwfJIvRf5t2ofE3N5NV0evr38c9jpZVWUtCXKW4DkScz4l4Utc58X4zmIlaaWrTYXUM-sckvnR1xS6e57s6LPr5jleJHOjz325809_jTa8QJtATFPS1hZxpz4tuWZvWm6Ik4TsSVT1lcB4I8b7a5LqVdpftcd823vpS_x1ZaOml3r5W6M-NyddiwsrZKpf7poquEHhl4x0MPBCNCM1CWVunCfSo029U5mmXBawVwuo9lXA5EFxDfXh-Cf260m7J3X_xzPFfgBTm3Ru6em--DjWBNr6B0zRSZsGAMpBvnOMWm2FLeBEZKqKZPcXR7NnGPo1Uurp7h3z7NvQe62-prIsxk5BllDwf5_-HmVQA/embed"></iframe>
<!--kg-card-end: html-->
<p><strong>Materialized view pipe:</strong></p>
<!--kg-card-begin: html-->
<iframe width="100%" src="https://snippets.tinybird.co/XQAAAAK8AQAAAAAAAABBKUqGk9nLKzhUSV5J8wr-r0aHJTEhEpLyNSPy0tOmPhI0DGtLrWF66nrVO825E8h60hw2gm1r0_YW7FwyoMk1CYW9pqnY6BqSMBrHgVAHViUQxAfZbhaAYsb4sDKLgjymZe6hESgONa8GBwed11WuezP9wX1kancQ1YFB8sQey3u_FPeSroazfrnl5Lpu5cg9ZweG0KvdRJbOt6sHuBlbhnb0KsjdsDGsP-WNBnda2XZCp0sU0s7g7kQEBVORJczRE6XfZ5ihB-eGwAKlSGwK5FapXb9hGMuR3P1ctAi-2SHMFq0f5VVAWIwcCclAAkW2Pvet1fCE34k5Zs_D_E-wqdmcgwD-2ExtaxmMLacAmyu5eKxO_4T4qcA/embed"></iframe>
<!--kg-card-end: html-->
<p>Then your API endpoint becomes:</p>
<!--kg-card-begin: html-->
<iframe width="100%" src="https://snippets.tinybird.co/XQAAAAJ-AgAAAAAAAABBKUqGk9nLKwKaJTw0FCN3clWtl6fzAsSmvNKm6xZWrUZzM9MaSgnydrFEPJtiukgDlTFW1kZCpPabIkgWd4d340RLemehzooCLXWSW0WAtDL3O_m2spCGvSY6wrxmWGEmTIEq1pb_qsetK16F-7nCR76DMcf-y3NZczZZnYwBf_HRL34R37PeL8SXw0-EUL6VL0Ihjf02bP_H4uBZn3B3CBYBkQkJhv6eGlM1kyVqbIb5GAS-yon6oOKwaQsOvki9M2qnFZYngc1SPHAP8UD9BpFbg0XEY-TYG1E1oGSA3AJaTVTk-lRKB5Ou_BWr9f12eh6iSeqbN5yZhTjgnk71nlQZYjmAWvUbpwgGXbcs5CDaHTIqaaUJWW_GHtF7p9tQlzBsER-kVmnwCWA0n7VSaV-iOy6j9za1Iwiz8ZAAqgqJ13CIJmkkIq5cFK2al_p2moVPAQ4mTXyzFjcUal13mOyHF0T_98AQ2Q/embed"></iframe>
<!--kg-card-end: html-->
<p>This approach:</p><ul><li>Maintains exact counting</li><li>Drastically reduces query time</li><li>Uses less memory per query due to only counting daily uniques</li><li>Updates in real-time</li><li>Trades off some flexibility in time ranges</li></ul><h2 id="combining-approaches-for-maximum-scalability">Combining approaches for maximum scalability</h2><p>For truly massive scale, you can mix and match these approaches:</p><ol><li>Pre-aggregate <code>uniqExact</code> using a materialized view for common time ranges (daily, monthly). You can find out more on how to <a href="https://www.tinybird.co/blog-posts/roll-up-data-with-materialized-views"><u>create rollups with materialized views in this blog post</u></a>.</li><li>Aggregate viewers with <code>uniqCombined64</code> at query time for arbitrary ranges.</li><li>Pre-aggregate with <code>uniqCombined64</code> on common time ranges (daily, monthly) and fill in the gaps by aggregating at query time with <code>uniqCombined64</code> over the raw views.</li></ol><p>Here's an example of the combined approach that still keeps it relatively simple:</p><p><strong>Materialized view data source:</strong></p>
<!--kg-card-begin: html-->
<iframe width="100%" src="https://snippets.tinybird.co/XQAAAAJ-AQAAAAAAAABBKUqGk9nLKwwfJIvRf5t2ofE3N5NV0evr38c9jpZVWUtCXKW4DkScz4l4Utc58X4zmIlaaWrTYXUM-sckvnR1xS6e57s6LPr5jleJHPWW2VPNjHlClfQMKkXnCQaKkteHZF3xMO1OZR3pJyJAU3hXThqrie-XfPTU0jGbdU1lR2bbW-Yglo-h-AoCdmTD-LOYmbbGnFFzFTLIftHKmrAAxABxMcxhsap5fdcrjqp2eIfvlWF1CVld37B3FnNY3CPPpX180NUPM3ECyccrYEjQUvdc9Yhwo9td7XqKT9KOD4kpd-JMNglH1cnLlv8e2KuQ4RT5n_4v4X0ot8HoSp8gmtFXrXhleDd5brm4DJ9ZYs4GdZv6xAEA/embed"></iframe>
<!--kg-card-end: html-->
<p><strong>Materialized view pipe:</strong></p>
<!--kg-card-begin: html-->
<iframe width="100%" src="https://snippets.tinybird.co/XQAAAAKwAQAAAAAAAABBKUqGk9nLKzhUSV5J8wr-r0aHJTEhEpLyNSPy0tOmPhI0DGtLrWF66nrVO825E8h60hw2gm1r0_YW7FwyoMk1CYW9pqnY6BqSMBrHgVAHViUQxAfZaQfYYsb4sDKLgjymZe6hESgONa8FI_c_aGii0VRrNYcLztW5MzS2nKM2n8Q6HhGtbb7EVxGAb_4uTi2ApEloiM1Pc08gpWemupO8Hu5J4CKi75s40NGTB5I-ZPvp-EehP0pirZV1nSlUylfAGdKeo3bmXZpvlfCwsmbKb98trQDASN9r0S7PY0BlY6ss1CV9yA4lIe2N62FnKvwP-ztMpEuB_sPc70FWGxNGcipS8SjxnuMBpe467RnXsFn0eCehd_G0i_4A2X4/embed"></iframe>
<!--kg-card-end: html-->
<p><strong>Endpoint:</strong></p>
<!--kg-card-begin: html-->
<iframe width="100%" src="https://snippets.tinybird.co/XQAAAAK0DAAAAAAAAABBKUqGk9nLKwRAs-EHosCHNupDSPv2gqf2I3t0L-rgIcVapbAgBS12d1mJLjtPrxL-PiWXgRtVcBUQH8bV8YTh9V8yMw2jC31Y1Nzdqk4s3H6Juo6v7dujiE0kMcwf4wuv5H7FQlI-jo90Np27R7Of3pdTMx8hMNqJyN_sk9CMHpcW5ZFHzGtxvRMQAYddyxMdgrFThZsKs5WJv9rX062bu5DrfMp04Mw7pp1ELpsC3CIFilxKRi1a7xLU1Dh-kzh7XbLwCCue02DoqIV7w-tUoLB6xUbQJRfoSyyHAId424H-NOi306xb3qiPj7n0Yd3MFTeL1k_hvGZogmUewLdofxDXa6n11UF-jhbDELGfn4G9-GkeBqYeMHrL0QZmVGwxyJV4cthp7ZYbjQ3U4-EW0pCoBUFvyYTB46bBoh3tEg195UvMzV5r4Ig3Ekz_AlcZzZFBSJnwXwMCONtx-Bd9a4UAv8wISkvS7DXaAyulqaQgzjh7kU0tx2kJhKWMkWDlXMFm7-jyxlF6Od5nzNb76N9-ucd5UOhAxs109-b_gp27qlsnDuC9f8KsRH_eNVWnzd29hYeGajsg-N1QFPgKjQPiYXeZVwzJq5TUGyJPH2dW8RhESkZssTtRqvTBS1ObXG78iOV8xsgXlio057HpvpZNxae_wzjBnlU6sbrz33RoYHa6sOfKClwzcmc91LH43M8_oloGRArihkP7N7v71LZxLfPAffuiAgoEu76qrBcF8gF3-XEUd_vqUFA4ZfQ9eL3agQ1OcCiV21tTa43QWn3S5MokK-TyeKIY5gOEPE4gU03S-0b3IhKAkK1ViIS3NpFS-TDDkb24QH4UWVPpxl_exDCBjP3-5iMP7xO4gaEZbdmXkecBCkovJluEBLJampzgpk7iGRhAhq4DtbRFMliwDqVHJv_qoVwe/embed"></iframe>
<!--kg-card-end: html-->
<h2 id="when-to-use-what">When to use what</h2><ol><li><strong>Start Simple</strong>: Begin with <code>uniqExact</code> until you hit performance issues</li><li><strong>Quick Fix</strong>: Switch to <code>uniqCombined64</code> when memory becomes a problem</li><li><strong>Scale Up</strong>: Add pre-aggregation when query performance matters</li><li><strong>Go Hybrid</strong>: Combine approaches for maximum flexibility</li></ol><h2 id="try-it-yourself">Try it yourself</h2><p>Want to try these optimizations yourself? Check out the <a href="https://www.tinybird.co/docs/forward"><u>Tinybird documentation</u></a> to get started.</p>
