---
title: "Outgrowing Postgres: How to run OLAP workloads on Postgres"
excerpt: "A deep dive into running analytics on Postgres, from basic optimizations to advanced techniques and knowing when to quit."
authors: "Ariel Perez"
categories: "The Data Base"
createdOn: "2025-01-31 00:00:00"
publishedOn: "2025-01-31 00:00:00"
updatedOn: "2025-04-24 00:00:00"
status: "published"
---

<p>I used to think running large-scale analytics workloads on Postgres was like trying to use a screwdriver as a hammer - technically possible but probably not a great idea. But after years of running both OLTP and OLAP workloads on Postgres at scale, I've learned it's much more nuanced than that. Postgres can actually handle analytics workloads quite well, if you know its limits and how to work within them.</p><p>In my previous articles, we looked at <a href="https://www.tinybird.co/blog-posts/outgrowinghandling-growing-data-volumes" rel="noreferrer">handling growing data volumes</a> and <a href="https://www.tinybird.co/blog-posts/outgrowing-postgres-handling-increased-user-concurrency" rel="noreferrer">increasing user concurrency</a>. Analytics workloads add another dimension of complexity - instead of simple lookups and CRUD operations, you're now dealing with complex aggregations across millions or billions of rows, joins across multiple large tables, and queries that need to scan huge chunks of your data.</p><p>The good news? With the right approach, Postgres actually can handle many analytics use cases, from basic reporting to complex business intelligence queries. I've seen it work well for datasets up to several terabytes, providing query response times from seconds to minutes. The bad news? You'll need to make some specific optimizations and architectural choices to get there, and it does require a lot of oversight, planning, and investment. </p><h2 id="understanding-olap-vs-oltp-query-patterns"><strong>Understanding OLAP vs OLTP query patterns</strong></h2><p>Before we get into specific optimizations, let's understand what makes analytics queries fundamentally different from the transactional queries we've discussed in previous articles. Having run both types at scale, I can tell you the differences go way beyond just query complexity.</p><p>Here's what we're dealing with:</p>
<!--kg-card-begin: html-->
<table style="border:none;border-collapse:collapse;"><colgroup><col width="140"><col width="240"><col width="240"></colgroup><tbody><tr style="height:0pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;background-color:#cccccc;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Dimension</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;background-color:#cccccc;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">OLTP Workloads</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;background-color:#cccccc;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">OLAP Workloads</span></p></td></tr><tr style="height:0pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;background-color:#efefef;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Query Type</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Single-record lookups and updates Example: Fetching a user's profile</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Large dataset analysis</span><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;"><br></span><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Example: Monthly revenue across all customers</span></p></td></tr><tr style="height:0pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;background-color:#efefef;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Query Frequency</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">High (100s-1000s/second)</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Lower, unless user-facing dashboards</span></p></td></tr><tr style="height:0pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;background-color:#efefef;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Query Complexity</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Simple, few joins</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Complex, multiple joins and aggregations</span></p></td></tr><tr style="height:0pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;background-color:#efefef;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Performance Needs</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Milliseconds or bust</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Seconds OK for batch&lt;br&gt;Must match OLTP speed for user-facing</span></p></td></tr><tr style="height:0pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;background-color:#efefef;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Resource Pattern</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">CPU-bound</span><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;"><br></span><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Heavy index usage</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">I/O-bound</span><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;"><br></span><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Large sequential scans</span></p></td></tr><tr style="height:0pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;background-color:#efefef;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Data Access</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">&lt; 1% of rows per query</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Often &gt; 10% of rows per query</span></p></td></tr><tr style="height:0pt"><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;background-color:#efefef;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Index Usage</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Relies heavily on index seeks</span></p></td><td style="border-left:solid #000000 1pt;border-right:solid #000000 1pt;border-bottom:solid #000000 1pt;border-top:solid #000000 1pt;vertical-align:top;padding:4.968pt 4.968pt 4.968pt 4.968pt;overflow:hidden;overflow-wrap:break-word;"><p dir="ltr" style="line-height:1.2;text-align: center;margin-top:12pt;margin-bottom:12pt;"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Often bypasses indexes for full scans</span></p></td></tr></tbody></table>
<!--kg-card-end: html-->
<p>I think this table neatly summarizes why we need different strategies for OLAP workloads in Postgres. When your analytics queries start competing with your OLTP workloads for resources, you'll need to get creative with your approach - which is exactly what we'll cover next.</p><h2 id="postgres-olap-optimization-strategies"><strong>Postgres OLAP optimization strategies</strong></h2><h3 id="separate-your-workloads">Separate your workloads</h3><p>The first rule of running analytics on Postgres: <strong>don't run analytics queries on your primary OLTP database</strong>. This isn't just best practice - it's survival. One complex analytics query scanning millions of rows can bring your entire application to a halt. I've seen it happen, and the incident reviews aren't fun.</p><p>Instead, set up a dedicated analytics replica. But not just any replica - one specifically configured for analytical workloads. Here's what this gets you:</p><ul><li>Complete isolation from production OLTP traffic</li><li>Freedom to tune and optimize specifically for analytics</li><li>Ability to scale your analytics infrastructure independently</li><li>Protection against runaway queries impacting production</li><li>Option to add more read replicas as analytics needs grow</li></ul><p>Here's a basic setup using streaming replication:</p><ol><li>Configure the primary and replicas by updating the postgresql.conf:</li></ol>
<!--kg-card-begin: html-->
<iframe width="100%" src="https://snippets.tinybird.co/XQAAAALbAQAAAAAAAABBKUqGk9nLKzhaH5laPRthtcoi1E2w8ORBlKFHNTY30T4rs1ZXedqqgFe1oe55LYaw-umlXRK2I6o63En8fLYuy4dZXvT9C_7xgrQfgzEDcmIg3brh1-jxzaRtgQPC_DP_6sZAWdy_jmjcYjmbWBLSY_cwkskZvtRQxErG4i9pYHukZiMbuhnDj_XbbJ94tdVbauEWuXJq6O35-SUSOIPFVxFjM_Xd4ruz22AYd7ek9MjG25t-b1hBz88j2STLTtV50CvnGYJCBs11pGIPkwX16P2ev-_rNEmLQkOBcravgkYFgfvn2vHSklHMz2b20dZYREdvUiW5hiVjL6sjuCgnzof1mHCFkbis5nuJ7V3l2OXAPJV0aR4UTJFz38WGGwLtqOHhnazbyjabqzuN360oswbfFLVwyVn5UbMGDUiLNUfP2eLazv71U7Xv/embed"></iframe>
<!--kg-card-end: html-->
<ol start="2"><li>Update your application to point to the analytics replicas where appropriate.</li></ol><p>Notice the differences from the suggested settings in the “<a href="https://www.tinybird.co/blog-posts/outgrowing-postgres-handling-increased-user-concurrency"><u>Outgrowing Postgres: Handling increased user concurrency</u></a>” article. The key difference is that you’ll want to optimize these replicas for analytics workloads. Skip to the <strong>Performance tuning for analytics</strong> section below for more find-tuned settings for your replicas.&nbsp;</p><p>Key things you’ll want to watch out for:</p><ul><li><strong>Replication lag</strong>: Analytics queries can cause the replica to fall behind</li><li><strong>Resource contention</strong>: Even on a replica, poorly optimized queries can cause problems</li><li><strong>Data freshness</strong>: Your analytics will be slightly behind production due to replication</li></ul><p>A common question I get is "How much replication lag is acceptable?" For most analytics workloads, a few seconds or even minutes of lag is fine. But if you need real-time analytics, you'll need to either accept higher costs from a more powerful replica setup or start looking at specialized analytics solutions.</p><h3 id="design-your-schema-for-analytics">Design your schema for analytics</h3><p>Your schema design can make or break analytics performance. Let's look at several techniques that can dramatically improve query speed, starting with the basics and moving to more advanced approaches.</p><h4 id="materialized-views-for-pre-aggregation">Materialized views for pre-aggregation</h4><p>If you've read my first article in this series on handling growing data volumes, you're already familiar with materialized views. They're particularly valuable for analytics workloads because they let you pre-calculate expensive aggregations:</p>
<!--kg-card-begin: html-->
<iframe width="100%" height="250" src="https://snippets.tinybird.co/XQAAAAI9AQAAAAAAAABBKUqGk9nLKwdAX2g-jZn2gsY7xbBt0MO931CQ-Z6r_qtT-HteF-iNB6XZZj5IEtM4J9OAk3VH5e3dzzkxe5QSYyc6MZqG4BgyTI-MbTz2NJ49AqlYSxCrGckm73LT_4_bpYXhEsGvEzXGbEjyXn2Tt32XxZ0di7TwPHFbSoMXuLNa_73WY4RgNIEw96Q79iIeI3-RBrZZTNUnl6GOlmMZcJLbPUThbLgoMWSrHn9xZ-GiimfISc_JbIrpyqRim7AK-rYF5dcGugkz_0YSd6AfnlamBqT_epnjCDGFGNfaKetF55Z5wVH8Gqd51lkKo-AFKxdD9fsfTwg/embed"></iframe>
<!--kg-card-end: html-->
<p>For a more thorough treatment of materialized views, including concurrent refreshes and maintenance strategies, check out the <a href="https://www.tinybird.co/blog-posts/outgrowinghandling-growing-data-volumes#cascadinghierarchical-materialized-views" rel="noreferrer">materialized views section</a> of that article.</p><h4 id="table-partitioning">Table partitioning</h4><p>Partitioning is another fundamental technique we covered in the data volumes article. For analytics workloads, time-based partitioning is particularly effective:</p>
<!--kg-card-begin: html-->
<iframe width="100%" height="200" src="https://snippets.tinybird.co/XQAAAALSAAAAAAAAAABBKUqGk9nLKvqe5a9QJjhkk7XL0a_bPUaLvKxqW9ch4LqTp4it2Dn5etkx7E32IOWl924OuHipsw7Ol5_eCt591QJtGUzTJ2vhrbiGbZj6jd8IAujvl-ghvMpWA7N6zNZ_zpEOh01LKYcdB7Wrka8_Oik0hwJ2IRH-eijkuqOTSES9lGUukyaVhvzho0qdJwNib60WSHUhyo0TGVxZ5S0CA6EZhi0SN-KrAnZf_3gEAAA/embed"></iframe>
<!--kg-card-end: html-->
<p>Take a look at the <a href="https://www.tinybird.co/blog-posts/outgrowinghandling-growing-data-volumes#table-partitioning"><u>table partitioning section</u></a> of that article for detailed partitioning strategies and maintenance approaches.</p><h4 id="vertical-partitioning-for-analytics">Vertical partitioning for analytics</h4><p>Here's where things get interesting for analytics workloads. Vertical partitioning - splitting tables by columns rather than rows - can significantly improve analytics performance. While we touched on this in that same data volumes article, it deserves special attention for analytics.</p><p>Consider a wide events table:</p>
<!--kg-card-begin: html-->
<iframe width="100%" height="300" src="https://snippets.tinybird.co/XQAAAAJwAQAAAAAAAABBKUqGk9nLKvkLf2NaXiRAkWjQ66WzjHdc51Rsi1aZ_v4qL_ft0MrugYwai-Wft7Oex7NjiynEGwg3Qg2yrNg5ysFqWvJoeCFT-VIGoQ8dTifOUbH7q39jtrLzGNRUapBE69tCx0QL_X__xEjCjx16r4uKkqIARuhOrzFxNCiiAt9vavVfumihobAwI-Txv4Wm9qNLnd8t0Nz5UFgg3zCM3agXUvtGSwTtCeeLp5g3fjQgbKJ0AWr4KCXtav2XVfVO2wMfLZ-2YUN9vX1EaNCYdNCiTkIyX-a_Labj6Tvt84DwpicVCyIXZcE0qkSueSFEmeb9iFVM/embed"></iframe>
<!--kg-card-end: html-->
<p>You can split this into separate tables based on query patterns:</p>
<!--kg-card-begin: html-->
<iframe width="100%" height="450" src="https://snippets.tinybird.co/XQAAAALsAQAAAAAAAABBKUqGk9nLKzhQCDFLbLGg8e4GA5hInOQXuGcPJOkwQZoYe7qJr14WWWH09saA9SFPAuIy68RRArHseTJ-0iZflJnaP7jSVTId1x-NQt9Wy-g8Kn90zp298j0QzdK77331cCrl0_0mXYM6kyLoIXcxqbITihUTw4gTfjq35J1mo3_9qZh7zGKgDpVi_jOJ_6IyAhBnTW03g9DM-CIaElDeyBHg4L8UUTaTJMtcCO-h86oOJI20FgI7X3481MRb0IbIb3gjMHS4LnAq6i-aHCGtM_zXmto-u3IWMBXNNsL2Hxzhp-lmyzJaAMcL8cO3axvthvFbyd_zjWlYnVtmqqgPCf11y_-M5h8A/embed"></iframe>
<!--kg-card-end: html-->
<p>This approach has several benefits for analytics:</p><ul><li>Queries only need to scan the columns they need</li><li>Better cache utilization for frequently accessed columns</li><li>More efficient compression for similar data types</li><li>Ability to use different storage strategies per table</li></ul><p>But - and this is a big but - vertical partitioning isn't free. Here's what you're signing up for:</p><p><strong>Write complexity</strong></p><ul><li>Every insert now requires multiple queries</li><li>Transactions need to span multiple tables</li><li>More complex error handling and rollback scenarios</li><li>Higher chance of constraint violations</li><li>Need for careful transaction management to maintain data consistency</li></ul>
<!--kg-card-begin: html-->
<iframe width="100%" height="300" src="https://snippets.tinybird.co/XQAAAAKMAQAAAAAAAABBKUqGk9nLKzhdgL7Vs5Dz7iSSVKzzBujeo6O4-YXLe4rHc_xZLB1u3LbeTjdhs4L9Fr9ZQVBilC2V0AZFqF1CTnFusCYFE00nDpNEQguv4Q453gjarHhKbxEANWwJcIbmkRHE-Aay8otgQgaTrpIhpCw4Ih1eXBFfrMiuQTDKfEJfoMXG3Bewj5ODfnhnk9PmGBAZKzF6ae8ZTfN9jEBUSTaeku3O_s1IKVEUj2Uc8_d_aZVLB_EqgKUcirJmrtGEjYo5m8wKT4eBhSumcArTi6wGvy3w-YBZvJSfhW5pUsynGw-HSQDXlgSOZMoHOMNPm76384ghGlBK_FvXAaaROFcOCsObQ_zeEcr_r5WwIA/embed"></iframe>
<!--kg-card-end: html-->
<p><strong>Read complexity</strong></p><ul><li>If you need to read columns across the partitions, simple queries become joins</li><li>Need to maintain indexes across multiple tables</li><li>More complex query planning</li><li>Potential for performance cliffs if joins aren't optimized</li></ul>
<!--kg-card-begin: html-->
<iframe width="100%" height="300" src="https://snippets.tinybird.co/XQAAAAKMAQAAAAAAAABBKUqGk9nLKzhaHqTnZzVPrWoSyLinMm1ZCbMiwKt5Ilwdrz4y9FhW028-1bQ3-IRxpLv2VYkBH10mqVT6wIr6fqOC-1V6AxonHc2I2R-1mF8386n2sTxeAjyoQa81sc7--Jy2aPNRrA0pGs7oodALNW44Cz88KBIVRdeTjvqzrym4AU4JOWuoMfmNoQ3g2mPaoW3mNjYvd25bREtXK4gvp9RwuCzh1oeRmRM3X6rrDNHdEdmmv3OH3hRHfTdvzGDfmuhi0LpjzKxDbStLM0TZ_i6kJGPW6SwYwg5yGUA0Ijw03GSz4JbDs3EcBx4Nb-tgJjl7O1idJ0YdSMCW-H9I7xxXohR9QZgOlmJ0d11ggYTCAPeJXONro_9svkgA/embed"></iframe>
<!--kg-card-end: html-->
<p><strong>Maintenance headaches</strong></p><ul><li>More tables to manage and monitor</li><li>Complex cleanup for orphaned records</li><li>Harder to modify schema (changes need to be coordinated)</li><li>More complex backup and restore procedures</li><li>Additional foreign key constraints to maintain</li></ul><p>When should you actually use vertical partitioning? In my experience, it works best when:</p><ol><li>Your analytical queries have clear patterns (some columns always used together)</li><li>You have very wide tables where most queries use a small subset of columns</li><li>You have different access patterns for different column groups</li><li>Your write volume is moderate (and you can handle the extra complexity)</li><li>You have columns with very different storage needs (like text vs numeric data)</li></ol><p>If you're just starting out with analytics on Postgres, I'd recommend exhausting simpler options first. Materialized views and regular partitioning can take you quite far with less complexity. But when you're hitting scan performance issues and you see clear patterns in how columns are accessed, vertical partitioning might be worth the trade-offs.</p><h3 id="columnar-storage-with-postgres-plugins">Columnar storage with Postgres plugins</h3><p>Postgres has a powerful extension system that lets you add new capabilities to your database. Think of extensions like browser plugins - they extend functionality but run with the same privileges as Postgres itself. That's both powerful and potentially risky.</p><h4 id="understanding-postgres-extensions">Understanding Postgres extensions</h4><p>Extensions are shared libraries that can:</p><ul><li>Add new data types and functions</li><li>Create new table access methods</li><li>Implement new index types</li><li>Add foreign data wrappers</li></ul><p>They're typically written in C and need to be compiled against your specific Postgres version. While the official Postgres extension repository (PGXN) has some quality controls, you should still evaluate extensions carefully - they run with full database privileges.</p><h4 id="columnar-storage-options">Columnar storage options</h4><p>For analytics workloads, the following columnar storage extensions stand out. However, teams eventually considering dedicated OLAP databases should also evaluate alternatives like ClickHouse®; see our <a href="https://www.tinybird.co/blog-posts/clickhouse-vs-postgresql-with-extensions">detailed comparison of ClickHouse® vs PostgreSQL with extensions</a> to understand performance trade-offs and migration strategies.</p><p><a href="https://github.com/citusdata/cstore_fdw"><strong><u>cstore_fdw</u></strong></a></p><p>The battle-tested option, developed by <a href="https://www.citusdata.com/"><u>Citus Data</u></a> (now part of Microsoft):</p>
<!--kg-card-begin: html-->
<iframe width="100%" height="275" src="https://snippets.tinybird.co/XQAAAAJSAQAAAAAAAABBKUqGk9nLKw8-CqMbwKet5cHOWzs7g0khQZAavTw_U2Zc-30olRfnP5IPLewYIjx6HrrkS1w7mKJjoPsE2ZqI22aPX26eNmFSjBg_S_smvTcz5nDYmREVePbuvqXbuB9rGF3oWy8l74DK19Yn9-vsBBFpRW9RXlSR6THBR3XMpEyihWoo05aj82CJsNHtl6wrtUXS86DZVb7i0FbquV30XA7cPPBiEhBU5aXs7d1fT1j1_f0u4QT-vizzu5pjUon_i7Cd05_q1nXcD4Zk6iAp_7W41roZ6wTwrEEEl5zwcune3kb67Infw2HrUfNDXJ4L2erLe67_yJVWqA/embed"></iframe>
<!--kg-card-end: html-->
<p><strong>Pros</strong>:</p><ul><li>Mature and stable</li><li>Used in production by many companies</li><li>Good documentation and community support</li><li>Solid compression ratios</li><li>Supports both read and write operations</li><li>Works with standard Postgres backup tools</li></ul><p><strong>Cons</strong>:</p><ul><li>No support for indexes</li><li>Limited to PostgreSQL 12 and older in the open-source version</li><li>Writes are slower than regular tables</li><li>No concurrent write support</li><li>Limited query optimization capabilities</li></ul><p><a href="https://github.com/paradedb/pg_analytics"><strong><u>pg_analytics</u></strong></a> <strong>(formerly pg_lakehouse)</strong></p><p>A DuckDB-powered option by <a href="https://www.paradedb.com/"><u>ParadeDB</u></a> focused on data lake integration:</p>
<!--kg-card-begin: html-->
<iframe width="100%" height="250" src="https://snippets.tinybird.co/XQAAAAIlAQAAAAAAAABBKUqGk9nLKw8-CqMbwaA5SEjvC6tiQcqEjvQAqUplFONYr9dIchBehMU49u33T4JQRbNGSzPePr9YN3RIkZ9EAXjCEjkLewHwJWVqi5kw1t7omH26bqHgrobj1vP9ROpoFXGI8XqMJsJJx7uSPqqD1-rKkiS5ZANllosBtOF6CrSyEWNcbfJjaX9R5f1oNuj1b1MTmIk4l_WEe86K74KasjROKKnjmcoRpNjnclM5n5-j0PCRfZI3v2w5rjMH1z7GGC94OCDjwyxL4P0T6UaYT7vPC0LXd2gb6KcGoI_X6GU6yFJ7J0w6tdDC_kHizv__49BgAA/embed"></iframe>
<!--kg-card-end: html-->
<p><strong>Pros</strong>:</p><ul><li>Native integration with data lakes and object stores</li><li>Supports Delta Lake and Apache Iceberg formats</li><li>Query pushdown to DuckDB for performance</li><li>Good compatibility with Postgres types</li></ul><p><strong>Cons</strong>:</p><ul><li>Read-only access currently</li><li>Requires shared_preload_libraries configuration</li><li>Limited write capabilities</li><li>Early in development cycle</li></ul><p><a href="https://github.com/duckdb/pg_duckdb"><strong><u>pg_duckdb</u></strong></a>&nbsp;</p><p>A relative newcomer by <a href="https://hydra.so/"><u>Hydra</u></a> and <a href="https://motherduck.com/"><u>MotherDuck</u></a> that also embeds DuckDB in Postgres: </p>
<!--kg-card-begin: html-->
<iframe width="100%" height="250" src="https://snippets.tinybird.co/XQAAAALTAAAAAAAAAABBKUqGk9nLKvw1wpoF6nx2A7OnQO27H7WkNTz7koIwFR-972vAKFj6zw3swnUWG1MVktbjDOtjhJP_OrPUQAgL9c4a3armlHtl7Pb1DmDG9ahZNZ8AOGzzn62WP3aMIBLWPYGUF9BHOaHQtB4rODT1OF9fN39k9rEJLNtSuVwP6rL66tXStNGj8wm3LTfrWs1DqzrU57f7umMKaV9DCbIaiXsMuJ_49R0-Hexet1jvICuzAaH_M_YAAA/embed"></iframe>
<!--kg-card-end: html-->
<p><strong>Pros</strong>:</p><ul><li>Leverages DuckDB's high-performance analytics engine</li><li>Can directly query data in S3/object storage</li><li>Supports Parquet, CSV, JSON file formats</li><li>Integrates with MotherDuck cloud service</li><li>Active development from DuckDB team</li></ul><p><strong>Cons</strong>:</p><ul><li>Currently read-only for object storage</li><li>Some Postgres data types not yet supported</li><li>Relatively new (v0.2.0 as of early 2024)</li><li>Limited production deployments</li></ul><p><a href="https://github.com/Mooncake-Labs/pg_mooncake"><strong><u>pg_mooncake</u></strong></a></p><p>The newest kid on the block by <a href="https://www.mooncake.dev/"><u>Mooncake Labs</u></a> combines DuckDB execution with Iceberg/Delta Lake storage and is available on <a href="https://neon.tech/"><u>Neon</u></a>:</p>
<!--kg-card-begin: html-->
<iframe width="100%" height="250"  src="https://snippets.tinybird.co/XQAAAALcAAAAAAAAAABBKUqGk9nLKv9hzpoF6nx2TqMpdQ4uCUXpElW4F3MgJdvOzJbc2rTpOFEdjGkHVibQOGcx-TtGGPUjsmPJ9FKiNyGXJKnP89TM-MUQuGoIHX77tWe2qXVZ_JFoXmYYDJvuHblmaNyoc35Dp_c5IxgBFJftVo0Z_pkuSWXT0B_noKbUJsY4lBxmvRxQs4ccWEtT9mvxP2IVzMhzsTtSJ8ydROVqZxGmfHVuBzur3TClH8YZ5AH9foXR7P__zUVQAA/embed"></iframe>
<!--kg-card-end: html-->
<p><strong>Pros:</strong></p><ul><li>Significantly faster analytics queries due to DuckDB's vectorized execution</li><li>Better compression ratios</li><li>Native integration with data lakes (can query Parquet files)</li><li>Modern columnar storage format</li><li>Active development</li></ul><p><strong>Cons:</strong></p><ul><li>Relatively new and less battle-tested</li><li>Limited write capabilities (no UPDATE/DELETE)</li><li>Some Postgres features not supported yet</li><li>More complex setup if using cloud storage</li><li>Smaller community and fewer production deployments</li></ul><h4 id="when-to-use-a-columnar-storage-plugin">When to use a columnar storage plugin</h4><p>There are definitely some risks you need to consider before going down this route<strong>:</strong></p><ol><li><strong>Version Lock-in</strong>: Extensions need to match your Postgres version exactly</li><li><strong>Operational Complexity</strong>: Need to maintain extension updates separately</li><li><strong>Backup Considerations</strong>: Some backup tools may not handle extension data properly</li><li><strong>Performance Overhead</strong>: Extensions can add CPU and memory overhead</li><li><strong>Support Challenges</strong>: Limited commercial support options</li></ol><p>Despite these challenges, columnar storage can be worth it when:</p><ul><li>You have large analytics tables (&gt;100GB)</li><li>Queries typically access a subset of columns</li><li>You need better compression ratios</li><li>Read performance is more important than write performance</li><li>You can manage your own Postgres infrastructure</li></ul><p>If you're on a managed platform that doesn't support these extensions, you might need to:</p><ol><li>Use regular Postgres tables with good indexing strategies</li><li>Consider a separate analytics database</li><li>Look at managed analytics solutions</li><li>Evaluate moving to self-hosted Postgres</li></ol><p>Remember: adding extensions is a significant architectural decision. Start with standard Postgres features first, and only add extensions when you have clear evidence they'll solve specific problems in your workload.</p><p>While these advanced techniques can give you a huge boost in analytics performance, they also add complexity to your system. Start with the basics (materialized views, partitioning, and analytics replicas) and only move to columnar storage when you have clear evidence they'll help your specific workload.</p><h2 id="analytics-query-optimization-techniques">Analytics query optimization techniques</h2><p>Up to this point I’ve focused on infrastructure and schema changes - in many situations however, smart query optimization often beats brute force.</p><h3 id="understanding-analytics-query-patterns">Understanding analytics query patterns</h3><p>Analytics queries are fundamentally different from OLTP queries. They often:</p><ul><li>Scan large portions of tables</li><li>Perform complex aggregations</li><li>Join multiple large tables</li><li>Use window functions for trend analysis</li><li>Need to handle significant intermediate results</li></ul><p>Now I’ll walk you through how to optimize each of these patterns.</p><h3 id="window-functions">Window functions</h3><p><a href="https://www.postgresql.org/docs/current/tutorial-window.html"><u>Window functions</u></a> are incredibly powerful for analytics. They let you perform calculations across sets of rows without the overhead of self-joins or complex subqueries:</p>
<!--kg-card-begin: html-->
<iframe width="100%" height="300"  src="https://snippets.tinybird.co/XQAAAAJ6AgAAAAAAAABBKUqGk9nLKwD34_uEsenNAZW7SuY3ifiVLwIgGPQU_410oxzN7QIpFUwkAV7j2aZxVOpvMOSE0FhqVWLLc1_vvOatvI8YUcCSZyhnKdT7hz83Xkl5xlJ1ccynGnrdQUmxUKsvZycWmdjSPn7oEBA2h3nv8M5tN7a9HVLzlbJXDkXzXGIebLzA8TLoQJwVLEGpHyfslQ0cAc8Merj9Kz4pV9wbE2fwWXv9J8Iky1SLH-1vHc9y_-HYHaqlQGflKP_Rh65gngupi_TiPQ_BuKAbucgIgdhi-8MTMxPAr1aWqel7jiBp-rRmHch9RA4UKFHbTl1ls3NrS37tHV2kuGKHThqyoS01VE6j3si_VyZQY-A5ySFb6qkjpoZtFYSzh4lA3RIdCgvYhcUXB7NnAyzJKLYGmtNPd3pweIYYffG0MqgrQEphcIWhccrqkGtR9iKml8FSQrMM2tDgIEHwntupf_8tnNQA/embed"></iframe>
<!--kg-card-end: html-->
<p><strong>Pro tip:</strong> Window functions are processed <em>after</em> aggregations. This means you can combine them with <code>GROUP BY</code>:</p>
<!--kg-card-begin: html-->
<iframe width="100%" height="250" src="https://snippets.tinybird.co/XQAAAAJwAQAAAAAAAABBKUqGk9nLKwwTxNsXq0eIW_YfUDGkwslLe5hEYK-cA_S9jWiPipYFpoNTjIWet4PAPkM2uC_zDxj1nnvDa2o-iJbR9P8GkbpV6pGMuJDKqlE6vAgWF2pQvvPexMmtDvzFyerr3xDjj-2vru0fQlPBcP7oItRROdudtDMN0An5kThQKinsJKmMI-HnU5RYs0kEzS_UDJwnYHBs8Nj8hEfBiEVGNer5IIoih7GphzTmoNkVlVbn7KoP-omvSMQEmG1Zxmb2aMjgySes9BaHgZT3_CgLI0Bm1gi7p8ysPj4USo7KHSlKOFu84kHlrvyt5wy6cRGAu1G2_zet0AA/embed"></iframe>
<!--kg-card-end: html-->
<h3 id="common-table-expressions-ctes">Common table expressions (CTEs)</h3><p><a href="https://www.postgresql.org/docs/current/queries-with.html"><u>Common Table Expressions (CTEs)</u></a> are often seen as just a way to make queries more readable. But they're also a powerful optimization tool because of how they work in Postgres but, that also makes them a double-edged sword:</p>
<!--kg-card-begin: html-->
<iframe width="100%" height="450" src="https://snippets.tinybird.co/XQAAAAImAwAAAAAAAABBKUqGk9nLKw2oBTg59b62fWpvMpJ6DoKbZ5RCPRV6kBMdxwS4Kb8_r0PJud5wVIXfDe2ZcufLEB-qTemhmehOnGV13B3xDyvnZQ3y-UOgo5vIFO_71qnFGYY1Gg-psBLFSy_UuQkGju5PV0EUrmvCYHH4HTfMMnQKiMmUrIXkJDF2pClRqOIO0NyR01piRfO68Ib5IX68QDiWkvrZ903HKcY3e8WSrmZu-GtB91LqKehnXdBzd5Tm1oqKTiHHTHGWJohifNdsQwkp-fHcJ817STiIqoVgXssaXqeShye508TaRhNrSXkG8qjUyMhPmVHW2iYrt7T-CSZP7pdduLO18By6zOaWJcicbvPnqcAJ2qwdTFSE5Q5p_URh76a7-nDcSrssh-QCRQwiDPH7Un8q5ws7EF03YBe-eeiBR_8vg6ecZLwWk_dlcefhbMrY6BVTMiSprJaELJDlIZS-wYCCfIu-iS5Iu6cZlmHRnnFsGtGE6AuwbKP3xu81nW7pFeus8XL9om6V3RvxBAV_4X6zUAvgVT6Gj52MMi0FYG1Psp0vyUi2YdqgyvFosbIb2LbT__PG9KU/embed"></iframe>
<!--kg-card-end: html-->
<p><strong>The other side of the sword:</strong> By default, CTEs in Postgres are materialized - they're calculated once and reused. This is usually great for analytics but can backfire if your CTE returns a large result set that's only used for a small subset of rows.</p><h3 id="parallel-query-execution">Parallel query execution</h3><p>Postgres can parallelize many analytics operations, but you need to tune it right:</p>
<!--kg-card-begin: html-->
<iframe width="100%" height="400" src="https://snippets.tinybird.co/XQAAAAJBAgAAAAAAAABBKUqGk9nLKzhQ1zgcvXfa6jn7HgVx48B2a0UR7oNc8jXR5Eitv2WRpwCW553MTqOcEpswNPPEv0IdNNavV_qWVNDMjMqboQA57lwbOEsJ9fxTWndGQA3_PYt-uPWkMorSQbJgAh360yuidgkziIpfezeVeC9mz0FfVvbnPBKvyTyZHygQ3H8lW812U3I6IYg2ITJL8KzJ-h7muwoqn4uXOMqDvY8ZdvBSdEZqlnCuR7DEsct3_ZQedNX5Y_cUYxxn62N6iuMniZp5TWysRRj3ZU1MOPzSUeDACIAPo2tlm32b1RhkEZTGdgvVGWgC2Z9ANLsVyjW1M2s9HEupksO727xBe0naAt9_kly_202TE125UW2T7ptADZcEmbNjysaHNYtmyCYu6DhqLAscUjjm4AzE_BMlRwfyZ5Erw2aHCIgDWlx_XaQ4hELgupDtRXRAoVfvRY0IhCtSwzTUmLEmjcrF1rRlFqtGiiWH_8OH-7j_1DpqAA/embed"></iframe>
<!--kg-card-end: html-->
<p>Some operations that benefit most from parallelism:</p><ul><li>Sequential scans on large tables</li><li>Parallel joins (with appropriate settings)</li><li>Large aggregations</li><li>Window functions</li></ul><p><strong>Keep in mind:</strong> While parallel queries use more total CPU, they can finish much faster. Monitor your system resources carefully when enabling parallelism.</p><h3 id="essential-explain-analysis">Essential EXPLAIN analysis</h3><p>When optimizing analytics queries, <code>EXPLAIN ANALYZE</code> is your best friend. These are some of the specific things to watch for:</p>
<!--kg-card-begin: html-->
<iframe width="100%" height="300" src="https://snippets.tinybird.co/XQAAAAKtAQAAAAAAAABBKUqGk9nLKzhURj1K27VdnKnWojf2V2LMjPp921Blgl4013tuwI9w05BOHdR4yhyHWZBnklRs7OTQS4z1qy-GgY3HZQX_CmGjFXoUdhNcVVbYpe46RioIETH6Fi0kQJrxJVPXDZHK0G3je59JLYJIVsjQnXW0h-MVHG923WM8mEMnc_t0rPS8PYeb8fn6ngSzY0PSdTzce9zhWrBXc6ar3K2j_Jm26IVXPTQRFKugg90vTpGjf3J29YUJK-xOt0QiWbtbfq5QOEeAbxDDIM39Yv0A4SlyjgzYdDiPkX8Ao2gJtqqMN6fv109nMv0nHHVV4yIFFdDikaAnuiTLVqkuR5Q8akOAAtiIVyJ_pqLyurYc13UWZbd_itdrSc20D_655LgEt1-Ydx3RE3XGnmg3weGfDh8jXVxgWg5POqGUgpIpVGEU3KDHuUwSgny-e_rvLDPwWvnlW_16auVa__HhPGA/embed"></iframe>
<!--kg-card-end: html-->
<h3 id="common-analytics-query-pitfalls">Common analytics query pitfalls</h3><ol><li>Unneeded <code>ORDER BY</code> in subqueries</li></ol>
<!--kg-card-begin: html-->
<iframe width="100%" height="300" src="https://snippets.tinybird.co/XQAAAAKuAQAAAAAAAABBKUqGk9nLKw9a3kmHQbSNU5V8sugiFGSsMUmwUlTToxnGYtpgxQBwYLGnU2eTneon4QaEEBOueLGOM8W9NHss6zSpurtvfGYbsmEiV87zi0DLee8od3pHMH1FtaPcza_y_FD6ZRCF5cpBIbU_XE3yJy08X7o9vKyU-vr0kE8wX2oMohwIuU7mbUeOmwjb2YJGzy9RpbBv-zsLjNrhcBQlildegJd-F3EWQYiDSq_GY4vymaZ1D64dGc8k-O5g14mnm7RW-4oT14koeW009jNBO9D5HjtPl9KuceCaYK3XeOaj_kzoQlVwqYuMSrOgw8HMrlvSZReTPP3BK9rxItL_28LU-g/embed"></iframe>
<!--kg-card-end: html-->
<ol start="2"><li>Missing <code>DISTINCT ON</code> for latest values</li></ol>
<!--kg-card-begin: html-->
<iframe width="100%" height="300" src="https://snippets.tinybird.co/XQAAAAKwAQAAAAAAAABBKUqGk9nLKvwu3KDl7uGAzck29-iCKO-D6i8zgCfZvOX1xXd3gkCGKM1VhQVGt2HPWk7ZBo2i1Uecy8uBI7caPwDj9OE9jevwwDY0p5xOx-vFQdmWyLJD1ArNL5-qHg2BM_11PXa8I_SONqLGIcGrZ-QMbDftsuxoVDUivrAN0oHB28Ii5eYHWPQ0sxIjeiKaM9_4rpsUz7g4QHxwKeLxdXbYToFUycEDfedXQSxbUh7ay0RLpSFDUQi7kGR3sWq1iC5eHVIhuEcXnl9hOpdSyjrQBe4wi3hFS9e6CAALr0sck-PAb7gkyWNzagENlXKPvOLYqczH39X5JQuG8D3-koEL/embed"></iframe>
<!--kg-card-end: html-->
<ol start="3"><li>Not using date truncation indexes</li></ol>
<!--kg-card-begin: html-->
<iframe width="100%" src="https://snippets.tinybird.co/XQAAAAK7AAAAAAAAAABBKUqGk9nLKzhV5EJ71u5GvyHoYrw9m_ASMiutFS_YEnfsklCwp3XpnNr4kAC3bxe5VphsXGKcQ_AtWg2fC1udUCNJ_t3tkp1FzW1HCUbed8MeANwMPTpu2Ga19Jlcgjn5butfSVXtvA32rqRVLqFNjQ37RoW7kzKaW9YCKZY0ib9DOXQkHjfEZeu9cMSZ_R9Rgc3a4v5_9InTY8_cBPnS2L_8jVV0_ncyQA/embed"></iframe>
<!--kg-card-end: html-->
<p>Query optimization is iterative. Start with the basics, measure, then optimize further based on real usage patterns. And always test with realistic data volumes - what works for 1M rows might fall apart at 100M. Lastly, it’s amazing how far you can get by simply following “<a href="https://www.tinybird.co/blog-posts/5-rules-for-writing-faster-sql-queries"><u>The 5 rules of writing faster SQL queries</u></a>.”</p><h2 id="indexing-for-analytics">Indexing for analytics</h2><p><a href="https://www.postgresql.org/docs/current/brin.html"><u>BRIN</u></a> indexes are Postgres’ secret weapon for analytical workloads. Unlike B-tree indexes that track every single row, BRIN (Block Range INdex) creates a tiny index that stores metadata about blocks of data. Think of it like chapter summaries in a book instead of a detailed index of every word.</p><p>For time-series data or any naturally ordered columns (like IDs that only go up), BRIN shines. It's perfect when physical storage order and the indexed column are highly correlated for your data.</p><p>In <a href="https://www.crunchydata.com/blog/postgres-indexing-when-does-brin-win"><u>Crunchy Data's testing</u></a>, on a 42MB table, the B-tree indexes were 21MB while the BRIN indexes were just 24KB! And for large result sets (100K+ rows), BRIN consistently outperformed B-tree indexes.</p><p>Here's how to create one:</p>
<!--kg-card-begin: html-->
<iframe width="100%" src="https://snippets.tinybird.co/XQAAAAK3AAAAAAAAAABBKUqGk9nLKv9uGxOcee_Lhwbc6mSZ0ogsD48R1_i89vGqBwtdJsSbXRQYy1S1lMzxwPlHCpF33UNb-oX-dytKpnHHltx9UJE5wtldF6Y5yt-mapMWhBDtZ9ecAvNOVcKPxgC8TQmHxk3sPV9v-DBb3wy-dcJriLW9iXUnQXAlRZZP6Wmb1jnOSo98v8h4VVPVI_624q-gvClnRouGC1RdUxpV-_976oAA/embed"></iframe>
<!--kg-card-end: html-->
<p>The <code>pages_per_range</code> parameter is key - it defaults to 128 but tuning it can really boost performance. For narrow queries (returning ~100 rows), smaller values like 8 or 16 work better. For broader analytical queries, stick with larger values.</p><p>But BRIN isn't magic. Skip it when:</p><ul><li>Your data is randomly distributed</li><li>You need exact row lookups</li><li>Your queries are highly selective (returning &lt;1% of rows)</li><li>Your table gets lots of <code>UPDATE</code>s in the middle</li></ul><p>The sweet spot? Analytical queries that scan large chunks of naturally ordered data. Bonus points if you're tight on disk space - BRIN indexes are <strong>tiny</strong>.</p><p>Want to check if BRIN makes sense? Look at the correlation between storage order and your column:</p>
<!--kg-card-begin: html-->
<iframe width="100%" src="https://snippets.tinybird.co/XQAAAAK7AAAAAAAAAABBKUqGk9nLKzha8Up1EQpYqibszaLHarOyQIPtxWtuy0QYK4VhhNsLRWloktL5hzvWU_fJsOceCkMqG0S_2WGVXkCSimT6crxBYpVkGC0rxDRA5Cq61gYmPvxwnMFE9SfWK0p96nu2-BB69raj4kaiT4EJ_xgQXuhAEwdxl4LKOGh4-YwdmR28TXJKRkX3x0eivcDj2bKHYukQtM9brq3__04SAAA/embed"></iframe>
<!--kg-card-end: html-->
<p>The closer to 1.0, the better BRIN will perform.</p><h2 id="performance-tuning-for-analytics">Performance tuning for analytics</h2><p>No matter how well you write your queries, your postgres instance needs proper tuning to handle analytics workloads efficiently. While I covered basic configuration in previous articles, analytics workloads have specific needs that require different trade-offs.</p><h3 id="memory-settings">Memory settings</h3><p>Memory configuration is where analytics workloads differ most from OLTP. You'll need larger memory allocations for complex operations, but you'll also need to be careful not to overallocate:</p>
<!--kg-card-begin: html-->
<iframe width="100%" height="350" src="https://snippets.tinybird.co/XQAAAALVAgAAAAAAAABBKUqGk9nLKvXy49VaWbyjA7qU3k40kusaKnI4rC6WKsiiJdYHA4cvTF57HTYDtX84FMwu3S-_rJD8ScqatA1XztGh24ML8nqQS9xEHLBJ3ux8mGH2dBjjZixXe85RzqLAXbhglTQElnU8_kgOb4uc7f1FP6r3q3PL4Pu30QtExMjFNJbdehOIrrvgoni_ybttLtPAgrdUVi5M8Duve-tFe4Bk8tr9BtdyFD0ddnUUT4UOQ7DaFxNJkpX7zHQaXlQqYANzLvgOHcr2Dr-gmzj0vMlLg5_1UnEJZ-jV9d5VodoGvQQcyVydYwrwsxJbb2wl_ykQM_5XUiw1-tPpJPfjLdLvwsbI8bzilfY5XPBum3O985qlDIBftfr0wbR2Wd3aRB-tLlolB5HU71VQQydxIbhqEuRUhjL6SKxhNZsY1TuUEQkWWVLclNee79Lm52KF8yr8klJ5sEyz5-oaqlxmGtPpEuBpyqny911rlHFCnbti3F_c2tEX_88xxUb-38GTwTiANUwSKEHjNHbID8oVdF42khjUfMnrgwPx_r4E7A/embed"></iframe>
<!--kg-card-end: html-->
<h3 id="io-configuration">I/O configuration</h3><p>Analytics queries often read large amounts of data. These settings help optimize I/O:</p>
<!--kg-card-begin: html-->
<iframe width="100%" height="250" src="https://snippets.tinybird.co/XQAAAALDAQAAAAAAAABBKUqGk9nLKvxCCEDdAELT77m-1tPU07byxpbHbOLMgSovXUxbam7ypeAMpG1MAyy9VLMgfwI_IQziMBzu2mbvtDU99-6WxdtkW93Jm4gk_W1e3x_60UN4cB-iIljZvPvwqqbX7n6nDEy5F3nf2c7v-5uX0cAhpO6wisQZ4G4AeJydfFsWYSipAda6EQAHvdZjldXEut9zMg-AJWghEiArsODrh1Nb9keLtovdKQZthD8Hrgey5AQnb-6QK4XYweMPrYw1sm-yEVCBro4CFM35qt7saEy8ieo75GOi3WZ2tsXYHifGoGiVu1iC_GdcqscGapN0GqqbFvfNtB1Muz6s-9yWULdsLdexz7jWGuQTkSQuMBejVLcVnQDIMqUZ0hph_q1zcg/embed"></iframe>

<!--kg-card-end: html-->
<h3 id="query-planning">Query planning</h3><p>The query planner needs different settings for analytics:</p>
<!--kg-card-begin: html-->
<iframe width="100%" height="275" src="https://snippets.tinybird.co/XQAAAAJ5AgAAAAAAAABBKUqGk9nLKwKKcjoAR1cLhO3TbGDiEFUyydWcU2UUXQM-6LSTF5dggsScMdSf-HkiJVtHvxWA3JJR93Lq3ApyixV6N5zob8YgEC7idLIVfyp4rQMDyyU8K1uO1q8-g3ZUinir_lr30M4uk2mP6tPJMYWSDmvSpulxGWFpui9sUjIJUaOpWTWAG80Z3L35OOPtoPg6uowXBDw025IXy5E0xSPMywoUawR3sdLpaj6Z_AmKz2kxg_Rs992S5MIgYDUDfo8a8IWMbzsFCdys5yX7wskBPouXh41DN7u5TgoHRMvdeEPknUEgyVJVb5AmoRELNcMEDL-x26w7yy_1OtkmLCMs446GdzOzs2AXTPv0pDIcsfz_iQcgu-Cnx84s-ICLTOOuMfPFaCC4MGNefkZoB1k2G_qOogk/embed"></iframe>
<!--kg-card-end: html-->
<h3 id="vacuum-settings">Vacuum settings</h3><p>Large analytics tables need different vacuum settings:</p>
<!--kg-card-begin: html-->
<iframe width="100%" height="200" src="https://snippets.tinybird.co/XQAAAAJeAQAAAAAAAABBKUqGk9nLKveRfxvq6wMzvvJJGNnWHTDlFKrmgs3OUQSwbPNxOaGEgAZqH3iE-pVl-MuaNooljjUNOot339LLR6RRPlzhj_oeMOR97sWe0m4iF8wSByshfsYrd0rqICgs49DdrCqshgo3DnNc8EGy81BBnKkZIaK2fnWrN1lZxjRYGQjb5xPzxr97k6N8CoZGarGjsUx71tYK5-XZYFRGRnG58pIA4Pb927Tuzot2BXlyJ4-ldnBDdJQXKzv-4PXKLFxXwttlMYHVKxlh6pYBrX9T1_jzSOA/embed"></iframe>
<!--kg-card-end: html-->
<h3 id="monitoring-configuration">Monitoring configuration</h3><p>Monitoring becomes even more critical with analytics workloads:</p>
<!--kg-card-begin: html-->
<iframe width="100%" src="https://snippets.tinybird.co/XQAAAAIPAgAAAAAAAABBKUqGk9nLKvxK_tSpx7GVz8vKN6R97Xh8enKmBjjy7JXjdP2dU4AQ6KMk1ZUWaebIxWETxknil4NxVrFQ2mt60ZdNvnTek61ghGwFeYAiC1KSogog_aB8QoM3svyqjv5VLgJq34Yof34XK97tzzqtxHuSPVnqSkbxjCZd1BgJyIdkx2tbjv_U3v0U6Wzn1HW-DSOR7Na8GQydONXgyJvSLNseeKgtYIdvtwUXRaXp0CbWbiB6FNFXRnSmWTL3afnuZ1x4IrpehMx7YOVDC2PZqMw4qWMMYWHLVRyrIokA_k8oEIVKvGm7wOMXMcmd_rYj5lf3XxUF6Y98RJq7MmE2MUQMlNK7frX4iQSik5sf92F04BOWmjFK_HK2FvrH-IvfG9NmMStnHpT_9hlmBg/embed"></iframe>
<!--kg-card-end: html-->
<h3 id="real-world-example">Real-world example</h3><p>Here's what this might look like in practice. Let's say you have:</p><ul><li>32 CPU cores</li><li>128GB RAM</li><li>NVMe storage</li><li>Mixed analytics workload</li></ul><p>Your configuration might look like:</p>
<!--kg-card-begin: html-->
<iframe width="100%" height="400" src="https://snippets.tinybird.co/XQAAAALjAgAAAAAAAABBKUqGk9nLKzhNjjOGlK0VNKN8oclAOjox4v63Wm-xRjT_8r08n1f51OlgrmKCzjv1VgGb9fw5mTWGjw5mNQSG6vxY6NUvDQi5VkXdC_tgupOgbYpK9HUpEh5BKKNS38aDLId8T350-i_iHNWifuNgLrcWcZPPDo3WWrN0I5Uglg_2iZ9u0Iy2fs6b7aHlO9y5VPAbArkjoAmavnrjHcfmFEHpRJoDzVk71lWrkboiDk_zNXXhzL7JwI6RU5nw1qw4-7vinMZgNzivV3sJVy4U8hdZezA7e4l-Vy6j8zgJLl5N5Zi8W0uEF7Jg1rwFubYc7PxnWmZav4gp-n_OW4Dq6pbo45JfjEAtur-CBNECNs7J8jNdMGqtbYYU4fJZzYGre-NoNR_ujO9Ks4ycQ63yW5uH4QGGXGT_Q8VllVn3RuAZom-mP6dWI0oUeDLB13jDOWa_N7MVbMohjLx1vE9i96zmvKZARHyAAk6NFHI0IHujbgpGnx40ApNIqAD55D6y6vpr2-LqNz4RmFtWClXqBGFyROMzrpjGGc6ybmk5_wCbcrInw-n_N4kGAA/embed"></iframe>
<!--kg-card-end: html-->
<p>Don't just copy these numbers and YOLO it. Monitor your actual memory usage using tools like <a href="https://www.postgresql.org/docs/current/pgstatstatements.html"><u>pg_stat_statements</u></a> and adjust based on your workload. Below I’ll call out some gotchas and guidance for testing your settings.</p><h3 id="configuration-gotchas">Configuration gotchas</h3><p>Watch out for these common issues:</p><ol><li><strong>Memory overallocation</strong><ul><li>Too high work_mem can cause OOM kills</li><li>Remember it's per operation, not per connection</li></ul></li><li><strong>Excessive parallelism</strong><ul><li>Too many parallel workers can cause thrashing</li><li>Monitor CPU usage and adjust</li></ul></li><li><strong>Aggressive vacuum</strong><ul><li>Too aggressive settings can impact query performance</li><li>Monitor vacuum timing and adjust cost limits</li></ul></li><li><strong>Checkpoint tuning</strong><ul><li>Too large max_wal_size can cause long recovery times</li><li>Too frequent checkpoints can impact performance</li></ul></li></ol><h3 id="testing-your-configuration">Testing your configuration</h3><p>Always test configuration changes:</p><ol><li>Start with a test environment</li><li>Benchmark before and after</li><li>Monitor system resources</li><li>Test with realistic data volumes</li><li>Test failure scenarios</li></ol><p>Analytics workloads often reveal configuration problems that OLTP workloads miss. What works for millions of small transactions might fail spectacularly for one large analytical query.</p><h2 id="know-your-limits">Know your limits</h2><p>Even with all these optimizations, you'll eventually hit limits running analytics on Postgres. Watch for:</p><ul><li>Queries taking minutes or longer</li><li>High I/O wait times</li><li>Memory pressure from large sorts/joins</li><li>Vacuum struggling to keep up</li><li>Replication lag on analytics replicas</li></ul><p>When you see these signs, it's time to start thinking about dedicated analytics solutions. But that's a topic for an upcoming article!</p><h2 id="takeaways">Takeaways</h2><p>Postgres can handle many analytics workloads if you:</p><ul><li>Isolate analytics traffic to dedicated replicas</li><li>Optimize your schema and configuration</li><li>Use materialized views strategically</li><li>Use techniques for optimal analytics querying</li><li>Leverage BRIN indexes where they make sense</li><li>Monitor and maintain carefully</li><li>Know when to quit and move to specialized tools</li></ul><p>It's not a perfect analytics database. But with the right approach, it can take you surprisingly far. Just don't forget to keep an eye on those limits - they'll sneak up on you faster than an angry DBA during a production incident.</p><p>In the next article, I’ll walk you through when and how to move your analytics workloads off Postgres. Because sometimes, you really do need a hammer instead of that screwdriver.</p><p><strong>Need to move your analytics off Postgres?&nbsp;</strong></p><p>Tinybird is data infrastructure for software teams. You can query your Postgres tables with SQL and publish queries as dynamic, scalable REST API Endpoints to power in-product reporting, real-time dashboards, and more. <br><br>You can try Tinybird for free with no time limit. <a href="https://www.tinybird.co/signup?utm_source=blog&amp;utm_campaign=Outgrowing%20Postgres" rel="noreferrer">Sign up here</a>.</p>
