These are the main options for PostgreSQL vertical scaling:
- Hardware and config tuning — maximize what the current server delivers
- Connection pooling + query optimization — reduce resource contention without hardware changes
- Analytical query offload — CDC to ClickHouse® via Tinybird for queries Postgres cannot serve at scale
PostgreSQL vertical scaling means extracting more performance from a single server before distributing load across multiple nodes. Tuning PostgreSQL configuration to use available hardware efficiently is almost always the right first step before introducing distributed infrastructure.
But vertical scaling has hard limits. Analytical queries that scan billions of rows are structurally expensive regardless of hardware. Knowing when you have hit the ceiling — and what to do next — is the real engineering judgment required.
Before choosing an approach, identify the root cause: resource underutilization (conservative defaults not using available hardware), connection saturation (too many idle connections), or query patterns (analytical scans that no hardware can make fast). Profile with pg_stat_statements to confirm.
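One useful triage heuristic when profiling is to rank queries by blocks read rather than by time: queries that touch far more data than they return are analytical-scan candidates, while frequent queries with modest reads usually point at indexing or config gaps. A sketch, assuming pg_stat_statements is enabled:

```sql
-- Rank by data volume read; high shared_blks_read with low "rows" returned
-- suggests analytical scans that no index or hardware upgrade will fix
SELECT left(query, 60) AS query_head,
       calls,
       rows,
       shared_blks_read + shared_blks_hit AS blocks_touched
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 10;
```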
Three ways to implement PostgreSQL vertical scaling
Option 1: Hardware upgrade + configuration tuning
The most direct approach. PostgreSQL ships with conservative default configuration designed for minimal hardware. On production servers with 32–256 GB RAM and fast NVMe storage, the defaults leave most resources idle.
How it works: upgrade server hardware (more RAM, faster NVMe, more CPU cores), then recalibrate postgresql.conf to use those resources. Key parameters directly affect query performance and concurrency.
Critical postgresql.conf parameters to tune:
# postgresql.conf — tuned for a 64 GB RAM server
# Memory
shared_buffers = 16GB # 25% of RAM — PostgreSQL's own buffer cache
effective_cache_size = 48GB # 75% of RAM — hint to query planner for OS cache
work_mem = 64MB # per-sort/hash, multiply by active queries
maintenance_work_mem = 2GB # for VACUUM, CREATE INDEX, pg_restore
# WAL / checkpoint
wal_buffers = 64MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB
min_wal_size = 1GB
# Planner
random_page_cost = 1.1 # for NVMe; default 4.0 is tuned for spinning disk
effective_io_concurrency = 200 # for NVMe; 2 for spinning disk
# Autovacuum — more aggressive on busy tables
autovacuum_max_workers = 6
autovacuum_vacuum_cost_delay = 2ms
autovacuum_vacuum_scale_factor = 0.01
Check current resource utilization before tuning:
-- Top queries by total execution time (requires pg_stat_statements)
SELECT query, calls, total_exec_time::bigint AS total_ms,
mean_exec_time::bigint AS mean_ms,
rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;
-- Current buffer cache hit rate (target > 99%)
SELECT round(100.0 * sum(heap_blks_hit) /
nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0), 2) AS hit_rate_pct
FROM pg_statio_user_tables;
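These settings can be rolled out with ALTER SYSTEM, which writes to postgresql.auto.conf instead of hand-editing postgresql.conf. A sketch, assuming superuser access:

```sql
-- Settings that only need a reload take effect immediately
ALTER SYSTEM SET work_mem = '64MB';
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();

-- shared_buffers requires a restart; pg_settings flags anything still pending
ALTER SYSTEM SET shared_buffers = '16GB';
SELECT name, setting, pending_restart
FROM pg_settings
WHERE pending_restart;
```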
When this fits:
- Your server has remaining capacity (CPU not consistently above 80%, and pg_stat_statements shows no obvious index or config gaps yet)
- Buffer cache hit rate is already below its potential (e.g., shared_buffers is at the conservative default and would benefit from tuning for your actual RAM)
- Queries are OLTP-style: point lookups, small range scans, indexed access
- You want the simplest path to more capacity without new infrastructure
Trade-offs: hardware upgrades have a cost ceiling — a 256 GB RAM server costs much more than a 64 GB server and provides limited incremental benefit beyond a certain point. Analytical queries scanning millions of rows are slow not because of RAM but because of PostgreSQL's row-oriented storage model. No hardware upgrade fixes a structural query mismatch.
Prerequisites: server access for hardware changes, PostgreSQL 14+, pg_stat_statements extension enabled.
Option 2: Connection pooling + query optimization
The second most common cause of PostgreSQL slowness is connection saturation — too many connections holding memory and creating contention — combined with unoptimized queries that execute more work than necessary.
How it works: add PgBouncer in transaction mode to cap the number of actual PostgreSQL server connections regardless of how many clients connect. Then systematically optimize slow queries using EXPLAIN ANALYZE and targeted indexing.
PgBouncer in transaction mode:
# pgbouncer.ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb
[pgbouncer]
pool_mode = transaction
max_client_conn = 2000
default_pool_size = 50 # actual connections to PostgreSQL
max_db_connections = 100 # hard cap on server connections per database
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
With transaction pooling, 2000 client connections share 50 actual PostgreSQL server connections — each connection uses ~5–10 MB of server memory, so this significantly reduces memory pressure.
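Before deploying PgBouncer, it is worth confirming that most connections really are idle, since a high idle count is the signal that transaction pooling will pay off. A quick check against pg_stat_activity:

```sql
-- Idle connections hold memory but do no work; count them by state
SELECT state, count(*) AS connections
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY connections DESC;
```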
Optimize a slow query with EXPLAIN ANALYZE:
-- Identify a slow query
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT u.id, u.email, count(o.id) AS order_count, sum(o.amount) AS total_spent
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE o.created_at >= now() - INTERVAL '90 days'
AND u.status = 'active'
GROUP BY u.id, u.email
ORDER BY total_spent DESC
LIMIT 100;
Add targeted indexes for common patterns:
-- Partial index for active users (reduces index size)
CREATE INDEX CONCURRENTLY idx_users_active ON users (id, email)
WHERE status = 'active';
-- Covering index for order lookup by user + date (avoids heap fetch)
CREATE INDEX CONCURRENTLY idx_orders_user_date ON orders (user_id, created_at DESC)
INCLUDE (amount, status);
-- Partial index for recent orders only
CREATE INDEX CONCURRENTLY idx_orders_recent ON orders (created_at DESC, user_id)
WHERE created_at >= '2025-01-01';
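After creating indexes, verify the planner actually uses them before keeping them, since unused indexes only slow down writes. A check using pg_stat_user_indexes, with the index names from the examples above:

```sql
-- idx_scan stays at 0 for indexes the planner never chooses
SELECT relname, indexrelname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE indexrelname IN ('idx_users_active', 'idx_orders_user_date', 'idx_orders_recent')
ORDER BY idx_scan;
```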
When this fits:
- Connection counts are high (above 200 server connections) and memory is under pressure
- pg_stat_statements shows specific queries with high total_exec_time that lack appropriate indexes
- You want query-level improvements without hardware changes or new infrastructure
Trade-offs: PgBouncer transaction mode is incompatible with prepared statements in session mode and some PostgreSQL features (advisory locks, SET LOCAL). Query optimization has diminishing returns — every table gets its indexes, and the next bottleneck is query volume or data size. This approach does not help if queries are slow because they scan large amounts of data, not because they lack indexes.
Prerequisites: PgBouncer 1.18+, pg_stat_statements enabled, auto_explain for identifying slow queries in production logs.
Option 3: Analytical query offload — CDC to ClickHouse® via Tinybird
When vertical tuning reaches its limit, the remaining bottleneck is usually analytical queries that are structurally expensive in a row-oriented engine. Aggregations over millions of rows, dashboard queries scanning time ranges, and reporting pipelines compete with OLTP transactions for the same CPU and I/O — and no amount of vertical scaling eliminates that competition.
How it works: enable logical replication on PostgreSQL, stream changes into Tinybird via CDC, and route analytical queries to Tinybird's ClickHouse®-backed engine instead of hitting PostgreSQL. The OLTP database keeps serving transactional workloads; analytical queries move to an engine designed for them.
Enable logical replication and create a publication:
-- postgresql.conf: wal_level = logical (requires restart)
-- Verify WAL level
SHOW wal_level;
-- Create publication for analytical tables
CREATE PUBLICATION analytics_pub FOR TABLE orders, events, page_views, sessions;
-- A CDC connector (e.g. Debezium, Estuary, or Sequin) consumes this slot
-- and forwards change events to Tinybird via Kafka or HTTP ingest
SELECT pg_create_logical_replication_slot('tinybird_analytics', 'pgoutput');
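An inactive or lagging replication slot forces PostgreSQL to retain WAL indefinitely, which can fill the disk. Monitor retained WAL per slot once the CDC pipeline is live:

```sql
-- Retained WAL grows when the CDC consumer falls behind or disconnects
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
```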
ClickHouse® schema in Tinybird for the events table:
CREATE TABLE events_raw (
event_id UInt64,
user_id UInt64,
session_id UInt64,
event_type LowCardinality(String),
page_path String,
event_time DateTime,
updated_at DateTime
)
ENGINE = ReplacingMergeTree(updated_at)
PARTITION BY toYYYYMM(event_time)
ORDER BY (user_id, event_id)
Analytical materialized view:
CREATE MATERIALIZED VIEW events_hourly_mv
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (event_type, hour)
AS SELECT
toStartOfHour(event_time) AS hour,
event_type,
countState() AS total_events_state,
uniqState(user_id) AS unique_users_state,
uniqState(session_id) AS unique_sessions_state
FROM events_raw
GROUP BY hour, event_type
Query the rollup by merging the stored aggregate states:
SELECT hour, event_type,
       countMerge(total_events_state) AS events,
       uniqMerge(unique_users_state) AS users
FROM events_hourly_mv
WHERE hour >= today() - 7
GROUP BY hour, event_type
ORDER BY hour;
When this fits:
- Vertical tuning is done and the remaining slow queries are analytical scans — aggregations, time-range reports, dashboard queries
- You need sub-second response times on those queries under concurrent load from dashboards or APIs
- You also need the same analytical data accessible via APIs for user-facing analytics
Trade-offs: adds CDC infrastructure and Tinybird ingestion pipeline. Data reflects PostgreSQL changes with a short lag (typically seconds). Requires modeling analytical queries as Tinybird Pipes. Does not help with OLTP write or read pressure directly — it removes analytical overhead.
Prerequisites: PostgreSQL 13+ with wal_level = logical, a CDC connector (Debezium + Kafka, Estuary Flow, or Sequin), Tinybird account.
Summary: picking the right option
| Criterion | Hardware + config | Connection pooling + indexes | Analytical offload (Tinybird) |
|---|---|---|---|
| Best for | Underutilized hardware | Connection saturation + slow indexed queries | Analytical scans competing with OLTP |
| Setup complexity | Low | Low | Medium |
| Ops overhead | Low | Low | Low (managed) |
| Sub-second analytics | No | No | Yes |
| Hardware ceiling | Yes | Yes | No (separate engine) |
| API reuse | No | No | Yes |
Decision framework: what to choose for PostgreSQL vertical scaling
Apply these in sequence — they are not mutually exclusive:
- Start with hardware + config tuning if you have not tuned shared_buffers, work_mem, autovacuum, or random_page_cost for your actual hardware. This is almost always under-done and provides immediate gains.
- Add PgBouncer + index optimization if connection counts are high, memory is under pressure, or pg_stat_statements shows high-execution-time queries without appropriate indexes.
- Add analytical offload when tuning is complete and the remaining slow queries are analytical scans. This is the lever that breaks through the vertical ceiling for analytical workloads — not by adding hardware, but by removing that workload type from PostgreSQL entirely.
Bottom line: most PostgreSQL performance problems can be fixed with proper config + indexing. But when analytical query interference remains after tuning, ClickHouse® via Tinybird is the cleanest path — it removes the root cause rather than scaling around it.
What is PostgreSQL vertical scaling and when does it matter?
PostgreSQL vertical scaling means improving performance on a single database server — adding hardware and tuning configuration to use those resources effectively. It is the first and most cost-effective scaling approach before introducing distributed architecture.
Most PostgreSQL performance problems are caused by misconfiguration, connection overhead, or unoptimized queries — not true hardware limits. Fixing these first extracts the most leverage with the least complexity.
Vertical scaling has a ceiling, though. OLTP vs OLAP workload mixing means no single server can serve high-frequency transactional writes and analytical aggregations over billions of rows without one affecting the other. At that point, workload separation — not bigger hardware — is the right move. The goal after tuning is predictable low latency for both OLTP and analytical query tiers.
Schema and pipeline design
Practical schema rules for vertical scaling
Rule 1: use partial indexes for filtered queries. A partial index on WHERE status = 'active' is smaller and faster than a full index — if 5% of rows match, the partial index is 20x smaller.
Rule 2: avoid SELECT * in application queries. Explicit column selection reduces I/O and shared_buffers pressure, especially on wide tables.
Rule 3: tune autovacuum per table. High-write tables need autovacuum_vacuum_scale_factor = 0.01 and lower autovacuum_vacuum_cost_delay — defaults are designed for average load.
Rule 4: set CONNECTION LIMIT per role. Prevents a runaway process from exhausting all connections before PgBouncer intervenes.
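Rule 4 can be sketched as follows; the role names are illustrative assumptions:

```sql
-- Cap connections per role so one misbehaving service cannot exhaust the pool
ALTER ROLE app_backend CONNECTION LIMIT 150;
ALTER ROLE reporting CONNECTION LIMIT 20;

-- Verify current limits (-1 means unlimited)
SELECT rolname, rolconnlimit
FROM pg_roles
WHERE rolname IN ('app_backend', 'reporting');
```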
Example: high-write table with per-table autovacuum tuning
-- High-write events table with custom autovacuum settings
CREATE TABLE events (
event_id BIGSERIAL PRIMARY KEY,
user_id BIGINT NOT NULL,
event_type TEXT NOT NULL,
payload JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
) WITH (
autovacuum_vacuum_scale_factor = 0.01, -- vacuum after 1% dead tuples (default 20%)
autovacuum_analyze_scale_factor = 0.005, -- analyze after 0.5%
autovacuum_vacuum_cost_delay = 2, -- less delay between vacuum pages
fillfactor = 80 -- leave 20% free space for HOT updates
);
-- Partial covering index for recent events (the most queried).
-- Note: static date predicates require periodic recreation as your "recent" window shifts.
-- For long-lived tables, prefer time-based partitioning instead of a fixed partial index.
CREATE INDEX CONCURRENTLY idx_events_recent_user
ON events (user_id, created_at DESC)
INCLUDE (event_type)
WHERE created_at >= '2026-01-01';
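The comment above recommends time-based partitioning for long-lived tables. A minimal sketch using native range partitioning; the table and partition names are illustrative:

```sql
-- Parent table partitioned by month; old partitions can be detached or dropped
-- instead of recreating a "recent" partial index as the window shifts
CREATE TABLE events_partitioned (
    event_id   BIGINT GENERATED BY DEFAULT AS IDENTITY,
    user_id    BIGINT NOT NULL,
    event_type TEXT NOT NULL,
    payload    JSONB,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_01 PARTITION OF events_partitioned
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

-- An index on the parent cascades to every partition
CREATE INDEX idx_events_part_user ON events_partitioned (user_id, created_at DESC);
```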
Monitor vacuum health on busy tables:
SELECT relname, n_dead_tup, n_live_tup,
round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) AS dead_pct,
last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE n_dead_tup > 10000
ORDER BY n_dead_tup DESC
LIMIT 20;
Failure modes
- work_mem set too high causing OOM. work_mem applies per sort or hash operation, and queries can have multiple. With 200 concurrent connections at work_mem = 256MB, the server runs out of memory quickly. Set conservatively (32–64 MB) and override per session for known heavy queries: SET LOCAL work_mem = '1GB'.
- PgBouncer transaction mode breaking prepared statements. Named prepared statements are scoped to a server connection, which is reassigned between transactions in pooling mode. Applications using named prepared statements fail. Use session mode for those applications, or configure the ORM to use protocol-level (unnamed) prepared statements.
- Autovacuum unable to keep up. High update rates accumulate dead tuples faster than defaults clean them. Table bloat grows and queries degrade. Set autovacuum_vacuum_scale_factor = 0.01 on high-write tables and monitor pg_stat_user_tables.n_dead_tup.
- shared_buffers too large starving the OS page cache. Many teams stay at 25% of RAM and let the OS cache handle the rest. Pushing above ~40% can reduce OS cache and hurt sequential reads. Set effective_cache_size correctly (75% of RAM) so the planner accounts for both.
- Vertical ceiling reached with analytical scans remaining. After full tuning, dashboard and reporting queries still consume most CPU and I/O. Adding RAM no longer helps — the queries are row-scanning by nature. Profile with pg_stat_statements to confirm the pattern, then implement analytical offload via CDC to ClickHouse®.
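The per-session work_mem override looks like this in practice; the query is an illustrative heavy aggregation:

```sql
-- Raise work_mem only for this transaction; the global setting stays conservative
BEGIN;
SET LOCAL work_mem = '1GB';
SELECT user_id, count(*) AS events
FROM events
GROUP BY user_id
ORDER BY events DESC
LIMIT 100;
COMMIT;
```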
Why ClickHouse® for PostgreSQL vertical scaling
ClickHouse® is the right escape hatch when PostgreSQL vertical scaling reaches its ceiling for analytical workloads. PostgreSQL's row-oriented storage reads entire rows to aggregate values. ClickHouse®'s columnar engine reads only the columns referenced in the query — an aggregation over a wide table that touches 3 columns reads a fraction of the data.
This is the fastest database for analytics model: not faster hardware, but a storage format purpose-built for the analytical query pattern. Freeing PostgreSQL from analytical work — rather than adding more RAM — is the highest-leverage move when vertical tuning is complete.
Why Tinybird is a strong fit for PostgreSQL vertical scaling
When vertical scaling has been maximized and analytical queries remain the bottleneck, Tinybird provides the managed path to ClickHouse® without operational overhead. You get real-time change data capture from PostgreSQL, ClickHouse®-backed query execution, and REST API publishing in one platform.
Define the analytical queries overloading PostgreSQL as Tinybird Pipes, connect CDC, and route dashboard and API traffic to Tinybird endpoints. PostgreSQL handles OLTP; Tinybird handles real-time data processing for the analytical tier — without operating distributed PostgreSQL or a self-hosted ClickHouse® cluster.
Next step: run the query below, then identify which top queries are analytical aggregations vs. transactional lookups. The analytical ones are your candidates for Tinybird offload.
SELECT query, total_exec_time, calls
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
Frequently Asked Questions (FAQs)
What are the most impactful PostgreSQL vertical scaling config changes?
shared_buffers (25% of RAM), work_mem (32–64 MB), effective_cache_size (75% of RAM), random_page_cost (1.1 for NVMe), and autovacuum scale_factor = 0.01 for high-write tables. Then max_wal_size and checkpoint settings to reduce I/O spikes.
How do I know when PostgreSQL vertical scaling has reached its ceiling?
When indexes exist for all high-frequency queries, autovacuum is keeping up, and the remaining slow queries are analytical aggregations or time-range scans — those are structurally expensive regardless of hardware. pg_stat_statements will show them at the top by total_exec_time.
What is PgBouncer transaction mode and when should I use it?
PgBouncer transaction mode releases the server connection after each transaction, letting many clients share few server connections. Use it when client count exceeds ~200 and you see memory pressure or connection errors. Avoid it if your application uses named prepared statements, advisory locks, or session-scoped SET variables.
How does analytical offload via Tinybird complement PostgreSQL vertical scaling?
They are sequential steps: tune vertically first, then offload what tuning cannot fix. Vertical scaling improves PostgreSQL's ability to serve all workloads on one server. Analytical offload removes the workload type that vertical scaling cannot address — analytical scans over large datasets. The combination maximizes performance per dollar.
Does PostgreSQL vertical scaling help with connection pool exhaustion?
No. max_connections is a configured limit, and each connection consumes roughly 5–10 MB of server memory regardless of how much RAM you add. The fix is PgBouncer, not more hardware. Set max_connections to 100–300 and let PgBouncer multiplex application connections on top.
What PostgreSQL metrics should I monitor to guide vertical scaling decisions?
Monitor: buffer cache hit rate (pg_statio_user_tables), dead tuple ratio (pg_stat_user_tables), top queries by total execution time (pg_stat_statements), checkpoint frequency (pg_stat_bgwriter), connection count vs. max_connections, and autovacuum_count per table. These point to the right lever — hardware, config, indexing, or analytical offload.
