---
title: "How to make your SQL database faster"
excerpt: "Most SQL performance problems come from the same handful of causes. Here's how to diagnose and fix them, and when the problem isn't query optimization at all."
authors: "Tinybird"
categories: "AI Resources"
createdOn: "2026-05-11 00:00:00"
publishedOn: "2026-05-11 00:00:00"
updatedOn: "2026-05-11 00:00:00"
status: "published"
---

Your database is slow. Before you throw hardware at it or start rewriting queries at random, it helps to know why.

Most SQL performance problems have a short list of causes: missing indexes, queries that scan more data than they need to, analytics patterns running against a transactional database, or connection handling that breaks down under load. Each has a fix. Some fixes are one-liners. A few require rethinking the architecture.

This post covers all of them, in order of how often they show up.

## Check what's actually slow first

Don't guess. Run `EXPLAIN ANALYZE` before touching anything.

```sql
EXPLAIN ANALYZE
SELECT u.name, count(o.id) AS order_count
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2026-01-01'
GROUP BY u.name
ORDER BY order_count DESC
LIMIT 20;
```

In Postgres, the output shows you execution time, rows scanned vs. rows returned, and which nodes are doing the most work. Look for:

- **Seq Scan** on a large table: usually means a missing index
- **rows=1000000** with **actual rows=12**: the planner estimated wrong, often due to stale statistics
- **Hash Join** on a huge table: might be better served with a different join strategy or a filtered subquery
- High **actual time** on a node that should be cheap: points to data volume or a type mismatch slowing the index lookup
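If the planner's estimates are far off from the actual row counts, refreshing table statistics is a cheap first fix before touching any queries:

```sql
-- Recompute planner statistics for one table
-- (plain ANALYZE with no table name does the whole database)
ANALYZE users;
```

Postgres's autovacuum daemon normally keeps statistics current, but bulk loads and large deletes can outrun it.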

Postgres also has `pg_stat_statements`, which surfaces your slowest queries by total execution time across all calls:

```sql
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

Fix the queries at the top of this list. A query taking 800ms that runs 10,000 times a day is more important than one taking 5 seconds that runs twice.
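After shipping a fix, it helps to reset the counters so the next reading reflects current behavior rather than weeks of history (this assumes the `pg_stat_statements` extension is installed, as above):

```sql
-- Zero out the collected statistics and start measuring fresh
SELECT pg_stat_statements_reset();
```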

## Add the right indexes

The most common cause of slow queries is scanning rows you don't need to. Indexes fix this.

The simplest case: a `WHERE` clause on an unindexed column.

```sql
-- Slow: full table scan on 10M users
SELECT * FROM users WHERE email = 'ada@example.com';

-- Fix
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);
```

`CONCURRENTLY` builds the index without taking a lock that blocks writes. Use it in production. Note that it can't run inside a transaction block and takes longer than a plain `CREATE INDEX`.

For multi-column filters, column order in the index matters. Put columns compared with equality first, then the range column: the planner can efficiently use the index's leading columns only up to the first range predicate. (The textual order of conditions in your `WHERE` clause doesn't matter; the planner reorders them freely.)

```sql
-- Query filters by status then created_at
SELECT * FROM orders
WHERE status = 'pending' AND created_at > now() - interval '7 days';

-- Index: status (equality predicate) first, then created_at (range predicate)
CREATE INDEX idx_orders_status_created ON orders (status, created_at);
```

If your query filters on one column and sorts by another, an index matching the filter column then the sort column lets Postgres read rows already in order, skipping the sort step. Adding the remaining projected columns with `INCLUDE` makes it a covering index, so the whole query can be answered by an index-only scan:

```sql
-- This query both filters and sorts by created_at
SELECT id, user_id, total FROM orders
WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 10;

-- Covering index: user_id for the filter, created_at for the sort, include the projected columns
CREATE INDEX idx_orders_user_created ON orders (user_id, created_at DESC)
INCLUDE (id, total);
```

One warning: indexes aren't free. Every write to the table updates every index on it. Don't index every column. Index the columns that appear in `WHERE`, `JOIN ON`, and `ORDER BY` clauses in your slowest queries.
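Postgres tracks how often each index is actually scanned, which makes it easy to spot indexes that cost writes but serve no reads. A query along these lines surfaces candidates for removal (counters are cumulative since the last stats reset, so check on a system that's been running a while):

```sql
-- Indexes that have never been used since statistics were last reset
SELECT schemaname,
       relname      AS table_name,
       indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```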

## Stop using SELECT *

`SELECT *` fetches every column from every matched row. If you need 3 columns from a 40-column table, you're fetching, serializing, and shipping 37 extra columns per row, only for your application to discard them.

```sql
-- Slow: reads all 40 columns
SELECT * FROM events WHERE user_id = 42;

-- Fast: reads only what you need
SELECT id, event_type, created_at FROM events WHERE user_id = 42;
```

This matters more than it sounds. Wide tables with `JSONB` or `TEXT` columns balloon row sizes: Postgres stores large values out-of-line (TOAST) and only fetches them when a query actually selects them, so skipping those columns saves real I/O. Selecting only indexed columns can also unlock index-only scans.
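To get a feel for how much extra data `SELECT *` is moving, `pg_column_size` reports the stored size of a value or a whole row. A rough check, sampling recent rows from the `events` table used above:

```sql
-- Average stored row size vs. the size of only the columns you need
SELECT
  avg(pg_column_size(e.*))::int AS avg_row_bytes,
  avg(pg_column_size(e.id)
    + pg_column_size(e.event_type)
    + pg_column_size(e.created_at))::int AS avg_needed_bytes
FROM (SELECT * FROM events ORDER BY created_at DESC LIMIT 10000) e;
```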

## Fix N+1 queries

N+1 is the most common application-level performance problem. It happens when you load a list of records and then query for related data on each one individually.

```typescript
// N+1: 1 query for users + N queries for their orders
const users = await db.query("SELECT id, name FROM users LIMIT 100");
for (const user of users) {
  user.orders = await db.query(
    "SELECT * FROM orders WHERE user_id = $1", [user.id]
  );
}
```

This fires 101 queries. Replace it with a JOIN or a single `IN` query:

```typescript
// 1 query instead of 101
const userIds = users.map((u) => u.id);
const rows = await db.query(`
  SELECT u.id, u.name, o.id AS order_id, o.total
  FROM users u
  LEFT JOIN orders o ON o.user_id = u.id
  WHERE u.id = ANY($1)
`, [userIds]);
```
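One wrinkle with the JOIN approach: the result is flat, one row per user/order pair, so you usually fold it back into per-user objects in application code. A minimal sketch (the `Row` shape mirrors the columns selected above; `groupOrders` is a hypothetical helper, not part of any library):

```typescript
// Hypothetical row shape produced by the LEFT JOIN above
type Row = {
  id: number;
  name: string;
  order_id: number | null;
  total: number | null;
};

type UserWithOrders = {
  id: number;
  name: string;
  orders: { id: number; total: number }[];
};

// Fold the flat JOIN result into one object per user.
// LEFT JOIN emits NULL order columns for users with no orders.
function groupOrders(rows: Row[]): UserWithOrders[] {
  const byId = new Map<number, UserWithOrders>();
  for (const r of rows) {
    let user = byId.get(r.id);
    if (!user) {
      user = { id: r.id, name: r.name, orders: [] };
      byId.set(r.id, user);
    }
    if (r.order_id !== null && r.total !== null) {
      user.orders.push({ id: r.order_id, total: r.total });
    }
  }
  return [...byId.values()];
}
```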

ORM users: N+1 is almost always a missing `.include()` / `.with()` / `.preload()` call on a relationship. Turn on query logging in development and watch for repeated queries with different ID parameters.

## Use connection pooling

Each Postgres connection is an OS process. Opening a new one takes 20-50ms and consumes memory for its lifetime. If your application opens a fresh connection per request at 500 req/sec, you're spending 10-25 CPU-seconds of every wall-clock second on connection setup alone.

Use a connection pool. In Node.js with `pg`:

```typescript
import { Pool } from "pg";

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20,           // max pool size
  idleTimeoutMillis: 30_000,
  connectionTimeoutMillis: 2_000,
});

// Reuses connections from the pool
const result = await pool.query("SELECT ...");
```

For higher scale, put PgBouncer in front of Postgres in transaction-mode pooling. It multiplexes thousands of application connections onto a small number of actual Postgres connections. A Postgres instance that falls over at 200 direct connections can handle 5,000+ application connections behind PgBouncer with no code changes.
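A minimal `pgbouncer.ini` for transaction-mode pooling looks something like this (the host, database name, file paths, and pool sizes are illustrative; tune them to your workload):

```ini
; pgbouncer.ini - transaction-mode pooling sketch
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction
max_client_conn = 5000
default_pool_size = 50
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
```

One caveat with transaction mode: session-level state (prepared statements, `SET` commands, advisory locks) doesn't survive across transactions, since each transaction may land on a different server connection.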

## Avoid functions on indexed columns in WHERE clauses

Wrapping a column in a function inside a `WHERE` clause prevents the planner from using a plain index on that column. The index stores raw values, so it can't be searched for transformed ones. The fix is to either rewrite the predicate to compare raw values, or index the expression itself.

```sql
-- Slow: index on created_at can't be used
WHERE date_trunc('day', created_at) = '2026-05-11'

-- Fast: rewrite to a range query
WHERE created_at >= '2026-05-11' AND created_at < '2026-05-12'
```

```sql
-- Slow: index on email can't be used
WHERE lower(email) = 'ada@example.com'

-- Fast option 1: store emails lowercased on insert
-- Fast option 2: create a functional index
CREATE INDEX idx_users_email_lower ON users (lower(email));
WHERE lower(email) = 'ada@example.com'  -- now uses the functional index
```

## Partition large tables

Tables that grow into the hundreds of millions of rows usually benefit from partitioning to stay fast over time. Partitioning splits the physical storage into smaller chunks that the query planner can skip entirely based on filter values (partition pruning).

Range partitioning by time is the most common pattern for append-heavy tables:

```sql
CREATE TABLE events (
  id          BIGSERIAL,
  user_id     BIGINT,
  event_type  TEXT,
  created_at  TIMESTAMPTZ NOT NULL
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_05 PARTITION OF events
  FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');

CREATE TABLE events_2026_06 PARTITION OF events
  FOR VALUES FROM ('2026-06-01') TO ('2026-07-01');
```

A query with `WHERE created_at >= '2026-05-01' AND created_at < '2026-06-01'` now scans only the May partition, not the entire table. For a 5-year table with monthly partitions, that's skipping 59 of 60 partitions.

Pair partitioning with indexes: in Postgres, an index created on the partitioned parent table is automatically created on every partition, existing and future. Partition pruning and index scans together are far faster than either alone.
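Partitioning also makes data retention cheap. Dropping a whole partition is a near-instant metadata operation, versus a slow `DELETE` plus vacuum on an unpartitioned table:

```sql
-- Expire May's data instantly once it ages out of your retention window
DROP TABLE events_2026_05;

-- Or keep the data around but remove it from the parent table
ALTER TABLE events DETACH PARTITION events_2026_05;
```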

## Materialize expensive aggregations

If you're running the same expensive aggregate query repeatedly (dashboard totals, leaderboards, summary stats), compute the result once and store it. Recomputing from raw data on every request doesn't scale.

```sql
-- Materialized view: computed once, queried many times
CREATE MATERIALIZED VIEW daily_order_totals AS
SELECT
  date_trunc('day', created_at) AS day,
  status,
  count(*)                      AS order_count,
  sum(total)                    AS revenue
FROM orders
GROUP BY 1, 2;

-- Unique index: speeds up lookups and is required for REFRESH ... CONCURRENTLY
CREATE UNIQUE INDEX ON daily_order_totals (day, status);

-- Refresh on a schedule or after bulk inserts
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_order_totals;
```

`CONCURRENTLY` allows reads during the refresh. It requires a unique index on the view.

## When the problem isn't the query

All of the above assumes your bottleneck is query execution. Sometimes it isn't.

**You're running analytics on a transactional database.** Postgres and MySQL are designed for OLTP: many small reads and writes, indexed by primary key, with strong consistency guarantees. Analytical queries — aggregations over millions of rows, GROUP BY across multiple dimensions, time-series rollups — are a different workload entirely. Running them on a transactional database puts load on your primary, competes with application queries for I/O, and doesn't get better with indexes alone.

If you're running queries like this against Postgres:

```sql
SELECT
  date_trunc('hour', created_at) AS hour,
  event_type,
  region,
  count(*) AS events,
  count(DISTINCT user_id) AS unique_users
FROM events
WHERE created_at > now() - interval '30 days'
GROUP BY 1, 2, 3
ORDER BY 1;
```

...and it's slow even after indexing, that's a workload mismatch. That query needs a columnar OLAP database, not a row-store transactional one.

Tinybird runs on ClickHouse{% sup %}®{% /sup %}, which stores data by column rather than by row. Analytical queries that scan billions of rows and aggregate across multiple dimensions return in milliseconds, because the database only reads the columns it needs, and reads them in compressed, vectorized chunks.

The same query in Tinybird, exposed as an API endpoint:

```sql
-- analytics_hourly.pipe
NODE hourly
SQL >
  %
  SELECT
    toStartOfHour(created_at)    AS hour,
    event_type,
    region,
    count()                      AS events,
    uniq(user_id)                AS unique_users
  FROM events
  WHERE created_at > now() - interval {{ Int32(days, 30) }} day
  GROUP BY hour, event_type, region
  ORDER BY hour

TYPE ENDPOINT
```

Parameters, auth, and HTTP serving are handled by Tinybird. Your application calls a URL. The analytics load moves off your Postgres primary entirely.

**You've hit hardware limits.** After indexes, connection pooling, query rewrites, and partitioning, the remaining lever is hardware: more RAM (so the working set fits in memory), faster storage (local NVMe instead of network-attached volumes), or read replicas to distribute query load. These help. They're also expensive, and they don't fix architectural problems; a bad query on faster hardware is still a bad query.

Fix the queries first.

{% cta
  title="Queries still slow after all that?"
  text="Analytics workloads don't belong in Postgres. Move them to Tinybird and query billions of rows in milliseconds."
  button={href: "https://cloud.tinybird.co/signup", target: "_blank", text: "Try Tinybird free"}
/%}
