Migrate from PostgreSQL

PostgreSQL is usually where application data starts. Tinybird is where you move analytical workloads when PostgreSQL is doing too much: user-facing dashboards, high-concurrency API Endpoints, event analytics, operational reporting, or real-time aggregations over large tables.

There is no single migration path. The right choice depends on freshness requirements, table size, update patterns, operational constraints, and whether you already run Kafka-compatible infrastructure.

Choose a migration path

PathUse it forTradeoffsStart here
PostgreSQL table function with Copy PipesSmall tables, dimension tables, periodic imports, one-off backfills, prototypes.Simple to operate. Pulls from PostgreSQL on a schedule, so freshness is bounded by the schedule. Large tables need chunking and indexed filters. Deletes need an explicit strategy.PostgreSQL table function
Change data capture to Kafka, then Tinybird Kafka connectorReal-time integrations, large event streams, production sync, low-latency analytics, updates and deletes.More moving parts. Requires logical replication and a CDC tool such as Redpanda Connect, Debezium, Confluent, Estuary, or a managed equivalent. Best option when data must stay fresh.Postgres CDC with Redpanda Connect
Application writes to TinybirdNew events generated by your app, migrations where PostgreSQL should stop being the analytics source.Requires application changes. Avoids polling PostgreSQL and usually gives the cleanest event model for analytics.Events API
File export and batch importLarge historical backfills, archive imports, or migrations where database access is restricted.Operationally simple, but not a live sync. You manage export, storage, and import cadence.Files

Most migrations use more than one path:

  1. Backfill historical data with the PostgreSQL table function or exported files.
  2. Keep new data fresh with CDC or application writes.
  3. Model updates and deletes in Tinybird with a ReplacingMergeTree, soft deletes, or a final materialized Data Source.
  4. Move analytical queries from PostgreSQL to Tinybird Pipes and API Endpoints.

Use the PostgreSQL table function for periodic imports

Use the postgresql() table function when you want Tinybird to read directly from PostgreSQL and copy the result into a Data Source.

This is the simplest path when:

  • The source table is small enough to copy within Copy Pipe limits.
  • You can tolerate freshness measured in minutes or hours.
  • The table is mostly append-only.
  • The table has an indexed timestamp or monotonically increasing column for incremental filters.
  • You are importing dimension tables that can be replaced on each run.

It is not the best default for high-volume transactional tables with constant updates and deletes. For those, use CDC.

Example periodic import

Create a target Data Source and a scheduled Copy Pipe that reads recent PostgreSQL rows.

datasources/orders.datasource
SCHEMA >
    `order_id` UInt64,
    `customer_id` UInt64,
    `status` String,
    `amount` Float64,
    `updated_at` DateTime

ENGINE "ReplacingMergeTree(updated_at)"
ENGINE_SORTING_KEY "order_id"
pipes/orders_from_postgres.pipe
NODE updated_orders
SQL >
    %
    SELECT
        order_id,
        customer_id,
        status,
        amount,
        updated_at
    FROM postgresql(
        'postgres.example.com:5432',
        'app',
        'orders',
        {{ tb_secret("PG_USERNAME") }},
        {{ tb_secret("PG_PASSWORD") }}
    )
    WHERE updated_at >= now() - INTERVAL 10 MINUTE

TYPE copy
TARGET_DATASOURCE orders
COPY_MODE append
COPY_SCHEDULE */5 * * * *

Set credentials as secrets in each environment where the Copy Pipe runs:

tb --cloud secret set PG_USERNAME <username>
tb --cloud secret set PG_PASSWORD <password>

Deploy the project:

tb --cloud deploy

For the full syntax, SDK examples, type mapping, and scheduling details, see PostgreSQL table function.

Table function tradeoffs

The table function runs a query against PostgreSQL every time the Copy Pipe runs. Treat it like any other production query on your primary database.

  • Add an index on the column used in the WHERE clause.
  • Push filters down to PostgreSQL. Do not copy a large table every minute.
  • Keep schedule frequency realistic for the source table size.
  • Use COPY_MODE replace for small dimension tables.
  • Use COPY_MODE append for event-like tables.
  • Use ReplacingMergeTree(updated_at) when rows can be updated.
  • Keep a strategy for deletes. Scheduled append jobs do not remove rows that disappeared from PostgreSQL.

For large historical imports, split the backfill into chunks by time or ID range. Each chunk should finish within PostgreSQL statement timeouts and Tinybird Copy Pipe limits.

Use CDC for real-time PostgreSQL integration

Use change data capture when PostgreSQL remains the source of truth and Tinybird needs to stay current with inserts, updates, and deletes.

The usual architecture is:

The CDC tool reads PostgreSQL logical replication or WAL changes and writes events to a Kafka-compatible topic. Tinybird consumes that topic with the Kafka connector.

Use CDC when:

  • You need seconds-level freshness.
  • You need to capture updates and deletes.
  • The source table is large or changes continuously.
  • You already use Kafka, Redpanda, Confluent, Debezium, Estuary, or another CDC platform.
  • You want PostgreSQL query load to come from replication rather than repeated analytical scans.

CDC adds infrastructure. You need to operate or configure the CDC capture tool, Kafka-compatible storage, schemas, offsets, retries, and monitoring. The benefit is lower database impact and better freshness for production analytics.

CDC event modeling in Tinybird

CDC events are not always the same shape as your final analytical table. Most tools emit metadata such as operation type, source timestamp, transaction position, and before or after values.

Common patterns in Tinybird:

  • Store raw CDC events in a Kafka Data Source.
  • Use Materialized Views to extract the current row shape into a query-friendly Data Source.
  • Use ReplacingMergeTree with a version column for upserts.
  • Represent deletes with a deleted flag or tombstone rows, then filter them from serving Pipes.
  • Keep raw CDC events for replay and debugging.

For a practical setup with Redpanda Connect, see Postgres CDC with Redpanda Connect. For the Tinybird side of the stream, see Kafka connector, Confluent Cloud setup, AWS MSK setup, and Redpanda setup.

Use application writes for new analytical events

If you control the application code and the data is event-like, consider writing new events directly to Tinybird instead of writing them to PostgreSQL first and extracting them later.

This works well for:

  • Clickstream events.
  • Product usage events.
  • Logs and telemetry.
  • Audit events.
  • Analytics-only events that do not need PostgreSQL transactions.

The tradeoff is that you change the application write path. The benefit is a cleaner analytics model and less load on PostgreSQL.

Use the Events API, TypeScript SDK quickstart, or Python SDK quickstart to write events directly.

Backfill historical data

Even CDC migrations usually need a backfill. You have two common options:

  • Use the PostgreSQL table function and run Copy Pipes over time or ID ranges.
  • Export PostgreSQL data to files and import them into Tinybird.

For table function backfills, index the chunking column and check that PostgreSQL uses the index:

CREATE INDEX idx_orders_updated_at ON orders (updated_at);
VACUUM ANALYZE orders;

EXPLAIN
SELECT *
FROM orders
WHERE updated_at >= '2026-01-01 00:00:00'
  AND updated_at < '2026-01-02 00:00:00';

Then run one Copy Pipe job per chunk. Start with small chunks, measure runtime, then increase the interval if both PostgreSQL and Tinybird have headroom.

When using CDC, confirm how your CDC tool handles initial snapshots. Some tools can snapshot existing rows and then continue streaming changes from the same consistent point. If your tool does that reliably, you may not need a separate table function backfill.

Decide how to handle updates and deletes

PostgreSQL tables often have mutable rows. Tinybird Data Sources are optimized for analytical reads and append-heavy ingestion, so model mutability explicitly.

Use one of these patterns:

Source behaviorTinybird patternNotes
Append-only eventsMergeTree ordered by query filtersBest fit. Keep events immutable.
Updates by primary keyReplacingMergeTree(version_column) sorted by primary keyQuery with FINAL only when needed. Prefer materializing the latest state for serving paths.
DeletesSoft delete flag or tombstone eventFilter deleted rows in serving Pipes, or materialize active rows into a final Data Source.
Late-arriving changesVersion column from source timestamp, WAL position, or CDC event timestampMake the version monotonic for each primary key.

See Deduplication strategies and Materialized Views for the modeling details.

Migration checklist

  1. Identify the PostgreSQL tables used by analytical queries.
  2. Classify each table as append-only, mutable dimension, or transactional current state.
  3. Pick an ingestion path per table: table function, CDC, application writes, or file import.
  4. Create Tinybird Data Sources with sorting keys based on your query filters.
  5. Backfill historical data.
  6. Start the live sync path.
  7. Build Pipes and API Endpoints for application queries.
  8. Compare counts and key aggregates between PostgreSQL and Tinybird.
  9. Move read traffic to Tinybird.
  10. Remove old analytical queries from PostgreSQL when the new path is stable.

See also

Updated