Migrate from ClickHouse self-hosted to Tinybird¶
If you're running ClickHouse self-hosted and looking for a more streamlined, production-ready analytics platform, this guide will walk you through migrating to Tinybird. You'll learn how to translate your existing ClickHouse concepts, take advantage of Tinybird's unique features, and integrate the platform into your application architecture.
Why migrate from ClickHouse self-hosted to Tinybird?¶
While ClickHouse is an excellent OLAP database, self-hosting comes with significant operational overhead. You're responsible for:
- Infrastructure management: Cluster operations, sharding, replication, and scaling
- Security: Authentication, authorization, and network security
- Monitoring: Performance tuning, query optimization, and observability
- Maintenance: Updates, backups, disaster recovery, and capacity planning
- Development workflow: Building APIs, handling authentication, and creating deployment pipelines
Tinybird abstracts away these operational complexities while providing additional capabilities that transform ClickHouse from a database into a complete analytics platform.
Benefits of Tinybird over ClickHouse self-hosted¶
Tinybird offers several key advantages over managing ClickHouse yourself:
Fully managed infrastructure¶
Tinybird handles all the operational complexity of running ClickHouse at scale:
- Automatic scaling: Infrastructure scales transparently based on your workload
- Zero cluster operations: No need to manage shards, replicas, or cluster topology
- Built-in monitoring: Comprehensive observability with performance metrics and alerting
- Managed updates: Automatic updates and patches without downtime
API-first architecture¶
Transform SQL queries into production-grade REST APIs:
- Instant API endpoints: Turn any SQL query into a scalable HTTP API with authentication
- Parameterized queries: Build dynamic APIs with template variables and type validation
- Built-in authentication: Token-based and JWT authentication with fine-grained permissions
- Rate limiting: Built-in protection against abuse and cost control
Real-time data ingestion¶
Simplified data ingestion with multiple options:
- Events API: High-throughput streaming ingestion for real-time events via HTTP
- Native connectors: Managed integrations for Kafka, S3, GCS, and more
- Data Sources API: Batch ingestion for CSV, Parquet, and NDJSON files
- Schema evolution: Handle schema changes without downtime
Development workflow¶
Modern development practices built in:
- Local development: Full local environment with Tinybird Local container
- Version control: Git-based workflow for all data assets
- CI/CD integration: Deploy changes through your existing pipelines
- Testing and validation: Built-in deployment validation and rollback capabilities
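As a rough sketch, the day-to-day loop looks like this (these are the same commands covered in the migration steps later in this guide):

```bash
# Authenticate and start a local Tinybird instance
tb login
tb local start

# Iterate on .datasource and .pipe files, then validate and deploy
tb deploy --check
tb deploy
```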
BI and tool integration¶
Connect to your existing analytics tools:
- ClickHouse HTTP interface: Use existing ClickHouse drivers and tools
- PostgreSQL compatibility: Connect BI tools like Grafana, Metabase, and Tableau
- Query API: Direct SQL execution for existing applications
ClickHouse to Tinybird concept translation¶
Tinybird has an opinionated approach to structuring analytics projects using two primary file types: .datasource files (tables) and .pipe files (queries, views, and materializations).
Database → Workspace¶
In ClickHouse, you organize tables within databases. In Tinybird:
- Organization: Top-level container for all your projects
- Workspace: Equivalent to a database, contains all your data sources and pipes
Learn more about Tinybird Core Concepts.
Table → Data Source¶
ClickHouse tables become Tinybird data sources, defined in .datasource files. Data sources in Tinybird define table schemas, engines, sorting keys, and partition keys. They may also contain references to connectors from which they receive data.
ClickHouse CREATE TABLE statement:

ClickHouse table creation

```sql
CREATE TABLE user_events
(
    user_id UInt32,
    event_type String,
    timestamp DateTime,
    properties JSON
)
ENGINE = MergeTree
ORDER BY (user_id, timestamp)
PARTITION BY toYYYYMM(timestamp)
```
Tinybird datasources/user_events.datasource:

user_events.datasource

```
DESCRIPTION >
    User events tracking with properties for analytics

SCHEMA >
    `user_id` UInt32 `json:$.user_id`,
    `event_type` String `json:$.event_type`,
    `timestamp` DateTime `json:$.timestamp`,
    `properties` String `json:$.properties`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "user_id, timestamp"

# Optional connector configuration
KAFKA_CONNECTION_NAME kafka_connection # The name of the .connection file
KAFKA_TOPIC topic_name
KAFKA_GROUP_ID {{ tb_secret("KAFKA_GROUP_ID") }}
```
Key differences:
- JSONPath expressions: Map JSON fields to columns automatically
- Declarative schema: Define structure without SQL DDL
- Built-in description: Document your data sources inline
- Source connector definitions: Define where the data comes from
Learn more about data sources.
Views and parametrized queries → Pipes¶
ClickHouse views become Tinybird pipes, which can be used to define API endpoints, materializations, internal queries, scheduled copies, sinks, or as building blocks for other pipes.
ClickHouse parametrized view:
ClickHouse parametrized view
```sql
CREATE VIEW user_activity_summary AS
SELECT
    user_id,
    event_type,
    count() AS event_count,
    min(timestamp) AS first_event,
    max(timestamp) AS last_event
FROM user_events
WHERE timestamp >= {start_date:DateTime}
    AND timestamp <= {end_date:DateTime}
    AND user_id = {target_user:UInt32}
GROUP BY user_id, event_type
ORDER BY event_count DESC
```
Tinybird pipes/user_activity_summary.pipe:

user_activity_summary.pipe

```
DESCRIPTION >
    User activity summary with date range and user filtering

NODE activity_summary
SQL >
    %
    SELECT
        user_id,
        event_type,
        count() AS event_count,
        min(timestamp) AS first_event,
        max(timestamp) AS last_event
    FROM user_events
    WHERE timestamp >= {{DateTime(start_date)}}
        AND timestamp <= {{DateTime(end_date)}}
        AND user_id = {{UInt32(target_user)}}
    GROUP BY user_id, event_type
    ORDER BY event_count DESC
```
Key differences:
- Template variables: Use the % token at the beginning of a query and {{Type(variable)}} syntax for parameters
- Advanced logic: Support for {% %} advanced templating functions (see the sketch after this list)
- Node structure: Organize complex queries into multiple nodes
- Type safety: Automatic type validation for parameters
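For instance, a node can make a filter conditional with defined(). A minimal sketch, assuming a hypothetical optional event_type parameter:

```
NODE filtered_events
SQL >
    %
    SELECT user_id, event_type, count() AS event_count
    FROM user_events
    WHERE 1 = 1
    {% if defined(event_type) %}
        AND event_type = {{ String(event_type) }}
    {% end %}
    GROUP BY user_id, event_type
```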
Learn more about pipes.
Materialized Views → Materialized Views¶
ClickHouse incremental materialized views translate to Tinybird materialized views with target data sources.
ClickHouse materialized view:
ClickHouse materialized view
```sql
-- Target table
CREATE TABLE user_metrics_hourly
(
    hour DateTime,
    user_id UInt32,
    event_count AggregateFunction(count),
    unique_events AggregateFunction(uniq, String)
)
ENGINE = AggregatingMergeTree()
ORDER BY (hour, user_id);

-- Materialized view
CREATE MATERIALIZED VIEW user_metrics_hourly_mv
TO user_metrics_hourly
AS
SELECT
    toStartOfHour(timestamp) AS hour,
    user_id,
    countState() AS event_count,
    uniqState(event_type) AS unique_events
FROM user_events
GROUP BY hour, user_id;
```
Tinybird implementation:
Target data source (datasources/user_metrics_hourly.datasource):
user_metrics_hourly.datasource
```
DESCRIPTION >
    Hourly aggregated user metrics with state functions

SCHEMA >
    `hour` DateTime,
    `user_id` UInt32,
    `event_count` AggregateFunction(count, UInt64),
    `unique_events` AggregateFunction(uniq, String)

ENGINE "AggregatingMergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(hour)"
ENGINE_SORTING_KEY "hour, user_id"
```
Materialized view (materializations/user_metrics_hourly.pipe):
user_metrics_hourly.pipe
```
DESCRIPTION >
    Incremental materialization of user metrics

NODE hourly_aggregation
SQL >
    SELECT
        toStartOfHour(timestamp) AS hour,
        user_id,
        countState() AS event_count,
        uniqState(event_type) AS unique_events
    FROM user_events
    GROUP BY hour, user_id

TYPE MATERIALIZED
DATASOURCE user_metrics_hourly
```
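As in ClickHouse, the aggregate states stored in the target data source must be finalized with -Merge combinators at query time, for example in an endpoint node:

```sql
SELECT
    hour,
    user_id,
    countMerge(event_count) AS event_count,
    uniqMerge(unique_events) AS unique_events
FROM user_metrics_hourly
GROUP BY hour, user_id
```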
Learn more about materialized views.
Refreshable Materialized Views → Copy Pipes¶
ClickHouse refreshable materialized views become Tinybird copy pipes with scheduling.
ClickHouse refreshable materialized view:
ClickHouse refreshable materialized view
```sql
CREATE MATERIALIZED VIEW daily_user_summary
REFRESH EVERY 1 HOUR
ENGINE = MergeTree()
ORDER BY (date, user_id)
AS
SELECT
    toDate(timestamp) AS date,
    user_id,
    count() AS total_events,
    countDistinct(event_type) AS unique_event_types
FROM user_events
GROUP BY date, user_id;
```
Tinybird implementation:
Target Data Source (datasources/daily_user_summary.datasource):
daily_user_summary.datasource
```
DESCRIPTION >
    Daily user activity summary

SCHEMA >
    `date` Date,
    `user_id` UInt32,
    `total_events` UInt64,
    `unique_event_types` UInt64

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(date)"
ENGINE_SORTING_KEY "date, user_id"
```
Copy Pipe (copies/daily_user_summary.pipe):
daily_user_summary.pipe
```
DESCRIPTION >
    Hourly copy job for daily user summaries

NODE daily_aggregation
SQL >
    SELECT
        toDate(timestamp) AS date,
        user_id,
        count() AS total_events,
        countDistinct(event_type) AS unique_event_types
    FROM user_events
    GROUP BY date, user_id

TYPE COPY
TARGET_DATASOURCE daily_user_summary
COPY_SCHEDULE "0 * * * *" # Every hour
```
Learn more about copy pipes.
Schema evolution¶
Evolving your schema in Tinybird is handled through deployments. Edit your .datasource or .pipe files and deploy changes:
```bash
# Validate changes before deploying
tb deploy --check

# Deploy changes
tb deploy
```
Tinybird automatically handles backfills and schema migrations based on your changes.
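For example, adding a column is just an edit to the SCHEMA block followed by a deploy. A sketch, assuming a new, hypothetical country field in the incoming JSON:

```
SCHEMA >
    `user_id` UInt32 `json:$.user_id`,
    `event_type` String `json:$.event_type`,
    `timestamp` DateTime `json:$.timestamp`,
    `properties` String `json:$.properties`,
    `country` String `json:$.country` DEFAULT ''
```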
Learn more about Schema Evolution and Deployments.
Integrating Tinybird into your application¶
Data ingestion services¶
Tinybird provides multiple ways to ingest data. You do not need to set up ingestion infrastructure or use INSERT statements: Tinybird offers several native ingestion connectors and APIs.
Events API¶
Best for real-time streaming data, the Events API can be used to send data directly from the client to Tinybird. Send JSON or NDJSON events via HTTP POST:
Events API ingestion
curl -X POST "https://api.tinybird.co/v0/events?name=user_events" \ -H "Authorization: Bearer YOUR_TOKEN" \ -H "Content-Type: application/json" \ -d '{"user_id": 123, "event_type": "page_view", "timestamp": "2024-01-15 10:30:00", "properties": "{\"page\": \"/dashboard\"}"}'
Learn more about the Events API.
Data Sources API¶
For batch ingestion of CSV, Parquet, or NDJSON files:
Batch file ingestion
curl -X POST "https://api.tinybird.co/v0/datasources?mode=append&name=user_events" \ -H "Authorization: Bearer YOUR_TOKEN" \ -F "csv=@user_events.csv"
Learn more about the Data Sources API.
Connectors¶
Managed integrations for popular data sources:
- Kafka: Real-time streaming with automatic schema detection
- S3/GCS: Automated file ingestion from cloud storage
Connectors handle authentication, schema mapping, and error handling automatically.
Learn more about Connectors.
Table functions¶
Tinybird supports ClickHouse table functions for specialized data ingestion and transformation:
- url(): Load data directly from HTTP/HTTPS URLs
- iceberg(): Import data from Apache Iceberg tables
- postgres(): Import data from PostgreSQL databases
- mysql(): Import data from MySQL databases
Example using postgres() to import data from PostgreSQL:
pipes/import_from_postgres.pipe
```
DESCRIPTION >
    Import user data from PostgreSQL database

NODE postgres_import
SQL >
    SELECT
        user_id,
        email,
        created_at,
        last_login
    FROM postgres(
        'postgresql://username:password@hostname:5432/database',
        'users'
    )
    WHERE created_at >= yesterday()

TYPE COPY
TARGET_DATASOURCE user_data
COPY_SCHEDULE "0 */6 * * *" # Every 6 hours
```
This example creates a scheduled copy pipe that imports user data from a PostgreSQL table every 6 hours.
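The other table functions follow the same pattern. For a one-off load you can also query them directly, for example url() against a hypothetical NDJSON export (a sketch; the URL and format are assumptions):

```sql
SELECT *
FROM url('https://example.com/exports/user_events.ndjson', 'JSONEachRow')
```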
Learn more about table functions.
Query development¶
Tinybird offers flexible options for querying data, from direct SQL to API endpoints.
Pipes vs Query API¶
Pipes are the default approach for production applications:
- Type-safe parameters: Automatic validation and conversion
- Versioning: Track changes and roll back if needed
- Performance optimization: Automatic query optimization and caching
- Security: Fine-grained access control with tokens
Query API for direct SQL execution:
- Migration path: Easy transition from existing ClickHouse applications
- Ad-hoc analysis: Quick data exploration and debugging
- BI tools: Direct integration with SQL-based tools
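For example, a direct query through the Query API (a sketch; YOUR_TOKEN needs read access to user_events):

```bash
curl -G "https://api.tinybird.co/v0/sql" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  --data-urlencode "q=SELECT event_type, count() AS c FROM user_events GROUP BY event_type FORMAT JSON"
```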
Example API endpoint from a pipe:
endpoints/user_activity.pipe
```
TOKEN analytics_api READ

DESCRIPTION >
    User activity API endpoint with authentication

NODE user_metrics
SQL >
    %
    SELECT
        user_id,
        event_type,
        count() AS event_count,
        max(timestamp) AS last_activity
    FROM user_events
    WHERE user_id = {{UInt32(user_id)}}
        AND timestamp >= {{DateTime(start_date, '2024-01-01 00:00:00')}}
    GROUP BY user_id, event_type
    ORDER BY event_count DESC

TYPE ENDPOINT
```
Access the endpoint:
curl "https://api.tinybird.co/v0/pipes/user_activity.json?user_id=123&start_date=2024-01-01" \ -H "Authorization: Bearer READ_TOKEN"
Learn more about API Endpoints.
BI integration using ClickHouse interface¶
Connect your existing BI tools using the ClickHouse HTTP interface:
Grafana configuration:
- Host: clickhouse.tinybird.co
- Port: 443 (HTTPS)
- Username: Your Workspace name
- Password: Your Tinybird token
Supported tools:
- Grafana
- Metabase
- Apache Superset
- Tableau (via ClickHouse ODBC driver)
- Any tool supporting ClickHouse HTTP protocol
The interface exposes four databases:
- organization: for the Organization data sources, like organization.workspaces and organization.pipe_stats_rt
- system: for system.tables, system.columns, and so on
- tinybird: for the workspace service data sources, like tinybird.datasources_ops_log and tinybird.pipe_stats_rt
- your workspace name: for your workspace data sources
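Because this is the standard ClickHouse HTTP protocol, a quick connectivity check with curl works too (a sketch; substitute your workspace name and a token):

```bash
curl -u "your_workspace_name:YOUR_TOKEN" \
  -d "SELECT count() FROM tinybird.pipe_stats_rt" \
  "https://clickhouse.tinybird.co"
```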
See an example Grafana integration.
Deployment options¶
Tinybird offers flexible deployment options to meet your infrastructure and compliance requirements.
Tinybird Cloud hosting¶
Fully-managed service with global regions:
- No infrastructure management: Tinybird handles all operations
- Global availability: Multiple regions across AWS and GCP
- Automatic scaling: Infrastructure scales as needed based on selected plan
- Enterprise features: SSO, audit logs, advanced security controls
Tinybird self-managed regions¶
Bring Your Own Cloud deployment for enhanced control:
- Data sovereignty: Keep data in your cloud environment
- Network isolation: Deploy within your VPC/VNet
- Compliance: Meet specific regulatory requirements
- Custom configurations: Tailored performance optimizations
Benefits of self-managed regions:
- Complete control over infrastructure and data location
- Integration with existing cloud commitments and reserved instances
- Direct connection to internal data sources without network egress
- Custom security policies and network configurations
Deployment process:
- Register a self-managed region with Tinybird Cloud
- Deploy Tinybird infrastructure in your cloud environment
- Connect and manage through the Tinybird CLI with the --region flag
Self-managed regions are currently in beta. Contact Tinybird support for deployment assistance and requirements.
Learn more about self-managed Regions.
Open source projects using Tinybird¶
Many successful open source projects have adopted Tinybird for their analytics needs:
Ghost: Publishing platform using Tinybird for member analytics and engagement metrics
Papermark: Document sharing platform with real-time analytics on document views and interactions
Inbox Zero: Email management tool leveraging Tinybird for email analytics and automation insights
LocalStack: Cloud development platform using Tinybird for usage analytics and performance monitoring
OpenStatus: Website monitoring platform built on Tinybird for real-time uptime and performance analytics
Dub: Link management platform using Tinybird for click analytics and link performance tracking
Each leverages Tinybird's real-time capabilities, API-first architecture, and managed infrastructure to focus on their core product instead of analytics infrastructure, while retaining the flexibility to choose between hosted and self-managed deployments.
Migration strategy¶
To successfully migrate from ClickHouse self-hosted to Tinybird:
1. Assessment and planning¶
- Inventory your current setup: Document tables, views, materialized views, and queries
- Identify integration points: Map how your application currently interacts with ClickHouse
- Plan data migration: Choose between full migration or gradual transition
2. Local development setup¶
```bash
# Install Tinybird CLI
curl https://tinybird.co | sh

# Authenticate
tb login

# Start local development environment
tb local start
```
3. Schema translation¶
- Convert ClickHouse tables to .datasource files
- Transform views and materialized views to .pipe files
- Set up materialized views and copy pipes for aggregations
4. Data migration¶
- Use Data Sources API for bulk historical data
- Set up real-time connectors for ongoing ingestion
- Validate data consistency between systems
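For bulk historical data, one minimal pattern is to export from your existing cluster and append through the Data Sources API shown earlier (a sketch using CSV; Parquet and NDJSON work similarly):

```bash
# Export a table from self-hosted ClickHouse
clickhouse-client --query "SELECT * FROM user_events FORMAT CSV" > user_events.csv

# Append it to the matching Tinybird data source
curl -X POST "https://api.tinybird.co/v0/datasources?mode=append&name=user_events" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "csv=@user_events.csv"
```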
5. Application integration¶
- Replace direct ClickHouse queries with Tinybird API endpoints or requests to the Query API
- Update authentication to use Tinybird tokens
- Configure BI tools to use Tinybird's ClickHouse interface
6. Production deployment¶
- Deploy to Tinybird Cloud or set up self-managed region
- Configure monitoring and alerting via Tinybird service data sources
- Gradually shift traffic from ClickHouse to Tinybird
Next steps¶
Ready to start your migration? Here are some immediate next steps:
- Sign up for a free Tinybird account and explore the platform
- Follow the quick start guide to understand core concepts
- Install the CLI and set up local development
- Join our community for migration support and best practices
For complex migrations or enterprise requirements, contact our team for personalized assistance and migration planning.