Migrate from ClickHouse self-hosted to Tinybird¶
If you're running ClickHouse self-hosted and looking for a more streamlined, production-ready analytics platform, this guide will walk you through migrating to Tinybird. You'll learn how to translate your existing ClickHouse concepts, take advantage of Tinybird's unique features, and integrate the platform into your application architecture.
Why migrate from ClickHouse self-hosted to Tinybird?¶
While ClickHouse is an excellent OLAP database, self-hosting comes with significant operational overhead. You're responsible for:
- Infrastructure management: Cluster operations, sharding, replication, and scaling
- Security: Authentication, authorization, and network security
- Monitoring: Performance tuning, query optimization, and observability
- Maintenance: Updates, backups, disaster recovery, and capacity planning
- Development workflow: Building APIs, handling authentication, and creating deployment pipelines
Tinybird abstracts away these operational complexities while providing additional capabilities that transform ClickHouse from a database into a complete analytics platform.
Benefits of Tinybird over ClickHouse self-hosted¶
Tinybird offers several key advantages over managing ClickHouse yourself:
Fully managed infrastructure¶
Tinybird handles all the operational complexity of running ClickHouse at scale:
- Automatic scaling: Infrastructure scales transparently based on your workload
- Zero cluster operations: No need to manage shards, replicas, or cluster topology
- Built-in monitoring: Comprehensive observability with performance metrics and alerting
- Managed updates: Automatic updates and patches without downtime
API-first architecture¶
Transform SQL queries into production-grade REST APIs:
- Instant API endpoints: Turn any SQL query into a scalable HTTP API with authentication
- Parameterized queries: Build dynamic APIs with template variables and type validation
- Built-in authentication: Token-based and JWT authentication with fine-grained permissions
- Rate limiting: Built-in protection against abuse and cost control
Real-time data ingestion¶
Simplified data ingestion with multiple options:
- Events API: High-throughput streaming ingestion for real-time events via HTTP
- Native connectors: Managed integrations for Kafka, S3, GCS, and more
- Data Sources API: Batch ingestion for CSV, Parquet, and NDJSON files
- Schema evolution: Handle schema changes without downtime
Development workflow¶
Modern development practices built in:
- Local development: Full local environment with Tinybird Local container
- Version control: Git-based workflow for all data assets
- CI/CD integration: Deploy changes through your existing pipelines
- Testing and validation: Built-in deployment validation and rollback capabilities
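As a rough sketch, the day-to-day loop looks like this (these are the same commands covered in the migration steps later in this guide):

```bash
# Authenticate and start a local Tinybird instance
tb login
tb local start

# Iterate on .datasource and .pipe files, then validate and deploy
tb deploy --check
tb deploy
```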
BI and tool integration¶
Connect to your existing analytics tools:
- ClickHouse HTTP interface: Use existing ClickHouse drivers and tools
- PostgreSQL compatibility: Connect BI tools like Grafana, Metabase, and Tableau
- Query API: Direct SQL execution for existing applications
ClickHouse to Tinybird concept translation¶
Tinybird has an opinionated approach to structuring analytics projects using two primary file types: .datasource files (tables) and .pipe files (queries, views, and materializations).
Database → Workspace¶
In ClickHouse, you organize tables within databases. In Tinybird:
- Organization: Top-level container for all your projects
- Workspace: Equivalent to a database, contains all your data sources and pipes
Learn more about Tinybird Core Concepts.
Table → Data Source¶
ClickHouse tables become Tinybird data sources, defined in .datasource files. Data sources in Tinybird define table schemas, engines, sorting keys, and partition keys. They may also contain references to connectors from which they receive data.
ClickHouse CREATE TABLE statement:

ClickHouse table creation

```sql
CREATE TABLE user_events
(
    user_id UInt32,
    event_type String,
    timestamp DateTime,
    properties JSON
)
ENGINE = MergeTree
ORDER BY (user_id, timestamp)
PARTITION BY toYYYYMM(timestamp)
```
Tinybird datasources/user_events.datasource:

user_events.datasource

```
DESCRIPTION >
    User events tracking with properties for analytics

SCHEMA >
    `user_id` UInt32 `json:$.user_id`,
    `event_type` String `json:$.event_type`,
    `timestamp` DateTime `json:$.timestamp`,
    `properties` String `json:$.properties`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "user_id, timestamp"

# Optional connector configuration
KAFKA_CONNECTION_NAME kafka_connection # The name of the .connection file
KAFKA_TOPIC topic_name
KAFKA_GROUP_ID {{ tb_secret("KAFKA_GROUP_ID") }}
```
Key differences:
- JSONPath expressions: Map JSON fields to columns automatically
- Declarative schema: Define structure without SQL DDL
- Built-in description: Document your data sources inline
- Source connector definitions: Define where the data comes from
Learn more about data sources.
Views and parametrized queries → Pipes¶
ClickHouse views become Tinybird pipes, which can be used to define API endpoints, materializations, internal queries, scheduled copies, sinks, or as building blocks for other pipes.
ClickHouse parametrized view:
ClickHouse parametrized view
```sql
CREATE VIEW user_activity_summary AS
SELECT
    user_id,
    event_type,
    count() AS event_count,
    min(timestamp) AS first_event,
    max(timestamp) AS last_event
FROM user_events
WHERE timestamp >= {start_date:DateTime}
    AND timestamp <= {end_date:DateTime}
    AND user_id = {target_user:UInt32}
GROUP BY user_id, event_type
ORDER BY event_count DESC
```
Tinybird pipes/user_activity_summary.pipe:

user_activity_summary.pipe

```
DESCRIPTION >
    User activity summary with date range and user filtering

NODE activity_summary
SQL >
    %
    SELECT
        user_id,
        event_type,
        count() AS event_count,
        min(timestamp) AS first_event,
        max(timestamp) AS last_event
    FROM user_events
    WHERE timestamp >= {{DateTime(start_date)}}
        AND timestamp <= {{DateTime(end_date)}}
        AND user_id = {{UInt32(target_user)}}
    GROUP BY user_id, event_type
    ORDER BY event_count DESC
```
Key differences:
- Template variables: Use the % token at the beginning of a query and {{Type(variable)}} syntax for parameters
- Advanced logic: Support for {% %} advanced templating functions (see the sketch after this list)
- Node structure: Organize complex queries into multiple nodes
- Type safety: Automatic type validation for parameters
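For instance, a node can make a filter conditional with defined(). A minimal sketch, assuming a hypothetical optional event_type parameter:

```
NODE filtered_events
SQL >
    %
    SELECT user_id, event_type, count() AS event_count
    FROM user_events
    WHERE 1 = 1
    {% if defined(event_type) %}
        AND event_type = {{ String(event_type) }}
    {% end %}
    GROUP BY user_id, event_type
```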
Learn more about pipes.
Materialized Views → Materialized Views¶
ClickHouse incremental materialized views translate to Tinybird materialized views with target data sources.
ClickHouse materialized view:
ClickHouse materialized view
```sql
-- Target table
CREATE TABLE user_metrics_hourly
(
    hour DateTime,
    user_id UInt32,
    event_count AggregateFunction(count),
    unique_events AggregateFunction(uniq, String)
)
ENGINE = AggregatingMergeTree()
ORDER BY (hour, user_id);

-- Materialized view
CREATE MATERIALIZED VIEW user_metrics_hourly_mv
TO user_metrics_hourly
AS
SELECT
    toStartOfHour(timestamp) AS hour,
    user_id,
    countState() AS event_count,
    uniqState(event_type) AS unique_events
FROM user_events
GROUP BY hour, user_id;
```
Tinybird implementation:
Target data source (datasources/user_metrics_hourly.datasource):
user_metrics_hourly.datasource
```
DESCRIPTION >
    Hourly aggregated user metrics with state functions

SCHEMA >
    `hour` DateTime,
    `user_id` UInt32,
    `event_count` AggregateFunction(count, UInt64),
    `unique_events` AggregateFunction(uniq, String)

ENGINE "AggregatingMergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(hour)"
ENGINE_SORTING_KEY "hour, user_id"
```
Materialized view (materializations/user_metrics_hourly.pipe):
user_metrics_hourly.pipe
```
DESCRIPTION >
    Incremental materialization of user metrics

NODE hourly_aggregation
SQL >
    SELECT
        toStartOfHour(timestamp) AS hour,
        user_id,
        countState() AS event_count,
        uniqState(event_type) AS unique_events
    FROM user_events
    GROUP BY hour, user_id

TYPE MATERIALIZED
DATASOURCE user_metrics_hourly
```
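As in ClickHouse, the aggregate states stored in the target data source must be finalized with -Merge combinators at query time, for example in an endpoint node:

```sql
SELECT
    hour,
    user_id,
    countMerge(event_count) AS event_count,
    uniqMerge(unique_events) AS unique_events
FROM user_metrics_hourly
GROUP BY hour, user_id
```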
Learn more about materialized views.
Refreshable Materialized Views → Copy Pipes¶
ClickHouse refreshable materialized views become Tinybird copy pipes with scheduling.
ClickHouse refreshable materialized view:
ClickHouse refreshable materialized view
```sql
CREATE MATERIALIZED VIEW daily_user_summary
REFRESH EVERY 1 HOUR
ENGINE = MergeTree()
ORDER BY (date, user_id)
AS
SELECT
    toDate(timestamp) AS date,
    user_id,
    count() AS total_events,
    countDistinct(event_type) AS unique_event_types
FROM user_events
GROUP BY date, user_id;
```
Tinybird implementation:
Target Data Source (datasources/daily_user_summary.datasource):
daily_user_summary.datasource
```
DESCRIPTION >
    Daily user activity summary

SCHEMA >
    `date` Date,
    `user_id` UInt32,
    `total_events` UInt64,
    `unique_event_types` UInt64

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(date)"
ENGINE_SORTING_KEY "date, user_id"
```
Copy Pipe (copies/daily_user_summary.pipe):
daily_user_summary.pipe
```
DESCRIPTION >
    Hourly copy job for daily user summaries

NODE daily_aggregation
SQL >
    SELECT
        toDate(timestamp) AS date,
        user_id,
        count() AS total_events,
        countDistinct(event_type) AS unique_event_types
    FROM user_events
    GROUP BY date, user_id

TYPE COPY
TARGET_DATASOURCE daily_user_summary
COPY_SCHEDULE "0 * * * *" # Every hour
```
Learn more about copy pipes.
Schema evolution¶
Evolving your schema in Tinybird is handled through deployments. Edit your .datasource or .pipe files and deploy changes:
```bash
# Validate changes before deploying
tb deploy --check

# Deploy changes
tb deploy
```
Tinybird automatically handles backfills and schema migrations based on your changes.
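For example, adding a column is just an edit to the SCHEMA block followed by a deploy. A sketch, assuming a new, hypothetical country field in the incoming JSON:

```
SCHEMA >
    `user_id` UInt32 `json:$.user_id`,
    `event_type` String `json:$.event_type`,
    `timestamp` DateTime `json:$.timestamp`,
    `properties` String `json:$.properties`,
    `country` String `json:$.country` DEFAULT ''
```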
Learn more about Schema Evolution and Deployments.
Integrating Tinybird into your application¶
Data ingestion services¶
Tinybird provides multiple ways to ingest data. You do not need to set up ingestion infrastructure or use INSERT statements: Tinybird offers several native ingestion connectors and APIs.
Events API¶
Best for real-time streaming data, the Events API can be used to send data directly from the client to Tinybird. Send JSON or NDJSON events via HTTP POST:
Events API ingestion
curl -X POST "https://api.tinybird.co/v0/events?name=user_events" \ -H "Authorization: Bearer YOUR_TOKEN" \ -H "Content-Type: application/json" \ -d '{"user_id": 123, "event_type": "page_view", "timestamp": "2024-01-15 10:30:00", "properties": "{\"page\": \"/dashboard\"}"}'
Learn more about the Events API.
Data Sources API¶
For batch ingestion of CSV, Parquet, or NDJSON files:
Batch file ingestion
curl -X POST "https://api.tinybird.co/v0/datasources?mode=append&name=user_events" \ -H "Authorization: Bearer YOUR_TOKEN" \ -F "csv=@user_events.csv"
Learn more about the Data Sources API.
Connectors¶
Managed integrations for popular data sources:
- Kafka: Real-time streaming with automatic schema detection
- S3/GCS: Automated file ingestion from cloud storage
Connectors handle authentication, schema mapping, and error handling automatically.
Learn more about Connectors.
Table functions¶
Tinybird supports ClickHouse table functions for specialized data ingestion and transformation:
- url(): Load data directly from HTTP/HTTPS URLs
- iceberg(): Import data from Apache Iceberg tables
- postgres(): Import data from PostgreSQL databases
- mysql(): Import data from MySQL databases
Example using postgres() to import data from PostgreSQL:
pipes/import_from_postgres.pipe
```
DESCRIPTION >
    Import user data from PostgreSQL database

NODE postgres_import
SQL >
    SELECT
        user_id,
        email,
        created_at,
        last_login
    FROM postgres(
        'postgresql://username:password@hostname:5432/database',
        'users'
    )
    WHERE created_at >= yesterday()

TYPE COPY
TARGET_DATASOURCE user_data
COPY_SCHEDULE "0 */6 * * *" # Every 6 hours
```
This example creates a scheduled copy pipe that imports user data from a PostgreSQL table every 6 hours.
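The other table functions follow the same pattern. For a one-off load you can also query them directly, for example url() against a hypothetical NDJSON export (a sketch; the URL and format are assumptions):

```sql
SELECT *
FROM url('https://example.com/exports/user_events.ndjson', 'JSONEachRow')
```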
Learn more about table functions.
Query development¶
Tinybird offers flexible options for querying data, from direct SQL to API endpoints.
Pipes vs Query API¶
Pipes are the default approach for production applications:
- Type-safe parameters: Automatic validation and conversion
- Versioning: Track changes and roll back if needed
- Performance optimization: Automatic query optimization and caching
- Security: Fine-grained access control with tokens
Query API for direct SQL execution:
- Migration path: Easy transition from existing ClickHouse applications
- Ad-hoc analysis: Quick data exploration and debugging
- BI tools: Direct integration with SQL-based tools
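For example, a direct query through the Query API (a sketch; YOUR_TOKEN needs read access to user_events):

```bash
curl -G "https://api.tinybird.co/v0/sql" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  --data-urlencode "q=SELECT event_type, count() AS c FROM user_events GROUP BY event_type FORMAT JSON"
```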
Example API endpoint from a pipe:
endpoints/user_activity.pipe
```
TOKEN analytics_api READ

DESCRIPTION >
    User activity API endpoint with authentication

NODE user_metrics
SQL >
    %
    SELECT
        user_id,
        event_type,
        count() AS event_count,
        max(timestamp) AS last_activity
    FROM user_events
    WHERE user_id = {{UInt32(user_id)}}
        AND timestamp >= {{DateTime(start_date, '2024-01-01 00:00:00')}}
    GROUP BY user_id, event_type
    ORDER BY event_count DESC

TYPE ENDPOINT
```
Access the endpoint:
curl "https://api.tinybird.co/v0/pipes/user_activity.json?user_id=123&start_date=2024-01-01" \ -H "Authorization: Bearer READ_TOKEN"
Learn more about API Endpoints.
BI integration using ClickHouse interface¶
Connect your existing BI tools using the ClickHouse HTTP interface:
Grafana configuration:
- Host: clickhouse.tinybird.co
- Port: 443 (HTTPS)
- Username: Your Workspace name
- Password: Your Tinybird token
Supported tools:
- Grafana
- Metabase
- Apache Superset
- Tableau (via ClickHouse ODBC driver)
- Any tool supporting ClickHouse HTTP protocol
The interface exposes four databases:
- organization: for the Organization data sources, like organization.workspaces and organization.pipe_stats_rt
- system: for system.tables, system.columns, and so on
- tinybird: for the workspace service data sources, like tinybird.datasources_ops_log and tinybird.pipe_stats_rt
- your workspace name: for your workspace data sources
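Because this is the standard ClickHouse HTTP protocol, a quick connectivity check with curl works too (a sketch; substitute your workspace name and a token):

```bash
curl -u "your_workspace_name:YOUR_TOKEN" \
  -d "SELECT count() FROM tinybird.pipe_stats_rt" \
  "https://clickhouse.tinybird.co"
```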
See an example Grafana integration.
Deployment options¶
Tinybird offers flexible deployment options to meet your infrastructure and compliance requirements.
Tinybird Cloud hosting¶
Fully-managed service with global regions:
- No infrastructure management: Tinybird handles all operations
- Global availability: Multiple regions across AWS and GCP
- Automatic scaling: Infrastructure scales as needed based on selected plan
- Enterprise features: SSO, audit logs, advanced security controls
Tinybird self-managed regions¶
Bring Your Own Cloud deployment for enhanced control:
- Data sovereignty: Keep data in your cloud environment
- Network isolation: Deploy within your VPC/VNet
- Compliance: Meet specific regulatory requirements
- Custom configurations: Tailored performance optimizations
Benefits of self-managed regions:
- Complete control over infrastructure and data location
- Integration with existing cloud commitments and reserved instances
- Direct connection to internal data sources without network egress
- Custom security policies and network configurations
Deployment process:
- Register a self-managed region with Tinybird Cloud
- Deploy Tinybird infrastructure in your cloud environment
- Connect and manage through the Tinybird CLI with the --region flag
Self-managed regions are currently in beta. Contact Tinybird support for deployment assistance and requirements.
Learn more about self-managed Regions.
Open source projects using Tinybird¶
Many successful open source projects have adopted Tinybird for their analytics needs:
Ghost: Publishing platform using Tinybird for member analytics and engagement metrics
Papermark: Document sharing platform with real-time analytics on document views and interactions
Inbox Zero: Email management tool leveraging Tinybird for email analytics and automation insights
LocalStack: Cloud development platform using Tinybird for usage analytics and performance monitoring
OpenStatus: Website monitoring platform built on Tinybird for real-time uptime and performance analytics
Dub: Link management platform using Tinybird for click analytics and link performance tracking
Each leverages Tinybird's real-time capabilities, API-first architecture, and managed infrastructure to focus on their core product instead of analytics infrastructure, while retaining the flexibility to choose between hosted and self-managed deployments.
Migration strategy¶
To successfully migrate from ClickHouse self-hosted to Tinybird:
1. Assessment and planning¶
- Inventory your current setup: Document tables, views, materialized views, and queries
- Identify integration points: Map how your application currently interacts with ClickHouse
- Plan data migration: Choose between full migration or gradual transition
2. Local development setup¶
```bash
# Install Tinybird CLI
curl https://tinybird.co | sh

# Authenticate
tb login

# Start local development environment
tb local start
```
3. Schema translation¶
- Convert ClickHouse tables to .datasource files
- Transform views and materialized views to .pipe files
- Set up materialized views and copy pipes for aggregations
4. Data migration¶
- Use Data Sources API for bulk historical data
- Set up real-time connectors for ongoing ingestion
- Validate data consistency between systems
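For bulk historical data, one minimal pattern is to export from your existing cluster and append through the Data Sources API shown earlier (a sketch using CSV; Parquet and NDJSON work similarly):

```bash
# Export a table from self-hosted ClickHouse
clickhouse-client --query "SELECT * FROM user_events FORMAT CSV" > user_events.csv

# Append it to the matching Tinybird data source
curl -X POST "https://api.tinybird.co/v0/datasources?mode=append&name=user_events" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -F "csv=@user_events.csv"
```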
5. Application integration¶
- Replace direct ClickHouse queries with Tinybird API endpoints or requests to the Query API
- Update authentication to use Tinybird tokens
- Configure BI tools to use Tinybird's ClickHouse interface
6. Production deployment¶
- Deploy to Tinybird Cloud or set up self-managed region
- Configure monitoring and alerting via Tinybird service data sources
- Gradually shift traffic from ClickHouse to Tinybird
Next steps¶
Ready to start your migration? Here are some immediate next steps:
- Sign up for a free Tinybird account and explore the platform
- Follow the quick start guide to understand core concepts
- Install the CLI and set up local development
- Join our community for migration support and best practices
For complex migrations or enterprise requirements, contact our team for personalized assistance and migration planning.