ClickHouse® + Snowflake — 3 Ways to Connect in {{ year }}
These are the main options for a ClickHouse® integration Snowflake pipeline:
- Tinybird
- ClickHouse® Cloud + ClickPipes (S3 or Kafka source)
- Self-managed or custom (Snowflake Streams + Kafka, or batch export via COPY INTO S3)
Snowflake is a cloud data warehouse used for batch analytics, BI workloads, and structured data storage. Many teams want a copy of that data in ClickHouse® for sub-100ms analytical queries, real-time APIs, and user-facing analytics that Snowflake's job-pool model cannot support at scale.
A ClickHouse® integration Snowflake setup uses scheduled syncs, Snowflake Streams + Kafka, or S3 staging to move Snowflake data into ClickHouse® for real-time analytics at a fraction of Snowflake's query cost.
Below we compare all three options in depth—architecture, real code examples, trade-offs, and when to use each.
Looking for minimal ops and instant APIs?
Tinybird has a native Snowflake Connector—scheduled sync directly from Snowflake tables, managed ClickHouse®, and one-click API publishing from SQL. Your Snowflake to ClickHouse® pipeline can power real-time APIs at roughly 1/10th the cost of running those queries directly in Snowflake.
Three ways to implement ClickHouse® integration Snowflake
Here are the three options for connecting Snowflake to ClickHouse®, covered in order.
Option 1: Tinybird — native Snowflake Connector with API layer
Tinybird is a real-time data platform built on ClickHouse® with a native Snowflake Connector. It is the only option that requires zero pipeline infrastructure: no S3 buckets, no Kafka clusters, no custom extraction scripts.
How it works: authenticate Tinybird with your Snowflake account using the one-time SQL snippet Tinybird generates. Select a table or view and set a sync schedule (interval or cron expression). Tinybird runs a background job that extracts the delta from Snowflake and ingests it into a ClickHouse®-backed data source.
You then define Pipes (SQL) and publish them as REST endpoints with a single click.
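As a sketch of what that looks like, a minimal Pipe is a SQL node published as an endpoint. The node name, table name, and filter below are hypothetical; check Tinybird's datafile documentation for the exact syntax:

```
NODE recent_orders
SQL >
    SELECT order_id, customer_id, amount, status
    FROM orders
    WHERE created_at >= now() - INTERVAL 1 DAY
```

Publishing this Pipe as an endpoint gives you a parameterizable REST API backed by the synced Snowflake data.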
One-time Snowflake setup (Tinybird-generated SQL):
-- Create a dedicated role and user for Tinybird
CREATE ROLE tinybird_role;
CREATE USER tinybird_user
PASSWORD = '<password>'
DEFAULT_ROLE = tinybird_role;
GRANT ROLE tinybird_role TO USER tinybird_user;
-- Grant access to your database and schema
GRANT USAGE ON DATABASE analytics TO ROLE tinybird_role;
GRANT USAGE ON SCHEMA analytics.public TO ROLE tinybird_role;
GRANT SELECT ON TABLE analytics.public.orders TO ROLE tinybird_role;
-- Create GCS integration (for Tinybird's internal staging)
CREATE STORAGE INTEGRATION tinybird_gcs_integration
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = GCS
ENABLED = TRUE
STORAGE_ALLOWED_LOCATIONS = ('gcs://tinybird-staging/');
Tinybird handles the extraction, staging, and ingestion automatically after this one-time setup.
When Tinybird fits:
- You want to get Snowflake data into ClickHouse® with zero pipeline infrastructure
- You need low-latency APIs and real-time dashboards on top of Snowflake data
- You want a Snowflake to ClickHouse® pipeline that costs significantly less than running queries in Snowflake
Prerequisites: a Snowflake account (any cloud region), a Tinybird workspace, and the one-time SQL setup above. Data flows: Snowflake → Tinybird background job → Tinybird data sources → Pipes → REST API.
Option 2: ClickHouse® Cloud + ClickPipes (S3 or Kafka source)
ClickHouse® Cloud's ClickPipes supports S3 and Kafka as data sources. There is no native Snowflake connector.
How it works (S3 path): configure a Snowflake Task that runs COPY INTO @external_stage on a schedule, exporting to an S3 bucket in Parquet format. Then create an S3 ClickPipe in the ClickHouse® Cloud console.
Snowflake setup — export to S3 on schedule:
-- Create file format and external stage
CREATE FILE FORMAT parquet_format TYPE = PARQUET;
CREATE OR REPLACE STAGE s3_stage
URL = 's3://my-bucket/snowflake-export/'
CREDENTIALS = (AWS_KEY_ID = '<key>' AWS_SECRET_KEY = '<secret>')
FILE_FORMAT = parquet_format;
-- Create a scheduled Task to run every 15 minutes
CREATE OR REPLACE TASK export_orders_task
WAREHOUSE = COMPUTE_WH
SCHEDULE = '15 MINUTE'
AS
COPY INTO @s3_stage/orders/
FROM (SELECT * FROM orders WHERE updated_at > DATEADD(minute, -15, CURRENT_TIMESTAMP()))
MAX_FILE_SIZE = 157286400 -- 150 MB per file
HEADER = TRUE;
ALTER TASK export_orders_task RESUME;
Load from S3 into ClickHouse® Cloud (one-time or via ClickPipes):
-- Direct S3 load (one-time)
INSERT INTO orders
SELECT * FROM s3(
'https://my-bucket.s3.amazonaws.com/snowflake-export/orders/*.parquet',
'<aws_key>', '<aws_secret>',
'Parquet'
)
SETTINGS
input_format_null_as_default = 1,
input_format_parquet_case_insensitive_column_matching = 1;
For near-real-time CDC: use the Snowflake Kafka Connector to publish table changes to Kafka, then create a Kafka ClickPipe in ClickHouse® Cloud.
When it fits:
- You want managed ClickHouse® and control the export mechanism yourself
- You already have S3 staging infrastructure or Kafka running
- Your main need is analytical replication from Snowflake into ClickHouse®
Prerequisites: Snowflake Task configured to export to S3, or Snowflake Kafka Connector publishing to Kafka. Data flows: Snowflake → S3 (or Kafka) → ClickPipes → ClickHouse® Cloud.
Option 3: Self-managed or custom (Snowflake Streams + Kafka, or batch S3)
With self-managed ClickHouse®, the common batch pattern is: a Snowflake Task exports to S3 via COPY INTO @external_stage, then ClickHouse® loads via the S3 table function.
For near-real-time CDC: Snowflake Streams capture row-level changes (inserts, updates, deletes). A Snowflake Task reads the stream and publishes to Kafka via the Snowflake Kafka Connector. ClickHouse® consumes via the Kafka table engine + materialized view → MergeTree.
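The Snowflake side of that CDC flow can be sketched as a stream plus a task. The names (orders_stream, orders_changes, forward_orders_changes) are hypothetical, and the staging table is assumed to be what the Kafka Connector or a custom producer publishes from:

```sql
-- Capture row-level changes on the source table
CREATE OR REPLACE STREAM orders_stream ON TABLE orders;

-- Drain the stream into a staging table on a schedule,
-- skipping runs when there is nothing to consume
CREATE OR REPLACE TASK forward_orders_changes
  WAREHOUSE = COMPUTE_WH
  SCHEDULE = '1 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
AS
  INSERT INTO orders_changes
  SELECT order_id, customer_id, amount, status, created_at,
         METADATA$ACTION, METADATA$ISUPDATE
  FROM orders_stream;

ALTER TASK forward_orders_changes RESUME;
```

Reading the stream inside a DML statement advances its offset, so each change is consumed exactly once.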
ClickHouse® CDC via Kafka table engine + Snowflake Streams:
-- Kafka engine reads CDC from Snowflake Streams → Kafka
CREATE TABLE orders_kafka
(
order_id UInt64,
customer_id UInt64,
amount Float64,
status LowCardinality(String),
created_at DateTime,
`METADATA$ACTION` LowCardinality(String), -- INSERT / DELETE
`METADATA$ISUPDATE` UInt8
)
ENGINE = Kafka
SETTINGS
kafka_broker_list = 'kafka:9092',
kafka_topic_list = 'snowflake-orders-cdc',
kafka_group_name = 'clickhouse_snowflake_consumer',
kafka_format = 'JSONEachRow';
-- Target table with ReplacingMergeTree for upserts
CREATE TABLE orders
(
order_id UInt64,
customer_id UInt64,
amount Float64,
status LowCardinality(String),
created_at DateTime,
updated_at DateTime DEFAULT now()
)
ENGINE = ReplacingMergeTree(updated_at)
PARTITION BY toYYYYMM(created_at)
ORDER BY (customer_id, order_id);
CREATE MATERIALIZED VIEW orders_mv TO orders AS
SELECT
order_id, customer_id, amount, status, created_at, now() AS updated_at
FROM orders_kafka
WHERE `METADATA$ACTION` = 'INSERT' OR `METADATA$ISUPDATE` = 1;
When it fits:
- You already run ClickHouse® and want full control over the pipeline
- You have data-engineering capacity to operate Snowflake, Kafka, and ClickHouse®
- Batch S3 export fits when near-real-time is not required
Decision framework: which option fits your situation
The right choice depends on four variables: freshness requirement, team capacity, whether you need an API layer, and cost sensitivity.
| Situation | Recommended option |
|---|---|
| Real-time APIs, minimal ops, no pipeline infra | Tinybird |
| Already on ClickHouse® Cloud, have S3/Kafka | ClickPipes |
| Self-managed ClickHouse®, CDC required | Self-managed + Snowflake Streams |
| One-time migration or batch analytics | S3 export + S3 table function |
| Cost is driving the move from Snowflake | Tinybird (10× cheaper at high concurrency) |
Choose Tinybird when you need real-time data ingestion from Snowflake and REST APIs from the same data—without managing any pipeline infrastructure.
Choose ClickPipes when you're already on ClickHouse® Cloud and have S3 or Kafka configured. You get managed ClickHouse® ingestion but build your own serving layer.
Choose self-managed when your team has data-engineering capacity to operate Snowflake, Kafka, and ClickHouse®—and needs full control over CDC semantics and schema mapping.
Summary table
| Option | Ingestion path | API layer | Ops burden | Infrastructure required |
|---|---|---|---|---|
| Tinybird | Native Snowflake Connector | Built in (Pipes) | Low | None |
| ClickHouse® Cloud ClickPipes | S3 or Kafka | Build your own | Medium | S3 bucket or Kafka |
| Self-managed | S3 or Snowflake Streams → Kafka | Build your own | High | S3, Kafka, ClickHouse® |
What is Snowflake and why integrate it with ClickHouse®?
Snowflake as the data source
Snowflake is a cloud-agnostic data warehouse available on AWS, GCP, and Azure. It separates compute from storage, scales virtual warehouses independently, and supports diverse analytical workloads through standard SQL.
Teams use it for BI dashboards, ETL pipelines, data sharing, and structured analytics. Snowflake handles batch workloads well.
But Snowflake is not optimized for user-facing real-time APIs. Its job pool model introduces non-deterministic latency—queries queue before execution. Query concurrency is limited per virtual warehouse. For user-facing analytics needing sub-100ms responses at high concurrency, Snowflake's architecture creates a bottleneck.
How to get data out of Snowflake
COPY INTO @external_stage exports query results to S3, GCS, or Azure Blob in CSV, JSON, or Parquet format. Parquet is preferred—it preserves type information, compresses efficiently, and supports nested structures.
Snowflake Streams capture table-level changes (inserts, updates, deletes) for CDC-style replication. The Snowflake Kafka Connector reads Snowflake table data and publishes to Kafka topics for near-real-time replication.
Snowflake Connector for Python and the JDBC/ODBC drivers support direct SQL extraction for custom pipelines.
Why route Snowflake data to ClickHouse®
Snowflake processes queries in a job pool with shared-compute overhead. For 100 queries per minute at 10 MB per query over 1 TB of data, Snowflake requires multiple virtual warehouses running 24/7—costing roughly $30K/month.
Tinybird's ClickHouse®-backed platform for the same workload costs roughly $3K/month (10× cheaper). You keep Snowflake as the source of truth for batch analytics and use ClickHouse® as the real-time data processing and serving layer.
Schema and pipeline design
Mapping Snowflake types to ClickHouse® columns
Snowflake uses micro-partitions internally with a flexible VARIANT type for semi-structured data. In ClickHouse® you define an explicit schema with typed columns.
Key type mappings:
-- Snowflake → ClickHouse® type mapping
-- NUMBER(38,0) → Int64 or UInt64
-- NUMBER(18,6) → Float64 (or Decimal(18,6) for exact arithmetic)
-- VARCHAR → String or LowCardinality(String)
-- BOOLEAN → UInt8
-- TIMESTAMP_NTZ → DateTime64(3)
-- DATE → Date
-- VARIANT → String (JSON), extract at query time
CREATE TABLE orders
(
order_id UInt64,
customer_id UInt64,
amount Float64,
status LowCardinality(String), -- Low-cardinality VARCHAR
metadata String, -- VARIANT → JSON String
created_at DateTime64(3),
updated_at DateTime64(3)
)
ENGINE = ReplacingMergeTree(updated_at)
PARTITION BY toYYYYMM(created_at)
ORDER BY (customer_id, order_id);
For VARIANT columns, extract fields at query time using ClickHouse®'s JSON functions:
SELECT
order_id,
JSONExtractString(metadata, 'shipping_carrier') AS carrier,
JSONExtractFloat(metadata, 'shipping_cost') AS shipping_cost
FROM orders
WHERE created_at >= today() - INTERVAL 7 DAY;
Incremental sync strategy
Full table exports from Snowflake are slow and consume credits. After the initial load, use incremental sync by delta:
-- Snowflake: export only rows updated since last sync
COPY INTO @s3_stage/orders/
FROM (
SELECT * FROM orders
WHERE updated_at > '{{ last_sync_timestamp }}'
)
MAX_FILE_SIZE = 157286400
HEADER = TRUE;
Tinybird's native Snowflake Connector handles incremental sync automatically using a watermark column.
Failure modes to plan for
- Sync failures: Tinybird's Snowflake Connector logs each sync job with success/failure status. Configure alerts via datasource_ops_log.
- Schema drift: new Snowflake columns require updating the ClickHouse® schema. Plan for nullable columns and monitor with INFORMATION_SCHEMA.COLUMNS.
- Stale data windows: scheduled syncs have a lag of up to your sync interval. For sub-interval freshness, use Snowflake Streams + Kafka for CDC.
- Large initial loads: full table exports from Snowflake take time and consume credits. Use incremental sync after the initial load.
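To catch schema drift early, you can periodically diff the source and target schemas. A sketch, assuming the database/schema/table names from the examples above:

```sql
-- Snowflake side: list current columns for the synced table
SELECT column_name, data_type, is_nullable
FROM analytics.information_schema.columns
WHERE table_schema = 'PUBLIC' AND table_name = 'ORDERS'
ORDER BY ordinal_position;

-- ClickHouse® side: the target schema to compare against
SELECT name, type
FROM system.columns
WHERE database = 'default' AND table = 'orders';
```

Any column present in the first result but missing from the second is a candidate for an ALTER TABLE ... ADD COLUMN on the ClickHouse® side, typically as Nullable to tolerate historical rows.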
Why ClickHouse® for Snowflake analytics
ClickHouse® is a columnar OLAP database built for analytical queries over large volumes. MergeTree tables and vectorized execution deliver sub-second queries on billions of rows.
Where Snowflake processes queries in a job pool with shared-compute overhead, ClickHouse® responds in 62ms p90 for real-time dashboards serving thousands of concurrent users (Resend case study). This is the fundamental difference: Snowflake is built for warehousing, ClickHouse® is built for serving.
A ClickHouse® integration Snowflake setup fits user-facing analytics, real-time APIs over warehouse data, and workloads where Snowflake's concurrency and latency limits are a bottleneck.
Why Tinybird is the best Snowflake to ClickHouse® option
Most teams moving from Snowflake to a real-time serving layer don't want to build a pipeline—they want fast APIs on Snowflake data.
Tinybird is purpose-built for this. The native Snowflake Connector eliminates S3 staging, Kafka clusters, and custom extraction scripts. Authenticate Tinybird with Snowflake, select a table, set a sync schedule—done.
Tinybird handles extraction, ingestion, storage, and API publishing in one product. Define Pipes in SQL, publish as REST endpoints, and serve dashboards or product features at sub-100ms latency and automatic scaling.
The cost advantage is real: 100 queries per minute at 10 MB per query costs ~$30K/month in Snowflake virtual warehouses. The same workload on Tinybird costs ~$3K/month. Exploratory queries in Tinybird are free. You keep Snowflake as the source of truth and use Tinybird for the real-time serving layer.
Frequently Asked Questions (FAQs)
Does ClickHouse® Cloud support Snowflake natively?
ClickHouse® Cloud does not have a native Snowflake connector in ClickPipes. You use S3 as the staging layer: configure a Snowflake Task that exports data to an S3 bucket via COPY INTO, then create an S3 ClickPipe to load into ClickHouse® Cloud.
Alternatively, use the Snowflake Kafka Connector to publish table data to Kafka, then a Kafka ClickPipe. You operate the Snowflake → S3 or Snowflake → Kafka path; ClickHouse® Cloud ingests from S3 or Kafka.
Can I use Tinybird for Snowflake to ClickHouse® without building a pipeline?
Yes—this is Tinybird's native Snowflake Connector use case. Authenticate Tinybird with your Snowflake account using the one-time SQL snippet, select a table or view, set a sync schedule (interval or cron), and Tinybird handles extraction, ingestion, and storage in ClickHouse®-backed data sources.
No S3 buckets, no Kafka clusters, no custom extraction scripts required.
How much cheaper is ClickHouse® vs Snowflake for real-time serving?
Significantly. For 100 queries per minute at 10 MB per query over 1 TB of data, Snowflake requires multiple virtual warehouses running 24/7—costing roughly $30K/month.
Tinybird's ClickHouse®-backed platform for the same workload costs roughly $3K/month (10× cheaper). Exploratory queries in Tinybird are free. You keep Snowflake for batch analytics and use Tinybird for real-time data ingestion and the serving layer.
What is the difference between Snowflake COPY INTO and Snowflake Streams?
COPY INTO @external_stage exports a snapshot of your data to S3 or GCS on a schedule. It is simple and works for batch analytics, but lag equals your export schedule (minutes to hours). Deletes are not captured, and updates appear only as the latest state.
Snowflake Streams capture row-level changes (inserts, updates, deletes) in near real-time. Combined with the Snowflake Kafka Connector, they enable CDC-style replication into ClickHouse® with seconds of lag. Use Streams when you need accurate upsert/delete propagation and sub-minute freshness.
How do I handle Snowflake VARIANT columns in ClickHouse®?
Map VARIANT columns to String in ClickHouse® and store them as JSON. Use ClickHouse®'s JSON functions (JSONExtractString, JSONExtractFloat, JSONExtractArrayRaw) to extract fields at query time.
For frequently-accessed fields, extract them at ingest time using a Tinybird Pipe transformation or a ClickHouse® materialized view that extracts JSON fields into typed columns—improving query performance significantly.
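A minimal sketch of the materialized-view approach, reusing the orders table from earlier (the target table and field names here are illustrative):

```sql
-- Typed target table for the hot JSON fields
CREATE TABLE orders_shipping
(
    order_id UInt64,
    carrier LowCardinality(String),
    shipping_cost Float64,
    created_at DateTime64(3)
)
ENGINE = MergeTree
ORDER BY (carrier, order_id);

-- Extract JSON fields into typed columns at insert time
CREATE MATERIALIZED VIEW orders_shipping_mv TO orders_shipping AS
SELECT
    order_id,
    JSONExtractString(metadata, 'shipping_carrier') AS carrier,
    JSONExtractFloat(metadata, 'shipping_cost') AS shipping_cost,
    created_at
FROM orders;
```

Queries against orders_shipping then filter and aggregate on real typed columns instead of parsing JSON per row.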
Can I do real-time CDC from Snowflake into ClickHouse®?
Yes, using Snowflake Streams + the Snowflake Kafka Connector. Snowflake Streams capture inserts, updates, and deletes from a table. A Snowflake Task reads the stream and publishes changes to Kafka. ClickHouse® consumes via the Kafka table engine + materialized view into ReplacingMergeTree.
This provides CDC-style replication with seconds of latency. Deletes require application-level handling since ClickHouse® doesn't support true row deletion in MergeTree—use a soft-delete column and filter at query time.
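A sketch of the soft-delete pattern, assuming the target table carries a hypothetical is_deleted UInt8 column set to 1 when the CDC action is a delete:

```sql
-- FINAL collapses ReplacingMergeTree versions to the latest row,
-- then the filter hides soft-deleted rows
SELECT order_id, amount, status
FROM orders FINAL
WHERE is_deleted = 0;
```

FINAL adds query-time cost, so for high-traffic endpoints consider filtering with argMax-style aggregation or pre-collapsing via a scheduled OPTIMIZE instead.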
