Snowflake Connector

The Snowflake Connector allows you to load data from your existing Snowflake account into Tinybird.

The Snowflake Connector is fully managed and requires no additional tooling. You can define a sync schedule inside Tinybird and execution is taken care of for you.

Loading a Snowflake table

Loading a Snowflake table in the UI

Open the Tinybird UI and add a new Data Source by clicking the + icon next to the Data Sources section on the left hand side navigation bar (see Mark 1 below).

../_images/ingest-snowflake-connector-sync-first-table-ui-1.png

You will see the Data Sources modal appear. In the modal, click on the Snowflake connector.

../_images/ingest-snowflake-connector-sync-first-table-ui-2.png

In the next step you will need your Snowflake Account Identifier. To find this, log into Snowflake, find the account info in the bottom left of the screen (see Mark 1 below) and then copy the Account Identifier (see Mark 2 below).
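Alternatively, you can look up the identifier from a Snowflake worksheet. As a sketch, the following query uses Snowflake's context functions; the `organization-account` format shown is the standard Snowflake account-identifier format:

```sql
-- Run in any Snowflake worksheet.
-- The account identifier is typically <organization_name>-<account_name>.
select current_organization_name() || '-' || current_account_name() as account_identifier;
```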

../_images/ingest-snowflake-connector-sync-first-table-ui-2_5.png

You will now see the Connection details modal. In this modal, you can configure authentication with your Snowflake account. Enter your User, Password, and Account Identifier into the form (see Mark 1 below).

Click Connect when you’re done (see Mark 2 below).

../_images/ingest-snowflake-connector-sync-first-table-ui-3.png

The next screen allows you to configure access to your Snowflake resources. First, select the role (see Mark 1 below) and Warehouse (see Mark 2 below) that Tinybird should use to access your data. Next, copy the SQL snippet from the text box (see Mark 3 below).

This SQL snippet creates a new Snowflake Storage Integration linking your Snowflake account with a Tinybird staging area for your workspace.

It also grants permission to the given role to create new Stages to unload data from your Snowflake Account into Tinybird.

../_images/ingest-snowflake-connector-sync-first-table-ui-4.png

With the SQL query copied, open a new SQL Worksheet inside Snowflake. Paste the SQL into the Worksheet query editor (see Mark 1 below) and press the Run button (see Mark 2 below). This statement must be executed with a Snowflake ACCOUNTADMIN role, since Snowflake Integrations operate at the Account level and usually require admin permissions.

Note: you must edit the query and replace the <your_database> fragment with the name of your Snowflake database.
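The generated SQL takes roughly the following shape. This is a sketch only: the integration name, role, and storage location are generated by Tinybird for your workspace, so always use the snippet from the Tinybird UI rather than this example:

```sql
-- Placeholders: the integration name, role, and storage location are
-- generated by Tinybird; replace <your_database> with your database name.
create storage integration if not exists "tinybird_integration_<role>"
    type = external_stage
    storage_provider = 'GCS'
    enabled = true
    comment = 'Tinybird Snowflake Connector Integration'
    storage_allowed_locations = ('gcs://<tinybird-provided-bucket>/<id>');

grant create stage on all schemas in database <your_database> to role <role>;

grant ownership on integration "tinybird_integration_<role>" to role <role>;
```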

../_images/ingest-snowflake-connector-sync-first-table-ui-5.png

You should see the response Statement executed successfully.

../_images/ingest-snowflake-connector-sync-first-table-ui-6.png

Return to your Tinybird tab to resume configuring the Snowflake connector. Set a name for the Snowflake connection (see Mark 4 below). Complete this step by clicking Next (see Mark 5 below).

../_images/ingest-snowflake-connector-sync-first-table-ui-4.png

At this point, the Snowflake Connector should have enough permissions to introspect the Snowflake objects available to the given role. The next screen lets you browse the tables available in Snowflake and select the table you wish to load.

Start by selecting the database to which the table belongs (see Mark 1 below), then the schema (see Mark 2 below) and finally the table (see Mark 3 below). Finish by clicking Next (see Mark 4 below).

Note: the maximum allowed table size is 50 million rows; results that exceed this limit will be truncated.

../_images/ingest-snowflake-connector-sync-first-table-ui-8.png

You can now configure the schedule on which you wish to load data. By default, the frequency is set to One-off which performs a one-time sync of the table. You can change this by selecting a different option from the dropdown menu (see Mark 1 below).

../_images/ingest-snowflake-connector-sync-first-table-ui-9.png

To configure a schedule that runs a regular sync, select the Interval option. You can configure a schedule in minutes, hours, or days by using the dropdown selector, and set the value for the schedule in the text field. The screenshot below shows a schedule of 5 minutes.

Next, you can choose whether the sync should run immediately, or if it should wait until the first scheduled sync. You can also configure the Import Strategy. The strategy Replace data is selected by default. Finish by clicking Next (see Mark 1 below).

Note: the maximum allowed frequency is 5 minutes.

../_images/ingest-snowflake-connector-sync-first-table-ui-10.png

The final screen of the modal shows you the interpreted schema of the table, which you can change as needed. You can also modify what the Data Source in Tinybird will be called by changing the name at the top. To finish, click Create Data Source (see Mark 1 below).

../_images/ingest-snowflake-connector-sync-first-table-ui-11.png

Your Data Source has been created. You will be taken to the Data Source data page, where you can see a status chart showing executions of the loading schedule (see Mark 1 below). The Data Source will take a moment to create the resources required to perform the first sync.

../_images/ingest-snowflake-connector-sync-first-table-ui-12.png

When the first sync has completed, you’ll see a green bar indicating the status (see Mark 1 below). Details about the data, such as storage size and number of rows, are shown above (see Mark 2 below). You can see a preview of the data below this (see Mark 3 below).

../_images/ingest-snowflake-connector-sync-first-table-ui-13.png

Loading a Snowflake table in the CLI

To load a Snowflake table to Tinybird using the CLI, you first need to create a connection. Creating a connection will grant your Tinybird Workspace the appropriate permissions to view data from Snowflake.

Authenticate your CLI, and switch to the desired Workspace. Then run:

tb connection create snowflake

In the output of this command, you will find instructions to configure read only access to your data in Snowflake.

You will be prompted to enter your user, password, account identifier, role, warehouse and a name for the connection.

Lastly, you will see a block of SQL. Before proceeding, copy this SQL.

** Creating a new Snowflake connection at the xxxx workspace.

User (must have create stage and create integration in Snowflake):
Password:
Account identifier:
Role (optional):
Warehouse (optional):
Connection name (optional, current xxxx):

Enter this SQL statement in Snowflake using your admin account to create the connection:

------
create storage integration if not exists "tinybird_integration_role"
    type = external_stage
    storage_provider = 'GCS'
    enabled = true
    comment = 'Tinybird Snowflake Connector Integration'
    storage_allowed_locations = ('gcs://tinybird-cdk-production-europe-west3/id');

grant create stage on all schemas in database <your_database> to role ACCOUNTADMIN;

grant ownership on integration "tinybird_integration_ACCOUNTADMIN" to role ACCOUNTADMIN;
------

Ready?  (y, N):

** Validating connection...
** xxxx.connection created successfully! Connection details saved into the .env file and referenced automatically in your connection file.

With the SQL query copied, open a new SQL Worksheet inside Snowflake. Paste the SQL into the Worksheet query editor (see Mark 1 below) and press the Run button (see Mark 2 below). This statement must be executed with a Snowflake ACCOUNTADMIN role, since Snowflake Integrations operate at the Account level and usually require admin permissions.

Note: you must edit the query and replace the <your_database> fragment with the name of your Snowflake database.

../_images/ingest-snowflake-connector-sync-first-table-ui-5.png

You should see the response Statement executed successfully.

../_images/ingest-snowflake-connector-sync-first-table-ui-6.png

Once done, return to your terminal. Select the yes (y) option and the connection will be created.

A new snowflake.connection file will be created in your project files.

Note: At the moment, the .connection file is not used and cannot be pushed to Tinybird. It is safe to delete this file. A future release will allow you to push this file to Tinybird to automate the creation of connections, similar to Kafka connections.

Now that your connection is created, you can create a Data Source and configure the schedule to import data from Snowflake.

The Snowflake import is configured using the following options, which can be added at the end of your .datasource file:

  • IMPORT_SERVICE: name of the import service to use, in this case, snowflake

  • IMPORT_CONNECTION_NAME: the name given to the Snowflake connection inside Tinybird, e.g. 'my_connection'

  • IMPORT_EXTERNAL_DATASOURCE: the fully qualified name of the source table in Snowflake, e.g. database.schema.table

  • IMPORT_SCHEDULE: a cron expression with the frequency to run imports; the interval between runs must be at least 5 minutes, e.g. */5 * * * *

  • IMPORT_STRATEGY: the strategy to use when inserting data, either REPLACE or APPEND

  • IMPORT_QUERY: (optional) the SELECT query to extract your data from Snowflake when you don’t need all the columns or want to make a transformation before ingestion. The FROM must reference a table using the full scope: database.schema.table

Note: For IMPORT_STRATEGY only REPLACE is supported today. The APPEND strategy will be enabled in a future release.

For example:

snowflake.datasource file
DESCRIPTION >
    Snowflake demo data source

SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `id` Int32 `json:$.id`,
    `orderid` LowCardinality(String) `json:$.orderid`,
    `status` LowCardinality(String) `json:$.status`,
    `amount` Int32 `json:$.amount`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp"
ENGINE_TTL "timestamp + toIntervalDay(60)"

IMPORT_SERVICE snowflake
IMPORT_CONNECTION_NAME my_snowflake_connection
IMPORT_EXTERNAL_DATASOURCE mydb.raw.events
IMPORT_SCHEDULE */5 * * * *
IMPORT_STRATEGY REPLACE
IMPORT_QUERY >
    select
        timestamp,
        id,
        orderid,
        status,
        amount
    from mydb.raw.events

The columns you select in the IMPORT_QUERY must match the columns defined in the Data Source schema. For example, if your Data Source has columns ColumnA and ColumnB, then your IMPORT_QUERY must contain SELECT ColumnA, ColumnB FROM .... A mismatch of columns will cause data to arrive in the quarantine Data Source.
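As an illustration, suppose the Data Source schema defines columns `timestamp` and `amount_usd` (hypothetical names). The IMPORT_QUERY must then return exactly those two columns, and may transform the data on the way out of Snowflake:

```sql
-- Hypothetical example: the Data Source schema has columns `timestamp` and `amount_usd`.
-- The SELECT list must match those columns; FROM must use the full
-- database.schema.table scope.
select
    timestamp,
    amount / 100 as amount_usd
from mydb.raw.events
```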

With your connection created and Data Source defined, you can now push your project to Tinybird using:

tb push

The first import will run at the next time that matches the cron expression.

Schema Evolution

The Snowflake Connector supports backwards compatible changes made in the source table. This means that, if you add a new column in Snowflake, the next sync job will automatically add it to the Tinybird Data Source.

Non-backwards compatible changes, such as dropping or renaming columns, are not supported and will cause the next sync to fail.
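For example, these Snowflake DDL statements (with hypothetical column names) illustrate the difference:

```sql
-- Backwards compatible: the next sync adds the new column to the Tinybird Data Source.
alter table mydb.raw.events add column channel varchar;

-- Not supported: dropping or renaming a column will cause the next sync to fail.
-- alter table mydb.raw.events drop column status;
-- alter table mydb.raw.events rename column status to order_status;
```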

Limits

The import jobs run in a pool with capacity for up to 2 concurrent jobs. If more scheduled jobs overlap, they are queued.

Snowflake Connector usage limits

Description                                  Limit and time window
Maximum frequency for the scheduled jobs     5 minutes
Maximum rows per append or replace           50 million rows