---
title: Datasource files
meta:
  description: Datasource files describe your data sources. Define the schema, engine, and other settings.
---

# Datasource files (.datasource)

Datasource files describe your data sources. You can use .datasource files to define the schema, engine, and other settings of your data sources.

## Available instructions

The following instructions are available for .datasource files.

{% table %}
   * Declaration
   * Required
   * Description
   ---
   * `SCHEMA <indented_schema_definition>`
   * Yes
   * Defines a block for a data source schema. The block must be indented.
   ---
   * `ENGINE <engine_type>`
   * No
   * Sets the engine for data source. Default value is `MergeTree`.
   ---
   * `ENGINE_SORTING_KEY <sql>`
   * No
   * Sets the `ORDER BY` expression for the data source.
   ---
   * `ENGINE_PARTITION_KEY <sql>`
   * No
   * Sets the `PARTITION` expression for the data source. **Use monthly (`toYYYYMM(date_column)`) or yearly (`toYear(date_column)`) partitions only.** Overly granular partition keys severely degrade write performance. See [MergeTree engine](/sql-reference/engines/mergetree#engine-settings) for details.
   ---
   * `ENGINE_TTL <sql>`
   * No
   * Sets the `TTL` expression for the data source.
   ---
   * `ENGINE_VER <column_name>`
   * No
   * Column with the version of the object state. Required when using `ENGINE ReplacingMergeTree`.
   ---
   * `ENGINE_SIGN <column_name>`
   * No
   * Column to compute the state. Required when using `ENGINE CollapsingMergeTree` or `ENGINE VersionedCollapsingMergeTree`. 
   ---
   * `ENGINE_VERSION <column_name>`
   * No
   * Column with the version of the object state. Required when `ENGINE VersionedCollapsingMergeTree`. 
   ---
   * `ENGINE_SETTINGS <settings>`
   * No
   * Comma-separated list of key-value pairs that describe engine settings for the data source. 
   ---
   * `FORWARD_QUERY <sql>`
   * No
   * Defines a query to execute on the data source. The results of the query are returned instead of the original schema defined in the `SCHEMA` declaration. See [Evolve data sources](/forward/test-and-deploy/evolve-data-source#forward-query).
   ---
   * `BACKFILL skip`
   * No
   * Skips the initial data backfill when creating a new Data Source that is the destination for a Materialized View. This instruction is ignored if the Data Source already exists.
   ---
   * `TOKEN <token_name> READ|APPEND`
   * No
   * Grants read or append access to a datasource to the token named <token_name>. If the token isn't specified or <token_name> doesn't exist, it will be automatically created.
   ---
   * `SHARED_WITH <workspace_name>`
   * No
   * Shares the Data Source with one or more Workspaces. Workspaces need to be in the same organization.
{% /table %}

The following example shows a typical .datasource file:

```tb {% title="tinybird/datasources/example.datasource" %}
# A comment
SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `session_id` String `json:$.session_id`,
    `action` LowCardinality(String) `json:$.action`,
    `version` LowCardinality(String) `json:$.version`,
    `payload` String `json:$.payload`

ENGINE "MergeTree"
ENGINE_PARTITION_KEY "toYYYYMM(timestamp)"
ENGINE_SORTING_KEY "timestamp"
ENGINE_TTL "timestamp + toIntervalDay(60)"
ENGINE_SETTINGS "index_granularity=8192"
```

### Schema

A `SCHEMA` declaration is a newline, comma-separated list of columns definitions. For example:

```tb {% title="Example SCHEMA declaration" %}
SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `session_id` String `json:$.session_id`,
    `action` LowCardinality(String) `json:$.action`,
    `version` LowCardinality(String) `json:$.version`,
    `payload` String `json:$.payload`
```

#### Column definition

Each column in a `SCHEMA` declaration has the following format:

```tb
`<column_name>` <data_type> `<json_path>` DEFAULT <default_value> CODEC(<codec>)
```

{% table %}
   * Field
   * Required
   * Description
   ---
   * `<column_name>`
   * Yes
   * The name of the column in the data source.
   ---
   * `<data_type>`
   * Yes
   * One of the supported [Data types](/sql-reference/data-types).
   ---
   * `<json_path>`
   * No
   * JSONPath expression for NDJSON or Parquet data. See [JSONPath expressions](#jsonpath-expressions).
   ---
   * `DEFAULT <default_value>`
   * No
   * Sets a default value for the column when the value is null or missing.
   ---
   * `CODEC(<codec>)`
   * No
   * Overrides the default compression codec. See [Column compression codecs](#column-compression-codecs).
{% /table %}

The following example shows a column definition using all available fields:

```tb {% title="Complete column definition example" %}
SCHEMA >
    `timestamp` DateTime64(3) `json:$.timestamp` DEFAULT now() CODEC(DoubleDelta, ZSTD(1)),
    `session_id` String `json:$.session_id` DEFAULT '' CODEC(ZSTD(1)),
    `status` LowCardinality(String) `json:$.status` DEFAULT 'unknown',
    `count` Int32 `json:$.count` DEFAULT 0,
    `payload` String `json:$`
```

#### Default values

Use `DEFAULT` to set a value for a column when the incoming data is null or missing. This is useful for:

- Providing fallback values for optional fields
- Setting automatic timestamps with `now()`
- Avoiding nullable types when a default makes sense

```tb {% title="Example with DEFAULT values" %}
SCHEMA >
    `timestamp` DateTime DEFAULT now(),
    `status` String DEFAULT 'pending',
    `count` Int32 DEFAULT 0,
    `is_active` UInt8 DEFAULT 1
```

#### JSONPath expressions

`SCHEMA` definitions need JSONPath expressions when working with Parquet or NDJSON data.

It supports base fields `json:$.field`, arrays `json:$.an_array[:]`, nested fields `json:$.nested.nested_field`, and storing the whole object `json:$`.

For example, given this JSON object:

```json
{
  "field": "test",
  "nested": { "nested_field": "bla" },
  "an_array": [1, 2, 3],
  "a_nested_array": { "nested_array": [1, 2, 3] }
}
```

The schema would be something like this:

```tb {% title="jsonpaths.datasource" %}
SCHEMA >
    field String `json:$.field`,
    nested_nested_field String `json:$.nested.nested_field`,
    an_array Array(Int16) `json:$.an_array[:]`,
    a_nested_array_nested_array Array(Int16) `json:$.a_nested_array.nested_array[:]`,
    whole_message String `json:$`
```

Use brackets for attributes with dots that are not actual nested attributes, like this:

```json
{
    "attributes": {
        "otel.attributes": {
            "cli_command": "datasource ls",
        }
    }
}
```

```tb {% title="jsonpaths.datasource" %}
SCHEMA >
    cli_command String `json:$.attributes.['otel.attributes'].cli_command`
```

{% callout %}
Tinybird's JSONPath syntax support has some limitations: It support nested objects at multiple levels, but it __supports nested arrays only at the first level__, as in the example before. To ingest and transform more complex JSON objects, store the whole JSON as a String "<column_name> String `json:$`", and use [JSONExtract functions](/sql-reference/functions/json-functions#jsonextract-functions) to parse at query time or in materializations.
{% /callout %}

#### Column compression codecs

Tinybird applies compression codecs to data types to optimize storage. You can override the default compression codecs by adding the `CODEC(<codec>)` statement after the type declarations in your .datasource schema. For example:

```tb
SCHEMA >
    `product_id` Int32 `json:$.product_id`,
    `timestamp` DateTime64(3) `json:$.timestamp` CODEC(DoubleDelta, ZSTD(1)),
```

### Engine settings

`ENGINE` declares the engine used for the data source. The default value is `MergeTree`.

See [Engines](/sql-reference/engines) for more information.

### Connector settings

A data source file can contain connector settings for certain type of sources, such as Kafka or S3. See [Connectors](/forward/get-data-in/connectors).

## Forward query

If you make changes to a .datasource file that are incompatible with the live version, you must use the `FORWARD_QUERY` instruction to transform the data from the live schema to the new one. Otherwise, your deployment will fail due to a schema mismatch.

{% callout type="warning" %}
You can't use `ALTER` to change the source of a Materialized View. To do so, you must drop the Materialized View and recreate it with the new source. To prevent data loss during this process, add a `SELECT *` `FORWARD_QUERY` to the destination `.datasource` file.
{% /callout %}

See [Evolve data sources](/forward/test-and-deploy/evolve-data-source#forward-query) for more information.
