---
title: Organize files in data projects
meta:
  description: Learn how to best organize your Tinybird CLI files in versioned projects.
---

# Organize files in data projects

A data project is a set of files that describes how your data must be stored, processed, and exposed through APIs.

In the same way you maintain source code files in a repository, use CI, make deployments, and run tests, Tinybird provides tools to follow a similar workflow with data pipelines. The source code of your project is the set of [datafiles](/classic/cli/datafiles) in Tinybird.

With a data project you can:

- Define how the data should flow, from schemas to API Endpoints.
- Manage your datafiles using version control.
- Branch your datafiles.
- Run tests.
- Deploy data projects.

## Ecommerce site example

Consider an ecommerce site where you have events from users and a list of products with their attributes. Your goal is to expose several API endpoints to return sales per day and top product per day.

The data project file structure would look like the following:

```
ecommerce_data_project/
    datasources/
        events.datasource
        products.datasource
        fixtures/
            events.csv
            products.csv
    pipes/
        top_product_per_day.pipe
    endpoints/
        sales.pipe
        top_products.pipe
```

To follow this tutorial, download and open the example using the following commands:

```shell {% title="Clone demo" %}
git clone https://github.com/tinybirdco/ecommerce_data_project.git
cd ecommerce_data_project
```

### Upload the project

You can push the whole project to your Tinybird account to check that everything is correct. The `tb push` command uploads the datafiles to Tinybird, first checking the project dependencies and the SQL syntax. In this case, use the `--push-deps` flag to push everything:

```shell {% title="Push dependencies" %}
tb push --push-deps
```

After the upload completes, the endpoints defined in the project, `sales` and `top_products`, are available, and you can start pushing data to the different Data Sources.
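To confirm what was pushed, you can list the resources now in your Workspace from the CLI. A quick check, assuming the CLI is already authenticated against your account:

```shell
# List the Data Sources and Pipes present in the Workspace
tb datasource ls
tb pipe ls
```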

### Define Data Sources

Data Sources define how your data is ingested and stored. You can add data to Data Sources using the [Data Sources API](/api-reference/datasource-api).
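For example, a minimal `curl` call to append one of the CSV fixtures to the `events` Data Source might look like the following. This is a sketch that assumes your Token is exported as `TB_TOKEN`, it has the `DATASOURCES:APPEND` scope, and your Workspace lives in the `api.tinybird.co` region:

```shell
# Hypothetical example: append a CSV fixture to the `events` Data Source
# using the Data Sources API in append mode
curl -X POST "https://api.tinybird.co/v0/datasources?mode=append&name=events" \
    -H "Authorization: Bearer $TB_TOKEN" \
    -F "csv=@datasources/fixtures/events.csv"
```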

Each Data Source is defined by a schema and other properties. See [Datasource files](/classic/cli/datafiles/datasource-files).

The following snippet shows the content of the `events.datasource` file from the ecommerce example:

```tb
DESCRIPTION >
    # Events from users
    This contains all the events produced by Kafka. There are 4 fixed columns,
    plus a `json` column which contains the rest of the data for that event.
    See [documentation](url_for_docs) for the different events.

SCHEMA >
    timestamp DateTime,
    product String,
    user_id String,
    action String,
    json String

ENGINE MergeTree
ENGINE_SORTING_KEY timestamp
```

The file describes the schema and how the data is sorted. In this case, the data is accessed most of the time by the `timestamp` column. If no `ENGINE_SORTING_KEY` is set, Tinybird picks one by default, usually a date or datetime column if available.

To push the Data Source, run:

```shell {% title="Push the events Data Source" %}
tb push datasources/events.datasource
```

{% callout type="info" %}
You can't override Data Sources. If you try to push a Data Source that already exists in your account, you get an error. To change a Data Source, remove it first, or push a new one with a different name.
{% /callout %}

### Define data Pipes

The content of the `pipes/top_product_per_day.pipe` file creates a data Pipe that transforms the data as it's inserted:

```tb
NODE only_buy_events
DESCRIPTION >
    filters all the buy events

SQL >
    SELECT
        toDate(timestamp) date,
        product,
        JSONExtractFloat(json, 'price') AS price
    FROM events
    WHERE action = 'buy'


NODE top_per_day
SQL >
    SELECT date,
           topKState(10)(product) top_10,
           sumState(price) total_sales
    FROM only_buy_events
    GROUP BY date

TYPE materialized
DATASOURCE top_per_day_mv
ENGINE AggregatingMergeTree
ENGINE_SORTING_KEY date
```

Each Pipe can have one or more nodes. The previous Pipe defines two nodes, `only_buy_events` and `top_per_day`.

- The first node filters `buy` events and extracts some data from the `json` column.
- The second node runs the aggregation.

Use `NODE` to start a new node, and then use `SQL >` to define the SQL for that node. You can reference other nodes inside the SQL. In this case, the second node queries the first one, `only_buy_events`.
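As a minimal sketch of this chaining, a hypothetical Pipe with two nodes could look like the following, where the second node selects from the first one by its name:

```tb
NODE filtered
SQL >
    SELECT product, action FROM events WHERE action = 'view'

NODE counted
SQL >
    SELECT product, count() AS views FROM filtered GROUP BY product
```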

To push the Pipe, run:

```shell {% title="Populate" %}
tb push pipes/top_product_per_day.pipe --populate
```

If you want to populate the Materialized View with the existing data in the `events` Data Source, use the `--populate` flag.

{% callout type="info" %}
When using the `--populate` flag, the CLI returns a job URL that you can use to check the status of the populate job. See [Populate and copy data](/classic/work-with-data/process-and-copy/populate-data) for more information on how populate jobs work.
{% /callout %}

### Define API Endpoints

API Endpoints are the way you expose the data to be consumed.

The following snippet shows the content of the `endpoints/top_products.pipe` file:

```tb
NODE endpoint
DESCRIPTION >
    returns top 10 products for the last week
SQL >
    SELECT
        date,
        topKMerge(10)(top_10) AS top_10
    FROM top_per_day
    WHERE date > today() - interval 7 day
    GROUP BY date
```

The syntax is the same as in the data transformation Pipes. Once pushed, you can access the results through the `{% user("apiHost") %}/v0/top_products.json?token=TOKEN` endpoint.

{% callout type="info" %}
When you push an endpoint, a Token with `PIPE:READ` permissions is automatically created. You can see it in the [Tokens UI](https://app.tinybird.co/tokens) or from the CLI with the command `tb pipe token_read <endpoint_name>`.
{% /callout %}

Alternatively, you can use the `TOKEN token_name READ` directive to automatically create a Token named `token_name` with `READ` permissions over the endpoint, or to grant `READ` permissions over the endpoint to an existing `token_name`. For example:

```tb
TOKEN public_read_token READ

NODE endpoint
DESCRIPTION >
    returns top 10 products for the last week
SQL >
    SELECT
        date,
        topKMerge(10)(top_10) AS top_10
    FROM top_per_day
    WHERE date > today() - interval 7 day
    GROUP BY date
```

To push the endpoint, run:

```shell {% title="Push the top products Pipe" %}
tb push endpoints/top_products.pipe
```

{% callout type="info" %}
The Token `public_read_token` was created automatically and it's provided in the test URL.
{% /callout %}

You can add parameters to any endpoint. For example, parametrize the dates to filter the data between a start and end date:

```tb
NODE endpoint
DESCRIPTION >
    returns top 10 products for the last week
SQL >
    %
    SELECT
        date,
        topKMerge(10)(top_10) AS top_10
    FROM top_per_day
    WHERE date between {{Date(start)}} AND {{Date(end)}}
    GROUP BY date
```

Now, the endpoint can receive `start` and `end` parameters: `{% user("apiHost") %}/v0/top_products.json?start=2018-09-07&end=2018-09-17&token=TOKEN`.
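For instance, assuming the API host from the examples above is exported as `TB_HOST` and a Token with read permissions over the endpoint as `TB_TOKEN`, you could call the parametrized endpoint with `curl`:

```shell
# Hypothetical example: call the parametrized endpoint over HTTP
curl "$TB_HOST/v0/top_products.json?start=2018-09-07&end=2018-09-17&token=$TB_TOKEN"
```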

You can print the results from the CLI using the `tb pipe data` command. For instance, for the previous example:

```shell {% title="Print the results of the top products endpoint" %}
tb pipe data top_products --start '2018-09-07' --end '2018-09-17' --format CSV
```

For parameter templating to work, the node's SQL definition must start with the `%` character.
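Template parameters can also carry default values, so the endpoint still works when a parameter is omitted. A minimal sketch, assuming the same `top_per_day` Materialized View and hypothetical default dates passed as the second argument to each template function:

```tb
NODE endpoint
SQL >
    %
    SELECT
        date,
        topKMerge(10)(top_10) AS top_10
    FROM top_per_day
    WHERE date BETWEEN {{Date(start, '2018-09-07')}} AND {{Date(end, '2018-09-17')}}
    GROUP BY date
```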

### Override an endpoint or a data Pipe

When working on a project, you might need to push several versions of the same file. You can override a Pipe that has already been pushed using the `--force` flag. For example:

```shell {% title="Override the Pipe" %}
tb push endpoints/top_products_params.pipe --force
```

If the endpoint has been called before, Tinybird runs regression tests using the most frequent requests. If the new version doesn't return the same data for those requests, the push fails. The CLI output shows the requests that were tested.

You can force the push without running the checks using the `--no-check` flag if needed. For example:

```shell {% title="Force override" %}
tb push endpoints/top_products_params.pipe --force --no-check
```

### Download datafiles from Tinybird

You can download datafiles using the `tb pull` command. For example:

```shell {% title="Pull a specific file" %}
tb pull --match endpoint_im_working_on
```

The previous command downloads the `endpoint_im_working_on.pipe` file to the current folder.
