---
title: "Idea to Production"
meta:
    description: Build the first version of your data project locally and deploy it to Tinybird Cloud.
---

# Idea to Production

## What you'll build

In this chapter, you'll build the first version of your data project locally and deploy it to Tinybird Cloud.

The desired features are:

- `total-revenue`: lets the merchant see their total revenue in real time.
- `products-usually-bought-together`: increases ticket size based on other buyers' behavior.

## Data structure

You'll work with two event types:

- `orders`: Overall order information
- `order_items`: Items within each order

### Example data

__Orders:__

```json
{"storeId": 1, "orderId": "order123", "userId": "user456", "timestamp": "2023-04-24T10:30:00Z", "totalAmount": 125.50}
{"storeId": 1, "orderId": "order124", "userId": "user789", "timestamp": "2023-04-24T11:45:00Z", "totalAmount": 75.25}
```

__Order Items:__

```json
{"storeId": 1, "orderId": "order123", "items": [{"productId": "prod1", "quantity": 2, "priceAtTime": 45.00}, {"productId": "prod2", "quantity": 1, "priceAtTime": 35.50}]}
{"storeId": 1, "orderId": "order124", "items": [{"productId": "prod3", "quantity": 1, "priceAtTime": 50.25}, {"productId": "prod2", "quantity": 1, "priceAtTime": 25.00}]}
```

{% steps %}

## Set up the project

First, create a new directory for your Tinybird project. `tb init` will scaffold it for you.

```bash
mkdir ecommerce-analytics && cd ecommerce-analytics
```

```bash
tb init --type cli --dev-mode manual --folder . --cicd github
```

You just created the folder structure (/datasources, /endpoints, /fixtures…) where the code will be stored. Also, note the CI/CD files. __You're prototyping now, so don't pay too much attention to them; you'll need them when you go to production__.

## Inserting and storing data

Next, define the tables and how the events are going to be stored in them.

Create a `tmp-seed-data` directory and add samples of the `orders` and `order_items` events.

```bash
mkdir tmp-seed-data

touch tmp-seed-data/orders.ndjson

echo '{"storeId": 1, "orderId": "order123", "userId": "user456", "timestamp": "2023-04-24T10:30:00Z", "totalAmount": 125.50}
{"storeId": 1, "orderId": "order124", "userId": "user789", "timestamp": "2023-04-24T11:45:00Z", "totalAmount": 75.25}' > tmp-seed-data/orders.ndjson

touch tmp-seed-data/order-items.ndjson

echo '{"storeId": 1, "orderId": "order123", "items": [{"productId": "prod1", "quantity": 2, "priceAtTime": 45.00}, {"productId": "prod2", "quantity": 1, "priceAtTime": 35.50}]}
{"storeId": 1, "orderId": "order124", "items": [{"productId": "prod3", "quantity": 1, "priceAtTime": 50.25}, {"productId": "prod2", "quantity": 1, "priceAtTime": 25.00}]}' > tmp-seed-data/order-items.ndjson
```

Create two [.datasource files](/forward/dev-reference/datafiles/datasource-files). __Data sources are the definition of the database tables where you will store the order events__. They specify how to access the data and how to store it.

You can proceed directly if you know the syntax, or use `tb datasource create --file` to infer schemas from sample data.

```bash
tb datasource create --file tmp-seed-data/orders.ndjson --name orders

# ✓ /datasources/orders.datasource

tb datasource create --file tmp-seed-data/order-items.ndjson --name order_items

# ✓ /datasources/order_items.datasource
```

Now that the .datasource files are created, you can safely delete `tmp-seed-data`.

The generated .datasource files should look like this:

```tb {% title="datasources/orders.datasource" %}
SCHEMA >
	orderId String `json:$.orderId`,
	storeId UInt32 `json:$.storeId`,
	timestamp String `json:$.timestamp`,
	totalAmount Float32 `json:$.totalAmount`,
	userId String `json:$.userId`

ENGINE "MergeTree"
ENGINE_SORTING_KEY "userId, orderId"
```

__SCHEMA__ defines the column names, their types, and the [jsonpath](/forward/dev-reference/datafiles/datasource-files#jsonpath-expressions) expressions that map JSON fields to column values.

__ENGINE__ is set to [MergeTree](/sql-reference/engines/mergetree), the default, append-only table engine.

And __ENGINE_SORTING_KEY__ defines the order in which the data is physically stored.

```tb {% title="datasources/order_items.datasource" %}
SCHEMA >
	orderId String `json:$.orderId`,
	storeId UInt32 `json:$.storeId`,
	items__productId Array(String) `json:$.items[:].productId`,
	items__priceAtTime Array(Float32) `json:$.items[:].priceAtTime`,
	items__quantity Array(Int16) `json:$.items[:].quantity`

ENGINE "MergeTree"
ENGINE_SORTING_KEY "orderId, storeId"
```

Similar to the previous one, but note the Array types and jsonpaths, which convert this JSON:

```json
{"storeId": 1, "orderId": "order123", "items": [{"productId": "prod1", "quantity": 2, "priceAtTime": 45.00}, {"productId": "prod2", "quantity": 1, "priceAtTime": 35.50}]}
```

Into this row:

```txt
| storeId | orderId  | items__productId | items__priceAtTime | items__quantity |
|---------|----------|------------------|--------------------|-----------------|
|      1  | order123 | [prod1, prod2]   | [45.00, 35.50]     | [2, 1]          |
```
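To see how the jsonpath mapping works, here's a minimal Python sketch (an illustration only, not Tinybird code) that flattens the nested `items` array into the parallel columns shown above:

```python
import json

event = '{"storeId": 1, "orderId": "order123", "items": [{"productId": "prod1", "quantity": 2, "priceAtTime": 45.00}, {"productId": "prod2", "quantity": 1, "priceAtTime": 35.50}]}'

order = json.loads(event)

# Mimic the `$.items[:].field` jsonpaths: one parallel array per field.
row = {
    "storeId": order["storeId"],
    "orderId": order["orderId"],
    "items__productId": [i["productId"] for i in order["items"]],
    "items__priceAtTime": [i["priceAtTime"] for i in order["items"]],
    "items__quantity": [i["quantity"] for i in order["items"]],
}

print(row["items__productId"])  # ['prod1', 'prod2']
```

Each nested object contributes one element to every array, so the arrays always stay the same length and index `i` across them reconstructs the original item.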

## Validate the project locally

Run Tinybird Local from the CLI with `tb local start`. You'll need a container runtime such as Docker Desktop or OrbStack.

```bash
tb local start

# » Starting Tinybird Local...
# ✓ Tinybird Local is ready!
```

Build the project to validate your datafiles. If something is wrong (like the syntax), you'll receive an error and can correct it before deploying to the cloud.

```bash
tb build

# » Building project...
# ✓ datasources/orders.datasource created
# ✓ datasources/order_items.datasource created

# ✓ Build completed in 0.3s
```

The build step also ingests your fixtures so you can test locally with data. Verify the data is there, too:

```bash
tb sql "select * from orders"

# Running against Tinybird Local
#  storeId  orderId   timestamp             totalAmount  userId
#  UInt32   String    String                Float32      String
#  ───────────────────────────────────────────────────────────────
#       1    order123  2023-04-24T10:30:00Z  125.5        user456
# ───────────────────────────────────────────────────────────────
#       1    order124  2023-04-24T11:45:00Z  75.25        user789
```

## Create the "total_revenue" endpoint

Now, create your first endpoint to calculate total revenue metrics. __Endpoints are [.pipe files](/forward/dev-reference/datafiles/pipe-files)__, a convenient way to chain SQL queries together, like a notebook. The `TYPE` directive states the desired behavior: use `endpoint` to expose an API endpoint that you can call from other services.

```tb {% title="endpoints/total_revenue.pipe" %}
NODE total_revenue_endpoint
SQL >
    SELECT 
        count() AS orders,
        sum(totalAmount) AS revenue,
        revenue / orders AS average_order_value
    FROM orders

TYPE endpoint
```
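The SQL above is a plain aggregation over the orders table. As a sanity check, the same calculation over the two sample orders can be sketched in Python (just an illustration; Tinybird runs the SQL for you):

```python
orders = [
    {"orderId": "order123", "totalAmount": 125.50},
    {"orderId": "order124", "totalAmount": 75.25},
]

count = len(orders)                              # count()
revenue = sum(o["totalAmount"] for o in orders)  # sum(totalAmount)
average_order_value = revenue / count            # revenue / orders

print(count, revenue, average_order_value)  # 2 200.75 100.375
```

These are the same numbers you should see in the API response in the next step.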

## Test the API endpoint

Verify the API endpoint works as expected. __You need the URL__ (by default, Tinybird Local runs on port 7181) __and the admin user token__. The path is `v0/pipes/<endpoint name>.<format>`, and the token that authenticates the request can be passed as a query parameter or in the header.

```bash
tb token copy "admin local_testing@tinybird.co" && TB_LOCAL_TOKEN=$(pbpaste)

curl -X GET "http://localhost:7181/v0/pipes/total_revenue.json?token=$TB_LOCAL_TOKEN"
# {
#   "meta": [
#     {"name": "orders", "type": "UInt64"},
#     {"name": "revenue", "type": "Float64"},
#     {"name": "average_order_value", "type": "Float64"}
#   ],
#   "data": [
#     {"orders": 2, "revenue": 200.75, "average_order_value": 100.375}
#   ],
#   "rows": 1,
#   "statistics": {
#     "elapsed": 0.005939524,
#     "rows_read": 2,
#     "bytes_read": 8
#   }
# }
```

Check `meta` and `statistics`, and most importantly, `data`: 2 orders, the numbers add up, so you're good to go.
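When consuming the endpoint from another service, you'd typically parse the JSON body and read the `data` array. A small Python sketch, using the sample response above as input:

```python
import json

# Sample response body from the total_revenue endpoint (abridged).
body = """
{
  "meta": [
    {"name": "orders", "type": "UInt64"},
    {"name": "revenue", "type": "Float64"},
    {"name": "average_order_value", "type": "Float64"}
  ],
  "data": [
    {"orders": 2, "revenue": 200.75, "average_order_value": 100.375}
  ],
  "rows": 1
}
"""

response = json.loads(body)
metrics = response["data"][0]  # one row: the aggregated metrics
print(metrics["revenue"])  # 200.75
```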

## Create the "products_usually_bought_together" endpoint

Next, let's create your product recommendation endpoint:

The idea is to end up with a table that contains each product alongside a ranking of the products most often bought with it.

So, for these items:

```txt
["prod1", "prod2"]
["prod3", "prod2"]
["prod2", "prod6", "prod1"]
["prod2", "prod1"]
["prod7"]
```

We want a table like this:

```txt
productId, boughtWith, count
----------------------------
prod1, prod2, 3
prod1, prod6, 1
prod2, prod1, 3
prod2, prod3, 1
prod2, prod6, 1
prod3, prod2, 1
prod6, prod1, 1
prod6, prod2, 1
```
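The transformation from item lists to pair counts can be sketched in Python with `itertools.permutations`; this is only an illustration of the logic, which the pipe implements in SQL:

```python
from collections import Counter
from itertools import permutations

orders = [
    ["prod1", "prod2"],
    ["prod3", "prod2"],
    ["prod2", "prod6", "prod1"],
    ["prod2", "prod1"],
    ["prod7"],
]

pairs = Counter()
for items in orders:
    if len(items) > 1:  # skip single-product orders
        # Every ordered pair of distinct products within the same order.
        pairs.update(permutations(set(items), 2))

for (product, bought_with), count in sorted(pairs.items()):
    print(product, bought_with, count)
```

Counting ordered pairs means each combination appears twice (`prod1, prod2` and `prod2, prod1`), which is what you want: the ranking must work no matter which product the buyer is looking at.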

There isn't enough sample data in `order_items`, so let's replace what's there with a richer sample. Create an `order-items.ndjson` file with these rows:

```tb {% title="order-items.ndjson" %}
{"storeId": 1, "orderId": "order123", "items": [{"productId": "prod1", "quantity": 2, "priceAtTime": 45.00}, {"productId": "prod2", "quantity": 1, "priceAtTime": 35.50}]}
{"storeId": 1, "orderId": "order124", "items": [{"productId": "prod3", "quantity": 1, "priceAtTime": 50.25}, {"productId": "prod2", "quantity": 1, "priceAtTime": 25.00}]}
{"storeId": 1, "orderId": "order126", "items": [{"productId": "prod2", "quantity": 1, "priceAtTime": 35.50}, {"productId": "prod6", "quantity": 2, "priceAtTime": 75.00}, {"productId": "prod1", "quantity": 2, "priceAtTime": 45.00}]}
{"storeId": 1, "orderId": "order123", "items": [{"productId": "prod1", "quantity": 2, "priceAtTime": 45.00}, {"productId": "prod2", "quantity": 1, "priceAtTime": 35.50}]}
{"storeId": 1, "orderId": "order124", "items": [{"productId": "prod3", "quantity": 1, "priceAtTime": 50.25}, {"productId": "prod2", "quantity": 1, "priceAtTime": 25.00}]}
{"storeId": 1, "orderId": "order126", "items": [{"productId": "prod2", "quantity": 1, "priceAtTime": 35.50}, {"productId": "prod6", "quantity": 2, "priceAtTime": 75.00}, {"productId": "prod1", "quantity": 2, "priceAtTime": 45.00}]}
{"storeId": 1, "orderId": "order127", "items": [{"productId": "prod7", "quantity": 1, "priceAtTime": 40.00}]}
```

```bash
tb datasource truncate order_items --yes && tb datasource append order_items order-items.ndjson

# Running against Tinybird Local
# ** Data Source 'order_items' truncated
# Running against Tinybird Local
# Importing data to order_items...
# ✓ Done!

tb sql "select items__productId from order_items"

# Running against Tinybird Local
#  items__productId             
#  Array(String)                
#  ───────────────────────────────
#   ['prod1', 'prod2']           
#  ───────────────────────────────
#   ['prod3', 'prod2']           
#  ───────────────────────────────
#   ['prod2', 'prod6', 'prod1']  
#  ───────────────────────────────
#   ['prod1', 'prod2']           
#  ───────────────────────────────
#   ['prod3', 'prod2']           
#  ───────────────────────────────
#   ['prod2', 'prod6', 'prod1']  
#  ───────────────────────────────
#   ['prod7']                  
```

The data is ready, so let's proceed with the .pipe file and its SQL logic. Node by node:

- __orders_multiprod__: filter out the orders that contain only one product.
- __unrolled_prods__: flatten the arrays to generate one row per array element.
- __prod_pairs__: pair up the products that appear in the same order.
- __ranking__: count how often each pair occurs.

```tb {% title="endpoints/products_bought_together.pipe" %}
NODE orders_multiprod

SQL >
	SELECT orderId, items__productId FROM order_items WHERE length(items__productId) > 1

NODE unrolled_prods

SQL >
	SELECT orderId, prods FROM orders_multiprod ARRAY JOIN items__productId as prods

NODE prod_pairs

SQL >
	SELECT distinct t1.orderId, t1.prods as product1, t2.prods as product2
    	FROM unrolled_prods t1
    	JOIN unrolled_prods t2 ON t1.orderId = t2.orderId
    	WHERE product1 != product2

NODE ranking

SQL >
	SELECT product1, product2, count() as pair_frequency
	FROM prod_pairs
	GROUP BY product1, product2
	ORDER BY product1, product2

TYPE endpoint
```

{% callout type="tip" %}
Tip: as this pipe is more complex, checking the results of each node in the UI is super helpful. Run `tb dev --ui` and edit the nodes there.
{% /callout %}

As always, verify the API endpoint works by making a request:

```bash
curl -X GET "http://localhost:7181/v0/pipes/products_bought_together.csv?token=$TB_LOCAL_TOKEN"

# "product1","product2","pair_frequency"
# "prod1","prod2",3
# "prod1","prod6",1
# "prod2","prod1",3
# "prod2","prod3",1
# "prod2","prod6",1
# "prod3","prod2",1
# "prod6","prod1",1
# "prod6","prod2",1
```

## Deploy to Tinybird Cloud

You have the endpoints, so __you're ready to deploy to production__. Use the Tinybird CLI to push your changes.

Since your project is not in production yet, it's safe to deploy directly with `tb --cloud deploy`. Still, it's good practice to run with the `--check` flag first.

```bash
tb --cloud deploy --check

# Running against Tinybird Cloud: Workspace ecommerce-analytics

# » Validating deployment...


# * Changes to be deployed:
# ---------------------------------------------------------------------------------
# | status | name                     | path                                       |
# ---------------------------------------------------------------------------------
# | new    | orders                   | datasources/orders.datasource              |
# | new    | order_items              | datasources/order_items.datasource         |
# | new    | products_bought_together | endpoints/products_bought_together.pipe    |
# | new    | total_revenue            | endpoints/total_revenue.pipe               |
# ---------------------------------------------------------------------------------
# * No changes in tokens to be deployed


# ✓ Deployment is valid
```

{% /steps %}

## Conclusion

You started with some sample data, built a data project locally, and now have it working in production.

Next steps:

- [Send data](/forward/get-data-in) to Tinybird Cloud.
- Add [query parameters](/forward/work-with-data/query-parameters) to make the API dynamic. Filter on a store so you don't expose data from store 3 to store 2's owner.
- Secure the endpoint with a [token](/forward/administration/tokens) that is not the admin token. 
- [Optimize](/forward/work-with-data/optimize) the project. You want the API calls to be fast.
- Get ready for production with [testing and CI/CD](/forward/test-and-deploy). You want your existing APIs to keep working while you develop new features.
