Best practices to build analytics agents with the Tinybird MCP server

Tinybird Workspaces are fully managed remote MCP servers that you can instantly connect to LLMs and agents with no additional setup.

Here are some best practices for effectively building analytics agents using your data in Tinybird.

Add context to your APIs for language models

The Tinybird MCP server provides list_datasources and list_endpoints tools to publish useful context for LLMs.

Add LLM-friendly descriptions to your resources. Helping the LLM understand what a data source contains or how an API endpoint works without having to infer it from the schema or raw data makes it much easier to generate accurate responses.

Use the DESCRIPTION field in .datasource and .pipe files, and the description, required, and example metadata fields in pipe parameters.

.datasource example:

DESCRIPTION >
    - `analytics_events` contains web analytics events, such as `page_hit` actions or custom events.
    - The `action` column specifies the event type for each `session_id` and `timestamp`.
    - The `payload` is a JSON string with metadata about the action, such as the `user_agent`.

TOKEN "tracker" APPEND

SCHEMA >
    `timestamp` DateTime `json:$.timestamp`,
    `session_id` Nullable(String) `json:$.session_id`,
    `action` LowCardinality(String) `json:$.action`,
    `version` LowCardinality(String) `json:$.version`,
    `payload` String `json:$.payload`

ENGINE MergeTree
ENGINE_PARTITION_KEY toYYYYMM(timestamp)
ENGINE_SORTING_KEY timestamp
ENGINE_TTL timestamp + toIntervalDay(60)

.pipe example:

DESCRIPTION >
    - Use this tool when you need to get most visited pages for a given period.
    - Parameters:
        - `date_from` and `date_to`: Optional date filters, defaulting to the last 7 days.
        - `skip` and `limit`: Pagination parameters.
    - Response: `pathname`, unique `visits`, and total `hits` for the given period.

TOKEN "dashboard" READ

NODE endpoint
SQL >
    %
    select pathname, uniqMerge(visits) as visits, countMerge(hits) as hits
    from analytics_pages_mv
    where
        {% if defined(date_from) %}
            date
            >=
            {{ Date(date_from, description="Starting day for filtering a date range", required=False, example="2025-05-01") }}
        {% else %} date >= timestampAdd(today(), interval -7 day)
        {% end %}
        {% if defined(date_to) %}
            and date
            <=
            {{ Date(date_to, description="Finishing day for filtering a date range", required=False, example="2025-05-01") }}
        {% else %} and date <= today()
        {% end %}
    group by pathname
    order by visits desc
    limit {{ Int32(skip, 0) }},{{ Int32(limit, 50) }}

TYPE endpoint

2. Build APIs for semantic context

Your data represents semantic context for LLMs.

If the LLM cannot infer the meaning of certain terms or formats, you can create dedicated tools to provide that context. For example, if your analytics data encodes device types with internal notations, create a pipe that lists the possible device values and document it clearly:

DESCRIPTION >
  - Retrieves the list of available devices.
  - Use this list to filter other endpoints that require a `device` parameter.

NODE uniq_devices
SQL >
    SELECT distinct(device) as device
    FROM analytics_events

TYPE endpoint

3. Return LLM friendly errors

Requests from the MCP server include a from=mcp parameter. Use this to return error messages optimized for LLM interpretation:

NODE validate_endpoint
SQL >
    %
    {% if defined(from) and from == 'mcp' %}
        {% if defined(device) and device not in ['Android', 'iPhone'] %}
              {{error('Parameter error. There was an error with the parameter you provided. The supported values for the `device` parameters are Android or iPhone, fix the issue by retrying again the request with a valid `device` value.')}}
        {% end %}
    {% end %}
    SELECT * FROM endpoint
    
TYPE endpoint

4. Mind your LLM tokens

The MCP server returns data in CSV format to reduce LLM token usage. Additional tips to minimize tokens:

  • Use pagination (skip, limit) and ranks in APIs.
  • Instruct agents to SELECT only necessary columns.
  • Avoid high-precision types: round numbers to 1–2 decimals; avoid DateTime64.
  • Leverage aggregated materialized views.

5. More tools ≠ better agents

More tools does not correlate with better agents. In Tinybird all your API endpoints in a workspace are published as MCP tools, but you have control over them with authorization tokens.

Use resource-scoped tokens to restrict tool access and build domain-specific agents.

6. Limit agent steps

Agents degrade rapidly when more than three steps are required. A simple, effective workflow is:

  • Data Retrieval: Use Tinybird tools to gather data based on user inputs.
  • LLM Processing: Let the LLM summarize, explain, or analyze the data.
  • Automation: Use third-party tools for alerts or workflows based on the results.

7. Build public facing analytics agents

Use JSON Web Tokens (JWTs) for multi-tenant public agents.

Tinybird JWTs allow fine-grained access control, automatically filtering data by tenant ID. For example, for an org_id parameter in your pipe, use a JWT like:

{
    "workspace_id": "<workspaces_id>",
    "name": "frontend_jwt",
    "exp": 123123123123,
    "scopes": [
        {
            "type": "PIPES:READ",
            "resource": "top_pages",
            "fixed_params": {
                "org_id": "<org_uid>"
            }
        }
    ],
    "limits": {
      "rps": 10
    }
}

Include the token in your MCP URL to support multi-tenancy: https://mcp.tinybird.co?token=<jwt_token>

For integrations, check out:

Updated