Birdwatcher is an AI agent that talks to your data in Tinybird, automating data exploration, monitoring, and complex real-time data analysis.
It can run as a standalone agent, on a schedule using GitHub Actions, or in conversational mode (via CLI or Slack bot).
Here's a quick usage example for web analytics:
uv run python birdwatcher.py \
--prompt "Tell me the top 5 pages in terms of visits in the last 24 hours"
┏━ Message ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                                                                                ┃
┃ pages with more visits in last 24 hours                                        ┃
┃                                                                                ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
┏━ Tool Calls ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                                                                                ┃
┃ • top_pages(parameters={'date_from': '2025-05-18', 'date_to': '2025-05-19'})   ┃
┃                                                                                ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
┏━ Response ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                                                                                ┃
┃                                                                                ┃
┃ 1. **/** (Homepage) - 11167 visits, 12160 hits                                 ┃
┃ 2. **/pricing** - 3901 visits, 5840 hits                                       ┃
┃ 3. **/docs/forward/get-started/quick-start** - 3812 visits, 5304 hits          ┃
┃ 4. **/product** - 2622 visits, 2867 hits                                       ┃
┃ 5. **/blog-posts/which-llm-writes-the-best-sql** - 2512 visits, 2686 hits      ┃
┃                                                                                ┃
┃ The homepage is by far the most visited page with 11167 visits, followed by    ┃
┃ the pricing page and documentation quick-start guide.                          ┃
┃ The blog posts about LLMs and SQL, and MCP vs APIs are also popular, showing   ┃
┃ interest in AI-related content.                                                ┃
┃                                                                                ┃
┃ Note: The difference between "visits" and "hits" is that visits represent      ┃
┃ unique sessions, while hits include all page views (including multiple         ┃
┃ views in the same session).                                                    ┃
┃                                                                                ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
Let's explore how it works and how you can build your own domain-specific agent.
Technical implementation
I used Agno, a framework that offers:
- `Agent` loop management
- Multiple LLM backends (Claude, Gemini, etc.)
- A `Tool` integration system, including MCP
- Optional `Memory` and `Storage` management
- Other advanced features, such as reasoning, thinking, or knowledge tools

Setting up a basic `Agent` is straightforward:
from agno.agent import Agent

agent = Agent(
    model=model,
    tools=tools,
    description=system_prompt,
    instructions=mission,
)

await agent.aprint_response(
    "ask a question about your data",
    user_id=user_id,
    stream=True,
)
I focused on configuring the model, tools, and prompts.
I'm using `gemini` for its larger context window, but `claude` also works well.
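For reference, this is roughly how the model is configured in Agno (a minimal sketch; the Gemini ID matches the debug traces below, and the Claude ID is the one used in the GitHub Action later in this post):

from agno.models.google import Gemini
# from agno.models.anthropic import Claude

model = Gemini(id="gemini-2.0-flash")
# model = Claude(id="claude-4-sonnet-20250514")  # swap in Claude if you prefer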
For data analytics, I'm using the Tinybird MCP Server, which turns your Tinybird workspace into a remote, hosted MCP Server, complete with various tools to help agents fetch data and perform complex data analysis.
For notifications, I use Agno's built-in Slack and Resend tools.
This is how you set them up:
from agno.tools.mcp import MCPTools
from agno.tools.resend import ResendTools
from agno.tools.slack import SlackTools

mcp_tools = MCPTools(
    transport="streamable-http",
    url=f"https://mcp.tinybird.co?token={tinybird_api_key}&host={tinybird_host}",
    timeout_seconds=120,
)

tools = [
    mcp_tools,
    ResendTools(from_email=<YOUR_EMAIL>),
    SlackTools(),
]
This sets up a functioning `Agent`: an LLM running in a loop with access to tools. Now it just needs a mission.
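Putting it together, a minimal end-to-end sketch (assuming the model, tools, and prompt variables from the snippets above; note that MCPTools manages a live connection, so it is typically opened as an async context manager):

import asyncio
from agno.agent import Agent

async def main():
    # Open the MCP connection before handing the toolkit to the agent
    async with mcp_tools:
        agent = Agent(
            model=model,
            tools=tools,
            description=system_prompt,
            instructions=mission,
        )
        await agent.aprint_response(
            "top 5 pages by visits in the last 24 hours",
            stream=True,
        )

asyncio.run(main())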
Prompts
The prompts are key to shaping the behavior and performance of Birdwatcher (or any agent), and they can be adjusted to make the agent more or less autonomous.
There are three prompt layers:
- System Prompt: Defines the agent's role as a data analyst and sets basic rules. This is usually a generic prompt.
- Instructions: The detailed instructions that define Birdwatcher's mission and the steps it must take to accomplish it. You have one set of instructions per agent and mission.
- User prompt: The user input question or message. This can change on each run of the agent.
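These three layers map directly onto the Agent setup shown earlier (a sketch; user_prompt is a placeholder variable):

agent = Agent(
    model=model,
    tools=tools,
    description=system_prompt,  # layer 1: generic data-analyst role and rules
    instructions=mission,       # layer 2: mission-specific steps
)

await agent.aprint_response(user_prompt)  # layer 3: the per-run question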
To build useful agents, you must define clear missions and domain-specific prompts. Vagueness leads to mediocre results.
Missions
Birdwatcher supports missions passed at runtime:
uv run python birdwatcher.py \
--mission explore \
--prompt "top 5 pages with more visits in the last 24 hours, \
notify to #birdwatcher-notifications"
Missions are markdown files or inline text containing detailed, step-by-step instructions for the agent to complete its task. There are several predefined missions in the missions folder of the repository. We intend to add more as we develop and test them (feel free to submit your own with a PR).
The `--prompt` flag lets you give the agent more specific context, such as a timeframe, where to send notifications, or which tools to use.
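Under the hood, a mission is just text that ends up in the agent's instructions. A minimal sketch of how a --mission value could be resolved (the load_mission helper is illustrative, not Birdwatcher's actual code):

from pathlib import Path

def load_mission(mission: str) -> str:
    # Use a file from the missions folder if one matches,
    # otherwise treat the value as inline instructions
    path = Path("missions") / f"{mission}.md"
    return path.read_text() if path.exists() else mission

agent = Agent(
    model=model,
    tools=tools,
    description=system_prompt,
    instructions=load_mission("explore"),
)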
Let's see how missions instruct the agent to solve problems, with two examples.
Example 1: Data Exploration
Building a data exploration agent is hard. It requires:
- Understanding schemas, APIs, parameters, data types, and the semantic context of the data
- Generating valid dynamic SQL queries in the correct dialect (ClickHouse in this case)
- Providing contextual and accurate responses
Tinybird's MCP Server includes an `explore_data` tool that does exactly this. It's a server-side agent that runs autonomously to analyze data based on preloaded context and an understanding of all the resources you have in a Tinybird workspace.
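Because the MCP Server is remote and hosted, you can also call explore_data directly, outside any agent framework. A minimal sketch using the official MCP Python SDK (assuming the same token and host parameters as above):

import asyncio
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main():
    url = f"https://mcp.tinybird.co?token={tinybird_api_key}&host={tinybird_host}"
    async with streamablehttp_client(url) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Same server-side tool the agent calls in the traces below
            result = await session.call_tool(
                "explore_data",
                {"prompt": "Top referrers in the last month"},
            )
            print(result.content)

asyncio.run(main())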
I created an explore mission to leverage it.
uv run python birdwatcher.py \
--mission explore \
--prompt "analyze top referrers in the last month"
Prompt: analyze top referrers in the last month
--------------------------------------------------
DEBUG ********** Agent ID: de5f68b9-8b4a-4288-9620-ebc36972f90d **********
DEBUG ********** Session ID: 24d20777-0bfc-45b8-8a3c-e45953a4ad14 **********
DEBUG Processing tools for model
DEBUG Added tool explore_data from MCPTools
DEBUG Added tool text_to_sql from MCPTools
DEBUG Added tool analytics_hits from MCPTools
DEBUG Added tool trend from MCPTools
DEBUG Added tool top_pages from MCPTools
DEBUG Added tool top_browsers from MCPTools
DEBUG Added tool top_locations from MCPTools
DEBUG Added tool kpis from MCPTools
DEBUG Added tool top_devices from MCPTools
DEBUG Added tool top_sources from MCPTools
DEBUG Added tool list_endpoints from MCPTools
DEBUG Added tool list_datasources from MCPTools
DEBUG Added tool execute_query from MCPTools
DEBUG Added tool list_service_datasources from MCPTools
DEBUG Added tool send_email from resend_tools
DEBUG Added tool send_message from slack
DEBUG Added tool send_message_thread from slack
DEBUG Added tool list_channels from slack
DEBUG Added tool get_channel_history from slack
DEBUG ********** Agent Run Start: e9c1cebd-d367-45b0-8579-082661631c5b **********
DEBUG ---------- Google Async Response Stream Start ----------
DEBUG ---------- Model: gemini-2.0-flash ----------
DEBUG ========== system ==========
DEBUG You are a data analyst for Tinybird metrics. You have MCP tools to get schemas,
endpoints and data.
<rules>
- Retry failed tools once, add errors to prompt to auto-fix
- Datetime format: YYYY-MM-DD HH:MM:SS
- Date format: YYYY-MM-DD
- Now is 2025-06-19 12:51:06
- Auto-fix SQL syntax errors
- Use ClickHouse dialect
- Use toStartOfInterval(toDateTime(timestamp_column), interval 1 minute)
to aggregate by minute (use second, hour, day, etc. for other intervals)
- Use now() to get the current time
- When asked about a specific pipe or datasource, use list_datasources and
list_endpoints to check the content
- service data sources columns with duration metrics are in seconds
- format bytes to MB, GB, TB, etc.
</rules>
<resend_rules>
- You MUST send an email to the user ONLY when requested.
- The email body MUST be the investigation report in HTML format.
- Include a summary of the investigation in the email body.
</resend_rules>
<your_role>
Autonomous Data Analyst
</your_role>
<instructions>
You are in a Slack thread with a user and you are a bot capable of doing
complex analytical queries to Tinybird.
Either the user has received a message from the bot and is asking for
follow up questions related to the conversation or has started a new
conversation with the bot.
<exploration_instructions>
- You MUST explicitly answer just the user request using the explore_data
tool once and only once
- Don't do more than one call to explore_data tool
- If list_service_datasources returns organization data sources, you must
append "use organization service data sources" in the explore_data tool
call
- If no timeframe is provided, use the last hour and report to the user
in the response
- If there's any error or the user insists on similar questions, tell them
to be more specific
- Report errors gracefully, asking to retry or to provide a more specific
prompt
- You have the full context of the thread
- Summarize the thread context including ONLY relevant information for the
user request (dates, pipe names, datasource names, metric names and
values), keep it short and concise. Do NOT include superfluous
information, recommendations or conclusions, just facts.
- Append the thread summary to the explore_data tool call if it's relevant
to the user request. Example: if the user asked for top 5 pipes by error
rate, and then asks in the last hour, you MUST do one and only one call
to explore_data with a prompt like this: "Top 5 pipes by error rate in
the last hour"
</exploration_instructions>
<text_to_sql_instructions>
- You MUST use the text_to_sql tool when the user specifically asks for
SQL response.
- If list_service_datasources returns organization data sources, indicate
"use organization service data sources" in the text_to_sql tool call
- You MUST use the execute_query tool when the user specifically asks to
run a given SQL query
</text_to_sql_instructions>
<slack_instructions>
- You report messages in a Slack thread with the user
- You MUST send a structured slack message
- Use backticks and Slack formatting for names, table names and code blocks
- Format tables with Slack formatting
</slack_instructions>
</instructions>
<additional_information>
- Use markdown to format your answers.
</additional_information>
DEBUG ================ user ================
DEBUG analyze top referrers in the last month
INFO Using Vertex AI API
DEBUG ================ assistant ================
DEBUG Tool Calls:
- ID: '2d147d58-4d23-4742-8ccf-8cdece668f86'
Name: 'explore_data'
Arguments: 'prompt: Top referrers in the last month'
DEBUG ********** METRICS **********
DEBUG * Tokens: input=6045, output=12, total=6057
DEBUG * Time: 1.5816s
DEBUG * Tokens per second: 7.5871 tokens/s
DEBUG * Time to first token: 1.5799s
DEBUG ********** METRICS **********
DEBUG Running: explore_data(prompt=Top referrers in the last month)
DEBUG Calling MCP Tool 'explore_data' with args:
{'prompt': 'Top referrers in the last month'}
DEBUG ========== tool ==========
DEBUG Tool call Id: 2d147d58-4d23-4742-8ccf-8cdece668f86
DEBUG "Here are the top 10 referrers for the last 30 days:\n\n1. **google.com**:
5,970 visits, 12,243 hits\n2. **(direct traffic)**: 5,933 visits, 19,168
hits\n3. **t.co** (Twitter): 711 visits, 1,152 hits\n4. **api.daily.dev**:
461 visits, 556 hits\n5. **linkedin.com**: 195 visits, 241 hits\n6.
**cloud.tinybird.co**: 145 visits, 565 hits\n7. **github.com**: 139
visits, 257 hits\n8. **chatgpt.com**: 97 visits, 149 hits\n9. **bing.com**
: 86 visits, 159 hits\n10. **duckduckgo.com**: 81 visits, 192 hits
\n\nGoogle is the top referrer, followed closely by direct traffic
(visitors who typed the URL directly or have no referrer information).
Social media platforms like Twitter (t.co) and LinkedIn also drive
significant traffic to the site."
DEBUG ********** TOOL METRICS **********
DEBUG * Time: 29.0464s
DEBUG ********** TOOL METRICS **********
DEBUG ================ assistant ================
DEBUG The top referrers for the last 30 days are:
* `google.com`: 5,970 visits, 12,243 hits
* `(direct traffic)`: 5,933 visits, 19,168 hits
* `t.co` (Twitter): 711 visits, 1,152 hits
* `api.daily.dev`: 461 visits, 556 hits
* `linkedin.com`: 195 visits, 241 hits
* `cloud.tinybird.co`: 145 visits, 565 hits
* `github.com`: 139 visits, 257 hits
* `chatgpt.com`: 97 visits, 149 hits
* `bing.com`: 86 visits, 159 hits
* `duckduckgo.com`: 81 visits, 192 hits
DEBUG ********** METRICS **********
DEBUG * Tokens: input=6351, output=224, total=6575
DEBUG * Time: 1.6367s
DEBUG * Tokens per second: 136.8606 tokens/s
DEBUG * Time to first token: 0.6283s
DEBUG ********** METRICS **********
DEBUG ---------- Google Async Response Stream End ----------
DEBUG Added RunResponse to Memory
DEBUG Creating session summary.
INFO Using Vertex AI API
┏━ Message ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                                                                                ┃
┃ analyze top referrers in the last month                                        ┃
┃                                                                                ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
┏━ Response (35.7s) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                                                                                ┃
┃ The top referrers for the last 30 days are:                                    ┃
┃                                                                                ┃
┃ • google.com: 5,970 visits, 12,243 hits                                        ┃
┃ • (direct traffic): 5,933 visits, 19,168 hits                                  ┃
┃ • t.co (Twitter): 711 visits, 1,152 hits                                       ┃
┃ • api.daily.dev: 461 visits, 556 hits                                          ┃
┃ • linkedin.com: 195 visits, 241 hits                                           ┃
┃ • cloud.tinybird.co: 145 visits, 565 hits                                      ┃
┃ • github.com: 139 visits, 257 hits                                             ┃
┃ • chatgpt.com: 97 visits, 149 hits                                             ┃
┃ • bing.com: 86 visits, 159 hits                                                ┃
┃ • duckduckgo.com: 81 visits, 192 hits                                          ┃
┃                                                                                ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
┏━ Session Summary ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                                                                                ┃
┃ Session summary updated                                                        ┃
┃                                                                                ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
It runs a tool like this:
explore_data(prompt="Top referrers in the last month")
You can see the agent's inner workings via the debug traces:
- Lists tools
- Sets up mission instructions
- Runs one `explore_data` call as instructed
- Sends the structured response
Example 2: Cluster Health Monitoring
Usually you want an agent to solve more complex problems, where it gathers data from several sources, correlates signals or detects anomalies, and suggests improvements.
As an example, I created a mission for an agent that monitors your Tinybird workspace cluster for CPU spikes and reports operations (jobs, ingestion, requests to pipes) that correlate with the spikes, along with some mitigation advice.
Let's run Birdwatcher with a different mission to investigate CPU spikes:
uv run python birdwatcher.py \
--mission cpu_spikes \
--prompt "Investigate CPU spikes in the last hour and report to #monitor-cpu"
This mission uses tools to analyze metrics like job executions, ingestion volumes, endpoint latency, etc. The full cpu_spikes.md mission defines the investigation flow.
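To give a sense of the flow, here is the shape of the tool calls such an investigation drives (an illustration in the same call notation as the explore_data example above, not the actual mission file; the SQL and channel name are hypothetical, though the tool names come from the traces above):

explore_data(prompt="CPU usage per host in the last hour, flag sustained spikes")
execute_query(query="""
    -- hypothetical correlation query; table and column names depend on your workspace
    SELECT toStartOfInterval(created_at, interval 1 minute) AS t, count() AS jobs
    FROM tinybird.jobs_log
    WHERE created_at >= now() - interval 1 hour
    GROUP BY t ORDER BY t
""")
send_message(channel="#monitor-cpu", text="CPU spike report: ...")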
Automated Monitoring
You can run agents on a schedule using GitHub Actions.
Here's an example GitHub Action to run Birdwatcher, and this is how you use it for a specific mission:
name: Birdwatcher Agent

on:
  schedule:
    - cron: '0 * * * *' # Run hourly

jobs:
  monitor-cpu-spikes:
    runs-on: ubuntu-latest
    steps:
      - uses: tinybirdco/ai@main
        with:
          slack_token: ${{ secrets.SLACK_TOKEN }}
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          tinybird_token: ${{ secrets.TINYBIRD_TOKEN }}
          tinybird_host: ${{ secrets.TINYBIRD_HOST }}
          prompt: |
            Report cpu spikes in the last hour. Send a Slack message to
            #tmp-birdwatcher with the results. No markdown.
          mission: cpu_spikes
          model: claude-4-sonnet-20250514
You can spin up different scheduled agents for CPU issues, endpoint errors, signup conversions, or your own domain-specific investigations.
Takeaways
Some challenges and takeaways from building my first AI agent:
- Frameworks: Agno worked well, but don't overcommit early. Others like Vercel AI SDK or PydanticAI are similar.
- Data exploration: This is foundational for analytics agents. Having an `explore_data` tool inside the Tinybird MCP simplified everything.
- Prompt engineering is hard. General prompts perform worse; go for mission-specific ones with 2–3 clear steps. Prompt engineering is a skill you can develop.
- Context management: Still an unsolved problem. Use larger-context models or summaries cautiously.
- Advanced tools: Start with simple agents that solve very specific tasks. Focus on how they address real-world problems better than existing tools, rather than getting caught up in the tech-hype cycle of testing new things.
- Production: Deploying, configuring, and operating agents has the same (or worse) challenges as any other software. Local is easy, but production takes work.
Build your own analytics agent
Birdwatcher is the combination of:
- a basic `Agent` that can use tools
- the Tinybird MCP Server, which gives the agent domain-specific data analysis tools
- mission prompts that instruct the agent on specific tasks
- GitHub Actions to create and automate multiple expert agents that run on a schedule

With these pieces, you can build your own analytics agents in just a few minutes.
Just check the GitHub repository quickstart:
- Define a mission
- Set your environment variables
- Use Tinybird MCP for domain access
- Deploy via GitHub Actions
- Start solving