My team and I recently shipped a CLI agent called Tinybird Code. We use Claude Code every day, and we were inspired to create something similar, but specialized for working with Tinybird projects and packed with everything we've learned about developing with ClickHouse at scale.
Here's a demo of Tinybird Code:
Why?
Do we really need another CLI agent? Why can't Claude Code handle it? Good questions.
General purpose agents like Claude Code aren't good at working with large-scale, real-time data. Operating ClickHouse in production is really hard; we wanted to build an agent that was good at it. People choose Tinybird so they can focus less on ClickHouse and more on feature development. Tinybird Code is an agent for those AI-native developers who want to speed up feature delivery when working with data.
Below is how we built Tinybird Code, the "Claude Code for data", and what we've learned so far about building agents that work with data.
Framework selection
Our CLI is built in Python, so we looked at some of the more popular Python agent frameworks (e.g. Agno, Pydantic, LangGraph). We ended up choosing Pydantic AI because we felt it best addressed our requirements:
- Custom model providers: We host and manage models via Vertex AI rather than user-provided API keys
- Message processing between calls: Ability to limit message history or summarize it for long sessions
- Multi-agent support: Different tasks need different specialists with their own context
- Sync methods: Tinybird CLI is mostly synchronous. We wanted to keep the same experience when developing our agent.
- Flexible message rendering: We have different kinds of messages to show - model responses, tool calls, validations, `tb` commands, and file previews + diffs. We needed to be able to render each one with proper syntax highlighting, formatting, and interactive components.
A good framework simplifies building agents. Pydantic AI gave us a solid foundation with enough flexibility to support our needs, so we could focus less on agent boilerplate and more on improving the LLM-data interaction.
Secure model provider authentication
Since we host and manage the models on our backend, we needed a way to securely handle model provider credentials while still letting users authenticate. Users needed to authenticate with just their Tinybird tokens, and the agent needed to respect workspace contexts and permissions.
We created a server proxy between the model provider and the CLI client:
from anthropic import AsyncAnthropic
from httpx import AsyncClient
from pydantic_ai.models.anthropic import AnthropicModel, AnthropicModelName
from pydantic_ai.providers.anthropic import AnthropicProvider

# Point the Anthropic client at the Tinybird API host, authenticating with
# the user's Tinybird token instead of a provider API key.
client = AsyncAnthropic(
    base_url=tinybird_api_host,
    auth_token=tinybird_token,
    http_client=AsyncClient(params={"workspace_id": workspace_id}),
)

model = AnthropicModel(
    model_name=model,
    provider=AnthropicProvider(anthropic_client=client),
)
This lets us:
- Manage AI provider credentials server-side
- Decide which models to use based on the request (for example, some models are better than others at SQL generation)
- Authenticate our users with the same kind of tokens used for every other Tinybird service and reuse our existing, production-hardened security layer.
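As a minimal sketch of what the proxy does on each request: it validates the incoming Tinybird token, then rebuilds the outgoing request with the server-held provider key. The helper names and header shapes below are illustrative assumptions, not Tinybird's actual implementation.

```python
# Hypothetical sketch of the credential swap inside the server proxy.
# Names, header shapes, and the token check are illustrative only.

ANTHROPIC_API_KEY = "sk-ant-..."  # stored server-side, never sent to the CLI


def verify_tinybird_token(token: str) -> bool:
    """Stand-in for Tinybird's real token validation."""
    return token.startswith("p.")


def build_upstream_headers(tinybird_token: str) -> dict[str, str]:
    """Swap the client's Tinybird token for the server-held provider key."""
    if not verify_tinybird_token(tinybird_token):
        raise PermissionError("invalid Tinybird token")
    return {
        "x-api-key": ANTHROPIC_API_KEY,     # provider credential, server-side only
        "anthropic-version": "2023-06-01",  # pinned API version
        "content-type": "application/json",
    }
```

The important property is that the provider key only ever appears in the outgoing request; the client never sees anything but its own Tinybird token.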
Multi-agent architecture
Our work building Explorations, a natural language exploratory data analysis interface, convinced us that a single agent wouldn't be sufficient for Tinybird Code. We recognized the need to both preserve the main agent's context window and develop specialized sub-agents with prompts and tools specific to their domain. Unlike general coding where you're mostly manipulating files and running commands, data engineering involves distinct phases with very different context requirements:
- Main agent: Orchestrates the overall workflow and coordinates other agents
- Exploration agent: Executes SQL queries and explores available endpoints
- Testing agent: Creates, updates, and runs data quality tests
- Mock agent: Creates, updates, and analyzes mock data fixtures
- Command agent: Works as a default or fallback, handles tasks that don't have specific tools by using the Tinybird CLI.
Each sub-agent can operate with its own context window and state, preventing the main agent from getting overwhelmed with large query results or complex test outputs. The architecture also makes it easier to optimize each agent for its specific domain.
Technically, the main agent invokes the sub-agents as tools, and each sub-agent returns the result of its task along with a summary of the work done. If an error occurs along the way, each sub-agent is responsible for handling and fixing it on its own.
@self.main_agent.tool
def manage_tests(ctx: RunContext[AgentContext], task: str) -> str:
    """Delegate test management to the TestingAgent.

    Args:
        task (str): The detailed task to perform. Required.

    Returns:
        str: The result of the test task.
    """
    result = self.testing_agent.run(task, deps=ctx.deps, usage=ctx.usage)
    return result.output

...

class TestingAgent:
    def __init__(self, model: Model, project: Project):
        self.messages: list[ModelMessage] = []
        self.agent = Agent(
            model=model,
            deps_type=AgentContext,
            instructions=[
                tone_and_style_instructions,
                test_instructions,
            ],
            tools=[
                Tool(create_tests, docstring_format="google", require_parameter_descriptions=True, takes_ctx=True),
                Tool(update_test, docstring_format="google", require_parameter_descriptions=True, takes_ctx=True),
                Tool(run_tests, docstring_format="google", require_parameter_descriptions=True, takes_ctx=True),
                Tool(remove_test, docstring_format="google", require_parameter_descriptions=True, takes_ctx=True),
            ],
        )

        @self.agent.instructions
        def get_tests_files(ctx: RunContext[AgentContext]) -> str:
            return test_files_instructions(project)

    def run(self, task: str, deps: AgentContext, usage: Usage):
        result = self.agent.run_sync(task, deps=deps, usage=usage, message_history=self.messages)
        new_messages = result.new_messages()
        self.messages.extend(new_messages)
        return result
Bimodal usage patterns
We built two distinct CLI modes to support different usage patterns.
The interactive mode is just like Claude Code. You type `tb` and the interactive console opens - ideal for human-in-the-loop development and extended "pair programming" sessions.
The "one-shot" mode, triggered with `tb --prompt/-p "some prompt"`, is more useful for very specific, delimited tasks that don't require conversational interaction. This mode also allows Tinybird Code to act as a specialized sub-agent within larger workflows - for example, Claude Code can delegate analytics tasks to Tinybird Code while maintaining its own context and focus. This composability between agents is key for complex workflows that span multiple domains.
Interactive mode (default): Opens an agent shell for iterative work
try:
    while True:
        user_input = show_input(workspace_name)
        if user_input == "exit":
            break
        agent.run(user_input)
except KeyboardInterrupt:
    click.echo(FeedbackManager.info(message="Goodbye!"))
One-shot mode: Triggered with a direct prompt
tb --prompt="Add a new test suite for the events_by_user endpoint"
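The dispatch between the two modes reduces to checking the prompt flag at startup. Here is a simplified sketch using argparse (the real CLI is built on click, and this structure is ours, not Tinybird's source):

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Mirror the real flags: --prompt / -p selects one-shot mode.
    parser = argparse.ArgumentParser(prog="tb")
    parser.add_argument("--prompt", "-p", default=None,
                        help="run one task non-interactively and exit")
    return parser


def select_mode(argv: list[str]) -> str:
    """A prompt means one-shot mode; its absence opens the interactive shell."""
    args = build_parser().parse_args(argv)
    return "one_shot" if args.prompt else "interactive"
```

An orchestrating agent like Claude Code only needs the one-shot path: it shells out with a prompt, reads stdout, and keeps its own context intact.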
Error recovery and autofix
One thing we learned early on is that LLMs struggle with analytics. They're also not particularly good at understanding data schemas and writing SQL queries, especially as data complexity grows. LLMs make different (and often more egregious) mistakes in specialized domains like data engineering than they do in general programming. This makes sense: there are millions of React component examples floating around the internet on which to train. There are orders of magnitude fewer ClickHouse schema optimization examples.
Any good agent should automatically handle errors, so we built error recovery into Tinybird Code's core workflow using three approaches:
First, we validate every change using Tinybird's own tools. Every time we modify project files, we immediately run a build to catch issues:
def create_datafile_tool(ctx: RunContext[AgentContext], name: str, type: str, content: str, pathname: str) -> str:
    try:
        create_datafile(name=name, type=type, content=content, pathname=pathname)
        ctx.deps.build_project()  # Validate immediately
        return f"Datafile {name} created in {pathname} and project built successfully."
    except BuildException as e:
        return f"Error building project: {e}. If the error is related to another resource, fix it and try again."
Second, we took a hard look at error messages. We've always prioritized valuable, human-readable error messages with specific feedback, but building an autonomous agent pushed us to refine our product error messages to be as detailed and context-specific as possible. Generic errors teach the developer, whether human or agent, nothing about the domain. Good error messages let the agent act immediately on the feedback it receives.
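To illustrate the difference, here is a hypothetical helper that builds an actionable message with the three parts an agent can act on: what failed, where, and how to fix it. The function and the messages are invented for illustration; they are not Tinybird's actual output.

```python
# Hypothetical: contrast a generic error with an actionable one.
# Neither string is Tinybird's real output.

GENERIC = "Error: build failed."


def schema_mismatch_error(datafile: str, column: str, found: str, required: str, fix: str) -> str:
    """Build an error message that names the failure, its location, and a fix."""
    return (
        f"Error in {datafile}: column '{column}' has type {found}, "
        f"but {required} is required. Suggested fix: {fix}."
    )
```

An agent receiving the generic message can only retry blindly; the specific message tells it exactly which file, column, and type to change.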
Third, we built deterministic error handling for common failure patterns. For example, when data hits quarantine due to an ingestion failure (usually a type mismatch), we automatically query the quarantine table for error data and provide that context to the agent:
def handle_quarantine_error(ctx: RunContext[AgentContext], error_message: str, datasource_name: str) -> str:
    is_quarantine_error = "in quarantine" in error_message
    if not is_quarantine_error:
        return error_message

    query = f"select * from {datasource_name}_quarantine order by insertion_date desc FORMAT JSON"
    result = ctx.deps.execute_query(query=query)
    return f"The data has been quarantined in {datasource_name}_quarantine because of the following errors:\n{result['data']}"
Modular rendering architecture
We found that Tinybird's workflows needed a modular approach to message rendering that could interleave LLM responses with our own validations, file previews, and real-time feedback from the Tinybird platform.
Rather than waiting for complete responses or setting up server-sent events, we discovered we could iterate over the agent's underlying graph nodes to get much finer control:
async def run(self, user_prompt: str, config: dict[str, str]) -> None:
    deps = self._build_deps(config)
    async with self.agent.iter(
        user_prompt, deps=deps, message_history=self.messages, model=self.model
    ) as agent_run:
        async for node in agent_run:
            if hasattr(node, "model_response"):
                for part in node.model_response.parts:
                    if hasattr(part, "content") and not agent_run.result:
                        # Custom rendering logic for our domain
                        should_continue_chirping = self.chirping.running
                        if should_continue_chirping:
                            self.chirping.stop()
                        click.echo(part.content)
                        if should_continue_chirping:
                            self.chirping.start()
This approach gives us complete control over when and how we display different types of content, which turned out to be essential for the user experience we wanted Tinybird Code to have.
For example, the excerpt below shows a mix of agent response, Tinybird CLI output, and custom diff rendering for file changes:
Chat history compaction
Context compaction is an important pattern in agentic workflows, especially long-running chats that can span days or even weeks of development. It's made harder by the fact that Tinybird users often work with billions of rows of data - a context window nightmare. Tinybird Code sessions can get very token-heavy very quickly: large query results, comprehensive schema information, test outputs, and so on.
We built context management into the architecture using Pydantic AI's `history_processors` parameter, so we can process message history before sending it to the model and keep essential context while staying under token limits. The compaction process uses a summarizing agent that shrinks the existing chat history down into essential summaries to preserve the context window:
agent = Agent(
    model=model,
    instructions=[...],
    tools=[...],
    history_processors=[compact_messages],
)

def compact_messages(
    ctx: RunContext[TinybirdAgentContext],
    messages: list[ModelMessage],
) -> list[ModelMessage]:
    if not need_compact(messages):
        return messages

    compacted_messages = summarize_agent.run(
        messages=messages,
        model=model,
        instructions=[...],
        tools=[...],
    )
    return compacted_messages
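The `need_compact` check in the snippet above isn't shown. A rough token-budget heuristic might look like the following - the 4-characters-per-token estimate, the budget, and the simplified `list[str]` history are our assumptions, not Tinybird's actual logic:

```python
# Rough sketch of a compaction trigger. The chars-per-token ratio and the
# budget are illustrative assumptions; messages are simplified to strings.

TOKEN_BUDGET = 150_000   # leave headroom under the model's context window
CHARS_PER_TOKEN = 4      # crude estimate; a real tokenizer is more accurate


def estimate_tokens(messages: list[str]) -> int:
    """Approximate token usage from total character count."""
    return sum(len(m) for m in messages) // CHARS_PER_TOKEN


def need_compact(messages: list[str]) -> bool:
    """Compact once the estimated history size crosses the budget."""
    return estimate_tokens(messages) > TOKEN_BUDGET
```

Triggering on an estimate rather than exact counts keeps the check cheap enough to run before every model call.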
Our compactor agent maintains:
- User requests: a summary of all previous user requests.
- Problems solved: a summary of all problems solved by the agent.
- Pending tasks: a summary of all pending tasks that the agent needs to complete.
- Current work: a summary of the current work the agent is doing.
- Next step: a summary of the next step the agent will take.
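The five buckets above can be carried as a small structure that the summarizing agent fills in and that replaces the old history. The field names follow the list; the class itself is a sketch of ours, not Tinybird's code:

```python
from dataclasses import dataclass


@dataclass
class CompactionSummary:
    """What survives compaction: the five buckets the compactor maintains."""
    user_requests: str    # summary of all previous user requests
    problems_solved: str  # problems the agent has already fixed
    pending_tasks: str    # work still to be completed
    current_work: str     # what the agent is doing right now
    next_step: str        # the immediate next action

    def to_system_note(self) -> str:
        """Render the summary as a single message replacing the old history."""
        return (
            f"Previous requests: {self.user_requests}\n"
            f"Solved: {self.problems_solved}\n"
            f"Pending: {self.pending_tasks}\n"
            f"Current work: {self.current_work}\n"
            f"Next step: {self.next_step}"
        )
```

Keeping "pending tasks" and "next step" as explicit fields is what lets the agent pick up mid-task after a compaction instead of losing its place.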
Workflow-oriented tools
Tinybird Code is a productivity tool, so it is important that its tools match how engineers actually think about the domain. Rather than organizing tools around technical capabilities, we organized them around engineering workflows:
- File tools: Create, read, update, and delete `.datasource` and `.pipe` files
- Fixture tools: Analyze and append fixtures to datasources
- Query tools: Execute SQL queries or request endpoints with parameters
- Testing tools: Create, update, and run tests
- Command tools: Run any CLI command when specialized tools aren't enough
- Planning tool: Help the agent break down complex multi-file operations
- Secret tools: Manage secrets locally
- Build/deploy tools: Build locally or deploy to production
These tools map to how engineers think about development work, rather than to database- or API-focused technical tasks. That aligns with our developer-focused product principles and our emphasis on good developer experience.
Safe environment handling
One unique aspect of Tinybird Code is its ability to read/write directly to/from the production environment, Tinybird Cloud, via CLI commands like `tb --cloud deploy`. Claude Code will check out new branches and make PRs, but Tinybird Code was instead given autonomy to handle deployments directly via the CLI if requested. That introduces some strict safety requirements; nobody wants an agent deploying to production accidentally or without confirmation.
We built environment awareness into Tinybird Code's core:
- Defaulting to local environment
- Detecting environment automatically, when possible, based on the user request or context
- Asking users to choose an environment when the request or context is ambiguous
- Requiring confirmation for actions that affect production
- Making environment switching explicit and safe
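A minimal sketch of the production guard, with the confirmation injected as a callback (the function, environment names, and return values are ours; the real CLI's logic is richer):

```python
from typing import Callable

# Hypothetical guard: "local" actions run freely; anything targeting
# "cloud" (production) must be explicitly confirmed first.


def guarded_deploy(env: str, confirm: Callable[[str], bool]) -> str:
    """Deploy only after confirmation when the target is production."""
    if env == "cloud" and not confirm("Deploy to Tinybird Cloud (production)?"):
        return "aborted"
    return f"deployed to {env}"
```

An interactive session would pass a prompt-based `confirm`; automated tests can pass a lambda, which also makes the guard itself easy to verify.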
Try Tinybird Code
We built Tinybird Code to be like your own "AI ClickHouse engineer". If you're already using Claude Code or other agents in your development workflow, but you need something more specialized for integrating real-time analytics into your project, check out Tinybird Code:
curl https://tinybird.co | sh
tb