Every Tinybird user can ingest thousands of events per second, with 10x traffic spikes during launches or news cycles. Our platform needs to scale accordingly. CPU-based scaling was too slow. Memory-based, too vague. We needed to scale based on real signals, not lagging resource metrics. Traditional autoscaling broke down: our queues backed up before Prometheus metrics could even react.
So we turned to Tinybird and built a custom autoscaling system using live ingestion metrics and Kubernetes Event-driven Autoscaling (KEDA). No scraping delays, no extra monitoring stack to run.
The Challenge: Unpredictable real-time workloads
Real-time analytics workloads are inherently unpredictable. Customer traffic can spike 10x during product launches, marketing campaigns, or breaking news events. The traditional autoscaling playbook fails because:
- It's reactive, not predictive: CPU spikes after your system is already overwhelmed.
- It measures the wrong thing: High CPU doesn't always mean you need more pods; sometimes you need smarter data routing.
- It's painfully slow: When new pods finally spin up, user requests may have already been delayed.
The Kafka Bottleneck
Our kafka service is critical: it processes terabytes of data every day, ingesting data from external Kafka clusters and feeding it into our ClickHouse infrastructure. During peak hours, we might see:
- High-volume event streams from customer Kafka topics.
- Sudden spikes in data volume during customer campaigns.
- Varying message sizes and processing complexity.
We needed a solution that could scale based on the actual data processing demand, not just generic resource utilization.
Enter KEDA: Kubernetes Event-Driven Autoscaling
What Makes KEDA Different
KEDA extends Horizontal Pod Autoscaler (HPA) to work with event-driven metrics:
- Custom Metrics: Instead of generic CPU/Memory metrics, scale based on what actually matters - queue depth, message lag, API response times, or any custom business metric.
- Multiple Scalers: Combine different triggers (CPU, custom metrics, external APIs).
Two Approaches: Traditional vs. Self-Reliance
We explored two different approaches for implementing KEDA autoscaling, each with distinct tradeoffs. Here's how both work and why we chose to use our own platform.
Traditional Approach: Prometheus + KEDA
The typical setup involves running Prometheus to collect and expose application metrics. Your app publishes metrics at a /metrics endpoint in Prometheus format, which Prometheus scrapes at regular intervals. KEDA then queries Prometheus to retrieve these metrics and makes scaling decisions based on them.
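For reference, the scrape step is just a job in prometheus.yml. A minimal, hypothetical example (job name and target are placeholders):

scrape_configs:
  - job_name: kafka-service            # hypothetical job name
    scrape_interval: 30s               # Prometheus pulls /metrics on this cadence
    static_configs:
      - targets: ['kafka-service:8080']   # the app exposing /metrics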
KEDA Configuration
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
spec:
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: lag
        threshold: '1000'
        query: avg(lag)
We gave Prometheus a fair shot, but moved on because:
- Multi-hop delays: application → local Prometheus scrape → central Prometheus aggregation → federation to monitoring cluster → KEDA query → scaling decision. Each hop adds latency and potential failure points.
- Query overhead: KEDA polling Prometheus adds another layer of latency.
- Stale data during spikes: Metrics are most outdated when you need scaling most.
Self-Reliance: Tinybird + KEDA
Instead of managing a Prometheus stack, we plugged KEDA directly into Tinybird's real-time metrics API. No scraping. No delays. Just fresh ingestion data powering scaling decisions in seconds.
Because Tinybird can expose Prometheus-compatible endpoints, KEDA can pull live metrics from the source. This means faster scaling, simpler infrastructure, and autoscaling based on the same streaming data we already trust for analytics.
Step 1: Defining the right metrics
We identified a key metric for intelligent autoscaling:
- Kafka Lag: How far behind are our consumers?
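For context, the pipe in Step 2 reads lag from a Tinybird Data Source called kafka_ops_log. A simplified, hypothetical sketch of what such a schema could look like (the real Data Source has more columns, and the types are assumptions):

SCHEMA >
    `timestamp` DateTime,
    `user_id` String,
    `lag` Int64

ENGINE "MergeTree"
ENGINE_SORTING_KEY "timestamp"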
Step 2: Tinybird-native metrics pipeline
Tinybird's native Prometheus endpoint support made it easy to expose this metric in the right format for KEDA.
Here's how we created our scaling metrics endpoint:
TOKEN "metric_lag" READ
NODE kafka_stats
SQL >
%
SELECT
max(lag) as max_lag
FROM kafka_ops_log
where timestamp > now() - interval {{ Int32(seconds, 10) }} seconds
{\% if defined(user_id) and user_id != '' \%}
and user_id {{ String(operator, '=') }} {{ String(user_id) }}
{\% end \%}
NODE kafka_metrics
SQL >
SELECT
arrayJoin(
[
map(
'name',
'max_lag',
'type',
'gauge',
'help',
'max ingestion lag',
'value',
toString(max_lag)
)
]
) as metric
FROM kafka_stats
NODE kafka_pre_prometheus
SQL >
SELECT
metric['name'] as name,
metric['type'] as type,
metric['help'] as help,
toInt64(metric['value']) as value
FROM kafka_metrics
This pipe returns data in Prometheus format when accessed via the .prometheus endpoint, e.g.:
curl -X GET \
"${TINYBIRD_HOST}/v0/pipes/kafka_scaling_metrics.prometheus?seconds=30&user_id=user123&operator=%3D" \
-H "Authorization: Bearer ${TINYBIRD_TOKEN}"
This approach allows us to compute scaling metrics in real time from the same data powering customer-facing analytics:
- Zero scraping lag: Metrics computed fresh when KEDA requests them.
- Always fresh: Every KEDA poll gets the latest data state.
- No metric storage needed: Metrics computed from streaming data, not pre-aggregated.
Step 3: KEDA configuration with metrics-api scaler
Here's how we wired everything together, connecting KEDA directly to our Tinybird Prometheus endpoint using the metrics-api scaler:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-scaler
spec:
  scaleTargetRef:
    name: kafka-deployment
    kind: StatefulSet
  minReplicaCount: 2
  maxReplicaCount: 20
  pollingInterval: 30
  cooldownPeriod: 300
  triggers:
    - type: metrics-api
      metricType: AverageValue
      metadata:
        url: https://example.tinybird.co/v0/pipes/kafka_scaling_metrics.prometheus
        format: prometheus
        targetValue: '1000'
        valueLocation: 'max_lag'
        authMode: 'apiKey'
        method: 'query'
        keyParamName: 'token'
      authenticationRef:
        name: kafka-keda-auth
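Once applied, KEDA materializes this as a regular HPA (by default named keda-hpa-<scaledobject-name>), so standard kubectl is enough for a sanity check; a quick sketch:

# Confirm the ScaledObject was accepted and shows READY/ACTIVE
kubectl get scaledobject kafka-scaler

# Inspect the HPA that KEDA generated from it
kubectl get hpa keda-hpa-kafka-scaler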
Authentication Setup
For secure access to Tinybird endpoints, we set up proper authentication:
apiVersion: v1
kind: Secret
metadata:
  name: keda-kafka-token
data:
  token: <base64-encoded-tinybird-token>
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: kafka-keda-auth
spec:
  secretTargetRef:
    - parameter: apiKey
      name: keda-kafka-token
      key: token
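Rather than base64-encoding the token by hand, the Secret can also be created directly from the token value; an equivalent one-liner, assuming the token is in $TINYBIRD_TOKEN:

kubectl create secret generic keda-kafka-token \
  --from-literal=token="${TINYBIRD_TOKEN}"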
What broke with Prometheus
Running Prometheus at scale isn't just about the server; it's about the entire ecosystem:
| Component | Traditional Prometheus | Tinybird Approach |
|---|---|---|
| Metrics Storage | Prometheus + persistent volumes | Optimized ClickHouse |
| High Availability | Multiple Prometheus replicas + federation | Built-in HA |
| Data Retention | Configure retention policies, manage disk | Configure with SQL pipes |
| Operational Overhead | High: 3-4 services to manage | Low: update SQL queries |
How Tinybird fixed it
- No metric infrastructure: No exporters, no Prometheus, no additional storage layers.
- Metrics calculated on request, not scraped periodically: Metrics are computed fresh from live data every time KEDA polls.
- Logic in SQL: Update scaling behavior by editing a query, not redeploying code.
- Built-in HA: Tinybird handles availability.
How running it ourselves made the product better
Every autoscaling issue impacted us directly, just as it would our customers. This led to:
- Faster fixes (because they affected us directly).
- Clearer error messages (we had to debug them ourselves).
- More reliable service (our uptime depended on it).
These discoveries directly improved our product for all customers.
The Simulator
To pressure-test our scaling setup and validate edge-case behavior, we built a metrics simulation tool, written in Golang.
It generates metrics and displays them in a terminal UI with real-time visualization and configurable patterns, and exposes an HTTP endpoint to serve them.
[This is a demo. We intentionally set low thresholds and increased the scaling speed to showcase the behavior quickly.]
What We Learned
- Stabilization windows: a 10-minute scale-up window and a 30-minute scale-down window prevent thrashing.
- Single metrics lie: CPU alone scales too late; combining lag + CPU gives a better signal-to-noise ratio.
- Thresholds are workload-specific: what works for batch processing fails for real-time streams.
1. Choosing bad metrics will kill your autoscaling
Not all metrics are equal for autoscaling:
- Good metrics: Queue depth, processing lag, business KPIs.
- Poor metrics: CPU utilization alone, memory usage without context.
2. Tune stabilization windows
Prevent scaling flapping with proper stabilization:
behavior:
  scaleUp:
    stabilizationWindowSeconds: 600    # 10 minutes
  scaleDown:
    stabilizationWindowSeconds: 1800   # 30 minutes
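When using KEDA rather than a raw HPA, this behavior block is passed through the ScaledObject's advanced section; an abridged sketch of where it sits:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 600
        scaleDown:
          stabilizationWindowSeconds: 1800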
3. Test with real traffic patterns
Our simulator helped us discover edge cases:
- Gradual vs. sudden traffic spikes behave differently.
- Weekend vs. weekday patterns require different thresholds.
4. Monitor everything
Use your own tools to monitor autoscaling:
- Track scaling events and their triggers.
- Measure time-to-scale and effectiveness.
- Set alerts for scaling failures or delays.
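Much of this can be watched with plain kubectl; a minimal sketch (the HPA name assumes KEDA's default keda-hpa-<scaledobject-name> convention):

# Trigger state, conditions, and recent scaling-related events
kubectl describe scaledobject kafka-scaler

# Watch replica counts change as metrics cross the threshold
kubectl get hpa keda-hpa-kafka-scaler --watch

# Scaling events emitted by the HPA controller
kubectl get events --field-selector involvedObject.kind=HorizontalPodAutoscaler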
Advanced patterns: Multi-trigger scaling
Combining multiple metrics
Our production configuration uses mixed triggers: the metrics-api scaler alongside traditional CPU scaling.
triggers:
  - type: metrics-api
    metricType: AverageValue
    metadata:
      url: https://api.tinybird.co/v0/pipes/kafka_scaling_metrics.prometheus
      format: prometheus
      targetValue: '1000'
      valueLocation: 'max_lag'
      authMode: 'apiKey'
      method: 'query'
      keyParamName: 'token'
    authenticationRef:
      name: kafka-keda-auth
  - type: cpu
    metricType: Utilization
    metadata:
      value: '70'
Regional scaling strategies
For our multi-region deployment, we create region-specific Tinybird endpoints:
# us-east-1 configuration
triggers:
  - type: metrics-api
    metadata:
      url: https://api.us-east-1.tinybird.co/v0/pipes/kafka_scaling_metrics_us_east.prometheus
      targetValue: '2000'   # Higher threshold: higher baseline traffic, avoids unnecessary scaling
      valueLocation: 'max_lag'

# eu-west-1 configuration
triggers:
  - type: metrics-api
    metadata:
      url: https://api.eu-west-1.tinybird.co/v0/pipes/kafka_scaling_metrics_eu_west.prometheus
      targetValue: '500'    # Lower threshold: responds quickly where baseline traffic is lower
      valueLocation: 'max_lag'
Troubleshooting common issues
Scaling too aggressively
- Problem: Constant scaling up/down.
- Solution: Increase stabilization windows and adjust thresholds.
Metrics not available
- Problem: KEDA can't reach Tinybird API endpoint.
- Solution: Check authentication token, endpoint URL, and any network policy restrictions.
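A quick way to isolate the problem is to call the endpoint the same way KEDA does, with the token as a query parameter (per keyParamName: 'token' and method: 'query' above), first from your machine and then from inside the cluster. Host and token values are placeholders:

# From your machine: verifies the token and URL
curl -s "${TINYBIRD_HOST}/v0/pipes/kafka_scaling_metrics.prometheus?token=${TINYBIRD_TOKEN}"

# From inside the cluster: also verifies network policies / egress
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -s "${TINYBIRD_HOST}/v0/pipes/kafka_scaling_metrics.prometheus?token=${TINYBIRD_TOKEN}"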
Conclusion: Scaling smarter
Combining KEDA with Tinybird gave us faster, simpler, and more reliable autoscaling, driven entirely by real-time data.
KEDA's event-driven scaling plus Tinybird's real-time metrics pipeline created a feedback loop that keeps improving our infrastructure's performance and cost-effectiveness.
- Custom metrics work better than CPU/memory for workload-specific scaling.
- Real-time data beats pre-aggregated metrics for scaling responsiveness.
- Dogfooding drives product improvement when your uptime depends on your platform.
Try it yourself
You don't need to replace your monitoring stack. Just expose one Tinybird endpoint, wire it into KEDA, and autoscale with real-time data. Start small. Move fast.