Every Tinybird user can ingest thousands of events per second, with 10x traffic spikes during launches or news cycles. Our platform needs to scale accordingly. CPU-based scaling was too slow. Memory-based, too vague. We needed to scale based on real signals, not lagging resource metrics. Traditional autoscaling broke down: our queues backed up before Prometheus metrics could even react.
So we turned to Tinybird and built a custom autoscaling system using live ingestion metrics and Kubernetes Event-driven Autoscaling (KEDA). No scraping delays, no extra monitoring stack to run.
The Challenge: Unpredictable real-time workloads
Real-time analytics workloads are inherently unpredictable. Customer traffic can spike 10x during product launches, marketing campaigns, or breaking news events. The traditional autoscaling playbook fails because:
- It's reactive, not predictive: CPU spikes after your system is already overwhelmed.
- It measures the wrong thing: High CPU doesn't always mean you need more pods; sometimes you need smarter data routing.
- It's painfully slow: When new pods finally spin up, user requests may have already been delayed.
The Kafka Bottleneck
Our kafka service is critical: it processes terabytes of data every day, ingesting data from external Kafka clusters and feeding it into our ClickHouse infrastructure. During peak hours, we might see:
- High-volume event streams from customer Kafka topics.
- Sudden spikes in data volume during customer campaigns.
- Varying message sizes and processing complexity.
We needed a solution that could scale based on the actual data processing demand, not just generic resource utilization.
Enter KEDA: Kubernetes Event-Driven Autoscaling
What Makes KEDA Different
KEDA extends Horizontal Pod Autoscaler (HPA) to work with event-driven metrics:
- Custom Metrics: Instead of generic CPU/Memory metrics, scale based on what actually matters - queue depth, message lag, API response times, or any custom business metric.
- Multiple Scalers: Combine different triggers (CPU, custom metrics, external APIs).
Two Approaches: Traditional vs. Self-Reliance
We explored two different approaches for implementing KEDA autoscaling, each with distinct tradeoffs. Here's how both work and why we chose to use our own platform.
Traditional Approach: Prometheus + KEDA
The typical setup involves running Prometheus to collect and expose application metrics. Your app publishes metrics at a /metrics endpoint in Prometheus format, which Prometheus scrapes at regular intervals. KEDA then queries Prometheus to retrieve these metrics and makes scaling decisions based on them.
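For reference, the scrape step is just a job in prometheus.yml. A minimal, hypothetical example (job name and target are placeholders):

scrape_configs:
  - job_name: kafka-service            # hypothetical job name
    scrape_interval: 30s               # Prometheus pulls /metrics on this cadence
    static_configs:
      - targets: ['kafka-service:8080']   # the app exposing /metrics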
KEDA Configuration
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
spec:
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: lag
        threshold: '1000'
        query: avg(lag)
We gave Prometheus a fair shot, but moved on because:
- Multi-hop delays: application → local Prometheus scrape → central Prometheus aggregation → federation to monitoring cluster → KEDA query → scaling decision. Each hop adds latency and potential failure points.
- Query overhead: KEDA polling Prometheus adds another layer of latency.
- Stale data during spikes: Metrics are most outdated when you need scaling most.
Self-Reliance: Tinybird + KEDA
Instead of managing a Prometheus stack, we plugged KEDA directly into Tinybird's real-time metrics API. No scraping. No delays. Just fresh ingestion data powering scaling decisions in seconds.
Because Tinybird can expose Prometheus-compatible endpoints, KEDA can pull live metrics from the source. This means faster scaling, simpler infrastructure, and autoscaling based on the same streaming data we already trust for analytics.
Step 1: Defining the right metrics
We identified a key metric for intelligent autoscaling:
- Kafka Lag: How far behind are our consumers?
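For context, the pipe in Step 2 reads lag from a Tinybird Data Source called kafka_ops_log. A simplified, hypothetical sketch of what such a schema could look like (the real Data Source has more columns, and the types are assumptions):

SCHEMA >
    `timestamp` DateTime,
    `user_id` String,
    `lag` Int64

ENGINE "MergeTree"
ENGINE_SORTING_KEY "timestamp"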
Step 2: Tinybird-native metrics pipeline
Tinybird's native Prometheus endpoint support made it easy to expose this metric in the right format for KEDA.
Here's how we created our scaling metrics endpoint:
TOKEN "metric_lag" READ
NODE kafka_stats
SQL >
%
SELECT
max(lag) as max_lag
FROM kafka_ops_log
where timestamp > now() - interval {{ Int32(seconds, 10) }} seconds
{\% if defined(user_id) and user_id != '' \%}
and user_id {{ String(operator, '=') }} {{ String(user_id) }}
{\% end \%}
NODE kafka_metrics
SQL >
SELECT
arrayJoin(
[
map(
'name',
'max_lag',
'type',
'gauge',
'help',
'max ingestion lag',
'value',
toString(max_lag)
)
]
) as metric
FROM kafka_stats
NODE kafka_pre_prometheus
SQL >
SELECT
metric['name'] as name,
metric['type'] as type,
metric['help'] as help,
toInt64(metric['value']) as value
FROM kafka_metrics
This pipe returns data in Prometheus format when accessed via the .prometheus endpoint, e.g.:
curl -X GET \
"${TINYBIRD_HOST}/v0/pipes/kafka_scaling_metrics.prometheus?seconds=30&user_id=user123&operator=%3D" \
-H "Authorization: Bearer ${TINYBIRD_TOKEN}"
This approach allows us to compute scaling metrics in real time from the same data powering customer-facing analytics:
- Zero scraping lag: Metrics computed fresh when KEDA requests them.
- Always fresh: Every KEDA poll gets the latest data state.
- No metric storage needed: Metrics computed from streaming data, not pre-aggregated.
Step 3: KEDA configuration with metrics-api scaler
Here's how we wired everything together, connecting KEDA directly to our Tinybird Prometheus endpoint using the metrics-api scaler:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-scaler
spec:
  scaleTargetRef:
    name: kafka-deployment
    kind: StatefulSet
  minReplicaCount: 2
  maxReplicaCount: 20
  pollingInterval: 30
  cooldownPeriod: 300
  triggers:
    - type: metrics-api
      metricType: AverageValue
      metadata:
        url: https://example.tinybird.co/v0/pipes/kafka_scaling_metrics.prometheus
        format: prometheus
        targetValue: '1000'
        valueLocation: 'max_lag'
        authMode: 'apiKey'
        method: 'query'
        keyParamName: 'token'
      authenticationRef:
        name: kafka-keda-auth
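Once applied, KEDA materializes this as a regular HPA (by default named keda-hpa-<scaledobject-name>), so standard kubectl is enough for a sanity check; a quick sketch:

# Confirm the ScaledObject was accepted and shows READY/ACTIVE
kubectl get scaledobject kafka-scaler

# Inspect the HPA that KEDA generated from it
kubectl get hpa keda-hpa-kafka-scaler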
Authentication Setup
For secure access to Tinybird endpoints, we set up proper authentication:
apiVersion: v1
kind: Secret
metadata:
  name: keda-kafka-token
data:
  token: <base64-encoded-tinybird-token>
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: kafka-keda-auth
spec:
  secretTargetRef:
    - parameter: apiKey
      name: keda-kafka-token
      key: token
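Rather than base64-encoding the token by hand, the Secret can also be created directly from the token value; an equivalent one-liner, assuming the token is in $TINYBIRD_TOKEN:

kubectl create secret generic keda-kafka-token \
  --from-literal=token="${TINYBIRD_TOKEN}"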
What broke with Prometheus
Running Prometheus at scale isn't just about the server; it's about the entire ecosystem:
| Component | Traditional Prometheus | Tinybird Approach |
|---|---|---|
| Metrics Storage | Prometheus + persistent volumes | Optimized ClickHouse |
| High Availability | Multiple Prometheus replicas + federation | Built-in HA |
| Data Retention | Configure retention policies, manage disk | Configure with SQL pipes |
| Operational Overhead | High: 3-4 services to manage | Low: update SQL queries |
How Tinybird fixed it
- No metric infrastructure: No exporters, no Prometheus, no additional storage layers.
- Metrics calculated on request, not scraped periodically: Metrics are computed fresh from live data every time KEDA polls.
- Logic in SQL: Update scaling behavior by editing a query, not redeploying code.
- Built-in HA: Tinybird handles availability.
How running it ourselves made the product better
Every autoscaling issue impacted us directly, just as it would our customers. This led to:
- Faster fixes (because they affected us directly).
- Clearer error messages (we had to debug them ourselves).
- More reliable service (our uptime depended on it).
These discoveries directly improved our product for all customers.
The Simulator
To pressure-test our scaling setup and validate edge-case behavior, we built a metrics simulation tool, written in Golang.
It generates metrics and displays them in a terminal UI with real-time visualization and configurable patterns, and exposes an HTTP endpoint to serve them.
[This is a demo. We intentionally set low thresholds and increased the scaling speed to showcase the behavior quickly.]
What We Learned
- Stabilization windows: a 10-minute scale-up window and a 30-minute scale-down window prevent thrashing.
- Single metrics lie: CPU alone scales too late; combining lag + CPU gives a better signal-to-noise ratio.
- Thresholds are workload-specific: what works for batch processing fails for real-time streams.
1. Choosing bad metrics will kill your autoscaling
Not all metrics are equal for autoscaling:
- Good metrics: Queue depth, processing lag, business KPIs.
- Poor metrics: CPU utilization alone, memory usage without context.
2. Tune stabilization windows
Prevent scaling flapping with proper stabilization:
behavior:
  scaleUp:
    stabilizationWindowSeconds: 600    # 10 minutes
  scaleDown:
    stabilizationWindowSeconds: 1800   # 30 minutes
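When using KEDA rather than a raw HPA, this behavior block is passed through the ScaledObject's advanced section; an abridged sketch of where it sits:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
spec:
  advanced:
    horizontalPodAutoscalerConfig:
      behavior:
        scaleUp:
          stabilizationWindowSeconds: 600
        scaleDown:
          stabilizationWindowSeconds: 1800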
3. Test with real traffic patterns
Our simulator helped us discover edge cases:
- Gradual vs. sudden traffic spikes behave differently.
- Weekend vs. weekday patterns require different thresholds.
4. Monitor everything
Use your own tools to monitor autoscaling:
- Track scaling events and their triggers.
- Measure time-to-scale and effectiveness.
- Set alerts for scaling failures or delays.
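Much of this can be watched with plain kubectl; a minimal sketch (the HPA name assumes KEDA's default keda-hpa-<scaledobject-name> convention):

# Trigger state, conditions, and recent scaling-related events
kubectl describe scaledobject kafka-scaler

# Watch replica counts change as metrics cross the threshold
kubectl get hpa keda-hpa-kafka-scaler --watch

# Scaling events emitted by the HPA controller
kubectl get events --field-selector involvedObject.kind=HorizontalPodAutoscaler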
Advanced patterns: Multi-trigger scaling
Combining multiple metrics
Our production configuration uses mixed triggers: the metrics-api scaler alongside traditional CPU scaling.
triggers:
  - type: metrics-api
    metricType: AverageValue
    metadata:
      url: https://api.tinybird.co/v0/pipes/kafka_scaling_metrics.prometheus
      format: prometheus
      targetValue: '1000'
      valueLocation: 'max_lag'
      authMode: 'apiKey'
      method: 'query'
      keyParamName: 'token'
    authenticationRef:
      name: kafka-keda-auth
  - type: cpu
    metricType: Utilization
    metadata:
      value: '70'
Regional scaling strategies
For our multi-region deployment, we create region-specific Tinybird endpoints:
# us-east-1 configuration
triggers:
  - type: metrics-api
    metadata:
      url: https://api.us-east-1.tinybird.co/v0/pipes/kafka_scaling_metrics_us_east.prometheus
      targetValue: '2000'   # Higher threshold: higher baseline traffic, avoids unnecessary scaling
      valueLocation: 'max_lag'

# eu-west-1 configuration
triggers:
  - type: metrics-api
    metadata:
      url: https://api.eu-west-1.tinybird.co/v0/pipes/kafka_scaling_metrics_eu_west.prometheus
      targetValue: '500'    # Lower threshold: responds quickly where baseline traffic is lower
      valueLocation: 'max_lag'
Troubleshooting common issues
Scaling too aggressively
- Problem: Constant scaling up/down.
- Solution: Increase stabilization windows and adjust thresholds.
Metrics not available
- Problem: KEDA can't reach Tinybird API endpoint.
- Solution: Check authentication token, endpoint URL, and any network policy restrictions.
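A quick way to isolate the problem is to call the endpoint the same way KEDA does, with the token as a query parameter (per keyParamName: 'token' and method: 'query' above), first from your machine and then from inside the cluster. Host and token values are placeholders:

# From your machine: verifies the token and URL
curl -s "${TINYBIRD_HOST}/v0/pipes/kafka_scaling_metrics.prometheus?token=${TINYBIRD_TOKEN}"

# From inside the cluster: also verifies network policies / egress
kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl --command -- \
  curl -s "${TINYBIRD_HOST}/v0/pipes/kafka_scaling_metrics.prometheus?token=${TINYBIRD_TOKEN}"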
Conclusion: Scaling smarter
Combining KEDA with Tinybird gave us faster, simpler, and more reliable autoscaling, driven entirely by real-time data.
KEDA's event-driven scaling plus Tinybird's real-time metrics pipeline created a feedback loop that keeps improving our infrastructure's performance and cost-effectiveness.
- Custom metrics work better than CPU/memory for workload-specific scaling.
- Real-time data beats pre-aggregated metrics for scaling responsiveness.
- Dogfooding drives product improvement when your uptime depends on your platform.
Try it yourself
You don't need to replace your monitoring stack. Just expose one Tinybird endpoint, wire it into KEDA, and autoscale with real-time data. Start small. Move fast.