---
title: "Why we ditched Prometheus for autoscaling (and don't miss it)"
excerpt: "Tinybird uses KEDA and its own real-time analytics platform to autoscale Kafka workloads. Learn how we made it work."
authors: "Victor M. Fernandez"
categories: "Scalable Analytics Architecture"
createdOn: "2025-06-11 10:00:00"
publishedOn: "2025-06-27 10:00:00"
updatedOn: "2025-06-27 10:00:00"
status: "published"
---

Every Tinybird user can ingest thousands of events per second, with 10x traffic spikes during launches or news cycles. Our platform needs to scale accordingly. CPU-based scaling was too slow. Memory-based, too vague. We needed to scale based on real signals, not lagging resource metrics. Traditional autoscaling broke the moment our queues backed up before Prometheus metrics could react.

So we turned to Tinybird and built a custom autoscaling system using live ingestion metrics and Kubernetes Event-driven Autoscaling (KEDA). No scraping delays, no extra monitoring stack to run.

## The Challenge: Unpredictable real-time workloads

Real-time analytics workloads are inherently unpredictable. Customer traffic can spike 10x during product launches, marketing campaigns, or breaking news events. The traditional autoscaling playbook fails because:

- **It's reactive, not predictive**: CPU spikes *after* your system is already overwhelmed.
- **It measures the wrong thing**: High CPU doesn't always mean you need more pods; sometimes you need smarter data routing.
- **It's painfully slow**: When new pods finally spin up, user requests may have already been delayed.

### The Kafka Bottleneck

Our `kafka` service is critical: it processes terabytes of data every day, ingesting data from external Kafka clusters and feeding it into our ClickHouse® infrastructure. For effective KEDA Kafka autoscaling, metrics need to reflect actual processing load rather than generic resource usage. During peak hours, we might see:

- High-volume event streams from customer Kafka topics.
- Sudden spikes in data volume during customer campaigns.
- Varying message sizes and processing complexity.

We needed a solution that could scale based on actual data processing demand, not just generic resource utilization.

## Enter KEDA: Kubernetes Event-Driven Autoscaling

### What Makes KEDA Different

[KEDA](https://keda.sh/) extends the Horizontal Pod Autoscaler (HPA) to work with event-driven metrics. Compared to the Prometheus Adapter, KEDA provides superior flexibility through custom metrics without the overhead of maintaining a separate monitoring system:

- **Custom Metrics**: Instead of generic CPU/Memory metrics, scale based on what actually matters: queue depth, message lag, API response times, or any custom business metric.
- **Multiple Scalers**: Combine different triggers (CPU, custom metrics, external APIs).

## Two Approaches: Traditional vs. Self-Reliance

We explored two different approaches for implementing KEDA autoscaling, each with distinct tradeoffs. Here's how both work and why we chose to use our own platform.

### Traditional Approach: Prometheus + KEDA

The typical setup involves running Prometheus to collect and expose application metrics. Your app publishes metrics at a `/metrics` endpoint in Prometheus format, which Prometheus scrapes at regular intervals. KEDA then queries Prometheus to retrieve these metrics and makes scaling decisions based on them. The integration works, but each extra hop adds latency.

#### KEDA Configuration

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-scaler
spec:
  scaleTargetRef:
    name: kafka-deployment
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus:9090
      metricName: lag
      threshold: '1000'
      query: avg(lag)
```

We gave Prometheus a fair shot, but moved on because:

- **Multi-hop delays**: application → local Prometheus scrape → central Prometheus aggregation → federation to monitoring cluster → KEDA query → scaling decision. Each hop adds latency and potential failure points.
- **Query overhead**: KEDA polling Prometheus adds another layer of latency.
- **Stale data during spikes**: Metrics are most outdated when you need scaling most.
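To put rough numbers on that staleness, here's a back-of-the-envelope sketch. The intervals are illustrative defaults, not our production settings; the point is that worst-case delays compound across hops:

```python
# Worst-case metric staleness across a multi-hop Prometheus pipeline.
# Interval values are illustrative assumptions, not measured settings.
hops = {
    "local_scrape": 30,  # app -> local Prometheus scrape interval
    "federation": 60,    # local -> central Prometheus federation interval
    "keda_poll": 30,     # KEDA polling interval against Prometheus
}

# In the worst case, each hop's data is a full interval old when read.
worst_case = sum(hops.values())
print(f"worst-case staleness: {worst_case}s")
```

With these hypothetical intervals, a scaling decision could be acting on data up to two minutes old, precisely when a traffic spike makes freshness matter most.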

### Self-Reliance: Tinybird + KEDA

Instead of managing a Prometheus stack, we plugged KEDA directly into Tinybird’s **real-time metrics API**. No scraping. No delays. Just fresh ingestion data powering scaling decisions in seconds.

Because Tinybird can expose Prometheus-compatible endpoints, KEDA can pull live metrics from the source. This means faster scaling, simpler infrastructure, and autoscaling based on the same streaming data we already trust for analytics.

#### Step 1: Defining the right metrics

We identified a key metric for intelligent autoscaling:

- **Kafka Lag**: How far behind are our consumers?
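Concretely, consumer lag per partition is the log-end offset minus the committed offset, and we scale on the maximum across partitions. A minimal sketch with hypothetical offsets (in production these come from the Kafka admin API):

```python
# Per-partition consumer lag: log-end offset minus committed offset.
# Offset values below are illustrative, not real production numbers.
log_end_offsets = {0: 10_500, 1: 9_800, 2: 12_000}
committed_offsets = {0: 10_200, 1: 9_800, 2: 10_900}

# Lag per partition, then the worst case across the topic.
lag = {p: log_end_offsets[p] - committed_offsets[p] for p in log_end_offsets}
max_lag = max(lag.values())
print(lag)      # per-partition lag
print(max_lag)  # the value we scale on
```

Scaling on the maximum (rather than the average) ensures a single hot partition can't silently fall behind.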

#### Step 2: Tinybird-native metrics pipeline

Tinybird's native [Prometheus endpoint support](https://www.tinybird.co/blog-posts/tinybird-prometheus-endpoint-format) made it easy to expose this metric in the right format for KEDA.

Here's how we created our scaling metrics endpoint:

```tinybird
TOKEN "metric_lag" READ

NODE kafka_stats
SQL >
    %
    SELECT
        max(lag) as max_lag
    FROM kafka_ops_log
    WHERE timestamp > now() - interval {{ Int32(seconds, 10) }} seconds
    {% if defined(user_id) and user_id != '' %}
        AND user_id {{ String(operator, '=') }} {{ String(user_id) }}
    {% end %}

NODE kafka_metrics
SQL >
    SELECT
        arrayJoin(
            [
                map(
                    'name',
                    'max_lag',
                    'type',
                    'gauge',
                    'help',
                    'max ingestion lag',
                    'value',
                    toString(max_lag)
                )
            ]
        ) as metric
    FROM kafka_stats

NODE kafka_pre_prometheus
SQL >
    SELECT
        metric['name'] as name,
        metric['type'] as type,
        metric['help'] as help,
        toInt64(metric['value']) as value
    FROM kafka_metrics
```

This pipe returns data in Prometheus format when accessed via the `.prometheus` endpoint, e.g.:

```sh
curl -X GET \
  "${TINYBIRD_HOST}/v0/pipes/kafka_scaling_metrics.prometheus?seconds=30&user_id=user123&operator=%3D" \
  -H "Authorization: Bearer ${TINYBIRD_TOKEN}"
```

This approach allows us to compute scaling metrics in real time from the same data powering customer-facing analytics:

- **Zero scraping lag**: Metrics computed fresh when KEDA requests them.
- **Always fresh**: Every KEDA poll gets the latest data state.
- **No metric storage needed**: Metrics computed from streaming data, not pre-aggregated.
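To make the exposition format concrete, here's a minimal sketch of parsing the gauge KEDA reads from the `.prometheus` endpoint. The sample payload is an assumed response mirroring the `max_lag` gauge defined in the pipe, not a captured one:

```python
# Minimal parser for the Prometheus text exposition format.
# The sample payload is illustrative of what the endpoint returns.
sample = """\
# HELP max_lag max ingestion lag
# TYPE max_lag gauge
max_lag 1100
"""

def parse_gauge(payload: str, name: str) -> float:
    """Return the value of a named gauge from a Prometheus text payload."""
    for line in payload.splitlines():
        if line.startswith(name + " "):
            return float(line.split()[1])
    raise KeyError(name)

print(parse_gauge(sample, "max_lag"))
```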

#### Step 3: KEDA configuration with metrics-api scaler

Here's how we wired everything together, connecting KEDA directly to our Tinybird Prometheus endpoint using the `metrics-api` scaler:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kafka-scaler
spec:
  scaleTargetRef:
    name: kafka-deployment
    kind: StatefulSet
  minReplicaCount: 2
  maxReplicaCount: 20
  pollingInterval: 30
  cooldownPeriod: 300
  triggers:
  - type: metrics-api
    metricType: AverageValue
    metadata:
      url: https://example.tinybird.co/v0/pipes/kafka_scaling_metrics.prometheus
      format: prometheus
      targetValue: '1000'
      valueLocation: 'max_lag'
      authMode: 'apiKey'
      method: 'query'
      keyParamName: 'token'
    authenticationRef:
      name: kafka-keda-auth
```
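For context on how `targetValue` drives pod counts: with `metricType: AverageValue`, the HPA roughly computes `ceil(metricValue / targetValue)`, clamped to the replica bounds. A simplified Python sketch of that math (the real HPA also applies a tolerance band and stabilization windows before acting):

```python
import math

# Simplified replica calculation for an AverageValue trigger.
# The real HPA adds a ~10% tolerance and stabilization windows.
def desired_replicas(metric_value: float, target: float,
                     min_replicas: int = 2, max_replicas: int = 20) -> int:
    raw = math.ceil(metric_value / target)
    return max(min_replicas, min(max_replicas, raw))

print(desired_replicas(12_000, 1000))  # lag of 12k at target 1000
print(desired_replicas(300, 1000))     # clamped to minReplicaCount
print(desired_replicas(50_000, 1000))  # clamped to maxReplicaCount
```

This is why tuning `targetValue` matters so much: it directly sets how much lag each replica is expected to absorb.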

#### Authentication Setup

For secure access to Tinybird endpoints, we set up proper authentication:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: keda-kafka-token
data:
  token: <base64-encoded-tinybird-token>
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: kafka-keda-auth
spec:
  secretTargetRef:
  - parameter: apiKey
    name: keda-kafka-token
    key: token
```

## What broke with Prometheus

Running Prometheus at scale isn't just about the server; it's about the entire ecosystem:

| Component | Traditional Prometheus | Tinybird Approach |
| --------- | ---------------------- | ------------------- |
| **Metrics Storage** | Prometheus + persistent volumes | Optimized ClickHouse® |
| **High Availability** | Multiple Prometheus replicas + federation | Built-in HA |
| **Data Retention** | Configure retention policies, manage disk | Configure with SQL pipes |
| **Operational Overhead** | High: 3-4 services to manage | Low: Update SQL queries |

## How Tinybird fixed it

- **No metric infrastructure**: No exporters, no Prometheus, no additional storage layers.
- **Metrics calculated on request, not scraped periodically**: Metrics are computed fresh from live data every time KEDA polls.
- **Logic in SQL**: Update scaling behavior by editing a query, not redeploying code.
- **Built-in HA**: Tinybird handles availability.

## How running it ourselves made the product better

Every autoscaling issue impacted us directly, just as it would our customers. This led to:

- Faster fixes (because they affected us directly).
- Clearer error messages (we had to debug them ourselves).
- More reliable service (our uptime depended on it).

These discoveries directly improved our product for all customers.

## The Simulator

To pressure-test our scaling setup and validate edge-case behavior, we built a metrics simulation tool in Go.

It generates metrics with configurable patterns, displays them in a terminal UI with real-time visualization, and exposes an HTTP endpoint that serves them.

![Metrics Simulator](demo.gif)
*This is a demo. We intentionally set low thresholds and increased the scaling speed to showcase the behavior quickly.*

### What We Learned

- **Stabilization windows**: 10-minute scale-up, 30-minute scale-down prevents thrashing.
- **Single metrics lie**: CPU alone scales too late; combining lag + CPU gives better signal-to-noise ratio.
- **Thresholds are workload-specific**: what works for batch processing fails for real-time streams.

### 1. Choosing bad metrics will kill your autoscaling

Not all metrics are equal for autoscaling:

- **Good metrics**: Queue depth, processing lag, business KPIs.
- **Poor metrics**: CPU utilization alone, memory usage without context.

### 2. Tune stabilization windows

Prevent scaling flapping with proper stabilization:

```yaml
# In a ScaledObject, HPA behavior is set under advanced.horizontalPodAutoscalerConfig
advanced:
  horizontalPodAutoscalerConfig:
    behavior:
      scaleUp:
        stabilizationWindowSeconds: 600    # 10 minutes
      scaleDown:
        stabilizationWindowSeconds: 1800   # 30 minutes
```
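The damping works because the HPA acts on the highest replica recommendation seen inside the scale-down window, so brief dips never remove capacity. A rough Python sketch of that mechanism:

```python
from collections import deque

# Sketch of scale-down stabilization: the HPA only scales down to the
# *highest* recommendation observed within the window, damping flapping.
window = deque(maxlen=6)  # e.g. 6 samples spanning the stabilization window

def stabilized_scale_down(recommendation: int) -> int:
    window.append(recommendation)
    return max(window)

# Noisy recommendations: the effective target never drops below the peak.
for rec in [10, 4, 9, 3, 8, 2]:
    effective = stabilized_scale_down(rec)
print(effective)
```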

### 3. Test with real traffic patterns

Our simulator helped us discover edge cases:

- Gradual vs. sudden traffic spikes behave differently.
- Weekend vs. weekday patterns require different thresholds.
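For example, a gradual ramp and a sudden spike cross the same lag threshold at very different points, which is one reason a single static threshold rarely fits both shapes. The numbers here are illustrative, not simulator output:

```python
# Two illustrative traffic shapes: a gradual ramp vs. a sudden spike.
threshold = 1000
gradual = [100 * t for t in range(20)]  # lag grows +100 per tick
sudden = [50] * 10 + [5000] * 10        # flat, then a 100x jump

def first_breach(series, threshold):
    """Index of the first sample exceeding the threshold, or None."""
    return next((i for i, v in enumerate(series) if v > threshold), None)

print(first_breach(gradual, threshold))  # ramp gives early warning ticks
print(first_breach(sudden, threshold))   # spike leaves no ramp-up time
```

A gradual ramp gives the autoscaler several ticks of warning; a spike demands an aggressive threshold or a fast polling interval to catch up.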

### 4. Monitor everything

Use your own tools to monitor autoscaling:

- Track scaling events and their triggers.
- Measure time-to-scale and effectiveness.
- Set alerts for scaling failures or delays.

## Advanced patterns: Multi-trigger scaling

### Combining multiple metrics

Our production configuration uses mixed triggers: `metrics-api` and traditional CPU scaling.

```yaml
triggers:
- type: metrics-api
  metricType: AverageValue
  metadata:
    url: https://api.tinybird.co/v0/pipes/kafka_scaling_metrics.prometheus
    format: prometheus
    targetValue: '1000'
    valueLocation: 'max_lag'
    authMode: 'apiKey'
    method: 'query'
    keyParamName: 'token'
  authenticationRef:
    name: kafka-keda-auth
- type: cpu
  metricType: Utilization
  metadata:
    value: '70'
```
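When several triggers fire at once, the underlying HPA computes a desired replica count per metric and scales to the largest. An illustrative sketch of that arbitration, with hypothetical numbers:

```python
import math

# Per-metric replica proposals; the HPA takes the maximum of all of them.
def from_lag(lag: float, target: float = 1000) -> int:
    # AverageValue-style external metric: ceil(value / target).
    return math.ceil(lag / target)

def from_cpu(avg_utilization: float, current: int, target: float = 70) -> int:
    # Utilization-style metric: scale current replicas by usage ratio.
    return math.ceil(current * avg_utilization / target)

current_replicas = 4
candidates = [from_lag(2_500), from_cpu(90, current_replicas)]
print(max(candidates))  # the larger proposal wins
```

Here a moderate lag suggests 3 replicas while hot CPUs suggest 6, so the CPU trigger wins; either signal alone can drive a scale-up, which is the point of combining them.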

### Regional scaling strategies

For our multi-region deployment, we create region-specific Tinybird endpoints:

```yaml
# us-east-1 configuration
triggers:
- type: metrics-api
  metadata:
    url: https://api.us-east-1.tinybird.co/v0/pipes/kafka_scaling_metrics_us_east.prometheus
    targetValue: '2000'  # Higher threshold for region 1 (to accommodate higher baseline traffic and prevent unnecessary scaling)
    valueLocation: 'max_lag'

# eu-west-1 configuration
triggers:
- type: metrics-api
  metadata:
    url: https://api.eu-west-1.tinybird.co/v0/pipes/kafka_scaling_metrics_eu_west.prometheus
    targetValue: '500'   # Lower threshold for region 2 (to respond quickly in regions with less baseline traffic)
    valueLocation: 'max_lag'
```

## Troubleshooting common issues

### Scaling too aggressively

- **Problem**: Constant scaling up/down.
- **Solution**: Increase stabilization windows and adjust thresholds.

### Metrics not available

- **Problem**: KEDA can't reach Tinybird API endpoint.
- **Solution**: Check authentication token, endpoint URL, and any network policy restrictions.

## Conclusion: Scaling smarter

Combining KEDA with Tinybird gave us faster, simpler, and more reliable autoscaling, driven entirely by real-time data.

The combination of **KEDA's event-driven scaling** and **Tinybird's real-time metrics pipeline** created a feedback loop that actively improves our infrastructure's performance and cost-effectiveness.

1. **Custom metrics work better** than CPU/memory for workload-specific scaling.
2. **Real-time data beats pre-aggregated metrics** for scaling responsiveness.
3. **Dogfooding drives product improvement** when your uptime depends on your platform.

### Try it yourself

You don't need to replace your monitoring stack. Just expose one Tinybird endpoint, wire it into KEDA, and autoscale with real-time data. Start small. Move fast.

### Resources and Links

- [KEDA Documentation](https://keda.sh/)
- [Open observability in Tinybird with Prometheus endpoints](https://www.tinybird.co/blog-posts/tinybird-prometheus-endpoint-format)
- [Consume API endpoints in Prometheus format](https://www.tinybird.co/docs/forward/work-with-data/publish-data/guides/consume-api-endpoints-in-prometheus-format)
- [Prometheus Metrics Best Practices](https://prometheus.io/docs/practices/naming/)
