Kubernetes HPA vs KEDA - Which Autoscaler Should You Use?

If you're running workloads on Kubernetes, you have two main autoscaling options: the built-in Horizontal Pod Autoscaler (HPA) or KEDA (Kubernetes Event-Driven Autoscaling). The right choice depends on what your pods actually do.

HPA is enough for most HTTP services that scale on CPU or memory. KEDA is what you need when your workload is driven by external events - queue messages, database rows, cron schedules, or custom metrics from outside the cluster.

HPA: What It Does Well

HPA ships with every Kubernetes cluster. No installation, no extra operators, no CRDs to manage. It watches a metric, compares it to your target, and adjusts replica count.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 65
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
```

A few things worth noting in this config:

The behavior section was added in autoscaling/v2 and most teams don't use it. That's a mistake. Without it, HPA uses conservative defaults that scale up slowly. The config above says: scale up immediately (0s stabilization) by up to 100% every 15 seconds, but scale down only 10% per minute after waiting 5 minutes. Aggressive up, cautious down.

HPA evaluates metrics every 15 seconds by default (controlled by --horizontal-pod-autoscaler-sync-period on the controller manager). That's fast enough for most request-driven workloads.
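The scaling math itself is simple. Kubernetes documents HPA's core formula as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, clamped to the min/max bounds. A minimal sketch (the example numbers are illustrative, not from the config above):

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_utilization: float,
                         target_utilization: float,
                         min_replicas: int,
                         max_replicas: int) -> int:
    """HPA's documented formula: desired = ceil(current * metric / target),
    clamped to [minReplicas, maxReplicas]. Ignores the tolerance band and
    behavior policies the real controller applies on top."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

# With a 65% CPU target, 10 replicas averaging 91% CPU:
print(hpa_desired_replicas(10, 91, 65, 3, 50))  # -> 14
```

Note that the real controller also skips scaling when the ratio is within a small tolerance of 1.0 (10% by default), which this sketch omits.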

HPA with Custom Metrics

HPA can scale on custom metrics through the Kubernetes metrics API, not just CPU and memory. You need a metrics adapter like Prometheus Adapter or Datadog Cluster Agent:

```yaml
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "1000"
```

This works, but the setup is fiddly. You need:

  1. Prometheus (or similar) collecting your application metrics
  2. A metrics adapter translating Prometheus queries into the Kubernetes custom metrics API
  3. The correct RBAC for the adapter to register with the API aggregation layer

We've spent hours debugging "metric not found" errors that turned out to be a mismatch between the Prometheus metric name and what the adapter was exposing. It works once you get it right, but the initial setup is painful.
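For reference, the adapter-side mapping is where most of those mismatches happen. A sketch of a Prometheus Adapter rule in its standard config format (the metric name `http_requests_total` and the 2m rate window are assumptions for illustration):

```yaml
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  # Exposes http_requests_total to Kubernetes as http_requests_per_second
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

The `name` rename is exactly the kind of transformation that causes "metric not found": the HPA must reference the renamed metric, not the raw Prometheus series.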

KEDA: When External Events Drive Your Workload

KEDA installs as an operator and adds a ScaledObject CRD. The difference from HPA: KEDA natively integrates with 60+ external event sources. It creates and manages HPA resources under the hood, so it's not replacing HPA - it's building on top of it.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor
spec:
  scaleTargetRef:
    name: order-processor
  minReplicaCount: 0
  maxReplicaCount: 100
  pollingInterval: 15
  cooldownPeriod: 300
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456789/orders
      queueLength: "5"
      awsRegion: us-east-1
    authenticationRef:
      name: aws-credentials
```

That queueLength: "5" means KEDA targets 5 messages per replica. If the queue has 50 messages, KEDA sets desired replicas to 10. Simple, direct, and exactly what queue consumers need.
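The math is the same ceiling division, with one important difference from plain HPA: the floor can be zero. A simplified model (the real controller still applies HPA's tolerance and stabilization windows on top):

```python
import math

def keda_queue_replicas(queue_length: int, target_per_replica: int,
                        min_replicas: int, max_replicas: int) -> int:
    """One replica per target_per_replica messages, clamped to bounds.
    Unlike plain HPA, min_replicas can be 0."""
    desired = math.ceil(queue_length / target_per_replica)
    return max(min_replicas, min(max_replicas, desired))

print(keda_queue_replicas(50, 5, 0, 100))  # 50 messages / 5 per replica -> 10
print(keda_queue_replicas(0, 5, 0, 100))   # empty queue -> 0 replicas
```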

Scale to Zero

The biggest feature KEDA has over plain HPA: it can scale your deployment to zero replicas and bring it back when events arrive. HPA's minimum is 1.

For workloads that are idle most of the time - nightly batch jobs, event processors that handle a few thousand messages per day, staging environments - this saves real money. A deployment with 0 replicas costs nothing.

```yaml
minReplicaCount: 0    # KEDA can do this
maxReplicaCount: 50
cooldownPeriod: 300   # Wait 5 min of inactivity before scaling to 0
```

The tradeoff: when the first event arrives after scaling to zero, there's a cold start. Pod scheduling + image pull + application startup can take 30-90 seconds. If that latency matters, keep minReplicaCount: 1.

KEDA Trigger Types That Matter

Out of KEDA's 60+ triggers, these are the ones we see used most in production:

| Trigger | Use case | Notes |
|---|---|---|
| aws-sqs-queue | SQS consumers | Most common. Scales on ApproximateNumberOfMessages |
| kafka | Kafka consumer groups | Scales on consumer lag per partition |
| rabbitmq | RabbitMQ consumers | Scales on queue length |
| redis-streams | Redis stream processors | Scales on pending messages in consumer group |
| prometheus | Any custom metric | Flexible but requires Prometheus |
| cron | Scheduled workloads | Scale up at specific times; better than CronJobs for long-running work |
| postgresql | DB row processing | Scales on query result count |

The Kafka trigger deserves special mention. It calculates consumer lag per partition and scales accordingly. If you have a 12-partition topic with varying lag across partitions, KEDA handles that correctly. Doing the same with plain HPA and custom metrics is significantly more work.
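A sketch of what that trigger looks like (the broker address, group, and topic names are placeholders; `lagThreshold` plays the same role `queueLength` does for SQS):

```yaml
triggers:
- type: kafka
  metadata:
    bootstrapServers: kafka.kafka.svc:9092   # assumed broker address
    consumerGroup: order-consumers           # assumed consumer group
    topic: orders                            # assumed topic
    lagThreshold: "50"   # target lag per replica
```

With a total lag of 600 across the 12 partitions, this would target 12 replicas (600 / 50), capped by the partition count since extra consumers beyond one per partition would sit idle.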

Direct Comparison

| Feature | HPA | KEDA |
|---|---|---|
| Built-in to K8s | Yes | No (requires install) |
| CPU/memory scaling | Yes | Yes (via KEDA) |
| Scale to zero | No (min 1) | Yes |
| External event sources | Via custom metrics adapter | Native (60+ triggers) |
| Queue-based scaling | Possible but painful | First-class support |
| Scaling evaluation interval | 15s default | Configurable (default 30s) |
| Scale-up speed | Fast with behavior config | Fast |
| Scale-down speed | Configurable | Configurable (cooldownPeriod) |
| Complexity | Low | Medium |
| Maintenance burden | None | Operator upgrades, CRD management |
| Community | Kubernetes core | CNCF graduated project |

When to Use HPA

Pick HPA when:

  • Your services are HTTP-based and scale on CPU, memory, or request count
  • You want zero additional dependencies in your cluster
  • You're already using Prometheus Adapter and it's working fine
  • Your workloads never need to scale to zero

HPA is the right default. Don't install KEDA just because it exists.

When to Use KEDA

Pick KEDA when:

  • You're consuming from SQS, Kafka, RabbitMQ, or any external queue
  • You need scale-to-zero for cost savings
  • You have multiple workloads driven by different event sources
  • Setting up custom metrics adapters for HPA feels like too much work for what you get
  • You want cron-based scaling mixed with event-based scaling

If even one of your services consumes from a message queue, KEDA is probably worth installing. Once it's there, you'll find yourself using it for more and more workloads.

Running Both Together

You don't have to choose one. KEDA manages HPA resources internally, so they coexist in the same cluster. Use HPA directly for simple CPU-based services, and KEDA for everything event-driven.

One rule: don't attach both an HPA and a KEDA ScaledObject to the same deployment. They'll fight over the replica count. KEDA creates its own HPA - let it manage that.

KEDA's Rough Edges

KEDA isn't perfect. Things we've hit in production:

Authentication management is clunky. KEDA needs credentials to poll external sources (AWS keys for SQS, Kafka credentials, etc.). You manage these through TriggerAuthentication CRDs, and the secret/auth rotation story isn't great. IRSA (IAM Roles for Service Accounts) works for AWS, but it took us a few tries to get the trust policy right.

Operator upgrades require care. KEDA CRDs change between versions. Upgrading from 2.x to 2.y sometimes requires manual CRD updates before the helm upgrade. Read the release notes.

Scaling decisions can lag behind HPA for simple metrics. KEDA's default polling interval is 30 seconds vs HPA's 15 seconds. For CPU-based scaling, HPA reacts faster out of the box. Set pollingInterval: 15 if this matters.

Debugging is harder. When HPA isn't scaling, you run kubectl describe hpa and see the metrics. When KEDA isn't scaling, you need to check KEDA operator logs, ScaledObject status, and the generated HPA. More moving parts means more places for things to break.

The ECS Perspective

If you're not on Kubernetes, this comparison doesn't apply directly. ECS doesn't have HPA or KEDA.

For ECS queue-based workloads, fast-autoscaler fills a similar role to KEDA - it reads queue metrics from SQS, Kafka, RabbitMQ, and others, then adjusts your ECS service's task count directly.

And whether you're on ECS or Kubernetes, stepscale AI sits above both as a tuning layer - analyzing your scaling history and optimizing the configuration parameters (thresholds, min/max counts, scaling ratios) that drive your autoscaler's decisions.

What to Do Next

  1. If you're running only HTTP services on K8s, start with HPA. Configure the behavior section properly - most scaling problems come from using defaults
  2. If you have queue consumers, install KEDA. Start with one ScaledObject for your highest-traffic queue and measure the improvement
  3. Audit your existing HPAs - check if the behavior section is configured. If not, you're using defaults that are probably too slow for scale-up
  4. For mixed ECS + K8s environments, look into a unified approach. Managing different autoscaling tools per platform gets messy fast