Kubernetes HPA vs KEDA - Which Autoscaler Should You Use?

If you're running workloads on Kubernetes, you have two main autoscaling options: the built-in Horizontal Pod Autoscaler (HPA) or KEDA (Kubernetes Event-Driven Autoscaling). The right choice depends on what your pods actually do.

HPA is enough for most HTTP services that scale on CPU or memory. KEDA is what you need when your workload is driven by external events - queue messages, database rows, cron schedules, or custom metrics from outside the cluster.

HPA: What It Does Well

HPA ships with every Kubernetes cluster. No installation, no extra operators, no CRDs to manage. It watches a metric, compares it to your target, and adjusts replica count.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 65
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15
```

A few things worth noting in this config:

The behavior section was added in autoscaling/v2 and most teams don't use it. That's a mistake. Without it, HPA uses conservative defaults that scale up slowly. The config above says: scale up immediately (0s stabilization) by up to 100% every 15 seconds, but scale down only 10% per minute after waiting 5 minutes. Aggressive up, cautious down.

HPA evaluates metrics every 15 seconds by default (controlled by --horizontal-pod-autoscaler-sync-period on the controller manager). That's fast enough for most request-driven workloads.
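The scaling math itself is simple. Kubernetes documents HPA's core formula as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, clamped to the min/max bounds. A minimal sketch (the example numbers are illustrative, not from the config above):

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_utilization: float,
                         target_utilization: float,
                         min_replicas: int,
                         max_replicas: int) -> int:
    """HPA's documented formula: desired = ceil(current * metric / target),
    clamped to [minReplicas, maxReplicas]. Ignores the tolerance band and
    behavior policies the real controller applies on top."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

# With a 65% CPU target, 10 replicas averaging 91% CPU:
print(hpa_desired_replicas(10, 91, 65, 3, 50))  # -> 14
```

Note that the real controller also skips scaling when the ratio is within a small tolerance of 1.0 (10% by default), which this sketch omits.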

HPA with Custom Metrics

HPA can scale on custom metrics through the Kubernetes metrics API, not just CPU and memory. You need a metrics adapter like Prometheus Adapter or Datadog Cluster Agent:

```yaml
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "1000"
```

This works, but the setup is fiddly. You need:

  1. Prometheus (or similar) collecting your application metrics
  2. A metrics adapter translating Prometheus queries into the Kubernetes custom metrics API
  3. The correct RBAC for the adapter to register with the API aggregation layer

We've spent hours debugging "metric not found" errors that turned out to be a mismatch between the Prometheus metric name and what the adapter was exposing. It works once you get it right, but the initial setup is painful.
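For reference, the adapter-side mapping is where most of those mismatches happen. A sketch of a Prometheus Adapter rule in its standard config format (the metric name `http_requests_total` and the 2m rate window are assumptions for illustration):

```yaml
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  # Exposes http_requests_total to Kubernetes as http_requests_per_second
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

The `name` rename is exactly the kind of transformation that causes "metric not found": the HPA must reference the renamed metric, not the raw Prometheus series.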

KEDA: When External Events Drive Your Workload

KEDA installs as an operator and adds a ScaledObject CRD. The difference from HPA: KEDA natively integrates with 60+ external event sources. It creates and manages HPA resources under the hood, so it's not replacing HPA - it's building on top of it.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor
spec:
  scaleTargetRef:
    name: order-processor
  minReplicaCount: 0
  maxReplicaCount: 100
  pollingInterval: 15
  cooldownPeriod: 300
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456789/orders
      queueLength: "5"
      awsRegion: us-east-1
    authenticationRef:
      name: aws-credentials
```

That queueLength: "5" means KEDA targets 5 messages per replica. If the queue has 50 messages, KEDA sets desired replicas to 10. Simple, direct, and exactly what queue consumers need.
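The math is the same ceiling division, with one important difference from plain HPA: the floor can be zero. A simplified model (the real controller still applies HPA's tolerance and stabilization windows on top):

```python
import math

def keda_queue_replicas(queue_length: int, target_per_replica: int,
                        min_replicas: int, max_replicas: int) -> int:
    """One replica per target_per_replica messages, clamped to bounds.
    Unlike plain HPA, min_replicas can be 0."""
    desired = math.ceil(queue_length / target_per_replica)
    return max(min_replicas, min(max_replicas, desired))

print(keda_queue_replicas(50, 5, 0, 100))  # 50 messages / 5 per replica -> 10
print(keda_queue_replicas(0, 5, 0, 100))   # empty queue -> 0 replicas
```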

Scale to Zero

The biggest feature KEDA has over plain HPA: it can scale your deployment to zero replicas and bring it back when events arrive. HPA's minimum is 1.

For workloads that are idle most of the time - nightly batch jobs, event processors that handle a few thousand messages per day, staging environments - this saves real money. A deployment with 0 replicas costs nothing.

```yaml
minReplicaCount: 0    # KEDA can do this
maxReplicaCount: 50
cooldownPeriod: 300   # Wait 5 min of inactivity before scaling to 0
```

The tradeoff: when the first event arrives after scaling to zero, there's a cold start. Pod scheduling + image pull + application startup can take 30-90 seconds. If that latency matters, keep minReplicaCount: 1.

KEDA Trigger Types That Matter

Out of KEDA's 60+ triggers, these are the ones we see used most in production:

| Trigger | Use case | Notes |
|---|---|---|
| aws-sqs-queue | SQS consumers | Most common. Scales on ApproximateNumberOfMessages |
| kafka | Kafka consumer groups | Scales on consumer lag per partition |
| rabbitmq | RabbitMQ consumers | Scales on queue length |
| redis-streams | Redis stream processors | Scales on pending messages in consumer group |
| prometheus | Any custom metric | Flexible but requires Prometheus |
| cron | Scheduled workloads | Scale up at specific times; better than CronJobs for long-running work |
| postgresql | DB row processing | Scales on query result count |

The Kafka trigger deserves special mention. It calculates consumer lag per partition and scales accordingly. If you have a 12-partition topic with varying lag across partitions, KEDA handles that correctly. Doing the same with plain HPA and custom metrics is significantly more work.
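A sketch of what that trigger looks like (the broker address, group, and topic names are placeholders; `lagThreshold` plays the same role `queueLength` does for SQS):

```yaml
triggers:
- type: kafka
  metadata:
    bootstrapServers: kafka.kafka.svc:9092   # assumed broker address
    consumerGroup: order-consumers           # assumed consumer group
    topic: orders                            # assumed topic
    lagThreshold: "50"   # target lag per replica
```

With a total lag of 600 across the 12 partitions, this would target 12 replicas (600 / 50), capped by the partition count since extra consumers beyond one per partition would sit idle.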

Direct Comparison

| Feature | HPA | KEDA |
|---|---|---|
| Built-in to K8s | Yes | No (requires install) |
| CPU/memory scaling | Yes | Yes (via KEDA) |
| Scale to zero | No (min 1) | Yes |
| External event sources | Via custom metrics adapter | Native (60+ triggers) |
| Queue-based scaling | Possible but painful | First-class support |
| Scaling evaluation interval | 15s default | Configurable (default 30s) |
| Scale-up speed | Fast with behavior config | Fast |
| Scale-down speed | Configurable | Configurable (cooldownPeriod) |
| Complexity | Low | Medium |
| Maintenance burden | None | Operator upgrades, CRD management |
| Community | Kubernetes core | CNCF graduated project |

When to Use HPA

Pick HPA when:

  • Your services are HTTP-based and scale on CPU, memory, or request count
  • You want zero additional dependencies in your cluster
  • You're already using Prometheus Adapter and it's working fine
  • Your workloads never need to scale to zero

HPA is the right default. Don't install KEDA just because it exists.

When to Use KEDA

Pick KEDA when:

  • You're consuming from SQS, Kafka, RabbitMQ, or any external queue
  • You need scale-to-zero for cost savings
  • You have multiple workloads driven by different event sources
  • Setting up custom metrics adapters for HPA feels like too much work for what you get
  • You want cron-based scaling mixed with event-based scaling

If even one of your services consumes from a message queue, KEDA is probably worth installing. Once it's there, you'll find yourself using it for more and more workloads.

Running Both Together

You don't have to choose one. KEDA manages HPA resources internally, so they coexist in the same cluster. Use HPA directly for simple CPU-based services, and KEDA for everything event-driven.

One rule: don't attach both an HPA and a KEDA ScaledObject to the same deployment. They'll fight over the replica count. KEDA creates its own HPA - let it manage that.

KEDA's Rough Edges

KEDA isn't perfect. Things we've hit in production:

Authentication management is clunky. KEDA needs credentials to poll external sources (AWS keys for SQS, Kafka credentials, etc.). You manage these through TriggerAuthentication CRDs, and the secret/auth rotation story isn't great. IRSA (IAM Roles for Service Accounts) works for AWS, but it took us a few tries to get the trust policy right.

Operator upgrades require care. KEDA CRDs change between versions. Upgrading from 2.x to 2.y sometimes requires manual CRD updates before the helm upgrade. Read the release notes.

Scaling decisions can lag behind HPA for simple metrics. KEDA's default polling interval is 30 seconds vs HPA's 15 seconds. For CPU-based scaling, HPA reacts faster out of the box. Set pollingInterval: 15 if this matters.

Debugging is harder. When HPA isn't scaling, you run kubectl describe hpa and see the metrics. When KEDA isn't scaling, you need to check KEDA operator logs, ScaledObject status, and the generated HPA. More moving parts means more places for things to break.

The ECS Perspective

If you're not on Kubernetes, this comparison doesn't apply directly. ECS doesn't have HPA or KEDA.

For ECS queue-based workloads, fast-autoscaler fills a similar role to KEDA - it reads queue metrics from SQS, Kafka, RabbitMQ, and others, then adjusts your ECS service's task count directly.

And whether you're on ECS or Kubernetes, stepscale AI sits above both as a tuning layer - analyzing your scaling history and optimizing the configuration parameters (thresholds, min/max counts, scaling ratios) that drive your autoscaler's decisions.

What to Do Next

  1. If you're running only HTTP services on K8s, start with HPA. Configure the behavior section properly - most scaling problems come from using defaults
  2. If you have queue consumers, install KEDA. Start with one ScaledObject for your highest-traffic queue and measure the improvement
  3. Audit your existing HPAs - check if the behavior section is configured. If not, you're using defaults that are probably too slow for scale-up
  4. For mixed ECS + K8s environments, look into a unified approach. Managing different autoscaling tools per platform gets messy fast