ECS Autoscaling Best Practices That Actually Work
AWS ECS autoscaling sounds simple on paper - set a target, let AWS handle the rest. In practice, most teams end up with services that scale too slowly, overshoot on the way up, and refuse to scale down. Here's what we've learned running ECS autoscaling in production across dozens of services.
The short version: target tracking works for request-driven services, step scaling gives you control for batch workloads, and queue-based scaling is the only sane option if your services consume from SQS, Kafka, or similar queues.
The Three Autoscaling Approaches
ECS supports three autoscaling policy types. Each one fits a different workload shape.
Target Tracking
You pick a metric (CPU, memory, ALB request count) and a target value. AWS figures out how many tasks you need. This is the default recommendation from AWS, and it works well for HTTP services behind a load balancer.
```yaml
# CloudFormation example
ScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyType: TargetTrackingScaling
    TargetTrackingScalingPolicyConfiguration:
      TargetValue: 60.0
      PredefinedMetricSpecification:
        PredefinedMetricType: ECSServiceAverageCPUUtilization
      ScaleInCooldown: 300
      ScaleOutCooldown: 60
```
The 60% CPU target is a good starting point. We've seen teams set it to 80% thinking they'll save money - then wonder why their p99 latency spikes during scale-out because there's no headroom left.
When to use it: API services, web backends, anything where CPU or request count correlates directly with load.
When it fails: Batch processors, queue consumers, services where CPU doesn't reflect actual workload. A service consuming SQS messages might sit at 10% CPU while a queue of 50,000 messages builds up.
Step Scaling
You define specific thresholds and how many tasks to add or remove at each level. More work to configure, but you get precise control.
```yaml
StepScalingPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyType: StepScaling
    StepScalingPolicyConfiguration:
      AdjustmentType: ChangeInCapacity
      StepAdjustments:
        - MetricIntervalLowerBound: 0
          MetricIntervalUpperBound: 1000
          ScalingAdjustment: 2
        - MetricIntervalLowerBound: 1000
          MetricIntervalUpperBound: 5000
          ScalingAdjustment: 5
        - MetricIntervalLowerBound: 5000
          ScalingAdjustment: 10
      Cooldown: 60
```
The config above says: if the metric crosses the alarm threshold by 0-1000, add 2 tasks. By 1000-5000, add 5. Over 5000, add 10. This lets you react proportionally to different levels of load.
When to use it: When you know your scaling curve and want fine-grained control. Works well for workloads with predictable patterns where you've measured the relationship between metric and capacity.
When it fails: When you don't know the right thresholds yet. Getting step scaling wrong means either over-provisioning or being too slow to react.
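To make the boundary logic concrete, here's a small Python sketch of how a step policy picks its adjustment. `STEPS` mirrors the StepAdjustments configuration above; `tasks_to_add` is a hypothetical helper for illustration, not an AWS API.

```python
# Mirrors the StepAdjustments in the config: (lower, upper, tasks_to_add).
# Bounds are relative to the alarm threshold; None means unbounded above.
STEPS = [(0, 1000, 2), (1000, 5000, 5), (5000, None, 10)]

def tasks_to_add(metric_value, alarm_threshold):
    """Return how many tasks the step policy would add for a given breach."""
    breach = metric_value - alarm_threshold
    if breach < 0:
        return 0  # alarm not breached; no scale-out
    for lower, upper, adjustment in STEPS:
        if breach >= lower and (upper is None or breach < upper):
            return adjustment
    return 0
```

For an alarm threshold of 10,000 messages, a reading of 13,000 breaches by 3,000 and lands in the middle step, adding 5 tasks.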
Queue-Based Scaling (Custom Metrics)
Neither target tracking nor step scaling works well for queue consumers. The problem is fundamental: CloudWatch CPU metrics tell you how busy your current tasks are, not how much work is waiting.
A service with 5 tasks at 20% CPU and 100,000 messages in the queue needs to scale up aggressively. But target tracking sees 20% CPU and thinks everything is fine.
The fix is scaling based on queue depth directly:
```python
import math

# Calculate desired tasks based on queue depth
messages_in_queue = 45000
messages_per_task = 1000       # each task processes ~1000 msg/min (measured)
min_tasks, max_tasks = 2, 50   # your service's capacity bounds

desired_tasks = math.ceil(messages_in_queue / messages_per_task)
desired_tasks = max(min_tasks, min(desired_tasks, max_tasks))  # clamp to bounds
```
This is where tools like fast-autoscaler come in - it's an open-source Lambda function that reads queue metrics and updates your ECS service's desired count directly, bypassing CloudWatch-based autoscaling entirely.
When to use it: Any service consuming from SQS, Kafka, RabbitMQ, Redis queues, or Kinesis streams. Also useful for services processing S3 event notifications.
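A minimal sketch of what such a scaler might look like as a Lambda, assuming SQS and boto3. The queue URL, cluster, and service names are placeholders, and `MSGS_PER_TASK` is a throughput figure you'd measure for your own workers; the decision logic is kept in a pure function so it can be tested without AWS access.

```python
import math

# Assumed placeholders -- substitute values measured for your service.
MSGS_PER_TASK = 1000        # messages one task works through per minute
MIN_TASKS, MAX_TASKS = 2, 50

def desired_count(visible, in_flight=0):
    """Tasks needed for the current backlog, clamped to [MIN_TASKS, MAX_TASKS]."""
    backlog = visible + in_flight
    return max(MIN_TASKS, min(math.ceil(backlog / MSGS_PER_TASK), MAX_TASKS))

def handler(event, context):
    """Lambda entry point: read queue depth, set the service's desired count.
    boto3 is imported here so the pure logic above stays testable offline."""
    import boto3
    sqs, ecs = boto3.client("sqs"), boto3.client("ecs")
    attrs = sqs.get_queue_attributes(
        QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/my-queue",  # placeholder
        AttributeNames=["ApproximateNumberOfMessages",
                        "ApproximateNumberOfMessagesNotVisible"],
    )["Attributes"]
    ecs.update_service(
        cluster="my-cluster", service="my-worker",  # placeholders
        desiredCount=desired_count(
            int(attrs["ApproximateNumberOfMessages"]),
            int(attrs["ApproximateNumberOfMessagesNotVisible"])),
    )
```

Run it on a one-minute EventBridge schedule and it reacts to backlog within a minute, with no CloudWatch alarm in the loop.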
Cooldown Periods: The Most Misconfigured Setting
Cooldowns prevent your service from scaling up and down repeatedly (thrashing). AWS defaults are often too conservative.
Here's what we've found works:
| Scenario | Scale-out cooldown | Scale-in cooldown |
|---|---|---|
| API service (target tracking) | 60s | 300s |
| Queue consumer (step scaling) | 30s | 120s |
| Batch processor | 60s | 600s |
The pattern: scale out fast, scale in slow. When load hits, you want tasks up quickly. When load drops, wait longer to confirm it's actually gone before removing capacity.
A common mistake: setting the scale-in cooldown to 60 seconds. Traffic dips briefly during a lull, tasks get removed, then traffic comes back and you're scrambling to scale up again. Use 300 seconds minimum for scale-in on any production service.
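A toy simulation of that failure mode, with assumed numbers: demand is tasks needed per minute, there's a two-minute lull, and scale-in is modeled (simplified) as firing once demand has stayed below capacity for the cooldown period.

```python
# Tasks needed each minute; a brief two-minute lull in the middle (assumed data).
demand = [6, 6, 6, 6, 6, 2, 2, 6, 6, 6]

def shortfall_minutes(scale_in_delay_min):
    """Minutes the service is under-provisioned, given a scale-in delay.
    Scale-out is assumed to react within a minute; scale-in waits until
    demand has been below capacity for scale_in_delay_min consecutive minutes."""
    tasks, below, shortfall = 6, 0, 0
    for needed in demand:
        if needed > tasks:
            shortfall += 1      # caught short this minute
            tasks = needed      # scale-out catches up
            below = 0
        elif needed < tasks:
            below += 1
            if below >= scale_in_delay_min:
                tasks = needed  # cooldown elapsed; capacity removed
                below = 0
        else:
            below = 0
    return shortfall
```

With a one-minute scale-in delay the lull strips capacity and the rebound is under-provisioned; with a five-minute delay the dip is ridden out with no shortfall.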
Min and Max Task Counts
Setting minCapacity and maxCapacity wrong causes the most common autoscaling failures.
Min tasks too low: Setting min to 1 means a cold start on every traffic spike. If your service takes 45 seconds to start and register with the load balancer, you'll have a full minute of degraded service on every scale-out from 1.
For production services, calculate your minimum based on:
- Baseline traffic during your quietest hour
- How many tasks you need to handle that traffic at 50% capacity (leaving headroom)
If your quietest hour needs 3 tasks at full load, set min to 6 so each task sits around 50% utilization.
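A minimal sketch of that calculation, assuming requests per second is your load measure; `rps_per_task` is a number you've measured for your service, and the floor of 2 (for instance/AZ redundancy) is an assumption, not an AWS requirement.

```python
import math

def min_capacity(baseline_rps, rps_per_task, target_utilization=0.5):
    """Tasks needed so quietest-hour traffic runs each task near target_utilization.
    Floors at 2 so a single task failure doesn't take the service to zero (assumed)."""
    full_load_tasks = baseline_rps / rps_per_task       # tasks if each ran at 100%
    return max(2, math.ceil(full_load_tasks / target_utilization))
```

For a quietest hour of 150 rps with tasks that handle 100 rps each, full load needs 1.5 tasks, so a 50% target gives a minimum of 3.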
Max tasks too low: We've seen teams set max to 20 "to control costs" and then eat a full outage when a marketing campaign drives 10x normal traffic. Your max should be your absolute ceiling based on what your VPC, database connections, and downstream services can handle - not a cost control measure. Use billing alerts for cost control instead.
Max tasks too high: Less common, but setting max to 1000 when your RDS instance can only handle 200 connections means autoscaling could bring down your database. Know your downstream limits.
The Task Startup Time Problem
ECS autoscaling has a built-in delay that most teams underestimate:
- Scaling decision made (0s)
- New task provisioned by ECS (5-15s for Fargate, longer for EC2)
- Container image pulled (10-60s depending on image size)
- Application starts (varies - 5s to 120s)
- Health check passes (your health check interval + 1 successful check)
- ALB target registration (15-30s)
Total: 45 seconds to 4+ minutes from decision to serving traffic.
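Adding up the per-step ranges above makes the budget concrete. The health-check range here is an assumption (a 10-second interval with one passing check on the low end, up to a minute on the high end); the rest are the estimates quoted in the list.

```python
# Best/worst-case scaling latency from the steps above, in seconds.
steps = {
    "task provisioning": (5, 15),
    "image pull":        (10, 60),
    "app start":         (5, 120),
    "health checks":     (10, 60),   # assumed: 10s interval, 1-6 checks
    "ALB registration":  (15, 30),
}
best = sum(lo for lo, _ in steps.values())    # 45 seconds
worst = sum(hi for _, hi in steps.values())   # 285 seconds, ~4.75 minutes
```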
This means your scaling needs to be predictive, not reactive. By the time your new tasks are serving traffic, the spike might already be over.
Fixes:
- Keep images small. A 50MB image pulls in 5 seconds. A 2GB image takes a minute. Use multi-stage builds, alpine base images, and avoid shipping build tools in your production image.
- Reduce health check intervals. If your health check runs every 30 seconds with a 3-success threshold, that's 90 seconds just for health checks. Use 10-second intervals with 2-success threshold for faster registration.
- Pre-warm during known peak hours. If you know traffic spikes at 9am, schedule a scale-up at 8:50am. Use scheduled scaling alongside your dynamic policy.
```yaml
ScheduledAction:
  Type: AWS::ApplicationAutoScaling::ScheduledAction
  Properties:
    ScheduledActionName: morning-warmup
    Schedule: "cron(50 8 ? * MON-FRI *)"
    ScalableTargetAction:
      MinCapacity: 10
```
Scaling Based on Multiple Metrics
A single metric is rarely enough. Your API service might need to scale on both CPU and request count:
- CPU handles compute-heavy requests that process data
- Request count handles lightweight requests that don't use much CPU but need connection slots
ECS lets you attach multiple scaling policies to one service. The policies operate independently - whichever one calls for the most tasks wins. This is the correct behavior: if either metric says you need more capacity, you should get more capacity.
Don't try to build a single composite metric by averaging CPU and request count. It dilutes both signals.
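In numbers, with hypothetical recommendations from each policy:

```python
# Hypothetical per-policy recommendations for the current load.
cpu_policy_wants = 4        # CPU is low: compute-heavy traffic is light
request_policy_wants = 12   # request count is high: connection slots are scarce

# ECS behavior with multiple policies: the highest recommendation wins.
desired = max(cpu_policy_wants, request_policy_wants)       # 12 tasks

# A composite average would under-provision the connection-bound workload.
averaged = (cpu_policy_wants + request_policy_wants) // 2   # 8 tasks -- too few
```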
Monitoring Your Autoscaling
You can't improve what you don't measure. Track these:
- Scaling event frequency: If you're scaling more than 10 times per hour, your thresholds or cooldowns are wrong
- Time between scale-out decision and task serving traffic: This is your actual scaling latency
- Queue depth at scale-out trigger vs peak queue depth: If the queue hits 50k before scaling kicks in but peaks at 200k, you're scaling too late
- Cost per scaling event: Each task-hour costs money. Thrashing wastes it
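A sketch for tracking the first item, scaling event frequency, using the Application Auto Scaling `describe_scaling_activities` API via boto3. The counting helper is kept pure so it runs offline; the cluster and service names are placeholders.

```python
from datetime import datetime, timedelta, timezone

def events_per_hour(activity_start_times, now=None):
    """Count scaling activities in the trailing hour; >10 suggests thrashing."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(hours=1)
    return sum(1 for t in activity_start_times if t >= cutoff)

def fetch_activity_times(cluster, service):
    """Pull recent scaling activity timestamps for an ECS service.
    boto3 is imported here so events_per_hour stays testable offline."""
    import boto3
    aas = boto3.client("application-autoscaling")
    resp = aas.describe_scaling_activities(
        ServiceNamespace="ecs",
        ResourceId=f"service/{cluster}/{service}",  # placeholder names
    )
    return [a["StartTime"] for a in resp["ScalingActivities"]]
```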
CloudWatch dashboards work for this, but stepscale AI takes it further by analyzing your scaling patterns over time and automatically tuning your thresholds, cooldowns, and min/max values based on actual workload data. Instead of manually adjusting these numbers, the AI learns your traffic patterns and optimizes the configuration for you.
Common Mistakes We See
1. Using CPU scaling for queue consumers. Already covered this, but it's the #1 mistake. CPU tells you how busy tasks are, not how much work is waiting.
2. Not testing autoscaling before production. Run a load test that simulates your actual traffic pattern. Steady ramp-up, sudden spike, gradual decline. Watch how your scaling responds.
3. Ignoring downstream limits. Your ECS service might scale to 100 tasks, but if your RDS instance only handles 50 connections and each task opens 2, you've just created a database outage. Always check: database connections, API rate limits, NAT gateway bandwidth, and any shared resources.
4. Setting identical scale-out and scale-in thresholds. If you scale out at 60% CPU and scale in at 59% CPU, you'll thrash endlessly. Create a gap: scale out at 70%, scale in at 40%.
5. Forgetting about Fargate spot termination. If you use Fargate Spot for cost savings, your tasks can be interrupted with 30 seconds notice. Your min capacity should use regular Fargate, with spot only for additional capacity. Mix capacity providers:
```yaml
CapacityProviderStrategy:
  - CapacityProvider: FARGATE
    Base: 3
    Weight: 1
  - CapacityProvider: FARGATE_SPOT
    Weight: 3
```
This keeps 3 tasks on regular Fargate (stable base) and adds spot tasks at a 3:1 ratio for scale-out.
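Roughly how a desired count splits under that strategy. This is an approximation for intuition: ECS's actual placement rounding can differ slightly.

```python
def split_tasks(desired, base=3, weights=(1, 3)):
    """Approximate FARGATE/FARGATE_SPOT split for the strategy above:
    the first `base` tasks land on FARGATE, the remainder is distributed
    ~1:3 by weight. Returns (fargate_tasks, spot_tasks)."""
    fargate = min(desired, base)
    remaining = desired - fargate
    w_fargate, w_spot = weights
    spot = round(remaining * w_spot / (w_fargate + w_spot))
    return fargate + (remaining - spot), spot
```

Scaling to 11 tasks, for example, lands roughly 5 on regular Fargate (3 base + 2 weighted) and 6 on Spot.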
What to Do Next
- Audit your current scaling policies. Check if you're using the right policy type for your workload pattern
- Measure your actual scaling latency. Time from decision to traffic-serving. If it's over 2 minutes, optimize your startup time
- Review your cooldowns. Scale-out should be 60s or less. Scale-in should be 300s or more
- Set up scaling dashboards. You need visibility into how often you scale and whether it's working
- For queue-based workloads, look at fast-autoscaler for reactive scaling, or stepscale AI if you want the configuration tuned automatically