How to Reduce AWS ECS Costs Without Sacrificing Performance
Most ECS bills are 30-50% higher than they need to be. The waste comes from over-provisioned tasks, poor autoscaling configuration, and not taking advantage of pricing options that AWS offers but doesn't advertise loudly.
Here are the changes that make the biggest impact, ordered by effort vs savings.
1. Right-Size Your Task Definitions
This is the single highest-impact change. Most teams copy task definitions from examples or other services and never revisit them.
A task defined as 1 vCPU / 2GB RAM that averages 15% CPU and 400MB memory is wasting 85% of its compute allocation. You're paying for 1 vCPU but using 0.15.
How to find the right size:
Pull your CloudWatch metrics for the past 2 weeks. Look at p95 CPU and memory, not average. You want to size for your peaks, not your baseline.
aws cloudwatch get-metric-statistics \
  --namespace AWS/ECS \
  --metric-name CPUUtilization \
  --dimensions Name=ClusterName,Value=prod Name=ServiceName,Value=api \
  --start-time 2026-03-25T00:00:00Z \
  --end-time 2026-04-08T00:00:00Z \
  --period 3600 \
  --extended-statistics p95
Note that percentiles go through --extended-statistics; the plain --statistics flag only accepts SampleCount, Average, Sum, Minimum, and Maximum, and the API won't accept both in one call. Run a second query with --statistics Maximum to see your peaks.
If your p95 CPU is 45% of 1 vCPU, you can usually drop to 0.5 vCPU. Your p95 becomes 90% of the new allocation, which is tight but workable as long as your Maximum doesn't spike far above the p95 - check both before committing.
Fargate size options:
| vCPU | Memory options | Price/hr (us-east-1, smallest memory option) |
|---|---|---|
| 0.25 | 0.5, 1, 2 GB | $0.012 |
| 0.5 | 1, 2, 3, 4 GB | $0.025 |
| 1 | 2, 3, 4, 5, 6, 7, 8 GB | $0.049 |
| 2 | 4-16 GB | $0.099 |
| 4 | 8-30 GB | $0.198 |
Dropping from 1 vCPU to 0.5 vCPU cuts your per-task cost in half. For a service running 20 tasks 24/7, that's roughly $350/month saved on a single service.
One catch: Fargate has specific vCPU/memory combinations. You can't run 0.25 vCPU with 8GB RAM. Check the Fargate task size table before changing.
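One way to avoid invalid combinations is to encode the table in your tooling. A rough helper sketched from the size table above - the 30% headroom default is our assumption, not an AWS recommendation, and larger Fargate sizes exist beyond what's listed here:

```python
# Valid Fargate CPU/memory combinations (vCPU -> allowed GB values).
# Sketched from the task-size table above; check current AWS docs
# before relying on it.
FARGATE_SIZES = {
    0.25: [0.5, 1, 2],
    0.5: [1, 2, 3, 4],
    1: [2, 3, 4, 5, 6, 7, 8],
    2: list(range(4, 17)),   # 4-16 GB in 1 GB steps
    4: list(range(8, 31)),   # 8-30 GB in 1 GB steps
}

def is_valid_fargate_size(vcpu, memory_gb):
    return memory_gb in FARGATE_SIZES.get(vcpu, [])

def right_size(p95_cpu_vcpu, p95_mem_gb, headroom=1.3):
    """Smallest valid size covering p95 usage plus ~30% headroom."""
    for vcpu in sorted(FARGATE_SIZES):
        if vcpu < p95_cpu_vcpu * headroom:
            continue
        for mem in FARGATE_SIZES[vcpu]:
            if mem >= p95_mem_gb * headroom:
                return vcpu, mem
    return None  # p95 exceeds the sizes in this table

print(is_valid_fargate_size(0.25, 8))  # False
print(right_size(0.15, 0.4))           # (0.25, 1)
```

Feeding it the example from earlier (p95 of 0.15 vCPU / 400 MB) suggests the smallest tier, 0.25 vCPU with 1 GB.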
2. Use Fargate Spot for Non-Critical Workloads
Fargate Spot runs your tasks on spare AWS capacity at up to 70% discount. The catch: AWS can terminate your tasks with 30 seconds notice when they need the capacity back.
This is fine for:
- Queue consumers (messages get retried)
- Batch processing jobs
- Dev/staging environments
- Any workload with built-in retry logic
Not great for:
- API servers handling live user traffic
- Singleton services (leader election, schedulers)
- Long-running tasks that can't checkpoint their progress
The smart approach is mixing capacity providers:
{
  "capacityProviderStrategy": [
    {
      "capacityProvider": "FARGATE",
      "base": 3,
      "weight": 1
    },
    {
      "capacityProvider": "FARGATE_SPOT",
      "weight": 3
    }
  ]
}
This keeps a guaranteed base of 3 tasks on regular Fargate. ECS places the base first, then splits every additional task between the providers in proportion to their weights: 1 more on Fargate for every 3 on Spot. If you scale to 15 tasks total, 6 run on Fargate (3 base plus 3 from the split) and 9 on Spot. Your base capacity is safe, and you save ~70% on most of the burst capacity.
Real savings: a service that scales between 5 and 40 tasks throughout the day, averaging 18 tasks (on average about 7 on regular Fargate and 11 on Spot under this strategy):
- All Fargate: 18 tasks * $0.049/hr * 730 hrs = $644/month
- Mixed: (6.75 * $0.049 + 11.25 * $0.015) * 730 = $365/month
- Savings: $279/month (43%) on one service
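ECS satisfies the base first and then divides the remaining tasks by weight, which makes the split easy to estimate. A rough helper - ECS's exact placement rounding may differ slightly from this floor-and-remainder approach:

```python
def split_tasks(total, base_provider, base, weights):
    """Estimate the task split for a capacity provider strategy.

    weights: dict of provider name -> weight. The base is placed on its
    provider first; the remainder is divided in proportion to the weights,
    with any rounding leftover going to the heaviest-weighted provider.
    """
    counts = {p: 0 for p in weights}
    counts[base_provider] = min(base, total)
    remaining = total - counts[base_provider]
    total_weight = sum(weights.values())
    allocated = 0
    for p, w in weights.items():
        share = remaining * w // total_weight
        counts[p] += share
        allocated += share
    heaviest = max(weights, key=weights.get)
    counts[heaviest] += remaining - allocated
    return counts

print(split_tasks(15, "FARGATE", 3, {"FARGATE": 1, "FARGATE_SPOT": 3}))
# {'FARGATE': 6, 'FARGATE_SPOT': 9}
```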
3. Fix Your Autoscaling
Bad autoscaling is the second biggest cost driver. The two failure modes:
Scaling up too aggressively: Setting a low CPU target (40%) or adding too many tasks per step causes overshoot. Traffic spikes, you scale from 5 to 25 tasks, traffic normalizes, and you're paying for 25 tasks during the 5-minute scale-in cooldown.
Not scaling down at all: We've audited ECS clusters where services needed 10+ tasks only during peak hours but still had 10+ tasks running at 3am. The scale-in policy was either missing or the cooldown was so long that brief traffic increases kept resetting it.
Fixes:
Set appropriate scale-in cooldowns. 300 seconds is enough for most services. If your traffic fluctuates on a 5-minute cycle, bump to 600. Anything over 900 seconds is probably costing you money.
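If you manage scaling from scripts rather than CloudFormation, the cooldowns live on the target tracking policy itself. A boto3 sketch - the prod/api names, 65% target, and 300s/60s cooldowns are illustrative values, not AWS recommendations:

```python
# Target tracking on CPU with explicit cooldowns. All names and numbers
# here are illustrative; substitute your own cluster/service and targets.
policy_config = {
    "TargetValue": 65.0,
    "PredefinedMetricSpecification": {
        "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
    },
    "ScaleInCooldown": 300,   # wait 5 min before removing tasks
    "ScaleOutCooldown": 60,   # add tasks quickly when load rises
}

def apply_policy():
    import boto3  # assumed available where your deploy scripts run
    client = boto3.client("application-autoscaling")
    client.put_scaling_policy(
        PolicyName="api-cpu-target-tracking",
        ServiceNamespace="ecs",
        ResourceId="service/prod/api",
        ScalableDimension="ecs:service:DesiredCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration=policy_config,
    )
```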
Use scheduled scaling for predictable patterns. If traffic drops 80% between midnight and 6am every night, don't rely on reactive autoscaling to figure that out. Schedule it:
NightScale:
  Schedule: "cron(0 0 * * ? *)"  # midnight UTC
  ScalableTargetAction:
    MinCapacity: 2
    MaxCapacity: 20
MorningScale:
  Schedule: "cron(0 6 * * ? *)"  # 6am UTC
  ScalableTargetAction:
    MinCapacity: 8
    MaxCapacity: 100
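The same schedules can be applied through the Application Auto Scaling API. A boto3 sketch - the action names and prod/api service are illustrative, and note that cron expressions are evaluated in UTC unless you pass a Timezone:

```python
# Scheduled actions mirroring the YAML above. Names and values are
# illustrative; cron runs in UTC unless you pass a Timezone parameter.
ACTIONS = {
    "night-scale":   ("cron(0 0 * * ? *)", 2, 20),
    "morning-scale": ("cron(0 6 * * ? *)", 8, 100),
}

def apply_schedules():
    import boto3  # assumed available where your deploy scripts run
    client = boto3.client("application-autoscaling")
    for name, (cron, min_cap, max_cap) in ACTIONS.items():
        client.put_scheduled_action(
            ServiceNamespace="ecs",
            ScheduledActionName=name,
            ResourceId="service/prod/api",  # illustrative service
            ScalableDimension="ecs:service:DesiredCount",
            Schedule=cron,
            ScalableTargetAction={"MinCapacity": min_cap,
                                  "MaxCapacity": max_cap},
        )
```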
Use the right scaling metric. If your service is a queue consumer, scale on queue depth, not CPU. A service sitting at 10% CPU with 50,000 messages queued needs more tasks, not fewer. See our queue-based autoscaling guide for details.
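The usual way to turn queue depth into a task count is backlog per task: size the fleet so each task's share of the backlog drains within your latency target. A sketch of the arithmetic - the per-task processing rate and drain target are numbers you'd measure for your own service:

```python
import math

def desired_task_count(queue_depth, msgs_per_task_per_sec,
                       target_drain_secs, min_tasks=1, max_tasks=100):
    # Each task can clear msgs_per_task_per_sec * target_drain_secs
    # messages within the latency target; size the fleet so the whole
    # backlog fits, clamped to the service's min/max capacity.
    capacity_per_task = msgs_per_task_per_sec * target_drain_secs
    needed = math.ceil(queue_depth / capacity_per_task)
    return max(min_tasks, min(needed, max_tasks))

# 50k messages queued, 25 msg/s per task, 5-minute drain target
print(desired_task_count(50_000, 25, 300))  # 7
```

This is why CPU is the wrong signal for consumers: the 50,000-message backlog above calls for 7 tasks regardless of how idle the current tasks look.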
If you have more than a few services and keeping scaling configs optimal feels like a part-time job, stepscale AI handles this automatically. It monitors your scaling patterns and tunes thresholds, min/max values, and cooldowns based on actual workload data.
4. Reduce Container Image Size
Large images cost money in two ways:
- ECR storage: $0.10/GB/month. A 2GB image across 10 tags = $2/month. Not huge, but it adds up with many services.
- Task startup time: A 2GB image takes 30-60 seconds to pull. During that time, your other tasks handle the load while you wait for new capacity. Faster pull = faster scaling = less over-provisioning needed.
Before and after optimizing a Node.js service image:
| Image | Size | Pull time |
|---|---|---|
| node:18 base | 1.1 GB | 35s |
| node:18-slim base | 240 MB | 12s |
| node:18-alpine base | 170 MB | 8s |
| Multi-stage + alpine | 95 MB | 5s |
A multi-stage Dockerfile that builds in one stage and copies only the runtime artifacts to a minimal base image:
# Build stage
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies so only runtime deps get copied forward
RUN npm prune --omit=dev

# Runtime stage
FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package.json ./
USER node
CMD ["node", "dist/index.js"]
Note the build stage runs a full npm ci, because build tools usually live in devDependencies; pruning afterward keeps the runtime image small without breaking the build step.
For Python services, the difference is even more dramatic. A typical Python ML service with scikit-learn and pandas can be 3-4GB with default images. Using python:3.11-slim and only installing production dependencies brings it under 500MB.
5. Use ARM64 (Graviton) Processors
AWS Graviton processors (ARM64) are 20% cheaper than x86 for the same vCPU/memory on Fargate, and they deliver equivalent or better performance for most workloads.
Switching is straightforward if your application doesn't have x86-specific binary dependencies:
{
  "runtimePlatform": {
    "cpuArchitecture": "ARM64",
    "operatingSystemFamily": "LINUX"
  }
}
You need ARM64-compatible container images. Most base images (node, python, go, java) publish multi-arch manifests, so if you rebuild your images they'll pull the right architecture automatically.
20% savings for changing one line in your task definition. This is the best effort-to-savings ratio after right-sizing.
Things that might break:
- Services with native C/C++ dependencies compiled for x86 (rebuild them)
- Very old base images that don't publish ARM variants
- Tools like sharp (Node.js image processing) that need ARM-specific setup
Test your service on ARM first, but for standard web services and queue consumers, it just works.
6. Review ECS Exec and Logging Costs
Hidden costs that add up:
CloudWatch Logs: Every console.log and print statement costs money. A verbose service generating 50GB of logs per month costs ~$25 in ingestion plus ~$1.50/month in storage. Multiply by 20 services and you're at $500/month on logs.
Fix: Set your log level to WARN in production. Use INFO only for meaningful events, not per-request logging. If you need detailed logs for debugging, enable them temporarily with a feature flag.
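As a sketch of the pattern in Python's stdlib logging - the LOG_LEVEL variable name is just a common convention, not anything ECS-specific:

```python
import logging
import os

# Default to WARNING in production; override per deployment by setting
# LOG_LEVEL (e.g. to DEBUG) in the task definition's environment.
level_name = os.environ.get("LOG_LEVEL", "WARNING")
logging.basicConfig(level=getattr(logging, level_name, logging.WARNING))

log = logging.getLogger("api")
log.debug("per-request detail")         # dropped at WARNING level
log.warning("payment retry exhausted")  # still emitted
```

Because the level comes from the environment, turning on debug logs for one service is a task definition change, not a code deploy.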
ECS Exec: If you enabled executeCommand on your service for debugging, each task runs an SSM agent sidecar that consumes ~256MB of memory. That memory is reserved even when you're not using ECS Exec. Disable it on services where you don't need interactive debugging.
Container Insights: If enabled at the cluster level, Container Insights sends detailed metrics to CloudWatch, billed at the custom metric rate (around $0.30 per metric per month at the first pricing tier). For a large cluster, this can be $50-200/month. Useful, but make sure you're actually looking at the dashboards.
7. Consolidate Small Services
Microservice architectures can lead to many small services, each running 2-3 tasks minimum. If you have 30 services with minCapacity: 2, that's 60 tasks running 24/7 as a baseline.
Look for services that:
- Handle very low traffic (< 10 requests/minute)
- Share the same technology stack
- Are logically related
Two services that each handle 5 req/min at 0.25 vCPU could be combined into one service at 0.25 vCPU, cutting your task count in half. This doesn't mean abandoning microservices - it means being pragmatic about where the service boundary provides real value.
Cost Reduction Checklist
Here's a practical order to tackle these, from quickest wins to larger projects:
| Action | Effort | Typical savings | Priority |
|---|---|---|---|
| Switch to ARM64 (Graviton) | Low - one config change | 20% per task | Do this week |
| Right-size task definitions | Low - metrics review | 30-50% per task | Do this week |
| Fix autoscaling cooldowns | Low - config change | 10-20% overall | Do this week |
| Add Fargate Spot for workers | Medium - test reliability | 50-70% on spot tasks | This month |
| Reduce image sizes | Medium - Dockerfile refactor | Indirect (faster scaling) | This month |
| Add scheduled scaling | Medium - identify patterns | 15-30% off-hours | This month |
| Consolidate services | High - architecture change | Variable | When appropriate |
| Review logging levels | Low - config change | $50-500/month | Do this week |
Automating Cost Optimization
The hardest part of cost optimization isn't the initial changes - it's keeping them optimized. Traffic patterns shift, services get new features that change their resource profile, and new services get deployed with copy-pasted configs that were wrong to begin with.
This is the problem stepscale AI is designed to solve. It continuously analyzes your workload metrics and identifies where your autoscaling configuration is costing you money - thresholds that are too aggressive, min/max values that don't match your actual traffic, and cooldowns that cause over-provisioning. The AI tunes these parameters automatically, so your cost optimization doesn't degrade over time.
For the free, open-source option, fast-autoscaler handles queue-based autoscaling for ECS with support for SQS, Kafka, RabbitMQ, and more. It won't auto-tune your config, but it gives you the right foundation for queue-driven cost optimization.