How to Reduce AWS ECS Costs Without Sacrificing Performance
Most ECS bills are 30-50% higher than they need to be. The waste comes from over-provisioned tasks, poor autoscaling configuration, and not taking advantage of pricing options that AWS offers but doesn't advertise loudly.
Here are the changes that make the biggest impact, ordered by effort vs savings.
1. Right-Size Your Task Definitions
This is the single highest-impact change. Most teams copy task definitions from examples or other services and never revisit them.
A task defined as 1 vCPU / 2GB RAM that averages 15% CPU and 400MB memory is wasting 85% of its compute allocation. You're paying for 1 vCPU but using 0.15.
How to find the right size:
Pull your CloudWatch metrics for the past 2 weeks. Look at p95 CPU and memory, not average. You want to size for your peaks, not your baseline.
aws cloudwatch get-metric-statistics \
  --namespace AWS/ECS \
  --metric-name CPUUtilization \
  --dimensions Name=ClusterName,Value=prod Name=ServiceName,Value=api \
  --start-time 2026-03-25T00:00:00Z \
  --end-time 2026-04-08T00:00:00Z \
  --period 3600 \
  --extended-statistics p95
Note that percentiles go through --extended-statistics; the plain --statistics flag only accepts SampleCount, Average, Sum, Minimum, and Maximum, and the API won't accept both in one call. Run a second query with --statistics Maximum to see your peaks.
If your p95 CPU is 45% of 1 vCPU, you can usually drop to 0.5 vCPU. Your p95 becomes 90% of the new allocation, which is tight but workable as long as your Maximum doesn't spike far above the p95 - check both before committing.
Fargate size options:
| vCPU | Memory options | Price/hr (us-east-1, smallest memory option) |
|---|---|---|
| 0.25 | 0.5, 1, 2 GB | $0.012 |
| 0.5 | 1, 2, 3, 4 GB | $0.025 |
| 1 | 2, 3, 4, 5, 6, 7, 8 GB | $0.049 |
| 2 | 4-16 GB | $0.099 |
| 4 | 8-30 GB | $0.198 |
Dropping from 1 vCPU to 0.5 vCPU cuts your per-task cost in half. For a service running 20 tasks 24/7, that's roughly $350/month saved on a single service.
One catch: Fargate has specific vCPU/memory combinations. You can't run 0.25 vCPU with 8GB RAM. Check the Fargate task size table before changing.
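One way to avoid invalid combinations is to encode the table in your tooling. A rough helper sketched from the size table above - the 30% headroom default is our assumption, not an AWS recommendation, and larger Fargate sizes exist beyond what's listed here:

```python
# Valid Fargate CPU/memory combinations (vCPU -> allowed GB values).
# Sketched from the task-size table above; check current AWS docs
# before relying on it.
FARGATE_SIZES = {
    0.25: [0.5, 1, 2],
    0.5: [1, 2, 3, 4],
    1: [2, 3, 4, 5, 6, 7, 8],
    2: list(range(4, 17)),   # 4-16 GB in 1 GB steps
    4: list(range(8, 31)),   # 8-30 GB in 1 GB steps
}

def is_valid_fargate_size(vcpu, memory_gb):
    return memory_gb in FARGATE_SIZES.get(vcpu, [])

def right_size(p95_cpu_vcpu, p95_mem_gb, headroom=1.3):
    """Smallest valid size covering p95 usage plus ~30% headroom."""
    for vcpu in sorted(FARGATE_SIZES):
        if vcpu < p95_cpu_vcpu * headroom:
            continue
        for mem in FARGATE_SIZES[vcpu]:
            if mem >= p95_mem_gb * headroom:
                return vcpu, mem
    return None  # p95 exceeds the sizes in this table

print(is_valid_fargate_size(0.25, 8))  # False
print(right_size(0.15, 0.4))           # (0.25, 1)
```

Feeding it the example from earlier (p95 of 0.15 vCPU / 400 MB) suggests the smallest tier, 0.25 vCPU with 1 GB.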
2. Use Fargate Spot for Non-Critical Workloads
Fargate Spot runs your tasks on spare AWS capacity at up to 70% discount. The catch: AWS can terminate your tasks with 30 seconds notice when they need the capacity back.
This is fine for:
- Queue consumers (messages get retried)
- Batch processing jobs
- Dev/staging environments
- Any workload with built-in retry logic
Not great for:
- API servers handling live user traffic
- Singleton services (leader election, schedulers)
- Long-running tasks that can't checkpoint their progress
The smart approach is mixing capacity providers:
{
  "capacityProviderStrategy": [
    {
      "capacityProvider": "FARGATE",
      "base": 3,
      "weight": 1
    },
    {
      "capacityProvider": "FARGATE_SPOT",
      "weight": 3
    }
  ]
}
This keeps a guaranteed base of 3 tasks on regular Fargate. ECS places the base first, then splits every additional task between the providers in proportion to their weights: 1 more on Fargate for every 3 on Spot. If you scale to 15 tasks total, 6 run on Fargate (3 base plus 3 from the split) and 9 on Spot. Your base capacity is safe, and you save ~70% on most of the burst capacity.
Real savings: a service that scales between 5 and 40 tasks throughout the day, averaging 18 tasks (on average about 7 on regular Fargate and 11 on Spot under this strategy):
- All Fargate: 18 tasks * $0.049/hr * 730 hrs = $644/month
- Mixed: (6.75 * $0.049 + 11.25 * $0.015) * 730 = $365/month
- Savings: $279/month (43%) on one service
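ECS satisfies the base first and then divides the remaining tasks by weight, which makes the split easy to estimate. A rough helper - ECS's exact placement rounding may differ slightly from this floor-and-remainder approach:

```python
def split_tasks(total, base_provider, base, weights):
    """Estimate the task split for a capacity provider strategy.

    weights: dict of provider name -> weight. The base is placed on its
    provider first; the remainder is divided in proportion to the weights,
    with any rounding leftover going to the heaviest-weighted provider.
    """
    counts = {p: 0 for p in weights}
    counts[base_provider] = min(base, total)
    remaining = total - counts[base_provider]
    total_weight = sum(weights.values())
    allocated = 0
    for p, w in weights.items():
        share = remaining * w // total_weight
        counts[p] += share
        allocated += share
    heaviest = max(weights, key=weights.get)
    counts[heaviest] += remaining - allocated
    return counts

print(split_tasks(15, "FARGATE", 3, {"FARGATE": 1, "FARGATE_SPOT": 3}))
# {'FARGATE': 6, 'FARGATE_SPOT': 9}
```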
3. Fix Your Autoscaling
Bad autoscaling is the second biggest cost driver. The two failure modes:
Scaling up too aggressively: Setting a low CPU target (40%) or adding too many tasks per step causes overshoot. Traffic spikes, you scale from 5 to 25 tasks, traffic normalizes, and you're paying for 25 tasks during the 5-minute scale-in cooldown.
Not scaling down at all: We've audited ECS clusters where services needed 10+ tasks only during peak hours but still had 10+ tasks running at 3am. The scale-in policy was either missing or the cooldown was so long that brief traffic increases kept resetting it.
Fixes:
Set appropriate scale-in cooldowns. 300 seconds is enough for most services. If your traffic fluctuates on a 5-minute cycle, bump to 600. Anything over 900 seconds is probably costing you money.
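If you manage scaling from scripts rather than CloudFormation, the cooldowns live on the target tracking policy itself. A boto3 sketch - the prod/api names, 65% target, and 300s/60s cooldowns are illustrative values, not AWS recommendations:

```python
# Target tracking on CPU with explicit cooldowns. All names and numbers
# here are illustrative; substitute your own cluster/service and targets.
policy_config = {
    "TargetValue": 65.0,
    "PredefinedMetricSpecification": {
        "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
    },
    "ScaleInCooldown": 300,   # wait 5 min before removing tasks
    "ScaleOutCooldown": 60,   # add tasks quickly when load rises
}

def apply_policy():
    import boto3  # assumed available where your deploy scripts run
    client = boto3.client("application-autoscaling")
    client.put_scaling_policy(
        PolicyName="api-cpu-target-tracking",
        ServiceNamespace="ecs",
        ResourceId="service/prod/api",
        ScalableDimension="ecs:service:DesiredCount",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration=policy_config,
    )
```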
Use scheduled scaling for predictable patterns. If traffic drops 80% between midnight and 6am every night, don't rely on reactive autoscaling to figure that out. Schedule it:
NightScale:
  Schedule: "cron(0 0 * * ? *)"  # midnight UTC
  ScalableTargetAction:
    MinCapacity: 2
    MaxCapacity: 20
MorningScale:
  Schedule: "cron(0 6 * * ? *)"  # 6am UTC
  ScalableTargetAction:
    MinCapacity: 8
    MaxCapacity: 100
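The same schedules can be applied through the Application Auto Scaling API. A boto3 sketch - the action names and prod/api service are illustrative, and note that cron expressions are evaluated in UTC unless you pass a Timezone:

```python
# Scheduled actions mirroring the YAML above. Names and values are
# illustrative; cron runs in UTC unless you pass a Timezone parameter.
ACTIONS = {
    "night-scale":   ("cron(0 0 * * ? *)", 2, 20),
    "morning-scale": ("cron(0 6 * * ? *)", 8, 100),
}

def apply_schedules():
    import boto3  # assumed available where your deploy scripts run
    client = boto3.client("application-autoscaling")
    for name, (cron, min_cap, max_cap) in ACTIONS.items():
        client.put_scheduled_action(
            ServiceNamespace="ecs",
            ScheduledActionName=name,
            ResourceId="service/prod/api",  # illustrative service
            ScalableDimension="ecs:service:DesiredCount",
            Schedule=cron,
            ScalableTargetAction={"MinCapacity": min_cap,
                                  "MaxCapacity": max_cap},
        )
```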
Use the right scaling metric. If your service is a queue consumer, scale on queue depth, not CPU. A service sitting at 10% CPU with 50,000 messages queued needs more tasks, not fewer. See our queue-based autoscaling guide for details.
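The usual way to turn queue depth into a task count is backlog per task: size the fleet so each task's share of the backlog drains within your latency target. A sketch of the arithmetic - the per-task processing rate and drain target are numbers you'd measure for your own service:

```python
import math

def desired_task_count(queue_depth, msgs_per_task_per_sec,
                       target_drain_secs, min_tasks=1, max_tasks=100):
    # Each task can clear msgs_per_task_per_sec * target_drain_secs
    # messages within the latency target; size the fleet so the whole
    # backlog fits, clamped to the service's min/max capacity.
    capacity_per_task = msgs_per_task_per_sec * target_drain_secs
    needed = math.ceil(queue_depth / capacity_per_task)
    return max(min_tasks, min(needed, max_tasks))

# 50k messages queued, 25 msg/s per task, 5-minute drain target
print(desired_task_count(50_000, 25, 300))  # 7
```

This is why CPU is the wrong signal for consumers: the 50,000-message backlog above calls for 7 tasks regardless of how idle the current tasks look.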
If you have more than a few services and keeping scaling configs optimal feels like a part-time job, stepscale AI handles this automatically. It monitors your scaling patterns and tunes thresholds, min/max values, and cooldowns based on actual workload data.
4. Reduce Container Image Size
Large images cost money in two ways:
- ECR storage: $0.10/GB/month. A 2GB image across 10 tags = $2/month. Not huge, but it adds up with many services.
- Task startup time: A 2GB image takes 30-60 seconds to pull. During that time, your other tasks handle the load while you wait for new capacity. Faster pull = faster scaling = less over-provisioning needed.
Before and after optimizing a Node.js service image:
| Image | Size | Pull time |
|---|---|---|
| node:18 base | 1.1 GB | 35s |
| node:18-slim base | 240 MB | 12s |
| node:18-alpine base | 170 MB | 8s |
| Multi-stage + alpine | 95 MB | 5s |
A multi-stage Dockerfile that builds in one stage and copies only the runtime artifacts to a minimal base image:
# Build stage
FROM node:18-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Drop devDependencies so only runtime deps get copied forward
RUN npm prune --omit=dev

# Runtime stage
FROM node:18-alpine
WORKDIR /app
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/package.json ./
USER node
CMD ["node", "dist/index.js"]
Note the build stage runs a full npm ci, because build tools usually live in devDependencies; pruning afterward keeps the runtime image small without breaking the build step.
For Python services, the difference is even more dramatic. A typical Python ML service with scikit-learn and pandas can be 3-4GB with default images. Using python:3.11-slim and only installing production dependencies brings it under 500MB.
5. Use ARM64 (Graviton) Processors
AWS Graviton processors (ARM64) are 20% cheaper than x86 for the same vCPU/memory on Fargate, and they deliver equivalent or better performance for most workloads.
Switching is straightforward if your application doesn't have x86-specific binary dependencies:
{
  "runtimePlatform": {
    "cpuArchitecture": "ARM64",
    "operatingSystemFamily": "LINUX"
  }
}
You need ARM64-compatible container images. Most base images (node, python, go, java) publish multi-arch manifests, so if you rebuild your images they'll pull the right architecture automatically.
20% savings for changing one line in your task definition. This is the best effort-to-savings ratio after right-sizing.
Things that might break:
- Services with native C/C++ dependencies compiled for x86 (rebuild them)
- Very old base images that don't publish ARM variants
- Tools like sharp (Node.js image processing) that need ARM-specific setup
Test your service on ARM first, but for standard web services and queue consumers, it just works.
6. Review ECS Exec and Logging Costs
Hidden costs that add up:
CloudWatch Logs: Every console.log and print statement costs money. A verbose service generating 50GB of logs per month costs ~$25 in ingestion plus ~$1.50/month in storage. Multiply by 20 services and you're at $500/month on logs.
Fix: Set your log level to WARN in production. Use INFO only for meaningful events, not per-request logging. If you need detailed logs for debugging, enable them temporarily with a feature flag.
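As a sketch of the pattern in Python's stdlib logging - the LOG_LEVEL variable name is just a common convention, not anything ECS-specific:

```python
import logging
import os

# Default to WARNING in production; override per deployment by setting
# LOG_LEVEL (e.g. to DEBUG) in the task definition's environment.
level_name = os.environ.get("LOG_LEVEL", "WARNING")
logging.basicConfig(level=getattr(logging, level_name, logging.WARNING))

log = logging.getLogger("api")
log.debug("per-request detail")         # dropped at WARNING level
log.warning("payment retry exhausted")  # still emitted
```

Because the level comes from the environment, turning on debug logs for one service is a task definition change, not a code deploy.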
ECS Exec: If you enabled executeCommand on your service for debugging, each task runs an SSM agent sidecar that consumes ~256MB of memory. That memory is reserved even when you're not using ECS Exec. Disable it on services where you don't need interactive debugging.
Container Insights: If enabled at the cluster level, Container Insights sends detailed metrics to CloudWatch, billed at the custom metric rate (around $0.30 per metric per month at the first pricing tier). For a large cluster, this can be $50-200/month. Useful, but make sure you're actually looking at the dashboards.
7. Consolidate Small Services
Microservice architectures can lead to many small services, each running 2-3 tasks minimum. If you have 30 services with minCapacity: 2, that's 60 tasks running 24/7 as a baseline.
Look for services that:
- Handle very low traffic (< 10 requests/minute)
- Share the same technology stack
- Are logically related
Two services that each handle 5 req/min at 0.25 vCPU could be combined into one service at 0.25 vCPU, cutting your task count in half. This doesn't mean abandoning microservices - it means being pragmatic about where the service boundary provides real value.
Cost Reduction Checklist
Here's a practical order to tackle these, from quickest wins to larger projects:
| Action | Effort | Typical savings | Priority |
|---|---|---|---|
| Switch to ARM64 (Graviton) | Low - one config change | 20% per task | Do this week |
| Right-size task definitions | Low - metrics review | 30-50% per task | Do this week |
| Fix autoscaling cooldowns | Low - config change | 10-20% overall | Do this week |
| Add Fargate Spot for workers | Medium - test reliability | 50-70% on spot tasks | This month |
| Reduce image sizes | Medium - Dockerfile refactor | Indirect (faster scaling) | This month |
| Add scheduled scaling | Medium - identify patterns | 15-30% off-hours | This month |
| Consolidate services | High - architecture change | Variable | When appropriate |
| Review logging levels | Low - config change | $50-500/month | Do this week |
Automating Cost Optimization
The hardest part of cost optimization isn't the initial changes - it's keeping them optimized. Traffic patterns shift, services get new features that change their resource profile, and new services get deployed with copy-pasted configs that were wrong to begin with.
This is the problem stepscale AI is designed to solve. It continuously analyzes your workload metrics and identifies where your autoscaling configuration is costing you money - thresholds that are too aggressive, min/max values that don't match your actual traffic, and cooldowns that cause over-provisioning. The AI tunes these parameters automatically, so your cost optimization doesn't degrade over time.
For the free, open-source option, fast-autoscaler handles queue-based autoscaling for ECS with support for SQS, Kafka, RabbitMQ, and more. It won't auto-tune your config, but it gives you the right foundation for queue-driven cost optimization.