stepscale is a self-hosted operator. It reads your HPAs and KEDA ScaledObjects, analyzes the real metrics with a rule engine plus an LLM on your own key, and recommends better config: lower over-set floors, saner CPU targets, fixes for thrashing and scale-lag. Every recommendation carries a risk rating and plain-language reasoning. Recommend-only by default, and it runs entirely in your cluster.
$ kubectl get scalerec checkout-api -o yaml
→ 14 days of history analyzed
→ idle overnight (00-06 UTC), p95 CPU 12%
→ replica count pinned at the min floor
min_replicas: 10 → 3
target_cpu_pct: 80 → 60
cooldown: 60s → 300s
Risk: low
Mode: recommend
Runs: in-cluster
Illustrative recommendation. Real output depends on your workload.
Works with the stack you already run
Runs in your cluster
Metrics, analysis, and any applied change stay in your account. No cross-account access, no phone-home. The only outbound call is to your LLM provider, with your key.
Recommend-only by default
The operator writes recommendations you read with kubectl. It never mutates a workload unless you approve a recommendation and run apply mode, which requires a license.
Verify before you run
The image and chart are public and cosign-signed. Check the signature, then helm install. Closed source, no agent in your traffic path.
Capabilities
Autoscaling configs drift. Floors set during an incident never come back down; targets get copied between services. stepscale reads the real metrics and tells you what to change, and why.
Over-provisioned minReplicas sit idle 24/7. stepscale uses load headroom to find floors that are safe to cut, and attaches a savings estimate.
Targets get copied between services and never revisited. stepscale recommends a target based on your actual utilization, not a guess.
Detects autoscalers that flap up and down, or that lag traffic spikes, and proposes stabilization and target fixes.
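To make "flapping" concrete, here is a minimal sketch of one way to spot it: count how often the replica count reverses direction within a window. This is an illustration, not stepscale's actual rule. The function names and the max_reversals threshold are hypothetical.

```python
from typing import Sequence


def count_reversals(replicas: Sequence[int]) -> int:
    """Count switches between scaling up and scaling down in a replica series."""
    reversals = 0
    last_direction = 0  # +1 = last change was a scale-up, -1 = scale-down, 0 = none yet
    for prev, cur in zip(replicas, replicas[1:]):
        if cur == prev:
            continue  # no scaling event between these two samples
        direction = 1 if cur > prev else -1
        if last_direction != 0 and direction != last_direction:
            reversals += 1
        last_direction = direction
    return reversals


def is_flapping(replicas: Sequence[int], max_reversals: int = 3) -> bool:
    """Flag a window whose reversal count exceeds a hypothetical threshold."""
    return count_reversals(replicas) > max_reversals


# A series that bounces between 3 and 5 replicas reverses on every step.
print(is_flapping([3, 5, 3, 5, 3, 5]))   # bouncing series
print(is_flapping([3, 4, 5, 6, 6, 5]))   # steady ramp with one scale-down
```

On a HorizontalPodAutoscaler, the usual knob for damping this kind of behavior is behavior.scaleDown.stabilizationWindowSeconds in the autoscaling/v2 API.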
A rule engine finds candidates; an LLM, on your own OpenAI or Anthropic key, judges each one, rates the risk, and explains it in plain language.
Reads HorizontalPodAutoscalers and KEDA ScaledObjects. Recommendations are standard resources you review with kubectl.
When you let it act, approved changes run behind a probation window with automatic rollback if workload health regresses.
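The probation idea can be sketched as a simple check: compare health samples taken after a change against a baseline taken before it, and roll back on the first regression. This is a minimal sketch under assumptions, not the operator's implementation. HealthSample, probation_verdict, and both thresholds are hypothetical names and values.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class HealthSample:
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    p95_latency_ms: float   # 95th percentile request latency


def probation_verdict(
    baseline: HealthSample,
    samples: Iterable[HealthSample],
    max_error_increase: float = 0.01,   # assumed tolerance: +1 percentage point
    max_latency_ratio: float = 1.25,    # assumed tolerance: 25% slower than baseline
) -> str:
    """Return "rollback" on the first sample that regresses, else "keep"."""
    for sample in samples:
        if sample.error_rate > baseline.error_rate + max_error_increase:
            return "rollback"
        if sample.p95_latency_ms > baseline.p95_latency_ms * max_latency_ratio:
            return "rollback"
    return "keep"


baseline = HealthSample(error_rate=0.002, p95_latency_ms=120.0)
healthy = [HealthSample(0.003, 125.0), HealthSample(0.002, 140.0)]
regressed = [HealthSample(0.003, 125.0), HealthSample(0.05, 130.0)]

print(probation_verdict(baseline, healthy))
print(probation_verdict(baseline, regressed))
```

Returning on the first bad sample keeps the rollback fast. A real check would likely smooth over several samples to avoid reacting to a single noisy reading.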
Before / After
Illustrative, same workload over a day. A static minReplicas holds 18 replicas all night and lags the morning peak. A tuned floor drops to the load-justified minimum overnight, pre-warms ahead of the spike, then tracks demand. Pre-warming is a scheduled-window feature; lowering the floor is a recommendation you can apply.
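One way to think about a "load-justified minimum" is to ask how many replicas the standard HPA formula, ceil(currentReplicas × currentMetric / targetMetric), would have requested during the quiet window, then add headroom. The sketch below does that. It is an illustration only: recommend_floor, the headroom factor, and hard_min are assumptions, and the sample numbers are made up.

```python
import math
from typing import Sequence, Tuple


def demanded_replicas(replicas: int, cpu_pct: float, target_cpu_pct: float) -> int:
    """Replicas the standard HPA formula would ask for at this sample."""
    return math.ceil(replicas * cpu_pct / target_cpu_pct)


def recommend_floor(
    samples: Sequence[Tuple[int, float]],  # (running replicas, average CPU %) in the quiet window
    target_cpu_pct: float,
    current_floor: int,
    headroom: float = 0.5,   # assumed safety margin on top of observed demand
    hard_min: int = 2,       # assumed lower bound for availability
) -> int:
    """Suggest a floor that covers quiet-window demand plus headroom, never above the current floor."""
    if not samples:
        raise ValueError("need at least one sample")
    peak_demand = max(demanded_replicas(r, cpu, target_cpu_pct) for r, cpu in samples)
    with_headroom = math.ceil(peak_demand * (1 + headroom))
    return min(current_floor, max(hard_min, with_headroom))


def replica_hours_saved(current_floor: int, new_floor: int, hours_at_floor: float) -> float:
    """Replica-hours freed per period by lowering the floor."""
    return (current_floor - new_floor) * hours_at_floor


# Made-up overnight samples: 10 replicas running at 8-12% CPU against a 60% target.
overnight = [(10, 8.0), (10, 12.0), (10, 10.0)]
floor = recommend_floor(overnight, target_cpu_pct=60.0, current_floor=10)
print(floor, replica_hours_saved(10, floor, hours_at_floor=6))
```

Clamping to the current floor means the sketch only ever lowers a floor. Multiplying the saved replica-hours by your own per-replica cost gives a rough savings figure.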
Overnight floor: cut to the load-justified minimum
Peak readiness: pre-warmed ahead of the spike
Default mode: recommend-only until you approve
Get started
Install the operator with Helm, point it at your own LLM key, and read the first recommendations about 30 minutes after metrics are flowing. Recommend-only by default, with no login and no license needed in that mode.
Want a walkthrough instead? Talk to us.