stepscale

Find the waste in your Kubernetes autoscaling

stepscale is a self-hosted operator. It reads your HPAs and KEDA ScaledObjects, analyzes the real metrics with a rule engine plus an LLM on your own key, and recommends better config: lower over-set floors, saner CPU targets, fixes for thrashing and scale-lag. Every recommendation carries a risk rating and plain-language reasoning. Recommend-only by default, and it runs entirely in your cluster.

Kubernetes HPA + KEDA · Runs in your cluster · Recommend-only by default

$ kubectl get scalerec checkout-api -o yaml

→ 14 days of history analyzed

→ idle overnight (00-06 UTC), p95 CPU 12%

→ replica count pinned at the minReplicas floor overnight

configDiff risk: low
  min_replicas:     10  → 3
  target_cpu_pct:   80  → 60
  cooldown:         60s → 300s

Risk: low

Mode: recommend

Runs: in-cluster

Illustrative recommendation. Real output depends on your workload.

Works with the stack you already run

Runs in your cluster

Metrics, analysis, and any applied change stay in your account. No cross-account access, no phone-home. The only outbound call is to your LLM provider, with your key.

Recommend-only by default

The operator writes recommendations you read with kubectl. It never mutates a workload until you approve one and run apply mode with a license.

Verify before you run

The image and chart are public and cosign-signed. Check the signature, then helm install. Closed source, no agent in your traffic path.

Capabilities

Find the waste, with reasoning you can check

Autoscaling configs drift. Floors set during an incident never come back down; targets get copied between services. stepscale reads the real metrics and tells you what to change, and why.

Lower over-set floors

Over-provisioned minReplicas sit idle 24/7. stepscale uses observed load headroom to find floors that can be cut safely, and attaches a savings estimate to each.
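As a rough illustration of what sizing a floor from load headroom can look like, the sketch below derives a floor from CPU demand in the quietest part of the day. It is not stepscale's actual algorithm: the function name `suggest_floor`, the 30% headroom default, and the sample numbers are all assumptions made for this example.

```python
import math

def suggest_floor(quiet_window_cores, cores_per_replica, target_utilization, headroom=0.3):
    """Hypothetical helper: smallest floor that covers the busiest quiet-window sample.

    quiet_window_cores: total CPU cores the workload consumed, one sample per interval.
    cores_per_replica:  CPU request of a single pod, in cores.
    target_utilization: HPA CPU target as a fraction (0.6 means 60%).
    headroom:           safety margin on top of the busiest sample (assumed value).
    """
    if not quiet_window_cores:
        raise ValueError("need at least one sample")
    busiest = max(quiet_window_cores)
    # Each replica is only expected to run at the target utilization.
    usable_per_replica = cores_per_replica * target_utilization
    return max(1, math.ceil(busiest * (1 + headroom) / usable_per_replica))

# Illustrative overnight samples for a service with 0.5-core pods.
overnight = [0.4, 0.35, 0.5, 0.45]
print(suggest_floor(overnight, cores_per_replica=0.5, target_utilization=0.6))  # prints 3
```

The floor is sized for the quiet window only, because the autoscaler is still free to add replicas above it when load rises.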

Right-size CPU targets

Targets get copied between services and never revisited. stepscale recommends a target based on your actual utilization, not a guess.
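Why the target matters follows from the scaling rule in the Kubernetes HPA documentation: desired replicas = ceil(current replicas × current metric / target metric). The sketch below applies only that core rule, leaving out the tolerance band and the min/max clamping. The function name and the sample workload (10 replicas at 48% CPU) are assumptions for illustration.

```python
import math

def desired_replicas(current_replicas, current_utilization, target_utilization):
    """Core HPA scaling rule; tolerance and min/max clamping are deliberately omitted."""
    return math.ceil(current_replicas * current_utilization / target_utilization)

# Same load (10 replicas at 48% CPU) evaluated against three different targets.
for target in (80, 60, 40):
    print(target, desired_replicas(10, 48, target))
# prints:
# 80 6
# 60 8
# 40 12
```

A target copied from a different service can therefore double the steady-state replica count for the same load, or leave too little headroom for a spike.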

Catch thrashing and scale-lag

Detects autoscalers that flap up and down, or that lag behind traffic spikes, and proposes stabilization and target fixes.
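One simple way to spot flapping is to count how often the scaling direction reverses within a window of replica counts. The sketch below shows that heuristic only. Whether stepscale's rule engine works this way is not stated here, and the names `count_reversals`, `is_thrashing`, and the threshold of 3 are hypothetical.

```python
def count_reversals(replica_counts):
    """Count direction flips (up then down, or down then up) in a replica series."""
    reversals = 0
    last_direction = 0  # +1 scaling up, -1 scaling down, 0 no change seen yet
    for prev, curr in zip(replica_counts, replica_counts[1:]):
        if curr == prev:
            continue  # flat intervals do not change direction
        direction = 1 if curr > prev else -1
        if last_direction and direction != last_direction:
            reversals += 1
        last_direction = direction
    return reversals

def is_thrashing(replica_counts, max_reversals=3):
    """Assumed threshold: more than max_reversals flips in the window counts as thrashing."""
    return count_reversals(replica_counts) > max_reversals

flapping = [4, 8, 4, 8, 5, 9, 4]
steady = [4, 5, 6, 8, 8, 7, 6]
print(count_reversals(flapping), is_thrashing(flapping))  # prints 5 True
print(count_reversals(steady), is_thrashing(steady))      # prints 1 False
```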

LLM reasoning, your key

A rule engine finds candidates; an LLM, on your own OpenAI or Anthropic key, judges each one, rates the risk, and explains it in plain language.

Kubernetes HPA + KEDA

Reads HorizontalPodAutoscalers and KEDA ScaledObjects. Recommendations are standard resources you review with kubectl.

Apply with a safety net

When you let it act, approved changes run behind a probation window with automatic rollback if workload health regresses.
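The idea of a probation window can be sketched as a comparison between a health baseline taken before the change and samples taken after it. This is an assumption-laden illustration, not the operator's real logic: the choice of error rate as the health signal, the `should_roll_back` name, and the 1% regression threshold are all hypothetical.

```python
def should_roll_back(baseline_error_rate, probation_error_rates, max_regression=0.01):
    """Hypothetical check: roll back if any probation sample is worse than the
    pre-change baseline by more than max_regression (rates are fractions, 0.01 = 1%)."""
    return any(
        rate - baseline_error_rate > max_regression
        for rate in probation_error_rates
    )

# Error rate stays near the baseline: keep the change.
print(should_roll_back(0.002, [0.002, 0.003, 0.004]))  # prints False
# Error rate jumps during probation: revert to the previous config.
print(should_roll_back(0.002, [0.002, 0.02]))          # prints True
```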

Before / After

A static floor vs a tuned floor

Illustrative: the same workload over one day. A static minReplicas holds 18 replicas all night and lags the morning peak. A tuned floor drops to the load-justified minimum overnight, pre-warms ahead of the spike, then tracks demand. Pre-warming is a scheduled-window feature; lowering the floor is a recommendation you can apply.

[Chart: Replica count, 24-hour view (illustrative). Series: Static floor, Tuned (stepscale). Time axis: 00:00 to 22:00.]

Overnight floor: Cut to the load-justified minimum

Peak readiness: Pre-warmed ahead of the spike

Default mode: Recommend-only until you approve

Get started

See what it finds in your cluster

Install the operator with Helm, point it at your own LLM key, and read the first recommendations about 30 minutes after metrics start flowing. It is recommend-only by default, and that mode needs no login and no license.

Want a walkthrough instead? Talk to us.