Founding Platform Engineer

About the role

You will own the core stepscale AI runtime that ingests workload telemetry, runs the tuning models, and writes optimized configs back to customer autoscalers across AWS ECS and Kubernetes. This is a foundational position. You will shape what production infrastructure looks like at stepscale for the next several years.

What you will work on

The metrics ingestion pipeline that handles queue-depth, task-count, and request-rate streams from customer environments
The applier service that pushes tuned configurations through native APIs (ECS UpdateService, HPA / KEDA CRDs) safely, with rollback
Multi-tenancy isolation and per-customer state stores
Observability for our own scaling decisions - every change should be explainable

What we are looking for

5+ years building production cloud infrastructure, ideally including ECS or Kubernetes at scale
Comfortable owning a system end to end, from API surface to operations
Strong written communication - we are async-first
A bias toward shipping over polishing the architecture diagram
Bonus: prior work on autoscaling, scheduler internals, or time-series systems

How we work

Remote, async by default, no status meetings. We hire only ICs at this stage. You ship to production with confidence, and code review pairs you with whoever has the most context rather than a fixed gatekeeper.
