Founding Platform Engineer
Remote (EU / Israel timezones) · Full-time · Competitive salary + meaningful equity
About the role
You will own the core stepscale AI runtime that ingests workload telemetry, runs the tuning models, and writes optimized configs back to customer autoscalers across AWS ECS and Kubernetes. This is a foundational position. You will shape what production infrastructure looks like at stepscale for the next several years.
What you will work on
- The metrics ingestion pipeline that handles queue-depth, task-count, and request-rate streams from customer environments
- The applier service that safely pushes tuned configurations through native APIs (ECS UpdateService, Kubernetes HPA objects, KEDA CRDs), with rollback
- Multi-tenancy isolation and per-customer state stores
- Observability for our own scaling decisions: every change should be explainable
What we are looking for
- 5+ years building production cloud infrastructure, ideally including ECS or Kubernetes at scale
- Comfortable owning a system end to end, from API surface to operations
- Strong written communication, since we are async-first
- A bias toward shipping over polishing the architecture diagram
- Bonus: prior work on autoscaling, scheduler internals, or time-series systems
How we work
Remote, async by default, no status meetings. At this stage we hire only individual contributors (ICs). You ship to production with confidence, and code review pairs you with whoever has the most context rather than a fixed gatekeeper.