State Management
Fast Autoscaler uses a state management system to track scaling events and enforce cooldown periods between scaling actions.
Purpose
State management serves several important functions:
- Cooldown Enforcement: Prevent oscillation by limiting how often scaling can occur
- History Tracking: Maintain a record of scaling events for analysis
- Continuity: Preserve state across Lambda function invocations
- Distributed Operation: Support multiple instances or regions if needed
Implementation
The default implementation uses S3 as a persistent storage mechanism:
- Each scaling action (up/down) is recorded in a separate S3 object
- The state includes a timestamp, the action type, and a counter
- State is namespaced by cluster and service name
- The S3 implementation includes error handling and fallbacks
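As a rough sketch of how reads and writes against S3 might look with boto3 (the function names here are illustrative, not the module's actual API):

import json
import time

import boto3

s3 = boto3.client("s3")

def save_state(bucket, key, cluster, service, action_type, count):
    # Record a scaling action as a JSON object in S3.
    state = {
        "timestamp": time.time(),
        "cluster": cluster,
        "service": service,
        "action_type": action_type,
        "count": count,
    }
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(state))

def load_state(bucket, key):
    # Fetch and decode the last recorded action; None if no state exists yet.
    try:
        response = s3.get_object(Bucket=bucket, Key=key)
    except s3.exceptions.NoSuchKey:
        return None
    return json.loads(response["Body"].read())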
File Structure
State files are stored in S3 under the following path structure:
s3://<bucket>/autoscaling-state/<cluster-name>/<service-name>/<action-type>-last-action.json
Where:
- <bucket> is the configured S3 bucket
- <cluster-name> is the ECS cluster name
- <service-name> is the ECS service name
- <action-type> is either "up" or "down"
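As an illustration, a helper that assembles this key might look like the following (a hypothetical function, not part of the module):

def state_key(cluster_name, service_name, action_type):
    # Build the S3 object key for a service's last scaling action.
    # e.g. state_key("production-cluster", "worker-service", "up")
    #   -> "autoscaling-state/production-cluster/worker-service/up-last-action.json"
    assert action_type in ("up", "down")
    return (f"autoscaling-state/{cluster_name}/{service_name}/"
            f"{action_type}-last-action.json")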
State Data Format
The state data is stored as a JSON object:
{
  "timestamp": 1682341234.567,
  "cluster": "production-cluster",
  "service": "worker-service",
  "action_type": "up",
  "count": 5
}
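A cooldown check against this format might look like the sketch below; the field name matches the JSON above, while the function name and default window are assumptions:

import time

def cooldown_active(state, cooldown_seconds=300):
    # True if the last recorded action is still within the cooldown window.
    if state is None:
        return False  # no prior action recorded, so no cooldown applies
    elapsed = time.time() - state["timestamp"]
    return elapsed < cooldown_seconds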
Error Handling
The state management system is designed to be resilient:
- Read errors result in conservative defaults (allow scale-up, prevent scale-down)
- Write errors are logged but don't halt operation
- JSON parsing issues are handled gracefully
- Legacy timestamp formats are supported for backwards compatibility
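Sketched in code, the fail-safe read path might look like this, reusing load_state and cooldown_active from the sketches above (the helper name and exact behavior are assumptions):

import logging

logger = logging.getLogger(__name__)

def can_scale(action_type, bucket, key):
    # Decide whether scaling is allowed, falling back to conservative
    # defaults if state can't be read: permit scale-up, block scale-down.
    try:
        state = load_state(bucket, key)
    except Exception as exc:
        logger.warning("State read failed (%s); applying conservative default", exc)
        return action_type == "up"
    return not cooldown_active(state)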
Extending State Management
The state management system can be extended with alternative storage backends by implementing the same interface provided by the S3 state module.
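In Python, that interface could be expressed as an abstract base class; the sketch below shows the shape such a backend might take, not the module's actual definition:

from abc import ABC, abstractmethod

class StateStore(ABC):
    # Minimal interface an alternative storage backend would implement.

    @abstractmethod
    def get_last_action(self, cluster, service, action_type):
        """Return the last recorded state dict, or None if none exists."""

    @abstractmethod
    def record_action(self, cluster, service, action_type, count):
        """Persist a new scaling action with the current timestamp."""

Any backend that preserves these semantics, including per-cluster and per-service namespacing and last-action timestamps, can be swapped in without changing the scaling logic.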