State Management
Fast Autoscaler uses a state management system to track scaling events and enforce cooldown periods between scaling actions.
Purpose
State management serves several important functions:
- Cooldown Enforcement: Prevent oscillation by limiting how often scaling can occur
- History Tracking: Maintain a record of scaling events for analysis
- Continuity: Preserve state across Lambda function invocations
- Distributed Operation: Support multiple instances or regions if needed
Implementation
The default implementation uses S3 as a persistent storage mechanism:
- Each scaling action (up/down) is recorded in a separate S3 object
- The state includes a timestamp, the action type, and a counter
- State is namespaced by cluster and service name
- The S3 implementation includes error handling and fallbacks
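As a rough sketch of how reads and writes against S3 might look with boto3 (the function names here are illustrative, not the module's actual API):

import json
import time

import boto3

s3 = boto3.client("s3")

def save_state(bucket, key, cluster, service, action_type, count):
    # Record a scaling action as a JSON object in S3.
    state = {
        "timestamp": time.time(),
        "cluster": cluster,
        "service": service,
        "action_type": action_type,
        "count": count,
    }
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(state))

def load_state(bucket, key):
    # Fetch and decode the last recorded action; None if no state exists yet.
    try:
        response = s3.get_object(Bucket=bucket, Key=key)
    except s3.exceptions.NoSuchKey:
        return None
    return json.loads(response["Body"].read())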
File Structure
State files are stored in S3 under the following path structure:
s3://<bucket>/autoscaling-state/<cluster-name>/<service-name>/<action-type>-last-action.json
Where:
- <bucket> is the configured S3 bucket
- <cluster-name> is the ECS cluster name
- <service-name> is the ECS service name
- <action-type> is either "up" or "down"
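As an illustration, a helper that assembles this key might look like the following (a hypothetical function, not part of the module):

def state_key(cluster_name, service_name, action_type):
    # Build the S3 object key for a service's last scaling action.
    # e.g. state_key("production-cluster", "worker-service", "up")
    #   -> "autoscaling-state/production-cluster/worker-service/up-last-action.json"
    assert action_type in ("up", "down")
    return (f"autoscaling-state/{cluster_name}/{service_name}/"
            f"{action_type}-last-action.json")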
State Data Format
The state data is stored as a JSON object:
{
  "timestamp": 1682341234.567,
  "cluster": "production-cluster",
  "service": "worker-service",
  "action_type": "up",
  "count": 5
}
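A cooldown check against this format might look like the sketch below; the field name matches the JSON above, while the function name and default window are assumptions:

import time

def cooldown_active(state, cooldown_seconds=300):
    # True if the last recorded action is still within the cooldown window.
    if state is None:
        return False  # no prior action recorded, so no cooldown applies
    elapsed = time.time() - state["timestamp"]
    return elapsed < cooldown_seconds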
Error Handling
The state management system is designed to be resilient:
- Read errors result in conservative defaults (allow scale-up, prevent scale-down)
- Write errors are logged but don't halt operation
- JSON parsing issues are handled gracefully
- Legacy timestamp formats are supported for backwards compatibility
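Sketched in code, the fail-safe read path might look like this, reusing load_state and cooldown_active from the sketches above (the helper name and exact behavior are assumptions):

import logging

logger = logging.getLogger(__name__)

def can_scale(action_type, bucket, key):
    # Decide whether scaling is allowed, falling back to conservative
    # defaults if state can't be read: permit scale-up, block scale-down.
    try:
        state = load_state(bucket, key)
    except Exception as exc:
        logger.warning("State read failed (%s); applying conservative default", exc)
        return action_type == "up"
    return not cooldown_active(state)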
Extending State Management
The state management system can be extended with alternative storage backends by implementing the same interface provided by the S3 state module.
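In Python, that interface could be expressed as an abstract base class; the sketch below shows the shape such a backend might take, not the module's actual definition:

from abc import ABC, abstractmethod

class StateStore(ABC):
    # Minimal interface an alternative storage backend would implement.

    @abstractmethod
    def get_last_action(self, cluster, service, action_type):
        """Return the last recorded state dict, or None if none exists."""

    @abstractmethod
    def record_action(self, cluster, service, action_type, count):
        """Persist a new scaling action with the current timestamp."""

Any backend that preserves these semantics, including per-cluster and per-service namespacing and last-action timestamps, can be swapped in without changing the scaling logic.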