Scaling Logic

The core scaling logic in Fast Autoscaler determines when and how to adjust the number of ECS tasks based on queue metrics and configuration parameters.

Key Principles

Fast Autoscaler follows these key principles:

Responsive Scale-Up: Scale up quickly when queue depth increases
Gradual Scale-Down: Scale down more conservatively to avoid oscillation
Cooldown Periods: Respect cooldown periods to prevent rapid fluctuations
Priority Override: Bypass cooldown for critical scale-up operations
Safety Bounds: Always respect minimum and maximum task constraints

Scaling Decision Flow

The scaling process follows this flow:

Collect current queue metrics (visible and optionally in-flight messages)
Calculate the ratio of messages per task
Determine if scaling is needed based on thresholds
If scaling up, calculate additional tasks needed
If scaling down, ensure we don't scale down too rapidly
Apply min/max task constraints
Check cooldown periods (always allowing scale-up operations)
Execute the scaling action if approved

Scale-Up Logic

When queue depth exceeds the configured threshold:

if total_messages > scale_up_threshold:
    # Calculate additional tasks needed based on total messages
    additional_tasks_needed = int(total_messages * tasks_per_message)
    
    # Ensure we add at least one task if needed
    if total_messages > 0 and additional_tasks_needed == 0:
        additional_tasks_needed = 1
    
    # Add the additional tasks to current count
    new_task_count = current_task_count + additional_tasks_needed

Scale-Down Logic

When queue depth falls below the configured threshold:

elif total_messages < scale_down_threshold:
    # Calculate based on message count
    target_tasks = max(min_tasks, int(total_messages * tasks_per_message))
    
    # Don't scale down by more than the configurable factor at once
    min_tasks_after_scaling = int(current_task_count * max_scale_down_factor)
    final_task_count = max(target_tasks, min_tasks_after_scaling)

Cooldown Management

The autoscaler maintains separate cooldown periods for scaling up and scaling down. However, scale-up operations can bypass cooldown when queue depth increases significantly:

if action_type == 'up' and new_task_count > current_task_count:
    # Calculate task increase as a percentage
    increase_percentage = ((new_task_count - current_task_count) / current_task_count) * 100
    
    # Log the override with detailed information
    logging.info(f"PRIORITY OVERRIDE: Bypassing up scaling cooldown...")
    return True

This ensures rapid response to growing queue depths even when in a cooldown period.

Key Principles​

Scaling Decision Flow​

Scale-Up Logic​

Scale-Down Logic​

Cooldown Management​

Key Principles

Scaling Decision Flow

Scale-Up Logic

Scale-Down Logic

Cooldown Management