Autoscaling Configuration

Overview

Implement autoscaling strategies to automatically adjust resource capacity based on demand, ensuring cost efficiency while maintaining performance and availability.

When to Use

  • Traffic-driven workload scaling
  • Time-based scheduled scaling
  • Resource utilization optimization
  • Cost reduction
  • High-traffic event handling
  • Batch processing optimization
  • Database connection pooling

Implementation Examples

1. Kubernetes Horizontal Pod Autoscaler

# hpa-configuration.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    # Pods metrics require a custom metrics API adapter (e.g., prometheus-adapter)
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 15
        - type: Pods
          value: 2
          periodSeconds: 60
      selectPolicy: Min
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max

---
# Vertical Pod Autoscaler for resource optimization.
# Caution: avoid running VPA in "Auto" mode alongside an HPA that scales on the
# same CPU/memory metrics, as the two controllers will fight each other.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: myapp
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 1000m
          memory: 512Mi
        controlledResources:
          - cpu
          - memory
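
After applying these manifests, it is worth confirming that both autoscalers are active; a quick check (resource names follow the myapp example above):

# Current metrics, targets, and replica counts for the HPA
kubectl get hpa myapp-hpa -n production
kubectl describe hpa myapp-hpa -n production

# VPA recommendations (requires the VPA components to be installed in the cluster)
kubectl describe vpa myapp-vpa -n production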

2. AWS Auto Scaling

# aws-autoscaling.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: autoscaling-config
  namespace: production
data:
  setup-asg.sh: |
    #!/bin/bash
    set -euo pipefail

    ASG_NAME="myapp-asg"
    MIN_SIZE=2
    MAX_SIZE=10
    DESIRED_CAPACITY=3
    TARGET_CPU=70
    TARGET_MEMORY=80

    echo "Creating Auto Scaling Group..."

    # UserData must be base64-encoded inside launch template JSON
    USER_DATA=$(printf '#!/bin/bash\ncd /app && docker-compose up -d' | base64 -w 0)

    # Create launch template (|| true ignores "already exists" errors on re-runs)
    aws ec2 create-launch-template \
      --launch-template-name myapp-template \
      --version-description "Production version" \
      --launch-template-data '{
        "ImageId": "ami-0c55b159cbfafe1f0",
        "InstanceType": "t3.medium",
        "KeyName": "myapp-key",
        "SecurityGroupIds": ["sg-0123456789abcdef0"],
        "UserData": "'"$USER_DATA"'",
        "TagSpecifications": [{
          "ResourceType": "instance",
          "Tags": [{"Key": "Name", "Value": "myapp-instance"}]
        }]
      }' || true

    # Create Auto Scaling Group
    aws autoscaling create-auto-scaling-group \
      --auto-scaling-group-name "$ASG_NAME" \
      --launch-template LaunchTemplateName=myapp-template \
      --min-size $MIN_SIZE \
      --max-size $MAX_SIZE \
      --desired-capacity $DESIRED_CAPACITY \
      --availability-zones us-east-1a us-east-1b us-east-1c \
      --target-group-arns arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/myapp/abcdef123456 \
      --health-check-type ELB \
      --health-check-grace-period 300 \
      --tags "Key=Name,Value=myapp,PropagateAtLaunch=true"

    # Create CPU scaling policy
    aws autoscaling put-scaling-policy \
      --auto-scaling-group-name "$ASG_NAME" \
      --policy-name myapp-cpu-scaling \
      --policy-type TargetTrackingScaling \
      --target-tracking-configuration '{
        "TargetValue": '$TARGET_CPU',
        "PredefinedMetricSpecification": {
          "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 300
      }'

    echo "Auto Scaling Group created: $ASG_NAME"

---
# Scale up at 8 AM on weekdays. A CronJob has exactly one schedule, so
# scale-up and scale-down need separate objects. The pod also needs AWS
# credentials, e.g., via IRSA or a mounted secret.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scheduled-scale-up
  namespace: production
spec:
  schedule: "0 8 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: autoscale
              image: amazon/aws-cli:latest
              command:
                - sh
                - -c
                - |
                  aws autoscaling set-desired-capacity \
                    --auto-scaling-group-name myapp-asg \
                    --desired-capacity 10
          restartPolicy: OnFailure
---
# Scale down at 6 PM on weekdays
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scheduled-scale-down
  namespace: production
spec:
  schedule: "0 18 * * 1-5"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: autoscale
              image: amazon/aws-cli:latest
              command:
                - sh
                - -c
                - |
                  aws autoscaling set-desired-capacity \
                    --auto-scaling-group-name myapp-asg \
                    --desired-capacity 3
          restartPolicy: OnFailure
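
When the ASG is managed directly, the same weekday schedule can also be expressed as native scheduled actions, which avoids running in-cluster CronJobs entirely; a minimal sketch reusing the myapp-asg name from above:

# Native ASG scheduled action: scale up weekday mornings
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name myapp-asg \
  --scheduled-action-name weekday-scale-up \
  --recurrence "0 8 * * 1-5" \
  --desired-capacity 10

# Scale back down weekday evenings
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name myapp-asg \
  --scheduled-action-name weekday-scale-down \
  --recurrence "0 18 * * 1-5" \
  --desired-capacity 3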

3. Custom Metrics Autoscaling

# custom-metrics-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metrics-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 1
  maxReplicas: 50
  metrics:
    # Queue depth from custom metrics
    - type: Pods
      pods:
        metric:
          name: job_queue_depth
        target:
          type: AverageValue
          averageValue: "100"

    # Request rate from custom metrics
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "1000"

    # Custom business metric
    - type: Pods
      pods:
        metric:
          name: active_connections
        target:
          type: AverageValue
          averageValue: "500"

---
# Prometheus ServiceMonitor for custom metrics
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: myapp-metrics
  namespace: production
spec:
  selector:
    matchLabels:
      app: myapp
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
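
These Pods metrics only resolve once an adapter such as prometheus-adapter publishes them through the custom metrics API; one way to confirm they are actually being served (metric and namespace names follow the example above, jq assumed available):

# List every metric the custom metrics API currently exposes
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq '.resources[].name'

# Query one metric for the pods the HPA will read
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second" | jq .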

4. Autoscaling Script

#!/bin/bash
# autoscaling-setup.sh - Complete autoscaling configuration

set -euo pipefail

ENVIRONMENT="${1:-production}"
DEPLOYMENT="${2:-myapp}"

echo "Setting up autoscaling for $DEPLOYMENT in $ENVIRONMENT"

# Create HPA
cat <<EOF | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ${DEPLOYMENT}-hpa
  namespace: ${ENVIRONMENT}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ${DEPLOYMENT}
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 50
          periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
EOF

echo "HPA created successfully"

# Watch autoscaling status (blocks until interrupted with Ctrl-C)
echo "Monitoring autoscaling events..."
kubectl get hpa "${DEPLOYMENT}-hpa" -n "$ENVIRONMENT" -w
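
To see the HPA react, a throwaway load generator (the pattern from the Kubernetes HPA walkthrough; the service name myapp is an assumption) can drive request load while the watch above runs:

# Generate sustained load against the service from a temporary pod
kubectl run load-generator -n production --rm -it --image=busybox:1.36 --restart=Never -- \
  /bin/sh -c "while sleep 0.01; do wget -q -O- http://myapp; done"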

5. Monitoring Autoscaling

# autoscaling-monitoring.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: autoscaling-alerts
  namespace: monitoring
data:
  alerts.yaml: |
    groups:
      - name: autoscaling
        rules:
          # Metric names follow kube-state-metrics v2+ (kube_horizontalpodautoscaler_*)
          - alert: HpaMaxedOut
            expr: |
              kube_horizontalpodautoscaler_status_current_replicas == kube_horizontalpodautoscaler_status_desired_replicas
              and
              kube_horizontalpodautoscaler_status_desired_replicas == kube_horizontalpodautoscaler_spec_max_replicas
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "HPA {{ $labels.horizontalpodautoscaler }} is at maximum replicas"

          - alert: HpaAtMinReplicas
            expr: |
              kube_horizontalpodautoscaler_status_current_replicas == kube_horizontalpodautoscaler_status_desired_replicas
              and
              kube_horizontalpodautoscaler_status_desired_replicas == kube_horizontalpodautoscaler_spec_min_replicas
            for: 30m
            labels:
              severity: info
            annotations:
              summary: "HPA {{ $labels.horizontalpodautoscaler }} is at minimum replicas"

          # ASG metric names depend on the CloudWatch exporter in use
          - alert: AsgCapacityLow
            expr: |
              aws_autoscaling_group_desired_capacity / aws_autoscaling_group_max_size < 0.2
            for: 10m
            labels:
              severity: warning
            annotations:
              summary: "ASG {{ $labels.auto_scaling_group_name }} has low capacity"

Best Practices

✅ DO

  • Set appropriate min/max replicas
  • Monitor metric aggregation window
  • Implement cooldown periods
  • Use multiple metrics
  • Test scaling behavior
  • Monitor scaling events
  • Plan for peak loads
  • Implement fallback strategies

❌ DON'T

  • Set min replicas to 1
  • Scale too aggressively
  • Ignore cooldown periods
  • Use single metric only
  • Forget to test scaling
  • Scale below resource needs
  • Neglect monitoring
  • Deploy without capacity tests

Scaling Metrics

  • CPU Utilization: Most common metric (see the check below)
  • Memory Utilization: Memory- or heap-bound applications
  • Request Rate: API-driven scaling
  • Queue Depth: Async job processing
  • Custom Metrics: Business-specific indicators
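
For the first two metric types, the live utilization figures the HPA acts on come from metrics-server and can be inspected directly (assuming metrics-server is installed):

# Per-pod CPU/memory as consumed by Resource-type HPA metrics
kubectl top pods -n production
kubectl top nodes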
