ECS Deployment Strategies

Complete guide to deploying ECS services safely and efficiently, from rolling updates to blue-green deployments.

Quick Reference

Strategy        Downtime  Rollback Speed  Complexity  Best For
Rolling Update  Zero      Medium          Low         Most workloads
Blue-Green      Zero      Instant         High        Critical services
Canary          Zero      Fast            High        Risk mitigation

Rolling Updates (Default)

Configuration

resource "aws_ecs_service" "app" {
  # ... other service arguments omitted

  # On aws_ecs_service these are top-level arguments, not a nested block
  deployment_maximum_percent         = 200  # Allow 2x during deployment
  deployment_minimum_healthy_percent = 100  # Keep 100% healthy

  deployment_circuit_breaker {
    enable   = true   # Auto-detect failures
    rollback = true   # Auto-rollback on failure
  }
}

Behavior

  1. New task definition registered
  2. New tasks launched (up to maximum_percent)
  3. Health checks pass on new tasks
  4. Old tasks drained and stopped
  5. Continues until all tasks updated
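
The task-count bounds that steps 2 and 4 operate within can be sketched as follows. This is a minimal illustration, assuming ECS rounds the minimum-healthy count up and the maximum count down, as the AWS documentation describes. `rolling_update_bounds` is a hypothetical helper name, not part of any SDK.

```python
def rolling_update_bounds(desired_count: int,
                          minimum_healthy_percent: int = 100,
                          maximum_percent: int = 200) -> tuple[int, int]:
    """Return (min_healthy_tasks, max_total_tasks) for a rolling update."""
    # Lower bound: percentage of desired count, rounded up (integer ceiling)
    min_healthy = -(-desired_count * minimum_healthy_percent // 100)
    # Upper bound: percentage of desired count, rounded down
    max_total = desired_count * maximum_percent // 100
    return min_healthy, max_total


# Example: 4 desired tasks with the 100/200 settings shown above
print(rolling_update_bounds(4))
```

With 100/200, ECS can start a full second set of tasks before stopping any old ones, which is what makes the update zero-downtime at the cost of temporarily doubled capacity.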

Boto3 Deployment

import boto3

ecs = boto3.client('ecs')

def deploy_rolling_update(cluster: str, service: str,
                          new_image: str, container_name: str):
    """Deploy new image via rolling update"""

    # 1. Get current task definition
    svc = ecs.describe_services(cluster=cluster, services=[service])
    current_task_def = svc['services'][0]['taskDefinition']

    # 2. Create new task definition revision
    task_def = ecs.describe_task_definition(taskDefinition=current_task_def)
    new_task_def = task_def['taskDefinition'].copy()

    # Remove response-only fields
    for field in ['taskDefinitionArn', 'revision', 'status',
                  'requiresAttributes', 'compatibilities',
                  'registeredAt', 'registeredBy']:
        new_task_def.pop(field, None)

    # Update image
    for container in new_task_def['containerDefinitions']:
        if container['name'] == container_name:
            container['image'] = new_image

    response = ecs.register_task_definition(**new_task_def)
    new_task_def_arn = response['taskDefinition']['taskDefinitionArn']

    # 3. Update service
    ecs.update_service(
        cluster=cluster,
        service=service,
        taskDefinition=new_task_def_arn,
        forceNewDeployment=True
    )

    print(f"Deploying {new_task_def_arn}")
    return new_task_def_arn

# Usage
deploy_rolling_update(
    cluster='production',
    service='api',
    new_image='123456789012.dkr.ecr.us-east-1.amazonaws.com/api:v2.0',
    container_name='api'
)

Monitor Deployment

import time

def wait_for_deployment(cluster: str, service: str, timeout: int = 600):
    """Wait for deployment to complete"""
    # Uses the module-level `ecs` client created in the Boto3 Deployment example

    start = time.monotonic()
    while time.monotonic() - start < timeout:
        response = ecs.describe_services(cluster=cluster, services=[service])
        svc = response['services'][0]

        for deployment in svc['deployments']:
            # rolloutState is only reported for the ECS (rolling update) controller
            rollout_state = deployment.get('rolloutState', 'UNKNOWN')
            print(f"Deployment {deployment['id'][:8]}: "
                  f"{rollout_state} "
                  f"({deployment['runningCount']}/{deployment['desiredCount']})")

            if deployment['status'] == 'PRIMARY':
                if rollout_state == 'COMPLETED':
                    print("Deployment successful!")
                    return True
                elif rollout_state == 'FAILED':
                    print(f"Deployment failed: {deployment.get('rolloutStateReason')}")
                    return False

        time.sleep(15)

    print("Deployment timed out")
    return False
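
As an alternative to hand-rolled polling, boto3 ships a `services_stable` waiter for ECS. The sketch below wraps it. `wait_until_stable` is a hypothetical helper, and the client is passed in (for example `boto3.client('ecs')`) so the function has no module-level dependencies.

```python
def wait_until_stable(ecs_client, cluster: str, service: str,
                      delay: int = 15, max_attempts: int = 40) -> bool:
    """Block until the service is stable; return False if the waiter gives up."""
    waiter = ecs_client.get_waiter('services_stable')
    try:
        waiter.wait(
            cluster=cluster,
            services=[service],
            WaiterConfig={'Delay': delay, 'MaxAttempts': max_attempts},
        )
        return True
    except Exception as exc:  # botocore raises WaiterError on timeout or failure
        print(f"Service did not stabilize: {exc}")
        return False
```

Note that "stable" does not necessarily mean the new revision is the one running: a service that the circuit breaker rolled back is also stable, so check the active task definition afterwards if that distinction matters.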

Blue-Green Deployments

Architecture

                    ┌─────────────┐
                    │    ALB      │
                    └──────┬──────┘
           ┌───────────────┴───────────────┐
           │                               │
    ┌──────▼──────┐                 ┌──────▼──────┐
    │ Target Group│                 │ Target Group│
    │    (Blue)   │                 │   (Green)   │
    └──────┬──────┘                 └──────┬──────┘
           │                               │
    ┌──────▼──────┐                 ┌──────▼──────┐
    │ ECS Service │                 │ ECS Service │
    │   (Blue)    │                 │   (Green)   │
    └─────────────┘                 └─────────────┘

Terraform with CodeDeploy

# Two target groups
resource "aws_lb_target_group" "blue" {
  name        = "app-blue"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = module.vpc.vpc_id
  target_type = "ip"

  health_check {
    path = "/health"
  }
}

resource "aws_lb_target_group" "green" {
  name        = "app-green"
  port        = 8080
  protocol    = "HTTP"
  vpc_id      = module.vpc.vpc_id
  target_type = "ip"

  health_check {
    path = "/health"
  }
}

# ALB with two listeners
resource "aws_lb_listener" "prod" {
  load_balancer_arn = aws_lb.app.arn
  port              = 443
  protocol          = "HTTPS"
  certificate_arn   = aws_acm_certificate.app.arn  # Required for HTTPS; resource name is assumed

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.blue.arn
  }

  lifecycle {
    ignore_changes = [default_action]  # Managed by CodeDeploy
  }
}

resource "aws_lb_listener" "test" {
  load_balancer_arn = aws_lb.app.arn
  port              = 8443
  protocol          = "HTTPS"
  certificate_arn   = aws_acm_certificate.app.arn  # Required for HTTPS; resource name is assumed

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.green.arn
  }

  lifecycle {
    ignore_changes = [default_action]
  }
}

# ECS Service with CodeDeploy
resource "aws_ecs_service" "app" {
  name            = "app"
  cluster         = module.ecs.cluster_id
  task_definition = aws_ecs_task_definition.app.arn
  desired_count   = 3

  deployment_controller {
    type = "CODE_DEPLOY"
  }

  load_balancer {
    target_group_arn = aws_lb_target_group.blue.arn
    container_name   = "app"
    container_port   = 8080
  }

  lifecycle {
    ignore_changes = [task_definition, load_balancer]
  }
}

# CodeDeploy Application
resource "aws_codedeploy_app" "app" {
  compute_platform = "ECS"
  name             = "app-deploy"
}

# CodeDeploy Deployment Group
resource "aws_codedeploy_deployment_group" "app" {
  app_name               = aws_codedeploy_app.app.name
  deployment_group_name  = "app-dg"
  deployment_config_name = "CodeDeployDefault.ECSAllAtOnce"
  service_role_arn       = aws_iam_role.codedeploy.arn

  auto_rollback_configuration {
    enabled = true
    events  = ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_REQUEST"]
  }

  blue_green_deployment_config {
    deployment_ready_option {
      action_on_timeout = "CONTINUE_DEPLOYMENT"
    }

    terminate_blue_instances_on_deployment_success {
      action                           = "TERMINATE"
      termination_wait_time_in_minutes = 5
    }
  }

  deployment_style {
    deployment_option = "WITH_TRAFFIC_CONTROL"
    deployment_type   = "BLUE_GREEN"
  }

  ecs_service {
    cluster_name = module.ecs.cluster_name
    service_name = aws_ecs_service.app.name
  }

  load_balancer_info {
    target_group_pair_info {
      prod_traffic_route {
        listener_arns = [aws_lb_listener.prod.arn]
      }

      test_traffic_route {
        listener_arns = [aws_lb_listener.test.arn]
      }

      target_group {
        name = aws_lb_target_group.blue.name
      }

      target_group {
        name = aws_lb_target_group.green.name
      }
    }
  }
}

Trigger Blue-Green Deployment

import boto3
import json

codedeploy = boto3.client('codedeploy')

def deploy_blue_green(app_name: str, deployment_group: str,
                      task_definition_arn: str, container_name: str,
                      container_port: int):
    """Trigger blue-green deployment via CodeDeploy"""

    app_spec = {
        "version": "0.0",
        "Resources": [{
            "TargetService": {
                "Type": "AWS::ECS::Service",
                "Properties": {
                    "TaskDefinition": task_definition_arn,
                    "LoadBalancerInfo": {
                        "ContainerName": container_name,
                        "ContainerPort": container_port
                    }
                }
            }
        }]
    }

    response = codedeploy.create_deployment(
        applicationName=app_name,
        deploymentGroupName=deployment_group,
        revision={
            'revisionType': 'AppSpecContent',
            'appSpecContent': {
                'content': json.dumps(app_spec)
            }
        }
    )

    deployment_id = response['deploymentId']
    print(f"Started deployment: {deployment_id}")
    return deployment_id

# Usage
deploy_blue_green(
    app_name='app-deploy',
    deployment_group='app-dg',
    task_definition_arn='arn:aws:ecs:us-east-1:123456789012:task-definition/app:5',
    container_name='app',
    container_port=8080
)
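
Once a deployment has been triggered, its progress can be followed by polling `get_deployment`. The sketch below is one way to do that. `wait_for_codedeploy` is a hypothetical helper, the client is passed in (for example `boto3.client('codedeploy')`), and the timeout and poll interval are arbitrary defaults.

```python
import time

# Statuses after which a CodeDeploy deployment no longer changes
TERMINAL_STATES = {'Succeeded', 'Failed', 'Stopped'}


def wait_for_codedeploy(codedeploy_client, deployment_id: str,
                        timeout: int = 1800, poll_seconds: int = 15,
                        sleep=time.sleep) -> str:
    """Poll a CodeDeploy deployment and return its last observed status."""
    deadline = time.monotonic() + timeout
    status = 'Unknown'
    while time.monotonic() < deadline:
        info = codedeploy_client.get_deployment(deploymentId=deployment_id)
        status = info['deploymentInfo']['status']
        print(f"Deployment {deployment_id}: {status}")
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
    # Timed out: return whatever status was seen last
    return status
```

The `sleep` parameter exists only so the loop can be exercised without real delays. Callers would normally leave it at its default.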

Canary Releases

ALB Weighted Routing

resource "aws_lb_listener_rule" "canary" {
  listener_arn = aws_lb_listener.prod.arn
  priority     = 100

  action {
    type = "forward"
    forward {
      target_group {
        arn    = aws_lb_target_group.stable.arn
        weight = 90
      }
      target_group {
        arn    = aws_lb_target_group.canary.arn
        weight = 10
      }
    }
  }

  condition {
    path_pattern {
      values = ["/*"]
    }
  }
}

Gradual Traffic Shift

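import boto3

def shift_traffic(listener_rule_arn: str, stable_tg_arn: str,
                  canary_tg_arn: str, canary_weight: int):
    """Shift traffic percentage to canary"""
    if not 0 <= canary_weight <= 100:
        raise ValueError("canary_weight must be between 0 and 100")

    elb = boto3.client('elbv2')

    stable_weight = 100 - canary_weight

    # Replace the rule's forward action with the new weight split
    elb.modify_rule(
        RuleArn=listener_rule_arn,
        Actions=[{
            'Type': 'forward',
            'ForwardConfig': {
                'TargetGroups': [
                    {
                        'TargetGroupArn': stable_tg_arn,
                        'Weight': stable_weight
                    },
                    {
                        'TargetGroupArn': canary_tg_arn,
                        'Weight': canary_weight
                    }
                ]
            }
        }]
    )

    print(f"Traffic: {stable_weight}% stable, {canary_weight}% canary")

# Progressive rollout
# rule_arn, stable_tg_arn and canary_tg_arn are placeholders for the ARNs of
# the listener rule and the two target groups
shift_traffic(rule_arn, stable_tg_arn, canary_tg_arn, 10)   # 10% to canary
# Monitor metrics...
shift_traffic(rule_arn, stable_tg_arn, canary_tg_arn, 25)   # 25% to canary
# Monitor metrics...
shift_traffic(rule_arn, stable_tg_arn, canary_tg_arn, 50)   # 50% to canary
# Monitor metrics...
shift_traffic(rule_arn, stable_tg_arn, canary_tg_arn, 100)  # 100% to canary (promote)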
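
The "monitor metrics" pauses between steps can be turned into an explicit gate. The sketch below is a minimal, AWS-agnostic version. `progressive_rollout`, `set_canary_weight` and `is_healthy` are hypothetical names: the first callback would wrap the weight-shifting call, and the second would wrap whatever metric check applies (for example a CloudWatch alarm state).

```python
from typing import Callable, Iterable


def progressive_rollout(set_canary_weight: Callable[[int], None],
                        is_healthy: Callable[[], bool],
                        steps: Iterable[int] = (10, 25, 50, 100)) -> bool:
    """Raise the canary weight step by step; revert to 0 on a failed check."""
    for weight in steps:
        set_canary_weight(weight)
        if not is_healthy():
            # Send all traffic back to the stable target group and stop
            set_canary_weight(0)
            return False
    return True
```

In practice `is_healthy` would usually wait for a soak period before evaluating, so that each step sees enough traffic to be meaningful.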

Deployment Circuit Breaker

How It Works

  1. ECS monitors deployment health
  2. Detects repeated task failures
  3. Automatically stops deployment
  4. Optional: Rolls back to previous version

Configuration

resource "aws_ecs_service" "app" {
  deployment_circuit_breaker {
    enable   = true
    rollback = true  # Auto-rollback on failure
  }
}

Failure Detection

Circuit breaker triggers when:

  • Tasks fail to reach RUNNING state
  • Health checks fail repeatedly
  • Tasks crash shortly after starting
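
How many failures it takes to trip the breaker depends on the service size. The sketch below assumes the rule given in the AWS documentation at the time of writing: half the desired count, rounded up, clamped between a minimum of 3 and a maximum of 200. Both bounds are assumptions to verify against the current docs, and `circuit_breaker_threshold` is a hypothetical helper.

```python
def circuit_breaker_threshold(desired_count: int,
                              min_threshold: int = 3,
                              max_threshold: int = 200) -> int:
    """Estimate the failed-task count that trips the deployment circuit breaker."""
    # Half of the desired count, rounded up (integer ceiling division)
    half = -(-desired_count // 2)
    # Clamp to the assumed minimum and maximum thresholds
    return max(min_threshold, min(half, max_threshold))


for count in (1, 25, 400):
    print(count, circuit_breaker_threshold(count))
```

For small services this means a handful of failed launches is enough to stop the deployment, while large services tolerate proportionally more before the breaker trips.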

GitOps Workflow

GitHub Actions Example

name: Deploy to ECS

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2

      - name: Build and push image
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t $ECR_REGISTRY/myapp:$IMAGE_TAG .
          docker push $ECR_REGISTRY/myapp:$IMAGE_TAG

      - name: Update task definition
        id: task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: task-definition.json
          container-name: myapp
          image: ${{ steps.login-ecr.outputs.registry }}/myapp:${{ github.sha }}

      - name: Deploy to ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v2
        with:
          task-definition: ${{ steps.task-def.outputs.task-definition }}
          service: myapp-service
          cluster: production
          wait-for-service-stability: true

Rollback Strategies

Manual Rollback

def rollback_to_previous(cluster: str, service: str):
    """Rollback to previous task definition"""
    # Uses the module-level `ecs` client created in the Boto3 Deployment example

    # Get current task definition
    svc = ecs.describe_services(cluster=cluster, services=[service])
    current_td = svc['services'][0]['taskDefinition']

    # Parse family and revision
    # arn:aws:ecs:region:account:task-definition/family:revision
    parts = current_td.split('/')[-1].split(':')
    family = parts[0]
    current_revision = int(parts[1])

    if current_revision <= 1:
        raise ValueError(f"{family} has no earlier revision to roll back to")

    # Go back to previous revision
    # (assumes revision N-1 is still ACTIVE and was the last known-good version)
    previous_td = f"{family}:{current_revision - 1}"

    # Update service
    ecs.update_service(
        cluster=cluster,
        service=service,
        taskDefinition=previous_td
    )

    print(f"Rolling back to {previous_td}")

# Usage
rollback_to_previous('production', 'api')
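
Subtracting one from the revision number assumes that revision still exists and has not been deregistered. A more defensive lookup might ask ECS for the ACTIVE revisions of the family instead. In the sketch below, `find_previous_active_revision` is a hypothetical helper, the client is passed in (for example `boto3.client('ecs')`), and only the first page of results is inspected.

```python
from typing import Optional


def find_previous_active_revision(ecs_client,
                                  current_task_def_arn: str) -> Optional[str]:
    """Return the newest ACTIVE revision older than the current one, or None."""
    # arn:aws:ecs:region:account:task-definition/family:revision
    family, _, revision = current_task_def_arn.split('/')[-1].rpartition(':')
    current_revision = int(revision)

    response = ecs_client.list_task_definitions(
        familyPrefix=family,
        status='ACTIVE',
        sort='DESC',       # newest revisions first
        maxResults=100,
    )
    for arn in response['taskDefinitionArns']:
        name, _, rev = arn.split('/')[-1].rpartition(':')
        if name == family and int(rev) < current_revision:
            return arn
    return None
```

The returned ARN can be passed as `taskDefinition` to `update_service`. A `None` result means there is nothing safe to roll back to on the first page.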

Automatic Rollback (Circuit Breaker)

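Enabled by setting rollback = true in the deployment_circuit_breaker block. When the breaker trips, ECS reverts the service to the last deployment that completed successfully.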

Best Practices

  1. Always enable circuit breaker with rollback for production
  2. Use blue-green for critical services requiring instant rollback
  3. Implement health checks at container, task, and ALB levels
  4. Pin image digests instead of tags for reproducibility
  5. Use immutable image tags in ECR
  6. Monitor deployments with CloudWatch alarms
  7. Test rollback procedures regularly
  8. Keep previous task definitions for quick rollback
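
Practice 4 amounts to rewriting an image reference from `repo:tag` to `repo@sha256:...` before it goes into the task definition. The sketch below shows the string handling only. `pin_image_to_digest` is a hypothetical helper, and the digest is assumed to come from elsewhere (for example the `imageDigest` field that ECR returns for a pushed image).

```python
def pin_image_to_digest(image: str, digest: str) -> str:
    """Replace any tag on an image reference with an immutable digest."""
    if not digest.startswith('sha256:'):
        raise ValueError("digest must look like 'sha256:<hex>'")
    # Drop an existing digest, if present
    repo = image.split('@', 1)[0]
    # A tag is a colon that appears after the last slash; a colon before it
    # belongs to a registry port such as localhost:5000
    last_slash = repo.rfind('/')
    colon = repo.rfind(':')
    if colon > last_slash:
        repo = repo[:colon]
    return f"{repo}@{digest}"
```

A digest-pinned reference always resolves to the same image bytes, so a rollback to an older task definition revision runs exactly what ran before, even if the tag has since been moved.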

Progressive Disclosure

Quick Start (This File)

  • Rolling updates
  • Blue-green basics
  • Canary releases
  • Circuit breaker

Detailed References

Related Skills

  • boto3-ecs: SDK patterns
  • terraform-ecs: Infrastructure as Code
  • ecs-troubleshooting: Debugging deployments