
Karpenter Autoscaling for Amazon EKS

Intelligent, high-performance node autoscaling for Amazon EKS that provisions nodes in seconds, automatically selects optimal instance types, and reduces costs by 20-70% through Spot integration and consolidation.

Overview

Karpenter is the recommended node autoscaler for production EKS workloads (2025). Compared with Cluster Autoscaler, it offers:

  • Speed: Provisions nodes in seconds (vs minutes with Cluster Autoscaler)
  • Intelligence: Automatically selects optimal instance types based on pod requirements
  • Flexibility: No need to configure node groups - direct EC2 instance provisioning
  • Cost Optimization: 20-70% cost reduction through better bin-packing and Spot integration
  • Consolidation: Automatic node consolidation when underutilized or empty

Real-World Results:

  • 20% overall AWS bill reduction
  • Up to 90% savings for CI/CD workloads
  • 70% reduction in monthly compute costs
  • 15-30% waste reduction with faster scale-up

When to Use

  • Replacing Cluster Autoscaler with faster, smarter autoscaling
  • Optimizing EKS cluster costs (target: 20%+ savings)
  • Implementing Spot instance strategies (30-70% Spot mix)
  • Need sub-minute node provisioning (seconds vs minutes)
  • Workloads with variable resource requirements
  • Multi-instance-type flexibility without node group management
  • GPU or specialized instance provisioning
  • Consolidating underutilized nodes automatically

Prerequisites

  • EKS cluster running Kubernetes 1.23+
  • Terraform or Helm for installation
  • IRSA or EKS Pod Identity enabled
  • Small node group for Karpenter controller (2-3 nodes)
  • VPC subnets and security groups tagged for Karpenter discovery

Quick Start

1. Install Karpenter (Helm)

# Install Karpenter v1.0+ from the public ECR OCI registry
# (the legacy charts.karpenter.sh repo only hosts older pre-v1 releases)
helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version 1.0.0 \
  --namespace kube-system \
  --set settings.clusterName=my-cluster \
  --set settings.interruptionQueue=my-cluster \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait

See: references/installation.md for complete setup including IRSA/Pod Identity

2. Create NodePool and EC2NodeClass

NodePool (defines scheduling requirements and limits):

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]  # Compute, general, memory-optimized
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]  # Gen 5+
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "1000"
    memory: "1000Gi"
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s
    budgets:
      - nodes: "10%"
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2023  # Amazon Linux 2023
  amiSelectorTerms:
    - alias: al2023@latest  # amiSelectorTerms is required in Karpenter v1
  role: KarpenterNodeRole-my-cluster
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 100Gi
        volumeType: gp3
        encrypted: true
        deleteOnTermination: true

Apply both resources:

kubectl apply -f nodepool.yaml

See: references/nodepools.md for advanced NodePool patterns

3. Deploy Workload and Watch Autoscaling

# Deploy test workload
kubectl create deployment inflate --image=public.ecr.aws/eks-distro/kubernetes/pause:3.7 \
  --replicas=0

# Scale up to trigger node provisioning
kubectl scale deployment inflate --replicas=10

# Watch Karpenter provision nodes (seconds!)
kubectl logs -f -n kube-system -l app.kubernetes.io/name=karpenter -c controller

# Verify nodes
kubectl get nodes -l karpenter.sh/nodepool=default

# Scale down to trigger consolidation
kubectl scale deployment inflate --replicas=0

# Watch Karpenter consolidate (30s after scale-down)
kubectl logs -f -n kube-system -l app.kubernetes.io/name=karpenter -c controller

4. Monitor and Optimize

# Check NodePool status
kubectl get nodepools

# View disruption metrics
kubectl describe nodepool default

# Monitor provisioning decisions
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter | grep -i "launched\|terminated"

# Node utilization (input to cost optimization)
kubectl top nodes

See: references/optimization.md for cost optimization strategies


Core Concepts

Karpenter v1.0 Architecture

Key Resources (v1.0+):

  1. NodePool: Defines node scheduling requirements, limits, and disruption policies
  2. EC2NodeClass: AWS-specific configuration (AMIs, instance types, subnets, security groups)
  3. NodeClaim: Karpenter's representation of a node request (auto-created)

How It Works:

  1. Pod becomes unschedulable
  2. Karpenter evaluates pod requirements (CPU, memory, affinity, taints/tolerations)
  3. Karpenter selects optimal instance type from 600+ options
  4. Karpenter provisions EC2 instance directly (no node groups)
  5. Node joins cluster in 30-60 seconds
  6. Pod scheduled to new node
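
For illustration, a minimal workload (hypothetical name) whose resource requests drive that instance-type decision; scaling it beyond current capacity triggers steps 1-6 above:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-api  # hypothetical
spec:
  replicas: 5
  selector:
    matchLabels:
      app: sample-api
  template:
    metadata:
      labels:
        app: sample-api
    spec:
      containers:
        - name: app
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.7  # placeholder image
          resources:
            requests:
              cpu: "500m"    # Karpenter sums requests across pending pods
              memory: 512Mi  # and picks the cheapest instance types that fit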

Consolidation:

  • Continuously monitors node utilization
  • Consolidates underutilized nodes (bin-packing)
  • Drains and deletes empty nodes
  • Replaces nodes with cheaper alternatives
  • Respects Pod Disruption Budgets
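
Beyond Pod Disruption Budgets, individual pods can opt out of voluntary disruption entirely with the karpenter.sh/do-not-disrupt annotation; a minimal sketch (hypothetical pod name, placeholder image):

apiVersion: v1
kind: Pod
metadata:
  name: batch-worker  # hypothetical
  annotations:
    karpenter.sh/do-not-disrupt: "true"  # blocks consolidation of the node while this pod is running
spec:
  containers:
    - name: worker
      image: public.ecr.aws/eks-distro/kubernetes/pause:3.7  # placeholder image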

NodePool vs Cluster Autoscaler Node Groups

| Feature            | Karpenter NodePool     | Cluster Autoscaler    |
|--------------------|------------------------|-----------------------|
| Provisioning Speed | 30-60 seconds          | 2-5 minutes           |
| Instance Selection | Automatic (600+ types) | Manual (pre-defined)  |
| Bin-Packing        | Intelligent            | Limited               |
| Spot Integration   | Built-in, intelligent  | Requires node groups  |
| Consolidation      | Automatic              | Manual                |
| Configuration      | Single NodePool        | Multiple node groups  |
| Cost Savings       | 20-70%                 | 10-20%                |

Common Workflows

Workflow 1: Install Karpenter with Terraform

Use case: Production-grade installation with infrastructure as code

# Karpenter module
module "karpenter" {
  source = "terraform-aws-modules/eks/aws//modules/karpenter"
  version = "~> 20.0"

  cluster_name = module.eks.cluster_name
  irsa_oidc_provider_arn = module.eks.oidc_provider_arn

  # Enable Pod Identity (2025 recommended)
  enable_pod_identity = true

  # Additional IAM policies
  node_iam_role_additional_policies = {
    AmazonSSMManagedInstanceCore = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
  }

  tags = {
    Environment = "production"
  }
}

# Helm release
resource "helm_release" "karpenter" {
  namespace        = "kube-system"
  name             = "karpenter"
  repository       = "oci://public.ecr.aws/karpenter"
  chart            = "karpenter"
  version          = "1.0.0"

  set {
    name  = "settings.clusterName"
    value = module.eks.cluster_name
  }

  set {
    name  = "settings.interruptionQueue"
    value = module.karpenter.queue_name
  }

  set {
    name  = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
    value = module.karpenter.iam_role_arn
  }
}

Steps:

  1. Review references/installation.md
  2. Configure Terraform module with cluster details
  3. Apply infrastructure: terraform apply
  4. Verify installation: kubectl get pods -n kube-system -l app.kubernetes.io/name=karpenter
  5. Tag subnets and security groups for discovery
  6. Deploy NodePool and EC2NodeClass

See: references/installation.md for complete Terraform setup


Workflow 2: Configure Spot/On-Demand Mix (30/70)

Use case: Optimize costs while maintaining availability (recommended: 30% On-Demand, 70% Spot)

Critical NodePool (On-Demand only):

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: critical
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["m", "c"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      taints:
        - key: "critical"
          value: "true"
          effect: "NoSchedule"
  limits:
    cpu: "200"
  weight: 100  # Higher priority

Flexible NodePool (Spot preferred):

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: flexible
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "800"
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    budgets:
      - nodes: "20%"
  weight: 10  # Lower priority (use after critical)

Pod tolerations for critical workloads:

spec:
  tolerations:
    - key: "critical"
      operator: "Equal"
      value: "true"
      effect: "NoSchedule"
  nodeSelector:
    karpenter.sh/capacity-type: on-demand
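
Conversely, stateless workloads can prefer Spot while still falling back to On-Demand. A sketch using a preferred (soft) node affinity; the weight is an assumption, and no toleration is needed for the flexible pool:

spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: karpenter.sh/capacity-type
                operator: In
                values: ["spot"]  # prefer Spot; On-Demand remains allowed if Spot is unavailable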

Steps:

  1. Create critical NodePool for databases, stateful apps (On-Demand)
  2. Create flexible NodePool for stateless apps (Spot preferred)
  3. Use taints/tolerations to separate critical workloads
  4. Monitor Spot interruptions: kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter | grep -i interrupt

See: references/nodepools.md for Spot strategies


Workflow 3: Enable Consolidation for Cost Savings

Use case: Reduce costs by automatically consolidating underutilized nodes

Aggressive consolidation (development/staging):

spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 30s  # Consolidate quickly
    budgets:
      - nodes: "50%"  # Allow disrupting 50% of nodes

Conservative consolidation (production):

spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 5m  # Wait 5 minutes before consolidating
    budgets:
      - nodes: "10%"  # Limit disruption to 10% of nodes at a time
      - schedule: "0 9 * * MON-FRI"  # Business hours (09:00-17:00)
        duration: 8h  # schedule and duration must be set together
        nodes: "20%"
      - schedule: "0 18 * * *"  # Off-hours (18:00-08:00)
        duration: 14h
        nodes: "5%"

Pod Disruption Budget (protect critical pods):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: critical-app

Steps:

  1. Review references/optimization.md
  2. Set consolidation policy (WhenEmpty or WhenEmptyOrUnderutilized)
  3. Configure consolidateAfter delay (30s-5m)
  4. Set disruption budgets (% of nodes)
  5. Create PodDisruptionBudgets for critical apps
  6. Monitor consolidation: kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter | grep consolidat

Expected savings: 15-30% additional reduction beyond Spot savings

See: references/optimization.md for consolidation best practices


Workflow 4: Migrate from Cluster Autoscaler

Use case: Upgrade from Cluster Autoscaler to Karpenter for better performance and cost savings

Migration strategy (zero-downtime):

  1. Install Karpenter (runs alongside Cluster Autoscaler)

    helm install karpenter oci://public.ecr.aws/karpenter/karpenter --version 1.0.0 --namespace kube-system
    
  2. Create NodePool with distinct labels

    spec:
      template:
        metadata:
          labels:
            provisioner: karpenter
    
  3. Migrate workloads gradually (full Deployment example after these steps)

    # Add node selector to new deployments
    spec:
      nodeSelector:
        provisioner: karpenter
    
  4. Monitor both autoscalers

    # Watch Karpenter
    kubectl logs -f -n kube-system -l app.kubernetes.io/name=karpenter
    
    # Watch Cluster Autoscaler
    kubectl logs -f -n kube-system -l app=cluster-autoscaler
    
  5. Gradually scale down CA node groups

    # Reduce desired size of CA node groups
    aws eks update-nodegroup-config \
      --cluster-name my-cluster \
      --nodegroup-name ca-nodes \
      --scaling-config desiredSize=1,minSize=0,maxSize=3
    
  6. Remove Cluster Autoscaler tags

    # Remove tags from node groups
    # k8s.io/cluster-autoscaler/enabled
    # k8s.io/cluster-autoscaler/<cluster-name>
    
  7. Uninstall Cluster Autoscaler

    helm uninstall cluster-autoscaler -n kube-system
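
Completing step 3, a full Deployment pinned to Karpenter-provisioned nodes might look like this (hypothetical app name, placeholder image; assumes the provisioner: karpenter label from step 2):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend  # hypothetical
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      nodeSelector:
        provisioner: karpenter  # label added to the NodePool template in step 2
      containers:
        - name: app
          image: nginx:latest  # placeholder image
          resources:
            requests:
              cpu: "250m"
              memory: 256Mi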
    

Testing checklist:

  • Karpenter provisions nodes successfully
  • Pods schedule on Karpenter nodes
  • Consolidation works as expected
  • Spot interruptions handled gracefully
  • No unschedulable pods
  • Cost metrics show improvement

Rollback plan: Keep CA node groups at min size until confident in Karpenter


Workflow 5: GPU Node Provisioning

Use case: Automatically provision GPU instances for ML workloads

GPU NodePool:

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]  # GPU typically on-demand
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: ["g4dn", "g5", "p3", "p4d"]
        - key: karpenter.k8s.aws/instance-gpu-count
          operator: Gt
          values: ["0"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: gpu
      taints:
        - key: "nvidia.com/gpu"
          value: "true"
          effect: "NoSchedule"
  limits:
    cpu: "1000"
    nvidia.com/gpu: "8"
---
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: gpu
spec:
  amiFamily: AL2  # AL2 with GPU drivers
  amiSelectorTerms:
    - alias: al2@latest  # Latest GPU-enabled AMI
  role: KarpenterNodeRole-my-cluster
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
  # Note: for the AL2 amiFamily, Karpenter generates the bootstrap user data
  # automatically; deploy the NVIDIA device plugin as a DaemonSet rather than
  # calling /etc/eks/bootstrap.sh from userData.

GPU workload:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  tolerations:
    - key: "nvidia.com/gpu"
      operator: "Exists"
      effect: "NoSchedule"
  containers:
  - name: cuda-container
    image: nvidia/cuda:11.8.0-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1

See: references/nodepools.md for GPU configuration details


Key Configuration

NodePool Resource Limits

Prevent runaway scaling:

spec:
  limits:
    cpu: "1000"       # Max 1000 CPUs across all nodes in pool
    memory: "1000Gi"  # Max 1000Gi memory
    nvidia.com/gpu: "8"  # Max 8 GPUs

Disruption Controls

Balance cost savings with stability:

spec:
  template:
    spec:
      # Node expiration for regular rotation/patching; in v1 expireAfter lives
      # on the NodeClaim template, not in the disruption block
      expireAfter: 720h  # 30 days

  disruption:
    # When to consolidate (v1 values)
    consolidationPolicy: WhenEmpty | WhenEmptyOrUnderutilized

    # Delay before consolidating (prevent flapping)
    consolidateAfter: 30s

    # Disruption budgets (rate limiting)
    budgets:
      - nodes: "10%"  # Max 10% of nodes disrupted at once
        reasons:
          - Underutilized
          - Empty
      - schedule: "0 0 * * *"  # Off-hours (00:00-08:00): more aggressive
        duration: 8h           # schedule and duration must be set together
        nodes: "50%"

Instance Type Flexibility

Maximize Spot availability and cost savings:

spec:
  template:
    spec:
      requirements:
        # Architecture
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64", "arm64"]  # Include ARM for savings

        # Instance categories (c=compute, m=general, r=memory)
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["c", "m", "r"]

        # Instance generation (5+ for best performance/cost)
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values: ["4"]

        # Instance size (exclude large sizes if not needed)
        - key: karpenter.k8s.aws/instance-size
          operator: NotIn
          values: ["metal", "32xlarge", "24xlarge"]

        # Capacity type
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]

Result: Karpenter selects from 600+ instance types, maximizing Spot availability


Monitoring and Troubleshooting

Key Metrics

# NodePool status
kubectl get nodepools

# NodeClaim status (pending provisions)
kubectl get nodeclaims

# Node events
kubectl get events --field-selector involvedObject.kind=Node

# Karpenter controller logs
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter -c controller --tail=100

# Filter for provisioning decisions
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter | grep "launched instance"

# Filter for consolidation events
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter | grep "consolidating"

# Spot interruption warnings
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter | grep "interrupt"

Common Issues

1. Nodes not provisioning:

# Check NodePool status
kubectl describe nodepool default

# Check for unschedulable pods
kubectl get pods -A --field-selector=status.phase=Pending

# Review Karpenter logs for errors
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter | grep -i error

Common causes:

  • Insufficient IAM permissions
  • Subnet/security group tags missing
  • Resource limits exceeded
  • No instance types match requirements

2. Excessive consolidation (pod restarts):

# Increase consolidateAfter delay
spec:
  disruption:
    consolidateAfter: 5m  # Increase from 30s

3. Spot interruptions causing issues:

# Reduce Spot ratio
- key: karpenter.sh/capacity-type
  operator: In
  values: ["on-demand"]  # Use more on-demand

Best Practices

Cost Optimization

  • ✅ Use 30% On-Demand, 70% Spot for optimal cost/stability balance
  • ✅ Enable consolidation (WhenEmptyOrUnderutilized)
  • ✅ Include ARM instances (Graviton) for 20% additional savings
  • ✅ Set instance generation > 4 for best price/performance
  • ✅ Use multiple instance families (c, m, r) for Spot diversity

Reliability

  • ✅ Set Pod Disruption Budgets for critical applications
  • ✅ Use multiple availability zones
  • ✅ Configure disruption budgets (10-20% for production)
  • ✅ Test Spot interruption handling
  • ✅ Use On-Demand for stateful workloads (databases)

Security

  • ✅ Use IRSA or Pod Identity (not node IAM roles)
  • ✅ Enable EBS encryption in EC2NodeClass
  • ✅ Set expireAfter for regular node rotation (720h/30 days)
  • ✅ Use Amazon Linux 2023 (AL2023) AMIs
  • ✅ Tag resources for cost allocation

Performance

  • ✅ Run the Karpenter controller on a small dedicated node group or Fargate (On-Demand, not on Karpenter-provisioned nodes)
  • ✅ Set appropriate resource limits to prevent runaway scaling
  • ✅ Monitor provisioning latency (should be <60s)
  • ✅ Use topology spread constraints for pod distribution
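
A sketch combining the multi-AZ and topology-spread recommendations above (hypothetical app label):

spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone  # spread pods evenly across availability zones
      whenUnsatisfiable: ScheduleAnyway         # prefer spreading without blocking scheduling
      labelSelector:
        matchLabels:
          app: my-app  # hypothetical label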

Reference Documentation

Detailed Guides (load on-demand):

  • references/installation.md - Helm and Terraform installation, IRSA/Pod Identity setup
  • references/nodepools.md - Advanced NodePool patterns, Spot strategies, GPU configuration
  • references/optimization.md - Cost optimization and consolidation strategies


Quick Reference

Installation

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version 1.0.0 \
  --namespace kube-system \
  --set settings.clusterName=my-cluster

Basic NodePool

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
  limits:
    cpu: "1000"
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized

Monitor

kubectl logs -f -n kube-system -l app.kubernetes.io/name=karpenter

Cost Savings Formula

  • Spot instances: 70-80% savings vs On-Demand
  • Consolidation: Additional 15-30% reduction
  • Better bin-packing: 10-20% waste reduction
  • Total: 20-70% overall cost reduction
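
Illustrative example (hypothetical numbers): starting from $10,000/month of On-Demand compute, moving 70% of capacity to Spot at a ~70% discount saves about $4,900; consolidation then trims roughly 20% of the remaining $5,100 (another ~$1,020), leaving about $4,080/month - a ~59% total reduction, within the range above.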

Next Steps: Install Karpenter using references/installation.md, then configure NodePools with references/nodepools.md
