Load Balancing Patterns

Distribute traffic across infrastructure using the appropriate load balancing approach, from simple round-robin to global multi-region failover.

When to Use This Skill

Use load-balancing-patterns when:

  • Distributing traffic across multiple application servers
  • Implementing high availability and failover
  • Routing traffic based on URLs, headers, or geographic location
  • Managing session persistence across stateless backends
  • Deploying applications to Kubernetes clusters
  • Configuring global traffic management across regions
  • Implementing zero-downtime deployments (blue-green, canary)
  • Selecting between cloud-managed and self-managed load balancers

Core Load Balancing Concepts

Layer 4 vs Layer 7

Layer 4 (L4) - Transport Layer:

  • Routes based on IP address and port (TCP/UDP packets)
  • No application data inspection, lower latency, higher throughput
  • Protocol agnostic, preserves client IP addresses
  • Use for: Database connections, video streaming, gaming, financial transactions, non-HTTP protocols

Layer 7 (L7) - Application Layer:

  • Routes based on HTTP URLs, headers, cookies, request body
  • Full application data visibility, SSL/TLS termination, caching, WAF integration
  • Content-based routing capabilities
  • Use for: Web applications, REST APIs, microservices, GraphQL endpoints, complex routing logic

For detailed comparison including performance benchmarks and hybrid approaches, see references/l4-vs-l7-comparison.md.

Load Balancing Algorithms

| Algorithm | Distribution Method | Use Case |
|---|---|---|
| Round Robin | Sequential | Stateless, similar servers |
| Weighted Round Robin | Capacity-based | Different server specs |
| Least Connections | Fewest active connections | Long-lived connections |
| Least Response Time | Fastest server | Performance-sensitive |
| IP Hash | Client IP-based | Session persistence |
| Resource-Based | CPU/memory metrics | Varying workloads |
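
The selection logic behind four of these algorithms can be sketched in a few lines of Python. This is an illustrative sketch, not how any particular load balancer implements them: the server names are hypothetical, the weighted variant uses a "smooth" interleaving scheme chosen here for readability, and the IP hash uses SHA-256 only because it is in the standard library.

```python
import hashlib
import itertools


def round_robin(servers):
    """Yield servers in a fixed repeating order."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Yield servers in proportion to their weights (smooth variant).

    `weights` maps a server name to a positive integer weight.
    """
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    while True:
        # Raise every server's running score by its weight, pick the
        # highest score, then lower the winner's score by the total weight.
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)
        current[chosen] -= total
        yield chosen


def least_connections(active):
    """Pick the server with the fewest active connections."""
    return min(active, key=active.get)


def ip_hash(client_ip, servers):
    """Map a client IP to the same server on every call."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]
```

With weights of 3 and 2, three of every five picks from `weighted_round_robin` go to the heavier server, interleaved rather than sent in a burst.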

Health Check Types

Shallow (Liveness): Is the process alive?

  • Endpoint: /health/live or /live
  • Returns: 200 if process running
  • Use for: Process monitoring, container health

Deep (Readiness): Can the service handle requests?

  • Endpoint: /health/ready or /ready
  • Validates: Database, cache, external API connectivity
  • Use for: Load balancer routing decisions
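
A minimal sketch of the two check types as plain functions that a web framework handler could call. The probe callables are hypothetical, and returning 503 on a failed readiness check is an assumption of this sketch (the text above only specifies 200 for a live process).

```python
def liveness():
    """Shallow check: reaching this code means the process is running."""
    return 200, {"status": "alive"}


def readiness(checks):
    """Deep check: run every dependency probe and fail if any probe fails.

    `checks` maps a dependency name to a zero-argument callable that
    returns True when the dependency is reachable (hypothetical probes).
    """
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = bool(probe())
        except Exception:
            # A probe that raises counts as a failed dependency.
            results[name] = False
    # Assumption: 503 tells the load balancer to stop routing here.
    status = 200 if all(results.values()) else 503
    label = "ready" if status == 200 else "not ready"
    return status, {"status": label, "checks": results}
```

Keeping the liveness path free of dependency probes means a database outage takes the instance out of rotation without also triggering a process restart.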

Health Check Hysteresis: Different thresholds for marking up vs down to prevent flapping

  • Example: 3 failures to mark down, 2 successes to mark up
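
The 3-down / 2-up example can be sketched as a small state tracker. The class and parameter names (`HealthTracker`, `fall`, `rise`) are chosen for this illustration; real load balancers expose equivalent thresholds under their own names.

```python
class HealthTracker:
    """Track backend health with separate down and up thresholds."""

    def __init__(self, fall=3, rise=2):
        self.fall = fall      # consecutive failures needed to mark down
        self.rise = rise      # consecutive successes needed to mark up
        self.healthy = True
        self._streak = 0      # consecutive results contradicting the current state

    def record(self, success):
        """Record one check result and return the current health state."""
        if success == self.healthy:
            # The result agrees with the current state, so reset the streak.
            self._streak = 0
            return self.healthy
        self._streak += 1
        threshold = self.rise if success else self.fall
        if self._streak >= threshold:
            self.healthy = success
            self._streak = 0
        return self.healthy
```

A backend that alternates between one failed and one successful check never changes state under this scheme, which is the flapping protection the thresholds are meant to provide.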

For complete health check implementation patterns, see references/health-check-strategies.md.

Cloud Load Balancers

AWS Load Balancing

Application Load Balancer (ALB) - Layer 7:

  • Use for: HTTP/HTTPS applications, microservices, WebSocket
  • Features: Path/host/header routing, AWS WAF integration, Lambda targets
  • Choose when: Content-based routing needed

Network Load Balancer (NLB) - Layer 4:

  • Use for: Ultra-low latency (<1ms), TCP/UDP, static IPs, millions of requests per second
  • Features: Preserves source IP, TLS termination
  • Choose when: Non-HTTP protocols, performance critical

Global Accelerator - Layer 4 Global:

  • Use for: Multi-region applications, global users, DDoS protection
  • Features: Anycast IPs, automatic regional failover

GCP Load Balancing

  • Application LB (L7): Global HTTPS LB, Cloud CDN integration, Cloud Armor (WAF/DDoS)
  • Network LB (L4): Regional TCP/UDP, pass-through balancing, session affinity
  • Cloud Load Balancing: Single anycast IP, global distribution, backend buckets

Azure Load Balancing

  • Application Gateway (L7): WAF integration, URL-based routing, SSL termination, autoscaling
  • Load Balancer (L4): Basic and Standard SKUs, health probes, HA ports
  • Traffic Manager (Global): DNS-based routing (priority, weighted, performance, geographic)

For complete cloud provider configurations and Terraform examples, see references/cloud-load-balancers.md.

Self-Managed Load Balancers

NGINX

Best for: General-purpose HTTP/HTTPS load balancing, web application stacks

Capabilities:

  • HTTP reverse proxy with multiple algorithms
  • TCP/UDP stream load balancing
  • SSL/TLS termination
  • Passive health checks (open source), active health checks (NGINX Plus)
  • Cookie-based sticky sessions (NGINX Plus)

Basic configuration:

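upstream backend {
    least_conn;                                   # fewest active connections, taking weights into account
    server backend1.example.com:8080 weight=3;
    server backend2.example.com:8080 weight=2;
    keepalive 32;                                 # idle upstream connections kept open per worker
}

server {
    listen 80;
    location / {
        proxy_pass http://backend;
        proxy_http_version 1.1;                   # upstream keepalive requires HTTP/1.1
        proxy_set_header Connection "";           # clear the header so upstream connections are reused
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}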

For complete NGINX patterns and advanced configurations, see references/nginx-patterns.md.

HAProxy

Best for: Maximum performance, database load balancing, resource efficiency

Capabilities:

  • Very high raw throughput with a small memory footprint
  • 10+ load balancing algorithms
  • Sophisticated health checks (HTTP, TCP, Redis, MySQL, etc.)
  • Cookie or IP-based persistence

Basic configuration:

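defaults
    mode http                      # L7 proxying; HAProxy defaults to TCP mode
    timeout connect 5s             # example timeouts; tune for your workload
    timeout client  30s
    timeout server  30s

frontend http_front
    bind *:80
    default_backend web_servers

backend web_servers
    balance roundrobin
    option httpchk GET /health     # active HTTP health check against each server
    server web1 192.168.1.101:8080 check
    server web2 192.168.1.102:8080 check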

For complete HAProxy patterns, see references/haproxy-patterns.md.

Envoy

Best for: Microservices, Kubernetes, service mesh integration

Capabilities:

  • Cloud-native design with dynamic configuration (xDS APIs)
  • Circuit breakers, retries, timeouts
  • Advanced health checks (TCP, HTTP, gRPC)
  • Excellent observability

For complete Envoy patterns, see references/envoy-patterns.md.

Traefik

Best for: Docker/Kubernetes environments, dynamic configuration, ease of use

Capabilities:

  • Automatic service discovery
  • Native Kubernetes integration
  • Built-in Let's Encrypt support
  • Middleware system (auth, rate limiting)

For complete Traefik patterns, see references/traefik-patterns.md.

Kubernetes Ingress Controllers

Selection Guide

| Controller | Best For | Strengths |
|---|---|---|
| NGINX Ingress (F5) | General purpose | Stability, wide adoption, mature features |
| Traefik | Dynamic environments | Easy configuration, service discovery |
| HAProxy Ingress | High performance | Advanced L7 routing, reliability |
| Envoy (Contour/Gateway) | Service mesh | Rich L7 features, extensibility |
| Kong | API-heavy apps | JWT auth, rate limiting, plugins |
| Cloud Provider | Single-cloud | Native cloud integration |

Basic Ingress Example

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/affinity: "cookie"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - app.example.com
    secretName: app-tls
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-service
            port:
              number: 80

For complete Kubernetes ingress examples and Gateway API patterns, see references/kubernetes-ingress.md.

Session Persistence

Sticky Sessions (Use Sparingly)

Cookie-Based: Load balancer sets cookie to track server affinity

  • Accurate routing, works with NAT/proxies
  • HTTP only, adds cookie overhead

IP Hash: Hash client IP to select backend server

  • No cookie required, works for non-HTTP
  • Poor distribution with NAT/proxies

Drawbacks: Uneven load distribution, session lost on server failure, complicates scaling

Shared Session Store (Recommended)

Architecture: Stateless application servers + centralized session storage (Redis, Memcached)

Benefits:

  • No sticky sessions needed
  • True load balancing
  • Server failures don't lose sessions
  • Horizontal scaling trivial
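
A minimal sketch of the architecture, using an in-memory dictionary as a stand-in for Redis or Memcached so the example runs without external services. `SessionStore` and `AppServer` are hypothetical names; a real deployment would swap the dictionary for a networked store with expiry.

```python
import secrets


class SessionStore:
    """In-memory stand-in for a shared store such as Redis or Memcached."""

    def __init__(self):
        self._data = {}

    def create(self, payload):
        """Store a new session and return its unguessable identifier."""
        session_id = secrets.token_urlsafe(16)
        self._data[session_id] = dict(payload)
        return session_id

    def get(self, session_id):
        return self._data.get(session_id)


class AppServer:
    """Stateless application server: it keeps no session data of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user):
        return self.store.create({"user": user})

    def whoami(self, session_id):
        session = self.store.get(session_id)
        return session["user"] if session else None


store = SessionStore()
server_a = AppServer("a", store)
server_b = AppServer("b", store)
session_id = server_a.login("alice")   # session created via server a
user = server_b.whoami(session_id)     # and read back via server b
```

Because every server reads from the same store, the load balancer is free to send each request to any backend.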

Client-Side Tokens (Best for APIs)

JWT (JSON Web Tokens): Server generates signed token, client stores and sends with requests

Benefits:

  • Fully stateless servers
  • Perfect load balancing
  • No session storage needed
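
To show why no server-side session state is needed, here is a sketch of issuing and verifying an HS256-signed JWT with only the standard library. The function names and the one-hour default lifetime are assumptions for this illustration; production systems would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data):
    """Base64url-encode bytes without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text):
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(claims, secret, ttl_seconds=3600):
    """Build an HS256-signed JWT carrying `claims` plus an expiry."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = dict(claims, exp=int(time.time()) + ttl_seconds)
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)


def verify_token(token, secret):
    """Return the claims if the signature and expiry are valid, else None."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        expected = hmac.new(
            secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
        ).digest()
        # Constant-time comparison avoids leaking signature bytes via timing.
        if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
            return None
        payload = json.loads(_b64url_decode(payload_b64))
    except (ValueError, TypeError):
        return None
    if payload.get("exp", 0) < time.time():
        return None
    return payload
```

Any backend holding the shared secret (a `bytes` value) can call `verify_token` on its own, so requests can be routed to any server without a session lookup.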

For complete session management patterns and code examples, see references/session-persistence.md.

Global Load Balancing

GeoDNS Routing

Route users to nearest server based on geographic location:

  • DNS returns different IPs based on client location
  • Reduces latency, supports compliance and regional content
  • Implementation: AWS Route 53, GCP Cloud DNS, Azure Traffic Manager
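
The routing decision can be sketched as a nearest-region lookup. This sketch assumes the client's coordinates are already known; real GeoDNS services infer location from the resolver or client subnet using a geo-IP database. The region names, coordinates, and IPs (taken from documentation address ranges) are hypothetical.

```python
import math

# Hypothetical regions: name -> (latitude, longitude, IP from a documentation range)
REGIONS = {
    "us-east": (39.0, -77.5, "192.0.2.10"),
    "eu-west": (53.3, -6.3, "198.51.100.10"),
    "ap-southeast": (1.35, 103.8, "203.0.113.10"),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def resolve(client_lat, client_lon, regions=REGIONS):
    """Return (region, ip) for the region nearest to the client."""
    name = min(
        regions,
        key=lambda r: haversine_km(client_lat, client_lon, regions[r][0], regions[r][1]),
    )
    return name, regions[name][2]
```

Geographic distance is only a proxy for latency; latency-based routing policies measure network performance instead.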

Multi-Region Failover

Primary/secondary region configuration:

  • Health checks determine primary region health
  • Automatic DNS failover to secondary
  • Transparent to clients once cached DNS records expire, so keep TTLs short
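
The failover decision itself is small. A sketch, assuming health check results are already collected into a dictionary; `pick_region` and the region names are hypothetical.

```python
def pick_region(regions, health):
    """Return the first healthy region in priority order, or None.

    `regions` is ordered from primary to last-resort secondary;
    `health` maps a region name to its latest health check result.
    Regions with no recorded result are treated as unhealthy.
    """
    for region in regions:
        if health.get(region, False):
            return region
    return None


order = ["primary", "secondary"]
active = pick_region(order, {"primary": False, "secondary": True})
```

Here `active` is `"secondary"`, and traffic returns to the primary as soon as its health check passes again.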

CDN Integration

Combine load balancing with CDN:

  • GeoDNS routes to closest CDN PoP
  • CDN caches content globally
  • Origin load balancing for cache misses

For complete global load balancing examples with Terraform, see references/global-load-balancing.md.

Decision Frameworks

L4 vs L7 Selection

Choose L4 when:

  • Protocol is TCP/UDP (not HTTP)
  • Ultra-low latency critical (<1ms)
  • High throughput required (millions of requests per second)
  • Client source IP preservation needed

Choose L7 when:

  • Protocol is HTTP/HTTPS
  • Content-based routing needed (URL, headers)
  • SSL termination required
  • WAF integration needed
  • Microservices architecture

Cloud vs Self-Managed

Choose Cloud-Managed when:

  • Single cloud deployment
  • Auto-scaling required
  • Team lacks load balancer expertise
  • Managed service preferred

Choose Self-Managed when:

  • Multi-cloud or hybrid deployment
  • Advanced routing requirements
  • Cost optimization important
  • Full control needed
  • Vendor lock-in avoidance

Self-Managed Selection

  • NGINX: General-purpose, web stacks, HTTP/3 support
  • HAProxy: Maximum performance, database LB, lowest resource usage
  • Envoy: Microservices, service mesh, dynamic configuration
  • Traefik: Docker/Kubernetes, automatic discovery, easy configuration

Configuration Examples

Complete working examples are available in the examples/ directory:

Cloud Providers:

  • examples/aws/alb-terraform.tf - AWS ALB with path-based routing
  • examples/aws/nlb-terraform.tf - AWS NLB for TCP load balancing

Self-Managed:

  • examples/nginx/http-load-balancing.conf - NGINX HTTP reverse proxy
  • examples/haproxy/http-lb.cfg - HAProxy configuration
  • examples/envoy/basic-lb.yaml - Envoy cluster configuration
  • examples/traefik/kubernetes-ingress.yaml - Traefik IngressRoute

Kubernetes:

  • examples/kubernetes/nginx-ingress.yaml - NGINX Ingress with TLS
  • examples/kubernetes/traefik-ingress.yaml - Traefik IngressRoute
  • examples/kubernetes/gateway-api.yaml - Gateway API configuration

Monitoring and Observability

Key Metrics

  • Throughput: Requests per second, bytes transferred, connection rate
  • Latency: Request duration (p50, p95, p99), backend response time, SSL handshake time
  • Errors: HTTP error rates (4xx, 5xx), backend connection failures, health check failures
  • Resource Utilization: CPU, memory, active connections, connection queue depth
  • Health: Healthy/unhealthy backend count, health check success rate
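
As an illustration of the latency percentiles, the sketch below computes p50, p95, and p99 from raw request durations using the nearest-rank definition. This is one of several percentile definitions, and the function names are hypothetical; monitoring systems often approximate percentiles from histograms instead.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile for an integer `pct` between 1 and 100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Smallest rank such that at least pct% of samples are at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(durations_ms):
    """Summarise request durations as p50, p95, and p99."""
    return {f"p{p}": percentile(durations_ms, p) for p in (50, 95, 99)}
```

Tail percentiles matter because an average hides the slow requests that a small share of users actually experience.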

Load Balancer Logs

Enable access logs to capture request/response details, client IPs, response times, and errors:

  • AWS ALB: Store in S3, analyze with Athena
  • NGINX: Custom log format, ship to centralized logging
  • HAProxy: Syslog integration, structured logging

Troubleshooting

Uneven Load Distribution

  • Symptoms: One server receives disproportionate traffic
  • Causes: Sticky sessions with few clients, IP hash with NAT concentration, long-lived connections
  • Solutions: Switch to least connections, disable sticky sessions, implement connection draining

Health Check Flapping

  • Symptoms: Servers rapidly transition between healthy/unhealthy
  • Causes: Health check timeout too short, threshold too low, network instability
  • Solutions: Increase interval and timeout, implement hysteresis, use deep health checks

Session Loss After Failover

  • Symptoms: Users logged out when server fails
  • Causes: Sticky sessions without replication, in-memory sessions
  • Solutions: Implement shared session store (Redis), use client-side tokens (JWT)

Integration Points

Related Skills:

  • infrastructure-as-code - Deploy load balancers via Terraform/Pulumi
  • kubernetes-operations - Ingress controllers for K8s traffic management
  • network-architecture - Network design and topology for load balancing
  • deploying-applications - Blue-green and canary deployments via load balancers
  • observability - Load balancer metrics, access logs, distributed tracing
  • security-hardening - WAF integration, rate limiting, DDoS protection
  • service-mesh - Envoy as both ingress and service mesh proxy
  • implementing-tls - TLS termination and certificate management

Quick Reference

Selection Matrix

| Use Case | Recommended Solution |
|---|---|
| HTTP web app (AWS) | ALB |
| Non-HTTP protocol (AWS) | NLB |
| Kubernetes HTTP ingress | NGINX Ingress or Traefik |
| Maximum performance | HAProxy |
| Service mesh | Envoy |
| Docker Swarm | Traefik |
| Multi-cloud portable | NGINX or HAProxy |
| Global distribution | Cloudflare, AWS Global Accelerator |

Algorithm Selection

| Traffic Pattern | Algorithm |
|---|---|
| Stateless, similar servers | Round Robin |
| Stateless, different capacity | Weighted Round Robin |
| Long-lived connections | Least Connections |
| Performance-sensitive | Least Response Time |
| Session persistence needed | IP Hash or Cookie |
| Varying server load | Resource-Based |

Health Check Configuration

| Service Type | Check Type | Interval | Timeout |
|---|---|---|---|
| Web app | HTTP /health | 10s | 3s |
| API | HTTP /health/ready | 10s | 5s |
| Database | TCP connect | 5s | 2s |
| Critical service | HTTP deep check | 5s | 3s |
| Background worker | HTTP /live | 30s | 5s |

Summary

Load balancing is essential for distributing traffic, ensuring high availability, and enabling horizontal scaling. Choose L4 for raw performance and non-HTTP protocols, L7 for intelligent content-based routing. Prefer cloud-managed load balancers for simplicity and auto-scaling, self-managed for multi-cloud portability and advanced features. Implement proper health checks with hysteresis, avoid sticky sessions when possible, and monitor key metrics continuously.

For deployment patterns, see examples in examples/aws/, examples/nginx/, examples/kubernetes/, and other provider directories.
