devops-patterns
DevOps Patterns
This skill provides comprehensive guidance for implementing DevOps practices, automation, and deployment strategies.
CI/CD Pipeline Design
Pipeline Stages
# Complete CI/CD Pipeline
stages:
- lint # Code quality checks
- test # Run test suite
- build # Build artifacts
- scan # Security scanning
- deploy-dev # Deploy to development
- deploy-staging # Deploy to staging
- deploy-prod # Deploy to production
Pipeline Best Practices
1. Fast Feedback: Run fastest checks first
jobs:
# Quick checks first (1-2 minutes)
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- run: npm run lint
type-check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- run: npm run type-check
# Longer tests after (5-10 minutes)
test:
needs: [lint, type-check]
runs-on: ubuntu-latest
steps:
- run: npm test
2. Fail Fast: Stop pipeline on first failure 3. Idempotent: Running twice produces same result 4. Versioned: Pipeline config in version control
GitHub Actions Patterns
Basic Workflow Structure
# .github/workflows/ci.yml
name: CI
on:
push:
branches: [main, develop]
pull_request:
branches: [main]
env:
NODE_VERSION: '18'
jobs:
test:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Setup Node.js
uses: actions/setup-node@v3
with:
node-version: ${{ env.NODE_VERSION }}
cache: 'npm'
- name: Install dependencies
run: npm ci
- name: Run tests
run: npm test
- name: Upload coverage
uses: codecov/codecov-action@v3
with:
files: ./coverage/coverage-final.json
Reusable Workflows
# .github/workflows/reusable-test.yml
name: Reusable Test Workflow
on:
workflow_call:
inputs:
node-version:
required: true
type: string
secrets:
DATABASE_URL:
required: true
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/setup-node@v3
with:
node-version: ${{ inputs.node-version }}
- run: npm ci
- run: npm test
env:
DATABASE_URL: ${{ secrets.DATABASE_URL }}
# Use in another workflow
# .github/workflows/main.yml
jobs:
call-test:
uses: ./.github/workflows/reusable-test.yml
with:
node-version: '18'
secrets:
DATABASE_URL: ${{ secrets.DATABASE_URL }}
Matrix Strategy
# Test across multiple versions
jobs:
test:
strategy:
matrix:
node-version: [16, 18, 20]
os: [ubuntu-latest, windows-latest, macos-latest]
runs-on: ${{ matrix.os }}
steps:
- uses: actions/setup-node@v3
with:
node-version: ${{ matrix.node-version }}
- run: npm test
Custom Actions
# .github/actions/deploy/action.yml
name: 'Deploy Application'
description: 'Deploy to specified environment'
inputs:
environment:
description: 'Target environment'
required: true
api-key:
description: 'Deployment API key'
required: true
runs:
using: 'composite'
steps:
- run: |
echo "Deploying to ${{ inputs.environment }}"
./deploy.sh ${{ inputs.environment }}
env:
API_KEY: ${{ inputs.api-key }}
shell: bash
# Usage
jobs:
deploy:
steps:
- uses: ./.github/actions/deploy
with:
environment: production
api-key: ${{ secrets.DEPLOY_KEY }}
Conditional Execution
jobs:
deploy:
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
steps:
- name: Deploy to production
run: ./deploy.sh production
notify:
if: failure()
runs-on: ubuntu-latest
steps:
- name: Send failure notification
uses: slack/notify@v2
with:
message: 'Build failed!'
Infrastructure as Code (Terraform)
Project Structure
terraform/
├── modules/
│ ├── vpc/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── eks/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
├── environments/
│ ├── dev/
│ │ ├── main.tf
│ │ └── terraform.tfvars
│ ├── staging/
│ │ ├── main.tf
│ │ └── terraform.tfvars
│ └── prod/
│ ├── main.tf
│ └── terraform.tfvars
└── global/
└── s3/
└── main.tf
VPC Module Example
# modules/vpc/main.tf
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${var.environment}-vpc"
Environment = var.environment
}
}
resource "aws_subnet" "public" {
count = length(var.public_subnet_cidrs)
vpc_id = aws_vpc.main.id
cidr_block = var.public_subnet_cidrs[count.index]
availability_zone = var.availability_zones[count.index]
tags = {
Name = "${var.environment}-public-${count.index + 1}"
}
}
# modules/vpc/variables.tf
variable "environment" {
description = "Environment name"
type = string
}
variable "vpc_cidr" {
description = "CIDR block for VPC"
type = string
}
variable "public_subnet_cidrs" {
description = "CIDR blocks for public subnets"
type = list(string)
}
variable "availability_zones" {
description = "Availability zones"
type = list(string)
}
# modules/vpc/outputs.tf
output "vpc_id" {
value = aws_vpc.main.id
}
output "public_subnet_ids" {
value = aws_subnet.public[*].id
}
Using Modules
# environments/prod/main.tf
terraform {
required_version = ">= 1.0"
backend "s3" {
bucket = "my-terraform-state"
key = "prod/terraform.tfstate"
region = "us-east-1"
}
}
provider "aws" {
region = "us-east-1"
}
module "vpc" {
source = "../../modules/vpc"
environment = "prod"
vpc_cidr = "10.0.0.0/16"
public_subnet_cidrs = ["10.0.1.0/24", "10.0.2.0/24"]
availability_zones = ["us-east-1a", "us-east-1b"]
}
module "eks" {
source = "../../modules/eks"
cluster_name = "prod-cluster"
vpc_id = module.vpc.vpc_id
subnet_ids = module.vpc.public_subnet_ids
node_count = 3
node_instance_type = "t3.large"
}
Docker Best Practices
Multi-Stage Builds
# Build stage
FROM node:18-alpine AS builder
WORKDIR /app
# Copy package files
COPY package*.json ./
# Install dependencies
RUN npm ci --only=production
# Copy source code
COPY . .
# Build application
RUN npm run build
# Production stage
FROM node:18-alpine AS production
WORKDIR /app
# Copy only necessary files from builder
COPY /app/package*.json ./
COPY /app/node_modules ./node_modules
COPY /app/dist ./dist
# Create non-root user
RUN addgroup -g 1001 -S nodejs && \
adduser -S nodejs -u 1001
USER nodejs
EXPOSE 3000
CMD ["node", "dist/index.js"]
Layer Optimization
# ✅ GOOD - Dependencies cached separately
FROM node:18-alpine
WORKDIR /app
# Copy package files first (rarely change)
COPY package*.json ./
RUN npm ci
# Copy source code (changes frequently)
COPY . .
RUN npm run build
# ❌ BAD - Everything in one layer
FROM node:18-alpine
WORKDIR /app
COPY . .
RUN npm ci && npm run build
# Cache invalidated on every source change
Security Best Practices
# ✅ Use specific versions
FROM node:18.17.1-alpine
# ✅ Run as non-root user
RUN addgroup -g 1001 nodejs && \
adduser -S nodejs -u 1001
USER nodejs
# ✅ Use .dockerignore
# .dockerignore:
node_modules
.git
.env
*.md
.github
# ✅ Scan for vulnerabilities
# docker scan myapp:latest
# ✅ Use minimal base images
FROM node:18-alpine # Not node:18 (full)
# ✅ Don't include secrets
# Use build args or runtime env vars
ARG API_KEY
ENV API_KEY=${API_KEY}
Docker Compose for Development
# docker-compose.yml
version: '3.8'
services:
app:
build:
context: .
dockerfile: Dockerfile.dev
ports:
- '3000:3000'
volumes:
- .:/app
- /app/node_modules
environment:
- NODE_ENV=development
- DATABASE_URL=postgresql://user:pass@db:5432/mydb
depends_on:
- db
- redis
db:
image: postgres:15-alpine
ports:
- '5432:5432'
environment:
- POSTGRES_USER=user
- POSTGRES_PASSWORD=pass
- POSTGRES_DB=mydb
volumes:
- postgres_data:/var/lib/postgresql/data
redis:
image: redis:7-alpine
ports:
- '6379:6379'
volumes:
postgres_data:
Kubernetes Patterns
Deployment
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
labels:
app: myapp
spec:
replicas: 3
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
containers:
- name: myapp
image: myapp:1.0.0
ports:
- containerPort: 3000
env:
- name: NODE_ENV
value: production
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: myapp-secrets
key: database-url
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 3000
initialDelaySeconds: 5
periodSeconds: 5
Service
# service.yaml
apiVersion: v1
kind: Service
metadata:
name: myapp-service
spec:
selector:
app: myapp
ports:
- protocol: TCP
port: 80
targetPort: 3000
type: LoadBalancer
ConfigMap and Secrets
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: myapp-config
data:
LOG_LEVEL: info
MAX_CONNECTIONS: "100"
# secret.yaml
apiVersion: v1
kind: Secret
metadata:
name: myapp-secrets
type: Opaque
data:
database-url: cG9zdGdyZXNxbDovL3VzZXI6cGFzc0BkYjU0MzIvbXlkYg==
api-key: c2tfbGl2ZV9hYmMxMjN4eXo=
Ingress
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: myapp-ingress
annotations:
kubernetes.io/ingress.class: nginx
cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
tls:
- hosts:
- myapp.example.com
secretName: myapp-tls
rules:
- host: myapp.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: myapp-service
port:
number: 80
Deployment Strategies
Blue-Green Deployment
# Blue deployment (current production)
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp-blue
spec:
replicas: 3
selector:
matchLabels:
app: myapp
version: blue
template:
metadata:
labels:
app: myapp
version: blue
spec:
containers:
- name: myapp
image: myapp:1.0.0
---
# Green deployment (new version)
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp-green
spec:
replicas: 3
selector:
matchLabels:
app: myapp
version: green
template:
metadata:
labels:
app: myapp
version: green
spec:
containers:
- name: myapp
image: myapp:2.0.0
---
# Service (switch by changing selector)
apiVersion: v1
kind: Service
metadata:
name: myapp-service
spec:
selector:
app: myapp
version: blue # Change to 'green' to switch
ports:
- port: 80
targetPort: 3000
Canary Deployment
# Stable deployment (90% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp-stable
spec:
replicas: 9
selector:
matchLabels:
app: myapp
track: stable
---
# Canary deployment (10% traffic)
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp-canary
spec:
replicas: 1
selector:
matchLabels:
app: myapp
track: canary
---
# Service routes to both
apiVersion: v1
kind: Service
metadata:
name: myapp-service
spec:
selector:
app: myapp # Matches both stable and canary
ports:
- port: 80
targetPort: 3000
Rolling Update
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
spec:
replicas: 10
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 2 # Max 2 extra pods during update
maxUnavailable: 1 # Max 1 pod unavailable during update
selector:
matchLabels:
app: myapp
template:
spec:
containers:
- name: myapp
image: myapp:2.0.0
Database Migration Strategies
Forward-Only Migrations
// ✅ GOOD - Backwards compatible
// Step 1: Add new column (nullable)
await db.schema.alterTable('users', (table) => {
table.string('phone_number').nullable();
});
// Step 2: Populate data
await db('users').update({
phone_number: db.raw('contact_info'),
});
// Step 3: Make non-nullable (separate deployment)
await db.schema.alterTable('users', (table) => {
table.string('phone_number').notNullable().alter();
});
// Step 4: Drop old column (separate deployment)
await db.schema.alterTable('users', (table) => {
table.dropColumn('contact_info');
});
Zero-Downtime Migrations
// Rename column without downtime
// Migration 1: Add new column
await db.schema.alterTable('users', (table) => {
table.string('email_address').nullable();
});
// Update application code to write to both columns
class User {
async save() {
await db('users').update({
email: this.email,
email_address: this.email, // Write to both
});
}
}
// Migration 2: Backfill data
await db.raw(`
UPDATE users
SET email_address = email
WHERE email_address IS NULL
`);
// Migration 3: Update app to read from new column
class User {
get email() {
return this.email_address; // Read from new column
}
}
// Migration 4: Drop old column
await db.schema.alterTable('users', (table) => {
table.dropColumn('email');
});
Environment Management
Environment Configuration
// config/environments.ts
interface EnvironmentConfig {
database: {
host: string;
port: number;
name: string;
};
api: {
baseUrl: string;
timeout: number;
};
features: {
enableNewFeature: boolean;
};
}
const environments: Record<string, EnvironmentConfig> = {
development: {
database: {
host: 'localhost',
port: 5432,
name: 'myapp_dev',
},
api: {
baseUrl: 'http://localhost:3000',
timeout: 30000,
},
features: {
enableNewFeature: true,
},
},
staging: {
database: {
host: 'staging-db.example.com',
port: 5432,
name: 'myapp_staging',
},
api: {
baseUrl: 'https://staging-api.example.com',
timeout: 10000,
},
features: {
enableNewFeature: true,
},
},
production: {
database: {
host: process.env.DB_HOST!,
port: parseInt(process.env.DB_PORT!),
name: 'myapp_prod',
},
api: {
baseUrl: 'https://api.example.com',
timeout: 5000,
},
features: {
enableNewFeature: false,
},
},
};
export const config = environments[process.env.NODE_ENV || 'development'];
Monitoring
Prometheus Metrics
import prometheus from 'prom-client';
// Create metrics
const httpRequestDuration = new prometheus.Histogram({
name: 'http_request_duration_seconds',
help: 'Duration of HTTP requests in seconds',
labelNames: ['method', 'route', 'status'],
});
const httpRequestTotal = new prometheus.Counter({
name: 'http_requests_total',
help: 'Total number of HTTP requests',
labelNames: ['method', 'route', 'status'],
});
// Middleware to track metrics
app.use((req, res, next) => {
const start = Date.now();
res.on('finish', () => {
const duration = (Date.now() - start) / 1000;
httpRequestDuration
.labels(req.method, req.route?.path || req.path, res.statusCode.toString())
.observe(duration);
httpRequestTotal
.labels(req.method, req.route?.path || req.path, res.statusCode.toString())
.inc();
});
next();
});
// Expose metrics endpoint
app.get('/metrics', async (req, res) => {
res.set('Content-Type', prometheus.register.contentType);
res.end(await prometheus.register.metrics());
});
Grafana Dashboard
{
"dashboard": {
"title": "Application Metrics",
"panels": [
{
"title": "Request Rate",
"targets": [
{
"expr": "rate(http_requests_total[5m])"
}
]
},
{
"title": "Response Time (p95)",
"targets": [
{
"expr": "histogram_quantile(0.95, http_request_duration_seconds_bucket)"
}
]
},
{
"title": "Error Rate",
"targets": [
{
"expr": "rate(http_requests_total{status=~\"5..\"}[5m])"
}
]
}
]
}
}
Log Aggregation
// Winston logger with JSON format
import winston from 'winston';
const logger = winston.createLogger({
level: 'info',
format: winston.format.combine(
winston.format.timestamp(),
winston.format.errors({ stack: true }),
winston.format.json()
),
defaultMeta: {
service: 'myapp',
environment: process.env.NODE_ENV,
},
transports: [
new winston.transports.File({ filename: 'error.log', level: 'error' }),
new winston.transports.File({ filename: 'combined.log' }),
],
});
// Structured logging
logger.info('User logged in', {
userId: user.id,
email: user.email,
ip: req.ip,
});
Disaster Recovery
Backup Strategy
#!/bin/bash
# backup-database.sh
# Configuration
DB_HOST="${DB_HOST}"
DB_NAME="${DB_NAME}"
BACKUP_DIR="/backups"
S3_BUCKET="s3://my-backups"
RETENTION_DAYS=30
# Create backup
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="${BACKUP_DIR}/${DB_NAME}_${TIMESTAMP}.sql.gz"
# Dump database
pg_dump -h "${DB_HOST}" -U postgres "${DB_NAME}" | gzip > "${BACKUP_FILE}"
# Upload to S3
aws s3 cp "${BACKUP_FILE}" "${S3_BUCKET}/"
# Remove local backup
rm "${BACKUP_FILE}"
# Delete old backups from S3
aws s3 ls "${S3_BUCKET}/" | while read -r line; do
FILE_DATE=$(echo "$line" | awk '{print $1}')
FILE_NAME=$(echo "$line" | awk '{print $4}')
FILE_EPOCH=$(date -d "$FILE_DATE" +%s)
CURRENT_EPOCH=$(date +%s)
DAYS_OLD=$(( (CURRENT_EPOCH - FILE_EPOCH) / 86400 ))
if [ $DAYS_OLD -gt $RETENTION_DAYS ]; then
aws s3 rm "${S3_BUCKET}/${FILE_NAME}"
fi
done
Recovery Plan
## Disaster Recovery Plan
### RTO (Recovery Time Objective): 4 hours
### RPO (Recovery Point Objective): 1 hour
### Recovery Steps:
1. **Assess the situation**
- Identify scope of failure
- Notify stakeholders
2. **Restore database**
```bash
# Download latest backup
aws s3 cp s3://my-backups/latest.sql.gz /tmp/
# Restore database
gunzip -c /tmp/latest.sql.gz | psql -h new-db -U postgres myapp
-
Deploy application
# Deploy to new infrastructure kubectl apply -f k8s/production/ # Update DNS aws route53 change-resource-record-sets ... -
Verify recovery
- Run smoke tests
- Check monitoring dashboards
- Verify critical features
-
Post-mortem
- Document incident
- Identify root cause
- Create action items
## When to Use This Skill
Use this skill when:
- Setting up CI/CD pipelines
- Deploying applications
- Managing infrastructure
- Implementing deployment strategies
- Configuring monitoring
- Planning disaster recovery
- Containerizing applications
- Orchestrating with Kubernetes
- Automating workflows
- Scaling infrastructure
---
**Remember**: DevOps is about automation, reliability, and continuous improvement. Invest in your infrastructure and deployment processes to enable faster, safer releases.
More from webdevtodayjason/titanium-plugins
frontend-patterns
Modern frontend architecture patterns for React, Next.js, and TypeScript including component composition, state management, performance optimization, accessibility, and responsive design. Use when building UI components, implementing frontend features, optimizing performance, or working with React/Next.js applications.
9api-best-practices
REST API design patterns, OpenAPI specifications, versioning strategies, authentication, error handling, and security best practices. Use when designing APIs, creating endpoints, documenting APIs, or implementing backend services that expose HTTP APIs.
4project-planning
Project planning methodologies including work breakdown structure, task estimation, dependency management, risk assessment, sprint planning, and stakeholder communication. Use when breaking down projects, estimating work, planning sprints, or managing dependencies.
2testing-strategy
Comprehensive testing strategies including test pyramid, TDD methodology, testing patterns, coverage goals, and CI/CD integration. Use when writing tests, implementing TDD, reviewing test coverage, debugging test failures, or setting up testing infrastructure.
2code-quality-standards
Code quality standards including SOLID principles, design patterns, code smells, refactoring techniques, naming conventions, and technical debt management. Use when reviewing code, refactoring, ensuring quality, or detecting code smells.
2security-checklist
Comprehensive security checklist covering OWASP Top 10, SQL injection, XSS, CSRF, authentication, authorization, secrets management, input validation, and security headers. Use when scanning for vulnerabilities, reviewing security, implementing authentication/authorization, or handling sensitive data.
2