ci-cd-specialist

Installation

SKILL.md

CI/CD Specialist

Senior CI/CD engineer specializing in GitHub Actions, release workflows, deployment strategies, hotfix processes, containerization, and build pipeline automation.

Role Definition

You are a senior CI/CD engineer who builds reliable, fast, and secure build/deploy pipelines. You automate everything from linting to production deployment. You design for fail-fast feedback, caching, security scanning, and safe rollback.

Core Principles

Fail fast — catch errors in the cheapest stage (lint → test → build → deploy)
Reproducible builds — same commit always produces the same artifact
Automate everything — manual steps are failure points and bottlenecks
Never expose secrets — in logs, artifacts, or error messages
Always have a rollback — every deployment must be reversible
Version everything — code, config, infrastructure, dependencies

GitHub Actions: Complete CI Pipeline

# .github/workflows/ci.yml
name: CI

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true  # Cancel stale runs on same branch

env:
  NODE_VERSION: '20'

jobs:
  # ── Lint & Type Check (fastest feedback) ──────────────────
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'
      - run: npm ci
      - run: npm run lint
      - run: npm run typecheck

  # ── Unit Tests (parallel with lint) ───────────────────────
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3]  # Parallel test shards
    services:
      postgres:
        image: postgres:16-alpine
        env:
          POSTGRES_PASSWORD: test
          POSTGRES_DB: test_db
        ports: ['5432:5432']
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'
      - run: npm ci
      - run: npm run db:migrate
        env:
          DATABASE_URL: postgres://postgres:test@localhost:5432/test_db
      - run: npm test -- --shard=${{ matrix.shard }}/3 --coverage
        env:
          DATABASE_URL: postgres://postgres:test@localhost:5432/test_db
      - uses: actions/upload-artifact@v4
        with:
          name: coverage-${{ matrix.shard }}
          path: coverage/

  # ── Security Scan (parallel) ──────────────────────────────
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm audit --audit-level=high
      - uses: github/codeql-action/init@v3
        with:
          languages: javascript
      - uses: github/codeql-action/analyze@v3

  # ── Build (after lint + test pass) ────────────────────────
  build:
    needs: [lint, test]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ env.NODE_VERSION }}
          cache: 'npm'
      - run: npm ci
      - run: npm run build
      - uses: actions/upload-artifact@v4
        with:
          name: build-output
          path: dist/
          retention-days: 7

  # ── Docker Image (only on main) ───────────────────────────
  docker:
    needs: [build, security]
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: |
            ghcr.io/${{ github.repository }}:${{ github.sha }}
            ghcr.io/${{ github.repository }}:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max

Production Dockerfile (Multi-Stage)

# ── Stage 1: Dependencies ──────────────────────────────────
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --only=production && \
    cp -R node_modules /prod_modules && \
    npm ci  # Full install for build stage

# ── Stage 2: Build ─────────────────────────────────────────
FROM node:20-alpine AS build
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build

# ── Stage 3: Production Runtime ────────────────────────────
FROM node:20-alpine AS runtime
WORKDIR /app

# Security: non-root user
RUN addgroup -g 1001 appgroup && \
    adduser -u 1001 -G appgroup -s /bin/sh -D appuser

# Copy only production dependencies + built output
COPY --from=deps /prod_modules ./node_modules
COPY --from=build /app/dist ./dist
COPY package.json ./

# Health check
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD wget -qO- http://localhost:3000/health || exit 1

USER appuser
EXPOSE 3000
CMD ["node", "dist/server.js"]

Docker Compose for Local Development

# docker-compose.yml
services:
  app:
    build:
      context: .
      target: build  # Use build stage for dev (includes devDeps)
    ports:
      - "3000:3000"
    volumes:
      - ./src:/app/src  # Hot reload
    environment:
      - NODE_ENV=development
      - DATABASE_URL=postgres://postgres:postgres@db:5432/app_dev
      - REDIS_URL=redis://redis:6379
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: app_dev
    ports:
      - "5432:5432"
    volumes:
      - pgdata:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 5

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"

volumes:
  pgdata:

Release Automation

Semantic Release Workflow

# .github/workflows/release.yml
name: Release

on:
  push:
    branches: [main]

permissions:
  contents: write
  packages: write

jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # Full history for changelog generation

      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: 'npm'

      - run: npm ci
      - run: npm test
      - run: npm run build

      # Determine version bump from conventional commits
      - name: Determine version
        id: version
        run: |
          # Analyze commits since last tag
          LAST_TAG=$(git describe --tags --abbrev=0 2>/dev/null || echo "v0.0.0")
          COMMITS=$(git log ${LAST_TAG}..HEAD --pretty=format:"%s")

          if echo "$COMMITS" | grep -qE "^feat!:|^BREAKING CHANGE:"; then
            echo "bump=major" >> $GITHUB_OUTPUT
          elif echo "$COMMITS" | grep -qE "^feat(\(.+\))?:"; then
            echo "bump=minor" >> $GITHUB_OUTPUT
          elif echo "$COMMITS" | grep -qE "^fix(\(.+\))?:"; then
            echo "bump=patch" >> $GITHUB_OUTPUT
          else
            echo "bump=none" >> $GITHUB_OUTPUT
          fi

      - name: Bump version
        if: steps.version.outputs.bump != 'none'
        run: |
          npm version ${{ steps.version.outputs.bump }} --no-git-tag-version
          VERSION=$(node -p "require('./package.json').version")
          echo "VERSION=$VERSION" >> $GITHUB_ENV

      - name: Generate changelog
        if: steps.version.outputs.bump != 'none'
        run: |
          LAST_TAG=$(git describe --tags --abbrev=0 2>/dev/null || echo "v0.0.0")
          {
            echo "## What's Changed"
            echo ""
            git log ${LAST_TAG}..HEAD --pretty=format:"- %s (%h)" | \
              sed 's/^- feat/- ✨ feat/; s/^- fix/- 🐛 fix/; s/^- docs/- 📚 docs/; s/^- perf/- ⚡ perf/'
          } > RELEASE_NOTES.md

      - name: Commit, tag, and push
        if: steps.version.outputs.bump != 'none'
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add package.json package-lock.json
          git commit -m "chore: release v${VERSION}"
          git tag "v${VERSION}"
          git push origin main --tags

      - name: Create GitHub Release
        if: steps.version.outputs.bump != 'none'
        uses: softprops/action-gh-release@v1
        with:
          tag_name: v${{ env.VERSION }}
          body_path: RELEASE_NOTES.md
          generate_release_notes: true

Conventional Commits Reference

feat:     New feature                    → MINOR bump (1.0.0 → 1.1.0)
fix:      Bug fix                        → PATCH bump (1.0.0 → 1.0.1)
feat!:    Breaking feature               → MAJOR bump (1.0.0 → 2.0.0)
docs:     Documentation only             → no release
style:    Formatting, whitespace         → no release
refactor: Code restructuring             → no release
perf:     Performance improvement        → PATCH bump
test:     Adding/fixing tests            → no release
chore:    Build, tooling, deps           → no release

# Examples
feat(auth): add OAuth2 login support
fix(api): handle null response from payment gateway
feat!: rename User.fullName to User.displayName

BREAKING CHANGE: User.fullName has been renamed to User.displayName.
Update all references in your code.

Deployment Strategies

Blue-Green Deployment

# .github/workflows/deploy-blue-green.yml
name: Deploy (Blue-Green)

on:
  workflow_dispatch:
    inputs:
      version:
        description: 'Version to deploy'
        required: true

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - name: Deploy to inactive environment
        run: |
          # Determine which environment is inactive
          ACTIVE=$(aws elbv2 describe-target-groups \
            --names "app-blue" "app-green" \
            --query "TargetGroups[?length(LoadBalancerArns)>`0`].TargetGroupName" \
            --output text)

          if [ "$ACTIVE" = "app-blue" ]; then
            DEPLOY_TO="green"
          else
            DEPLOY_TO="blue"
          fi

          echo "Deploying v${{ inputs.version }} to $DEPLOY_TO"

          # Deploy new version to inactive environment
          aws ecs update-service \
            --cluster production \
            --service "app-${DEPLOY_TO}" \
            --task-definition "app:${{ inputs.version }}" \
            --desired-count 3

      - name: Health check inactive environment
        run: |
          for i in {1..30}; do
            if curl -sf "https://${DEPLOY_TO}.internal.example.com/health"; then
              echo "Health check passed"
              exit 0
            fi
            sleep 10
          done
          echo "Health check failed"
          exit 1

      - name: Switch traffic
        run: |
          # Update ALB listener to point to new target group
          aws elbv2 modify-listener \
            --listener-arn $LISTENER_ARN \
            --default-actions Type=forward,TargetGroupArn=$NEW_TG_ARN

      - name: Verify production
        run: |
          sleep 30
          curl -sf https://api.example.com/health
          # Run smoke tests
          npm run test:smoke:production

      - name: Scale down old environment
        run: |
          aws ecs update-service \
            --cluster production \
            --service "app-${OLD_ENV}" \
            --desired-count 0

Canary Deployment

# Kubernetes canary with progressive traffic shifting
# canary-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-canary
  labels:
    app: myapp
    track: canary
spec:
  replicas: 1  # Start with 1 canary pod
  selector:
    matchLabels:
      app: myapp
      track: canary
  template:
    metadata:
      labels:
        app: myapp
        track: canary
    spec:
      containers:
        - name: app
          image: ghcr.io/org/app:v1.3.0  # New version
          ports:
            - containerPort: 3000
          readinessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 10
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
---
# Istio VirtualService for traffic splitting
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: app-routing
spec:
  hosts: [api.example.com]
  http:
    - route:
        - destination:
            host: app-stable
            port:
              number: 3000
          weight: 90  # 90% to stable
        - destination:
            host: app-canary
            port:
              number: 3000
          weight: 10  # 10% to canary

Hotfix Process

#!/bin/bash
# scripts/hotfix.sh — Guided hotfix workflow
set -e

echo "🚨 HOTFIX WORKFLOW"
echo "=================="

# 1. Branch from latest production tag
LATEST_TAG=$(git describe --tags --abbrev=0)
echo "Latest production tag: $LATEST_TAG"

read -p "Hotfix branch name (e.g., fix-auth-bypass): " BRANCH_NAME
git checkout "$LATEST_TAG"
git checkout -b "hotfix/$BRANCH_NAME"

echo ""
echo "📝 Make your fix now, then run this script again with --continue"
echo "   Remember: MINIMAL changes only. No refactoring."
echo ""

if [ "$1" = "--continue" ]; then
  # 2. Run targeted tests
  echo "Running tests..."
  npm test

  # 3. Version bump (patch)
  npm version patch --no-git-tag-version
  VERSION=$(node -p "require('./package.json').version")

  # 4. Commit, tag, push
  git add -A
  git commit -m "hotfix: $BRANCH_NAME

Fixes critical issue in production.
Version: $VERSION"

  git tag "v${VERSION}"
  git push origin "hotfix/$BRANCH_NAME" "v${VERSION}"

  # 5. Merge back to main
  echo "Creating PR to merge hotfix back to main..."
  gh pr create \
    --base main \
    --title "hotfix: merge $BRANCH_NAME back to main" \
    --body "Automated PR to merge hotfix v${VERSION} back to main branch."

  echo ""
  echo "✅ Hotfix v${VERSION} tagged and pushed."
  echo "   Deploy will trigger automatically from the tag."
  echo "   Don't forget to merge the PR back to main!"
fi

Rollback Script

#!/bin/bash
# scripts/rollback.sh — Emergency rollback
set -e

echo "🔄 ROLLBACK PROCEDURE"
echo "====================="

# Show recent versions
echo "Recent versions:"
git tag --sort=-version:refname | head -5
echo ""

CURRENT=$(curl -sf https://api.example.com/version 2>/dev/null || echo "unknown")
echo "Current production version: $CURRENT"
echo ""

read -p "Rollback to version (e.g., v1.2.3): " TARGET_VERSION

# Validate version exists
if ! git rev-parse "$TARGET_VERSION" >/dev/null 2>&1; then
  echo "❌ Version $TARGET_VERSION not found"
  exit 1
fi

echo ""
echo "⚠️  Rolling back from $CURRENT to $TARGET_VERSION"
read -p "Are you sure? (type 'rollback' to confirm): " CONFIRM
if [ "$CONFIRM" != "rollback" ]; then
  echo "Cancelled."
  exit 0
fi

# Check for database migrations between versions
echo "Checking for database migrations..."
MIGRATION_COUNT=$(git diff --name-only "$TARGET_VERSION"..HEAD -- 'migrations/' | wc -l)
if [ "$MIGRATION_COUNT" -gt 0 ]; then
  echo "⚠️  WARNING: $MIGRATION_COUNT migration(s) exist between versions."
  echo "   Database rollback may cause data loss."
  read -p "Continue anyway? (yes/no): " DB_CONFIRM
  if [ "$DB_CONFIRM" != "yes" ]; then
    echo "Cancelled. Consider a forward fix instead."
    exit 0
  fi
fi

# Execute rollback
echo "Deploying $TARGET_VERSION..."
kubectl set image deployment/app app="ghcr.io/org/app:${TARGET_VERSION}" \
  --record
kubectl rollout status deployment/app --timeout=300s

# Verify
echo "Verifying..."
sleep 15
NEW_VERSION=$(curl -sf https://api.example.com/version)
if [ "$NEW_VERSION" = "$TARGET_VERSION" ]; then
  echo "✅ Rollback to $TARGET_VERSION successful"
else
  echo "❌ Version mismatch: expected $TARGET_VERSION, got $NEW_VERSION"
  echo "   Manual intervention may be required."
  exit 1
fi

Pipeline Optimization Tips

# Speed up CI with these patterns:

# 1. Cancel stale runs
concurrency:
  group: ci-${{ github.ref }}
  cancel-in-progress: true

# 2. Cache aggressively
- uses: actions/cache@v4
  with:
    path: |
      ~/.npm
      node_modules
    key: deps-${{ hashFiles('package-lock.json') }}

# 3. Run independent jobs in parallel (not sequential)
jobs:
  lint:    ...
  test:    ...
  security: ...
  build:
    needs: [lint, test]  # Only build depends on lint+test

# 4. Use matrix for parallel test shards
test:
  strategy:
    matrix:
      shard: [1, 2, 3, 4]
  steps:
    - run: npm test -- --shard=${{ matrix.shard }}/4

# 5. Skip jobs when only docs changed
test:
  if: |
    !contains(github.event.head_commit.message, '[skip ci]') &&
    !startsWith(github.event.head_commit.message, 'docs:')

# 6. Docker layer caching
- uses: docker/build-push-action@v5
  with:
    cache-from: type=gha
    cache-to: type=gha,mode=max

Anti-Patterns to Avoid

❌ No concurrency control — stale CI runs waste resources and confuse results
❌ Sequential jobs that could run parallel — lint/test/security are independent
❌ Building Docker images without cache — multi-stage + layer caching saves minutes
❌ Secrets in code or logs — use GitHub secrets + mask in output
❌ No rollback plan — if you can't undo it, don't deploy it
❌ Manual version bumping — use conventional commits + automated semver
❌ Deploying without health checks — verify the new version actually works
❌ Skipping staging — "it works in CI" is not enough
❌ Massive infrequent releases — small frequent releases are safer and easier to debug
❌ No deployment notifications — the team should know when and what was deployed

Adapted from buildwithclaude by Dave Poon (MIT)

Related skills

More from jgarrison929/openclaw-skills

Installs

Repository

jgarrison929/op…w-skills

GitHub Stars

First Seen

Mar 1, 2026

Security Audits

Gen Agent Trust HubPass

SocketWarn

SnykWarn