CI/CD Expert
Create, debug, and optimize CI/CD pipelines.
Platform Selection
| Platform | Best For | Key Strength |
|---|---|---|
| GitHub Actions | GitHub-hosted repos, open source | Native GitHub integration, marketplace |
| GitLab CI | GitLab repos, self-hosted | Built-in registry, Auto DevOps |
| CircleCI | Complex workflows, speed | Parallelism, orbs ecosystem |
| Azure DevOps | Microsoft/enterprise, multi-repo | Azure integration, YAML templates |
| Bitbucket Pipelines | Atlassian stack, Jira integration | Pipes marketplace, deployments |
Quick pick:
- GitHub repo? → GitHub Actions
- GitLab repo? → GitLab CI
- Need extreme parallelism? → CircleCI
- Azure/Microsoft shop? → Azure DevOps
- Using Jira/Confluence? → Bitbucket Pipelines
Core Workflows
1. Pipeline Creation
Requirements → Platform Selection → Structure → Implementation → Validation
Steps:
- Identify triggers (push, PR, schedule, manual)
- Define stages (build, test, deploy)
- Map environments (dev, staging, prod)
- Configure secrets and variables
- Set up caching strategy
- Implement deployment gates
Minimal viable pipeline:
# Every pipeline needs these elements
triggers:       # When to run
stages:         # What to run (in order)
  - build       # Compile/bundle
  - test        # Validate
  - deploy      # Ship (optional)
caching:        # Speed optimization
environment:    # Secrets/variables
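As one possible concrete mapping, the elements above could look like this in GitHub Actions. This is a minimal sketch, assuming a Node.js project whose package.json defines build and test scripts; the file name, workflow name, job name, Node version, and the NODE_ENV variable are illustrative, not taken from the skill itself.

```yaml
# .github/workflows/ci.yml (illustrative file name)
name: CI

on:                          # triggers: when to run
  push:
    branches: [main]
  pull_request:

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    env:                     # environment: secrets/variables
      NODE_ENV: test         # hypothetical variable
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
          cache: npm         # caching: reuses the npm download cache
      - run: npm ci
      - run: npm run build   # stage: build (assumes a "build" script)
      - run: npm test        # stage: test (assumes a "test" script)
```

A deploy stage is left out here because it is optional and depends on the target environment.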
2. Pipeline Review
Review checklist (in order of priority):
- Security
  - Secrets not hardcoded
  - Minimal permissions (least privilege)
  - Dependencies pinned (no @latest)
  - Untrusted input sanitized
- Reliability
  - Idempotent steps
  - Explicit failure handling
  - Timeouts configured
  - Retry logic for flaky steps
- Efficiency
  - Caching implemented
  - Parallelization where possible
  - Conditional execution (skip unchanged)
  - Resource sizing appropriate
- Maintainability
  - DRY (reusable workflows/templates)
  - Clear naming conventions
  - Comments on non-obvious logic
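A few of the security and reliability items can be illustrated in GitHub Actions syntax. The sketch below assumes a pull-request workflow; the workflow name, job name, and the 15-minute timeout are arbitrary choices for illustration.

```yaml
name: hardened-ci            # illustrative name

on:
  pull_request:

permissions:
  contents: read             # least privilege for the default GITHUB_TOKEN

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 15      # fail early instead of waiting for the platform default
    steps:
      - uses: actions/checkout@v4
      - name: Use untrusted input safely
        env:
          PR_TITLE: ${{ github.event.pull_request.title }}
        # Passing the value through env keeps it out of the script text,
        # so it is handled as data rather than interpolated as shell code.
        run: printf 'PR title: %s\n' "$PR_TITLE"
```

The same ideas (scoped tokens, explicit timeouts, no direct interpolation of user-controlled values into scripts) apply on other platforms, with different syntax.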
3. Pipeline Debugging
Diagnostic framework:
Symptom → Category → Platform-specific fix
Common failure categories:
| Symptom | Likely Cause | First Check |
|---|---|---|
| "Command not found" | Missing dependency | Environment setup |
| "Permission denied" | File/secret access | Permissions config |
| "Connection refused" | Service not ready | Health checks, wait |
| "Out of memory" | Resource limits | Runner sizing |
| "Timeout exceeded" | Slow step or deadlock | Step isolation |
| "Exit code 1" | Script failure | Run locally first |
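For the "Connection refused" row, one common approach in GitHub Actions is to attach a health check to the service container so that steps start only once the service is ready. This is a sketch assuming a PostgreSQL dependency and a test suite that connects to localhost:5432; the job name and the credential are placeholders.

```yaml
jobs:
  integration:
    runs-on: ubuntu-latest
    services:
      postgres:
        image: postgres:16
        env:
          POSTGRES_PASSWORD: postgres   # throwaway credential for CI only
        ports:
          - 5432:5432
        # Docker health check options; the job's steps start only after
        # the container reports healthy.
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
    steps:
      - uses: actions/checkout@v4
      - run: npm test                   # assumes tests connect to localhost:5432
```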
Debug approach:
- Read the full error (scroll up!)
- Identify which step failed
- Check if it works locally
- Add debug output (set -x, verbose flags)
- Check platform-specific logs
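The "add debug output" step might look like the following in a GitHub Actions job. It is a sketch only: scripts/deploy.sh and its --verbose flag are hypothetical stand-ins for whatever command is failing.

```yaml
steps:
  - uses: actions/checkout@v4
  - name: Show tool versions
    run: node --version && npm --version   # confirms what the runner actually has
  - name: Deploy (with tracing)
    shell: bash
    run: |
      set -euxo pipefail                   # -x prints each command before it runs
      ./scripts/deploy.sh --verbose        # hypothetical script and flag
```

On GitHub Actions, re-running a job with debug logging enabled (or setting the ACTIONS_STEP_DEBUG secret or variable to true) produces additional runner-level diagnostics. Remove set -x once the problem is found, since tracing can echo values you would rather keep out of logs.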
4. Deployment Strategy Selection
                 Risk Tolerance
                        ↑
              Low       │      High
                        │
      ┌─────────────────┼─────────────────┐
      │                 │                 │
 Slow │   Blue-Green    │     Rolling     │ Fast
      │    (safest)     │   (balanced)    │
      │                 │                 │
      ├─────────────────┼─────────────────┤
      │                 │                 │
      │     Canary      │     Direct      │
      │   (measured)    │    (fastest)    │
      │                 │                 │
      └─────────────────┴─────────────────┘
                 Rollback Speed →
Decision guide:
| Strategy | Use When | Avoid When |
|---|---|---|
| Blue-Green | Zero downtime critical, simple rollback needed | Limited infrastructure budget |
| Canary | Need to validate with real traffic, gradual rollout | Urgent hotfix, no traffic splitting |
| Rolling | Large clusters, cost-conscious | Stateful apps, breaking changes |
| Direct | Dev/test environments, low risk | Production, user-facing services |
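As an illustration of the Rolling row, a Kubernetes Deployment can express a rolling strategy declaratively. This is a minimal sketch assuming a Kubernetes target; the names, image reference, port, and probe path are all placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                  # hypothetical name
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1            # at most one extra pod during the rollout
      maxUnavailable: 0      # never drop below the desired replica count
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: registry.example.com/web:1.2.3   # placeholder image
          readinessProbe:    # new pods receive traffic only once ready
            httpGet:
              path: /healthz
              port: 8080
```

Blue-green and canary rollouts usually need traffic switching or splitting on top of this, which is typically handled by a load balancer, ingress controller, or service mesh rather than by the Deployment alone.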
5. Build Optimization
Optimization priority:
- Caching (highest impact)
- Parallelization
- Conditional execution
- Resource sizing
- Image optimization
Caching hierarchy:
Dependencies (npm, pip) → Build artifacts → Docker layers → Test fixtures
           ↑                     ↑                ↑               ↑
     Always cache          If build slow   If using Docker  If tests slow
Cache key strategies:
# Dependencies - hash lockfile
key: deps-${{ hashFiles('package-lock.json') }}
# Build artifacts - hash source
key: build-${{ hashFiles('src/**') }}
# Combined - runner OS plus lockfile hash
key: ${{ runner.os }}-deps-${{ hashFiles('**/package-lock.json') }}
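Put together in a full cache step, a key like the ones above is often paired with restore-keys so that a changed lockfile still starts from the most recent older cache. The sketch assumes npm and caches its download directory (~/.npm); the key prefix is an arbitrary naming choice.

```yaml
- uses: actions/cache@v4
  with:
    path: ~/.npm
    # Exact match: same OS and identical lockfile contents
    key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
    # Fallback: most recent cache for this OS, even if the lockfile changed
    restore-keys: |
      ${{ runner.os }}-npm-
```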
Common Patterns
Secret Management
# ✅ Good: Environment variables
env:
  API_KEY: ${{ secrets.API_KEY }}
# ❌ Bad: Inline secrets
run: curl -H "Authorization: hardcoded-secret"
Secret rotation checklist:
- Secrets stored in platform secret store
- Secrets scoped to minimum required (repo/org/env)
- Rotation procedure documented
- No secrets in logs (mask sensitive output)
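For the last item, GitHub Actions masks values read from the secrets context automatically, but values generated at runtime need to be masked explicitly. A minimal sketch, assuming a hypothetical scripts/get-token.sh helper that prints a short-lived token:

```yaml
steps:
  - uses: actions/checkout@v4
  - name: Fetch and mask a short-lived token
    shell: bash
    run: |
      TOKEN="$(./scripts/get-token.sh)"            # hypothetical helper
      echo "::add-mask::$TOKEN"                    # later log output shows *** instead
      echo "DEPLOY_TOKEN=$TOKEN" >> "$GITHUB_ENV"  # available to subsequent steps
```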
Environment Progression
┌─────────┐     ┌─────────┐     ┌─────────┐     ┌─────────┐
│   Dev   │  →  │ Staging │  →  │   QA    │  →  │  Prod   │
└─────────┘     └─────────┘     └─────────┘     └─────────┘
   Auto            Auto           Manual          Manual
  deploy          deploy         approval        approval
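In GitHub Actions, this progression can be approximated with jobs bound to environments. The sketch below shows only two of the stages and assumes a hypothetical scripts/deploy.sh; the manual approval itself is not written in YAML but comes from a "required reviewers" protection rule configured on the production environment in the repository settings.

```yaml
jobs:
  deploy-staging:
    runs-on: ubuntu-latest
    environment: staging       # no protection rules: deploys automatically
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh staging      # hypothetical script

  deploy-prod:
    needs: deploy-staging      # runs only after staging succeeds
    runs-on: ubuntu-latest
    environment: production    # waits for approval if reviewers are required
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh production   # hypothetical script
```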
Artifact Management
# Upload
- name: Upload build
  uses: actions/upload-artifact@v4
  with:
    name: dist
    path: dist/
    retention-days: 7
# Download (later job)
- name: Download build
  uses: actions/download-artifact@v4
  with:
    name: dist
Matrix Builds
# Test across multiple versions
strategy:
  matrix:
    node: [18, 20, 22]
    os: [ubuntu-latest, macos-latest]
  fail-fast: false # Continue other jobs if one fails
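The matrix values only take effect once the job references them. A sketch of a complete job, assuming a Node.js project with a test script; with three Node versions and two operating systems it expands to six jobs.

```yaml
jobs:
  test:
    runs-on: ${{ matrix.os }}             # one job per os/node combination
    strategy:
      fail-fast: false
      matrix:
        node: [18, 20, 22]
        os: [ubuntu-latest, macos-latest]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node }}
      - run: npm ci
      - run: npm test                     # assumes a "test" script
```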
Anti-Patterns
❌ Secrets in Logs
# BAD
run: echo "Deploying with key $API_KEY"
# GOOD
run: echo "Deploying..." && deploy --key "$API_KEY"
❌ No Caching
# BAD: Downloads deps every run
- run: npm install
# GOOD: Cache npm's download cache (npm ci deletes node_modules, so caching that directory is wasted)
- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: ${{ runner.os }}-npm-${{ hashFiles('package-lock.json') }}
- run: npm ci
❌ Mutable Tags
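# BAD: Branch ref, could change anytime
uses: actions/checkout@master
# BETTER: Major version tag (stable, but the maintainer can still move it)
uses: actions/checkout@v4
# BEST: Full commit SHA (immutable; placeholder shown)
uses: actions/checkout@<full-commit-sha>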
❌ Overly Broad Triggers
# BAD: Runs on every push to every branch
on: push
# GOOD: Specific triggers
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
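Triggers can be narrowed further with path filters, and superseded runs can be cancelled. The sketch below assumes source code lives under src/ and that dependency changes show up in package-lock.json; both paths are illustrative.

```yaml
on:
  pull_request:
    branches: [main]
    paths:                     # run only when these files change
      - 'src/**'
      - 'package-lock.json'

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true     # a newer push cancels the older run for the same ref
```

One caveat: if a path-filtered workflow is also a required status check, pull requests that skip it may be left waiting on a check that never reports, so the two settings need to be planned together.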
❌ Sequential When Parallel Works
# BAD: 15 min total
jobs:
  lint:
    ...
  test:
    needs: lint # Waits for lint
    ...
# GOOD: 10 min total (parallel)
jobs:
  lint:
    ...
  test:
    ... # Runs alongside lint
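Parallel jobs can still gate a later job by fanning in with needs. A sketch assuming lint and test npm scripts and a hypothetical scripts/deploy.sh:

```yaml
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run lint      # assumes a "lint" script

  test:                        # no "needs", so it starts alongside lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm test          # assumes a "test" script

  deploy:
    needs: [lint, test]        # waits for both parallel jobs to succeed
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/deploy.sh   # hypothetical script
```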
Troubleshooting Quick Reference
| Error Pattern | Platform | Solution |
|---|---|---|
| EACCES: permission denied | All | chmod +x script.sh or check file ownership |
| rate limit exceeded | GitHub | Authenticate with GITHUB_TOKEN, implement backoff |
| no space left on device | All | Clean workspace, prune unused images and caches, or use a runner with more disk |
| SSL certificate problem | All | Update CA certs, check proxy settings |
| container not found | Docker-based | Check registry auth, confirm the image and tag exist |
| workflow not triggered | GitHub | Check trigger conditions, branch patterns |
| job timeout | All | Increase timeout, optimize slow steps |
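Where the table suggests backoff, a plain shell loop inside a step is often enough. This is a sketch with arbitrary limits (5 attempts, delay doubling from 2 seconds) and a placeholder URL; it is not tied to any particular API.

```yaml
steps:
  - name: Call API with retries
    shell: bash
    run: |
      attempt=1
      max_attempts=5
      delay=2
      # Retry until curl succeeds or the attempt limit is reached
      until curl --fail --silent --show-error https://api.example.com/health; do
        if [ "$attempt" -ge "$max_attempts" ]; then
          echo "Giving up after $attempt attempts" >&2
          exit 1
        fi
        echo "Attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        attempt=$((attempt + 1))
        delay=$((delay * 2))   # exponential backoff: 2s, 4s, 8s, 16s
      done
```

Retries are best reserved for steps that are known to be flaky for external reasons; wrapping a deterministic failure in a retry loop only makes the job slower.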
Integration Points
With version control:
- Branch protection rules
- Required status checks
- Auto-merge on success
With deployment targets:
- Cloud providers (AWS, GCP, Azure)
- Container registries (Docker Hub, ECR, GCR)
- Kubernetes clusters
- Serverless (Lambda, Cloud Functions)
With monitoring:
- Build metrics (duration, success rate)
- Deployment tracking
- Error alerting (Slack, PagerDuty)
Skill Usage
Creating a pipeline:
- Identify platform from repo hosting
- Read platform-specific reference
- Follow creation workflow above
- Validate with platform's linter
Debugging a failure:
- Categorize the error (table above)
- Read debugging-guide.md for detailed steps
- Check platform-specific quirks
Implementing deployments:
- Choose strategy using decision guide
- Read deployment-strategies.md
- Implement with platform-specific syntax
Security audit:
- Use review checklist above
- Read security-checklist.md for comprehensive review
- Generate ohno tasks for findings
References:
- references/github-actions.md — GitHub Actions syntax, patterns, marketplace actions
- references/gitlab-ci.md — GitLab CI syntax, Auto DevOps, runners
- references/circleci.md — CircleCI orbs, parallelism, caching
- references/azure-devops.md — Azure Pipelines, templates, stages
- references/bitbucket-pipelines.md — Bitbucket Pipelines, pipes, deployments
- references/deployment-strategies.md — Blue-green, canary, rolling implementations
- references/debugging-guide.md — Platform-specific debugging techniques
- references/security-checklist.md — OWASP CI/CD security, supply chain
Repository: srstomp/pokayokay
First seen: Jan 24, 2026