devops-workflow-engineer
DevOps Workflow Engineer
The agent generates GitHub Actions workflow YAML, analyzes existing pipelines for optimization opportunities, and creates deployment plans with strategy selection, health checks, and rollback procedures.
Quick Start
# Generate a CI workflow
python scripts/workflow_generator.py --type ci --language python --test-framework pytest
# Analyze existing pipelines for optimization
python scripts/pipeline_analyzer.py .github/workflows/ --format json
# Plan a deployment strategy
python scripts/deployment_planner.py --type webapp --environments dev,staging,prod --strategy canary
Tools Overview
| Tool | Input | Output |
|---|---|---|
| `workflow_generator.py` | Workflow type + language | GitHub Actions YAML (ci, cd, release, security-scan, docs-check) |
| `pipeline_analyzer.py` | Workflow file or directory | Optimization findings, cost estimates, severity ratings |
| `deployment_planner.py` | Project type + environments | Deployment plan with strategy, health checks, rollback |
All tools support --format json and --output for file writing.
Workflow 1: CI Pipeline Design
The agent generates pipelines following fail-fast ordering:
- Lint and format (~30s) -- cheapest gate first
- Unit tests (~2-5m) -- matrix across versions
- Build verification (~3-8m)
- Integration tests (~5-15m, parallel with build)
- Security scanning (~2-5m)
jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make lint
  test:
    needs: lint
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.10', '3.11', '3.12']
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "${{ matrix.python-version }}", cache: pip }
      - run: pip install -r requirements.txt
      - run: pytest --junitxml=results.xml
  security:
    needs: lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install pip-audit
      - run: pip-audit -r requirements.txt
CI targets:
| Metric | Target | Fix |
|---|---|---|
| Total CI time | < 10 min | Parallelize, add caching |
| Lint step | < 1 min | Use pre-commit locally |
| Unit tests | < 5 min | Split suites, use matrix |
| Flaky rate | < 1% | Quarantine flaky tests |
| Cache hit rate | > 80% | Review cache keys |
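The targets above can be checked mechanically once the metrics are collected. The following is a minimal sketch, not part of the bundled scripts: the metric names (`total_ci_minutes`, `cache_hit_rate_percent`, and so on) are hypothetical, and only the limits and fixes come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    name: str               # hypothetical metric key, not a name used by the tools
    limit: float            # threshold from the CI targets table
    higher_is_better: bool  # True only for cache hit rate
    fix: str                # suggested fix from the table


CI_TARGETS = [
    Target("total_ci_minutes", 10.0, False, "Parallelize, add caching"),
    Target("lint_minutes", 1.0, False, "Use pre-commit locally"),
    Target("unit_test_minutes", 5.0, False, "Split suites, use matrix"),
    Target("flaky_rate_percent", 1.0, False, "Quarantine flaky tests"),
    Target("cache_hit_rate_percent", 80.0, True, "Review cache keys"),
]


def missed_targets(measured: dict) -> list:
    """Return (metric, suggested fix) for every measured metric that misses its target."""
    misses = []
    for target in CI_TARGETS:
        if target.name not in measured:
            continue  # metric was not measured, so there is nothing to report
        value = measured[target.name]
        ok = value > target.limit if target.higher_is_better else value < target.limit
        if not ok:
            misses.append((target.name, target.fix))
    return misses


report = missed_targets(
    {"total_ci_minutes": 14.0, "lint_minutes": 0.5, "cache_hit_rate_percent": 62.0}
)
```

Here `report` lists the total CI time and the cache hit rate, each paired with the fix from the table.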
Workflow 2: CD Pipeline and Multi-Environment Deployment
python scripts/deployment_planner.py --type webapp --environments dev,staging,prod --format json
Environment promotion flow:
Build -> Dev (auto) -> Staging (auto) -> Production (manual approval)
                                              |
                                         Canary (10%) -> Full rollout
| Aspect | Dev | Staging | Production |
|---|---|---|---|
| Trigger | Every push | Merge to main | Manual approval |
| Replicas | 1 | 2 | 3+ (auto-scaled) |
| Secrets | Repository | Environment | Vault/OIDC |
| Monitoring | Basic logs | Full observability | Full + alerting |
Key CD rules:
- Build once, deploy the same artifact everywhere
- Tag artifacts with commit SHA for traceability
- Use environment protection rules for production gates
- Maintain rollback capability at every stage
Workflow 3: Pipeline Optimization
python scripts/pipeline_analyzer.py .github/workflows/ --format json --output report.json
The agent checks for:
- Missing caching -- dependencies reinstalled every run
- No timeouts -- stuck jobs burn budget
- Sequential chains that could parallelize
- Deprecated actions with newer versions available
- Security issues -- secrets in logs, missing permissions scoping
- Cost inefficiency -- oversized runners, no path filtering
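To illustrate one of these checks, here is a minimal sketch of a "no timeouts" detector. It is an assumption about how such a check could work, not the actual logic of `pipeline_analyzer.py`, and it takes a workflow that has already been parsed from YAML into plain dictionaries.

```python
def jobs_missing_timeout(workflow: dict) -> list:
    """Return the names of jobs that do not set timeout-minutes."""
    jobs = workflow.get("jobs") or {}
    return sorted(
        name
        for name, job in jobs.items()
        if isinstance(job, dict)
        and "uses" not in job  # jobs that call a reusable workflow cannot set timeout-minutes
        and "timeout-minutes" not in job
    )


# A workflow as it might look after YAML parsing (illustrative data only).
example = {
    "jobs": {
        "lint": {"runs-on": "ubuntu-latest", "timeout-minutes": 5},
        "test": {"runs-on": "ubuntu-latest"},
        "security": {"runs-on": "ubuntu-latest"},
        "deploy": {"uses": "./.github/workflows/reusable-deploy.yml"},
    }
}

print(jobs_missing_timeout(example))  # ['security', 'test']
```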
Optimization techniques:
Path-based filtering -- skip CI for docs-only changes:
on:
  push:
    # paths and paths-ignore cannot be combined for the same event;
    # this allowlist already skips docs-only changes.
    paths: ['src/**', 'tests/**', 'requirements*.txt']
Concurrency cancellation -- cancel superseded runs:
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
Dependency caching:
- uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-deps-${{ hashFiles('**/requirements.txt') }}
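The reason a content-hashed key works is that it changes only when the dependency files change. The sketch below imitates that behavior locally. It approximates the idea behind `hashFiles()` and does not reproduce the exact digest GitHub computes; `cache_key` is a hypothetical helper.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_key(os_name: str, files: list) -> str:
    """Return '<os>-deps-<digest>', where the digest depends only on file contents."""
    combined = hashlib.sha256()
    for path in sorted(files):
        # Hash each file, then fold the per-file digests into one.
        combined.update(hashlib.sha256(path.read_bytes()).digest())
    return f"{os_name}-deps-{combined.hexdigest()}"


with tempfile.TemporaryDirectory() as tmp:
    req = Path(tmp) / "requirements.txt"
    req.write_text("requests==2.31.0\n")
    first = cache_key("Linux", [req])
    unchanged = cache_key("Linux", [req])  # same contents -> same key -> cache hit
    req.write_text("requests==2.32.0\n")
    bumped = cache_key("Linux", [req])     # contents changed -> new key -> cache miss

print(first == unchanged, first == bumped)  # True False
```

A key built from a timestamp or run ID would differ on every run, which is why it never produces a cache hit.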
Deployment Strategies
Decision tree:
Zero-downtime required?
  No  -> Rolling deployment
  Yes -> Need instant rollback?
    No  -> Rolling with health checks
    Yes -> Budget for 2x infrastructure?
      Yes -> Blue-green
      No  -> Canary
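The decision tree can be written as a small function. This is a sketch only; the function and parameter names are hypothetical, and the branches mirror the tree above.

```python
def choose_strategy(
    zero_downtime: bool,
    instant_rollback: bool = False,
    double_infra_budget: bool = False,
) -> str:
    """Walk the deployment decision tree and return the strategy name."""
    if not zero_downtime:
        return "rolling"
    if not instant_rollback:
        return "rolling with health checks"
    # Instant rollback is needed: blue-green if 2x infrastructure is affordable.
    return "blue-green" if double_infra_budget else "canary"


print(choose_strategy(zero_downtime=True, instant_rollback=True))  # canary
```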
Canary traffic split schedule:
| Phase | % | Duration | Gate |
|---|---|---|---|
| 1 | 5% | 15 min | Error rate < 0.1% |
| 2 | 25% | 30 min | P99 latency < 200ms |
| 3 | 50% | 60 min | Business metrics stable |
| 4 | 100% | -- | Full promotion |
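A gate evaluator for this schedule might look like the sketch below. The metric keys (`error_rate_percent`, `p99_latency_ms`, `business_metrics_stable`) are hypothetical, and treating a failed gate as an immediate rollback is an assumption; the table itself only lists the gates.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Phase:
    percent: int                   # share of traffic sent to the canary
    minutes: Optional[int]         # observation window; None for the final phase
    gate: Callable[[dict], bool]   # must hold before moving on


PHASES = [
    Phase(5, 15, lambda m: m["error_rate_percent"] < 0.1),
    Phase(25, 30, lambda m: m["p99_latency_ms"] < 200),
    Phase(50, 60, lambda m: bool(m["business_metrics_stable"])),
    Phase(100, None, lambda m: True),  # full promotion has no gate
]


def next_action(phase_index: int, metrics: dict) -> str:
    """Decide what to do at the end of a phase's observation window."""
    phase = PHASES[phase_index]
    if not phase.gate(metrics):
        return "rollback"  # assumption: any failed gate aborts the rollout
    if phase_index + 1 < len(PHASES):
        return f"advance to {PHASES[phase_index + 1].percent}%"
    return "complete"


print(next_action(0, {"error_rate_percent": 0.05}))  # advance to 25%
print(next_action(1, {"p99_latency_ms": 250}))       # rollback
```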
GitHub Actions Patterns
Reusable workflows -- define once, call everywhere:
# .github/workflows/reusable-deploy.yml
on:
  workflow_call:
    inputs:
      environment: { required: true, type: string }
      image_tag: { required: true, type: string }
    secrets:
      DEPLOY_KEY: { required: true }
OIDC authentication -- no long-lived credentials:
permissions:
  id-token: write
  contents: read
steps:
  - uses: aws-actions/configure-aws-credentials@v4
    with:
      # Placeholder 12-digit account ID; replace with your own.
      role-to-assume: arn:aws:iam::123456789012:role/github-actions
      aws-region: us-east-1
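Secrets hierarchy: secrets can be defined at the organization, repository, and environment level. When the same name exists at several levels, the most specific one wins (environment over repository over organization). Never echo secrets; use add-mask for dynamic values. Prefer OIDC for cloud auth.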
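When a secret with the same name exists at more than one level, GitHub uses the most specific one: environment, then repository, then organization. The sketch below models that lookup order with plain dictionaries; `resolve_secret` and the sample values are hypothetical and exist only to show the precedence.

```python
def resolve_secret(name, organization, repository, environment=None):
    """Return the value of `name` from the most specific scope that defines it."""
    for scope in (environment or {}, repository, organization):
        if name in scope:
            return scope[name]
    raise KeyError(name)


org = {"DEPLOY_KEY": "org-value", "NPM_TOKEN": "org-npm"}
repo = {"DEPLOY_KEY": "repo-value"}
prod = {"DEPLOY_KEY": "prod-value"}

print(resolve_secret("DEPLOY_KEY", org, repo, prod))  # prod-value
print(resolve_secret("DEPLOY_KEY", org, repo))        # repo-value
print(resolve_secret("NPM_TOKEN", org, repo, prod))   # org-npm
```

This is also why a job that does not declare its `environment:` sees the repository or organization value rather than the environment one.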
Runner Cost Optimization
| Runner | vCPU | RAM | Cost/min | Best For |
|---|---|---|---|---|
| 2-core | 2 | 7 GB | $0.008 | Standard tasks |
| 4-core | 4 | 16 GB | $0.016 | Build-heavy |
| 8-core | 8 | 32 GB | $0.032 | Large compilations |
| 16-core | 16 | 64 GB | $0.064 | Parallel test suites |
Monthly estimate: (runs/day) x (avg min/run) x 30 x (cost/min)
Example: 50 pushes/day x 8 min x 30 days = 12,000 min; at the 2-core rate of $0.008/min, that is $96/month.
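The estimate is easy to script for comparing runner sizes. A minimal sketch, using the per-minute rates from the table above; the function name is hypothetical, and it assumes a 30-day month as in the formula.

```python
# Per-minute rates copied from the runner table above.
COST_PER_MINUTE = {"2-core": 0.008, "4-core": 0.016, "8-core": 0.032, "16-core": 0.064}


def monthly_cost(runs_per_day: float, avg_minutes: float, runner: str = "2-core", days: int = 30) -> float:
    """Estimate monthly spend: runs/day x avg min/run x days x cost/min."""
    total_minutes = runs_per_day * avg_minutes * days
    return round(total_minutes * COST_PER_MINUTE[runner], 2)


print(monthly_cost(50, 8))            # 96.0
print(monthly_cost(50, 8, "4-core"))  # 192.0
```

The second call assumes the run still takes 8 minutes on the larger runner. In practice a larger runner may finish sooner, so plug in the measured duration for each size.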
Anti-Patterns
| Anti-Pattern | Problem | Fix |
|---|---|---|
| Monolithic workflow | 45-min single workflow | Split into parallel jobs |
| No caching | Reinstall deps every run | Cache dependencies and builds |
| Secrets in logs | Leaked credentials | add-mask, avoid echo |
| No timeout | Stuck jobs burn budget | timeout-minutes on every job |
| Full matrix every push | 30-min matrix on every commit | Full nightly; reduced on push |
| No rollback plan | Stuck with broken deploy | Automate rollback in CD pipeline |
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Workflow never triggers | Wrong `on:` config or branch name mismatch | Verify triggers match branching strategy |
| Cache miss every run | Volatile cache key (timestamp) | Use hashFiles() on lock files |
| Matrix fails on one OS only | Platform-specific paths or deps | Use shell: bash; install OS deps per matrix entry |
| Secret not available | Wrong environment scope | Ensure job declares correct environment: |
| Health check fails after deploy | App not started before check | Add retry loop with backoff |
| Concurrency cancels needed runs | Overly broad group key | Scope the group to workflow and ref; use separate groups for deploy jobs |
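For the failing health check case, a retry loop with exponential backoff gives the app time to start. The sketch below is one possible shape, not the logic of `deployment_planner.py`; the `check` callable stands in for whatever probe the deployment uses, and the default attempt count and delay are illustrative.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s, ... with the defaults
    return False


# Example: the app becomes healthy on the third probe. A recording stub replaces
# time.sleep so the example runs instantly.
delays = []
probes = iter([False, False, True])
healthy = wait_until_healthy(lambda: next(probes), sleep=delays.append)
print(healthy, delays)  # True [2.0, 4.0]
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. A `False` return is the signal to trigger the rollback procedure.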
References
| Guide | Path |
|---|---|
| GitHub Actions Patterns | references/github-actions-patterns.md |
| Deployment Strategies | references/deployment-strategies.md |
| Agentic Workflows Guide | references/agentic-workflows-guide.md |
Integration Points
| Skill | Integration |
|---|---|
| `release-orchestrator` | Release workflows align with versioning and changelog |
| `senior-devops` | Deployment strategies complement infra automation |
| `senior-secops` | Security scanning steps feed SecOps dashboards |
| `senior-qa` | CI quality gates map to QA acceptance criteria |
| `incident-commander` | Rollback procedures connect to incident playbooks |
Last Updated: April 2026 | Version: 1.1.0