DevOps Workflow Engineer

The agent generates GitHub Actions workflow YAML, analyzes existing pipelines for optimization opportunities, and creates deployment plans with strategy selection, health checks, and rollback procedures.

Quick Start

# Generate a CI workflow
python scripts/workflow_generator.py --type ci --language python --test-framework pytest

# Analyze existing pipelines for optimization
python scripts/pipeline_analyzer.py .github/workflows/ --format json

# Plan a deployment strategy
python scripts/deployment_planner.py --type webapp --environments dev,staging,prod --strategy canary

Tools Overview

Tool	Input	Output
`workflow_generator.py`	Workflow type + language	GitHub Actions YAML (ci, cd, release, security-scan, docs-check)
`pipeline_analyzer.py`	Workflow file or directory	Optimization findings, cost estimates, severity ratings
`deployment_planner.py`	Project type + environments	Deployment plan with strategy, health checks, rollback

All tools accept --format json for machine-readable output and --output <file> (abbreviated -o in the examples below) to write the result to a file.

Workflow 1: CI Pipeline Design

The agent generates pipelines following fail-fast ordering:

Lint and format (~30s) -- cheapest gate first
Unit tests (~2-5m) -- matrix across versions
Build verification (~3-8m)
Integration tests (~5-15m, parallel with build)
Security scanning (~2-5m)

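jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make lint

  # Runs only after lint passes; one job per Python version.
  test:
    needs: lint
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.10', '3.11', '3.12']
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: "${{ matrix.python-version }}", cache: pip }
      - run: pip install -r requirements.txt
      - run: pytest --junitxml=results.xml

  # Also gated on lint, so it runs in parallel with test.
  security:
    needs: lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install pip-audit
      - run: pip-audit -r requirements.txt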

CI targets:

Metric	Target	Fix
Total CI time	< 10 min	Parallelize, add caching
Lint step	< 1 min	Use pre-commit locally
Unit tests	< 5 min	Split suites, use matrix
Flaky rate	< 1%	Quarantine flaky tests
Cache hit rate	> 80%	Review cache keys

Workflow 2: CD Pipeline and Multi-Environment Deployment

python scripts/deployment_planner.py --type webapp --environments dev,staging,prod --format json

Environment promotion flow:

Build -> Dev (auto) -> Staging (auto) -> Production (manual approval)
                                              |
                                        Canary (10%) -> Full rollout

Aspect	Dev	Staging	Production
Trigger	Every push	Merge to main	Manual approval
Replicas	1	2	3+ (auto-scaled)
Secrets	Repository	Environment	Vault/OIDC
Monitoring	Basic logs	Full observability	Full + alerting

Key CD rules:

Build once, deploy the same artifact everywhere
Tag artifacts with commit SHA for traceability
Use environment protection rules for production gates
Maintain rollback capability at every stage
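
The "tag artifacts with commit SHA" rule can be sketched in a few lines of Python. This is only an illustration: the helper name `artifact_tag` and the registry path are hypothetical, and it assumes 40-character SHA-1 commit IDs such as the value GitHub Actions exposes in `GITHUB_SHA`.

```python
def artifact_tag(image: str, sha: str) -> str:
    """Build an image reference tagged with the full commit SHA."""
    # Assumption: commits are identified by 40 lowercase hex characters (SHA-1).
    if len(sha) != 40 or any(c not in "0123456789abcdef" for c in sha):
        raise ValueError(f"not a full commit SHA: {sha!r}")
    return f"{image}:{sha}"


# Hypothetical image name and commit, for illustration only.
example_sha = "0123456789abcdef0123456789abcdef01234567"
print(artifact_tag("ghcr.io/example/webapp", example_sha))
```

Because the tag is derived only from the commit, the same reference can be promoted unchanged from dev to staging to production, which is what "build once" relies on.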

Workflow 3: Pipeline Optimization

python scripts/pipeline_analyzer.py .github/workflows/ --format json -o report.json

The agent checks for:

Missing caching -- dependencies reinstalled every run
No timeouts -- stuck jobs burn budget
Sequential chains that could parallelize
Deprecated actions with newer versions available
Security issues -- secrets in logs, missing permissions scoping
Cost inefficiency -- oversized runners, no path filtering
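
To make the "No timeouts" check concrete, the sketch below scans an already-parsed workflow (a plain dict, as a YAML loader would produce) for jobs that do not set `timeout-minutes`. The function name `jobs_missing_timeout` and the sample data are hypothetical; this is not the code in `pipeline_analyzer.py`, only one way such a check might look.

```python
def jobs_missing_timeout(workflow: dict) -> list[str]:
    """Return the names of jobs that do not set timeout-minutes."""
    jobs = workflow.get("jobs", {})
    return sorted(
        name
        for name, job in jobs.items()
        # Jobs that call a reusable workflow (job-level `uses:`) are skipped,
        # since the timeout belongs on the jobs inside the called workflow.
        if "uses" not in job and "timeout-minutes" not in job
    )


# Hypothetical parsed workflow, for illustration only.
sample = {
    "jobs": {
        "lint": {"runs-on": "ubuntu-latest", "timeout-minutes": 5},
        "test": {"runs-on": "ubuntu-latest"},
        "deploy": {"uses": "./.github/workflows/reusable-deploy.yml"},
    }
}
print(jobs_missing_timeout(sample))  # ['test']
```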

Optimization techniques:

Path-based filtering -- skip CI for docs-only changes:

on:
  push:
    # paths and paths-ignore cannot be combined on the same event.
    # Listing only the code paths already skips docs-only changes.
    paths: ['src/**', 'tests/**', 'requirements*.txt']

Concurrency cancellation -- cancel superseded runs:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

Dependency caching:

- uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-deps-${{ hashFiles('**/requirements.txt') }}

Deployment Strategies

Decision tree:

Zero-downtime required?
  No  -> Rolling deployment
  Yes -> Need instant rollback?
    No  -> Rolling with health checks
    Yes -> Budget for 2x infrastructure?
      Yes -> Blue-green
      No  -> Canary
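
The decision tree above maps directly onto a small function. This is a minimal sketch; the function and parameter names are hypothetical and are not taken from `deployment_planner.py`.

```python
def choose_strategy(
    zero_downtime: bool,
    instant_rollback: bool,
    double_infra_budget: bool,
) -> str:
    """Walk the decision tree from top to bottom and return a strategy name."""
    if not zero_downtime:
        return "rolling"
    if not instant_rollback:
        return "rolling with health checks"
    # Instant rollback is needed: blue-green if a second full environment
    # is affordable, otherwise canary.
    return "blue-green" if double_infra_budget else "canary"


print(choose_strategy(zero_downtime=True, instant_rollback=True, double_infra_budget=False))
# canary
```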

Canary traffic split schedule:

Phase	%	Duration	Gate
1	5%	15 min	Error rate < 0.1%
2	25%	30 min	P99 latency < 200ms
3	50%	60 min	Business metrics stable
4	100%	--	Full promotion
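
One way to read the schedule above is as a list of phases, each with a gate that must pass before traffic moves to the next step. The sketch below encodes that reading. The metric keys (`error_rate_pct`, `p99_latency_ms`, `business_metrics_stable`) are hypothetical, and dropping to 0% when a gate fails is an assumption, since the table does not say what happens on failure.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Phase:
    traffic_percent: int
    duration_min: int
    gate: Callable[[dict], bool]  # True means the phase may be promoted


# Phases 1-3 from the table; phase 4 (100%) is the final promotion.
PHASES = [
    Phase(5, 15, lambda m: m["error_rate_pct"] < 0.1),
    Phase(25, 30, lambda m: m["p99_latency_ms"] < 200),
    Phase(50, 60, lambda m: bool(m["business_metrics_stable"])),
]


def next_traffic_percent(phase_index: int, metrics: dict) -> int:
    """Return the traffic share to apply once the given phase has finished."""
    if not PHASES[phase_index].gate(metrics):
        return 0  # assumption: a failed gate rolls the canary back entirely
    if phase_index + 1 < len(PHASES):
        return PHASES[phase_index + 1].traffic_percent
    return 100  # last gate passed: full promotion


print(next_traffic_percent(0, {"error_rate_pct": 0.05}))  # 25
```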

GitHub Actions Patterns

Reusable workflows -- define once, call everywhere:

# .github/workflows/reusable-deploy.yml
on:
  workflow_call:
    inputs:
      environment: { required: true, type: string }
      image_tag: { required: true, type: string }
    secrets:
      DEPLOY_KEY: { required: true }

OIDC authentication -- no long-lived credentials:

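permissions:
  id-token: write   # allows the job to request an OIDC token
  contents: read
jobs:
  deploy:           # example job name
    runs-on: ubuntu-latest
    steps:
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          # Placeholder account ID; AWS account IDs are 12 digits.
          role-to-assume: arn:aws:iam::123456789012:role/github-actions
          aws-region: us-east-1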

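Secrets hierarchy: Organization > Repository > Environment, from broadest to narrowest scope. When the same secret name is defined at more than one level, the narrowest scope wins. Never echo secrets; use add-mask for dynamic values. Prefer OIDC for cloud auth.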

Runner Cost Optimization

Runner	vCPU	RAM	Cost/min	Best For
2-core	2	7 GB	$0.008	Standard tasks
4-core	4	16 GB	$0.016	Build-heavy
8-core	8	32 GB	$0.032	Large compilations
16-core	16	64 GB	$0.064	Parallel test suites

Monthly estimate: (runs/day) x (avg min/run) x 30 x (cost/min)

Example: 50 pushes/day x 8 min x 30 days = 12,000 min; 12,000 min x $0.008/min = $96/month.
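
The estimate can be reproduced with a short helper. This is a sketch only: the per-minute rates are copied from the runner table above, and the function name `monthly_cost` is hypothetical.

```python
# Per-minute rates copied from the runner table above.
COST_PER_MIN = {
    "2-core": 0.008,
    "4-core": 0.016,
    "8-core": 0.032,
    "16-core": 0.064,
}


def monthly_cost(runs_per_day: int, avg_minutes: float,
                 runner: str = "2-core", days: int = 30) -> float:
    """Estimate monthly runner spend in dollars, rounded to cents."""
    total_minutes = runs_per_day * avg_minutes * days
    return round(total_minutes * COST_PER_MIN[runner], 2)


print(monthly_cost(50, 8))            # 96.0
print(monthly_cost(50, 8, "4-core"))  # 192.0
```

Rerunning the same workload on a larger runner only pays off if the run time drops by more than the price ratio, which the helper makes easy to compare.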

Anti-Patterns

Anti-Pattern	Problem	Fix
Monolithic workflow	45-min single workflow	Split into parallel jobs
No caching	Reinstall deps every run	Cache dependencies and builds
Secrets in logs	Leaked credentials	`add-mask`, avoid `echo`
No timeout	Stuck jobs burn budget	`timeout-minutes` on every job
Full matrix every push	30-min matrix on every commit	Full nightly; reduced on push
No rollback plan	Stuck with broken deploy	Automate rollback in CD pipeline

Troubleshooting

Problem	Cause	Solution
Workflow never triggers	Wrong `on:` config or branch name mismatch	Verify triggers match branching strategy
Cache miss every run	Volatile cache key (timestamp)	Use `hashFiles()` on lock files
Matrix fails on one OS only	Platform-specific paths or deps	Use `shell: bash`; install OS deps per matrix entry
Secret not available	Wrong environment scope	Ensure job declares correct `environment:`
Health check fails after deploy	App not started before check	Add retry loop with backoff
Concurrency cancels needed runs	Overly broad group key	Scope the group to `${{ github.workflow }}-${{ github.ref }}`; use a separate group for deploys
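
For the "Health check fails after deploy" row, a retry loop with exponential backoff might look like the sketch below. The function name `wait_until_healthy` and its defaults are hypothetical; `check` stands in for whatever probe the deployment uses, such as an HTTP request to a health endpoint.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:  # no point sleeping after the last try
            sleep(base_delay * 2 ** attempt)
    return False


# Demo with a fake probe that succeeds on the third call; sleeping is
# stubbed out so the example finishes immediately.
responses = iter([False, False, True])
print(wait_until_healthy(lambda: next(responses), sleep=lambda _s: None))  # True
```

Passing `sleep` as a parameter keeps the loop testable without real delays; in a pipeline the default `time.sleep` would be used.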

References

Guide	Path
GitHub Actions Patterns	`references/github-actions-patterns.md`
Deployment Strategies	`references/deployment-strategies.md`
Agentic Workflows Guide	`references/agentic-workflows-guide.md`

Integration Points

Skill	Integration
`release-orchestrator`	Release workflows align with versioning and changelog
`senior-devops`	Deployment strategies complement infra automation
`senior-secops`	Security scanning steps feed SecOps dashboards
`senior-qa`	CI quality gates map to QA acceptance criteria
`incident-commander`	Rollback procedures connect to incident playbooks

Last Updated: April 2026
Version: 1.1.0
