managing-pipelines
Pipeline Management
Decision guidance for GitHub Actions CI/CD pipelines, deployment strategies, and infrastructure automation.
Contents
- When to use which pattern
- Security quick reference
- Performance quick reference
- Workflow architecture quick reference
- Deployment quick reference
- Infrastructure as code quick reference
- Observability quick reference
- Cross-pipeline conventions
- Pipeline debugging checklist
When to use which pattern
| Scenario | Reference | Why |
|---|---|---|
| Hardening against supply chain attacks | Security | SHA pinning, permissions, OIDC |
| Speeding up slow CI builds | Performance | Caching, matrix builds, concurrency |
| DRY-ing up duplicated workflow YAML | Workflow architecture | Reusable workflows vs composite actions |
| Setting up staging/production deploys | Deployment | Environment promotion, protection rules |
| Adding Terraform/OpenTofu to CI | Infrastructure as code | Plan-on-PR, apply-on-merge, drift detection |
| Tracking pipeline reliability | Observability | OTel, DORA metrics, SLOs |
| Reviewing a PR that modifies workflows | Security + Workflow architecture | Permissions audit, secret exposure review |
| Debugging flaky pipelines | Observability + Performance | Metrics, cache hit rates, concurrency |
| Migrating from Jenkins/CircleCI | Workflow architecture | Action patterns, reusable workflow design |
| Setting up monorepo CI | Performance | Path filtering, selective job execution |
Security quick reference
Use for: Preventing supply chain attacks, minimizing credential exposure, hardening runner environments.
Key decisions:
- Pin all third-party actions to full commit SHAs, not tags
- Set org-level default token permissions to read-only
- Use OIDC for cloud auth instead of stored credentials
- Never use `pull_request_target` without understanding the security model
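The decisions above can be combined in one workflow. What follows is a minimal sketch, not a hardened template: it assumes AWS as the cloud provider, and the file name, role ARN, and region are placeholders. The checkout SHA is believed to correspond to the tag in the trailing comment; verify it against the action's releases before relying on it.

```yaml
# Hypothetical file: .github/workflows/deploy-staging.yml
name: deploy-staging
on:
  push:
    branches: [main]

# Grant nothing at the workflow level; each job opts in to what it needs.
permissions: {}

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: read   # required by checkout
      id-token: write  # allows the job to request an OIDC token
    steps:
      # Pinned to a full commit SHA; the tag is kept as a comment for readers.
      - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1

      # Exchanges the OIDC token for short-lived cloud credentials.
      # Replace <full-commit-sha> with the SHA of a release you have reviewed.
      - uses: aws-actions/configure-aws-credentials@<full-commit-sha>
        with:
          role-to-assume: arn:aws:iam::123456789012:role/example-deploy  # placeholder ARN
          aws-region: us-east-1                                          # placeholder region

      # Confirms which role was assumed; no long-lived keys are stored as secrets.
      - run: aws sts get-caller-identity
```

The trigger is `push` rather than `pull_request_target`, so code from forks never runs with access to the deploy role.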
See references/security-hardening.md for attack patterns and mitigations.
Performance quick reference
Use for: Reducing CI times, optimizing runner costs, parallelizing builds.
Key decisions:
- Cache dependency installs AND build artifacts (not just `node_modules`)
- Use `fail-fast: false` for CI matrices, `true` for deployment
- Set concurrency groups with `cancel-in-progress: true` for CI, `false` for deploys
- Use path filtering in monorepos to skip irrelevant jobs
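As a rough illustration of how these settings fit together in a CI workflow, consider the sketch below. It assumes a Node.js project with a lockfile at the repository root; the `packages/api` paths and the `.build-cache` directory are hypothetical. Action tags are shown for readability on everything except checkout; in practice each would be pinned to a full commit SHA.

```yaml
# Hypothetical file: .github/workflows/ci-test.yml
name: ci-test
on:
  pull_request:
    paths:                      # path filtering: skip runs for unrelated changes
      - "packages/api/**"
      - ".github/workflows/ci-test.yml"

# A newer push to the same ref cancels the run that is still in progress.
concurrency:
  group: ci-${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

permissions:
  contents: read

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false          # let every matrix leg report its own result
      matrix:
        node: [18, 20]
    steps:
      - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1

      - uses: actions/setup-node@v4   # pin to a full SHA in practice
        with:
          node-version: ${{ matrix.node }}
          cache: npm                  # caches the npm download cache

      # Second cache for build artifacts, keyed on the source files.
      - uses: actions/cache@v4        # pin to a full SHA in practice
        with:
          path: packages/api/.build-cache
          key: build-${{ runner.os }}-node${{ matrix.node }}-${{ hashFiles('packages/api/src/**') }}
          restore-keys: |
            build-${{ runner.os }}-node${{ matrix.node }}-

      - run: npm ci
      - run: npm test
```

The `restore-keys` prefix lets a run fall back to the most recent build cache for the same OS and Node version when the exact key misses, which usually beats a cold start.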
See references/performance-optimization.md for caching strategies and runner selection.
Workflow architecture quick reference
Use for: Structuring reusable CI/CD components, managing action dependencies.
Key decisions:
- Reusable workflows for entire pipeline templates; composite actions for shared steps
- Pass secrets explicitly, not with `secrets: inherit`
- Automate SHA pin updates with Dependabot or Renovate
- Restrict allowed actions at the org level
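One way to picture the first three decisions is shown below as three separate files in a single multi-document sketch. The file names follow the naming conventions in this document, while `NPM_TOKEN` and the `node-version` input are hypothetical names chosen for illustration.

```yaml
# --- File 1 (hypothetical): .github/workflows/_reusable-build.yml
on:
  workflow_call:
    inputs:
      node-version:
        type: string
        default: "20"
    secrets:
      NPM_TOKEN:            # declared by name, so callers must pass it explicitly
        required: true

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
      - uses: actions/setup-node@v4   # pin to a full SHA in practice
        with:
          node-version: ${{ inputs.node-version }}
      - run: npm ci
        env:
          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
---
# --- File 2 (hypothetical): .github/workflows/ci-build.yml (the caller)
on: [pull_request]

jobs:
  build:
    uses: ./.github/workflows/_reusable-build.yml
    with:
      node-version: "20"
    secrets:
      NPM_TOKEN: ${{ secrets.NPM_TOKEN }}   # named secret instead of secrets: inherit
---
# --- File 3: .github/dependabot.yml (keeps pinned action SHAs up to date)
version: 2
updates:
  - package-ecosystem: github-actions
    directory: "/"
    schedule:
      interval: weekly
```

Naming each secret in the `workflow_call` block makes the reusable workflow's access reviewable in one place, whereas `secrets: inherit` exposes every secret the caller can see.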
See references/workflow-architecture.md for patterns and versioning.
Deployment quick reference
Use for: Environment promotion, deployment gates, progressive delivery.
Key decisions:
- Use GitHub Environments with branch restrictions for production
- Release-based promotion gives the cleanest audit trail
- Progressive delivery (canary/blue-green) via Argo Rollouts or Flagger
- Custom deployment protection rules for SLO-gated deployments
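A release-triggered production deploy might look roughly like the sketch below. It assumes an environment named `production` already exists in the repository settings with its branch restrictions and protection rules configured there; the URL and `scripts/deploy.sh` are hypothetical.

```yaml
# Hypothetical file: .github/workflows/deploy-prod.yml
name: deploy-prod
on:
  release:
    types: [published]      # release-based promotion

# Deploys queue behind each other instead of cancelling a run mid-rollout.
concurrency:
  group: deploy-production
  cancel-in-progress: false

permissions:
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment:
      name: production      # the job waits here until the environment's rules pass
      url: https://example.com
    env:
      RELEASE_TAG: ${{ github.event.release.tag_name }}
    steps:
      - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
        with:
          ref: ${{ github.event.release.tag_name }}

      # The tag is passed through an environment variable rather than being
      # interpolated into the script text, which avoids shell injection.
      - run: ./scripts/deploy.sh "$RELEASE_TAG"
```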
See references/deployment-strategies.md for promotion patterns and rollback strategies.
Infrastructure as code quick reference
Use for: Terraform/OpenTofu pipelines, drift detection, policy enforcement.
Key decisions:
- Always save plan output and apply the saved plan (never plan-then-apply without `-out`)
- Post plan output as PR comments for review
- Segment state by functional boundary, not geography
- Run scheduled drift detection separately from code-triggered deploys
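The saved-plan and drift-detection decisions can be sketched as two workflows. This is a simplified illustration under several assumptions: the Terraform code lives in a hypothetical `infra/` directory, cloud credentials and backend configuration are omitted, and the plan is written to the job summary as a stand-in for a PR comment.

```yaml
# --- File 1 (hypothetical): .github/workflows/ci-terraform.yml
name: ci-terraform
on:
  pull_request:
    paths: ["infra/**"]
  push:
    branches: [main]
    paths: ["infra/**"]

permissions:
  contents: read

jobs:
  terraform:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infra
    steps:
      - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
      - uses: hashicorp/setup-terraform@v3   # pin to a full SHA in practice
        with:
          terraform_wrapper: false           # expose raw exit codes to the shell
      - run: terraform init -input=false
      - run: terraform plan -input=false -out=tfplan
      - run: terraform show -no-color tfplan >> "$GITHUB_STEP_SUMMARY"
      # Only on merge to main, and only the exact plan that was saved above.
      - if: github.event_name == 'push'
        run: terraform apply -input=false tfplan
---
# --- File 2 (hypothetical): .github/workflows/scheduled-drift.yml
name: scheduled-drift
on:
  schedule:
    - cron: "0 6 * * *"   # daily at 06:00 UTC

permissions:
  contents: read

jobs:
  drift:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: infra
    steps:
      - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
      - uses: hashicorp/setup-terraform@v3   # pin to a full SHA in practice
        with:
          terraform_wrapper: false
      - run: terraform init -input=false
      # With -detailed-exitcode, exit code 2 means the plan contains changes,
      # so the step fails whenever live infrastructure has drifted from code.
      - run: terraform plan -input=false -detailed-exitcode
```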
See references/infrastructure-pipelines.md for IaC workflow patterns.
Observability quick reference
Use for: Pipeline reliability tracking, incident response, capacity planning.
Key decisions:
- Instrument pipelines with OpenTelemetry (runs as traces, jobs as spans)
- Track DORA metrics: deployment frequency, lead time, change failure rate, MTTR
- Set SLOs for pipeline reliability (e.g., 99% main branch build success)
- Monitor cache hit rates and queue times as leading indicators
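As one lightweight way to check the build-success SLO without a full OpenTelemetry setup, a scheduled workflow can query recent runs with the GitHub CLI. The sketch below assumes a CI workflow file named `ci-test.yml`, a window of the last 100 runs, and the 99% target from the example above; the workflow name and schedule are hypothetical.

```yaml
# Hypothetical file: .github/workflows/scheduled-slo-check.yml
name: scheduled-slo-check
on:
  schedule:
    - cron: "0 7 * * 1"   # Mondays at 07:00 UTC

permissions:
  actions: read           # enough to list workflow runs

jobs:
  build-success-rate:
    runs-on: ubuntu-latest
    env:
      GH_TOKEN: ${{ github.token }}
      GH_REPO: ${{ github.repository }}
      SLO_PERCENT: "99"
    steps:
      - name: Compute success rate of recent main-branch runs
        run: |
          runs=$(gh run list --branch main --workflow ci-test.yml \
            --limit 100 --json status,conclusion)
          total=$(jq '[.[] | select(.status == "completed")] | length' <<<"$runs")
          ok=$(jq '[.[] | select(.conclusion == "success")] | length' <<<"$runs")
          if [ "$total" -eq 0 ]; then
            echo "No completed runs found; nothing to evaluate."
            exit 0
          fi
          rate=$(( ok * 100 / total ))   # integer percentage, rounded down
          echo "Main-branch success rate: ${rate}% (${ok}/${total})" >> "$GITHUB_STEP_SUMMARY"
          if [ "$rate" -lt "$SLO_PERCENT" ]; then
            echo "::error::Success rate ${rate}% is below the ${SLO_PERCENT}% SLO"
            exit 1
          fi
```

Cancelled and skipped runs count against the rate in this sketch; whether they should is a policy choice to settle when defining the SLO.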
See references/pipeline-observability.md for instrumentation and metrics.
Cross-pipeline conventions
Workflow file naming
| Convention | Example | When |
|---|---|---|
| Trigger-based prefix | `ci-test.yml`, `ci-lint.yml` | CI workflows |
| Deploy prefix | `deploy-staging.yml`, `deploy-prod.yml` | Deployment workflows |
| Scheduled prefix | `scheduled-drift.yml`, `scheduled-cleanup.yml` | Cron jobs |
| Reusable prefix | `_reusable-build.yml` | Shared workflow templates |
Permissions
| Principle | Pattern |
|---|---|
| Default to read-only | Set at org/repo level, override per-job |
| Scope per job, not workflow | Each job declares only what it needs |
| OIDC over stored secrets | Short-lived tokens scoped to repo+branch+env |
| Explicit secret passing | Name each secret, avoid secrets: inherit |
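Per-job scoping from the table above might look like the following sketch. It assumes a Node.js project with a `lint` script defined in `package.json`; the comment job is a hypothetical example of a job that needs write access the lint job does not.

```yaml
# Hypothetical file: .github/workflows/ci-lint.yml
name: ci-lint
on: [pull_request]

permissions: {}            # nothing is granted at the workflow level

jobs:
  lint:
    runs-on: ubuntu-latest
    permissions:
      contents: read       # checkout only
    steps:
      - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
      - run: npm ci
      - run: npm run lint

  comment:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write # this job alone may write to the PR
    env:
      GH_TOKEN: ${{ github.token }}
      GH_REPO: ${{ github.repository }}
      PR_NUMBER: ${{ github.event.pull_request.number }}
    steps:
      - run: gh pr comment "$PR_NUMBER" --body "Lint has started for this PR."
```

If the lint job were compromised through a dependency, its token could read the repository but could not write to pull requests.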
Branch protection
| Rule | CI workflows | Deploy workflows |
|---|---|---|
| Required status checks | Yes | Yes |
| Require PR reviews | Yes | Yes (production) |
| Dismiss stale reviews | Yes | Yes |
| Restrict pushes | Optional | Yes (main/release branches) |
Pipeline debugging checklist
Slow CI builds
- Check cache hit rates (low = cold start overhead)
- Look for sequential jobs that could run in parallel
- Verify concurrency groups aren't queuing unnecessarily
- Check runner specs (CPU-bound work on small runners)
- Look for full-repo checkouts when sparse checkout would work
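For the last item, a sparse checkout can be configured directly on the checkout step. The fragment below shows only the `steps` portion of a job, and the directory names are hypothetical monorepo paths.

```yaml
steps:
  - uses: actions/checkout@b4ffde65f46336ab88eb53be808477a3936bae11 # v4.1.1
    with:
      # Only these directories (plus files at the repository root) are checked out.
      sparse-checkout: |
        packages/api
        packages/shared
      fetch-depth: 1   # shallow clone; this is also the action's default
```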
Failed deployments
- Check environment protection rule approvals
- Verify OIDC token audience and subject claims
- Check if concurrency group blocked/cancelled the run
- Review Terraform plan output for unexpected changes
- Check whether a custom deployment protection rule (Datadog, etc.) rejected the deployment
Security incidents
- Audit recent changes to workflow files and action versions
- Check for new `pull_request_target` usage
- Review GITHUB_TOKEN permissions in affected workflows
- Scan for secrets in workflow logs (step outputs, artifacts)
- Check if any action SHAs were recently changed
Flaky pipelines
- Check if tests have timing dependencies (see `condition-based-waiting` skill)
- Look for shared state between matrix jobs
- Verify caches aren't corrupted (clear and rebuild)
- Check for rate limiting from external services
- Review runner availability (self-hosted runner capacity)