qcsd-production-swarm

QCSD Production Swarm v1.0

Post-release production health assessment and QCSD feedback loop closure.


Overview

The Production Swarm assesses release health in the live production environment using DORA metrics, incident RCA, defect prediction, and cross-phase feedback loops. It renders a HEALTHY / DEGRADED / CRITICAL decision. It is the only QCSD phase with a dual responsibility: assessing current production health AND closing the feedback loop to the Ideation and Refinement phases.

QCSD Phase Positioning

| Phase | Swarm | Decision | When |
|---|---|---|---|
| Ideation | qcsd-ideation-swarm | GO / CONDITIONAL / NO-GO | PI/Sprint Planning |
| Refinement | qcsd-refinement-swarm | READY / CONDITIONAL / NOT-READY | Sprint Refinement |
| Development | qcsd-development-swarm | SHIP / CONDITIONAL / HOLD | During Sprint |
| Verification | qcsd-cicd-swarm | RELEASE / REMEDIATE / BLOCK | Pre-Release / CI-CD |
| Production | qcsd-production-swarm | HEALTHY / DEGRADED / CRITICAL | Post-Release |

Parameters

  • TELEMETRY_DATA: Path to production telemetry, incident reports, and DORA metrics (required)
  • RELEASE_ID: Release identifier for tracking (optional)
  • OUTPUT_FOLDER: Where to save reports (default: ${PROJECT_ROOT}/Agentic QCSD/production/)
  • SLA_DEFINITIONS: Path to SLA/SLO target definitions (optional)
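
The parameter list above can be sketched as a small validation helper. This is a minimal illustration, not the skill's own code: the `SwarmParams` class and `resolve_params` function are hypothetical names, and only the required/optional split and the OUTPUT_FOLDER default come from the list.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class SwarmParams:
    """Hypothetical container for the four documented parameters."""
    telemetry_data: str                    # TELEMETRY_DATA (required)
    release_id: Optional[str] = None       # RELEASE_ID (optional)
    output_folder: Optional[str] = None    # OUTPUT_FOLDER (optional, has a default)
    sla_definitions: Optional[str] = None  # SLA_DEFINITIONS (optional)


def resolve_params(params: SwarmParams, project_root: str) -> SwarmParams:
    """Check the required field and fill in the documented default folder."""
    if not params.telemetry_data:
        raise ValueError("TELEMETRY_DATA is required")
    if params.output_folder is None:
        # Documented default: ${PROJECT_ROOT}/Agentic QCSD/production/
        params.output_folder = str(Path(project_root) / "Agentic QCSD" / "production")
    return params
```

Failing fast on a missing TELEMETRY_DATA keeps the error at the entry point rather than inside an agent that later tries to read the telemetry.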

ENFORCEMENT RULES - READ FIRST

| Rule | Enforcement |
|---|---|
| E1 | You MUST spawn ALL THREE core agents in Step 2. No exceptions. |
| E2 | You MUST put all parallel Task calls in a SINGLE message. |
| E3 | You MUST STOP and WAIT after each batch. No proceeding early. |
| E4 | You MUST spawn conditional agents if flags are TRUE. No skipping. |
| E5 | You MUST apply HEALTHY/DEGRADED/CRITICAL logic exactly as specified in Step 5. |
| E6 | You MUST generate the full report structure. No abbreviated versions. |
| E7 | Each agent MUST read its reference files before analysis. |
| E8 | You MUST run BOTH feedback agents in Step 8 SEQUENTIALLY. Always. Both agents. |
| E9 | You MUST execute Step 7 learning persistence. No skipping. |

PROHIBITED BEHAVIORS:

  • Summarizing instead of spawning agents
  • Skipping agents "for brevity"
  • Proceeding before background tasks complete
  • Providing your own analysis instead of spawning specialists
  • Omitting report sections or using placeholder text

Step Execution Protocol

This skill uses a micro-file step architecture. Each step is a self-contained file loaded one at a time to avoid "lost in the middle" context degradation.

Execute steps sequentially by reading each step file with the Read tool.

Steps

  1. Flag Detection -- steps/01-flag-detection.md -- Retrieve CI/CD signals, detect telemetry source, evaluate all 7 flags
  2. Core Agents -- steps/02-core-agents.md -- Spawn qe-metrics-optimizer, qe-defect-predictor, qe-root-cause-analyzer in parallel
  3. Batch 1 Results -- steps/03-batch1-results.md -- Wait for core agents, extract all metrics
  4. Conditional Agents -- steps/04-conditional-agents.md -- Spawn flagged conditional agents in parallel
  5. Decision Synthesis -- steps/05-decision-synthesis.md -- Apply HEALTHY/DEGRADED/CRITICAL logic
  6. Report Generation -- steps/06-report-generation.md -- Generate executive summary and full report
  7. Learning Persistence -- steps/07-learning-persistence.md -- Store findings to memory, save persistence record
  8. Feedback Loop -- steps/08-feedback-loop.md -- Run learning coordinator then transfer specialist (sequential)
  9. Final Output -- steps/09-final-output.md -- Display completion summary with all scores

Execution Instructions

  1. Use the Read tool to load the current step file (e.g., Read({ file_path: ".claude/skills/qcsd-production-swarm/steps/01-flag-detection.md" }))
  2. Execute the step's instructions completely
  3. Verify all success criteria are met before proceeding
  4. Pass the step's output as context to the next step
  5. If a step fails, halt and report the failure point -- do not skip ahead
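
The execution instructions above amount to a sequential loop that halts on the first failure. The sketch below illustrates that control flow only. The `execute` callback and its `(ok, output)` return shape are assumptions made for the example; in the skill itself the orchestrator reads and follows each step file.

```python
from typing import Any, Callable, Dict, List, Tuple

# Step files in the order given in the Steps list.
STEP_FILES: List[str] = [
    "steps/01-flag-detection.md",
    "steps/02-core-agents.md",
    "steps/03-batch1-results.md",
    "steps/04-conditional-agents.md",
    "steps/05-decision-synthesis.md",
    "steps/06-report-generation.md",
    "steps/07-learning-persistence.md",
    "steps/08-feedback-loop.md",
    "steps/09-final-output.md",
]

# Hypothetical callback: receives the step file and the context gathered so
# far, returns (success criteria met?, step output).
StepExecutor = Callable[[str, Dict[str, Any]], Tuple[bool, Any]]


def run_steps(execute: StepExecutor) -> Dict[str, Any]:
    """Run every step in order, passing earlier outputs forward as context."""
    context: Dict[str, Any] = {}
    for number, step_file in enumerate(STEP_FILES, start=1):
        ok, output = execute(step_file, context)
        if not ok:
            # Halt and report the failure point; never skip ahead.
            raise RuntimeError(f"Step {number} failed ({step_file}); halting")
        context[step_file] = output
    return context
```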

Resume Support

To resume from a specific step, specify --from-step N; the orchestrator skips directly to step N. Make sure the outputs that step N depends on are already available from the earlier steps.
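
As a rough illustration of resuming, the helper below computes which steps remain and which earlier steps have no saved output. Both function names are hypothetical, and treating every earlier step as a prerequisite is an assumption; the step files may require less.

```python
from typing import Any, Dict, List

TOTAL_STEPS = 9  # the skill defines nine steps


def steps_to_run(from_step: int = 1) -> List[int]:
    """Return the step numbers still to execute when resuming at from_step."""
    if not 1 <= from_step <= TOTAL_STEPS:
        raise ValueError(f"--from-step must be between 1 and {TOTAL_STEPS}")
    return list(range(from_step, TOTAL_STEPS + 1))


def missing_prerequisites(from_step: int, saved_outputs: Dict[int, Any]) -> List[int]:
    """List earlier steps with no saved output (assumes all are required)."""
    return [n for n in range(1, from_step) if n not in saved_outputs]
```

Checking `missing_prerequisites` before resuming gives an early warning that, for example, step 5 cannot synthesize a decision without the batch results from step 3.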


Agent Inventory

| Agent | Type | Domain | Batch |
|---|---|---|---|
| qe-metrics-optimizer | Core (always) | learning-optimization | 1 |
| qe-defect-predictor | Core (always) | defect-intelligence | 1 |
| qe-root-cause-analyzer | Core (always) | defect-intelligence | 1 |
| qe-chaos-engineer | Conditional (HAS_INFRASTRUCTURE_CHANGE) | chaos-resilience | 2 |
| qe-performance-tester | Conditional (HAS_PERFORMANCE_SLA) | chaos-resilience | 2 |
| qe-regression-analyzer | Conditional (HAS_REGRESSION_RISK) | defect-intelligence | 2 |
| qe-pattern-learner | Conditional (HAS_RECURRING_INCIDENTS) | defect-intelligence | 2 |
| qe-middleware-validator | Conditional (HAS_MIDDLEWARE) | enterprise-integration | 2 |
| qe-sap-rfc-tester | Conditional (HAS_SAP_INTEGRATION) | enterprise-integration | 2 |
| qe-sod-analyzer | Conditional (HAS_AUTHORIZATION) | enterprise-integration | 2 |
| qe-learning-coordinator | Feedback (always, sequential) | learning-optimization | 3 |
| qe-transfer-specialist | Feedback (always, sequential) | learning-optimization | 3 |

Total: 12 agents (3 core + 7 conditional + 2 feedback)
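
The inventory can be read as a lookup from flags to agents. The sketch below plans the three batches from a set of flag values; `plan_batches` is a hypothetical helper, while the agent names, flag names, and batch numbers are taken from the inventory.

```python
from typing import Dict, List

CORE_AGENTS: List[str] = [
    "qe-metrics-optimizer",
    "qe-defect-predictor",
    "qe-root-cause-analyzer",
]

# Flag name -> conditional agent spawned when that flag is TRUE.
CONDITIONAL_AGENTS: Dict[str, str] = {
    "HAS_INFRASTRUCTURE_CHANGE": "qe-chaos-engineer",
    "HAS_PERFORMANCE_SLA": "qe-performance-tester",
    "HAS_REGRESSION_RISK": "qe-regression-analyzer",
    "HAS_RECURRING_INCIDENTS": "qe-pattern-learner",
    "HAS_MIDDLEWARE": "qe-middleware-validator",
    "HAS_SAP_INTEGRATION": "qe-sap-rfc-tester",
    "HAS_AUTHORIZATION": "qe-sod-analyzer",
}

# Order matters: the learning coordinator runs before the transfer specialist.
FEEDBACK_AGENTS: List[str] = ["qe-learning-coordinator", "qe-transfer-specialist"]


def plan_batches(flags: Dict[str, bool]) -> Dict[int, List[str]]:
    """Map batch number to the agents that must run in it."""
    batch2 = [agent for flag, agent in CONDITIONAL_AGENTS.items()
              if flags.get(flag, False)]
    return {1: list(CORE_AGENTS), 2: batch2, 3: list(FEEDBACK_AGENTS)}
```

Batches 1 and 3 never depend on the flags, which mirrors rules E1 and E8: the core and feedback agents always run.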


Quality Gate Thresholds

| Metric | HEALTHY | DEGRADED | CRITICAL |
|---|---|---|---|
| DORA Score | >= 0.7 | 0.4 - 0.69 | < 0.4 |
| SLA Compliance | >= 99% | 95 - 98.9% | < 95% |
| Incident Severity | P3/P4/NONE | P2 | P0/P1 |
| Defect Trend | declining/stable | stable (density > 2) | increasing + density > 5 |
| RCA Completeness | >= 80% | 50 - 79% | < 50% |
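
The numeric rows of the threshold table translate into simple band checks. The sketch below is illustrative only and rests on three assumptions: values that fall between the printed bands (for example a DORA score of 0.695) are treated as DEGRADED, the overall result is the worst individual level, and Defect Trend is left out because its bands overlap as printed. The authoritative decision logic is the one in steps/05-decision-synthesis.md.

```python
from enum import IntEnum
from typing import Iterable


class Health(IntEnum):
    HEALTHY = 0
    DEGRADED = 1
    CRITICAL = 2


def classify_band(value: float, healthy_min: float, critical_below: float) -> Health:
    """HEALTHY at or above healthy_min, CRITICAL below critical_below, else DEGRADED."""
    if value >= healthy_min:
        return Health.HEALTHY
    if value < critical_below:
        return Health.CRITICAL
    return Health.DEGRADED


def classify_dora(score: float) -> Health:
    return classify_band(score, 0.7, 0.4)


def classify_sla(percent: float) -> Health:
    return classify_band(percent, 99.0, 95.0)


def classify_rca(percent: float) -> Health:
    return classify_band(percent, 80.0, 50.0)


def classify_incident(severity: str) -> Health:
    """Map the worst open incident severity (P0..P4 or NONE) to a level."""
    severity = severity.upper()
    if severity in ("P0", "P1"):
        return Health.CRITICAL
    if severity == "P2":
        return Health.DEGRADED
    if severity in ("P3", "P4", "NONE"):
        return Health.HEALTHY
    raise ValueError(f"Unknown severity: {severity}")


def overall(levels: Iterable[Health]) -> Health:
    """Assumed aggregation: the worst single metric decides the outcome."""
    return max(levels)
```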

Report Filename Mapping

| Agent | Report Filename | Step |
|---|---|---|
| qe-metrics-optimizer | 02-dora-metrics.md | 2 |
| qe-defect-predictor | 03-defect-prediction.md | 2 |
| qe-root-cause-analyzer | 04-root-cause-analysis.md | 2 |
| qe-chaos-engineer | 05-chaos-resilience.md | 4 |
| qe-performance-tester | 06-performance-sla.md | 4 |
| qe-regression-analyzer | 07-regression-analysis.md | 4 |
| qe-pattern-learner | 08-pattern-analysis.md | 4 |
| Learning Persistence | 09-learning-persistence.json | 7 |
| qe-middleware-validator | 10-middleware-health.md | 4 |
| qe-sap-rfc-tester | 11-sap-health.md | 4 |
| qe-sod-analyzer | 12-sod-compliance.md | 4 |
| Feedback agents | 13-feedback-loops.md | 8 |
| Synthesis | 01-executive-summary.md | 6 |
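
The filename mapping can be kept as a single dictionary so that every report lands under OUTPUT_FOLDER with a predictable name. In this sketch `report_path` is a hypothetical helper; the keys and filenames are copied from the mapping.

```python
from pathlib import PurePosixPath
from typing import Dict

# Producer (agent or step name) -> report filename.
REPORT_FILES: Dict[str, str] = {
    "Synthesis": "01-executive-summary.md",
    "qe-metrics-optimizer": "02-dora-metrics.md",
    "qe-defect-predictor": "03-defect-prediction.md",
    "qe-root-cause-analyzer": "04-root-cause-analysis.md",
    "qe-chaos-engineer": "05-chaos-resilience.md",
    "qe-performance-tester": "06-performance-sla.md",
    "qe-regression-analyzer": "07-regression-analysis.md",
    "qe-pattern-learner": "08-pattern-analysis.md",
    "Learning Persistence": "09-learning-persistence.json",
    "qe-middleware-validator": "10-middleware-health.md",
    "qe-sap-rfc-tester": "11-sap-health.md",
    "qe-sod-analyzer": "12-sod-compliance.md",
    "Feedback agents": "13-feedback-loops.md",
}


def report_path(output_folder: str, producer: str) -> str:
    """Join the output folder with the mapped filename (POSIX-style path)."""
    return str(PurePosixPath(output_folder) / REPORT_FILES[producer])
```

The numeric prefixes give the reports a stable reading order in a directory listing, with the executive summary first even though it is generated in step 6.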

Execution Model Options

| Model | When to Use | Agent Spawn |
|---|---|---|
| Task Tool (PRIMARY) | Claude Code sessions | Task({ subagent_type, run_in_background: true }) |
| MCP Tools | MCP server available | fleet_init({}) / task_submit({}) |
| CLI | Terminal/scripts | swarm init / agent spawn |

Key Principle

Production health is measured by outcomes, not intentions. This swarm provides evidence-based production assessment and closes the QCSD feedback loop.
