QCSD Production Swarm v1.0

Post-release production health assessment and QCSD feedback loop closure.

Overview

The Production Swarm assesses release health in the live production environment using DORA metrics, incident root cause analysis (RCA), defect prediction, and cross-phase feedback loops. It renders a HEALTHY / DEGRADED / CRITICAL decision. It is the only QCSD phase with a dual responsibility: assessing current production health AND closing the feedback loop back to the Ideation and Refinement phases.

QCSD Phase Positioning

Phase	Swarm	Decision	When
Ideation	qcsd-ideation-swarm	GO / CONDITIONAL / NO-GO	PI/Sprint Planning
Refinement	qcsd-refinement-swarm	READY / CONDITIONAL / NOT-READY	Sprint Refinement
Development	qcsd-development-swarm	SHIP / CONDITIONAL / HOLD	During Sprint
Verification	qcsd-cicd-swarm	RELEASE / REMEDIATE / BLOCK	Pre-Release / CI-CD
Production	qcsd-production-swarm	HEALTHY / DEGRADED / CRITICAL	Post-Release

Parameters

TELEMETRY_DATA: Path to production telemetry, incident reports, and DORA metrics (required)
RELEASE_ID: Release identifier for tracking (optional)
OUTPUT_FOLDER: Where to save reports (default: ${PROJECT_ROOT}/Agentic QCSD/production/)
SLA_DEFINITIONS: Path to SLA/SLO target definitions (optional)
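
The parameters above could be gathered into one validated structure before Step 1 runs. The following is a minimal sketch, not part of the skill itself: `SwarmParams` and `resolve_params` are hypothetical names, and the only facts taken from this document are the four parameter names, which one is required, and the default output folder.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class SwarmParams:
    """Hypothetical container for the four documented parameters."""
    telemetry_data: Path                    # TELEMETRY_DATA (required)
    output_folder: Path                     # OUTPUT_FOLDER (has a default)
    release_id: Optional[str] = None        # RELEASE_ID (optional)
    sla_definitions: Optional[Path] = None  # SLA_DEFINITIONS (optional)


def resolve_params(raw: dict, project_root: str) -> SwarmParams:
    """Validate raw key/value input and apply the documented default."""
    telemetry = raw.get("TELEMETRY_DATA")
    if not telemetry:
        raise ValueError("TELEMETRY_DATA is required")
    # Default from the Parameters list: ${PROJECT_ROOT}/Agentic QCSD/production/
    default_out = Path(project_root) / "Agentic QCSD" / "production"
    sla = raw.get("SLA_DEFINITIONS")
    return SwarmParams(
        telemetry_data=Path(telemetry),
        output_folder=Path(raw.get("OUTPUT_FOLDER") or default_out),
        release_id=raw.get("RELEASE_ID"),
        sla_definitions=Path(sla) if sla else None,
    )
```

Failing fast on a missing TELEMETRY_DATA keeps the error at the entry point rather than inside an agent.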

ENFORCEMENT RULES - READ FIRST

Rule	Enforcement
E1	You MUST spawn ALL THREE core agents in Step 2. No exceptions.
E2	You MUST put all parallel Task calls in a SINGLE message.
E3	You MUST STOP and WAIT after each batch. No proceeding early.
E4	You MUST spawn conditional agents if flags are TRUE. No skipping.
E5	You MUST apply HEALTHY/DEGRADED/CRITICAL logic exactly as specified in Step 5.
E6	You MUST generate the full report structure. No abbreviated versions.
E7	Each agent MUST read its reference files before analysis.
E8	You MUST run BOTH feedback agents in Step 8 SEQUENTIALLY. Always. Both agents.
E9	You MUST execute Step 7 learning persistence. No skipping.

PROHIBITED BEHAVIORS:

Summarizing instead of spawning agents
Skipping agents "for brevity"
Proceeding before background tasks complete
Providing your own analysis instead of spawning specialists
Omitting report sections or using placeholder text

Step Execution Protocol

This skill uses a micro-file step architecture. Each step is a self-contained file loaded one at a time to avoid "lost in the middle" context degradation.

Execute steps sequentially by reading each step file with the Read tool.

Steps

1. Flag Detection -- steps/01-flag-detection.md -- Retrieve CI/CD signals, detect telemetry source, evaluate all 7 flags
2. Core Agents -- steps/02-core-agents.md -- Spawn qe-metrics-optimizer, qe-defect-predictor, qe-root-cause-analyzer in parallel
3. Batch 1 Results -- steps/03-batch1-results.md -- Wait for core agents, extract all metrics
4. Conditional Agents -- steps/04-conditional-agents.md -- Spawn flagged conditional agents in parallel
5. Decision Synthesis -- steps/05-decision-synthesis.md -- Apply HEALTHY/DEGRADED/CRITICAL logic
6. Report Generation -- steps/06-report-generation.md -- Generate executive summary and full report
7. Learning Persistence -- steps/07-learning-persistence.md -- Store findings to memory, save persistence record
8. Feedback Loop -- steps/08-feedback-loop.md -- Run learning coordinator then transfer specialist (sequential)
9. Final Output -- steps/09-final-output.md -- Display completion summary with all scores
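
The step list above maps each step number to one file. As an illustration only, that mapping could be written as a small lookup. `step_path` is a hypothetical helper, and the skill directory is the one shown in the Read example in this document.

```python
# Skill directory as it appears in the Read example in this document.
SKILL_DIR = ".claude/skills/qcsd-production-swarm"

# Step files in execution order, copied from the Steps list.
STEP_FILES = [
    "01-flag-detection.md",
    "02-core-agents.md",
    "03-batch1-results.md",
    "04-conditional-agents.md",
    "05-decision-synthesis.md",
    "06-report-generation.md",
    "07-learning-persistence.md",
    "08-feedback-loop.md",
    "09-final-output.md",
]


def step_path(step_number: int) -> str:
    """Return the file to Read for a 1-based step number (hypothetical helper)."""
    if not 1 <= step_number <= len(STEP_FILES):
        raise ValueError(f"step must be 1..{len(STEP_FILES)}, got {step_number}")
    return f"{SKILL_DIR}/steps/{STEP_FILES[step_number - 1]}"
```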

Execution Instructions

Use the Read tool to load the current step file (e.g., Read({ file_path: ".claude/skills/qcsd-production-swarm/steps/01-flag-detection.md" }))
Execute the step's instructions completely
Verify all success criteria are met before proceeding
Pass the step's output as context to the next step
If a step fails, halt and report the failure point -- do not skip ahead
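
The execution rules above (run one step, verify it, pass its output forward, halt on failure) can be sketched as a small loop. This is an assumption-laden illustration rather than the orchestrator's actual code: `run_steps`, `StepFailed`, and the convention that a step returns `(criteria_met, output)` are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumed convention: a step receives the accumulated context and
# returns (success_criteria_met, output).
Step = Callable[[Dict[str, object]], Tuple[bool, object]]


class StepFailed(RuntimeError):
    """Raised at the first step whose success criteria are not met."""


def run_steps(steps: List[Tuple[str, Step]],
              context: Optional[Dict[str, object]] = None) -> Dict[str, object]:
    """Run steps in order, stopping at the first failure."""
    ctx = dict(context or {})
    for number, (name, execute) in enumerate(steps, start=1):
        ok, output = execute(ctx)
        if not ok:
            # Halt and report the failure point; never skip ahead.
            raise StepFailed(f"step {number} ({name}) failed")
        ctx[name] = output  # becomes context for the next step
    return ctx
```

Raising at the first failure means later steps never see partial or missing context.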

Resume Support

To resume from a specific step, specify --from-step N and the orchestrator skips directly to step N. Before resuming, make sure the outputs of all prior steps are available, since each step takes the previous step's output as context.
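
A resume request is only safe when the earlier steps' outputs still exist. The sketch below shows one way to check that. It assumes, hypothetically, that saved outputs are kept in a dictionary keyed by step number. This document does not specify how the orchestrator stores them.

```python
from typing import Dict, List

TOTAL_STEPS = 9  # the Steps list in this document has nine entries


def missing_prerequisites(from_step: int, saved: Dict[int, object]) -> List[int]:
    """List prior step numbers with no saved output (assumed storage shape)."""
    if not 1 <= from_step <= TOTAL_STEPS:
        raise ValueError(f"--from-step must be 1..{TOTAL_STEPS}")
    return [n for n in range(1, from_step) if n not in saved]


def steps_to_run(from_step: int, saved: Dict[int, object]) -> List[int]:
    """Return the step numbers to execute, or refuse if data is missing."""
    missing = missing_prerequisites(from_step, saved)
    if missing:
        raise RuntimeError(
            f"cannot resume at step {from_step}: missing output of steps {missing}"
        )
    return list(range(from_step, TOTAL_STEPS + 1))
```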

Agent Inventory

Agent	Type	Domain	Batch
qe-metrics-optimizer	Core (always)	learning-optimization	1
qe-defect-predictor	Core (always)	defect-intelligence	1
qe-root-cause-analyzer	Core (always)	defect-intelligence	1
qe-chaos-engineer	Conditional (HAS_INFRASTRUCTURE_CHANGE)	chaos-resilience	2
qe-performance-tester	Conditional (HAS_PERFORMANCE_SLA)	chaos-resilience	2
qe-regression-analyzer	Conditional (HAS_REGRESSION_RISK)	defect-intelligence	2
qe-pattern-learner	Conditional (HAS_RECURRING_INCIDENTS)	defect-intelligence	2
qe-middleware-validator	Conditional (HAS_MIDDLEWARE)	enterprise-integration	2
qe-sap-rfc-tester	Conditional (HAS_SAP_INTEGRATION)	enterprise-integration	2
qe-sod-analyzer	Conditional (HAS_AUTHORIZATION)	enterprise-integration	2
qe-learning-coordinator	Feedback (always, sequential)	learning-optimization	3
qe-transfer-specialist	Feedback (always, sequential)	learning-optimization	3

Total: 12 agents (3 core + 7 conditional + 2 feedback)
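
The inventory above implies a simple selection rule: batch 1 and batch 3 are fixed, and batch 2 depends on the flags evaluated in Step 1. A minimal sketch of that rule follows. The agent and flag names come from the table, while `plan_batches` and the dictionary shape of the flags are assumptions made for illustration.

```python
# Agent names and flags copied from the Agent Inventory table.
CORE_AGENTS = (
    "qe-metrics-optimizer",
    "qe-defect-predictor",
    "qe-root-cause-analyzer",
)

CONDITIONAL_AGENTS = {
    "HAS_INFRASTRUCTURE_CHANGE": "qe-chaos-engineer",
    "HAS_PERFORMANCE_SLA": "qe-performance-tester",
    "HAS_REGRESSION_RISK": "qe-regression-analyzer",
    "HAS_RECURRING_INCIDENTS": "qe-pattern-learner",
    "HAS_MIDDLEWARE": "qe-middleware-validator",
    "HAS_SAP_INTEGRATION": "qe-sap-rfc-tester",
    "HAS_AUTHORIZATION": "qe-sod-analyzer",
}

# Order matters: the coordinator runs before the transfer specialist.
FEEDBACK_AGENTS = ("qe-learning-coordinator", "qe-transfer-specialist")


def plan_batches(flags: dict) -> dict:
    """Map batch number to agent names (hypothetical helper)."""
    batch2 = [
        agent
        for flag, agent in CONDITIONAL_AGENTS.items()
        if flags.get(flag, False)  # a missing flag is treated as FALSE
    ]
    return {1: list(CORE_AGENTS), 2: batch2, 3: list(FEEDBACK_AGENTS)}
```

Treating an absent flag as FALSE is a choice made for this sketch. Step 1 is expected to evaluate all seven flags explicitly.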

Quality Gate Thresholds

Metric	HEALTHY	DEGRADED	CRITICAL
DORA Score	>= 0.7	0.4 to < 0.7	< 0.4
SLA Compliance	>= 99%	95% to < 99%	< 95%
Incident Severity	P3/P4/NONE	P2	P0/P1
Defect Trend	declining/stable	stable (density > 2)	increasing + density > 5
RCA Completeness	>= 80%	50% to < 80%	< 50%
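
The numeric and severity rows of the table above can be read as per-metric classifiers. The sketch below is illustrative only. It treats each threshold as an inclusive lower bound, so any value below the HEALTHY bound falls to DEGRADED. It omits the Defect Trend row and the overall decision, because the authoritative logic lives in steps/05-decision-synthesis.md, which is not reproduced here. All function names are hypothetical.

```python
HEALTHY, DEGRADED, CRITICAL = "HEALTHY", "DEGRADED", "CRITICAL"


def _band(value: float, healthy_min: float, degraded_min: float) -> str:
    """Higher is better: classify a value against two lower bounds."""
    if value >= healthy_min:
        return HEALTHY
    if value >= degraded_min:
        return DEGRADED
    return CRITICAL


def classify_dora(score: float) -> str:
    return _band(score, 0.7, 0.4)


def classify_sla(percent: float) -> str:
    return _band(percent, 99.0, 95.0)


def classify_rca(percent: float) -> str:
    return _band(percent, 80.0, 50.0)


def classify_incident(severity: str) -> str:
    """Classify by the highest-severity incident label (P0..P4 or NONE)."""
    sev = severity.upper()
    if sev in ("P0", "P1"):
        return CRITICAL
    if sev == "P2":
        return DEGRADED
    if sev in ("P3", "P4", "NONE"):
        return HEALTHY
    raise ValueError(f"unknown severity: {severity}")
```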

Report Filename Mapping

Agent	Report Filename	Step
qe-metrics-optimizer	`02-dora-metrics.md`	2
qe-defect-predictor	`03-defect-prediction.md`	2
qe-root-cause-analyzer	`04-root-cause-analysis.md`	2
qe-chaos-engineer	`05-chaos-resilience.md`	4
qe-performance-tester	`06-performance-sla.md`	4
qe-regression-analyzer	`07-regression-analysis.md`	4
qe-pattern-learner	`08-pattern-analysis.md`	4
Learning Persistence	`09-learning-persistence.json`	7
qe-middleware-validator	`10-middleware-health.md`	4
qe-sap-rfc-tester	`11-sap-health.md`	4
qe-sod-analyzer	`12-sod-compliance.md`	4
Feedback agents	`13-feedback-loops.md`	8
Synthesis	`01-executive-summary.md`	6
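
The mapping above can be kept as data so that every producer writes to a predictable location. The following is a sketch under stated assumptions: the filenames are copied from the table, but the keys `learning-persistence`, `feedback-agents`, and `synthesis`, the `report_path` helper, and the idea that reports are written directly into OUTPUT_FOLDER are all hypothetical.

```python
from pathlib import Path

# Filenames copied from the Report Filename Mapping table.
# The last three keys are hypothetical labels for non-agent rows.
REPORT_FILES = {
    "qe-metrics-optimizer": "02-dora-metrics.md",
    "qe-defect-predictor": "03-defect-prediction.md",
    "qe-root-cause-analyzer": "04-root-cause-analysis.md",
    "qe-chaos-engineer": "05-chaos-resilience.md",
    "qe-performance-tester": "06-performance-sla.md",
    "qe-regression-analyzer": "07-regression-analysis.md",
    "qe-pattern-learner": "08-pattern-analysis.md",
    "learning-persistence": "09-learning-persistence.json",
    "qe-middleware-validator": "10-middleware-health.md",
    "qe-sap-rfc-tester": "11-sap-health.md",
    "qe-sod-analyzer": "12-sod-compliance.md",
    "feedback-agents": "13-feedback-loops.md",
    "synthesis": "01-executive-summary.md",
}


def report_path(output_folder: str, producer: str) -> Path:
    """Return where a producer's report would be saved (assumed layout)."""
    try:
        return Path(output_folder) / REPORT_FILES[producer]
    except KeyError:
        raise ValueError(f"no report filename defined for {producer!r}") from None
```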

Execution Model Options

Model	When to Use	Agent Spawn
Task Tool (PRIMARY)	Claude Code sessions	`Task({ subagent_type, run_in_background: true })`
MCP Tools	MCP server available	`fleet_init({})` / `task_submit({})`
CLI	Terminal/scripts	`swarm init` / `agent spawn`

Key Principle

Production health is measured by outcomes, not intentions. This swarm provides evidence-based production assessment and closes the QCSD feedback loop.
