silent-degradation-audit
# Silent Degradation Audit Skill

## Overview

A production-ready skill for detecting silent degradation across codebases. It uses a multi-wave audit system with six specialized category agents, a multi-agent validation panel, and convergence detection. Battle-tested on the CyberGym codebase (~250 bugs found).
## When to Use This Skill

Use this skill when:

- Code has reliability issues but it is unclear where
- Systems fail silently without operator visibility
- Error handling exists but its effectiveness is unknown
- You need a comprehensive audit across multiple failure modes
- You are preparing for production deployment
- You are doing post-mortem analysis after silent failures

Don't use it for:

- Code style or formatting issues (use linters)
- Performance optimization (use profilers)
- Security vulnerabilities (use security scanners)
- Simple one-off code reviews (use /analyze)
## Key Features

### Multi-Wave Progressive Audit

- Wave 1: broad scan that surfaces obvious issues (40-50% of the total)
- Waves 2-3: deeper analysis that surfaces hidden issues (30-40%)
- Waves 4-6: edge cases and subtleties (10-20%)
- Convergence: stops when a wave yields fewer than 10 new findings, or fewer than 5% of the Wave 1 count
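The dual-threshold stopping rule can be sketched as follows (the function name and signature are illustrative, not the skill's actual API):

```python
def has_converged(wave1_count: int, new_count: int,
                  absolute_threshold: int = 10,
                  relative_threshold: float = 0.05) -> bool:
    """Return True when a wave's new findings fall below either threshold.

    Converges if fewer than `absolute_threshold` new findings were produced,
    or if the new findings are below `relative_threshold` of the Wave 1
    baseline. Either condition alone is sufficient.
    """
    if new_count < absolute_threshold:
        return True
    if wave1_count > 0 and new_count / wave1_count < relative_threshold:
        return True
    return False
```

For example, with a Wave 1 baseline of 120 findings, a wave producing 5 new findings converges (5 < 10), while a wave producing 65 does not (65 ≥ 10 and 65/120 ≈ 54% ≥ 5%).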
### 6 Category Agents
- Dependency Failures (Category A): "What happens when X is down?"
- Config Errors (Category B): "What happens when config is wrong?"
- Background Work (Category C): "What happens when background work fails?"
- Test Effectiveness (Category D): "Do tests actually detect failures?"
- Operator Visibility (Category E): "Is the error visible to operators?"
- Functional Stubs (Category F): "Does this code actually do what its name says?"
### Multi-Agent Validation Panel

- Three agents review each finding: Security, Architect, Builder
- 2/3 consensus is required to validate a finding
- Prevents false positives and unnecessary changes
- Tracks strong vs. weak consensus
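A minimal sketch of the tallying logic, assuming abstentions count as non-approvals (the skill may treat them differently):

```python
def tally_votes(votes: dict[str, str]) -> dict:
    """Tally APPROVE/REJECT/ABSTAIN votes from the validation panel.

    2/3 approval validates a finding; unanimous approval counts as
    strong consensus, while a 2/3 split counts as weak consensus.
    """
    approvals = sum(1 for v in votes.values() if v == "APPROVE")
    validated = approvals >= 2
    if validated and approvals == len(votes):
        consensus = "strong"
    elif validated:
        consensus = "weak"
    else:
        consensus = "none"
    return {"result": "VALIDATED" if validated else "REJECTED",
            "consensus": consensus}
```

So `{"security": "APPROVE", "architect": "APPROVE", "builder": "REJECT"}` validates with weak consensus, while three approvals validate with strong consensus.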
### Language-Agnostic
Supports 9 languages with language-specific patterns:
- Python, JavaScript, TypeScript
- Rust, Go, Java, C#
- Ruby, PHP
## Integration Modes

### Standalone Invocation

Direct skill invocation for a focused audit:

```
/silent-degradation-audit path/to/codebase
```
### Sub-Loop in Quality Audit Workflow

Integrated as Phase 2 of quality-audit-workflow:

```
quality-audit-workflow calls silent-degradation-audit
  → returns findings to the quality workflow
  → quality workflow applies fixes
  → continues to the next phase
```
## Usage

### Basic Usage

```
# Audit entire codebase
/silent-degradation-audit .

# Audit a specific directory
/silent-degradation-audit ./src

# With custom exclusions
/silent-degradation-audit . --exclusions .my-exclusions.json
```
### Configuration

Create .silent-degradation-config.json in the codebase root:

```json
{
  "convergence": {
    "absolute_threshold": 10,
    "relative_threshold": 0.05
  },
  "max_waves": 6,
  "exclusions": {
    "patterns": ["*.test.js", "test_*.py", "**/__tests__/**"]
  },
  "categories": {
    "enabled": [
      "dependency-failures",
      "config-errors",
      "background-work",
      "test-effectiveness",
      "operator-visibility",
      "functional-stubs"
    ]
  }
}
```
## Exclusion Lists

### Global Exclusions

Edit ~/.amplihack/.claude/skills/silent-degradation-audit/exclusions-global.json:

```json
[
  {
    "pattern": "*.test.*",
    "reason": "Test files excluded from production audits",
    "category": "*"
  },
  {
    "pattern": "**/vendor/**",
    "reason": "Third-party code",
    "category": "*"
  }
]
```
### Repository-Specific Exclusions

Create .silent-degradation-exclusions.json in the repository root:

```json
[
  {
    "pattern": "src/legacy/*.py",
    "reason": "Legacy code being replaced",
    "category": "*",
    "wave": 1
  },
  {
    "pattern": "api/endpoints.py:42",
    "reason": "Empty dict is valid API response",
    "category": "functional-stubs",
    "type": "exact"
  }
]
```
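Matching a finding against these entries might look like the sketch below: glob entries match the finding's file path, `"type": "exact"` entries match a specific `file:line` location, and a `"category"` of `"*"` applies everywhere. The function name and finding shape are assumptions based on the formats shown above.

```python
from fnmatch import fnmatch

def is_excluded(finding: dict, exclusions: list[dict]) -> bool:
    """Check a finding against merged global and repo-specific exclusions.

    Glob patterns are matched against the finding's file path; entries
    marked "type": "exact" must equal the finding's "file:line" location.
    """
    location = f'{finding["file"]}:{finding["line"]}'
    for entry in exclusions:
        # Skip entries scoped to a different category agent
        if entry.get("category", "*") not in ("*", finding["category"]):
            continue
        if entry.get("type") == "exact":
            if entry["pattern"] == location:
                return True
        elif fnmatch(finding["file"], entry["pattern"]):
            return True
    return False
```

With the two example entries above, a finding in src/legacy/old.py is excluded by the glob, and a functional-stubs finding at api/endpoints.py line 42 is excluded by the exact entry; other findings pass through.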
## Output

### Report Format

Generates .silent-degradation-report.md:

```markdown
# Silent Degradation Audit Report

## Summary

- **Total Waves**: 4
- **Total Findings**: 137
- **Converged**: Yes
- **Convergence Ratio**: 4.2%

## Convergence Progress

Wave 1: ██████████████████████████████████████████████████ 120
Wave 2: ███████████████████████████ 65 (54.2% of Wave 1)
Wave 3: ████████ 18 (15.0% of Wave 1)
Wave 4: ██ 5 (4.2% of Wave 1)

Status: ✓ CONVERGED
Reason: Relative threshold met: 4.2% < 5.0%

## Findings by Category

### dependency-failures (42 findings)

- High: 15
- Medium: 20
- Low: 7

[... continues for all 6 categories ...]
```
### Findings Format

Generates .silent-degradation-findings.json:

```json
[
  {
    "id": "dep-001",
    "category": "dependency-failures",
    "severity": "high",
    "file": "src/payments.py",
    "line": 89,
    "description": "Payment API failure silently falls back to mock",
    "impact": "Production system using mock payments, no real charges",
    "visibility": "None - no logs or metrics",
    "recommendation": "Add explicit failure logging and metric, or fail fast",
    "wave": 1,
    "validation": {
      "result": "VALIDATED",
      "consensus": "strong",
      "votes": {
        "security": "APPROVE",
        "architect": "APPROVE",
        "builder": "APPROVE"
      }
    }
  },
  ...
]
```
## Workflow Details

### Phase 1: Initialization

- Create the convergence tracker with thresholds
- Initialize the exclusion manager
- Set up audit state

### Phase 2: Language Detection

- Scan the codebase for file extensions
- Identify languages (> 5 files or > 5% of files)
- Load language-specific patterns
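The detection step might be sketched as below; the extension map and function name are assumptions, and the actual language_detector.py tool may apply the thresholds differently.

```python
from collections import Counter
from pathlib import Path

# Hypothetical mapping for the 9 supported languages
EXTENSIONS = {".py": "python", ".js": "javascript", ".ts": "typescript",
              ".rs": "rust", ".go": "go", ".java": "java", ".cs": "csharp",
              ".rb": "ruby", ".php": "php"}

def detect_languages(files: list[str], min_files: int = 5,
                     min_share: float = 0.05) -> set[str]:
    """Map file extensions to languages, keeping any language that has
    more than `min_files` files or more than `min_share` of the total."""
    counts = Counter()
    for f in files:
        lang = EXTENSIONS.get(Path(f).suffix)
        if lang:
            counts[lang] += 1
    total = sum(counts.values())
    return {lang for lang, n in counts.items()
            if n > min_files or (total and n / total > min_share)}
```

Under this rule, a repository of 10 Python files and 1 JavaScript file detects both: Python exceeds the file count, and JavaScript at roughly 9% exceeds the share threshold.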
### Phase 3: Load Exclusions

- Load global exclusions from the skill directory
- Load repository-specific exclusions
- Merge into a single exclusion list
### Phase 4: Wave Loop

For each wave (until convergence):

1. **Category Analysis** (6 agents in parallel)
   - Each agent scans for category-specific issues
   - Uses language-specific patterns
   - Excludes previous findings
2. **Validation Panel** (3 agents in parallel)
   - Security agent reviews security implications
   - Architect agent reviews design impact
   - Builder agent reviews implementation feasibility
3. **Vote Tallying**
   - Require 2/3 consensus (APPROVE)
   - Track strong vs. weak consensus
   - Flag inconclusive findings for human review
4. **Exclusion Filtering**
   - Apply global and repo-specific exclusions
   - Filter out duplicates
5. **State Update**
   - Add new findings to the total
   - Record wave metrics
6. **Convergence Check**
   - Absolute: < 10 new findings
   - Relative: < 5% of Wave 1 findings
   - Break if converged
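The loop's control flow can be sketched with hypothetical hooks standing in for the agent steps (the real skill runs agents, not Python callables):

```python
def run_audit(scan_wave, validate, is_excluded, max_waves=6,
              absolute_threshold=10, relative_threshold=0.05):
    """Skeleton of the Phase 4 wave loop.

    scan_wave(wave, seen) -> candidate findings for this wave
    validate(finding)     -> True if the panel reaches 2/3 consensus
    is_excluded(finding)  -> True if an exclusion pattern matches
    """
    all_findings, wave1_count = [], None
    for wave in range(1, max_waves + 1):
        # Scan, then keep only validated, non-excluded findings
        new = [f for f in scan_wave(wave, all_findings)
               if validate(f) and not is_excluded(f)]
        all_findings.extend(new)
        if wave1_count is None:
            wave1_count = len(new)  # Wave 1 sets the baseline
        elif (len(new) < absolute_threshold or
              (wave1_count and len(new) / wave1_count < relative_threshold)):
            break  # converged: stop before max_waves
    return all_findings
```

With waves producing 20, 12, and 3 findings, the loop stops after the third wave (3 < 10) and returns 35 findings in total.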
### Phase 5: Report Generation
- Generate convergence plot
- Calculate metrics summary
- Categorize findings by type and severity
- Write markdown report
- Write JSON findings
## Architecture

### Directory Structure

```
.claude/skills/silent-degradation-audit/
├── SKILL.md                 # This file
├── reference.md             # Detailed patterns and examples
├── examples.md              # Usage examples
├── patterns.md              # Language-specific patterns
├── README.md                # Quick start
├── category_agents/         # 6 category agent definitions
│   ├── dependency-failures.md
│   ├── config-errors.md
│   ├── background-work.md
│   ├── test-effectiveness.md
│   ├── operator-visibility.md
│   └── functional-stubs.md
├── validation_panel/        # Validation panel specs
│   ├── panel-spec.md
│   └── voting-rules.md
├── recipe/                  # Recipe-based workflow
│   └── audit-workflow.yaml
└── tools/                   # Python utilities
    ├── exclusion_manager.py
    ├── language_detector.py
    ├── convergence_tracker.py
    └── __init__.py
```
### Component Responsibilities
Category Agents:
- Scan codebase for category-specific issues
- Use language-specific patterns
- Produce findings with severity, impact, recommendation
Validation Panel:
- Review findings from multiple perspectives
- Vote APPROVE/REJECT/ABSTAIN
- Require 2/3 consensus
Convergence Tracker:
- Track findings per wave
- Calculate convergence metrics
- Determine when to stop
Exclusion Manager:
- Load and merge exclusion lists
- Filter findings against patterns
- Add new exclusions
Language Detector:
- Identify languages in codebase
- Load language-specific patterns
- Support 9 languages
## Best Practices

### Running Your First Audit

- **Start with a small scope**: audit a single service or module first
- **Review Wave 1 carefully**: it establishes the baseline
- **Tune exclusions**: add false positives to the exclusion list
- **Verify fixes**: test fixes before applying them broadly
### Exclusion Management

When to add exclusions:

- False positives (the finding is not actually an issue)
- Intentional design (the behavior is correct as-is)
- Legacy code (not worth fixing right now)
- Third-party code (can't be modified)

When NOT to add exclusions:

- Real issues you don't want to fix
- Issues you lack the time to fix now
- Issues that seem hard

The better approach: fix real issues, prioritized by severity.
### Validation Tuning

If there are too many false positives:

- Review the validation panel prompts
- Increase the consensus threshold (require unanimity)
- Add category-specific validation rules

If real issues are being missed:

- Review category agent patterns
- Add language-specific patterns
- Decrease the consensus threshold (1/3 approval)
### Wave Management

Typical wave characteristics:

- Wave 1: 40-50% of findings (obvious issues)
- Wave 2: 25-30% (deeper issues)
- Wave 3: 15-20% (subtle issues)
- Wave 4+: < 10% each (edge cases)

If waves are not converging:

- Check for duplicate findings (exclusions not working)
- Review category agent overlap (agents finding the same things)
- Consider loosening the convergence thresholds
## Metrics and Monitoring

### Success Metrics

Track these over time:

**Audit Success:**

- Convergence reached: Yes/No
- Waves to convergence: 4 (target: 3-5)
- Total findings: 137 (varies by codebase)
- Validation rate: 75% (target: 60-80%)

**Finding Distribution:**

- High severity: 15% (target: < 20%)
- Medium severity: 45% (target: 40-60%)
- Low severity: 40% (target: 30-50%)

**Panel Effectiveness:**

- Strong consensus: 60% (target: > 50%)
- Weak consensus: 30% (target: 20-40%)
- Inconclusive: 10% (target: < 10%)
- Abstention rate: 5% (target: < 10%)
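These rates can be computed straight from the JSON findings file; the sketch below assumes the `validation` object shape shown in the Findings Format section (the function name is hypothetical).

```python
def panel_metrics(findings: list[dict]) -> dict:
    """Summarize panel effectiveness from the JSON findings list."""
    total = len(findings)
    validated = sum(1 for f in findings
                    if f["validation"]["result"] == "VALIDATED")
    strong = sum(1 for f in findings
                 if f["validation"]["consensus"] == "strong")
    return {
        "validation_rate": validated / total if total else 0.0,
        "strong_consensus_rate": strong / total if total else 0.0,
    }
```

For instance, 3 validated findings out of 4, two of them with strong consensus, gives a 75% validation rate and a 50% strong-consensus rate.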
### Quality Indicators

A healthy audit:

- Converges in 3-5 waves
- Validation rate of 60-80%
- Strong consensus > 50%
- Abstention rate < 10%

Warning signs:

- Doesn't converge after 6 waves (agents keep finding the same things)
- Validation rate > 95% (rubber-stamping)
- Validation rate < 40% (too strict)
- Inconclusive rate > 20% (poor context)
## Troubleshooting

### "Audit not converging"

Symptoms: reaches max waves without convergence.

Causes:

- Category agents finding duplicate issues
- Exclusion filtering not working
- Convergence thresholds too tight

Solutions:

- Review findings for duplicates
- Check that exclusion patterns are matching
- Increase the relative threshold to 10%
- Reduce max waves to 5
### "Too many false positives"

Symptoms: validation rate > 95%, many non-issues.

Causes:

- Category agents too aggressive
- Validation panel too permissive
- Patterns not tuned for the codebase

Solutions:

- Review category agent patterns
- Add exclusions for false-positive patterns
- Require unanimous validation (3/3)
- Tune language-specific patterns
### "Missing real issues"

Symptoms: known issues do not appear in the findings.

Causes:

- Gaps in category agent coverage
- Exclusions too broad
- Validation panel too strict

Solutions:

- Check whether the issue matches any category
- Review the exclusion list for overly broad patterns
- Lower the consensus threshold to 1/3
- Add specific patterns for the missed issues
### "Validation panel abstaining"

Symptoms: high abstention rate (> 20%).

Causes:

- Insufficient context in findings
- Unclear agent prompts
- Findings outside agent expertise

Solutions:

- Include more code context in findings
- Review and improve agent prompts
- Add a fourth "generalist" agent
- Improve finding descriptions
## Advanced Configuration

### Custom Category Agents

Create a custom category agent in category_agents/custom.md:

```markdown
# Category Custom: My Special Cases

## Core Question

"What happens when [specific scenario]?"

## Detection Focus

[Patterns to detect...]

## Language-Specific Patterns

[Language examples...]
```

Then enable it in the config:

```json
{
  "categories": {
    "enabled": [
      "dependency-failures",
      "config-errors",
      "background-work",
      "test-effectiveness",
      "operator-visibility",
      "functional-stubs",
      "custom"
    ]
  }
}
```
### Custom Validation Panel

Override the validation panel with different agents:

```yaml
# In recipe/audit-workflow.yaml
validation_panel:
  agents:
    - security
    - architect
    - builder
    - domain-expert  # Add domain-specific agent
  consensus:
    required: 0.75  # Require 3/4 approval
```
### Staged Rollout

Audit the codebase incrementally:

```
# Phase 1: Critical services only
/silent-degradation-audit ./services/payments ./services/auth

# Phase 2: All services
/silent-degradation-audit ./services

# Phase 3: Full codebase
/silent-degradation-audit .
```
## See Also

- reference.md - Detailed technical reference
- examples.md - Real-world usage examples
- patterns.md - Language-specific degradation patterns
- README.md - Quick start guide
- category_agents/ - Individual category agent documentation
- validation_panel/ - Validation panel specifications
## Changelog

### Version 1.0.0 (2025-02-24)
- Initial release
- 6 category agents (A-F)
- Multi-agent validation panel (2/3 consensus)
- Convergence detection (dual thresholds)
- Language-agnostic (9 languages)
- Battle-tested on CyberGym (~250 bugs)
- Integration modes: standalone + sub-loop