Silent Degradation Audit Skill

Overview

Production-ready skill for detecting silent degradation across codebases. It uses a multi-wave audit system with 6 specialized category agents, a multi-agent validation panel, and convergence detection. Battle-tested on the CyberGym codebase (~250 bugs found).

When to Use This Skill

Use this skill when:

  • Code has reliability issues, but it is unclear where
  • Systems fail silently without operator visibility
  • Error handling exists, but its effectiveness is unknown
  • You need a comprehensive audit across multiple failure modes
  • Preparing for production deployment
  • Post-mortem analysis after silent failures

Don't use for:

  • Code style or formatting issues (use linters)
  • Performance optimization (use profilers)
  • Security vulnerabilities (use security scanners)
  • Simple one-off code reviews (use /analyze)

Key Features

Multi-Wave Progressive Audit

  • Wave 1: Broad scan, finds obvious issues (40-50% of total)
  • Wave 2-3: Deeper analysis, finds hidden issues (30-40%)
  • Wave 4-6: Edge cases and subtleties (10-20%)
  • Convergence: Stops when a wave yields < 10 new findings or < 5% of the Wave 1 count
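
A minimal sketch of the stopping rule above, assuming the two thresholds are combined with "or" as the bullet states. The function name and signature are hypothetical and are not taken from tools/convergence_tracker.py.

```python
def has_converged(wave_counts, absolute_threshold=10, relative_threshold=0.05):
    """Return True when the latest wave satisfies either stopping rule.

    wave_counts holds the number of new findings per wave, Wave 1 first.
    (Hypothetical helper; the real tracker may differ.)
    """
    if not wave_counts:
        return False
    latest, first = wave_counts[-1], wave_counts[0]
    # Absolute rule: fewer than `absolute_threshold` new findings.
    if latest < absolute_threshold:
        return True
    # Relative rule: latest wave is under `relative_threshold` of Wave 1.
    # Only meaningful from Wave 2 onward and when Wave 1 found something.
    if len(wave_counts) > 1 and first > 0:
        return latest / first < relative_threshold
    return False
```

With per-wave counts of 120, 65, 18, the audit continues; adding a fourth wave of 5 findings stops it.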

6 Category Agents

  1. Dependency Failures (Category A): "What happens when X is down?"
  2. Config Errors (Category B): "What happens when config is wrong?"
  3. Background Work (Category C): "What happens when background work fails?"
  4. Test Effectiveness (Category D): "Do tests actually detect failures?"
  5. Operator Visibility (Category E): "Is the error visible to operators?"
  6. Functional Stubs (Category F): "Does this code actually do what its name says?"

Multi-Agent Validation Panel

  • 3 agents review findings: Security, Architect, Builder
  • 2/3 consensus required to validate finding
  • Prevents false positives and unnecessary changes
  • Tracks strong vs weak consensus
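
The tally can be sketched as follows. The vote values APPROVE, REJECT and ABSTAIN and the labels VALIDATED and "strong" appear elsewhere in this document; the REJECTED and INCONCLUSIVE labels, and the reading of "strong" as unanimous and "weak" as a bare majority, are assumptions made for illustration.

```python
from collections import Counter


def tally_votes(votes, needed=2):
    """Tally panel votes (hypothetical helper).

    votes maps agent name -> "APPROVE" | "REJECT" | "ABSTAIN".
    needed is the number of matching votes required (2 of 3 by default).
    """
    counts = Counter(votes.values())
    for vote, result in (("APPROVE", "VALIDATED"), ("REJECT", "REJECTED")):
        if counts[vote] >= needed:
            # Assumption: "strong" means every agent cast the same vote.
            consensus = "strong" if counts[vote] == len(votes) else "weak"
            return {"result": result, "consensus": consensus}
    # Neither side reached the threshold: leave it for human review.
    return {"result": "INCONCLUSIVE", "consensus": None}
```

For example, two APPROVE votes plus one ABSTAIN validate the finding with weak consensus, while one vote of each kind is inconclusive.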

Language-Agnostic

Supports 9 languages with language-specific patterns:

  • Python, JavaScript, TypeScript
  • Rust, Go, Java, C#
  • Ruby, PHP

Integration Modes

Standalone Invocation

Direct skill invocation for focused audit:

/silent-degradation-audit path/to/codebase

Sub-Loop in Quality Audit Workflow

Integrated as Phase 2 of quality-audit-workflow:

quality-audit-workflow calls silent-degradation-audit
→ Returns findings to quality workflow
→ Quality workflow applies fixes
→ Continues to next phase

Usage

Basic Usage

# Audit entire codebase
/silent-degradation-audit .

# Audit specific directory
/silent-degradation-audit ./src

# With custom exclusions
/silent-degradation-audit . --exclusions .my-exclusions.json

Configuration

Create .silent-degradation-config.json in codebase root:

{
  "convergence": {
    "absolute_threshold": 10,
    "relative_threshold": 0.05
  },
  "max_waves": 6,
  "exclusions": {
    "patterns": ["*.test.js", "test_*.py", "**/__tests__/**"]
  },
  "categories": {
    "enabled": [
      "dependency-failures",
      "config-errors",
      "background-work",
      "test-effectiveness",
      "operator-visibility",
      "functional-stubs"
    ]
  }
}
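
A sketch of how such a file might be read, assuming that keys missing from the file fall back to the values shown above. The fallback behaviour and the names DEFAULT_CONFIG and load_config are assumptions; the skill's own loader is not documented here.

```python
import copy
import json
from pathlib import Path

# Defaults mirror the example configuration above (assumed, not confirmed).
DEFAULT_CONFIG = {
    "convergence": {"absolute_threshold": 10, "relative_threshold": 0.05},
    "max_waves": 6,
    "exclusions": {"patterns": []},
    "categories": {
        "enabled": [
            "dependency-failures",
            "config-errors",
            "background-work",
            "test-effectiveness",
            "operator-visibility",
            "functional-stubs",
        ]
    },
}


def load_config(root="."):
    """Merge .silent-degradation-config.json from `root` over the defaults."""
    config = copy.deepcopy(DEFAULT_CONFIG)
    path = Path(root) / ".silent-degradation-config.json"
    if not path.is_file():
        return config  # no file: use defaults unchanged
    with path.open(encoding="utf-8") as handle:
        overrides = json.load(handle)  # malformed JSON raises, by design
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(config.get(key), dict):
            config[key].update(value)  # one-level merge for nested sections
        else:
            config[key] = value
    return config
```

Malformed JSON is allowed to raise rather than being swallowed, in keeping with the config-errors category this skill audits.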

Exclusion Lists

Global Exclusions

Edit ~/.amplihack/.claude/skills/silent-degradation-audit/exclusions-global.json:

[
  {
    "pattern": "*.test.*",
    "reason": "Test files excluded from production audits",
    "category": "*"
  },
  {
    "pattern": "**/vendor/**",
    "reason": "Third-party code",
    "category": "*"
  }
]

Repository-Specific Exclusions

Create .silent-degradation-exclusions.json in repository root:

[
  {
    "pattern": "src/legacy/*.py",
    "reason": "Legacy code being replaced",
    "category": "*",
    "wave": 1
  },
  {
    "pattern": "api/endpoints.py:42",
    "reason": "Empty dict is valid API response",
    "category": "functional-stubs",
    "type": "exact"
  }
]
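
One way to apply rules like these to a finding is sketched below. It assumes that rules without "type" are glob patterns on the file path, that "type": "exact" compares against "file:line", and that "category": "*" matches every category. The "wave" field is ignored because its semantics are not described here; is_excluded is a hypothetical name.

```python
from fnmatch import fnmatchcase


def is_excluded(finding, exclusions):
    """Return True if any exclusion rule covers the finding (sketch)."""
    location = f"{finding['file']}:{finding['line']}"
    for rule in exclusions:
        # Skip rules scoped to a different category.
        if rule.get("category", "*") not in ("*", finding["category"]):
            continue
        if rule.get("type") == "exact":
            # Exact rules name a single file:line location.
            if rule["pattern"] == location:
                return True
        elif fnmatchcase(finding["file"], rule["pattern"]):
            # Glob rules match on the file path only.
            return True
    return False
```

Note that in Python's fnmatch a `*` also matches path separators, so `src/legacy/*.py` covers nested files too; a stricter matcher may be preferable in practice.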

Output

Report Format

Generates .silent-degradation-report.md:

# Silent Degradation Audit Report

## Summary

- **Total Waves**: 4
- **Total Findings**: 137
- **Converged**: Yes
- **Convergence Ratio**: 4.2%

## Convergence Progress

Wave 1: ██████████████████████████████████████████████████ 120
Wave 2: ███████████████████████████ 65 (54.2% of Wave 1)
Wave 3: ████████ 18 (15.0% of Wave 1)
Wave 4: ██ 5 (4.2% of Wave 1)

Status: ✓ CONVERGED
Reason: Relative threshold met: 4.2% < 5.0%

## Findings by Category

### dependency-failures (42 findings)

- High: 15
- Medium: 20
- Low: 7

[... continues for all 6 categories ...]
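
The convergence bars in the sample report can be reproduced with a short sketch like the following. It assumes each bar is scaled so that Wave 1 fills 50 characters, which is inferred from the sample rather than documented; render_convergence is a hypothetical name.

```python
def render_convergence(wave_counts, width=50):
    """Render one text bar per wave, scaled so Wave 1 spans `width` blocks."""
    if not wave_counts:
        return ""
    first = wave_counts[0]
    lines = []
    for number, count in enumerate(wave_counts, start=1):
        # Integer round-half-up avoids float surprises at exact halves.
        bar_len = (count * width + first // 2) // first if first else 0
        line = f"Wave {number}: {'█' * bar_len} {count}"
        if number > 1 and first:
            line += f" ({count / first * 100:.1f}% of Wave 1)"
        lines.append(line)
    return "\n".join(lines)


print(render_convergence([120, 65, 18, 5]))
```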

Findings Format

Generates .silent-degradation-findings.json:

[
  {
    "id": "dep-001",
    "category": "dependency-failures",
    "severity": "high",
    "file": "src/payments.py",
    "line": 89,
    "description": "Payment API failure silently falls back to mock",
    "impact": "Production system using mock payments, no real charges",
    "visibility": "None - no logs or metrics",
    "recommendation": "Add explicit failure logging and metric, or fail fast",
    "wave": 1,
    "validation": {
      "result": "VALIDATED",
      "consensus": "strong",
      "votes": {
        "security": "APPROVE",
        "architect": "APPROVE",
        "builder": "APPROVE"
      }
    }
  },
  ...
]
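
Because the findings file is plain JSON, it is easy to post-process. The sketch below groups findings by category and severity, similar to the "Findings by Category" section of the report; it relies only on the "category" and "severity" fields shown above, and summarize is a hypothetical helper.

```python
import json
from collections import Counter, defaultdict


def summarize(findings):
    """Count findings per category and severity."""
    summary = defaultdict(Counter)
    for finding in findings:
        summary[finding["category"]][finding["severity"]] += 1
    return {category: dict(counts) for category, counts in summary.items()}


# Usage (assumes the findings file exists in the current directory):
# with open(".silent-degradation-findings.json", encoding="utf-8") as fh:
#     print(json.dumps(summarize(json.load(fh)), indent=2))
```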

Workflow Details

Phase 1: Initialization

  1. Create convergence tracker with thresholds
  2. Initialize exclusion manager
  3. Set up audit state

Phase 2: Language Detection

  1. Scan codebase for file extensions
  2. Identify languages (a language counts if it exceeds 5 files or a 5% share)
  3. Load language-specific patterns
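
A sketch of the detection step, under stated assumptions: the extension table is illustrative and incomplete, and the 5% share is measured against recognized source files only. Neither detail is confirmed by tools/language_detector.py.

```python
from collections import Counter
from pathlib import PurePosixPath

# Illustrative mapping for the 9 supported languages (assumed, incomplete).
EXTENSIONS = {
    ".py": "Python", ".js": "JavaScript", ".ts": "TypeScript",
    ".rs": "Rust", ".go": "Go", ".java": "Java",
    ".cs": "C#", ".rb": "Ruby", ".php": "PHP",
}


def detect_languages(paths, min_files=5, min_share=0.05):
    """Return languages with > min_files files or > min_share of source files."""
    counts = Counter()
    for path in paths:
        language = EXTENSIONS.get(PurePosixPath(path).suffix.lower())
        if language:
            counts[language] += 1
    total = sum(counts.values())
    return sorted(
        language
        for language, n in counts.items()
        if n > min_files or n / total > min_share
    )
```

In a tree with 100 Python files, 6 Go files and 3 Ruby files, this keeps Python and Go and drops Ruby.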

Phase 3: Load Exclusions

  1. Load global exclusions from skill directory
  2. Load repository-specific exclusions
  3. Merge into single exclusion list
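
The merge can be sketched as a concatenation with duplicate removal. Treating a missing file as "no exclusions" and keeping the first copy of a duplicate rule are choices made for this sketch, not documented behaviour; load_exclusions is a hypothetical name.

```python
import json
from pathlib import Path


def load_exclusions(*paths):
    """Load exclusion lists in order and merge them, dropping duplicates."""
    merged, seen = [], set()
    for path in paths:
        path = Path(path).expanduser()
        if not path.is_file():
            continue  # sketch choice: a missing list contributes nothing
        for rule in json.loads(path.read_text(encoding="utf-8")):
            # Rules are considered equal when pattern, category and type match.
            key = (rule["pattern"], rule.get("category", "*"), rule.get("type", "glob"))
            if key not in seen:
                seen.add(key)
                merged.append(rule)
    return merged
```

Passing the global list first means its "reason" text wins when both lists contain the same rule.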

Phase 4: Wave Loop

For each wave (until convergence or max_waves is reached):

  1. Category Analysis (6 agents in parallel)

    • Each agent scans for category-specific issues
    • Uses language-specific patterns
    • Excludes previous findings
  2. Validation Panel (3 agents in parallel)

    • Security agent reviews security implications
    • Architect agent reviews design impact
    • Builder agent reviews implementation feasibility
  3. Vote Tallying

    • Require 2/3 consensus (APPROVE)
    • Track strong vs weak consensus
    • Flag inconclusive for human review
  4. Exclusion Filtering

    • Apply global and repo-specific exclusions
    • Filter out duplicates
  5. State Update

    • Add new findings to total
    • Record wave metrics
  6. Convergence Check

    • Absolute: < 10 new findings
    • Relative: < 5% of Wave 1 findings
    • Break if converged
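
The loop above can be sketched with the agents injected as plain callables. Everything named here (run_waves, scan, validate, is_excluded) is hypothetical; the real workflow is driven by recipe/audit-workflow.yaml and runs agents in parallel, which this sequential sketch does not attempt. It also assumes wave counts are taken after validation and filtering.

```python
def run_waves(scan, validate, is_excluded, max_waves=6,
              absolute_threshold=10, relative_threshold=0.05):
    """Run audit waves until convergence or max_waves.

    scan(wave, known) -> list of finding dicts (category analysis)
    validate(finding) -> bool (panel verdict, already tallied)
    is_excluded(finding) -> bool (exclusion filtering)
    """
    findings, wave_counts, seen = [], [], set()
    converged = False
    for wave in range(1, max_waves + 1):
        new = []
        for finding in scan(wave, findings):
            key = (finding["file"], finding["line"], finding["category"])
            # Keep only validated, non-excluded, not-yet-seen findings.
            if not validate(finding) or is_excluded(finding) or key in seen:
                continue
            seen.add(key)
            new.append(finding)
        findings.extend(new)          # state update
        wave_counts.append(len(new))  # record wave metrics
        first = wave_counts[0]
        below_absolute = len(new) < absolute_threshold
        below_relative = wave > 1 and first > 0 and len(new) / first < relative_threshold
        if below_absolute or below_relative:
            converged = True
            break
    return findings, wave_counts, converged
```

Injecting the callables keeps the control flow testable without any agent runtime.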

Phase 5: Report Generation

  1. Generate convergence plot
  2. Calculate metrics summary
  3. Categorize findings by type and severity
  4. Write markdown report
  5. Write JSON findings

Architecture

Directory Structure

.claude/skills/silent-degradation-audit/
├── SKILL.md                    # This file
├── reference.md                # Detailed patterns and examples
├── examples.md                 # Usage examples
├── patterns.md                 # Language-specific patterns
├── README.md                   # Quick start
├── category_agents/            # 6 category agent definitions
│   ├── dependency-failures.md
│   ├── config-errors.md
│   ├── background-work.md
│   ├── test-effectiveness.md
│   ├── operator-visibility.md
│   └── functional-stubs.md
├── validation_panel/           # Validation panel specs
│   ├── panel-spec.md
│   └── voting-rules.md
├── recipe/                     # Recipe-based workflow
│   └── audit-workflow.yaml
└── tools/                      # Python utilities
    ├── exclusion_manager.py
    ├── language_detector.py
    ├── convergence_tracker.py
    └── __init__.py

Component Responsibilities

Category Agents:

  • Scan codebase for category-specific issues
  • Use language-specific patterns
  • Produce findings with severity, impact, recommendation

Validation Panel:

  • Review findings from multiple perspectives
  • Vote APPROVE/REJECT/ABSTAIN
  • Require 2/3 consensus

Convergence Tracker:

  • Track findings per wave
  • Calculate convergence metrics
  • Determine when to stop

Exclusion Manager:

  • Load and merge exclusion lists
  • Filter findings against patterns
  • Add new exclusions

Language Detector:

  • Identify languages in codebase
  • Load language-specific patterns
  • Support 9 languages

Best Practices

Running First Audit

  1. Start with small scope: Audit single service/module first
  2. Review Wave 1 carefully: Establishes baseline
  3. Tune exclusions: Add false positives to exclusion list
  4. Verify fixes: Test fixes before applying broadly

Exclusion Management

When to add exclusions:

  • False positives (finding not actually an issue)
  • Intentional design (behavior is correct as-is)
  • Legacy code (not worth fixing right now)
  • Third-party code (can't modify)

When NOT to add exclusions:

  • Real issues you don't want to fix
  • Issues you lack time to fix right now
  • Issues that seem hard

Better approach: Fix real issues, prioritize by severity.

Validation Tuning

If too many false positives:

  • Review validation panel prompts
  • Increase consensus threshold (require unanimous)
  • Add category-specific validation rules

If missing real issues:

  • Review category agent patterns
  • Add language-specific patterns
  • Decrease consensus threshold (1/3 approval)

Wave Management

Typical wave characteristics:

  • Wave 1: 40-50% of findings (obvious issues)
  • Wave 2: 25-30% (deeper issues)
  • Wave 3: 15-20% (subtle issues)
  • Wave 4+: < 10% each (edge cases)

If waves not converging:

  • Check for duplicate findings (exclusion not working)
  • Review category agent overlap (agents finding same things)
  • Consider relaxing the convergence thresholds (for example, raising the relative threshold)

Metrics and Monitoring

Success Metrics

Track these over time:

Audit Success:
- Convergence reached: Yes/No
- Waves to convergence: 4 (target: 3-5)
- Total findings: 137 (varies by codebase)
- Validation rate: 75% (target: 60-80%)

Finding Distribution:
- High severity: 15% (target: < 20%)
- Medium severity: 45% (target: 40-60%)
- Low severity: 40% (target: 30-50%)

Panel Effectiveness:
- Strong consensus: 60% (target: > 50%)
- Weak consensus: 30% (target: 20-40%)
- Inconclusive: 10% (target: < 10%)
- Abstention rate: 5% (target: < 10%)

Quality Indicators

Healthy audit:

  • Converges in 3-5 waves
  • Validation rate 60-80%
  • Strong consensus > 50%
  • Abstention rate < 10%

Warning signs:

  • Doesn't converge after 6 waves (agents finding same things)
  • Validation rate > 95% (rubber stamping)
  • Validation rate < 40% (too strict)
  • Inconclusive rate > 20% (poor context)
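
These warning signs translate directly into a small check. The metric keys used below ("converged", "waves", "validation_rate", "inconclusive_rate") are hypothetical names chosen for this sketch; the thresholds are the ones listed above.

```python
def audit_warnings(metrics, max_waves=6):
    """Return a list of warning strings for an audit's summary metrics."""
    warnings = []
    if not metrics["converged"] and metrics["waves"] >= max_waves:
        warnings.append("no convergence after max waves (agents finding same things)")
    if metrics["validation_rate"] > 0.95:
        warnings.append("validation rate > 95% (rubber stamping)")
    if metrics["validation_rate"] < 0.40:
        warnings.append("validation rate < 40% (too strict)")
    if metrics["inconclusive_rate"] > 0.20:
        warnings.append("inconclusive rate > 20% (poor context)")
    return warnings
```

An audit that converged in 4 waves with a 75% validation rate and 10% inconclusive findings produces no warnings.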

Troubleshooting

"Audit not converging"

Symptoms: Reaches max waves without convergence

Causes:

  • Category agents finding duplicate issues
  • Exclusion filtering not working
  • Convergence threshold too tight

Solutions:

  1. Review findings for duplicates
  2. Check exclusion patterns are matching
  3. Increase relative threshold to 10%
  4. Reduce max waves to 5

"Too many false positives"

Symptoms: Validation rate > 95%, many non-issues

Causes:

  • Category agents too aggressive
  • Validation panel too permissive
  • Patterns not tuned for codebase

Solutions:

  1. Review category agent patterns
  2. Add exclusions for false positive patterns
  3. Require unanimous validation (3/3)
  4. Tune language-specific patterns

"Missing real issues"

Symptoms: Known issues not in findings

Causes:

  • Category agent gaps
  • Exclusion too broad
  • Validation panel too strict

Solutions:

  1. Check if issue matches any category
  2. Review exclusion list for overly broad patterns
  3. Lower consensus threshold to 1/3
  4. Add specific patterns for missed issues

"Validation panel abstaining"

Symptoms: High abstention rate (> 20%)

Causes:

  • Insufficient context in findings
  • Agent prompts unclear
  • Findings outside agent expertise

Solutions:

  1. Include more code context in findings
  2. Review and improve agent prompts
  3. Add fourth "generalist" agent
  4. Improve finding descriptions

Advanced Configuration

Custom Category Agents

Create a custom category agent in category_agents/custom.md:

# Category Custom: My Special Cases

## Core Question

"What happens when [specific scenario]?"

## Detection Focus

[Patterns to detect...]

## Language-Specific Patterns

[Language examples...]

Then enable in config:

{
  "categories": {
    "enabled": [
      "dependency-failures",
      "config-errors",
      "background-work",
      "test-effectiveness",
      "operator-visibility",
      "functional-stubs",
      "custom"
    ]
  }
}

Custom Validation Panel

Override validation panel with different agents:

# In recipe/audit-workflow.yaml
validation_panel:
  agents:
    - security
    - architect
    - builder
    - domain-expert # Add domain-specific agent

  consensus:
    required: 0.75 # Require 3/4 approval
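
A fractional threshold has to be turned into a whole number of approvals. The sketch below assumes the result is rounded up, so 0.75 on a four-agent panel needs 3 approvals; how the recipe actually rounds is not stated here, and approvals_needed is a hypothetical helper.

```python
import math
from fractions import Fraction


def approvals_needed(panel_size, required):
    """Smallest approval count meeting the `required` fraction of the panel.

    `required` may be a float such as 0.75 or a Fraction such as
    Fraction(2, 3); exact rational arithmetic avoids float rounding.
    """
    return math.ceil(Fraction(str(required)) * panel_size)
```

With the default three-agent panel, `approvals_needed(3, Fraction(2, 3))` gives the familiar 2 of 3.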

Staged Rollout

Audit codebase incrementally:

# Phase 1: Critical services only
/silent-degradation-audit ./services/payments ./services/auth

# Phase 2: All services
/silent-degradation-audit ./services

# Phase 3: Full codebase
/silent-degradation-audit .

See Also

  • reference.md - Detailed technical reference
  • examples.md - Real-world usage examples
  • patterns.md - Language-specific degradation patterns
  • README.md - Quick start guide
  • category_agents/ - Individual category agent documentation
  • validation_panel/ - Validation panel specifications

Changelog

Version 1.0.0 (2025-02-24)

  • Initial release
  • 6 category agents (A-F)
  • Multi-agent validation panel (2/3 consensus)
  • Convergence detection (dual thresholds)
  • Language-agnostic (9 languages)
  • Battle-tested on CyberGym (~250 bugs)
  • Integration modes: standalone + sub-loop