# Documentation Audit
Systematically verify claims in documentation against the actual codebase using a two-pass approach.
## Overview
Core principle: Low recall is worse than false positives—missed claims stay invisible.
Two-pass process:
- Pass 1: Extract and verify claims directly from docs
- Pass 2A: Expand patterns from false claims to find similar issues
- Pass 2B: Compare codebase inventory vs documented items (gap detection)
## Quick Start
- Identify target docs (user-facing only; skip `plans/`, `audits/`)
- Note the current git commit for the report header
- Run Pass 1 extraction using parallel agents (one per doc)
- Analyze false claims for patterns
- Run Pass 2 expansion searches
- Generate `docs/audits/AUDIT_REPORT_YYYY-MM-DD.md`
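The report-header bookkeeping in the steps above can be sketched in a few lines. This is an illustrative sketch, not part of the skill itself; the function names are assumptions, and `current_commit` assumes it runs inside a git checkout:

```python
import subprocess
from datetime import date
from pathlib import Path

def report_path(docs_dir: Path) -> Path:
    """Build docs/audits/AUDIT_REPORT_YYYY-MM-DD.md for today's audit."""
    return docs_dir / "audits" / f"AUDIT_REPORT_{date.today():%Y-%m-%d}.md"

def current_commit() -> str:
    """Short commit hash for the report header (assumes a git checkout)."""
    return subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], text=True
    ).strip()
```

Capturing the commit up front matters because the audit's line numbers and claims are only meaningful against that exact snapshot of the codebase.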
## Claim Types
| Type | Example | Verification |
|---|---|---|
| `file_ref` | `scripts/foo.py` | File exists? |
| `config_default` | "defaults to 'AI Radio'" | Check schema/code |
| `env_var` | `STATION_NAME` | In `.env.example` + code? |
| `cli_command` | `--normalize` flag | Script supports it? |
| `behavior` | "runs every 2 minutes" | Check timers/code |
Verification confidence:
- Tier 1 (auto): file_ref, config_default, env_var, cli_command
- Tier 2 (semi-auto): symbol_ref, version_req
- Tier 3 (human review): behavior, constraint
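A minimal sketch of what Tier 1 automatic checks might look like. The helper names and the `.env.example` layout are assumptions for illustration, not an API this skill provides:

```python
import re
from pathlib import Path

def verify_file_ref(claim: str, repo_root: Path) -> bool:
    """Tier 1 file_ref: TRUE iff the documented path exists in the repo."""
    return (repo_root / claim).exists()

def verify_env_var(name: str, repo_root: Path) -> bool:
    """Tier 1 env_var: TRUE iff the variable is declared in .env.example."""
    env_example = repo_root / ".env.example"
    if not env_example.exists():
        return False
    # Match "NAME=" at the start of a line, anchoring to avoid substring hits.
    pattern = re.compile(rf"^{re.escape(name)}=", re.MULTILINE)
    return bool(pattern.search(env_example.read_text()))
```

Tier 1 checks are cheap and deterministic, which is why they can run unattended; Tier 3 claims ("runs every 2 minutes") still need a human to read the timer units or scheduler code.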
## Pass 2 Pattern Expansion
After Pass 1, analyze false claims and search for similar patterns:
- Dead script found: `diagnose_track_selection.py` → Search: all script references → Found 8 more dead scripts
- Wrong interval: "every 10 seconds" → Search: `every \d+ (seconds?|minutes?)` → Found 3 more
- Wrong service name: `ai-radio-break-gen.service` → Search: service/timer names → Found naming inconsistencies
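The interval expansion above can be automated with a single regex sweep over the docs. A sketch, assuming the docs are markdown files under one directory (the function name is illustrative):

```python
import re
from pathlib import Path

INTERVAL_RE = re.compile(r"every \d+ (seconds?|minutes?)")

def find_interval_claims(docs_dir: Path):
    """Scan markdown docs for timer-interval claims worth re-verifying."""
    hits = []
    for doc in sorted(docs_dir.rglob("*.md")):
        for lineno, line in enumerate(doc.read_text().splitlines(), 1):
            for m in INTERVAL_RE.finditer(line):
                hits.append((doc.name, lineno, m.group(0)))
    return hits
```

Each hit carries its file and line number, so false intervals land directly in the report's fix table without a second lookup.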
Common patterns to always check:
- Dead scripts: `scripts/*.py` references
- Timer intervals: `every \d+ (seconds?|minutes?)`
- Service names: `ai-radio-*.service`, `*.timer`
- Config vars: `RADIO_*` environment variables
- CLI flags: `--flag` patterns in bash blocks
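The dead-scripts pattern is the highest-yield of these, and it reduces to cross-referencing documented paths against the filesystem. A hedged sketch (the regex and function name are assumptions about how script paths appear in prose):

```python
import re
from pathlib import Path

SCRIPT_REF_RE = re.compile(r"scripts/[\w\-]+\.py")

def find_dead_script_refs(docs_dir: Path, repo_root: Path):
    """List documented scripts/*.py paths that no longer exist on disk."""
    dead = set()
    for doc in docs_dir.rglob("*.md"):
        for ref in SCRIPT_REF_RE.findall(doc.read_text()):
            if not (repo_root / ref).exists():
                dead.add(ref)
    return sorted(dead)
```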
## Output Format
Generate `docs/audits/AUDIT_REPORT_YYYY-MM-DD.md`:
```markdown
# Documentation Audit Report
Generated: YYYY-MM-DD | Commit: abc123

## Executive Summary

| Metric | Count |
|--------|-------|
| Documents scanned | 12 |
| Claims verified | ~180 |
| Verified TRUE | ~145 (81%) |
| **Verified FALSE** | **31 (17%)** |

## False Claims Requiring Fixes

### CONFIGURATION.md

| Line | Claim | Reality | Fix |
|------|-------|---------|-----|
| 135 | `claude-sonnet-4-5` | Actual: `claude-3-5-sonnet-latest` | Update |

## Pattern Summary

| Pattern | Count | Root Cause |
|---------|-------|------------|
| Dead scripts | 9 | Scripts deleted, docs not updated |

## Human Review Queue

- [ ] Line 436: behavior claim needs verification
```
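The Executive Summary table can be assembled mechanically from the raw counts. A sketch under one simplifying assumption: percentages here are computed over TRUE + FALSE only, whereas the template's totals also include unresolved Tier 3 claims:

```python
def executive_summary(true_count: int, false_count: int, docs: int) -> str:
    """Render the Executive Summary markdown table from audit counts."""
    total = true_count + false_count
    rows = [
        ("Documents scanned", str(docs)),
        ("Claims verified", str(total)),
        ("Verified TRUE", f"{true_count} ({true_count * 100 // total}%)"),
        ("**Verified FALSE**", f"**{false_count} ({false_count * 100 // total}%)**"),
    ]
    lines = ["| Metric | Count |", "|--------|-------|"]
    lines += [f"| {metric} | {count} |" for metric, count in rows]
    return "\n".join(lines)
```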
## Detailed References
- For the execution checklist and anti-patterns: `checklist.md`
- For claim extraction patterns: `extraction-patterns.md`