oss-doc-audit
Audit public docs against live code and repo policy.
This is not a prose-polish skill. Factual correctness, active-stack alignment,
and functioning guardrails come first. Style cleanup is secondary — for
prose-level slop (em-dashes, "Here's why", forced enthusiasm), hand off to the
docs-de-slopify skill after factual drift is resolved. The two skills
compose: this one decides whether a doc should exist and be accurate;
docs-de-slopify decides whether the surviving prose sounds human.
On Trigger
Start the first progress update with:
Using oss-doc-audit ...
If the repo is large, split the read-only audit into parallel concerns after the baseline scan:
- public docs surface
- API/manifest/spec surface
- workflow, release, and licensing surface
- implementation proof surface for any disputed route or payload claims
Use divide-and-conquer when you need parallel agents.
Load references/proof-checklist.md before the first full audit pass.
If the first pass found more drift than expected, load references/drift-patterns.md before the second pass.
Modes
Repo-aware audits resolve an overlay section before scanning. Load the resolved context into the environment:
```bash
SKILL_DIR="$HOME/.claude/skills/oss-doc-audit"
[[ -d "$SKILL_DIR" ]] || SKILL_DIR="$HOME/.codex/skills/oss-doc-audit"
eval "$("$SKILL_DIR/scripts/select_mode.py" "$PWD" --format shell)"
```
`scripts/select_mode.py` reads `client.context.oss_doc_audit` from the matching
`skillbox-config/clients/{client}/overlay.yaml`. No local mode files are part
of the supported contract.
If you need to create a missing client overlay before proceeding:
```bash
python3 ~/.claude/skills/skill-issue/scripts/manage_overlays.py create --client-id {CLIENT_ID} --cwd "$PWD" --json
```
Once resolved, prefer the loaded `MODE_*` variables
(`MODE_ACTIVE_CODEBASE_PATH`, `MODE_DEPRECATED_PATHS`, `MODE_BASELINE_COMMANDS`,
`MODE_DRIFT_MARKERS`, ...) over guessing. See references/mode-template.md for
the overlay key reference.
If a matching overlay exists but lacks `client.context.oss_doc_audit`,
`scripts/select_mode.py` fails with a section-missing error. In that case:
- extend the overlay if this repo needs repeatable audits, or
- continue with explicit repo-native inference for a one-off audit
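Once the mode resolves, the audit consumes the loaded variables rather than guessing. A minimal sketch, assuming the documented `MODE_*` contract; the values below are stand-ins for what a real overlay would provide:

```bash
# Stand-in values simulating what select_mode.py would load.
MODE_ACTIVE_CODEBASE_PATH="services/api"
MODE_DEPRECATED_PATHS="legacy/ docker/node/"
MODE_BASELINE_COMMANDS="make docs-check"

# Fail fast if the overlay did not resolve an active codebase.
if [[ -z "$MODE_ACTIVE_CODEBASE_PATH" ]]; then
  echo "overlay did not resolve; fall back to repo-native inference" >&2
  exit 1
fi

# Scope the audit to the active tree and note what to grep for later.
echo "audit root: $MODE_ACTIVE_CODEBASE_PATH"
for p in $MODE_DEPRECATED_PATHS; do
  echo "deprecated marker: $p"
done
```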
Workflow
1. Establish source of truth
Inspect the repo surfaces that define current reality:
- root `AGENTS.md` / `CLAUDE.md`
- `README.md`
- primary manifests (`pyproject.toml`, `package.json`, `Cargo.toml`, etc.)
- the active app entrypoint and router registration
Write down:
- active codebase path
- deprecated paths or stacks
- canonical validation commands
- current publish or licensing posture
If the repo has an explicit "active codebase" rule, treat that as binding unless the code clearly contradicts it.
Prefer repo-native guidance over guesswork:
- if `AGENTS.md` names the active codebase, use that
- if `README.md` and `CLAUDE.md` disagree, treat that as a finding
- if an existing validator fails or crashes, treat the validator itself as part of the audit result
2. Map the public docs surface
Inventory the docs people will actually read first:
- root `README*`, `CONTRIBUTING*`
- `docs/` and `.github/` contributor docs and workflow docs
- package `README*` files
- API docs, manifests, OpenAPI specs, changelogs, release notes
Separate:
- active contributor docs
- historical or archived docs
- generated specs
Do not spend time grading archived material unless it is still linked from the active surface.
3. Run existing validators before trusting them
If repo-local checks exist, run them first. Broken validators are findings.
Typical examples:
- docs hygiene scripts
- manifest or route parity checks
- OpenAPI parity checks
- package README validation
- docs CI workflows
Prefer repo-native commands over inventing new ones. If a validator points at a deprecated stack, call that out explicitly.
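Running the checks and recording pass/fail can be sketched as below. The command strings are placeholders, not real targets in any particular repo; substitute the repo's own validators:

```bash
# Run each repo-local check; a crash is a finding, not a reason to skip it.
checks=("make docs-check" "python3 scripts/check_openapi_parity.py")
results=()
for c in "${checks[@]}"; do
  if bash -c "$c" >/dev/null 2>&1; then
    results+=("PASS: $c")
  else
    # Record the failure; a broken validator is itself an audit finding.
    results+=("FAIL: $c")
  fi
done
printf '%s\n' "${results[@]}"
```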
4. Compare docs to code
Prioritize findings that would mislead an OSS reader:
- docs that describe routes that do not exist
- docs that present `501` stubs as shipped APIs
- stale stack instructions after a migration
- examples that call dead scripts or dead workflow files
- response payloads that no longer match the implementation
- licensing or package metadata mismatches
- leaked private infrastructure details, local paths, or internal-only values
Search for drift with targeted greps driven by the repo's own reality:
- deprecated path names
- old stack names
- removed commands
- missing workflow files
- mismatched endpoint paths
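The targeted greps above can be driven from a marker list. A sketch against a throwaway fixture; the marker strings and paths are illustrative, not from any real repo:

```bash
# Build a tiny fixture doc that mentions two stale markers.
fixture="$(mktemp -d)"
mkdir -p "$fixture/docs"
printf 'Start the flask_app, then run make old-test.\n' > "$fixture/docs/setup.md"

# Markers come from the repo's own reality (deprecated paths, old commands).
drift_markers=("flask_app" "make old-test" "/api/v1/legacy")
hits=0
for m in "${drift_markers[@]}"; do
  # -F: fixed string; -R: recurse; -l: list files still mentioning the marker
  if grep -RlF -- "$m" "$fixture/docs" >/dev/null; then
    hits=$((hits + 1))
    echo "drift marker still documented: $m"
  fi
done
rm -rf "$fixture"
```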
Do not stop at the docs. Open the implementation or router file that proves the claim is wrong.
When a repo mixes active and deprecated stacks, explicitly test whether the docs-validation toolchain still points at the deprecated tree.
Treat checked-in API specs such as docs/api-reference*.yaml as active public
docs when they are part of the contributor surface.
Use references/drift-patterns.md when you need to broaden the second pass beyond generic “stale docs” language.
5. Grade OSS readiness
Use the 100-point rubric in references/rubric.md.
Start at 100 and subtract once per distinct issue cluster. Grade the repo as it is today, not as it could be after cleanup.
If any fail gate in the rubric is present, state it clearly. A repo with dead endpoint docs or broken doc validators is not "100%" ready.
Call out the difference between:
- repository readiness score
- audit workflow quality
Do not inflate the repository score just because the audit found the problems.
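The subtraction model above is simple arithmetic; the deduction values here are illustrative, one per distinct issue cluster:

```bash
# Start at 100 and subtract once per issue cluster found in the audit.
score=100
deductions=(15 10 5)   # e.g. dead endpoint docs, broken validator, stale README
for d in "${deductions[@]}"; do
  score=$((score - d))
done
echo "Score: $score/100"
```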
5b. Choose an output mode
Three output modes are supported. Pick based on the user's ask and the doc-tree size:
- Rubric mode (default) — single 100-point score + ranked cleanup queue. Use when the repo has <40 docs or the ask is "grade OSS readiness".
- Per-file scorecard mode — score every doc on five axes. Use when the repo has 40+ docs and the user needs file-by-file fate decisions (keep, rewrite, delete). See references/report-template.md for the scorecard format.
- Tier fate mode — classify every doc into Tier 1–7 (delete/rewrite → keep). Use as a companion to scorecard mode when stakeholders need a defensible "what goes, what stays" list before bulk deletion.
The modes compose: run scorecard to generate per-file scores, then bucket into tiers, then produce one rubric score for the repo overall.
Per-file scorecard axes
Each axis is scored 1 (worst) to 5 (best). Low scores on multiple axes are stronger evidence for deletion than a single low score.
| Axis | Meaning | 1 = | 5 = |
|---|---|---|---|
| Helpfulness | Does it answer a real question a reader will have? | Answers nothing real | Answers a frequent high-value question |
| Accuracy | Does it match the current codebase and stack? | Fabricated or targets dead stack | Verified against live code |
| Brevity | Is it the right length for its value? | Bloated or padded | Tight, no filler |
| Redundancy | Is it the only copy? (5 = not redundant) | Third copy of the same content | Unique source of truth |
| Necessity | Would anyone miss it if deleted? | No reader needs this | Blocks onboarding or decisions |
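Turning the axis scores into deletion evidence can be sketched with a small filter. The TSV layout (file, then the five axis scores) is an assumed working format, not a repo standard:

```bash
# Fixture scorecard: file, then Helpfulness/Accuracy/Brevity/Redundancy/Necessity.
scores="$(mktemp)"
printf 'docs/old-guide.md\t1\t1\t2\t1\t1\n'     >  "$scores"
printf 'docs/architecture.md\t5\t5\t4\t5\t5\n'  >> "$scores"

# Three or more axes at <=2 is stronger deletion evidence than one low axis.
candidates="$(awk -F'\t' '{
  low = 0
  for (i = 2; i <= 6; i++) if ($i <= 2) low++
  if (low >= 3) print $1
}' "$scores")"
echo "$candidates"
```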
Tier fate taxonomy
| Tier | Label | Meaning | Default action |
|---|---|---|---|
| 1 | Harmful / fictional | Actively misleads: fabricated claims, dead-stack refs, wrong APIs | Delete or rewrite from scratch |
| 2 | Heavy slop | AI-generated filler with low signal, generic tutorials | Delete |
| 3 | Deprecated but referenced | Old content still linked from live surface | Delete + sweep links |
| 4 | Near-empty stubs | Placeholder or redirect-only pages | Delete |
| 5 | Redundant copies | Duplicate of a canonical source | Delete, keep canonical |
| 6 | Needs trimming | Useful core, padded with slop | Trim + hand to docs-de-slopify |
| 7 | Keep | Accurate, necessary, non-redundant | Leave alone |
Fabricated compliance claims, invented benchmarks, and fictional pricing are Tier 1 — not Tier 6. They mislead readers regardless of how they are written. See references/drift-patterns.md for the fabrication smell catalog.
6. Produce a ranked cleanup queue
Use references/report-template.md.
Rank by impact on OSS readers:
- incorrect docs that change behavior expectations
- broken validation or CI guardrails
- stale contributor or release workflow docs
- security, privacy, and infrastructure leakage
- style or tone cleanup
Each queue item should name:
- the problem
- the affected file(s)
- the proof file(s)
- the expected fix
- the likely score recovery
6b. Post-deletion link sweep
After executing any deletion from the cleanup queue, every remaining doc that referenced the deleted files becomes a potential broken link. Before declaring the cleanup done, sweep for dangling references.
- For each deleted file, grep the remaining doc tree and any manifests/indexes for its basename and path:

  ```bash
  # For a deleted file like docs/guides/compliance-legal.md
  rg -l 'compliance-legal' docs/ README.md SECURITY.md *.md
  rg -l 'compliance-legal' docs/manifest.json deploy/reverse-proxy/static/
  ```

- Check these common reference surfaces even if nothing obvious matches:
  - root `README.md` and `CLAUDE.md` doc indexes
  - `docs/manifest.json`, `docs/index.md`, or equivalent TOC files
  - "Related Guides" / "See also" sections in surviving docs
  - `llms.txt`, `sitemap.xml`, reverse-proxy static indexes
  - package `README.md` files with cross-links
  - security audit trackers and OSS readiness inventories
- Edit each broken link. Remove the list item rather than leaving a dead anchor. Do not "archive" the deleted file by leaving a tombstone.
- Leave audit trackers (`OSS_HYGIENE_INVENTORY.md`, audit reports) alone: they reference historical state, not live links.
This step is non-optional when deleting more than a handful of files. Stakeholders will find broken links before they find the cleanup PR.
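The sweep loops naturally over the deletion list. A sketch against a synthetic fixture tree; a real run loops over the actual deleted paths:

```bash
# Fixture: one surviving doc that still links to a deleted guide.
tree="$(mktemp -d)"
mkdir -p "$tree/docs"
printf 'See [legal](guides/compliance-legal.md) for details.\n' > "$tree/docs/index.md"

deleted=("docs/guides/compliance-legal.md")
for f in "${deleted[@]}"; do
  base="$(basename "$f" .md)"
  # Any surviving file that still mentions the basename needs a link edit.
  dangling="$(grep -Rl -- "$base" "$tree/docs" || true)"
  [ -n "$dangling" ] && echo "edit links in: $dangling"
done
rm -rf "$tree"
```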
6c. Volume report for stakeholder comms
When the cleanup involves bulk deletion (10+ files), produce a one-paragraph volume summary alongside the rubric score. Stakeholders track lines-removed more intuitively than rubric deltas.
Format:

```
Deleted N files (~M lines) across K categories. Largest categories:
- <category>: <n> files (worst offender: <file>)
- <category>: <n> files (worst offender: <file>)
Edited E files to fix broken index links. Repo now has F docs, all of which
are either verified useful or tracked in audit files.
```
Example from a real run:

```
Deleted 48 files (~16,100 lines) across 8 categories. Largest categories:
- Supabase/Node.js-era content: 13 files (docker/README.md)
- Fabricated compliance claims: 4 files (compliance-legal.md)
- docs-ai/ duplicates of canonical: 7 files
Edited 11 files to fix broken index links. Repo now has 119 docs.
```
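The summary numbers can be aggregated from a simple deletion log. The log format (category, file, line count, tab-separated) is an assumption for the sketch, not a repo standard:

```bash
# Fixture deletion log: category<TAB>file<TAB>line-count.
log="$(mktemp)"
printf 'node-era\tdocker/README.md\t820\n'                  >  "$log"
printf 'node-era\tdocs/node-setup.md\t310\n'                >> "$log"
printf 'fabricated\tdocs/guides/compliance-legal.md\t540\n' >> "$log"

# Aggregate totals and distinct categories into the summary's first line.
summary="$(awk -F'\t' '
  { if (!($1 in seen)) { seen[$1] = 1; cats++ } lines += $3; total++ }
  END { printf "Deleted %d files (~%d lines) across %d categories.", total, lines, cats }
' "$log")"
echo "$summary"
```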
7. Improvement loop
When the user wants iteration:
- fix the highest-ranked queue items
- rerun repo-local validators
- rerun the audit
- rerun the grade
- patch this skill if the audit missed a class of issue
If the new run still misses obvious findings, improve this skill before doing another broad cleanup pass.
When a validator moves from "crashes or targets the deprecated stack" to a
clean runtime failure, move the remaining issue from the guardrail bucket into
correctness/content drift.
Typical reasons to patch the skill after a run:
- it missed a whole drift cluster, such as `501` stubs documented as shipped
- it trusted a broken validator without verifying its target stack
- it failed to compare docs payload examples against real response schemas
- it missed publish-surface contradictions across README, package manifest, and repo license
When the first pass finds repo-specific drift markers, add a reusable probe list to a reference file instead of relying on memory. For common proof patterns, see references/proof-checklist.md.
Output Requirements
Always include:
- `Score: <n>/100`
- `Fail Gates:` present or none
- `Top Findings:` ordered by severity
- `Ranked Cleanup Queue:` ordered by score recovery and reader impact
- `Completed In This Loop:` when iterating on an existing queue
- `Validation Run:` commands executed and whether they passed
- `Next Loop:` what to fix first before rerunning
If no issues are found, say so plainly and still report what you checked.