oss-doc-audit
Audit public docs against live code and repo policy.
This is not a prose-polish skill. Factual correctness, active-stack alignment,
and functioning guardrails come first. Style cleanup is secondary — for
prose-level slop (em-dashes, "Here's why", forced enthusiasm), hand off to the
docs-de-slopify skill after factual drift is resolved. The two skills
compose: this one decides whether a doc should exist and be accurate;
docs-de-slopify decides whether the surviving prose sounds human.
On Trigger
Start the first progress update with:
Using oss-doc-audit ...
If the repo is large, split the read-only audit into parallel concerns after the baseline scan:
- public docs surface
- API/manifest/spec surface
- workflow, release, and licensing surface
- implementation proof surface for any disputed route or payload claims
Use divide-and-conquer when you need parallel agents.
Load references/proof-checklist.md before the first full audit pass.
If the first pass found more drift than expected, load references/drift-patterns.md before the second pass.
Modes
Repo-aware audits resolve an overlay section before scanning. Load the resolved context into the environment:
```bash
SKILL_DIR="$HOME/.claude/skills/oss-doc-audit"
[[ -d "$SKILL_DIR" ]] || SKILL_DIR="$HOME/.codex/skills/oss-doc-audit"
eval "$("$SKILL_DIR/scripts/select_mode.py" "$PWD" --format shell)"
```
`scripts/select_mode.py` reads `client.context.oss_doc_audit` from the matching
`skillbox-config/clients/{client}/overlay.yaml`. No local mode files are part
of the supported contract.
If you need to create a missing client overlay before proceeding:
```bash
python3 ~/.claude/skills/skill-issue/scripts/manage_overlays.py create --client-id {CLIENT_ID} --cwd "$PWD" --json
```
Once resolved, prefer the loaded `MODE_*` variables
(`MODE_ACTIVE_CODEBASE_PATH`, `MODE_DEPRECATED_PATHS`, `MODE_BASELINE_COMMANDS`,
`MODE_DRIFT_MARKERS`, ...) over guessing. See references/mode-template.md for
the overlay key reference.
If a matching overlay exists but lacks `client.context.oss_doc_audit`,
`scripts/select_mode.py` fails with a section-missing error. In that case:
- extend the overlay if this repo needs repeatable audits, or
- continue with explicit repo-native inference for a one-off audit
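Once the mode resolves, the audit consumes the loaded variables rather than guessing. A minimal sketch, assuming the documented `MODE_*` contract; the values below are stand-ins for what a real overlay would provide:

```bash
# Stand-in values simulating what select_mode.py would load.
MODE_ACTIVE_CODEBASE_PATH="services/api"
MODE_DEPRECATED_PATHS="legacy/ docker/node/"
MODE_BASELINE_COMMANDS="make docs-check"

# Fail fast if the overlay did not resolve an active codebase.
if [[ -z "$MODE_ACTIVE_CODEBASE_PATH" ]]; then
  echo "overlay did not resolve; fall back to repo-native inference" >&2
  exit 1
fi

# Scope the audit to the active tree and note what to grep for later.
echo "audit root: $MODE_ACTIVE_CODEBASE_PATH"
for p in $MODE_DEPRECATED_PATHS; do
  echo "deprecated marker: $p"
done
```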
Workflow
1. Establish source of truth
Inspect the repo surfaces that define current reality:
- root `AGENTS.md` / `CLAUDE.md`
- `README.md`
- primary manifests (`pyproject.toml`, `package.json`, `Cargo.toml`, etc.)
- the active app entrypoint and router registration
Write down:
- active codebase path
- deprecated paths or stacks
- canonical validation commands
- current publish or licensing posture
If the repo has an explicit "active codebase" rule, treat that as binding unless the code clearly contradicts it.
Prefer repo-native guidance over guesswork:
- if `AGENTS.md` names the active codebase, use that
- if `README.md` and `CLAUDE.md` disagree, treat that as a finding
- if an existing validator fails or crashes, treat the validator itself as part of the audit result
2. Map the public docs surface
Inventory the docs people will actually read first:
- root `README*`, `CONTRIBUTING*`
- `docs/` and `.github/` contributor docs and workflow docs
- package `README*` files
- API docs, manifests, OpenAPI specs, changelogs, release notes
Separate:
- active contributor docs
- historical or archived docs
- generated specs
Do not spend time grading archived material unless it is still linked from the active surface.
3. Run existing validators before trusting them
If repo-local checks exist, run them first. Broken validators are findings.
Typical examples:
- docs hygiene scripts
- manifest or route parity checks
- OpenAPI parity checks
- package README validation
- docs CI workflows
Prefer repo-native commands over inventing new ones. If a validator points at a deprecated stack, call that out explicitly.
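Running the checks and recording pass/fail can be sketched as below. The command strings are placeholders, not real targets in any particular repo; substitute the repo's own validators:

```bash
# Run each repo-local check; a crash is a finding, not a reason to skip it.
checks=("make docs-check" "python3 scripts/check_openapi_parity.py")
results=()
for c in "${checks[@]}"; do
  if bash -c "$c" >/dev/null 2>&1; then
    results+=("PASS: $c")
  else
    # Record the failure; a broken validator is itself an audit finding.
    results+=("FAIL: $c")
  fi
done
printf '%s\n' "${results[@]}"
```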
4. Compare docs to code
Prioritize findings that would mislead an OSS reader:
- docs that describe routes that do not exist
- docs that present `501` stubs as shipped APIs
- stale stack instructions after a migration
- examples that call dead scripts or dead workflow files
- response payloads that no longer match the implementation
- licensing or package metadata mismatches
- leaked private infrastructure details, local paths, or internal-only values
Search for drift with targeted greps driven by the repo's own reality:
- deprecated path names
- old stack names
- removed commands
- missing workflow files
- mismatched endpoint paths
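The targeted greps above can be driven from a marker list. A sketch against a throwaway fixture; the marker strings and paths are illustrative, not from any real repo:

```bash
# Build a tiny fixture doc that mentions two stale markers.
fixture="$(mktemp -d)"
mkdir -p "$fixture/docs"
printf 'Start the flask_app, then run make old-test.\n' > "$fixture/docs/setup.md"

# Markers come from the repo's own reality (deprecated paths, old commands).
drift_markers=("flask_app" "make old-test" "/api/v1/legacy")
hits=0
for m in "${drift_markers[@]}"; do
  # -F: fixed string; -R: recurse; -l: list files still mentioning the marker
  if grep -RlF -- "$m" "$fixture/docs" >/dev/null; then
    hits=$((hits + 1))
    echo "drift marker still documented: $m"
  fi
done
rm -rf "$fixture"
```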
Do not stop at the docs. Open the implementation or router file that proves the claim is wrong.
When a repo mixes active and deprecated stacks, explicitly test whether the docs-validation toolchain still points at the deprecated tree.
Treat checked-in API specs such as docs/api-reference*.yaml as active public
docs when they are part of the contributor surface.
Use references/drift-patterns.md when you need to broaden the second pass beyond generic “stale docs” language.
5. Grade OSS readiness
Use the 100-point rubric in references/rubric.md.
Start at 100 and subtract once per distinct issue cluster. Grade the repo as it is today, not as it could be after cleanup.
If any fail gate in the rubric is present, state it clearly. A repo with dead endpoint docs or broken doc validators is not "100%" ready.
Call out the difference between:
- repository readiness score
- audit workflow quality
Do not inflate the repository score just because the audit found the problems.
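The subtraction model above is simple arithmetic; the deduction values here are illustrative, one per distinct issue cluster:

```bash
# Start at 100 and subtract once per issue cluster found in the audit.
score=100
deductions=(15 10 5)   # e.g. dead endpoint docs, broken validator, stale README
for d in "${deductions[@]}"; do
  score=$((score - d))
done
echo "Score: $score/100"
```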
5b. Choose an output mode
Three output modes are supported. Pick based on the user's ask and the doc-tree size:
- Rubric mode (default) — single 100-point score + ranked cleanup queue. Use when the repo has <40 docs or the ask is "grade OSS readiness".
- Per-file scorecard mode — score every doc on five axes. Use when the repo has 40+ docs and the user needs file-by-file fate decisions (keep, rewrite, delete). See references/report-template.md for the scorecard format.
- Tier fate mode — classify every doc into Tier 1–7 (delete/rewrite → keep). Use as a companion to scorecard mode when stakeholders need a defensible "what goes, what stays" list before bulk deletion.
The modes compose: run scorecard to generate per-file scores, then bucket into tiers, then produce one rubric score for the repo overall.
Per-file scorecard axes
Each axis is scored 1 (worst) to 5 (best). Low scores on multiple axes are stronger evidence for deletion than a single low score.
| Axis | Meaning | 1 = | 5 = |
|---|---|---|---|
| Helpfulness | Does it answer a real question a reader will have? | Answers nothing real | Answers a frequent high-value question |
| Accuracy | Does it match the current codebase and stack? | Fabricated or targets dead stack | Verified against live code |
| Brevity | Is it the right length for its value? | Bloated or padded | Tight, no filler |
| Redundancy | Is it the only copy? (5 = not redundant) | Third copy of the same content | Unique source of truth |
| Necessity | Would anyone miss it if deleted? | No reader needs this | Blocks onboarding or decisions |
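Turning the axis scores into deletion evidence can be sketched with a small filter. The TSV layout (file, then the five axis scores) is an assumed working format, not a repo standard:

```bash
# Fixture scorecard: file, then Helpfulness/Accuracy/Brevity/Redundancy/Necessity.
scores="$(mktemp)"
printf 'docs/old-guide.md\t1\t1\t2\t1\t1\n'     >  "$scores"
printf 'docs/architecture.md\t5\t5\t4\t5\t5\n'  >> "$scores"

# Three or more axes at <=2 is stronger deletion evidence than one low axis.
candidates="$(awk -F'\t' '{
  low = 0
  for (i = 2; i <= 6; i++) if ($i <= 2) low++
  if (low >= 3) print $1
}' "$scores")"
echo "$candidates"
```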
Tier fate taxonomy
| Tier | Label | Meaning | Default action |
|---|---|---|---|
| 1 | Harmful / fictional | Actively misleads: fabricated claims, dead-stack refs, wrong APIs | Delete or rewrite from scratch |
| 2 | Heavy slop | AI-generated filler with low signal, generic tutorials | Delete |
| 3 | Deprecated but referenced | Old content still linked from live surface | Delete + sweep links |
| 4 | Near-empty stubs | Placeholder or redirect-only pages | Delete |
| 5 | Redundant copies | Duplicate of a canonical source | Delete, keep canonical |
| 6 | Needs trimming | Useful core, padded with slop | Trim + hand to docs-de-slopify |
| 7 | Keep | Accurate, necessary, non-redundant | Leave alone |
Fabricated compliance claims, invented benchmarks, and fictional pricing are Tier 1 — not Tier 6. They mislead readers regardless of how they are written. See references/drift-patterns.md for the fabrication smell catalog.
6. Produce a ranked cleanup queue
Use references/report-template.md.
Rank by impact on OSS readers:
- incorrect docs that change behavior expectations
- broken validation or CI guardrails
- stale contributor or release workflow docs
- security, privacy, and infrastructure leakage
- style or tone cleanup
Each queue item should name:
- the problem
- the affected file(s)
- the proof file(s)
- the expected fix
- the likely score recovery
6b. Post-deletion link sweep
After executing any deletion from the cleanup queue, every remaining doc that referenced the deleted files becomes a potential broken link. Before declaring the cleanup done, sweep for dangling references.
- For each deleted file, grep the remaining doc tree and any manifests/indexes for its basename and path:

  ```bash
  # For a deleted file like docs/guides/compliance-legal.md
  rg -l 'compliance-legal' docs/ README.md SECURITY.md *.md
  rg -l 'compliance-legal' docs/manifest.json deploy/reverse-proxy/static/
  ```

- Check these common reference surfaces even if nothing obvious matches:
  - root `README.md` and `CLAUDE.md` doc indexes
  - `docs/manifest.json`, `docs/index.md`, or equivalent TOC files
  - "Related Guides" / "See also" sections in surviving docs
  - `llms.txt`, `sitemap.xml`, reverse-proxy static indexes
  - package `README.md` files with cross-links
  - security audit trackers and OSS readiness inventories
- Edit each broken link. Remove the list item rather than leaving a dead anchor. Do not "archive" the deleted file by leaving a tombstone.
- Leave audit trackers (`OSS_HYGIENE_INVENTORY.md`, audit reports) alone: they reference historical state, not live links.
This step is non-optional when deleting more than a handful of files. Stakeholders will find broken links before they find the cleanup PR.
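The sweep loops naturally over the deletion list. A sketch against a synthetic fixture tree; a real run loops over the actual deleted paths:

```bash
# Fixture: one surviving doc that still links to a deleted guide.
tree="$(mktemp -d)"
mkdir -p "$tree/docs"
printf 'See [legal](guides/compliance-legal.md) for details.\n' > "$tree/docs/index.md"

deleted=("docs/guides/compliance-legal.md")
for f in "${deleted[@]}"; do
  base="$(basename "$f" .md)"
  # Any surviving file that still mentions the basename needs a link edit.
  dangling="$(grep -Rl -- "$base" "$tree/docs" || true)"
  [ -n "$dangling" ] && echo "edit links in: $dangling"
done
rm -rf "$tree"
```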
6c. Volume report for stakeholder comms
When the cleanup involves bulk deletion (10+ files), produce a one-paragraph volume summary alongside the rubric score. Stakeholders track lines-removed more intuitively than rubric deltas.
Format:

```
Deleted N files (~M lines) across K categories. Largest categories:
- <category>: <n> files (worst offender: <file>)
- <category>: <n> files (worst offender: <file>)
Edited E files to fix broken index links. Repo now has F docs, all of which
are either verified useful or tracked in audit files.
```
Example from a real run:

```
Deleted 48 files (~16,100 lines) across 8 categories. Largest categories:
- Supabase/Node.js-era content: 13 files (docker/README.md)
- Fabricated compliance claims: 4 files (compliance-legal.md)
- docs-ai/ duplicates of canonical: 7 files
Edited 11 files to fix broken index links. Repo now has 119 docs.
```
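The summary numbers can be aggregated from a simple deletion log. The log format (category, file, line count, tab-separated) is an assumption for the sketch, not a repo standard:

```bash
# Fixture deletion log: category<TAB>file<TAB>line-count.
log="$(mktemp)"
printf 'node-era\tdocker/README.md\t820\n'                  >  "$log"
printf 'node-era\tdocs/node-setup.md\t310\n'                >> "$log"
printf 'fabricated\tdocs/guides/compliance-legal.md\t540\n' >> "$log"

# Aggregate totals and distinct categories into the summary's first line.
summary="$(awk -F'\t' '
  { if (!($1 in seen)) { seen[$1] = 1; cats++ } lines += $3; total++ }
  END { printf "Deleted %d files (~%d lines) across %d categories.", total, lines, cats }
' "$log")"
echo "$summary"
```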
7. Improvement loop
When the user wants iteration:
- fix the highest-ranked queue items
- rerun repo-local validators
- rerun the audit
- rerun the grade
- patch this skill if the audit missed a class of issue
If the new run still misses obvious findings, improve this skill before doing another broad cleanup pass.
When a validator moves from "crashes or targets the deprecated stack" to a
clean runtime failure, move the remaining issue from the guardrail bucket into
correctness/content drift.
Typical reasons to patch the skill after a run:
- it missed a whole drift cluster, such as `501` stubs documented as shipped
- it trusted a broken validator without verifying its target stack
- it failed to compare docs payload examples against real response schemas
- it missed publish-surface contradictions across README, package manifest, and repo license
When the first pass finds repo-specific drift markers, add a reusable probe list to a reference file instead of relying on memory. For common proof patterns, see references/proof-checklist.md.
Output Requirements
Always include:
- `Score: <n>/100`
- `Fail Gates:` present or none
- `Top Findings:` ordered by severity
- `Ranked Cleanup Queue:` ordered by score recovery and reader impact
- `Completed In This Loop:` when iterating on an existing queue
- `Validation Run:` commands executed and whether they passed
- `Next Loop:` what to fix first before rerunning
If no issues are found, say so plainly and still report what you checked.