Audit Skill

Score an agent skill against Anthropic's published best practices and the 5 canonical design patterns. Two modes: report (diagnose) and fix (diagnose + rewrite).

Modes

Report mode (default): Score each criterion, produce a grade, list fixes by impact.

Fix mode: Score, then rewrite the failing sections. Use when the user says "fix it", "improve it", "make it better", or "audit and fix". Ask before overwriting.

Step 1 — Locate the Skill

Find the skill to audit:

A path the user provides
The current directory if it contains SKILL.md
A recently created skill from a skill-creator session

Read SKILL.md. Also inventory the folder — check for references/, scripts/, assets/.

Step 2 — Classify the Pattern

Load references/patterns.md. Determine which of the 5 patterns the skill uses:

Tool Wrapper — on-demand library/API context
Generator — consistent output from templates
Reviewer — score against a checklist
Inversion — interview user before acting
Pipeline — strict multi-step with checkpoints

Name composites explicitly (e.g., "Inversion + Pipeline"). Flag unclear patterns as a quality issue.

Step 3 — Score Against Checklist

Load references/checklist.md. For each of the 10 criteria, assign pass, warn, or fail.

For every warn/fail, write a specific fix — not "improve the description" but "add these trigger phrases: 'when user asks about X, Y, Z'".

Step 4 — Grade

Grade	Criteria
A	0 fails, ≤2 warns
B	0 fails, 3-4 warns
C	1-2 fails OR 5+ warns
D	3+ fails

Step 5 — Report

## Skill Audit: {name}

**Grade: {A-D}** | **Pattern: {name}** | **{lines} lines** | **{n} files**

| # | Criterion | Score | Notes | Fix |
|---|-----------|-------|-------|-----|
| 1 | Name | ✅/⚠️/❌ | ... | ... |

### Top 3 Fixes (by impact)
1. ...

### Strengths
- ...

Step 6 — Fix Mode

If the user asked for fixes, or if Grade is C/D and user agrees:

Description: Rewrite frontmatter with 5+ trigger phrases. Show before/after diff.
Structure: If >300 lines, move content to references/. Create the files.
Gotchas: If missing, write 3+ based on the skill's domain.
Hard gates: If Pipeline/Inversion but no gates, add "DO NOT proceed until..." between key phases.
Scripts: If inline shell/Python operations described, extract to scripts/.

Show each diff. Ask "Apply this change?" before writing. Load references/examples.md for before/after patterns.

Gotchas

A 50-line Tool Wrapper can be Grade A. Don't penalize short skills — penalize dumping everything inline.
Description is the #1 undertriggering cause. If it reads like a README summary, that's a fail.
"ALWAYS"/"NEVER" in caps are yellow flags — explain the WHY instead. Note but don't auto-fail.
Zero gotchas is suspicious. Every real workflow has edge cases.
Composites are expected. "Inversion + Pipeline" is the recommended approach for complex skills.
Fix mode is conservative — only rewrite sections that scored warn or fail.

audit-skill