repair-skill
Skill Repair
Audit and improve an existing skill against a gold standard. Unlike create-skill (which generates from scratch), this skill diagnoses violations and identifies gaps — what is broken, what is missing, and what would raise quality. The output is a structured improvement plan covering all dimensions.
Phase 1: Load the Skill
Read $ARGUMENTS as the path to a skill directory or SKILL.md file.
- If a directory: read
SKILL.md, then list and note which ofreferences/,examples/,scripts/,assets/exist and which are referenced from SKILL.md - If a file: read it directly, then discover sibling resource directories
If the path is missing or ambiguous, use AskUserQuestion to resolve before proceeding.
Load both reference files before Phase 2:
${CLAUDE_PLUGIN_ROOT}/skills/repair-skill/references/skill-anatomy.md— gold standard for correct anatomy, three-level loading model, directory type definitions, degrees of freedom, naming conventions, body conventions. Required for Dimensions 5, 6, and 7.${CLAUDE_PLUGIN_ROOT}/skills/repair-skill/references/frontmatter-options.md— complete frontmatter field catalog, valid values, tool list, tool selection framework. Required for Dimensions 1 and 2.
Proceed to Phase 2 when: SKILL.md is read, sibling directories are cataloged, and both reference files are loaded.
Phase 2: Audit
Run each dimension independently. For each finding record: the dimension code, what is wrong or missing, which principle it violates or which gold standard it falls short of, and the specific change required. Proceed to Phase 3 when all 7 dimensions are evaluated.
Finding types:
- Violation — something present that contradicts a rule
- Gap — something absent that would improve the skill against the gold standard
- Improvement — something that works but could be meaningfully tightened
Severity:
- critical — breaks triggering or wastes significant context on every invocation
- major — degrades generalization, reliability, or workflow correctness
- minor — polish; the skill works but isn't as good as it could be
Dimension 1 — Frontmatter Quality
The description is the only part of a skill that is always in context. Every token here costs budget across every session. Audit for violations and gaps:
Violations:
- Person and framing: Is the description third-person ("This skill should be used when...")? First-person or imperative framing reads as an instruction to execute, not a triggering condition to evaluate. Critical if wrong.
- Scalar type: Does the description use
>(folded scalar)? The|literal scalar preserves newlines and can produce unexpected whitespace when parsed. Minor. - Trigger phrase authenticity: Are quoted phrases verbatim user speech — the exact tokens a user would type? Paraphrases ("hook creation tasks") have lower routing match rates than natural language ("create a hook"). Major if paraphrased.
- Token density: Does the description restate the skill name, explain what skills are, or include meta-commentary? Every such token is budget waste. Minor per instance, major if systemic.
Gaps:
- Trigger phrase coverage: Are there 3–5+ varied phrases? Single-phrase descriptions miss synonym space. Does coverage include the naive phrasing a user would use who has never heard of this skill? Major if sparse.
- Missing
argument-hint: Does the skill read$ARGUMENTSor$1/$$2without anargument-hintfield? The hint is shown in autocomplete — its absence means users don't know what to pass. Minor. - Name validity: Is the skill name lowercase, hyphens only, max 64 chars? Verb-led for commands? Namespaced when it aids routing clarity? These constraints ensure filesystem compatibility, command-line ergonomics, and unambiguous routing. Minor if wrong.
- Trigger accuracy: Mentally generate 3 prompts that should trigger this skill and 3 that should NOT (from adjacent domains). Does the description cover the should-triggers and exclude the shouldn't-triggers? Sparse coverage or broad false-trigger surface is a routing quality gap. Major if coverage is sparse.
- Token budget: Is the description over 100 tokens? Per-session cost scales with description length across all installed skills. Over 150 tokens is a violation (major); 100–150 is a gap (minor) — tighten by prioritizing trigger phrases over prose.
- Negative triggers absent: For skills in crowded domains (multiple skills with overlapping concerns), does the description include explicit "Not for X" exclusions? Negative triggers sharpen the routing decision boundary. Minor.
Dimension 2 — Execution Modifiers
Modifiers left at their defaults are not errors — omitting them is correct when defaults apply. Audit for mismatches (violations) and missing configuration (gaps).
Refer to frontmatter-options.md for the complete field catalog, model selection table,
and tool selection framework.
Violations:
- Is
model: opusset for a task thatsonnetwould handle? Overuse wastes cost. - Does the skill have unrestricted
Bashwhen a scoped pattern (Bash(git:*)) would work? - Does the skill have tools in
allowed-toolsit never uses? Dead entries add noise. - Is
agent:set withoutcontext: fork? Theagent:field is only valid alongsidecontext: fork. Note:context: forkwithoutagent:is valid — it defaults togeneral-purpose.
Gaps:
- Does the skill do heavy analysis that would benefit from
model: opus? - Is
context: forkabsent when it should be present, or present when it shouldn't be? Add it only when ALL three hold: (1) outputs are predictable and deterministic — not open-ended analysis or conversation; (2) the primary deliverable is a side effect the user doesn't need to read inline (file written, commit created, PR opened); (3) the skill has noAskUserQuestion. Do not add it when the skill is interactive or when the output itself is what the user asked for (reports, audits, research, transcripts, advice). - Does the skill invoke other skills or spawn agents without
SkillorTaskinallowed-tools? - Does the skill require user decisions mid-workflow but lacks
AskUserQuestioninallowed-tools? - Does the skill read a file path from
$1but uses aReadtool call instead of@$1inline injection? A tool round-trip is being wasted. Minor. - Could real-time data (git status, env vars, file tree) be injected using dynamic content syntax (bang + backtick-wrapped command) instead of a tool call? Minor per instance.
Dimension 3 — Intensional vs Extensional Instruction
A rule stated with its reasoning generalizes to every input. An example that implies a rule requires the reading model to reverse-engineer the rule — two reasoning hops instead of one, covering only the shape of that example.
Violations:
- Does it show a good/bad contrast and leave the principle implicit? The principle should be stated first; the contrast confirms it, not carries it.
- Is a "Common Mistakes" or "Bad/Good examples" section doing the work that a single principle sentence could do more efficiently? Major.
- Would removing the examples leave the rule intact and still actionable? If yes, the examples are redundant. If no, the rule hasn't been stated yet — state it.
Gaps:
- Are there instruction blocks that tell Claude what to do but not why? Adding the reasoning makes the instruction generalize to edge cases not covered by the current examples. Major per uncovered block.
Dimension 4 — Agentic vs Deterministic Split
Load ${CLAUDE_PLUGIN_ROOT}/skills/create-skill/references/script-patterns.md before auditing this dimension. It contains
the five signal patterns for recognizing a script candidate, CLI design conventions,
common archetypes (init, validate, transform, package, query), and the delegation
pattern for using create-cli to design the interface.
Skills mix LLM-guided reasoning (agentic) and script execution (deterministic). The split
should be deliberate — see the Degrees of Freedom table in skill-anatomy.md.
Violations:
- Code blocks that are repeated or identical across invocations — these are
deterministic operations being re-generated each time. They belong in
scripts/. Inlining costs context tokens on every run; scripts execute without being loaded. - Prose that describes a deterministic sequence — if the steps are always the same regardless of input, a script is more reliable than asking the model to reproduce them.
- Scripts that exist but aren't referenced in SKILL.md — Claude won't use them. A script without a reference in SKILL.md specifying when and how to invoke it is invisible to the skill workflow. Major.
- Vague script references — "run the validation script if needed" is not actionable. References must state the trigger condition, the exact invocation, and how to interpret the output. Minor.
Gaps — apply the five signal patterns from script-patterns.md to each workflow step:
- Signal 1 (Repeated Generation): Does any step produce the same structure with different parameters across invocations? → Parameterized script candidate. Major.
- Signal 2 (Unclear Tool Choice): Does any step require combining multiple standard tools in a fragile sequence to accomplish something naturally expressible as a single function? → Script the procedure. Major.
- Signal 3 (Rigid Contract): Does any step have an input/output shape clear enough
to write
--helptext for right now? → CLI candidate; delegate design tocreate-cli. - Signal 4 (Dual-Use Potential): Would any step be useful to run independently from the terminal, outside the skill workflow? → Design as proper CLI from the start.
- Signal 5 (Consistency Critical): Does any step need to produce identical output for identical inputs — not "similar" but reproducible? → Script, not LLM generation.
- Judgment steps with no criteria — "analyze the situation" is agentic but unanchored. Agentic steps need explicit criteria for what to consider and what constitutes a good outcome. Major per uncovered step.
Dimension 5 — Verbosity and Context Efficiency
Every token in SKILL.md is loaded into context when the skill triggers. Audit for tokens
that consume budget without improving outcomes, and for content that belongs in
references/ instead.
Refer to the size invariants table in skill-anatomy.md to calibrate severity.
Violations:
- Prose that restates the section header — "## Validation" followed by "In this section we will validate..." is pure redundancy. Minor per instance.
- Hedging language — "you might want to consider", "it could be useful to", "generally speaking". Replace with direct imperatives or remove. Minor per instance.
- Code blocks illustrating a principle stateable in one sentence — a good/bad YAML contrast often collapses to one intensional rule. Major if pattern is frequent.
- Repeated guidance across sections — the same rule in a "Best Practices" section and a "Common Mistakes" section. Consolidate to one location. Minor.
- "When to Use This Skill" section in the body — body loads only after triggering; routing guidance here is never read by the routing decision. Dead tokens every invocation. Major.
- Headers deeper than H3 — signals content that belongs in
references/. Minor. - SKILL.md over ~500 lines — requires
references/deferral. Major. - Extraneous documentation files (
README.md,CHANGELOG.md,INSTALLATION.md) in the skill directory — never loaded into context, add noise to the package. Minor per file.
Gaps:
- Would a
references/file reduce SKILL.md size? Identify sections only needed for specific sub-tasks and flag them as deferral candidates. Major if SKILL.md > 300 lines. - Would a
references/file for domain-specific data help? Lookup tables, option catalogs, field definitions — these are reference data, not instructions. Major.
Dimension 6 — Workflow Clarity
A skill's process should be sequential, complete, and have explicit exit conditions at each phase. Audit for broken workflow and for missing structure that would help.
Violations:
- Is the process structured as numbered phases with clear names? Without explicit phases the model can't track progress or know which step it's in. Major if unstructured.
- Does each phase have an explicit exit condition? Without one, the model doesn't know when to stop iterating on a phase and may loop or skip prematurely. Major if missing.
- Are there half-thought steps — phases that describe intent without specifying what to do or how to evaluate the result? Major per uncovered phase.
- Does the skill handle missing, ambiguous, or malformed input?
Gaps:
- Is there a delivery phase that tells Claude what to produce and in what format? Many skills describe the process clearly but leave the output format implicit. Major if absent.
- Would a validation checklist at the end of the workflow catch errors that prose instructions miss? Minor.
- Would an
examples/directory help users understand what the expected output looks like? Minor.
Dimension 7 — Anatomy Completeness
Refer to skill-anatomy.md for the gold standard directory anatomy and the Gap Analysis
Checklist. This dimension asks: does the skill's structure match its complexity tier, and
what is absent that would raise it?
Use the Gap Analysis Checklist from skill-anatomy.md directly. For each "yes"
answer, record a gap at the appropriate severity.
Violations:
- Does the skill have a
scripts/directory with scripts not referenced in SKILL.md? Major — referenced or delete. - Does the skill have a
references/directory with files not pointed to from SKILL.md? Major — referenced or delete. - Does the naming violate conventions (uppercase, underscores, over 64 chars)? Minor.
Gaps — ask for each absent directory:
- Missing
scripts/: Is there a deterministic operation that would be more reliable scripted? Does the same code block appear or would it appear in multiple invocations? - Missing
references/: Does SKILL.md exceed 300 lines? Are there sections only needed for specific sub-tasks? Is there domain-specific reference data? - Missing
examples/: Does the skill produce output users adapt? Are there ambiguous instructions a working example would clarify better than prose? - Missing resource pointers in SKILL.md: Are there directories present but not referenced — invisible to Claude unless it guesses to look?
Phase 3: Improvement Report
Present findings as a structured report. Split violations from gaps — a violation is something wrong, a gap is something missing that would improve the skill.
SKILL IMPROVEMENT REPORT: <skill-name>
Current tier: [simple / standard / complex] — [lines] lines, [directories present]
VIOLATIONS
──────────
CRITICAL
[D1] Description uses first-person — routing model reads as instruction, not trigger.
Fix: rewrite as "This skill should be used when the user asks to..."
MAJOR
[D3] Body teaches frontmatter quality by bad/good contrast; principle never stated.
Fix: state the rule ("quoted phrases must be verbatim user speech because routing
matches on literal tokens") then keep the contrast as confirmation.
[D5] "When to Use This Skill" section in body — dead tokens every invocation.
Fix: move routing guidance to frontmatter description, delete body section.
MINOR
[D1] Description uses | scalar instead of >.
Fix: change to >.
GAPS (what would improve this skill)
─────────────────────────────────────
MAJOR
[D7] SKILL.md is 420 lines with no references/ directory. Three sections (option catalog,
field definitions, examples table) are only needed for specific sub-tasks.
Improvement: extract to references/; add load pointer in SKILL.md for each.
[D4] File-path validation logic is inlined but must produce consistent output.
Improvement: move to scripts/validate-input.py; reference from Phase 2.
MINOR
[D2] Skill reads $1 as a file path but uses Read tool — @$1 injection would save a
tool round-trip.
Improvement: replace Read call with @$1 inline injection.
[D7] No examples/ directory; skill produces config output users adapt.
Improvement: add examples/ with one representative output file.
Group violations by severity, then gaps by severity. For each: dimension code, what is wrong or missing, the principle or gold standard it falls short of, the exact fix.
Ask: "Apply all critical and major items? Or select specific ones?"
Phase 4: Apply Improvements
Apply confirmed items in order: critical violations → major violations → major gaps → minor violations → minor gaps.
For each item:
- State what is being changed or added and why (principle reference, not just "you asked")
- Make the edit or create the file
- Confirm the change is consistent with surrounding content
Explain Your Choices
After applying improvements, briefly explain:
- What was changed and why — reference the principle: "Rewrote description as third-person because first-person framing is parsed as an instruction to execute, not a triggering condition to evaluate"
- What was added and why — "Created references/options.md and deferred the option catalog because SKILL.md was 420 lines and the catalog is only needed for the configuration sub-task"
- What was left unchanged and why — "Left
modelunset — workload doesn't justify overriding the default" - What remains for the user to address — "The examples/ gap requires domain knowledge to fill; a placeholder directory was created"
Phase 4 is complete when all confirmed items are applied, the explanation is delivered, and the validation checklist passes.
Validation
After applying all improvements, load ${CLAUDE_PLUGIN_ROOT}/skills/repair-skill/references/quality-checklist.md
and run the quality standards check followed by the item-by-item validation checklist.
Report any failing items before delivering final results.