# skill-forge-review
Skill Review & Validation

## Process
### Step 1: Locate Skill Files

Accept input as:

- Path to a skill directory
- Skill name (searched in `~/.claude/skills/`)
- URL to a GitHub repository

Read all `.md` files, scripts, and asset files.
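The input-resolution logic above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation; the `~/.claude/skills/` search location comes from the step, while the function name and error handling are assumptions.

```python
from pathlib import Path

def resolve_skill_path(user_input: str) -> Path:
    """Resolve the three accepted input forms to a local skill directory.

    Hypothetical helper: a real implementation would also clone GitHub
    URLs to a temporary checkout before re-running the resolution.
    """
    p = Path(user_input).expanduser()
    if p.is_dir():
        # Form 1: a direct path to a skill directory
        return p
    if user_input.startswith(("http://", "https://")):
        # Form 3: a GitHub URL -- cloning is out of scope for this sketch
        raise NotImplementedError("clone the repo, then re-run on the checkout")
    # Form 2: a bare skill name, searched in ~/.claude/skills/
    candidate = Path.home() / ".claude" / "skills" / user_input
    if candidate.is_dir():
        return candidate
    raise FileNotFoundError(f"no skill found for input: {user_input!r}")
```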
### Step 2: Structure Validation

Run `python scripts/validate_skill.py <path>` for programmatic checks.

Manual verification:

- SKILL.md exists (exact case)
- No README.md inside the skill folder
- Folder name matches the `name` field
- Valid kebab-case naming (1-64 chars)
- No "claude" or "anthropic" in the name
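Most of these manual checks are mechanical enough to script. A minimal sketch, assuming the check runs against the skill directory itself (matching the folder name against the frontmatter `name` field would additionally require parsing SKILL.md):

```python
import re
from pathlib import Path

# kebab-case: lowercase alphanumerics separated by single hyphens,
# no leading or trailing hyphen
NAME_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

def check_structure(skill_dir: Path) -> list:
    """Return a list of structure violations (empty list = pass)."""
    issues = []
    name = skill_dir.name
    if not (skill_dir / "SKILL.md").is_file():
        issues.append("SKILL.md missing (filename must be exact case)")
    if (skill_dir / "README.md").exists():
        issues.append("README.md must not live inside the skill folder")
    if not NAME_RE.fullmatch(name) or not (1 <= len(name) <= 64):
        issues.append(f"folder name {name!r} is not valid kebab-case (1-64 chars)")
    if "claude" in name or "anthropic" in name:
        issues.append('name must not contain "claude" or "anthropic"')
    return issues
```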
### Step 3: Frontmatter Audit

| Check | Pass Criteria |
|---|---|
| Name format | kebab-case, 1-64 chars, no leading/trailing hyphens |
| Description present | Non-empty, 1-1024 characters |
| Description has WHAT | Explains capabilities |
| Description has WHEN | Includes trigger phrases |
| Description has keywords | Domain-specific terms included |
| No XML tags | No < or > characters |
| Optional fields valid | license, compatibility (<500 chars), metadata |
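The purely mechanical rows of this table can be automated; the WHAT/WHEN/keyword rows need human (or model) judgment, so they are omitted here. A sketch, assuming the YAML frontmatter has already been parsed into a dict:

```python
import re

def audit_frontmatter(fm: dict) -> dict:
    """Run the mechanical frontmatter checks; keys mirror the table rows.

    Illustrative only -- semantic checks (WHAT, WHEN, keywords) are left
    to the reviewer.
    """
    name = fm.get("name") or ""
    desc = fm.get("description") or ""
    return {
        # kebab-case, 1-64 chars, no leading/trailing hyphens
        "name_format": bool(re.fullmatch(r"[a-z0-9]+(?:-[a-z0-9]+)*", name))
                       and 1 <= len(name) <= 64,
        # non-empty, 1-1024 characters
        "description_present": 1 <= len(desc) <= 1024,
        # no XML-like angle brackets
        "no_xml_tags": "<" not in desc and ">" not in desc,
        # optional compatibility field under 500 chars
        "compatibility_valid": len(str(fm.get("compatibility", ""))) < 500,
    }
```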
### Step 4: Triggering Analysis

Assess the description for activation quality:
Under-triggering risks:
- Too generic ("Helps with projects")
- Missing common paraphrases
- No domain keywords
- Missing file type mentions (if relevant)
Over-triggering risks:
- Too broad ("Processes documents")
- Overlaps with built-in Claude capabilities
- Missing negative triggers for disambiguation
Generate test queries:
- 5 queries that SHOULD trigger the skill
- 5 queries that SHOULD NOT trigger
- 3 edge cases (ambiguous queries)
### Step 5: Instruction Quality

| Criterion (score 0-10) | What to check |
|---|---|
| Specificity | Are instructions actionable? (not "validate properly") |
| Completeness | All workflows covered? |
| Error handling | Common failures addressed? |
| Examples | Concrete examples provided? |
| Progressive disclosure | Detailed docs in references/ not SKILL.md? |
| Length | Under 500 lines / 5000 tokens? |
| Cross-references | Clear links to references/scripts? |
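The length criterion is the one row that can be checked mechanically. A small sketch; the ~4-characters-per-token ratio is an assumption (use a real tokenizer for an exact figure):

```python
from pathlib import Path

def length_check(skill_md: Path) -> dict:
    """Check the length criterion: under 500 lines and roughly 5000 tokens.

    Token count is approximated at ~4 characters per token.
    """
    text = skill_md.read_text(encoding="utf-8")
    lines = text.count("\n") + 1
    approx_tokens = len(text) // 4
    return {
        "lines": lines,
        "approx_tokens": approx_tokens,
        "within_budget": lines < 500 and approx_tokens < 5000,
    }
```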
### Step 6: Architecture Review (Multi-skill)

For skills with sub-skills:

- Main skill has a clear routing table
- Sub-skills have focused responsibilities
- Cross-references are valid (files exist)
- Naming follows the `parent-child` convention
- Shared references live in the parent, not duplicated
- Agents have clear roles (if Tier 4)
### Step 7: Script Quality (if present)

- Docstrings with purpose, input, output
- CLI interface (argparse or similar)
- Structured output (JSON)
- Error handling (try/except with clear messages)
- No hardcoded paths or secrets
- Minimal dependencies
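A skeleton that satisfies the checklist above might look like this. The filename and behavior are illustrative, not one of the real skill-forge scripts:

```python
"""check_file.py -- illustrative skeleton only.

Purpose: report the size of one input file.
Input:   a file path (CLI argument).
Output:  a JSON object on stdout.
"""
import argparse
import json
import sys
from pathlib import Path

def main(argv=None) -> int:
    # CLI interface via argparse; docstring doubles as help text
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("path", type=Path, help="file to check")
    args = parser.parse_args(argv)
    try:
        # No hardcoded paths or secrets: every input arrives via the CLI
        size = args.path.stat().st_size
        # Structured JSON output for downstream tooling
        print(json.dumps({"ok": True, "bytes": size}))
        return 0
    except OSError as exc:
        # Clear, structured error message instead of a raw traceback
        print(json.dumps({"ok": False, "error": str(exc)}), file=sys.stderr)
        return 1

if __name__ == "__main__":
    sys.exit(main())
```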
### Step 8: Generate Skill Health Score

Scoring methodology (0-100):
| Category | Weight | Checks |
|---|---|---|
| Frontmatter Quality | 25% | Name, description, format |
| Trigger Accuracy | 20% | WHAT + WHEN + keywords |
| Instruction Quality | 25% | Specificity, completeness, examples |
| Structure Compliance | 15% | File naming, organization, references |
| Script Quality | 10% | If applicable (full marks if no scripts needed) |
| Progressive Disclosure | 5% | Proper use of 3-level system |
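The weighting above combines into a single 0-100 score. A sketch, assuming each category has been rated 0.0-1.0 by the earlier steps; per the table, a missing category (e.g. no scripts needed) defaults to full marks:

```python
# Weights from the scoring table (must sum to 1.0)
WEIGHTS = {
    "frontmatter": 0.25,
    "triggering": 0.20,
    "instructions": 0.25,
    "structure": 0.15,
    "scripts": 0.10,
    "disclosure": 0.05,
}

def health_score(category_scores: dict) -> int:
    """Combine per-category ratings (0.0-1.0) into a 0-100 health score.

    Categories absent from category_scores get full marks, matching the
    'full marks if no scripts needed' rule.
    """
    total = sum(WEIGHTS[c] * category_scores.get(c, 1.0) for c in WEIGHTS)
    return round(total * 100)
```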
### Step 9: Generate Trigger Eval Set

After reviewing, generate a structured trigger eval set for ongoing testing:
- Run `python scripts/generate_eval_set.py <path>` to auto-generate a starter set
- Review and refine the generated queries:
  - Ensure 8-10 should-trigger queries cover different phrasings and edge cases
  - Ensure 8-10 should-not-trigger queries are near-misses (not obviously irrelevant)
  - Include casual speech, typos, and uncommon domain uses in the should-trigger set
- Save the eval set to `evals/evals.json` in the skill directory. Good queries are realistic and specific (include file paths, context, domain details); bad queries are overly generic ("format this data") or obviously irrelevant.
- Run `python scripts/optimize_description.py <path> --eval-set evals/evals.json` to score the current description and get improvement suggestions
- Recommend running `/skill-forge eval <path>` for full functional evaluation
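The steps above can be sketched end to end. The eval-set schema here is hypothetical (the real shape produced by `generate_eval_set.py` may differ); the example queries follow the realistic-and-specific guidance, including a casual-speech typo and a near-miss:

```python
import json
from pathlib import Path

# Hypothetical eval-set shape -- treat as an illustration of the
# should-trigger / should-not-trigger split, not the canonical schema.
eval_set = {
    "skill": "skill-forge-review",
    "should_trigger": [
        "Review the skill in ~/.claude/skills/pdf-tools and give it a health score",
        "can u audit my SKILL.md frontmater for trigger problems",  # casual + typo
    ],
    "should_not_trigger": [
        "Review this pull request for style issues",  # near-miss: 'review', but not a skill
    ],
}

def save_eval_set(skill_dir: Path, data: dict) -> Path:
    """Write the eval set to evals/evals.json inside the skill directory."""
    out = skill_dir / "evals" / "evals.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return out
```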
### Step 10: Generate Report

```markdown
# Skill Review: [name]

## Health Score: [X]/100

## Summary

[2-3 sentence assessment]

## Scores by Category

| Category | Score | Notes |
|----------|-------|-------|
| Frontmatter | X/25 | [issues] |
| Triggering | X/20 | [issues] |
| Instructions | X/25 | [issues] |
| Structure | X/15 | [issues] |
| Scripts | X/10 | [issues] |
| Disclosure | X/5 | [issues] |

## Critical Issues (fix immediately)

- [issue 1]
- [issue 2]

## High Priority (fix within 1 week)

- [issue 1]

## Recommendations

- [suggestion 1]
- [suggestion 2]

## Suggested Test Queries

### Should Trigger

1. [query]
2. [query]
3. [query]

### Should NOT Trigger

1. [query]
2. [query]
3. [query]
```