# Skill Builder Workflow (`skill-builder`)

Create, evaluate, and improve Agent skills to production quality.
## Quick Start
| Mode | When to Use | Starting Step |
|---|---|---|
| Create | Building a new skill from scratch | Step 1 |
| Evaluate | Scoring an existing skill | Step 4 |
| Improve | Upgrading a skill to 100/100 | Step 5 |
## Skill Files

| File | Purpose |
|---|---|
| SKILL.md | This workflow |
| SCORING.md | Structure + Efficacy rubrics (MUST READ before scoring) |
| TEMPLATES.md | Starter templates and patterns (MUST READ before creating) |
| EXAMPLES.md | Before/after improvement examples |
| CHECKLIST.md | 50-point validation checklist |
## Mode 1: Create a New Skill

### Step 1: Gather Requirements
Ask the user:
- What does the skill do? (core capability)
- When should it activate? (trigger contexts)
- What tools/scripts are needed? (dependencies)
- What's the expected output? (deliverables)
- What input quality issues are common? (see Input Decomposition below)
- What does this assume the user knows? (see User Assumptions below)
#### Input Decomposition

> [!IMPORTANT]
> Most real-world inputs are messy. If the domain typically has vague, incomplete, or poorly-structured input, the skill MUST include a transformation step.
Ask: "What does bad input look like in this domain?"
| Input Quality | Skill Must Include |
|---|---|
| Usually clean and structured | No transformation needed |
| Sometimes vague or incomplete | Validation step that asks for clarification |
| Often messy or ambiguous | Decomposition step with probing questions to transform input |
Decomposition step pattern:

```markdown
### Step N: Decompose Input
Transform raw input into structured form using these probes:

| Probe | Purpose |
|-------|---------|
| "What specifically happened?" | Extract concrete actions |
| "What was the outcome?" | Capture measurable results |
| "How often does this occur?" | Establish patterns |
```
#### User Capability Assumptions
List what the skill assumes the user can do. For each assumption, either:
- (a) Remove it by adding a compensating step, OR
- (b) Document it as a prerequisite
| Assumption | Compensation Strategy |
|---|---|
| User can provide structured input | Add decomposition step |
| User knows domain terminology | Add glossary or explain inline |
| User can make judgment calls | Add decision logic with explicit criteria |
| User knows quality standards | Add validation checklist |
### Step 1.5: Identify the Hardest Parts

> [!CRITICAL]
> State-of-the-art skills solve the hard problems, not just the easy ones. Before designing the workflow, identify where experts struggle and novices get stuck.
Ask: "What are the 2-3 hardest judgment calls in this domain?"
Signs of a hard judgment call:
- Experts disagree on the right answer
- Multiple valid options exist
- Context determines the best choice
- Novices consistently get it wrong
For each hard part, the skill MUST include:
| Hard Part Type | Required Solution |
|---|---|
| Ambiguous categorization | Disambiguation logic with explicit criteria |
| Quality/intensity judgment | Calibration guidance with thresholds |
| Context-dependent choice | Decision matrix or if/then rules |
| Subjective evaluation | Rubric with concrete examples |
Example pattern for disambiguation:
| If X could be A or B... | Ask this to disambiguate |
|-------------------------|--------------------------|
| [Ambiguous situation 1] | Was the emphasis on [criterion]? → A. On [other criterion]? → B |
| [Ambiguous situation 2] | Did it primarily [test for A] or [test for B]? |
> [!WARNING]
> A lookup table is not disambiguation. If your skill has a reference table but no logic for handling cases that match multiple entries, it's incomplete.
### Step 2: Assess Complexity & Choose Structure

> [!CAUTION]
> Default to Simple. Only upgrade complexity if the skill genuinely needs it. Ask: "Would this skill work without this file?" If yes, don't add it.
Complexity Assessment:
| If the skill... | Then it's... |
|---|---|
| Does ONE thing, linear flow, no scripts, <5 decision points | Simple |
| Multi-step workflow, needs reference tables, moderate domain knowledge | Standard |
| Many conditionals, requires scripts, extensive domain expertise, high failure modes | Complex |
Structure by Complexity:
| Complexity | Structure |
|---|---|
| Simple | SKILL.md only |
| Standard | SKILL.md + REFERENCE.md or EXAMPLES.md |
| Complex | Above + TESTING.md + scripts/ |
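The complexity-to-structure table can be expressed as a small lookup, which is handy if you script skill scaffolding. This is a minimal sketch; the `structure_for` name is hypothetical, but the tiers and file lists come straight from the table above.

```python
def structure_for(complexity: str) -> list[str]:
    """Map a complexity tier to the files that tier justifies (per the table above)."""
    tiers = {
        "simple": ["SKILL.md"],
        "standard": ["SKILL.md", "REFERENCE.md or EXAMPLES.md"],
        "complex": ["SKILL.md", "REFERENCE.md or EXAMPLES.md", "TESTING.md", "scripts/"],
    }
    return tiers[complexity]
```

Any file in the skill folder that this mapping does not justify is a candidate for the over-engineering check below.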
> [!TIP]
> Signs you're over-engineering:
> - Adding TESTING.md with obvious scenarios ("it should work")
> - Creating REFERENCE.md that repeats the workflow
> - Writing EXAMPLES.md when 2 inline examples suffice
Read TEMPLATES.md for starter templates.
### Step 3: Write the SKILL.md
Use templates from TEMPLATES.md. Ensure:
- Frontmatter — valid YAML with `name` (must match folder name) and `description`
- Description — includes BOTH what it does AND when to use it
- "Why?" line — one sentence after title explaining the problem this solves
- Workflow — clear, numbered steps
- Progressive disclosure — link to supporting files (only if needed)
> [!TIP]
> Description is critical for discovery. Include multiple trigger keywords.
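The two mechanical checks in this step (frontmatter `name` matches the folder, `description` is non-empty) can be automated. A minimal sketch, assuming a naive `key: value` parse of the frontmatter block rather than a full YAML parser; `check_frontmatter` is an illustrative name, not part of this workflow's tooling:

```python
from pathlib import Path

def check_frontmatter(skill_dir: str) -> list[str]:
    """Run the mechanical Step 3 checks; returns a list of problems (empty = pass)."""
    path = Path(skill_dir) / "SKILL.md"
    text = path.read_text(encoding="utf-8")
    if not text.startswith("---"):
        return ["SKILL.md has no YAML frontmatter block"]
    # Naive parse: 'key: value' lines between the opening and closing '---'
    block = text.split("---", 2)[1]
    fields = {}
    for line in block.strip().splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    problems = []
    if fields.get("name") != Path(skill_dir).name:
        problems.append("frontmatter `name` must match the folder name")
    if not fields.get("description"):
        problems.append("`description` is missing or empty")
    return problems
```

The judgment-based checks (does the description cover both what AND when?) still need a human or model read.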
### Step 3.a: Register the Skill

> [!CRITICAL]
> Do NOT edit AGENTS.md manually.

- Run the `skills-index-updater` skill or script: `python3 ~/.claude/skills/skills-index-updater/scripts/update_skill_index.py`
- Verify `AGENTS.md` contains your new/updated skill.
After creating, proceed to Step 4 to evaluate.
## Mode 2: Evaluate an Existing Skill

### Step 4: Score the Skill

> [!CRITICAL]
> Read SCORING.md completely before scoring. It contains both rubrics and scoring worksheets.
Process:
- Read all skill files (SKILL.md + supporting files)
- Score Structure (0-100): 9 categories — documentation completeness
- Score Efficacy (0-100): 6 categories — actual effectiveness
- Use Combined Score Matrix in SCORING.md for verdict
- Identify gaps in both dimensions
Present results using the format in SCORING.md.
If either score < 90, proceed to Step 5.
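The gate above ("if either score < 90, proceed to Step 5") is a simple rule; the actual verdict matrix lives in SCORING.md. A sketch of just the gate, with an illustrative `score_gate` name and the 90 threshold taken from this step:

```python
def score_gate(structure: int, efficacy: int, target: int = 90) -> list[str]:
    """List the dimensions below target; a non-empty result means go to Step 5."""
    gaps = []
    if structure < target:
        gaps.append("structure")
    if efficacy < target:
        gaps.append("efficacy")
    return gaps
```

Because the gate is per-dimension, a skill with a 100 Structure score and an 85 Efficacy score still goes to improvement.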
## Mode 3: Improve to 100/100

### Step 5: Plan Improvements
Based on evaluation, prioritize:
| Priority | Fixes | Target |
|---|---|---|
| P1 Critical | Missing frontmatter, invalid YAML, empty description | Required to function |
| P2 Important | Missing triggers, no examples, no progressive disclosure | Required for 95+ |
| P3 Polish | Missing troubleshooting, no quick start, terminology issues | Required for 100 |
### Step 6: Execute Improvements

> [!CAUTION]
> Get user approval before making changes. Present the plan and wait for confirmation.
Work systematically:
- Fix frontmatter first (skill won't load without valid YAML)
- Enhance description with trigger keywords
- Add progressive disclosure if SKILL.md > 200 lines
- Create supporting files as needed
- Add quality sections (Troubleshooting, Quick Start)
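The two line-count thresholds this workflow uses — add progressive disclosure above 200 lines, and the 500-line limit from the validation checklist — can be checked mechanically. A minimal sketch; the `length_advice` name is hypothetical:

```python
def length_advice(line_count: int) -> str:
    """Apply the SKILL.md line-count thresholds (200 and 500) from this workflow."""
    if line_count > 500:
        return "over 500 lines: exceeds the limit; split into supporting files"
    if line_count > 200:
        return "over 200 lines: add progressive disclosure to supporting files"
    return "length OK"
```

Run it on `len(open("SKILL.md").readlines())` before and after restructuring to confirm the split actually brought the main file under the threshold.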
### Step 7: Verify Final Score
- Re-read all skill files
- Re-score against both rubrics
- Confirm scores meet target
- Present final structure and summary
## Validation Checklist (Quick)

Before declaring complete:

- `name` in frontmatter matches folder name
- `description` includes what AND when
- "Why?" line present after title
- SKILL.md under 500 lines
- Structure matches complexity (not over-engineered)
- Examples show concrete input/output
- Consistent terminology throughout

Full checklist: CHECKLIST.md
## Troubleshooting
| Problem | Solution |
|---|---|
| Skill not discovered | Check description has trigger keywords |
| Low Structure score | Add missing sections per SCORING.md rubric |
| Low Efficacy score | Simplify — skill may be doing too many things |
| Frontmatter errors | Validate YAML syntax, check for reserved words |
| User confused by skill | Add Quick Start, improve decision density |
## Reference
- SCORING.md — Structure + Efficacy rubrics with worksheets
- TEMPLATES.md — Starter templates and common patterns
- EXAMPLES.md — Before/after improvement examples
- CHECKLIST.md — 50-point validation checklist