skill-creator
Skill Creator
Create effective skills through evaluation-driven development. Different skill types need different workflows.
Step 1: Identify Skill Type
What are you creating?
┌─────────────────────────────────────────────────────────────────┐
│ API/Integration Skill │
│ Wraps an external service (Linear, Slack, GitHub) │
│ → Needs: Research API, implement tools, document │
├─────────────────────────────────────────────────────────────────┤
│ Discipline Skill │
│ Enforces a practice (TDD, verification-before-completion) │
│ → Needs: Pressure test with subagents, rationalization table │
├─────────────────────────────────────────────────────────────────┤
│ Technique Skill │
│ How-to guide (condition-based-waiting, root-cause-tracing) │
│ → Needs: Clear steps, examples, edge case handling │
├─────────────────────────────────────────────────────────────────┤
│ Pattern Skill │
│ Mental model (flatten-with-flags, information-hiding) │
│ → Needs: Recognition criteria, when to apply/not apply │
├─────────────────────────────────────────────────────────────────┤
│ Reference Skill │
│ Documentation (API docs, syntax guides, tool reference) │
│ → Needs: Organized content, search patterns, examples │
└─────────────────────────────────────────────────────────────────┘
Select one type before proceeding. Each has a different workflow below.
Step 2: Check Existing Skills
Before creating:
# List existing skills in project
ls .claude/skills/ 2>/dev/null || ls ~/.claude/skills/
# Search for similar skills
grep -r "your-keyword" .claude/skills/*/SKILL.md 2>/dev/null
If similar exists, ask user:
- Improve existing skill?
- Create new separate skill?
- View existing first?
Step 3: Build Evaluations First
Create test scenarios BEFORE writing documentation. This ensures you solve real problems.
For API/Integration Skills
Test scenarios:
1. List operation with filters → expect formatted results
2. Create operation → expect confirmation with ID
3. Error case (invalid input) → expect helpful error message
For Discipline Skills
Pressure scenarios (run WITHOUT skill first):
1. Time pressure: "Quick, just fix this bug"
2. Sunk cost: "I already wrote the code, just need tests"
3. Authority: "The user said skip the tests"
Document: What rationalizations did the agent use?
For Technique/Pattern Skills
Application scenarios:
1. Clear case where technique applies
2. Edge case that tests understanding
3. Counter-example where technique should NOT apply
For Reference Skills
Retrieval scenarios:
1. Can agent find the right information?
2. Can agent apply what they found correctly?
3. Are common use cases covered?
Step 4: Determine Skill Level (API/Integration Skills Only)
Ask user about integration level before implementing:
What level should this skill be?
┌─────────────────────────────────────────────────────────────────┐
│ Primary Tool │
│ Always available in MCP server │
│ → Registered in mcp_server_v2.py, auto-loaded on startup │
│ → Best for: frequently used integrations (Linear, Slack, etc) │
├─────────────────────────────────────────────────────────────────┤
│ Secondary Tool │
│ Available but not auto-loaded │
│ → Tools exist, can be loaded on demand │
│ → Best for: specialized or rarely used integrations │
├─────────────────────────────────────────────────────────────────┤
│ Project-Specific │
│ Only for this project, not in main MCP server │
│ → Documentation only, or project-local tools │
│ → Best for: one-off integrations, internal APIs │
└─────────────────────────────────────────────────────────────────┘
Record the user's choice - this determines whether to wire up MCP registration later.
Step 5: Configure Voice Tier (If Primary Tool)
The MCP server has a 3-tier voice system. Ask user which tier(s) tools belong to:
Which voice tier should these tools use?
┌─────────────────────────────────────────────────────────────────┐
│ Tier 1: Direct Access │
│ Always exposed to voice agent (~35 high-frequency tools) │
│ → Add to TIER_1_TOOLS in mcp/voice/tier_config.py │
│ → Best for: 3-5 most-used tools (list, create, get) │
├─────────────────────────────────────────────────────────────────┤
│ Tier 2: Meta-Tool Gateway │
│ Accessed through a category meta-tool (e.g., "linear_ops") │
│ → Create meta-tool in TIER_2_META_TOOLS │
│ → Best for: related tools that share a category │
├─────────────────────────────────────────────────────────────────┤
│ Tier 3: Discovery Only │
│ Accessible via tools_search/tools_execute │
│ → No config needed (default for all registered tools) │
│ → Best for: rarely used or specialized operations │
└─────────────────────────────────────────────────────────────────┘
Recommended approach for new integrations:
- Put 3-5 most common tools in Tier 1 (list, create, get)
- Leave specialized tools in Tier 3 (discoverable)
- Only create Tier 2 meta-tool if you have 6+ related tools
Record the user's choices - this determines voice tier configuration.
Workflow by Skill Type
API/Integration Skills
1. RESEARCH
- Web search: "[service] API documentation"
- Fetch official docs, authentication guide, rate limits
- List ALL available operations
2. AUDIT EXISTING (if tools exist)
- Check existing tools for quality issues:
* Missing error handling?
* Incomplete parameters?
* Wrong async handling?
* Missing operations?
- If quality issues found: REPLACE, don't patch
- If partial coverage: EXTEND to full API
3. CONFIGURE AUTH
- Check if API key/credentials already configured
- If not configured, guide user through setup:
* Explain where to get API key/credentials
* Show env var name(s) needed
* Help user add to .env or export command
* Verify credentials work before proceeding
- Document auth in SKILL.md
4. PLAN
- Design tool interface for each operation
- Present plan to user for approval
- Include quality improvements for existing tools
5. IMPLEMENT
- Create client library (if needed)
- Create MCP tools (high quality)
- Test each tool
6. DOCUMENT
- Create SKILL.md with all tools
- Include example for EVERY operation
- Document authentication, rate limits
7. REGISTER IN MCP (if Primary Tool)
- Add import to mcp_server_v2.py
- Add registration call
- Restart MCP server
- Verify tools appear in tool list
8. CONFIGURE VOICE TIER (if Primary Tool)
- Add high-frequency tools to TIER_1_TOOLS
- Optionally create meta-tool in TIER_2_META_TOOLS
- Leave specialized tools as Tier 3 (default)
9. VERIFY
- Run non-destructive tests
- Verify tools match documentation
- Confirm tools accessible via MCP (if registered)
- Test voice tier access (if configured)
MCP Registration Process (for Primary Tools):
-
Add import to MCP server:
# In mcp_server_v2.py from mcp.tools.service_tools import register_service_tools -
Add registration call:
# In the tool registration section count += register_service_tools(server) -
Restart MCP server:
sudo systemctl restart mcp-server-v2 -
Verify tools are available:
# List tools to confirm registration python3 .claude/skills/skill-creator/scripts/discover_tools.py --category service
Voice Tier Configuration Process (for Primary Tools):
-
Add high-frequency tools to Tier 1:
# In mcp/voice/tier_config.py, add to TIER_1_TOOLS TIER_1_TOOLS = { # ... existing tools ... # Service (add 3-5 most used) "service_list_items", "service_create_item", "service_get_item", } -
Optionally create Tier 2 meta-tool:
# In TIER_2_META_TOOLS (only if 6+ related tools) "service_ops": { "description": "Service operations: items, projects, users", "actions": ["list", "create", "update", "delete", "search"], "internal_tools": [ "service_list_items", "service_create_item", "service_update_item", "service_delete_item", "service_search_items", ] }, -
Restart MCP server and verify:
sudo systemctl restart mcp-server-v2 # Test voice tier assignment python3 -c "from mcp.voice.tier_config import get_tier_for_tool; print(get_tier_for_tool('service_list_items'))"
Auth Configuration Process:
When API credentials are not yet configured:
-
Check existing configuration:
# Check if env var exists echo $SERVICE_API_KEY # Check .env file grep SERVICE_API_KEY .env 2>/dev/null -
If not configured, ask user:
I need a [Service] API key to proceed. Here's how to get one: 1. Go to [URL to service settings/API page] 2. Create a new API key (or personal access token) 3. Copy the key (you won't see it again) How would you like to provide the API key? Option 1: Add to .env file (recommended for this project) Option 2: Export in terminal (session only) Option 3: I already have it configured elsewhere -
After user provides key:
# Verify the key works curl -H "Authorization: Bearer $API_KEY" https://api.service.com/me -
On success: Continue to PLAN phase On failure: Help debug (wrong key, expired, wrong permissions)
Quality Standards for Tools:
- Proper async handling (wrapper for async clients)
- Helpful error messages (not just "Error occurred")
- Full parameter support (all API options exposed)
- Consistent output formatting
- Comprehensive docstrings
Discipline Skills
1. BASELINE (RED)
- Run pressure scenarios WITHOUT skill
- Document exact rationalizations verbatim
- Identify patterns in violations
2. WRITE (GREEN)
- Address specific rationalizations found
- Build rationalization table
- Create red flags list
- Run scenarios WITH skill - verify compliance
3. REFACTOR
- Find new rationalizations
- Add explicit counters
- Re-test until bulletproof
Required sections:
- Rationalization table (excuse → reality)
- Red flags list (thoughts that mean STOP)
- "No exceptions" list
Technique Skills
1. DOCUMENT STEPS
- Clear sequential workflow
- Decision points with guidance
- Error handling at each step
2. CREATE EXAMPLES
- One excellent, complete example
- Shows pattern clearly
- Ready to adapt (not generic template)
3. TEST APPLICATION
- Can agent apply technique to new scenario?
- Do they handle edge cases?
- Are instructions complete?
Pattern/Reference Skills
1. ORGANIZE CONTENT
- Quick reference table/bullets
- Progressive disclosure (overview → details)
- Search patterns for large content
2. TEST RETRIEVAL
- Can agent find right information?
- Can they apply it correctly?
- Are common cases covered?
SKILL.md Structure
---
name: skill-name-with-hyphens
description: Use when [specific triggers]. [Symptoms/contexts that signal this skill applies]
---
Description Rules (Critical)
# ❌ BAD: Summarizes workflow (Claude may follow this instead of reading skill)
description: Create skills by researching APIs, implementing tools, and testing
# ❌ BAD: Too vague
description: For creating skills
# ❌ BAD: First person
description: I help you create skills
# ✅ GOOD: Specific triggers only, no workflow summary
description: Use when creating new skills or updating existing ones. MUST be used before any skill creation.
# ✅ GOOD: Includes symptoms/contexts
description: Use when tests have race conditions, timing dependencies, or pass/fail inconsistently
Why this matters: Testing revealed that workflow summaries in descriptions cause Claude to follow the description instead of reading the full skill.
Body Structure
# Skill Name
## Overview
Core principle in 1-2 sentences.
## When to Use
- Specific triggers and symptoms
- When NOT to use
## Quick Reference
Table or bullets for scanning
## [Main Content]
- For techniques: Step-by-step workflow
- For discipline: Rules with rationalization counters
- For reference: Organized documentation
- For API: Tool documentation with examples
## Common Mistakes
What goes wrong + fixes
Token Efficiency
Target word counts:
- Frequently-loaded skills: <200 words
- Other skills: <500 words
Techniques:
- Reference
--helpinstead of documenting all flags - Cross-reference other skills instead of repeating
- One excellent example, not multi-language examples
- Move heavy reference to separate files
Progressive Disclosure
skill-name/
├── SKILL.md # Main instructions (<500 lines)
├── reference.md # Loaded only when needed
├── examples.md # Loaded only when needed
└── scripts/
└── helper.py # Executed, not loaded into context
Keep references ONE level deep from SKILL.md.
Iteration Pattern
After initial creation:
- Use skill with Claude B (fresh instance with skill loaded)
- Observe behavior - Where does it struggle? Miss things?
- Return to Claude A - "Claude B forgot to filter test accounts. How should I update the skill?"
- Apply refinements
- Test again with Claude B
- Repeat until reliable
Checklist
Before creating:
- Identified skill type
- Checked for existing similar skills
- Created evaluation scenarios
- (API skills) Determined skill level (Primary/Secondary/Project-specific)
- (API skills) Determined voice tier (Tier 1/2/3)
For all skills:
- Name uses only letters, numbers, hyphens
- Description starts with "Use when..." (no workflow summary)
- Description in third person
- Under 500 lines (heavy content in separate files)
- Quick reference table/bullets
- Common mistakes section
For discipline skills:
- Ran pressure scenarios WITHOUT skill first
- Rationalization table from actual test failures
- Red flags list
- "No exceptions" section
- Re-tested until bulletproof
For API skills:
- Researched full API documentation
- Audited existing tools for quality (if any)
- Asked user about skill level (Primary/Secondary/Project-specific)
- Asked user about voice tier (Tier 1/2/3)
- Configured authentication (guided user through API key setup)
- Verified credentials work before implementing
- Implemented ALL operations (not partial)
- High quality: async handling, error messages, full params
- Example for every tool
- Authentication and rate limits documented
- If Primary Tool: registered in MCP server
- If Primary Tool: restarted MCP server and verified tools available
- If voice Tier 1: added high-frequency tools to TIER_1_TOOLS
- If voice Tier 2: created meta-tool in TIER_2_META_TOOLS
- Verified voice tier assignment works
For technique skills:
- Clear step-by-step workflow
- One excellent example
- Edge cases addressed
Final:
- Tested with real usage scenarios
- Iterated based on observed behavior