---
name: skill-improver
description: Process user feedback from skill retrospectives and update skill files to improve them over time.
---

# Skill Improver
## When to Use

- User asks to "review skill feedback" or "improve skills based on usage"
- You notice feedback files in `.claude/feedback/`
- User mentions a skill didn't work well or missed something
- Periodic review (monthly) to incorporate learnings
## How It Works

### Step 1: Gather Feedback

Read all feedback files in `.claude/feedback/`:

```bash
ls -la .claude/feedback/retro-*.md
```
Look for patterns:
- Multiple users reporting same missing step → add to skill
- Benchmarks don't match user's context → add context-specific ranges
- Workflow confusing → restructure or add clarifications
- Skill incomplete → add missing sections
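Pattern-spotting can start with a quick count of how many feedback files mention a given phrase. This is a rough sketch, assuming the retro files live in `.claude/feedback/`; `count_mentions` is an illustrative helper, not an existing command:

```shell
#!/bin/bash
# Count how many feedback files mention a phrase (case-insensitive).
# count_mentions is an illustrative helper, not part of any tooling.
count_mentions() {
  local phrase="$1"
  grep -li "$phrase" .claude/feedback/retro-*.md 2>/dev/null | wc -l
}

for phrase in "missing step" "benchmark" "confusing"; do
  echo "$phrase: $(count_mentions "$phrase") file(s)"
done
```

A phrase mentioned in three or more files is usually a pattern worth acting on, not an outlier.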
### Step 2: Identify High-Impact Changes

Prioritize updates based on:

**High Priority** (do first):
- Missing critical steps that users had to figure out themselves
- Incorrect benchmarks or numbers
- Confusing workflow that requires clarification
- Safety issues or errors
**Medium Priority**:
- Additional examples or templates
- Better explanations of existing steps
- Alternative approaches for different contexts
**Low Priority**:
- Nice-to-have additions
- Stylistic improvements
- Minor clarifications
### Step 3: Update Skill Files

For each skill needing updates:

#### 3a. Add a "Learnings" Section

If the skill doesn't have one, add it at the end:
```markdown
## Learnings from Use

**[Date]**: [Brief description of what was learned]
- **Feedback**: [What users reported]
- **Update**: [What we changed]
- **Result**: [Expected improvement]
```
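Appending an entry in this format can be scripted. The sketch below assumes the skill file is markdown, with or without an existing "## Learnings from Use" heading; `add_learning` is a hypothetical helper, not an existing command:

```shell
#!/bin/bash
# Append a dated entry to a skill's "Learnings from Use" section.
# add_learning is an illustrative helper, not an existing command.
add_learning() {
  local skill_file="$1" summary="$2" feedback="$3" update="$4" result="$5"
  # Create the section heading if the skill doesn't have one yet.
  grep -q '^## Learnings from Use' "$skill_file" \
    || printf '\n## Learnings from Use\n' >> "$skill_file"
  {
    printf '\n**%s**: %s\n' "$(date +%Y-%m-%d)" "$summary"
    printf -- '- **Feedback**: %s\n' "$feedback"
    printf -- '- **Update**: %s\n' "$update"
    printf -- '- **Result**: %s\n' "$result"
  } >> "$skill_file"
}
```

Usage: `add_learning SKILL.md "Refined thresholds" "40% too high for B2B" "Added ranges" "More accurate diagnosis"`.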
#### 3b. Update Main Content

If feedback suggests core changes:
- Add missing steps to checklists
- Update benchmarks with ranges (e.g., "20-30% for B2C, 50-70% for B2B")
- Restructure workflow if confusing
- Add "Common Pitfalls" section if users make same mistakes
#### 3c. Version the Change

At the top of the skill, track versions:
```yaml
---
name: skill-name
version: 1.2.0
last_updated: 2026-01-22
changelog:
  - v1.2.0 (2026-01-22): Added missing step for X based on user feedback
  - v1.1.0 (2026-01-15): Updated benchmarks for Y context
  - v1.0.0 (2026-01-01): Initial release
---
```
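Bumping the version fields can also be scripted. This sed-based sketch assumes the exact frontmatter layout above (`version:` and `last_updated:` each on their own line); `bump_skill_version` is a hypothetical helper:

```shell
#!/bin/bash
# Update version and last_updated in a skill's frontmatter in place.
# bump_skill_version is an illustrative helper; it assumes the
# frontmatter layout shown above.
bump_skill_version() {
  local file="$1" new_version="$2"
  local today
  today=$(date +%Y-%m-%d)
  sed -i.bak \
    -e "s/^version: .*/version: ${new_version}/" \
    -e "s/^last_updated: .*/last_updated: ${today}/" \
    "$file" && rm -f "${file}.bak"
}
```

For example, `bump_skill_version .claude/skills/product-market-fit/SKILL.md 1.2.0` — the matching changelog line still needs to be added by hand.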
### Step 4: Archive Processed Feedback

Move processed feedback to the archive:

```bash
mkdir -p .claude/feedback/archive
mv .claude/feedback/retro-2026-01-22-*.md .claude/feedback/archive/
```

Keep a summary of learnings in `.claude/feedback/SUMMARY.md`:
```markdown
# Feedback Summary

## [Skill Name]

**Total feedback sessions**: 12
**Last updated**: 2026-01-22

**Key learnings**:
- Added step for X (reported by 3 users)
- Updated benchmarks for B2B context (reported by 5 users)
- Clarified workflow around Y (reported by 2 users)

**Patterns**:
- Users in enterprise context need higher benchmarks
- Early-stage startups need more examples
- Non-technical users need clearer explanations of jargon
```
## Example Workflow

**Scenario**: the product-market-fit skill needs improvement.

### Step 1: Review Feedback

Read `.claude/feedback/retro-2026-01-22-143022.md`:
```markdown
## Feedback

**Missed important steps?** yes

**Improvements needed:**
The Sean Ellis test threshold of 40% seems high for B2B enterprise products.
We're at 32% "very disappointed" but our retention is 85% D30, which is excellent.
Should the skill mention that thresholds vary by product type?
```
### Step 2: Identify the Pattern

Check other feedback files → 3 more users report B2B context needs different benchmarks.

### Step 3: Update the Skill

Edit `.claude/skills/product-market-fit/SKILL.md`:

Before:
```markdown
## Sean Ellis Test (40% Rule)

"How would you feel if you could no longer use [product]?"

- ≥40% "Very disappointed" = Strong PMF
```
After:
```markdown
## Sean Ellis Test (Context-Dependent Thresholds)

"How would you feel if you could no longer use [product]?"

**Thresholds by product type:**
- **Consumer B2C**: ≥40% "Very disappointed" = Strong PMF
- **SMB B2B**: ≥35% "Very disappointed" = Strong PMF
- **Enterprise B2B**: ≥30% "Very disappointed" = Strong PMF (longer sales cycles, different buying psychology)

**Why the difference?**
- Enterprise buyers are more rational than emotional
- Switching costs are higher (contracts, integrations)
- Retention is a better PMF signal for B2B (see Step 2)
```
Add to the Learnings section:

```markdown
## Learnings from Use

**2026-01-22**: Refined Sean Ellis thresholds by product type
- **Feedback**: 4 users reported 40% threshold too high for B2B enterprise
- **Update**: Added context-specific thresholds (B2C 40%, SMB 35%, Enterprise 30%)
- **Result**: More accurate PMF diagnosis for different product types
```
### Step 4: Archive & Track

```bash
mv .claude/feedback/retro-2026-01-22-*.md .claude/feedback/archive/
```

Update `.claude/feedback/SUMMARY.md`:
```markdown
## product-market-fit

**Total feedback sessions**: 4
**Last updated**: 2026-01-22

**Key learnings**:
- Added context-specific Sean Ellis thresholds (reported by 4 users)
- B2B needs different benchmarks than B2C

**Next improvements to consider**:
- Add industry-specific retention benchmarks
- Include examples from different verticals
```
## Quality Checklist
Before updating any skill, ensure:
- Feedback is from multiple users (pattern, not outlier)
- Change makes skill more accurate, not just more complex
- Benchmarks are sourced or validated (not anecdotal)
- Update is backward compatible (doesn't break existing workflows)
- Learnings section documents why we made the change
- Version number incremented appropriately (semver)
- Processed feedback archived, not deleted
## Feedback Categories

Track feedback by type to identify systemic issues:

### Category 1: Missing Steps

**Example**: "Skill forgot to mention we need to segment cohorts by acquisition channel"
**Action**: Add step to checklist
### Category 2: Incorrect Benchmarks

**Example**: "40% D30 retention is not 'strong' for our B2B SaaS, it's average"
**Action**: Update benchmarks with context (B2C vs B2B vs Enterprise)
### Category 3: Confusing Workflow

**Example**: "I didn't know whether to do cohort analysis before or after Sean Ellis test"
**Action**: Number steps clearly, add workflow diagram
### Category 4: Missing Context

**Example**: "Skill assumes I have 1000+ users, what if I only have 50?"
**Action**: Add "Early Stage Adaptation" section
### Category 5: Tool-Specific Issues

**Example**: "How do I calculate D30 retention in Google Analytics?"
**Action**: Add "Implementation in Common Tools" section
## Best Practices

**Do:**
- ✅ Look for patterns across multiple feedback sessions
- ✅ Update skills incrementally (small, tested changes)
- ✅ Document why changes were made (Learnings section)
- ✅ Preserve feedback history (archive, don't delete)
- ✅ Version skills so users know what changed

**Don't:**
- ❌ Update based on a single piece of feedback (might be an outlier)
- ❌ Make skills overly complex trying to cover every edge case
- ❌ Remove content without understanding why it was there
- ❌ Ignore feedback for more than 30 days (patterns emerge)
- ❌ Update without testing the new version
## Automation Ideas

### Weekly Digest (optional)

Create a script to summarize new feedback:

```bash
#!/bin/bash
# .claude/hooks/learning/weekly-feedback-digest.sh
echo "📊 Feedback Digest (Last 7 Days)"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
find .claude/feedback -name "retro-*.md" -mtime -7 | while read -r file; do
  echo ""
  echo "File: $(basename "$file")"
  grep -A 5 "Skills Used" "$file"
  grep -A 3 "Improvements needed:" "$file"
done
```
### Auto-Tag for Review

When feedback mentions specific issues, auto-tag it:

- "missing step" → tag for immediate review
- "wrong number" → tag for fact-check
- "confusing" → tag for clarity rewrite
## Success Metrics

Track improvement over time:

- **Feedback frequency**: decreasing = skills getting better
- **Repeated issues**: should approach zero over time
- **User satisfaction**: track "Did this skill help?" responses
- **Skill usage**: updated skills should see increased usage
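One way to watch feedback frequency is to bucket retro files by month, relying on the `retro-YYYY-MM-DD-*` filename convention used above; `feedback_per_month` is a hypothetical helper:

```shell
#!/bin/bash
# Count feedback files per month (active + archived), using the
# retro-YYYY-MM-DD-* naming convention. Illustrative sketch only.
feedback_per_month() {
  ls .claude/feedback/retro-*.md .claude/feedback/archive/retro-*.md 2>/dev/null \
    | sed -E 's/.*retro-([0-9]{4}-[0-9]{2}).*/\1/' \
    | sort | uniq -c
}

feedback_per_month
```

A shrinking count per month suggests the skills are improving; a stable or growing count points at issues the updates haven't addressed.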
## Meta: This Skill Improves Itself

This skill should follow its own advice.

**Learnings from Use**:
- [To be filled as this skill gets used and improved]

**Version History**:
- v1.0.0 (2026-01-22): Initial release - framework for skill improvement

**Next Steps**:
1. Review feedback in `.claude/feedback/`
2. Identify patterns and prioritize updates
3. Update skill files with improvements
4. Document learnings
5. Archive processed feedback
6. Commit changes with a clear message

The more you use this system, the better your skills become. It's a continuous improvement loop.