evolve
Evolve -- Research-Driven Self-Upgrade Pipeline
This skill researches the latest Claude Code platform state, audits Elle against it, plans upgrades, executes them, and verifies results. It auto-updates the reference files in this skill's references/ directory to keep the knowledge base current between runs.
Context Detection
Determine execution mode before doing anything else.
Check: Does <cwd>/personal-assistant/.claude-plugin/plugin.json exist AND does <cwd>/.git exist AND does <cwd>/CONVENTIONS.md exist?
- YES -- Source Mode. Operating in the marketplace repo where Elle is maintained. Operate on skill files, hooks, and references directly. Bump version and update CHANGELOG when done.
- NO -- Deployed Mode. Show this disclaimer:
  Note: Evolve is designed to run from Elle's source repo where the plugin is maintained. Running here will audit and modify your installed copy at ~/.claude/plugins/cache/.... These changes will be overwritten on the next plugin update. To run from source, navigate to your marketplace repo.
  Proceed with deployed copy?
  Wait for confirmation. If declined, stop.
Set ELLE_ROOT:
- Source mode: <cwd>/personal-assistant
- Deployed mode: the installed plugin path (find via the highest version directory in ~/.claude/plugins/cache/ai-launchpad/personal-assistant/)
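The detection and ELLE_ROOT rules above can be sketched as follows. This is a minimal illustration, not part of Elle: the function names are hypothetical, and it assumes version directories are named with dotted integers (for example 1.10.0), which the source does not state.

```python
from pathlib import Path


def detect_mode(cwd: Path) -> str:
    """Return "source" only when all three marker paths exist under cwd."""
    markers = [
        cwd / "personal-assistant" / ".claude-plugin" / "plugin.json",
        cwd / ".git",
        cwd / "CONVENTIONS.md",
    ]
    return "source" if all(m.exists() for m in markers) else "deployed"


def version_key(name: str) -> tuple:
    """Assumption: directory names are dotted integers such as 1.4.0."""
    return tuple(int(part) for part in name.split(".") if part.isdigit())


def resolve_elle_root(cwd: Path, cache_dir: Path) -> Path:
    """Pick ELLE_ROOT for the detected mode."""
    if detect_mode(cwd) == "source":
        return cwd / "personal-assistant"
    versions = [d for d in cache_dir.iterdir() if d.is_dir()]
    if not versions:
        raise FileNotFoundError(f"no installed versions under {cache_dir}")
    # Compare numerically so that 1.10.0 ranks above 1.9.0.
    return max(versions, key=lambda d: version_key(d.name))
```

Comparing parsed integer tuples rather than raw strings matters here: a plain string sort would rank 1.9.0 above 1.10.0 and select an older install.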
Phase 1: Research
Run six research tasks in parallel using subagents. Each subagent summarizes its findings.
1A. Claude Code Changelog
Fetch latest Claude Code releases:
WebFetch: https://github.com/anthropics/claude-code/releases
Also check the raw changelog:
WebFetch: https://raw.githubusercontent.com/anthropics/claude-code/main/CHANGELOG.md
Extract from the last 3-5 releases:
- New tool capabilities or parameters
- New hook events or rule features
- Changes to skill loading, triggering, or frontmatter
- New agent/subagent patterns
- Deprecations or breaking changes
- Performance improvements relevant to skill design
1B. Claude Code Documentation (Discovery-Driven)
Discover the Claude Code documentation landscape. Do NOT hardcode doc page URLs -- discover them dynamically.
- Find available documentation:
  WebSearch: "Claude Code" site:docs.anthropic.com
  WebFetch: https://docs.anthropic.com/en/docs/claude-code/overview
- From the search results and overview page, identify and fetch pages covering these categories:
  - Core infrastructure: skills, hooks, rules, CLAUDE.md
  - Agent capabilities: subagents/Agent tool, agent teams
  - Plugin system: commands, output styles, MCP servers, plugin architecture
  - Memory & context: auto memory, context management
  - Configuration: settings, permissions, tool gating
  - Workflow: worktrees, session management
- For each fetched page, extract:
  - Current best practices and recommended patterns
  - New features not in ${CLAUDE_SKILL_DIR}/references/platform-capabilities.md
  - Deprecated patterns Elle might still use
  - Capabilities Elle doesn't leverage yet
1C. Skill-Creator Best Practices
Find and read the installed skill-creator:
Glob: ~/.claude/plugins/cache/*/skill-creator/*/skills/skill-creator/SKILL.md
Extract:
- Current skill writing patterns
- Progressive disclosure guidelines (line limits, reference files)
- Description optimization guidance (triggering, "pushy" descriptions)
- Testing and evaluation patterns
1D. Superpowers & Platform Patterns
Check official plugins for patterns worth adopting:
Glob: ~/.claude/plugins/cache/*/superpowers/*/skills/*/SKILL.md
Note structural patterns, frontmatter conventions, or composition techniques.
1E. Plugin Architecture
Understand the full plugin specification and what Elle could be using:
- Read the marketplace conventions (source mode only):
  Read: <cwd>/CONVENTIONS.md
- Examine other installed plugins for structural patterns:
  Glob: ~/.claude/plugins/cache/*/*/skills/*/SKILL.md
  Bash: ls ~/.claude/plugins/cache/*/*/ | head -50
  Focus on: plugin.json fields, agents/ directories, MCP server integration, output style patterns.
- Compare Elle's plugin structure against the full spec:
  - What directories does Elle have vs. what's available?
  - What plugin.json fields exist vs. what Elle uses?
  - Are other plugins using agents/ or MCP servers effectively?
1F. Model Capability Assessment
Check for model-level improvements that may supersede Elle's skills:
- Search for recent Anthropic model releases:
  WebSearch: "Anthropic Claude" model release OR capability update site:anthropic.com
  WebSearch: "Claude" new capabilities 2026
- Fetch the models overview page:
  WebFetch: https://docs.anthropic.com/en/docs/about-claude/models
- Extract capabilities relevant to skill obsolescence:
  - Built-in tool use improvements (web search, code execution)
  - Reasoning and planning improvements
  - Multi-step task handling
  - Areas where dedicated prompting/orchestration adds less value
- Compare against Elle's skill inventory:
  - For each skill, ask: "Does the model now do this well enough without specialized prompting?"
  - Flag skills where the answer is "yes" or "probably"
- Cross-reference against the ${CLAUDE_SKILL_DIR}/references/platform-capabilities.md Model Capabilities table to detect changes since the last run.
Research Output
After all research completes, compile a Research Summary with sections:
- New Claude Code Features (from changelog + docs)
- Updated Skill Best Practices (from skill-creator)
- Platform Patterns (from superpowers and other plugins)
- Plugin Architecture Gaps (capabilities Elle doesn't use yet)
- Deprecations & Breaking Changes
- Model Capability Overlap (skills potentially superseded by model improvements)
In source mode: save to <cwd>/.docs/upgrade-research/personal-assistant-<date>.md
In deployed mode: present in-session only.
Phase 1.5: Obsolescence Screen
Using research findings from Phase 1 (especially 1A, 1B, and 1F), evaluate whether each skill, agent, and major component in Elle is still necessary.
Classification
For each skill and agent, assign one of:
| Rating | Meaning | Action |
|---|---|---|
| Active | Platform/model doesn't replicate this. Skill adds clear value. | Proceed to structural audit in Phase 2 |
| Augmented | Platform/model handles the basics, but skill adds meaningful structure, guardrails, or workflow orchestration on top. | Audit normally, but note what the platform handles natively |
| Superseded | Platform/model now does this natively with comparable quality. Skill adds marginal value over a direct prompt. | Skip structural audit. Recommend removal in Phase 3 plan. |
How to Evaluate
For each skill, answer these three questions:
- What does this skill do that a direct prompt to Claude cannot? If the answer is only "it saves typing a prompt" -- likely Superseded.
- Does this skill enforce a process, workflow, or multi-step structure? Process skills (TDD, debugging, code review) remain valuable even when the model can do each step -- the skill enforces discipline. Likely Active.
- Has the platform added a native feature that replaces this skill's core function? Example: Claude Code added native web search -> a "web research" skill that just wraps web search is Superseded. But a "competitor analysis" skill that orchestrates multiple searches into a structured report may be Augmented.
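One way to turn the three questions into a provisional rating is sketched below. The record type, field names, and precedence order are assumptions made for illustration; the source only gives "likely" outcomes, so treat the result as a starting point for review rather than a verdict.

```python
from dataclasses import dataclass


@dataclass
class SkillAnswers:
    """Hypothetical record of the three answers for one skill."""
    only_saves_typing: bool   # Q1: no value beyond a direct prompt
    enforces_process: bool    # Q2: imposes a workflow or multi-step structure
    natively_replaced: bool   # Q3: a platform feature covers the core function


def first_pass_rating(a: SkillAnswers) -> str:
    """Map the answers to a provisional rating for human review."""
    if a.natively_replaced:
        # Structure layered on top of a native feature still adds value.
        return "Augmented" if a.enforces_process else "Superseded"
    if a.enforces_process:
        return "Active"
    if a.only_saves_typing:
        return "Superseded"
    return "Active"
```

For example, a thin web-search wrapper (`natively_replaced=True`, `enforces_process=False`) comes out Superseded, while a report-building skill that orchestrates several searches comes out Augmented.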
Output
Present results as a table:
| Component | Type | Rating | Rationale |
|---|---|---|---|
| [name] | skill/agent | Active/Augmented/Superseded | [1-line reason] |
If any component is rated Superseded, flag it for the user:
Obsolescence flag: [N] component(s) appear superseded by platform/model capabilities. These will appear in the "Recommend Removal" section of the upgrade plan. Review the rationale -- override to Active or Augmented if you disagree.
Proceed to Phase 2, skipping structural audit for Superseded components.
Phase 2: Audit
2A. Skill Inventory
For each skill in ${ELLE_ROOT}/skills/, read the SKILL.md and evaluate:
- Frontmatter fields (follows latest conventions from Phase 1?)
- Line count (warn if over 500)
- Whether it uses references/ or scripts/ for progressive disclosure
- Composition patterns (plugin:skill syntax used correctly?)
- Hardcoded paths or stale assumptions
- Obsolescence rating from Phase 1.5 (skip detailed audit for Superseded components)
2B. Architecture Audit
Check the deployed system state:
cat ${ELLE_ROOT}/hooks/hooks.json
ls -la ~/.claude/rules/elle-core.md
ls ~/.claude/.context/core/
cat ~/.claude/settings.json | python3 -c "import sys,json; d=json.load(sys.stdin); print(json.dumps(d.get('hooks',{}), indent=2))" 2>/dev/null
Verify:
- SessionStart hooks registered (not UserPromptSubmit)
- elle-core.md exists in rules/
- No duplicate notification hooks between plugin and settings.json
- Output style set correctly
- All expected context files present
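The duplicate-hook check can be sketched by comparing the parsed contents of the plugin's hooks.json and settings.json. This assumes both files use the nested layout `{"hooks": {"<Event>": [{"hooks": [{"command": "..."}]}]}}`; verify that against the actual files before relying on it. The function names are hypothetical.

```python
def hook_commands(config: dict) -> set:
    """Collect (event, command) pairs from a parsed hooks config.

    Assumed layout: {"hooks": {"<Event>": [{"hooks": [{"command": "..."}]}]}}.
    """
    pairs = set()
    for event, groups in config.get("hooks", {}).items():
        for group in groups:
            for hook in group.get("hooks", []):
                if "command" in hook:
                    pairs.add((event, hook["command"]))
    return pairs


def duplicate_hooks(plugin_cfg: dict, settings_cfg: dict) -> set:
    """Return hooks registered in both the plugin and the user settings."""
    return hook_commands(plugin_cfg) & hook_commands(settings_cfg)
```

An empty result means no command is registered for the same event in both places. The same `hook_commands` output can also be scanned for event names to confirm that SessionStart, not UserPromptSubmit, is in use.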
2C. Gap Analysis
Compare current state against Phase 1 research:
| Category | Check |
|---|---|
| Skill structure | Frontmatter follows latest conventions? Under 500 lines? Progressive disclosure? |
| Descriptions | Triggering descriptions specific enough per skill-creator guidance? |
| Patterns | Uses latest Claude Code features? No deprecated patterns? |
| Composition | Follows plugin:skill invocation syntax? |
| Hook design | Non-blocking? SessionStart < 2 seconds? Uses additionalContext? |
| Plugin architecture | Using all beneficial plugin components (agents, MCP, output styles)? |
| System state | Platform capabilities and best practices references current? |
2D. Reference Freshness
Compare ${CLAUDE_SKILL_DIR}/references/platform-capabilities.md and best-practices.md against research findings. Flag entries that are:
- Outdated (platform has changed)
- Missing (new capabilities not listed)
- Wrong (deprecated patterns still listed as current)
2E. Full Plugin Inventory
Check all plugin components, not just skills:
echo "=== Skills ===" && ls ${ELLE_ROOT}/skills/
echo "=== Hooks ===" && cat ${ELLE_ROOT}/hooks/hooks.json 2>/dev/null
echo "=== Output Styles ===" && ls ${ELLE_ROOT}/output-styles/ 2>/dev/null
echo "=== Plugin.json ===" && cat ${ELLE_ROOT}/.claude-plugin/plugin.json
echo "=== Agents ===" && ls ${ELLE_ROOT}/agents/ 2>/dev/null || echo "No agents/ directory"
Evaluate:
- Output styles: leveraging latest features? Under 120 lines?
- Plugin.json: all available fields used correctly?
- Missing components: would agents/ or MCP servers benefit Elle?
- Structural patterns: anything other plugins do that Elle should adopt?
2F. System State Check
Read the System State section of ${CLAUDE_SKILL_DIR}/references/platform-capabilities.md and check:
- Last evolve run date -- flag if > 60 days ago
- Claude Code version at last audit -- compare against latest from Phase 1A
- Elle version -- compare against plugin.json
- Platform docs last fetched -- flag if > 60 days ago
If System State section is missing, flag as "first evolve run" and note that all findings are new.
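The 60-day freshness rule is simple date arithmetic. The sketch below assumes the System State dates are stored as ISO strings (YYYY-MM-DD) and have already been parsed into a dict; neither the storage format nor the function names come from the source.

```python
from datetime import date


def is_stale(last: date, today: date, max_age_days: int = 60) -> bool:
    """True when more than max_age_days have passed since `last`."""
    return (today - last).days > max_age_days


def system_state_flags(state: dict, today: date, max_age_days: int = 60) -> list:
    """Return freshness flags; an empty or missing state means a first run."""
    if not state:
        return ["first evolve run"]
    flags = []
    for field in ("Last evolve run", "Platform docs last fetched"):
        value = state.get(field)
        if value and is_stale(date.fromisoformat(value), today, max_age_days):
            flags.append(f"{field} is more than {max_age_days} days old")
    return flags
```

Note the strict comparison: a run exactly 60 days ago is not flagged, matching the "> 60 days" wording above.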
Phase 3: Plan
Present a structured upgrade plan:
## Upgrade Plan: personal-assistant (Elle)
### Summary
- Current version: [from plugin.json]
- Proposed version: [with semver justification]
- Skills affected: N
- Reference files to update: N
### High Priority (Breaking/Deprecated)
1. [Change]
- **Why**: [What's wrong or deprecated]
- **What**: [Specific change]
- **Risk**: [What could break]
### Superseded (Recommend Removal)
Components flagged as Superseded in Phase 1.5. Review before approving removal.
1. [Component name] ([type])
- **What it does**: [1-line summary]
- **What replaces it**: [platform feature or model capability]
- **Migration**: [user-facing steps if any -- update docs, remove references]
- **Risk**: [what's lost if removed]
### Medium Priority (New Features)
1. [Change]
- **Why**: [What capability this enables]
- **What**: [Specific change]
### Low Priority (Polish)
1. [Change]
### Not Recommended
- [Considered but rejected, and why]
For removed components in source mode:
- Add CHANGELOG entry under "Removed"
- Version bump: MINOR if user-invocable skill, PATCH if internal-only component
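The removal bump rule can be sketched as below. The helper names are hypothetical, and the code assumes plugin.json holds a plain MAJOR.MINOR.PATCH version with no pre-release or build suffix.

```python
def bump(version: str, part: str) -> str:
    """Bump a MAJOR.MINOR.PATCH string, resetting lower parts to zero."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part}")


def bump_for_removal(version: str, user_invocable: bool) -> str:
    """Apply the removal rule: MINOR for user-invocable skills, else PATCH."""
    return bump(version, "minor" if user_invocable else "patch")
```

With this sketch, removing a user-invocable skill from 1.4.2 yields 1.5.0, and removing an internal-only component yields 1.4.3.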
Phase 4: Execute
Apply approved changes:
- One file at a time -- read the full file before editing
- Preserve intent -- upgrade patterns without changing what skills do
- Explain reasoning -- prefer explaining why over heavy-handed MUSTs
- Keep prompts lean -- remove things that aren't pulling their weight
Reference File Updates
Update ${CLAUDE_SKILL_DIR}/references/platform-capabilities.md and best-practices.md with findings from Phase 1. These files serve as "last known state" for future evolve runs.
Model Capabilities Update
Update the Model Capabilities table in ${CLAUDE_SKILL_DIR}/references/platform-capabilities.md with findings from Phase 1F. Add new capabilities, update proficiency levels, and revise skill design implications.
System State Update
After all changes are applied, update the System State section in ${CLAUDE_SKILL_DIR}/references/platform-capabilities.md:
| Field | Value |
|---|---|
| Elle version | [from plugin.json after bump] |
| Last evolve run | [today's date] |
| Claude Code version at last audit | [latest version from Phase 1A research] |
| Platform docs last fetched | [today's date] |
| Model capabilities last assessed | [today's date] |
Version and Changelog (Source Mode Only)
After all changes:
- Bump version in ${ELLE_ROOT}/.claude-plugin/plugin.json per semver
- Update ${ELLE_ROOT}/CHANGELOG.md in Keep a Changelog format
Phase 5: Verify
Validation Checks
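# Iterate with NUL-delimited paths so names containing spaces are handled safely.
find "${ELLE_ROOT}/skills" -name "SKILL.md" -print0 | while IFS= read -r -d '' skill; do
  echo "=== $skill ==="
  wc -l "$skill"
  head -5 "$skill"
  echo ""
done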
- Diff review (source mode) -- show git diff for user review
- Frontmatter check -- all skills have valid frontmatter with name and description
- Line count -- no SKILL.md over 500 lines
- Cross-reference check -- any plugin:skill references still valid
- Convention check -- changes follow CONVENTIONS.md
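The frontmatter and line-count checks can be automated with a small helper like the one below. It is a rough sketch: it assumes frontmatter is a leading block delimited by `---` lines with simple `key: value` entries, and it does not parse YAML properly. The function name is hypothetical.

```python
def check_skill(text: str, max_lines: int = 500) -> list:
    """Return a list of problems found in the contents of one SKILL.md."""
    problems = []
    lines = text.splitlines()
    if len(lines) > max_lines:
        problems.append(f"{len(lines)} lines (limit {max_lines})")
    # Assumption: frontmatter is a leading block delimited by '---' lines.
    if not lines or lines[0].strip() != "---":
        return problems + ["missing frontmatter"]
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return problems + ["unterminated frontmatter"]
    keys = {ln.split(":", 1)[0].strip() for ln in lines[1:end] if ":" in ln}
    for required in ("name", "description"):
        if required not in keys:
            problems.append(f"frontmatter missing '{required}'")
    return problems
```

An empty list means the file passed both checks; anything else can be reported alongside the diff review.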
Deferred Findings
For recommendations the user deferred or rejected, log to ~/.claude/.context/core/improvements.md as Active Proposals:
### ENHANCEMENT evolve-deferred -- [Short description]
- **Evidence**: Evolve audit [date] -- [what was found]
- **Current behavior**: [what exists now]
- **Proposed change**: [what was recommended]
- **Status**: Deferred
- **Source**: Evolve audit
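Rendering and logging one deferred entry might look like the sketch below. Appending to the end of improvements.md is an assumption; if the file keeps a dedicated Active Proposals section, the entry would need to be inserted there instead. Both function names are hypothetical.

```python
from pathlib import Path


def deferred_entry(title: str, audit_date: str, found: str,
                   current: str, proposed: str) -> str:
    """Render one deferred-finding block in the format shown above."""
    return "\n".join([
        f"### ENHANCEMENT evolve-deferred -- {title}",
        f"- **Evidence**: Evolve audit {audit_date} -- {found}",
        f"- **Current behavior**: {current}",
        f"- **Proposed change**: {proposed}",
        "- **Status**: Deferred",
        "- **Source**: Evolve audit",
    ]) + "\n"


def append_entry(path: Path, entry: str) -> None:
    """Append the entry to the log, creating parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        # A blank line keeps consecutive entries visually separated.
        fh.write("\n" + entry)
```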
Final Summary
Report:
- Changes made (with file paths)
- Version bump applied (if source mode)
- CHANGELOG entry (if source mode)
- Reference files updated
- Deferred items logged to improvements.md
- Any manual follow-up needed