mutate
/mutate
Use mutation testing to measure test strength on a narrow, already-stabilized scope.
This skill is for adapter selection, scoped execution, and survivor triage.
It complements crap:
- crap ranks risk from complexity plus coverage.
- mutate finds assertions and scenarios your tests still miss.
- Mutation testing does not directly raise or lower CRAP. CRAP moves only when code complexity or coverage changes.
Modes
- Analysis mode: choose adapter, scope, and run plan without executing.
- Execution mode: run a scoped mutation pass, triage survivors, improve tests, and rerun.
Enter execution mode only when the user explicitly asks to run it, for example:
- /mutate
- run mutation testing
- mutate this file
- kill the surviving mutants
- harden these tests
Supported v1 Adapters
Use references/adapter-matrix.md to pick the right adapter and command family.
- rust: cargo-mutants
- python: mutmut
- javascript/typescript: StrykerJS
If the target language or build stack has no credible adapter, stop and say so. Do not invent generic mutation commands.
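The "no credible adapter, stop" rule can be sketched as a plain lookup that returns nothing rather than guessing. This is a minimal illustration, not the skill's implementation: the adapter names come from the list above, while `SUPPORTED_ADAPTERS` and `pick_adapter` are hypothetical names, and references/adapter-matrix.md remains the authority on command families.

```python
from typing import Optional

# Adapter names are taken from the supported v1 list; the dictionary layout
# and function name are assumptions made for this sketch.
SUPPORTED_ADAPTERS = {
    "rust": "cargo-mutants",
    "python": "mutmut",
    "javascript": "StrykerJS",
    "typescript": "StrykerJS",
}


def pick_adapter(language: str) -> Optional[str]:
    """Return the v1 adapter for a language, or None when no credible adapter exists."""
    return SUPPORTED_ADAPTERS.get(language.strip().lower())
```

A `None` result is the signal to stop and report the gap instead of inventing a generic mutation command.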
Runtime Contract
Resolve and run scripts/analyze_mutants.py relative to this SKILL.md.
Use references/state-model.md for the normalized ledger schema and references/one-shot-loop.md for the execution loop.
Examples:
python3 scripts/analyze_mutants.py
python3 scripts/analyze_mutants.py /path/to/repo --top 20
python3 scripts/analyze_mutants.py /path/to/repo --write-ledger
python3 scripts/analyze_mutants.py /path/to/repo --adapters mutmut,stryker --top 20
Hand-Off From crap
Use crap first when the repo is broad, noisy, or has weak coverage data.
- Prioritize coverage bootstrap, characterization tests, and simplification while the scoped FINAL_SCORE is >= 30.
- Start mentioning or running mutate when the target scope is below 30, coverage is numeric, the baseline tests are green, and the scope is narrow enough to mutate cheaply.
- Treat FINAL_SCORE < 8 as a healthy end-state for stabilized hotspots, not as the prerequisite for running mutate.
- If the user explicitly wants mutation testing while the scoped score is still >= 30, keep the scope tight and warn that the signal may be noisy until the hotspot stabilizes.
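The hand-off thresholds above can be read as a small decision function. The sketch below is one possible encoding, assuming the scoped FINAL_SCORE is available as a number. The function names and the returned labels are hypothetical and are not part of crap or mutate.

```python
def next_step(final_score: float, coverage_is_numeric: bool,
              baseline_green: bool, scope_is_narrow: bool,
              user_requested_mutation: bool = False) -> str:
    """Map the hand-off rules to a label (labels are illustrative only)."""
    if final_score >= 30:
        # Still a hotspot: mutate only on explicit request, and warn about noise.
        return "mutate-with-warning" if user_requested_mutation else "crap-first"
    if coverage_is_numeric and baseline_green and scope_is_narrow:
        return "mutate"
    # Below 30, but one of the remaining preconditions is not met yet.
    return "fix-prerequisites"


def is_healthy_end_state(final_score: float) -> bool:
    """FINAL_SCORE < 8 marks a stabilized hotspot; it is not a gate for running mutate."""
    return final_score < 8
```

Keeping the end-state check separate from `next_step` mirrors the text: a score between 8 and 30 can still qualify for a mutation pass.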
Baseline Contract
Before any mutation run:
- Identify the canonical test command for the chosen scope.
- Run that baseline first. If it is red, stop and fix or narrow scope before mutating.
- Confirm the workspace is clean enough to reason about generated artifacts and temporary edits.
- Prefer disposable outputs over in-place mutation. If the adapter mutates the worktree in place, warn before continuing and verify cleanup afterward.
- Keep scope explicit: file, module, or package. Do not silently mutate the whole repo unless it is already small and stable.
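The first three checks of the baseline contract can be sketched as two small gates. This assumes the repo is a git checkout and that the canonical test command is known. Both function names are hypothetical, and the test command passed in is whatever the repo actually uses.

```python
import subprocess
from typing import Sequence


def baseline_is_green(test_command: Sequence[str], cwd: str = ".") -> bool:
    """Run the canonical test command and report whether it exited with status 0."""
    result = subprocess.run(list(test_command), cwd=cwd)
    return result.returncode == 0


def workspace_is_clean(cwd: str = ".") -> bool:
    """Return True when `git status --porcelain` prints nothing (assumes a git repo)."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=cwd, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip() == ""
```

If either gate fails, the contract says to stop, then fix the baseline or narrow the scope before mutating. A single green run does not rule out flakiness, so a repeated run may be worthwhile on suspect suites.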
Scope Discipline
- Name the exact target file, module, or package in every close-out.
- Prefer one hotspot file or one package per run.
- If the adapter supports file filters, use them.
- If the adapter does not support fine scope directly, narrow via config or test selection rather than mutating the whole repo silently.
- Do not compare a narrow mutation pass to repo-wide health claims.
Output Flow
Phase 1: Choose adapter and scope
- Detect language and build tool.
- Read references/adapter-matrix.md.
- State which adapter you are using and why.
- If multiple adapters are possible, prefer the repo's existing one.
- Run scripts/analyze_mutants.py first when mutation artifacts already exist so you have a deterministic backlog before launching more test hardening.
Phase 2: Establish baseline
- Run the canonical test path for the target scope.
- Stop if the baseline is red or flaky.
- If the adapter depends on coverage or test-mapping metadata, generate it with the repo-native path.
Phase 3: Run mutation testing
- Use the narrowest practical target.
- Prefer config files already committed in the repo.
- Keep raw adapter output concise but preserve counts for:
  - killed mutants
  - surviving mutants
  - timeouts, no coverage, and equivalent-style noise when the adapter reports them
- When the repo wants durable tracking, rerun scripts/analyze_mutants.py --write-ledger after the mutation pass so the backlog is normalized into .mutate/ledger.json.
Phase 4: Triage survivors
Bucket each survivor before writing production code:
- missing assertion or missing example
- incidentally covered code that lacks direct tests
- equivalent or low-value mutant worth excluding
- tooling noise or flaky timeout
Prefer stronger tests first. Change production code only when the survivor exposes confusing structure or accidental complexity.
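The four triage buckets can be modelled as an enum so that every survivor gets exactly one label before any code changes. This is a sketch only: the bucket descriptions are copied from the list above, while `Bucket`, `NEEDS_TESTS`, `summarize`, and the mutant IDs in the usage note are hypothetical.

```python
from collections import Counter
from enum import Enum


class Bucket(Enum):
    MISSING_ASSERTION = "missing assertion or missing example"
    INCIDENTAL_COVERAGE = "incidentally covered code that lacks direct tests"
    EQUIVALENT = "equivalent or low-value mutant worth excluding"
    NOISE = "tooling noise or flaky timeout"


# Buckets that call for stronger tests rather than exclusions or reruns.
NEEDS_TESTS = {Bucket.MISSING_ASSERTION, Bucket.INCIDENTAL_COVERAGE}


def summarize(triaged):
    """Count survivors per bucket and list the IDs that need stronger tests.

    `triaged` is an iterable of (mutant_id, Bucket) pairs.
    """
    triaged = list(triaged)
    counts = Counter(bucket for _, bucket in triaged)
    actionable = [mutant_id for mutant_id, bucket in triaged if bucket in NEEDS_TESTS]
    return counts, actionable
```

For example, `summarize([("m1", Bucket.MISSING_ASSERTION), ("m2", Bucket.NOISE)])` would report one survivor in each bucket and flag only `m1` for test work.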
Phase 5: Close the loop
- Rerun the baseline test path.
- Rerun the mutation pass for the same scope.
- If this work came from crap, rerun crap on the same scope after the test changes land.
- Report:
  - scope
  - adapter
  - baseline status
  - surviving mutants before and after
  - exclusions added
  - whether CRAP moved on rerun
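The close-out fields listed above fit naturally into a small record. The sketch below is an illustrative container with a plain-text rendering. The class name, field names, and output layout are assumptions, and scripts/analyze_mutants.py stays the source of truth for the normalized backlog report.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CloseOut:
    scope: str
    adapter: str
    baseline_status: str
    survivors_before: int
    survivors_after: int
    exclusions_added: List[str] = field(default_factory=list)
    crap_moved: bool = False

    def render(self) -> str:
        """Render one line per reported field (layout is hypothetical)."""
        exclusions = ", ".join(self.exclusions_added) or "none"
        lines = [
            f"scope: {self.scope}",
            f"adapter: {self.adapter}",
            f"baseline: {self.baseline_status}",
            f"surviving mutants: {self.survivors_before} -> {self.survivors_after}",
            f"exclusions added: {exclusions}",
            f"CRAP moved on rerun: {'yes' if self.crap_moved else 'no'}",
        ]
        return "\n".join(lines)
```

Requiring `scope` and `adapter` as positional fields makes it hard to produce a close-out that omits the exact target.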
One-Shot Execution
If the user explicitly asks to run the loop, use references/one-shot-loop.md.
Reporting Rules
- Treat scripts/analyze_mutants.py as the formatting source of truth for the normalized backlog report.
- State the adapter and exact command family used.
- Separate real survivors from equivalent or noise buckets.
- Do not claim a higher mutation score automatically means lower CRAP.
- If you exclude mutants, name the exact rule or config location and justify it.
- Prefer test additions over broad exclusion lists.
- When the run follows crap, frame the result as crap -> mutate -> crap, not as a replacement for CRAP analysis.
- Keep the final line machine-readable and unique: FINAL_TODO: <value>.
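The final-line rule is easy to check mechanically. The sketch below assumes that "unique" means the FINAL_TODO: prefix appears on exactly one line and that this line is the last non-empty one. The function name is hypothetical, and the meaning of the value itself is not specified here.

```python
def has_unique_final_todo(report: str) -> bool:
    """Check that the last non-empty line is the only line starting with FINAL_TODO:."""
    lines = [line for line in report.splitlines() if line.strip()]
    if not lines:
        return False
    marker_lines = [line for line in lines if line.startswith("FINAL_TODO:")]
    return len(marker_lines) == 1 and lines[-1].startswith("FINAL_TODO:")
```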