improving-tests
Test Improvement
Improve tests by making them behavioral, lean, and useful. Tests are a design tool, not a line-count sport.
Use TaskCreate / TaskUpdate to track:
- Choose mode
- Explore test structure
- Run coverage or failing-test loop
- Review with language agent
- Apply improvements one cluster at a time
- Verify and report
Phase 1: Choose Mode
$ARGUMENTS:
- review → identify weak, duplicate, brittle, or missing tests
- refactor → combine into table-driven / parametrized / test.each, remove waste
- coverage → add tests for uncovered business behavior
- tdd → red-green-refactor loop for a feature or bug
- full → review + refactor + coverage
- empty → ask what to do
If empty, ask one question with options: review existing, refactor tests, fill coverage gaps, TDD loop, or full improvement.
Testing Principles
- Test behavior through public interfaces, not implementation details.
- The module interface is the test surface.
- Mock only system boundaries: external APIs, network, time, randomness, filesystem, subprocesses.
- Do not mock your own internal collaborators just to make tests easy.
- Prefer integration-style tests when they give a clear, stable signal.
- One logical assertion per test case; multiple property checks are fine after one setup.
- Delete old shallow tests once deeper interface tests cover the behavior.
- No pointless tests for getters, constructors, default props, or generated glue.
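A minimal sketch of these principles in Go; the Cart type, its injected clock, and the weekend-discount rule are hypothetical, invented only for illustration:

```go
package example

import (
	"testing"
	"time"
)

// Hypothetical production code, included only so the sketch compiles.
type Cart struct {
	now   func() time.Time // injected clock: the only mocked boundary
	cents int
}

func NewCart(now func() time.Time) *Cart { return &Cart{now: now} }
func (c *Cart) Add(priceCents int)      { c.cents += priceCents }
func (c *Cart) Total() int {
	if c.now().Weekday() == time.Saturday {
		return c.cents * 90 / 100 // 10% weekend discount
	}
	return c.cents
}

// The test exercises observable behavior through the public interface and
// stubs only the time boundary, never internal collaborators.
func TestCart_AppliesWeekendDiscount(t *testing.T) {
	saturday := func() time.Time { return time.Date(2024, 6, 1, 12, 0, 0, 0, time.UTC) }
	cart := NewCart(saturday)
	cart.Add(1000)

	if got := cart.Total(); got != 900 {
		t.Fatalf("Total() = %d, want 900", got)
	}
}
```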
Phase 2: Background Exploration
Spawn exploration agents in parallel when available:
Test structure scan:
- Find test files: *_test.go, test_*.py, *.test.ts, *.spec.ts
- Identify frameworks and helpers
- Find table-driven / parametrize / test.each patterns
- Locate mocks, fixtures, integration tests
Coverage analysis:
- Go: go test -coverprofile=/tmp/cc-cov.out ./... && go tool cover -func=/tmp/cc-cov.out
- Python: pytest --cov=. --cov-report=term-missing
- TypeScript: bun test --coverage
Exclude generated code, mocks, fixtures, type-only files, and trivial CLI entrypoints from coverage pressure.
Phase 3: TDD Mode
Use this for tdd, test-first, or red-green-refactor requests.
- Confirm the public interface and the first behavior.
- Write one failing test for one behavior.
- Run it and watch it fail for the expected reason.
- Implement the smallest code that passes.
- Run the narrow test.
- Repeat one vertical slice at a time.
- Refactor only when green.
Do not write all tests first. A bulk RED pass produces imagined tests coupled to a guessed implementation.
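A sketch of one red-green slice in Go, assuming a hypothetical Slugify function (name and behavior invented for illustration):

```go
package example

import (
	"strings"
	"testing"
)

// RED: one failing test for one behavior. Run it first and watch it fail
// (it will not even compile until Slugify exists).
func TestSlugify_LowercasesAndHyphenates(t *testing.T) {
	if got := Slugify("Hello World"); got != "hello-world" {
		t.Fatalf("Slugify(%q) = %q, want %q", "Hello World", got, "hello-world")
	}
}

// GREEN: the smallest implementation that passes, written only after the
// failure was observed. Refactor later, while green.
func Slugify(s string) string {
	return strings.ReplaceAll(strings.ToLower(s), " ", "-")
}
```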
Phase 4: Review and Improve
Based on language, use the appropriate test agent when available:
- Go → go-tests
- Python → py-tests
- TypeScript → ts-tests
- Web → web-tests
Focus findings on:
- tests coupled to private helpers or call counts
- tests that should be table-driven / parametrized / test.each
- duplicate scenarios
- weak mocks (mock.Anything, unspecced mocks, untyped vi.fn) hiding real behavior
- missing success, error, and edge cases on business logic
- no usable seam for testing real behavior
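For instance, a hedged Go sketch of tightening a weak testify expectation; the Mailer interface, MockMailer, and Notify are hypothetical names for illustration:

```go
package example

import (
	"testing"

	"github.com/stretchr/testify/mock"
)

// Hypothetical boundary interface and the code under test.
type Mailer interface {
	Send(to, subject string) error
}

func Notify(m Mailer, to string) error { return m.Send(to, "Welcome") }

// testify mock for the boundary.
type MockMailer struct{ mock.Mock }

func (m *MockMailer) Send(to, subject string) error {
	return m.Called(to, subject).Error(0)
}

func TestNotify_SendsWelcomeEmail(t *testing.T) {
	mailer := new(MockMailer)
	// Weak: mailer.On("Send", mock.Anything, mock.Anything).Return(nil)
	// Specific arguments pin the behavior that matters:
	mailer.On("Send", "a@example.com", "Welcome").Return(nil)

	if err := Notify(mailer, "a@example.com"); err != nil {
		t.Fatalf("Notify() error = %v", err)
	}
	mailer.AssertExpectations(t)
}
```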
Phase 5: Apply Improvements
Preferred consolidation patterns:
| Language | Pattern |
|---|---|
| Go | table-driven with t.Run(tc.name, ...) |
| Python | @pytest.mark.parametrize with pytest.param() |
| TypeScript | it.each([{ input, expected, name }]) |
Extract helpers only after 3+ repetitions and only when the helper improves readability. Hide setup noise; do not hide the behavior under test.
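A sketch of the Go pattern from the table above; the Discount function is a hypothetical stand-in for real business logic:

```go
package example

import "testing"

// Hypothetical code under test.
func Discount(qty int) int {
	switch {
	case qty >= 10:
		return 15
	case qty >= 5:
		return 5
	default:
		return 0
	}
}

func TestDiscount(t *testing.T) {
	cases := []struct {
		name string
		qty  int
		want int
	}{
		{"no discount below threshold", 4, 0},
		{"small bulk discount", 5, 5},
		{"large bulk discount", 10, 15},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := Discount(tc.qty); got != tc.want {
				t.Fatalf("Discount(%d) = %d, want %d", tc.qty, got, tc.want)
			}
		})
	}
}
```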
Phase 6: Verify and Report
Run relevant tests:
go test ./...
pytest -v
bun test
Output:
TEST IMPROVEMENT COMPLETE
=========================
Mode: review | refactor | coverage | tdd | full
Tests changed: N
Waste removed: N
Coverage: before → after (if measured)
Key improvements:
- file:line — change
Verification:
- <command> — pass/fail
If no tests or framework exist, report that and ask before creating a new testing stack.