scenario-testing
Scenario-Driven Testing for AI Code Generation
Core Principle
The Iron Law: "NO FEATURE IS VALIDATED UNTIL A SCENARIO PASSES WITH REAL DEPENDENCIES"
Mocks create false confidence. Only scenarios exercising real systems validate that code works.
The Truth Hierarchy
- Scenario tests (real system, real data) = truth
- Unit tests (isolated) = human comfort only
- Mocks = lies hiding bugs
As stated in the principle: "A test that uses mocks is not testing your system. It's testing your assumptions about how dependencies behave."
When to Use This Skill
- Validating new functionality
- Before declaring work complete
- When tempted to use mocks
- After fixing bugs requiring verification
- Any time you need to prove code works
Required Practices
1. Write Scenarios in .scratch/
- Use any language appropriate to the task
- Exercise the real system end-to-end
- Zero mocks allowed
- Must be listed in .gitignore (never commit)
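The ".gitignore" rule is easy to check mechanically. The sketch below uses a hypothetical helper, `scratch_is_ignored()`, and only does a plain line match against the top-level .gitignore. `git check-ignore` is the authoritative check, because it also applies negation rules and nested .gitignore files.

```python
from pathlib import Path


def scratch_is_ignored(repo_root: Path) -> bool:
    """Return True if the top-level .gitignore has an entry for .scratch/.

    Hypothetical helper: a simple line match only. It does not evaluate
    negation (!) rules or nested .gitignore files the way git does.
    """
    gitignore = repo_root / ".gitignore"
    if not gitignore.exists():
        return False
    entries = {
        line.strip()
        for line in gitignore.read_text(encoding="utf-8").splitlines()
        if line.strip() and not line.strip().startswith("#")
    }
    # Accept the common spellings of an ignore rule for the scratch directory.
    return bool(entries & {".scratch", ".scratch/", "/.scratch", "/.scratch/"})
```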
2. Promote Patterns to scenarios.jsonl
- Extract recurring scenarios as documented specifications
- One JSON line per scenario
- Include: name, description, given/when/then, validates
- This file IS committed
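A minimal sketch of appending to and reading back scenarios.jsonl. It assumes given, when, and then are stored as three separate string keys alongside name, description, and validates; the source lists these fields but not their exact layout. `append_scenario()` and `load_scenarios()` are hypothetical names.

```python
import json
from pathlib import Path

# Assumed field layout: given/when/then are split into three separate keys.
REQUIRED_FIELDS = ("name", "description", "given", "when", "then", "validates")


def append_scenario(path: Path, scenario: dict) -> None:
    """Validate a scenario record and append it to scenarios.jsonl as one line."""
    missing = [field for field in REQUIRED_FIELDS if not scenario.get(field)]
    if missing:
        raise ValueError(f"scenario missing fields: {missing}")
    with path.open("a", encoding="utf-8") as fh:
        # json.dumps without indent emits a single line; embedded newlines are escaped.
        fh.write(json.dumps(scenario, ensure_ascii=False) + "\n")


def load_scenarios(path: Path) -> list[dict]:
    """Read every non-blank line of scenarios.jsonl back into a list of dicts."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Example record (illustrative values only).
EXAMPLE = {
    "name": "user-registration",
    "description": "A new user can register with a unique email",
    "given": "a fresh test database",
    "when": "a user registers with an unused email and a password",
    "then": "exactly one user row exists for that email",
    "validates": "user registration flow",
}
```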
3. Use Real Dependencies
Calls to external APIs must hit the actual services (sandbox or test mode is acceptable). Mocking any dependency invalidates the scenario.
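One way to keep this rule in code is to fail loudly when a sandbox credential is missing, rather than silently falling back to a mock. A minimal sketch; `require_env()`, `MissingCredentialError`, and any variable name you pass (for example a hypothetical `PAYMENTS_SANDBOX_KEY`) are illustrative, not part of any existing library.

```python
import os


class MissingCredentialError(RuntimeError):
    """Raised when a scenario cannot reach a real dependency."""


def require_env(name: str) -> str:
    """Return a required credential from the environment, or stop the scenario.

    A scenario should halt here instead of substituting a mock when the
    sandbox/test-mode credential is absent.
    """
    value = os.environ.get(name)
    if not value:
        raise MissingCredentialError(
            f"{name} is not set; ask your human partner for a sandbox/test-mode "
            "credential instead of mocking the dependency"
        )
    return value
```

A scenario would call something like `require_env("PAYMENTS_SANDBOX_KEY")` at startup, so a missing credential surfaces as a clear failure instead of a passing run against a fake.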
4. Independence Requirement
Each scenario must run standalone without depending on prior executions. This enables:
- Parallel execution
- Freedom from hidden ordering dependencies
- Reliable CI/CD integration
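In practice, independence means each scenario creates all of the state it needs. A minimal sketch, with hypothetical names `isolated_scenario()` and `run_in_parallel()`: every run gets its own temporary directory and a unique ID, so runs can safely execute concurrently and in any order.

```python
import tempfile
import uuid
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def isolated_scenario(label: str) -> str:
    """Hypothetical scenario that builds its own state from scratch.

    It never reads anything a previous run left behind: the working
    directory and the record ID are both unique to this run.
    """
    run_id = f"{label}-{uuid.uuid4().hex[:8]}"
    with tempfile.TemporaryDirectory(prefix=f"scenario-{label}-") as workdir:
        record = Path(workdir) / "record.txt"
        record.write_text(run_id, encoding="utf-8")
        # Check against real storage (the file system), not an in-memory stand-in.
        assert record.read_text(encoding="utf-8") == run_id
    return run_id


def run_in_parallel(labels: list[str]) -> list[str]:
    """Run independent scenarios concurrently; results keep the input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(isolated_scenario, labels))
```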
What Makes a Scenario Invalid
A scenario is invalid if it:
- Contains any mocks whatsoever
- Uses fake data instead of real storage
- Depends on another scenario running first
- Never actually executed to verify it passes
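The "zero mocks" condition can be partly enforced with a scan of .scratch/. The sketch below is a heuristic for common Python mocking tools (`unittest.mock`, the `mock` package, pytest's `monkeypatch` and pytest-mock's `mocker`); the pattern list is an assumption and will not catch hand-rolled fakes, so it supplements review rather than replacing it. `find_mock_usage()` and `scan_scratch()` are hypothetical names.

```python
import re
from pathlib import Path

# Heuristic patterns for common Python mocking tools (assumed, not exhaustive).
MOCK_PATTERNS = [
    re.compile(r"\bunittest\.mock\b"),
    re.compile(r"\bfrom\s+mock\s+import\b"),
    re.compile(r"\bimport\s+mock\b"),
    re.compile(r"\b(MagicMock|AsyncMock|Mock)\s*\("),
    re.compile(r"@(mock\.)?patch\b"),
    re.compile(r"\bmonkeypatch\b"),
    re.compile(r"\bmocker\b"),
]


def find_mock_usage(source: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs that look like mock usage."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(pattern.search(line) for pattern in MOCK_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits


def scan_scratch(directory: Path) -> dict[str, list[tuple[int, str]]]:
    """Scan every .py file under directory and report suspected mocks per file."""
    report = {}
    for path in sorted(directory.rglob("*.py")):
        hits = find_mock_usage(path.read_text(encoding="utf-8"))
        if hits:
            report[str(path)] = hits
    return report
```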
Common Violations to Avoid
Reject these rationalizations:
- "Just a quick unit test..." - Unit tests don't validate features
- "Too simple for end-to-end..." - Integration breaks simple things
- "I'll mock for speed..." - Speed doesn't matter if tests lie
- "I don't have API credentials..." - Ask your human partner for real ones
Definition of Done
A feature is complete only when:
- ✅ A scenario in .scratch/ passes with zero mocks
- ✅ Real dependencies are exercised
- ✅ .scratch/ remains in .gitignore
- ✅ Robust patterns are extracted to scenarios.jsonl
Example Workflow
- Write scenario - Create .scratch/test-user-registration.py
- Use real dependencies - Hit a real database and a real auth service (test mode)
- Run and verify - Execute the scenario and confirm it passes
- Extract pattern - Document it in scenarios.jsonl
- Keep .scratch ignored - Never commit scratch scenarios
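A minimal sketch of what .scratch/test-user-registration.py might contain, assuming a SQLite file on disk plays the role of "a real database". `register_user()` is a hypothetical stand-in for the application code you would actually import, and the auth-service step is omitted because it depends on your service's own test-mode API.

```python
"""Sketch of .scratch/test-user-registration.py (hypothetical).

register_user() stands in for the real application code, which a real
scenario would import. Storage is a real SQLite file on disk, not a mock.
"""
import hashlib
import os
import sqlite3
import tempfile
import uuid
from pathlib import Path


def register_user(conn: sqlite3.Connection, email: str, password: str) -> int:
    """Hypothetical stand-in for the system under test."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    cur = conn.execute(
        "INSERT INTO users (email, salt, password_hash) VALUES (?, ?, ?)",
        (email, salt, digest),
    )
    conn.commit()
    return cur.lastrowid


def scenario_user_registration() -> None:
    # Given: a fresh database created by this scenario alone (independence).
    with tempfile.TemporaryDirectory() as workdir:
        db_path = Path(workdir) / "scenario.db"
        conn = sqlite3.connect(db_path)
        try:
            conn.execute(
                "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL,"
                " salt BLOB NOT NULL, password_hash BLOB NOT NULL)"
            )
            email = f"scenario-{uuid.uuid4().hex[:8]}@example.test"

            # When: a user registers.
            user_id = register_user(conn, email, "correct horse battery staple")

            # Then: a second connection sees the row, so it was committed to disk.
            check = sqlite3.connect(db_path)
            try:
                row = check.execute(
                    "SELECT id, email FROM users WHERE email = ?", (email,)
                ).fetchone()
            finally:
                check.close()
            assert row == (user_id, email), row

            # And: the real UNIQUE constraint rejects a duplicate registration.
            try:
                register_user(conn, email, "another password")
            except sqlite3.IntegrityError:
                pass
            else:
                raise AssertionError("duplicate email was accepted")
        finally:
            conn.close()
    print("PASS: user registration")


if __name__ == "__main__":
    scenario_user_registration()
```

Running `python .scratch/test-user-registration.py` either prints the PASS line or stops with a traceback, which is the "run and verify" step; once the scenario is stable, its given/when/then becomes a line in scenarios.jsonl.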
Why This Matters
- Unit tests verify isolated logic
- Integration tests verify components work together
- Scenario tests verify the system actually works
Only scenario tests prove your feature delivers value to users.