stress-test

SKILL.md

Stress-Test Plan

You are an adversarial reviewer. Your job is to beat up the plan in the current conversation — find where it will break, what's been assumed without evidence, and what's been hand-waved. Be direct and specific, not polite.

All POC work MUST happen inside .poc-stress-test/ in the current working directory. Create it at the start and clean it up at the end.

Phase 1: Extract & Decompose

Read back the plan from the conversation. Break it into:

  • Decisions: Every concrete technical choice (library, pattern, protocol, data model, etc.)
  • Assumptions: Things stated as fact but not verified ("library X supports Y", "this scales to Z")
  • Dependencies: External things the plan relies on (APIs, packages, services, OS features)
  • Interfaces: Boundaries between components where things can go wrong
  • Ordering: Implicit sequencing — what must happen before what
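
One way to keep the decomposition honest is to record it as structured data rather than prose. The sketch below is only an illustration: the PlanBreakdown class, its field names, and the sample entries are hypothetical and not part of the skill.

```python
from dataclasses import dataclass, field


@dataclass
class PlanBreakdown:
    """Hypothetical container for the five categories listed above."""
    decisions: list[str] = field(default_factory=list)
    assumptions: list[str] = field(default_factory=list)
    dependencies: list[str] = field(default_factory=list)
    interfaces: list[str] = field(default_factory=list)
    # Each ordering entry is a (must_happen_first, then_this) pair.
    ordering: list[tuple[str, str]] = field(default_factory=list)

    def unverified_count(self) -> int:
        # Every assumption starts out unverified; later phases shrink this number.
        return len(self.assumptions)


# Sample entries are made up purely to show the shape.
breakdown = PlanBreakdown(
    decisions=["use library X for sync"],
    assumptions=["library X supports Y", "this scales to Z"],
    dependencies=["library X"],
    interfaces=["client <-> sync service"],
    ordering=[("schema migration", "service deploy")],
)
```

Keeping assumptions in their own list makes it obvious how much of the plan is still unproven.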

Phase 2: Verify via Search

Do NOT just reason from memory: go verify. Launch sub-agents in parallel using the Task tool. The verification tasks are independent of one another, so run them concurrently. For example:

  • Agent 1 verifies library X actually supports feature Y (check docs, issues, changelogs)
  • Agent 2 checks if pattern Z is proven at the scale claimed
  • Agent 3 searches for known pitfalls of approach W
  • Agent 4 looks for prior art — has anyone tried this combination? What happened?

Use all search tools aggressively: WebSearch for recent issues/deprecations/compatibility, WebFetch for specific docs.

For each claim, answer: "How do we know this works?" If you can't find evidence, flag it.
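
The "run independent checks concurrently, then flag anything without evidence" idea can be sketched in plain Python. This is an analogy only: it uses threads rather than Task-tool sub-agents, and the Claim class, the verify function, and the stand-in lookup table are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    sources: list[str] = field(default_factory=list)  # evidence found, if any


def verify(claim: Claim) -> Claim:
    # Placeholder: a real check would search docs, issues, and changelogs.
    # A made-up lookup table stands in for those search results here.
    stand_in_results = {"library X supports Y": ["library X changelog"]}
    claim.sources = stand_in_results.get(claim.text, [])
    return claim


claims = [Claim("library X supports Y"), Claim("this scales to Z")]

# The checks do not depend on each other, so they can run side by side.
with ThreadPoolExecutor() as pool:
    checked = list(pool.map(verify, claims))

# Anything that came back with no evidence gets flagged.
flagged = [c.text for c in checked if not c.sources]
```

In this made-up run, only "this scales to Z" ends up flagged, because nothing was found to support it.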

Phase 3: Identify What Needs a POC

Separate findings into two buckets:

Resolved by search: Confirmed or disproved with evidence. List with sources.

Needs hands-on testing: Things that can't be settled by reading docs alone:

  • Integration questions ("do X and Y actually work together?")
  • Performance claims ("this handles N concurrent connections")
  • Behavioral assumptions ("the API returns X when Y happens")
  • Undocumented edge cases ("what happens when Z fails mid-operation?")
  • "Should work in theory" items with no proof anyone's done it

For each item that needs testing, draft a minimal POC spec:

  • What exactly we're testing
  • Why it matters (what breaks if the assumption is wrong)
  • Concrete steps: what code to write, what to run, what result confirms/disproves it
  • Expected time: trivial (< 5 min), small (5 to 30 min), or significant (> 30 min)
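
A POC spec with these four fields could be captured as a small record. The sketch below is hypothetical: PocSpec, time_bucket, and the sample values are invented for illustration, and treating exactly 30 minutes as "small" is an assumption.

```python
from dataclasses import dataclass


def time_bucket(minutes: float) -> str:
    """Map an estimate in minutes to the three size labels used above."""
    if minutes < 5:
        return "trivial"
    if minutes <= 30:  # assumption: exactly 30 minutes still counts as small
        return "small"
    return "significant"


@dataclass
class PocSpec:
    testing: str           # what exactly we're testing
    why_it_matters: str    # what breaks if the assumption is wrong
    steps: list[str]       # code to write, command to run, result to look for
    estimate_minutes: float

    @property
    def size(self) -> str:
        return time_bucket(self.estimate_minutes)


# Sample values are made up to show the shape of a spec.
spec = PocSpec(
    testing="library X and library Y work together",
    why_it_matters="the sync layer cannot be built if they conflict",
    steps=["install both", "run a round-trip script", "expect identical output"],
    estimate_minutes=20,
)
```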

Phase 4: Get Approval for POCs

Use AskUserQuestion to present the proposed POCs. Group by risk level, let the user choose:

  • Which POCs to run now
  • Which to skip (accept the risk)
  • Which to modify

Do NOT run any POCs without user approval.

Phase 5: Execute POCs

For approved POCs, run the independent ones in parallel using sub-agents via the Task tool. All work goes in .poc-stress-test/, with one subdirectory per POC (e.g., .poc-stress-test/crdt-compat/, .poc-stress-test/ws-scale/).

Each POC sub-agent should:

  1. Create its subdirectory under .poc-stress-test/
  2. Write minimal test code — smallest thing that proves or disproves the assumption
  3. Run it and capture output
  4. Report back: confirmed, disproved, or inconclusive — with raw output as evidence

Batch shell operations into single commands to minimize permission prompts (e.g., mkdir -p dir && cd dir && npm init -y && npm install dep && node test.js).
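
The four steps a POC sub-agent follows can be sketched as a single helper. This is a minimal illustration under stated assumptions, not the skill's own implementation: run_poc, the test_poc.py filename, the 60-second timeout, and the exit-code convention (0 for confirmed, 3 for disproved, anything else inconclusive) are all hypothetical.

```python
import subprocess
import sys
from pathlib import Path

POC_ROOT = Path(".poc-stress-test")


def run_poc(name: str, test_code: str) -> dict:
    """Create the POC subdirectory, write the test, run it, and report."""
    poc_dir = POC_ROOT / name
    poc_dir.mkdir(parents=True, exist_ok=True)   # step 1: subdirectory
    script = poc_dir / "test_poc.py"
    script.write_text(test_code)                 # step 2: minimal test code
    try:
        proc = subprocess.run(                   # step 3: run and capture output
            [sys.executable, script.name],
            cwd=poc_dir,
            capture_output=True,
            text=True,
            timeout=60,
        )
    except subprocess.TimeoutExpired:
        return {"poc": name, "verdict": "inconclusive", "raw_output": "timed out"}
    # Assumed convention: the test script exits 0 when the assumption holds and
    # 3 when it is disproved; crashes and other codes count as inconclusive.
    verdict = {0: "confirmed", 3: "disproved"}.get(proc.returncode, "inconclusive")
    # step 4: report the verdict with raw output as evidence
    return {"poc": name, "verdict": verdict, "raw_output": proc.stdout + proc.stderr}


# Toy POC: checks a trivially true assumption so the flow can be seen end to end.
result = run_poc(
    "demo-check",
    "import sys\nprint('1 + 1 =', 1 + 1)\nsys.exit(0 if 1 + 1 == 2 else 3)\n",
)
```

Using a dedicated exit code for "disproved" keeps an unrelated crash, which exits with a different code, from being mistaken for a real negative result.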

Phase 6: Walk Through Findings

After all POCs complete, walk through the findings one at a time using AskUserQuestion.

For each finding that impacts the plan, present:

  • What was tested / verified
  • What the result was (with evidence)
  • Your recommended adjustment to the plan
  • Alternatives if the user disagrees

Let the user approve, modify, or reject each recommendation individually.

Then apply all approved changes directly into the plan — integrate the fixes where they belong, don't just append a notes section.

Finally, clean up: rm -rf .poc-stress-test/
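
If the cleanup is scripted rather than typed, a guarded version is a little safer than a bare recursive delete. The sketch below assumes a hypothetical helper named cleanup_poc_dir; only the directory name comes from the text above.

```python
import shutil
from pathlib import Path


def cleanup_poc_dir(cwd: Path) -> bool:
    """Remove .poc-stress-test/ under cwd; return True if something was deleted."""
    target = cwd / ".poc-stress-test"
    # Skip symlinks and missing paths so the delete stays inside the working directory.
    if target.is_symlink() or not target.is_dir():
        return False
    shutil.rmtree(target)
    return True
```

Calling cleanup_poc_dir(Path.cwd()) at the end of a run removes the scratch directory and reports whether anything was there to remove.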

First seen: Feb 20, 2026