test-designer
Test Designer
Independent test-design orchestrator. Encodes Independent Evaluation: the agent writing the tests must not be the agent implementing the feature, and must not inherit the implementation's assumptions.
When to Use
- TDD red phase for a complex / non-trivial feature (multi-file, multi-branch logic, new subsystem)
- Requirement is ambiguous enough that the implementer's tests would likely rationalize the implementation instead of catching bugs
- User explicitly asks for "independent test design", "fresh-eyes tests", or runs `/test-designer`
Don't use for:
- Trivial changes (one-line fix, rename) — just write the test inline
- Bug reproduction tests — write directly from the bug report
- Non-code changes (pure docs, pure config, pure prompt)
The Iron Law
The agent designing the tests must not carry the implementation's context. If you (the main Agent) are about to implement the feature, you are disqualified from designing its tests. Dispatch.
Violating this = tests that pass because they mirror the buggy implementation.
Steps
Step 1: Assemble the dispatch package
Collect only these inputs — nothing else:
- Requirement description — "what to do" and acceptance criteria (not "how to do")
- Relevant code file paths — read-only access to the code the feature will touch or integrate with
- Edge case prompts — categories the dispatched agent should enumerate:
  - Boundary inputs (empty, max, min, off-by-one)
  - Concurrency / ordering (if applicable)
  - Resource lifecycle (cleanup on error, partial failure)
  - Invariants (data consistency, idempotency)
  - Adversarial inputs (malformed, oversized, mis-encoded)
Explicitly exclude:
- The implementation plan or design you've been developing
- Hints about which approach you've chosen
- Code excerpts from a work-in-progress branch
- Your own guesses about "the right way to test this"
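Taken together, the package can be as small as a handful of fields. A minimal sketch in TypeScript, reusing the resolver example from the invocation below; the interface and field names are illustrative, not a required schema:

```ts
// Illustrative shape only: the dispatch package is a set of inputs, not a prescribed API.
interface DispatchPackage {
  requirement: string;        // "what to do" plus acceptance criteria, never "how to do it"
  codePaths: string[];        // read-only paths the feature will touch or integrate with
  edgeCasePrompts: string[];  // categories to enumerate, not concrete test ideas
  testConventions?: string;   // existing runner / assertion lib / fixture style, if known
}

const pkg: DispatchPackage = {
  requirement:
    "Resolver takes a plugin manifest and returns install order respecting deps and detecting cycles.",
  codePaths: ["src/plugins.ts", ".claude/plugins.json"],
  edgeCasePrompts: ["boundary", "concurrency", "lifecycle", "invariants", "adversarial"],
  // Deliberately absent: the implementation plan, the chosen approach, any WIP branch code.
};
```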
Step 2: Choose the executor
| Task shape | Executor | Reason |
|---|---|---|
| Complex, architectural implications | Independent Agent (e.g., codex-agent or claude-code-agent with a fresh session) | True zero-context isolation; can use the strongest model at the highest effort |
| Medium complexity, current conversation clean | In-conversation subagent | Cheaper; still acceptable if the main Agent hasn't yet proposed an implementation |
| Trivial | Don't dispatch | Write the tests inline |
Default to Independent Agent when the main Agent has already discussed or sketched implementation. Subagent isolation within the same conversation doesn't undo prior context pollution.
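The decision rule from the table and the default above can be summarized as a small sketch (the thresholds are judgment calls, not hard rules):

```ts
type Executor = "independent-agent" | "in-conversation-subagent" | "inline";

// Sketch of the executor choice; "complexity" is the task shape from the table above.
function chooseExecutor(
  complexity: "trivial" | "medium" | "complex",
  implementationAlreadyDiscussed: boolean,
): Executor {
  if (complexity === "trivial") return "inline"; // don't dispatch at all
  // Subagent isolation can't undo prior context pollution in the same conversation.
  if (complexity === "complex" || implementationAlreadyDiscussed) return "independent-agent";
  return "in-conversation-subagent"; // cheaper, acceptable while the conversation is still clean
}
```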
Step 3: Dispatch with the strongest model and highest effort
Test design is a correctness-critical reasoning task, not a rote mechanical one. Use:
- Model: strongest reasoning model the runtime offers — inherit if the main Agent is already on that tier; otherwise override. Don't hardcode a specific brand name
- Effort: `xhigh` (the maximum level the runtime supports). Escalation ladder: `low` → `medium` → `high` → `xhigh`
- Tools: Read / Grep / Glob on code paths; Write on test files only
- Permission: read-only on non-test files; writable on test files
Example dispatch prompt skeleton:
```
You are designing failing tests for a feature. You will NOT see or write the
implementation. Your job is to produce executable tests that fail today and
pass only when the feature is correctly implemented.

Requirement:
<paste requirement description + acceptance criteria>

Code paths (read-only, for understanding context):
<list of file paths>

Existing test framework and conventions:
<infer from repo or specify>

Produce:
1. A test plan — enumerate the behaviors being tested (happy path + edge
   cases), grouped by category (boundary / concurrency / lifecycle /
   invariants / adversarial).
2. Executable test files that fail against the current code (or against
   an empty implementation).
3. For each test, one-line rationale explaining the bug it would catch.

Constraints:
- Do NOT propose an implementation.
- Do NOT edit files outside the test directory.
- Cover edge cases explicitly; don't only test the happy path.
- Use the project's existing test framework and style.
```
Step 4: Validate the returned tests
Before handing the tests to the implementation phase:
- Run the tests — they should FAIL (red), and fail for the reason the rationale predicts. A test that fails on `ImportError`, a missing fixture, a syntax error, or "module not found" is fake red — the test isn't actually exercising the behavior it claims to. Fix the test or drop it.
- Scan the rationale — does each test catch a distinct failure mode? Drop duplicates.
- Check coverage — are all edge case categories represented? Request additions if not.
- Confirm the test framework matches — ensure the dispatched agent used the right runner / assertion lib / fixtures.
- Check for shape-to-example tests — a test that asserts on specific happy-path values (e.g., "output equals exactly `[1, 2, 3]` for this fixture") is shaping the test to the example, not to the requirement. Such a test passes when the implementation matches the fixture and breaks for any valid variant input. Replace it with property-style assertions ("output is sorted and contains all input elements") or add a second test with a different input that exercises the same property (see the sketch below).
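To make the last check concrete, here is a minimal sketch of the two styles in a Jest/Vitest-style runner; `resolve` is a stand-in for whatever function the requirement describes, and the real tests should use the project's own framework:

```ts
import { test, expect } from "vitest"; // assumed runner; match the repo's actual framework

declare function resolve(xs: number[]): number[]; // placeholder for the function under test

// Shape-to-example: pinned to the requirement's example fixture. It passes only when the
// implementation reproduces this exact fixture and breaks on any other valid input.
test("returns [1, 2, 3] for the example fixture", () => {
  expect(resolve([3, 1, 2])).toEqual([1, 2, 3]);
});

// Property-style: asserts the invariant itself, so any input exercises the requirement.
test("output is sorted and contains exactly the input elements", () => {
  const input = [3, 1, 2, 2, 0];
  const output = resolve(input);
  for (let i = 1; i < output.length; i++) {
    expect(output[i - 1]).toBeLessThanOrEqual(output[i]); // sorted
  }
  expect([...output].sort((a, b) => a - b)).toEqual([...input].sort((a, b) => a - b)); // same multiset
});
```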
Step 5: Hand off to implementation
With the validated failing tests in place, implementation proceeds per the `test-driven-development` skill: write minimal code to make them pass (green), then refactor.
Output Format (from the dispatched agent)
Require the agent to return:
A test plan (bullet list, grouped by category) followed by the test files. Each test must include a one-line rationale comment. No implementation code. No commentary on how to implement. If assumptions about the code are needed, list them explicitly at the top of the test file.
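If assumptions are needed, a header comment at the top of the test file is enough. Borrowing the resolver example from the invocation below, it might look like this (contents illustrative):

```ts
// tests/resolver.test.ts
//
// ASSUMPTIONS (made without seeing the implementation; flag any that are wrong):
// - The resolver is exported from src/plugins.ts as a single function taking the parsed manifest.
// - A missing dependency is an error, not a silently skipped plugin.
// - Install order among independent plugins is unspecified, so tests assert relative order only.
```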
Anti-patterns
- ❌ Main Agent writes the tests after sketching the implementation — tests will mirror the implementation's assumptions
- ❌ Dispatching with medium effort / weaker model to save cost — test design quality compounds across the whole feature's lifetime
- ❌ Passing the work-in-progress branch contents to the dispatched agent — defeats Independent Evaluation
- ❌ Accepting tests that pass against an empty implementation — those tests don't constrain anything
- ❌ Skipping Step 4 validation — unvalidated tests get merged as fake green
- ❌ Accepting "shape-to-example" tests — a test that asserts on specific happy-path values from the requirement's example data passes whenever input==fixture and breaks for any variant. Use property assertions (sorted, idempotent, contains-all-inputs) or pair the example test with a variant-input test that exercises the same invariant
- ❌ Accepting fake red — a test that fails on `ImportError`, a missing fixture, or "module not found" looks red but isn't testing anything. Step 4 must verify the test fails for the reason the rationale predicts (illustrated below)
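A concrete illustration of fake red versus real red; the module path and export name are assumptions, and the real file would use the repo's own framework:

```ts
import { test, expect } from "vitest"; // assumed runner

// FAKE red: this import points at a module that doesn't exist, so the whole file fails to load
// with "Cannot find module" and the assertion below never runs. The test constrains nothing.
// import { resolveInstallOrder } from "../src/resolver"; // wrong path

// REAL red: the import resolves (path assumed to be src/plugins.ts), the file loads, and the
// failure is the assertion itself, which is exactly the missing behavior the rationale predicts.
import { resolveInstallOrder } from "../src/plugins";

test("detects a self-referencing plugin as a cycle", () => {
  expect(() => resolveInstallOrder({ a: { deps: ["a"] } })).toThrow(/cycle/i);
});
```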
Relationship to other skills
- `brainstorming` → clarifies the requirement (upstream of `test-designer`)
- `test-driven-development` → governs the red-green-refactor loop (downstream; consumes the failing tests)
- `systematic-debugging` → kicks in if tests unexpectedly fail after implementation (downstream)
- `verification-before-completion` → runs the tests at the "done" gate (downstream)
Example invocation
User: Starting work on the new plugin dependency resolver. Design tests first.
Assistant:
1. Requirement: "Resolver takes a plugin manifest and returns install order
respecting deps and detecting cycles. Must handle: transitive deps,
diamond deps, self-references, missing deps, cycles."
2. Code paths: src/plugins.ts, .claude/plugins.json schema, tests/ dir
3. Dispatch to an independent-agent skill (fresh session) at `xhigh` effort,
read-only on src/, writable on tests/
4. Agent returns: test plan (5 categories, 18 tests), tests/resolver.test.ts
with failing assertions + per-test rationale comments
5. Main Agent runs tests → all red → validates rationale → hands off
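For illustration, one of the returned tests might look like the fragment below; the export name and manifest shape are assumptions that the dispatched agent would list at the top of the real file:

```ts
// tests/resolver.test.ts — illustrative fragment only
import { describe, test, expect } from "vitest"; // assumed runner
import { resolveInstallOrder } from "../src/plugins"; // assumed export name

describe("plugin dependency resolver — diamond dependencies", () => {
  // Rationale: catches resolvers that install a shared dependency twice or order it after its dependents.
  test("installs the shared base exactly once and before both dependents", () => {
    const manifest = {
      app: { deps: ["ui", "db"] },
      ui: { deps: ["core"] },
      db: { deps: ["core"] },
      core: { deps: [] },
    };
    const order = resolveInstallOrder(manifest);
    expect(order.filter((p) => p === "core")).toHaveLength(1);
    expect(order.indexOf("core")).toBeLessThan(order.indexOf("ui"));
    expect(order.indexOf("core")).toBeLessThan(order.indexOf("db"));
    expect(order.indexOf("ui")).toBeLessThan(order.indexOf("app"));
    expect(order.indexOf("db")).toBeLessThan(order.indexOf("app"));
  });
});
```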