Test Architect

Design test strategies, analyze coverage gaps, identify edge cases, diagnose flaky tests, and audit test suite architecture.

Scope: Test design and analysis only. NOT for running tests or CI/CD (use devops-engineer), code review (use honest-review), or the TDD workflow.

Dispatch

$ARGUMENTS Action
design <feature/module> Design test strategy and pyramid for a feature or module
generate <file/function> Generate test cases (strategy text or actual test code based on context)
gaps Analyze coverage gaps from coverage reports
edge-cases <function> Systematic edge case identification for a function
flaky Diagnose flaky tests from logs and code
review Audit test suite architecture
(empty) Show mode menu with examples

Canonical Vocabulary

Use these terms exactly throughout all modes:

Term Definition
test pyramid Layered test distribution: unit (base), integration (middle), e2e (top)
coverage gap Code path with no test coverage, weighted by complexity risk
edge case Input at boundary conditions, null/empty, type coercion, overflow, unicode, concurrent
flaky test Test with non-deterministic pass/fail behavior across identical runs
mutation score Percentage of injected mutations detected by the test suite
test strategy Document defining what to test, how, at what layer, with what tools
property-based test Test asserting invariants over generated inputs (Hypothesis/fast-check)
test isolation Guarantee that tests do not share mutable state or execution order dependencies
fixture Reusable test setup/teardown providing controlled state
test surface Set of public interfaces, code paths, and states requiring test coverage

Mode 1: Design

/test-architect design <feature/module>

Surface Analysis

  1. Read the feature/module code. Map the test surface: public API, internal paths, state transitions, error conditions.
  2. Classify complexity: simple (pure functions), moderate (I/O, state), complex (distributed, concurrent, multi-service).

Pyramid Design

  1. Design test pyramid:
    • Unit layer: Pure logic, transformations, validators. Target: 70-80% of tests.
    • Integration layer: Database, API, file I/O, service boundaries. Target: 15-25%.
    • E2E layer: Critical user flows only. Target: 5-10%.
  2. For each layer, list specific test cases with: description, input, expected output, rationale.
  3. Recommend framework and tooling based on language/ecosystem.
  4. Output: structured strategy document with pyramid diagram, case list, and priority order.
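The layer targets above can be checked mechanically. A minimal sketch, assuming per-layer test counts are already available; the function name and counting approach are illustrative, while the target ranges come from this mode:

```python
# Minimal sketch: check a suite's layer distribution against the Mode 1
# pyramid targets (unit 70-80%, integration 15-25%, e2e 5-10%).
TARGETS = {
    "unit": (0.70, 0.80),
    "integration": (0.15, 0.25),
    "e2e": (0.05, 0.10),
}

def pyramid_balance(counts):
    """Return {layer: (share, within_target)} for a {layer: test_count} dict."""
    total = sum(counts.values()) or 1
    report = {}
    for layer, (lo, hi) in TARGETS.items():
        share = counts.get(layer, 0) / total
        report[layer] = (round(share, 3), lo <= share <= hi)
    return report

report = pyramid_balance({"unit": 150, "integration": 40, "e2e": 10})
```

A suite failing the check on a layer is a candidate finding for the strategy document's priority order.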

Reference: read references/test-pyramid.md for layer guidance.

Mode 2: Generate

/test-architect generate <file/function>

  1. Read the target file/function. Identify signature, dependencies, side effects.
  2. Determine output format:
    • If test file exists for target: generate actual test code matching existing patterns.
    • If no test file exists: generate test strategy text with case descriptions.
    • If user specifies --code: always generate test code.
  3. Generate test cases covering:
    • Happy path (expected inputs and outputs)
    • Error path (invalid inputs, exceptions, timeouts)
    • Edge cases (run edge-case-generator.py if function has typed parameters)
    • Boundary conditions (min/max values, empty collections, null)
  4. Follow framework conventions: read references/framework-patterns.md for pytest/jest/vitest patterns.
  5. Output: test cases or test code with clear section headers per category.
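Illustrative output for a hypothetical parse_port() helper, grouped by the four categories above. Plain asserts keep the sketch framework-neutral; real output would follow the project's pytest/jest/vitest conventions:

```python
def parse_port(value):
    """Hypothetical target: parse a TCP port number from a string."""
    port = int(value)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port

# --- Happy path ---
assert parse_port("8080") == 8080

# --- Error path ---
error_raised = False
try:
    parse_port("0")            # out of range -> ValueError
except ValueError:
    error_raised = True

# --- Edge cases ---
empty_handled = False
try:
    parse_port("")             # empty string -> ValueError from int()
except ValueError:
    empty_handled = True

# --- Boundary conditions ---
assert parse_port("1") == 1
assert parse_port("65535") == 65535
```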

Mode 3: Gaps

/test-architect gaps

  1. Locate coverage reports. Search for:
    • coverage.json, coverage.xml, .coverage (Python/coverage.py)
    • lcov.info, coverage/lcov.info (JS/lcov)
    • htmlcov/, coverage/ directories
  2. Run coverage analyzer:
    uv run python skills/test-architect/scripts/coverage-analyzer.py <report-path>
    
  3. Parse JSON output. Rank gaps by complexity-weighted risk.
  4. For each gap, assess:
    • What code is untested and why it matters
    • Complexity score (cyclomatic complexity proxy)
    • Recommended test type (unit/integration/e2e)
    • Priority (P0: security/auth, P1: core logic, P2: utilities, P3: cosmetic)
  5. Render dashboard if 10+ gaps:
    • Copy templates/dashboard.html to a temporary file
    • Inject gap data JSON into the <script id="data"> tag
    • Open in browser
  6. Output: prioritized gap list with recommended actions.
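Step 3's complexity-weighted ranking can be sketched as follows. The report shape loosely mirrors a parsed coverage report, and the risk formula (missed lines × complexity proxy) is illustrative, not the canonical analyzer logic:

```python
# Minimal sketch: rank coverage gaps by complexity-weighted risk.
sample_report = {
    "auth/tokens.py": {"missing_lines": [10, 11, 12, 40], "complexity": 9},
    "utils/fmt.py":   {"missing_lines": [5],              "complexity": 1},
    "core/orders.py": {"missing_lines": [22, 23],         "complexity": 6},
}

def rank_gaps(report):
    """Return (path, risk) pairs sorted by risk = missed line count * complexity."""
    scored = [
        (path, len(data["missing_lines"]) * data["complexity"])
        for path, data in report.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

ranked = rank_gaps(sample_report)
```

Note how the small-but-complex auth gap outranks the larger-by-line-count alternatives — this is why rule 5 forbids ranking by raw line count.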

Reference: read references/coverage-analysis.md for interpretation guidance.

Mode 4: Edge Cases

/test-architect edge-cases <function>

  1. Read the function. Extract parameter types, return types, and constraints.
  2. Run edge case generator:
    uv run python skills/test-architect/scripts/edge-case-generator.py --name "<function_name>" --params "<param1:type,param2:type>"
    
  3. Parse JSON output. Review generated categories:
    • Null/empty: None, "", [], {}, 0, False
    • Boundary: min/max int, float limits, string length limits
    • Type coercion: "123" vs 123, True vs 1, None vs "null"
    • Overflow: large numbers, deep nesting, long strings
    • Unicode: emoji, RTL text, zero-width chars, combining marks
    • Concurrent: race conditions, deadlocks, stale reads
  4. For each edge case, provide: input value, expected behavior, rationale.
  5. Flag cases where current code would likely fail (no guard, no validation).
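The categories above amount to a type-to-edge-values mapping, the kind of table edge-case-generator.py might emit. A minimal sketch; the value lists and helper name are illustrative:

```python
# Sketch: candidate edge inputs per parameter type.
EDGE_VALUES = {
    "str": ["", " ", "123", "null", "\u200b", "e\u0301", "🦄"],  # empty, coercion bait, zero-width, combining mark, emoji
    "int": [0, -1, 1, 2**31 - 1, -(2**31), 2**63],               # zero, sign boundaries, 32-bit limits, overflow
    "list": [[], [None], [[]]],                                   # empty, null element, nesting
}

def edge_cases_for(params):
    """Map each (name, type) pair to its candidate edge inputs."""
    return {name: EDGE_VALUES.get(type_, [None]) for name, type_ in params}

cases = edge_cases_for([("username", "str"), ("retries", "int")])
```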

Reference: read references/edge-case-heuristics.md for category details.

Mode 5: Flaky

/test-architect flaky

Log Collection

  1. Locate test result logs. Search for:
    • CI logs, pytest output, jest output
    • .pytest_cache/, test-results/
    If none are found, ask the user for a log path.
  2. Run flaky test analyzer:
    uv run python skills/test-architect/scripts/flaky-test-analyzer.py <log-path>
    

Root Cause Classification

  1. Parse JSON output. For each flaky test:
    • Failure count vs pass count
    • Failure pattern (timing, ordering, resource, state)
    • Likely root cause classification:
      • Timing: sleep/timeout dependencies, race conditions
      • Ordering: test execution order dependencies
      • Resource: external service, database, file system
      • State: shared mutable state between tests
      • Environment: platform-specific, timezone, locale
  2. Recommend fix strategy per root cause.
  3. Prioritize by failure frequency and blast radius.
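The classification step can be sketched as keyword matching over failure messages. Real diagnosis needs richer signals (timing histograms, order-shuffled reruns); the keyword lists here are illustrative, but the five categories match the taxonomy above:

```python
import re

# Sketch: map a failure message to a root-cause category by keyword.
PATTERNS = {
    "timing": r"timeout|timed out|sleep|race",
    "ordering": r"depends on|ran before|order",
    "resource": r"connection refused|ECONNRESET|disk|503",
    "state": r"already exists|stale|leaked|dirty",
    "environment": r"timezone|locale|platform|windows|darwin",
}

def classify_failure(message):
    """Return the first matching root-cause category, else 'unknown'."""
    for category, pattern in PATTERNS.items():
        if re.search(pattern, message, re.IGNORECASE):
            return category
    return "unknown"
```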

Reference: read references/flaky-diagnosis.md for root cause patterns.

Mode 6: Review

/test-architect review

  1. Scan the test suite. Map: test file count, framework(s), directory structure.
  2. Assess architecture dimensions:
    • Pyramid balance: ratio of unit:integration:e2e tests
    • Isolation: shared state, global fixtures, test ordering dependencies
    • Naming: consistency, descriptiveness, convention adherence
    • Coverage distribution: even vs clustered coverage
    • Fixture health: duplication, complexity, setup/teardown balance
    • Assertion quality: specific assertions vs generic assertTrue
    • Speed: identify slow tests (>1s unit, >10s integration)
    • Determinism: potential flakiness indicators
  3. Run coverage analyzer if reports exist.
  4. Cross-reference with source code:
    • Untested public APIs
    • Tests for deleted/renamed code (orphaned tests)
    • Missing negative test cases
  5. Output: architecture audit report with scores per dimension, findings, and recommendations.
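The assertion-quality dimension from step 2 can be approximated with a static scan. A minimal sketch for unittest-style suites, assuming Python test sources; the sample source and the generic/specific split are illustrative:

```python
import ast

# Sketch: count generic truthiness assertions (assertTrue/assertFalse)
# versus specific ones (assertEqual, assertRaises, ...) in test source.
GENERIC = {"assertTrue", "assertFalse"}

def assertion_quality(source):
    """Return (generic_count, specific_count) for unittest-style asserts."""
    generic = specific = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            if node.func.attr.startswith("assert"):
                if node.func.attr in GENERIC:
                    generic += 1
                else:
                    specific += 1
    return generic, specific

sample = """
class TestOrders:
    def test_total(self):
        self.assertTrue(order.total)
        self.assertEqual(order.total, 42)
"""
scores = assertion_quality(sample)
```

A high generic-to-specific ratio is a finding: generic asserts pass on any truthy value and hide regressions.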

Reference: read references/test-suite-audit.md for scoring criteria.

Reference Files

Load ONE reference at a time. Do not preload all references into context.

File Content Read When
references/test-pyramid.md Test pyramid layers, distribution targets, anti-patterns Mode 1 (Design)
references/framework-patterns.md pytest, jest, vitest patterns and conventions Mode 2 (Generate), Mode 6 (Review)
references/coverage-analysis.md Coverage report interpretation, complexity weighting Mode 3 (Gaps)
references/edge-case-heuristics.md Edge case categories by data type, generation strategies Mode 4 (Edge Cases)
references/flaky-diagnosis.md Flaky test root causes, fix strategies, prevention patterns Mode 5 (Flaky)
references/test-suite-audit.md Test architecture scoring rubric, quality dimensions Mode 6 (Review)
references/property-testing.md Property-based testing with Hypothesis and fast-check Mode 1 (Design), Mode 2 (Generate)
references/mutation-testing.md Mutation testing plan design, tool integration Mode 1 (Design), Mode 6 (Review)

Script When to Run
scripts/coverage-analyzer.py Mode 3 (Gaps) -- parse coverage reports
scripts/edge-case-generator.py Mode 4 (Edge Cases) -- generate edge cases from function signature
scripts/flaky-test-analyzer.py Mode 5 (Flaky) -- parse test logs for flaky indicators

Template When to Render
templates/dashboard.html Mode 3 (Gaps) with 10+ gaps -- coverage gap visualization

Critical Rules

  1. Never run tests -- design and analyze only. Suggest commands but do not execute.
  2. Never modify source code -- test architecture is advisory, not implementation.
  3. Always recommend the correct test layer (unit/integration/e2e) for each test case.
  4. Edge cases must include rationale -- "why this matters" not just "try this input."
  5. Coverage gaps must be prioritized by risk, not by line count.
  6. Flaky test diagnosis must identify root cause category before recommending fixes.
  7. Framework recommendations must match the project's existing stack.
  8. Property-based testing is recommended only when invariants are identifiable.
  9. Load ONE reference file at a time -- do not preload all references.
  10. Every finding must cite the specific file and function it applies to.
  11. Test generation must follow existing test patterns in the project when present.
  12. Dashboard rendering requires 10+ gaps -- do not render for small gap sets.