Testing Strategies

Use this skill when the main question is "what validation packet do we trust, what confidence level do we actually need, and what should the team do next?"

The job is not to dump a generic pyramid/trophy manifesto or write test code. The job is to:

  1. normalize the current policy packet,
  2. name which gate is actually being decided,
  3. choose one primary policy mode,
  4. name the smallest convincing layer mix,
  5. separate local / PR / release / scheduled expectations,
  6. state exception and flake rules honestly,
  7. route implementation, debugging, release execution, accessibility, game launch, or performance work out immediately.

Read the reference files at the right moment:

  • references/intake-packets-and-route-outs.md — before handling an unfamiliar policy packet
  • references/gate-truth-and-release-handovers.md — when the ambiguity is really about branch blockers vs release-only or platform-launch proof
  • references/validation-matrix.md — when choosing the minimum convincing layer mix
  • references/handoff-boundaries.md — when deciding whether testing-strategies, backend-testing, debugging, code-review, performance-optimization, or web-accessibility should own the next step

When to use this skill

  • The team needs one change-based validation brief instead of vague “add more tests” advice
  • A developer or reviewer is asking what should run locally, on PR, before release, or on scheduled/nightly cadence
  • You need to decide whether unit, integration, contract, smoke, exploratory, or manual checks are actually required
  • A flaky or expensive suite problem is really a gate-policy problem
  • An escaped bug or incident needs the right regression ratchet without blindly expanding broad E2E coverage
  • Release-readiness or QA-signoff work needs to be tied back to the real change risk

When not to use this skill

  • The main task is implementing API/service/database/browser tests, fixtures, mocks, or testcontainers → use backend-testing or the stack-specific implementation skill
  • The main task is reproducing a failure, isolating why a test is red, or debugging flaky behavior → use debugging
  • The main task is judging one specific PR, diff, or merge request → use code-review
  • The main task is accessibility-heavy verification or visual review policy → use web-accessibility or web-design-guidelines
  • The dominant risk is performance benchmarking, load testing, or frame-budget policy → use performance-optimization or game-performance-profiler
  • There is no real change or decision point yet; first define what changed and what confidence decision must be made

Instructions

Step 1: Start from the policy packet already in hand

Use references/intake-packets-and-route-outs.md.

Normalize the current packet into one of these shapes:

  • change-risk-packet — a feature, bugfix, migration, API change, auth change, UI flow change, config/deploy change, or incident follow-up
  • gate-design-packet — the team is arguing about what belongs in local, PR, release, or scheduled gates
  • flake-cost-packet — the suite is slow, noisy, brittle, or expensive and policy is unclear
  • release-readiness-packet — staging smoke, signoff, rollout, or checklist work is present but layer ownership is fuzzy
  • incident-ratchet-packet — an escaped bug or outage fix needs the smallest lasting regression protection

Capture the minimum useful frame:

Packet: change-risk-packet
Change type: API contract + DB migration
Hotspots: compatibility, migration safety, permissions
Decision point: PR + release
Current evidence: unit tests only

Rule: start from the packet the user already has. Do not demand an ideal QA template before doing useful work.
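The minimum useful frame above can be captured as a small structure. A minimal sketch in Python; the `PolicyPacket` type and its field names are illustrative, not part of the skill's interface — only the five packet shapes come from the skill text:

```python
from dataclasses import dataclass, field

# The five packet shapes come from Step 1; everything else here is hypothetical.
PACKET_SHAPES = {
    "change-risk-packet",
    "gate-design-packet",
    "flake-cost-packet",
    "release-readiness-packet",
    "incident-ratchet-packet",
}

@dataclass
class PolicyPacket:
    packet: str                          # one of PACKET_SHAPES
    change_type: str                     # e.g. "API contract + DB migration"
    hotspots: list = field(default_factory=list)
    decision_point: str = ""             # e.g. "PR + release"
    current_evidence: str = ""           # e.g. "unit tests only"

    def __post_init__(self):
        if self.packet not in PACKET_SHAPES:
            raise ValueError(f"unknown packet shape: {self.packet}")

# The frame from the example above, normalized:
frame = PolicyPacket(
    packet="change-risk-packet",
    change_type="API contract + DB migration",
    hotspots=["compatibility", "migration safety", "permissions"],
    decision_point="PR + release",
    current_evidence="unit tests only",
)
```

The point of the validation in `__post_init__` is the rule itself: force the packet into one of the five named shapes before doing anything else with it.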

Step 2: Name the gate truth before choosing layers

Say which decision point is real right now:

  • merge-gate-truth — branch-blocking evidence, required status checks, or review-blocking proof
  • release-gate-truth — staging smoke, rollout safety, rollback notes, permissions, packaging, or human signoff
  • scheduled-breadth-truth — nightly/cron/matrix coverage that improves confidence but should not block every PR

Rules:

  • Do not let protected-branch tooling masquerade as the whole test strategy; it only enforces the blocking subset.
  • Do not let release/platform checklists silently expand the PR gate when they are really launch or rollout ownership.
  • If more than one gate is present, name the primary gate and the follow-up gate explicitly.
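The last rule — name the primary gate and the follow-up gate explicitly — can be sketched as a tiny selection function. The precedence used here (merge, then release, then scheduled) is an assumption for ordinary change-risk packets, not a rule stated by the skill:

```python
# Illustrative precedence: for a live PR, merge-gate-truth is usually the
# primary decision, with release and scheduled truths as follow-ups.
GATE_PRECEDENCE = ["merge-gate-truth", "release-gate-truth", "scheduled-breadth-truth"]

def name_gates(present):
    """Return (primary, follow_ups) from the set of live gate truths."""
    ordered = [g for g in GATE_PRECEDENCE if g in present]
    if not ordered:
        raise ValueError("no gate truth named; define the decision point first")
    return ordered[0], ordered[1:]
```

For example, `name_gates({"release-gate-truth", "merge-gate-truth"})` returns `("merge-gate-truth", ["release-gate-truth"])` — one primary, the rest named explicitly as follow-ups instead of silently blended into the PR gate.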

Step 3: Choose one primary policy mode

Pick exactly one primary mode:

  • layer-selection — what validation layers prove the changed behavior?
  • gate-shaping — what belongs in local vs PR vs release vs scheduled loops?
  • flake-and-cost-policy — what should block, quarantine, move, or become informational?
  • incident-regression-ratchet — what is the lowest layer that would have caught the escaped bug?
  • release-confidence — what final smoke, checklist, or rollout proof is still honestly needed?

Optional: name one secondary mode, but do not flatten every testing conversation into the same checklist.

Step 4: Classify the risk tier and critical path

Use a small risk model:

  • Tier 0 — low risk: docs, comments, dead code deletion, isolated rename, obvious config metadata
  • Tier 1 — ordinary product change: routine feature or refactor with limited blast radius
  • Tier 2 — high risk: public API, migration, auth, billing, external integrations, state machines, concurrency, background jobs
  • Tier 3 — release-critical / incident-linked: escaped bugs, outage fixes, rollback-sensitive deploy paths, or critical customer journeys

Capture:

  • critical paths and users affected
  • failure cost: annoyance, feature break, trust damage, data loss, rollout risk, security risk
  • evidence already present: tests, screenshots, previews, logs, contract notes, rollout notes, checklists
  • the decision point: local confidence, PR/merge confidence, release confidence, or long-running scheduled breadth

If the change spans multiple tiers, plan for the highest one.
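The "plan for the highest tier" rule is a plain `max`. A sketch; the hotspot-to-tier mapping below is an illustrative subset of the tier list above, and real classification stays a judgment call:

```python
# Illustrative mapping drawn from the Tier 0-3 list; not exhaustive.
TIER_BY_HOTSPOT = {
    "docs": 0, "rename": 0,
    "feature": 1, "refactor": 1,
    "public-api": 2, "migration": 2, "auth": 2, "billing": 2, "concurrency": 2,
    "escaped-bug": 3, "outage-fix": 3, "rollback-sensitive": 3,
}

def risk_tier(hotspots):
    """If the change spans multiple tiers, plan for the highest one."""
    # Unknown hotspots default to Tier 1 (ordinary product change).
    return max((TIER_BY_HOTSPOT.get(h, 1) for h in hotspots), default=0)
```

So a PR that touches both docs and a migration plans at Tier 2: `risk_tier(["docs", "migration"])` is `2`, not an average of the two.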

Step 5: Choose the smallest convincing layer mix

Use references/validation-matrix.md.

Default layer choices:

  • Unit / component / service when logic, validation, branching, or mapping is the main risk
  • Integration when wiring, DB semantics, middleware, serialization, jobs, or real dependency behavior matters
  • Contract / API-level when response shapes, schemas, events, or cross-service/client boundaries changed
  • Smoke / selective E2E when multiple layers must prove one critical end-to-end journey together
  • Manual exploratory / checklist validation when visual nuance, device variation, operational edge cases, or human signoff is still the honest answer

Always say both:

  • what is required now
  • what is intentionally out of scope for now and why

Examples:

  • “integration + contract now; no broad browser E2E because the user journey is unchanged”
  • “release smoke plus rollout checklist; no new unit tests because the only risk lives in staging config and deployment behavior”
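As a concrete instance of the contract / API-level layer, a response-shape check can be as small as a required-keys assertion. A sketch only — the endpoint, field names, and allowed statuses are hypothetical, not from any real API:

```python
# Hypothetical contract for an assumed orders endpoint.
REQUIRED_FIELDS = {"id", "status", "total_cents"}
ALLOWED_STATUSES = {"open", "paid", "cancelled"}

def check_contract(response_body: dict) -> list:
    """Return a list of contract violations; an empty list means compatible."""
    violations = []
    missing = REQUIRED_FIELDS - response_body.keys()
    if missing:
        violations.append(f"missing fields: {sorted(missing)}")
    status = response_body.get("status")
    if status is not None and status not in ALLOWED_STATUSES:
        violations.append(f"unexpected status: {status!r}")
    return violations
```

A check at this level is cheap enough to block merge, which is exactly why "integration + contract now; no broad browser E2E" can be an honest answer when the user journey is unchanged.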

Step 6: Separate local, PR, release, and scheduled expectations

A useful policy brief does not pretend one suite fits every loop.

Define the smallest truthful gate split:

  • Local — fast, cheap, developer-loop proof
  • PR / merge — changed-surface confidence for risky paths
  • Release — production-facing smoke, migration safety, rollout checks, or manual signoff items
  • Scheduled / nightly — broad matrices, expensive combinations, compatibility sweeps, long-running suites

Rules of thumb:

  • if a suite is too slow or flaky for PRs, move it deliberately instead of silently rerunning it forever
  • if branch protection / required status checks are in play, name only the checks that truly must block merge
  • if a release checklist exists, tie it back to the specific risk that still needs human proof
  • if store/platform launch checklists are now the dominant work, route to the launch or delivery owner instead of stuffing them back into merge coverage
  • if the packet is really just release coordination, say so instead of pretending every item is a test-layer choice
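One cheap way to keep the four loops distinct is to tag each suite with the loops it belongs to and select by tag, so moving a suite out of PR is a visible one-line change rather than a silent rerun habit. A sketch with hypothetical suite names (in a real pytest setup this is usually done with markers and `-m` selection):

```python
# Hypothetical suite registry: each suite is tagged with the loops it runs in.
SUITES = {
    "unit":            {"local", "pr"},
    "integration":     {"pr"},
    "migration-smoke": {"release"},
    "browser-matrix":  {"scheduled"},  # moved out of PR deliberately, not rerun forever
}

def suites_for(loop: str):
    """Select the suites that should run in a given loop."""
    return sorted(name for name, loops in SUITES.items() if loop in loops)
```

Here `suites_for("pr")` returns `["integration", "unit"]` while the browser matrix runs only on the scheduled loop — the gate split is data, so it can be reviewed and changed on purpose.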

Step 7: Write explicit exception and flake rules

This step is where strategy becomes operational.

State:

  • which checks are blocking vs informational
  • when a flaky test should be quarantined, fixed, moved to scheduled, or removed from the gate
  • what explanation is required when no new regression coverage is added
  • whether coverage percentages matter here or are just background reporting
  • whether manual verification is temporary, release-only, or an intentional long-term choice

Good defaults:

  • repeated flake is a policy problem, not just a rerun ritual
  • quarantining can reduce CI noise, but must keep owner + timeout + follow-up visible
  • “coverage went up” is not proof that the risky scenario is protected
  • escaped bugs should ratchet in the lowest-layer regression that would actually have caught them
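The quarantine default above — keep owner, timeout, and follow-up visible — can be enforced mechanically. A sketch; the `Quarantine` record and its fields are illustrative:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Quarantine:
    test_id: str
    owner: str        # who is accountable for the fix
    expires: date     # quarantine is a timeout, not a landfill
    follow_up: str    # ticket or note naming the exit plan

def expired(quarantines, today):
    """Quarantined tests past their timeout must re-block the gate or be removed."""
    return [q.test_id for q in quarantines if today > q.expires]
```

Because the record cannot be created without an owner, an expiry, and a follow-up, "quarantine" stays a policy decision with an exit, instead of a quiet way to delete signal from the gate.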

Step 8: Route the next owner immediately

This skill owns policy and confidence decisions, not all downstream work.

Typical route-outs:

  • backend-testing — write or repair the chosen API/service/database/fixture/contract tests
  • debugging — investigate why a suite is red, flaky, or environment-specific right now
  • code-review — judge whether one diff’s current evidence is good enough to approve
  • deployment-automation — own rollout execution, staging/prod verification sequencing, rollback steps, or release runbooks once the gate is chosen
  • game-ci-cd-pipeline — own engine/build pipeline implementation or stabilization when the problem is a game CI/CD surface, not policy selection
  • steam-store-launch-ops — own Steam-specific launch/store/runbook work when the remaining proof is release checklist, page readiness, or launch timing rather than merge confidence
  • web-accessibility / web-design-guidelines — handle accessibility-heavy or visual-governance validation packets
  • performance-optimization / game-performance-profiler — handle benchmark, load, latency, or frame-budget policy when performance is the actual dominant risk

If the user asks “what should we test?” stay here. If they ask “how do we write or stabilize those tests?” route out.

Step 9: Produce one concise validation brief

Preferred format:

# Validation Strategy Brief

## Policy packet
- Packet:
- Gate truth:
- Primary mode:
- Risk tier:
- Decision point:

## Required validation now
1. [Layer] ... because ...
2. [Layer] ... because ...
3. [Manual / release check] ... because ...

## Gate split
- Local:
- PR:
- Release:
- Scheduled:

## Out of scope for now
- ...

## Exception / flake policy
- ...

## Next owner
- `backend-testing` / `debugging` / `code-review` / other

A short, explicit brief beats a giant testing manifesto. If the honest answer is “do less, but at the right layer,” say that directly.

Output format

Always return a validation strategy brief, gate-shaping memo, or regression-ratchet brief.

Required qualities:

  • identify the packet already in hand
  • name the real gate being decided before expanding into more layers
  • choose one primary policy mode
  • classify risk and critical path explicitly
  • separate local, PR, release, and scheduled expectations when relevant
  • explain intentional exclusions instead of hand-waving them away
  • route implementation, debugging, review, accessibility, or performance work to the correct neighboring skill

Examples

Example 1: API + migration packet

Input

This PR changes an API contract and adds a DB migration. What validation should be required before merge?

Output sketch

  • Packet: change-risk-packet
  • Gate truth: merge-gate-truth
  • Primary mode: layer-selection
  • Risk tier: 2
  • Required validation:
    1. integration test for migration read/write path
    2. contract/API check for response compatibility
    3. release smoke on the highest-value consumer path
  • Out of scope: broad browser E2E because the user flow is unchanged
  • Next owner: backend-testing

Example 2: Flaky browser suite packet

Input

Our Playwright suite is slow and keeps flaking. What testing strategy should this repo adopt?

Output sketch

  • Packet: flake-cost-packet
  • Gate truth: merge-gate-truth with scheduled-breadth-truth follow-up
  • Primary mode: flake-and-cost-policy
  • Required change:
    1. narrow PR browser coverage to critical journeys only
    2. move broader combinations to scheduled runs
    3. define quarantine/fix rules for repeated flake
    4. shift confidence to lower-level integration/component checks where honest
  • Next owner: backend-testing for implementation, debugging for current flake root cause

Example 3: Release-readiness packet

Input

Engineering says tests are green, but should we require anything else before this release?

Output sketch

  • Packet: release-readiness-packet
  • Gate truth: release-gate-truth
  • Primary mode: release-confidence
  • Required validation:
    1. targeted staging smoke for the changed customer journey
    2. migration / rollback checklist item if deploy shape changed
    3. explicit note that no new broad regression sweep is required beyond scheduled coverage
  • Next owner: web-accessibility for accessibility-specific signoff if the change is UI-state heavy

Example 4: Game launch checklist packet

Input

Our build passed CI, but we still have Steam release checklist items and packaging work. Does this stay here?

Output sketch

  • Packet: release-readiness-packet
  • Gate truth: release-gate-truth
  • Primary mode: release-confidence
  • Required validation: targeted final smoke plus the minimum proof that launch/build checklist items are satisfied
  • Out of scope: expanding branch-blocking PR checks just because store/platform launch work remains
  • Next owner: steam-store-launch-ops and/or game-ci-cd-pipeline

Best practices

  1. Start from the packet already in hand, not from a favorite testing slogan.
  2. Prefer the cheapest layer that still proves the risky behavior.
  3. Keep local, PR, release, and scheduled loops distinct.
  4. Treat flaky tests as a policy smell, not just a rerun inconvenience.
  5. Tie release checklists back to actual risk instead of treating them as a separate universe.
  6. State intentional exclusions so residual risk is visible.
  7. Use escaped bugs to ratchet in the lowest-layer regression that would have caught them.
  8. Make merge blockers, release-only proof, and scheduled breadth explicit instead of blending them together.
  9. Route implementation to backend-testing, diagnosis to debugging, rollout execution to deployment-automation, platform/game launch work to steam-store-launch-ops or game-ci-cd-pipeline, and accessibility-heavy validation to web-accessibility.
  10. Use manual validation when it is the honest answer, not as a shameful fallback.
  11. One concise validation brief is more reusable than a giant testing manifesto.

References

  • references/intake-packets-and-route-outs.md
  • references/gate-truth-and-release-handovers.md
  • references/validation-matrix.md
  • references/handoff-boundaries.md