test-strategy
Discovery Questions
Before writing a single line of strategy, gather context. Check .agents/qa-project-context.md first -- if it exists, use it as the foundation and skip questions already answered there.
Product & Business Context
- What is the product? (SaaS, e-commerce, API platform, mobile app, content site)
- Who are the users? (consumers, enterprise, internal, developers)
- What are the business-critical flows? (signup, checkout, payment, data export, etc.)
- What is the release cadence? (continuous, weekly, bi-weekly, quarterly)
- What regulatory or compliance requirements exist? (SOC2, HIPAA, PCI-DSS, GDPR)
Current Testing State
- What test levels exist today? (unit, integration, E2E, manual, none)
- What is the current test count at each level?
- What frameworks and tools are in use?
- What is the current code coverage percentage?
- What is the target coverage, if any?
- How long does the CI pipeline take end-to-end?
- What is the current flakiness rate?
Pain Points & Goals
- What are the biggest quality pain points? (regressions, slow feedback, flaky tests, gaps)
- What broke in the last 3 releases? What escaped to production?
- What does "good enough quality" look like for this team?
- What is the team's appetite for investment in test infrastructure?
Team & Constraints
- Team size and composition (devs, QA, SDET, manual testers)
- Skill levels with automation tools
- Budget constraints for tooling
- Timeline pressure -- is there a deadline driving this strategy?
Calibrate to your team maturity (set team_maturity in .agents/qa-project-context.md):
- startup — Focus on a minimal test pyramid: unit tests + a handful of critical E2E paths. Skip contract testing and formal metrics until you have CI running reliably.
- growing — Full pyramid with defined coverage targets, flakiness thresholds, and CI quality gates. Add risk-based prioritization.
- established — Comprehensive strategy with SLA-backed quality gates, multi-environment coverage, advanced tooling (contract testing, chaos, observability), and formal review cadence.
Core Principles
Every strategy produced by this skill adheres to these five principles:
1. Risk-Based Prioritization Over Exhaustive Coverage
Not all code is equal. A payment processing bug can cost orders of magnitude more than a tooltip typo. Allocate testing effort in proportion to business risk, not code volume. The risk assessment matrix (below) drives where to invest.
2. Test Pyramid Health
A healthy test suite follows the pyramid shape: many fast unit tests, fewer integration tests, fewest E2E tests. When the shape inverts (ice cream cone), feedback is slow, maintenance is high, and confidence is paradoxically low. Diagnose the current shape and prescribe corrections.
3. Shift-Left: Catch Defects Earlier
Defects generally cost more to fix the later they are found. Strategy should push validation earlier: static analysis before tests, unit tests before integration, contract tests before E2E. Design reviews catch architecture bugs that no test can find.
4. Measurable: Every Strategy Element Has a KPI
If you cannot measure it, you cannot improve it. Every section of the strategy must define what success looks like in numbers: coverage targets, flakiness thresholds, defect escape rate goals, MTTR limits.
5. Living Document: Strategy Evolves With the Product
The strategy is reviewed quarterly at minimum. It includes a revision history, owners for each section, and triggers for re-evaluation (new product area, team change, major incident).
Strategy Document Template
Walk through each section below to produce the final strategy document. Tailor depth to the product's complexity -- a 5-person startup needs 5 pages, not 50.
1. Scope & Objectives
Define the boundaries clearly. Ambiguity here causes gaps and wasted effort downstream.
In Scope:
- List every product area, service, and integration that this strategy covers
- Include both functional and non-functional testing types
- Specify platforms and browsers/devices
Out of Scope:
- Explicitly state what is NOT covered and why
- Third-party services tested only at the contract level
- Legacy systems scheduled for deprecation
Objectives:
- State 3-5 measurable objectives with timelines
- Example: "Reduce defect escape rate from 12% to under 5% within two quarters"
- Example: "Achieve 80% unit test coverage on all services launched after Q1 2026"
2. Test Levels & Types
Define each test level, what it covers, who owns it, and the expected volume.
| Level | What It Validates | Owner | Framework | Target Volume | Run Frequency |
|---|---|---|---|---|---|
| Unit | Individual functions, business logic, edge cases | Developers | Jest/Vitest/pytest | 70-80% of all tests | Every commit |
| Integration | Service interactions, database queries, API contracts | Developers + QA | Supertest/pytest | 15-20% of all tests | Every PR |
| E2E | Critical user journeys through the full stack | QA/SDET | Playwright/Cypress | 5-10% of all tests | Pre-deploy + nightly |
| API | Contract compliance, response schemas, error handling | Developers | Postman/REST-assured | Per endpoint | Every PR |
| Visual | UI regression, layout shifts, responsive design | QA | Playwright/Percy | Key pages | Nightly |
| Performance | Response times, throughput, resource usage | DevOps/QA | k6/Artillery | Critical paths | Weekly + pre-release |
| Security | OWASP Top 10, dependency vulnerabilities, auth flows | Security/DevOps | OWASP ZAP/Snyk | Per release | Pre-release + scheduled |
| Accessibility | WCAG 2.1 AA compliance, screen reader compatibility | QA/Frontend | axe-core/pa11y | Key flows | Every PR |
Adjust this table based on what the product actually needs. Not every product needs visual regression testing. Every product needs unit and integration tests.
3. Test Pyramid Analysis
Diagnose the current shape of the test suite and define the target state.
Shapes and What They Mean
| Shape | E2E | Integration | Unit | What It Means |
|---|---|---|---|---|
| Healthy pyramid | ~5-10% | ~15-20% | ~70% | Fast feedback, high confidence, cheap to maintain |
| Ice cream cone | ~60% | ~20% | ~20% | Slow, brittle, expensive to run, hard to maintain |
| Diamond | remainder | ~50% | ~30% | Heavy on mocks, integration gaps still possible |
| Hourglass | ~30% | ~10% | ~60% | Missing middle layer, gaps in service boundaries |
Current State Assessment Worksheet
Fill in these values from the codebase:
Current Test Distribution:
Unit tests: _____ count → _____ %
Integration tests: _____ count → _____ %
E2E tests: _____ count → _____ %
Manual test cases: _____ count (not in pyramid, but track)
Current Shape: [ ] Pyramid [ ] Ice Cream Cone [ ] Diamond [ ] Hourglass [ ] No Shape
CI Pipeline Duration: _____ minutes
Flaky Test Rate: _____ %
Test Suite Pass Rate: _____ %
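The percentages in the worksheet above can be derived from raw counts with a few lines of code. The following is a minimal sketch, not a standard: the function names `percentages` and `classify_shape` are hypothetical, and the classification rule (comparing the three layers by size only) is an assumption chosen to match the four shapes described in this section.

```python
def percentages(unit: int, integration: int, e2e: int) -> dict:
    """Convert raw automated-test counts into percentage shares."""
    total = unit + integration + e2e
    if total == 0:
        return {"unit": 0.0, "integration": 0.0, "e2e": 0.0}
    return {
        "unit": round(100 * unit / total, 1),
        "integration": round(100 * integration / total, 1),
        "e2e": round(100 * e2e / total, 1),
    }


def classify_shape(unit: int, integration: int, e2e: int) -> str:
    """Rough heuristic (an assumption): classify by relative layer size only."""
    if unit + integration + e2e == 0:
        return "no shape"
    if unit >= integration >= e2e:
        return "pyramid"
    if e2e >= integration >= unit:
        return "ice cream cone"
    if integration > unit and integration > e2e:
        return "diamond"
    # Remaining case: integration is smaller than both other layers.
    return "hourglass"
```

Manual test cases are deliberately left out of the calculation, since the worksheet tracks them outside the pyramid.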
Target State
Define the target ratios and the timeline to get there:
Target Test Distribution:
Unit: 70-80% → target count: _____
Integration: 15-20% → target count: _____
E2E: 5-10% → target count: _____
Target CI Duration: < _____ minutes
Target Flaky Rate: < _____ %
Action Plan to Shift Toward Healthy Pyramid
If ice cream cone or diamond:
- Freeze E2E growth -- no new E2E tests unless covering a net-new critical path
- Decompose existing E2E tests -- identify E2E tests that validate logic testable at unit level, rewrite them
- Add unit test requirements to PR checklist -- every PR touching business logic must include unit tests
- Set CI gates -- fail PRs where unit:E2E ratio drops below threshold
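The ratio gate in the last bullet could be implemented as a small script that CI calls with the two test counts. This is a sketch under stated assumptions: the names `ratio_gate` and `main` are hypothetical, and the default threshold of 7.0 is taken from the 70:10 unit-to-E2E target used elsewhere in this document, not from any tool's built-in setting.

```python
def ratio_gate(unit_count: int, e2e_count: int, min_ratio: float = 7.0) -> bool:
    """Return True when unit tests outnumber E2E tests by at least min_ratio."""
    if e2e_count == 0:
        # With no E2E tests the suite cannot be top-heavy, so the gate passes.
        return True
    return unit_count / e2e_count >= min_ratio


def main(argv: list) -> int:
    """Expects argv = [program, unit_count, e2e_count]; returns a process exit code."""
    unit_count, e2e_count = int(argv[1]), int(argv[2])
    if ratio_gate(unit_count, e2e_count):
        print("ratio gate passed")
        return 0
    print(f"ratio gate failed: {unit_count} unit vs {e2e_count} E2E tests")
    return 1
```

In a pipeline, the script would end with `sys.exit(main(sys.argv))` so that a non-zero exit code fails the PR check. How the counts are collected depends on the test runner and is left out here.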
If hourglass:
- Invest in integration test infrastructure -- database fixtures, service stubs, contract tests
- Identify service boundaries -- each boundary needs integration tests for happy path + error cases
- Use contract testing (Pact or similar) for inter-service communication
4. Risk Assessment Matrix
Map features to risk levels. This directly determines testing depth.
5x5 Risk Matrix
| Impact ↓ / Likelihood → | Rare (1) | Unlikely (2) | Possible (3) | Likely (4) | Almost Certain (5) |
|---|---|---|---|---|---|
| Catastrophic (5) | 5-MED | 10-HIGH | 15-CRIT | 20-CRIT | 25-CRIT |
| Major (4) | 4-LOW | 8-MED | 12-HIGH | 16-CRIT | 20-CRIT |
| Moderate (3) | 3-LOW | 6-MED | 9-MED | 12-HIGH | 15-CRIT |
| Minor (2) | 2-LOW | 4-LOW | 6-MED | 8-MED | 10-HIGH |
| Negligible (1) | 1-LOW | 2-LOW | 3-LOW | 4-LOW | 5-MED |
Risk-to-Testing Action Map
| Risk Level | Testing Action | Automation | Monitoring |
|---|---|---|---|
| CRITICAL (15-25) | Full automation + manual exploratory + load test | Mandatory, runs on every commit | Real-time alerts, synthetic monitoring |
| HIGH (10-14) | Full automation + periodic manual review | Mandatory, runs on every PR | Dashboard + daily checks |
| MEDIUM (5-9) | Automation for happy path + key error cases | Recommended | Weekly review |
| LOW (1-4) | Manual testing or skip | Optional | None required |
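The score in each matrix cell is impact multiplied by likelihood, and the level bands come from the action map above. A minimal sketch of that arithmetic follows; the function names `risk_score` and `risk_level` are hypothetical.

```python
def risk_score(impact: int, likelihood: int) -> int:
    """Multiply impact by likelihood; both are rated on the 1-5 scale."""
    for name, value in (("impact", impact), ("likelihood", likelihood)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")
    return impact * likelihood


def risk_level(score: int) -> str:
    """Map a score to the bands used in the action map."""
    if score >= 15:
        return "CRITICAL"
    if score >= 10:
        return "HIGH"
    if score >= 5:
        return "MEDIUM"
    return "LOW"
```

For example, a feature rated impact 5 and likelihood 3 scores 15 and lands in the CRITICAL band.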
Example Risk Assessment
| Feature Area | Impact | Likelihood | Risk Score | Testing Approach |
|---|---|---|---|---|
| Payment processing | 5 - Catastrophic | 3 - Possible | 15 - CRIT | Automated E2E + unit + contract + monitoring |
| User authentication | 5 - Catastrophic | 2 - Unlikely | 10 - HIGH | Automated E2E + security scan + unit |
| Dashboard rendering | 2 - Minor | 3 - Possible | 6 - MED | Unit + visual regression |
| Email preferences | 1 - Negligible | 2 - Unlikely | 2 - LOW | Manual verification |
5. Environment Strategy
Define which environments exist and what testing happens in each.
| Environment | Purpose | Test Types | Data | Deploy Trigger |
|---|---|---|---|---|
| Local | Developer feedback | Unit, integration | Mocked/seeded | On save |
| CI | Automated validation | Unit, integration, lint, SAST | Ephemeral | On push/PR |
| Staging | Pre-production validation | E2E, visual, performance, security | Production-like (anonymized) | On merge to main |
| Production | Monitoring & smoke | Smoke tests, synthetic monitoring | Live | On deploy |
Key decisions to document:
- How is test data managed in each environment?
- Are environments ephemeral (preview deployments) or long-lived?
- Who has access to each environment?
- How are environment-specific configurations managed?
6. Tool Selection Rationale
Do not pick tools first. Understand needs first, then select tools that fit.
Decision Matrix Template
| Criteria (weight) | Tool A | Tool B | Tool C |
|---|---|---|---|
| Fits tech stack (25%) | | | |
| Team familiarity (20%) | | | |
| Community & docs (15%) | | | |
| CI integration (15%) | | | |
| Maintenance cost (10%) | | | |
| Speed of execution (10%) | | | |
| License cost (5%) | | | |
| Weighted total | | | |
Score each 1-5, multiply by weight, sum for weighted total.
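The weighted total can be computed as follows. This is a sketch: the weights are copied from the template, while the function name `weighted_total` and the example scores in the usage note are made up for illustration.

```python
# Criteria weights from the decision matrix template (they sum to 1.0).
WEIGHTS = {
    "Fits tech stack": 0.25,
    "Team familiarity": 0.20,
    "Community & docs": 0.15,
    "CI integration": 0.15,
    "Maintenance cost": 0.10,
    "Speed of execution": 0.10,
    "License cost": 0.05,
}


def weighted_total(scores: dict) -> float:
    """Sum of score * weight across all criteria; each score must be 1-5."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    for criterion in WEIGHTS:
        if not 1 <= scores[criterion] <= 5:
            raise ValueError(f"{criterion} score must be between 1 and 5")
    return round(sum(WEIGHTS[c] * scores[c] for c in WEIGHTS), 2)
```

A hypothetical tool scored 4, 3, 5, 4, 3, 4, 2 on the seven criteria (in table order) gets 1.0 + 0.6 + 0.75 + 0.6 + 0.3 + 0.4 + 0.1 = 3.75.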
Total Cost of Ownership
Beyond license fees, account for:
- Setup time: How long to configure CI, write first tests, train team
- Writing time: How long to write a typical test (measure this -- time 5 tests)
- Maintenance time: How often do tests break due to framework updates
- Debug time: When a test fails, how long to diagnose (good error messages matter)
- Infrastructure cost: Browser farms, parallel execution, cloud runners
Common Stack Recommendations
| Product Type | Unit | Integration | E2E | API | Visual |
|---|---|---|---|---|---|
| React SaaS | Vitest | Testing Library + MSW | Playwright | Supertest | Playwright screenshots |
| Next.js | Vitest | Testing Library + MSW | Playwright | Supertest | Playwright screenshots |
| Python API | pytest | pytest + testcontainers | pytest + requests | pytest | N/A |
| Mobile (RN) | Jest | Detox | Detox/Appium | Supertest | Appium screenshots |
| Vue SaaS | Vitest | Testing Library + MSW | Playwright | Supertest | Playwright screenshots |
These are starting points, not mandates. Document why you chose or deviated.
7. Entry/Exit Criteria
Define what must be true before testing starts (entry) and what must be true before testing is considered done (exit) at each level.
Unit Testing
- Entry: Code compiles, function has a clear contract (inputs/outputs documented)
- Exit: All branches covered, edge cases tested, no skipped tests, coverage target met
Integration Testing
- Entry: Unit tests pass, dependent services available (or stubbed), test data seeded
- Exit: All service boundaries tested, error paths validated, no flaky tests
E2E Testing
- Entry: Integration tests pass, staging deployed, test accounts provisioned
- Exit: All critical user journeys pass, no P0/P1 defects open, performance within SLA
Release
- Entry: All test levels pass, no CRITICAL/HIGH defects open, release notes drafted
- Exit: Smoke tests pass in production, monitoring shows no anomalies for 30 min, rollback plan verified
8. Quality Gates & Definition of Done
Define automated gates that prevent bad code from moving forward.
PR Gate (runs on every pull request)
- All unit tests pass
- All integration tests pass
- Code coverage does not decrease (or meets minimum threshold)
- No new linting errors
- SAST scan passes (no new high/critical findings)
- Bundle size does not increase beyond threshold
- At least one approval from code reviewer
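The coverage check in this gate has two acceptable outcomes: coverage did not decrease, or it still meets a minimum threshold. A minimal sketch of that rule follows, assuming coverage percentages for the base branch and the PR head are already available; the function name `coverage_gate` and its parameters are hypothetical.

```python
from typing import Optional


def coverage_gate(base_pct: float, head_pct: float,
                  minimum_pct: Optional[float] = None) -> bool:
    """Pass when coverage did not decrease, or when it still meets the minimum."""
    if head_pct >= base_pct:
        return True
    # Coverage dropped: only acceptable if a minimum is configured and still met.
    return minimum_pct is not None and head_pct >= minimum_pct
```

With no minimum configured, any drop fails the gate. Teams that find this too strict on small PRs sometimes add a tolerance, which is a policy decision to document in the strategy.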
Merge Gate (runs on merge to main)
- All PR gate checks pass
- E2E smoke suite passes against preview deployment
- No merge conflicts
- Branch is up to date with main
Deploy Gate (runs before production deployment)
- Full E2E suite passes on staging
- Performance benchmarks within acceptable range
- Security scan passes
- Feature flags configured correctly
- Rollback plan documented and tested
Nightly Gate (runs on schedule)
- Full E2E suite including edge cases
- Visual regression tests
- Performance/load tests
- Accessibility scan
- Dependency vulnerability scan
- Results reviewed by QA lead next morning
9. Metrics & KPIs
Track these metrics to know if the strategy is working.
| Metric | Definition | Target | Tracking Cadence |
|---|---|---|---|
| Code Coverage | Lines/branches covered by unit + integration tests | >80% for critical services, >60% overall | Per PR (automated) |
| Test Pyramid Ratio | Unit:Integration:E2E percentage split | 70:20:10 (within 10% tolerance) | Monthly |
| Flakiness Rate | % of test runs with non-deterministic failures | <2% | Weekly |
| Defect Escape Rate | % of defects found in production vs. total defects | <5% | Per release |
| Mean Time to Recovery (MTTR) | Average time from defect detection to fix deployed | <4 hours for P0, <24h for P1 | Per incident |
| CI Pipeline Duration | Time from push to green/red signal | <15 minutes for PR, <30 min for full | Weekly |
| Test Velocity | New tests written per sprint | Positive trend, no target number | Per sprint |
| Defect Density | Defects per 1000 lines of code | Decreasing trend | Monthly |
| Automation Rate | % of test cases automated vs. total | >80% for regression suite | Quarterly |
| False Positive Rate | % of test failures that are not real bugs | <5% | Weekly |
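Several of these metrics are simple ratios, and writing the formulas down removes ambiguity about numerators and denominators. The sketch below follows the definitions in the table; the function names are hypothetical, and returning 0.0 for an empty denominator is an assumption.

```python
def percentage(part: int, whole: int) -> float:
    """part / whole as a percentage; 0.0 when there is nothing to measure."""
    if whole == 0:
        return 0.0
    return 100 * part / whole


def flakiness_rate(flaky_runs: int, total_runs: int) -> float:
    """% of test runs with non-deterministic failures."""
    return percentage(flaky_runs, total_runs)


def defect_escape_rate(production_defects: int, total_defects: int) -> float:
    """% of all defects that were found in production."""
    return percentage(production_defects, total_defects)


def false_positive_rate(false_failures: int, total_failures: int) -> float:
    """% of test failures that did not correspond to a real bug."""
    return percentage(false_failures, total_failures)
```

For example, 6 production defects out of 50 total defects gives a defect escape rate of 12%, and 3 flaky runs out of 200 gives a flakiness rate of 1.5%.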
How to Use These Metrics
- Do not use metrics to punish teams. Use them to identify systemic issues.
- Do track trends over time, not absolute numbers. A team going from 30% to 60% coverage is doing great.
- Do set realistic targets based on current state. Jumping from 20% to 90% coverage in one quarter is not a plan, it is a fantasy.
- Do review metrics quarterly with engineering leadership. Celebrate improvements.
- Do investigate spikes. A sudden increase in flakiness signals an infrastructure problem, not a laziness problem.
10. Timeline & Milestones
Roll out the strategy in phases. Trying to do everything at once guarantees nothing gets done well.
Phase 1: Foundation (Weeks 1-4)
- Complete risk assessment for all product areas
- Set up CI pipeline with unit test gate
- Establish baseline metrics (current coverage, flakiness, pipeline time)
- Write unit tests for top 5 highest-risk areas
- Select and configure E2E framework
- Exit criteria: CI runs unit tests on every PR, baseline metrics documented
Phase 2: Coverage Expansion (Weeks 5-10)
- Add integration tests for all service boundaries
- Write E2E tests for top 10 critical user journeys
- Implement visual regression testing for key pages
- Set up test data management
- Configure nightly test runs
- Exit criteria: All critical paths have E2E coverage, integration tests cover all APIs
Phase 3: Quality Gates (Weeks 11-14)
- Enable coverage gates on PRs (no decrease allowed)
- Add performance benchmarks to CI
- Implement security scanning in pipeline
- Set up monitoring dashboards for all KPIs
- Exit criteria: All four gates (PR, merge, deploy, nightly) are active and enforced
Phase 4: Optimization (Weeks 15-20)
- Identify and fix or quarantine flaky tests
- Optimize CI pipeline for speed (parallelization, caching)
- Implement test impact analysis (run only affected tests)
- Set up synthetic monitoring in production
- First quarterly strategy review
- Exit criteria: CI under 15 min, flakiness under 2%, first strategy revision published
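One common way to identify flaky tests is to look for tests that both passed and failed on the same commit, since the code did not change between those runs. The following sketch assumes run history is available as simple records; the `Run` type and `find_flaky` function are hypothetical and not tied to any CI product.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Run(NamedTuple):
    test: str
    commit: str
    passed: bool


def find_flaky(runs: Iterable[Run]) -> set:
    """Return names of tests with both a pass and a fail on the same commit."""
    outcomes = defaultdict(set)
    for run in runs:
        outcomes[(run.test, run.commit)].add(run.passed)
    # Two distinct outcomes for one (test, commit) pair means mixed results.
    return {test for (test, _commit), seen in outcomes.items() if len(seen) == 2}
```

Tests flagged this way are candidates for quarantine. A test that fails consistently on a commit is not flagged, because that pattern points to a real defect rather than non-determinism.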
Ongoing
- Quarterly strategy review and revision
- Monthly metrics review with team
- Continuous test maintenance (refactor, de-flake, retire)
Anti-Patterns
Watch for these common failures. If you spot them, call them out explicitly in the strategy document.
100% Coverage Targets
Diminishing returns set in past roughly 80%. The last 20% tends to go to getters, setters, and trivial code, while the integration gaps where real bugs live stay untested. Set coverage targets per module based on risk, not a blanket number.
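Per-module targets can be expressed as a lookup from risk level to minimum coverage. This is a sketch only: the thresholds in `MIN_COVERAGE_BY_RISK` are illustrative assumptions, the module names in the usage note are made up, and the function name `modules_below_target` is hypothetical.

```python
# Illustrative thresholds (assumptions): tune these to your own risk assessment.
MIN_COVERAGE_BY_RISK = {
    "CRITICAL": 80.0,
    "HIGH": 70.0,
    "MEDIUM": 60.0,
    "LOW": 0.0,
}


def modules_below_target(coverage: dict, risk: dict) -> list:
    """Return modules whose measured coverage is under their risk-based minimum."""
    failing = []
    for module, level in risk.items():
        required = MIN_COVERAGE_BY_RISK[level]
        # A module with no coverage data is treated as 0% covered.
        if coverage.get(module, 0.0) < required:
            failing.append(module)
    return sorted(failing)
```

Given coverage of 75% for a CRITICAL `payments` module, 85% for a HIGH `auth` module, and 10% for a LOW `email_prefs` module, only `payments` is reported as below target.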
Ice Cream Cone (Inverted Pyramid)
Too many E2E tests, too few unit tests. Symptoms: CI takes 45+ minutes, tests break constantly due to UI changes, nobody trusts the test suite. Fix by freezing E2E growth and decomposing existing E2E tests into lower levels.
Strategy as One-Time Document
A strategy written once and never updated is worse than no strategy, because it gives false confidence. Build in review triggers: quarterly calendar review, post-incident review, new product area launch, team composition change.
Tool-First Thinking
"We should use Playwright" is not a strategy. It is a tool choice masquerading as a plan. Start with what you need to validate, then pick tools that fit. The strategy document should justify tool choices, not lead with them.
No Metrics = No Accountability
A strategy without measurable targets is a wish list. Every section should connect to a KPI. If you cannot define what success looks like for a strategy element, question whether it belongs in the strategy.
Testing in Isolation
QA strategy that lives only in the QA team's wiki is invisible to developers. The strategy must be integrated into the development workflow: PR templates, CI gates, Definition of Done. If developers do not see it daily, it does not exist.
Copy-Paste Strategy
Taking another company's strategy verbatim ignores your product's unique risk profile, team skills, and constraints. Use templates as starting points, but every section must be tailored to your specific context.
Automating Everything Immediately
Manual exploratory testing has enormous value, especially early in a product's life. Automate regression, keep exploration manual. The strategy should specify what stays manual and why.
Output Format
The final strategy document should follow this structure:
# QA Strategy: [Product Name]
## Version [X.Y] | Last Updated: [Date] | Owner: [Name]
### 1. Executive Summary (1 paragraph)
### 2. Scope & Objectives
### 3. Test Levels & Types (table)
### 4. Test Pyramid Analysis (current → target)
### 5. Risk Assessment (matrix + feature mapping)
### 6. Environment Strategy (table)
### 7. Tool Selection (decisions + rationale)
### 8. Entry/Exit Criteria (per level)
### 9. Quality Gates (per stage)
### 10. Metrics & KPIs (table with targets)
### 11. Timeline & Milestones (phased)
### 12. Risks to the Strategy Itself
### 13. Revision History
Done When
- A strategy document exists at an agreed location with all 13 sections populated (Executive Summary through Revision History)
- Test pyramid target ratios are defined with concrete counts and a timeline to reach them
- Entry and exit criteria are written for each test level (unit, integration, E2E, release)
- Tool selection decisions are documented with a scored rationale matrix, not just tool names
- Quality gates are defined for all four stages (PR, merge, deploy, nightly) with specific pass/fail thresholds
Related Skills
- risk-based-testing -- deep dive into risk assessment methodology
- qa-metrics -- detailed KPI definitions, dashboards, and trend analysis
- release-readiness -- go/no-go checklists and release confidence scoring
- test-planning -- sprint-level test planning and estimation
- ci-cd-integration -- pipeline configuration and gate implementation
- shift-left-testing -- techniques for moving validation earlier