test-plan-generation
Test Plan Generation Skill
What Is a Test Plan?
A Test Plan is the master document that defines the scope, approach, resources, and schedule of all testing activities for a software project or feature. As specified by IEEE 829 (Standard for Software and System Test Documentation), a test plan communicates the intent of testing to all stakeholders and provides a framework for organizing, tracking, and evaluating test efforts. It answers fundamental questions: what will be tested, how it will be tested, who will test it, when testing will happen, and what criteria determine whether testing is complete.
Within the software development lifecycle, a Test Plan sits downstream of the Software Requirements Specification (SRS) and Technical Design documents. It translates functional and non-functional requirements into concrete, verifiable test cases. A well-constructed Test Plan reduces the risk of undetected defects reaching production, provides measurable quality gates, and serves as the contractual agreement between development, QA, and product teams on what "done" means from a quality perspective.
This skill treats the Test Plan as a living artifact. It is authored once, but it evolves as requirements change, new risks emerge, and test execution reveals areas needing deeper coverage.
Six-Step Workflow
Every Test Plan generated by this skill follows a disciplined six-step process. Each step must be completed before moving to the next.
Step 1 -- Scan Project and Test Infrastructure
Before writing any test documentation, scan the project to build situational awareness of both the application under test and the existing test infrastructure.
- Glob the project tree to discover the repository structure, source modules, and naming conventions. Use patterns such as `**/*.md`, `**/package.json`, `**/pyproject.toml`, `**/*.test.*`, `**/*.spec.*`, `**/__tests__/**`, or language-specific test directories to map the landscape.
- Read the README (or equivalent entry-point documentation) to understand the project's purpose, architecture, and deployment model.
- Identify existing test frameworks and tooling by scanning configuration files such as `jest.config.*`, `pytest.ini`, `vitest.config.*`, `.github/workflows/*`, `Makefile`, or `docker-compose.test.yml`. Understanding what test infrastructure already exists prevents duplicated effort and ensures the plan integrates with the team's workflow.
- Catalog the tech stack, runtime environments, and external dependencies so the test environment section of the plan is accurate and complete.
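The scan in Step 1 can be sketched in Python. The patterns below are the illustrative ones listed above, and `scan_test_infrastructure` is a hypothetical helper, not part of any existing tooling:

```python
from pathlib import Path

# Glob patterns from Step 1 (illustrative, not exhaustive).
TEST_PATTERNS = ["**/*.test.*", "**/*.spec.*", "**/__tests__/**"]
CONFIG_PATTERNS = ["**/jest.config.*", "**/pytest.ini", "**/vitest.config.*",
                   "**/Makefile", "**/docker-compose.test.yml"]

def scan_test_infrastructure(root: str) -> dict:
    """Return a map of discovered test files and framework config files."""
    base = Path(root)
    found = {"test_files": [], "config_files": []}
    for pattern in TEST_PATTERNS:
        found["test_files"] += [str(p) for p in base.glob(pattern) if p.is_file()]
    for pattern in CONFIG_PATTERNS:
        found["config_files"] += [str(p) for p in base.glob(pattern) if p.is_file()]
    return found
```

The point of the scan is the decision it feeds: if `config_files` already names a framework, the plan should build on it rather than propose a new one.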
Step 2 -- Find Upstream SRS and Tech Design Documents
The Test Plan must trace every test case back to a requirement. Automatically scan for upstream documents.
- Search the `docs/` directory for files matching patterns like `*/srs.md`, `*/tech-design.md`, `*/prd.md`, or equivalent naming conventions.
- Extract requirement IDs (e.g., `FR-XXX-NNN`, `NFR-XXX-NNN`) from the SRS to build the requirements-to-test-cases traceability matrix later.
- Extract component and module names from the Technical Design to inform test module organization and integration test boundaries.
- If no upstream documents are found, inform the user and ask whether to proceed with a standalone test plan or wait until requirements documentation is available.
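One way to implement the ID extraction above is a small regex pass over the SRS text. This is a sketch: the two-to-five-letter module-code width is an assumption, so adjust the pattern to the SRS's actual convention:

```python
import re

# Matches FR-XXX-NNN and NFR-XXX-NNN requirement IDs.
# The {2,5} module-code width is an assumption, not a fixed rule.
REQ_ID_RE = re.compile(r"\b(N?FR)-([A-Z]{2,5})-(\d{3})\b")

def extract_requirement_ids(srs_text: str) -> list[str]:
    """Return unique requirement IDs in order of first appearance."""
    seen: list[str] = []
    for match in REQ_ID_RE.finditer(srs_text):
        req_id = match.group(0)
        if req_id not in seen:
            seen.append(req_id)
    return seen
```

The resulting list seeds the traceability matrix built in Step 5.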
Step 3 -- Ask Clarifying Questions
After scanning, present the user with targeted clarifying questions. Good questions surface missing context that cannot be inferred from the codebase. Typical areas to probe include:
- The scope of testing: which features, modules, or user flows must be covered.
- Quality objectives: what level of confidence is required before release.
- Known risk areas: parts of the system that are fragile, recently changed, or historically defect-prone.
- Non-functional requirements: performance targets, security compliance requirements, compatibility matrix (browsers, devices, OS versions).
- Test environment constraints: availability of staging environments, test data sensitivity, third-party service sandboxes.
- Timeline and resource constraints: how much calendar time is available for testing, and who will execute tests.
- Automation expectations: what percentage of tests should be automated, and what frameworks are preferred.
Do not proceed to generation until the user has answered enough questions to fill the core sections of the template.
Step 4 -- Generate Test Plan
Using the answers from Step 3 and the context from Steps 1 and 2, generate the full Test Plan by filling in every applicable section of references/template.md. Follow the writing guidelines and standards described in the sections below. Generate all Mermaid diagrams inline. Assign test case IDs, priorities, and types as you go.
Step 5 -- Build Traceability
After generating the Test Plan, construct the Requirements Traceability Matrix (RTM). Map every SRS requirement ID (both functional and non-functional) to one or more test case IDs. Flag any requirements that lack test coverage and any test cases that do not trace back to a stated requirement. The RTM is the primary mechanism for proving that testing is complete and aligned with the specification.
Step 6 -- Quality Check
Validate the completed Test Plan against every item in references/checklist.md. Fix any issues before presenting the final document to the user. Summarize the checklist results so the user can see what passed and whether any items were intentionally skipped (with justification).
Test Strategy Design -- The Test Pyramid
The test strategy follows the well-established test pyramid model, with one critical modification: any test that touches the database must use a real database, not a mock.
The Modified Test Pyramid
- Unit Tests (Pure Logic) form the base for code with no database dependency — business calculations, input validations, data transformations, utility functions. These are fast, isolated, and need no mocks or stubs. They are the cheapest to write and maintain.
- Unit Tests (DB-Touching) cover any method that reads or writes to the database — repositories, services with queries, data access layers. These MUST use a real database (TestContainers, in-memory test DB, or dedicated test instance). Mocking the database at this level gives false confidence: tests pass, but SQL syntax errors, constraint violations, ORM mapping bugs, and transaction behavior go undetected until production.
- Integration Tests verify that components work together — API endpoints processing requests through to database, message queue consumers persisting data, service-to-service calls. Always use a real database. Mock only external third-party services you don't control.
- System Tests validate the application as a whole in a production-like environment with a real database, real cache, and real message queues.
- Acceptance Tests sit at the peak, verifying business requirements with real data scenarios in staging or production-like environments.
Why Real Database Over Mocks
Mocking the database hides real bugs:
- SQL syntax and query logic errors — the mock doesn't execute SQL, so wrong queries pass
- Constraint violations — unique keys, foreign keys, NOT NULL constraints are invisible to mocks
- Transaction behavior — rollback on error, isolation levels, deadlocks don't exist in mocks
- ORM mapping bugs — field name mismatches, type conversions, lazy loading issues go undetected
- Index and performance issues — mocks return instantly regardless of query efficiency
- Database-specific behavior — NULL handling, string collation, timezone conversion, JSON operators differ between databases
The only acceptable use of mocks is for external third-party services (payment gateways, email/SMS providers, external APIs) that you don't control and may be unavailable during tests. Your own database, cache, and message queues should always be tested for real.
Tools like TestContainers make real database testing as easy as mocking — they spin up a real database in Docker, run tests, and auto-cleanup. There is no longer a valid excuse to mock the database.
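To make this concrete, here is a minimal sketch of a DB-touching test in Python. An in-memory SQLite database stands in for a TestContainers-managed instance (an assumption made to keep the example self-contained); the `create_user` helper and the `users` schema are hypothetical. A mocked repository would let the duplicate insert "succeed"; the real database enforces the constraint:

```python
import sqlite3

def create_user(conn: sqlite3.Connection, email: str) -> bool:
    """Insert a user; return False if the unique-email constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False

# Real in-memory database: constraints actually execute.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL)")

assert create_user(conn, "test@example.com") is True
# A mock would happily "insert" the duplicate; the real database rejects it.
assert create_user(conn, "test@example.com") is False
```

The same test shape carries over unchanged to a Dockerized Postgres or MySQL instance; only the connection setup differs.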
The plan must specify the approximate distribution of tests across these levels (e.g., 60% unit, 25% integration, 10% system/E2E, 5% acceptance) and include a Pyramid Distribution Rationale explaining why this specific split was chosen for this project. A generic distribution is not acceptable — the rationale must reference project-specific factors: "Unit tests are weighted at 70% because the domain logic is heavily algorithmic with minimal external I/O" or "Integration tests are elevated to 40% because this service is primarily a data aggregation layer with little pure logic to unit test." Any distribution that deviates significantly from the standard pyramid must explain what makes this project structurally different.
Test Case Writing Standards
ID Format
Every test case receives a unique identifier following this pattern:
TC-<MODULE>-<NNN>
- TC is the fixed prefix indicating a test case.
- MODULE is a short uppercase code (three to five characters) representing the feature area or module. Examples: `AUTH`, `PAY`, `DASH`, `NOTIF`, `CART`, `SRCH`.
- NNN is a zero-padded three-digit sequence number starting at 001.
Examples: TC-AUTH-001, TC-PAY-012, TC-CART-003.
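The format can be enforced mechanically. A minimal Python validator, assuming the three-to-five-character module code described above:

```python
import re

# TC-<MODULE>-<NNN>: fixed prefix, 3-5 uppercase module code, zero-padded sequence.
TC_ID_RE = re.compile(r"^TC-[A-Z]{3,5}-\d{3}$")

def is_valid_tc_id(tc_id: str) -> bool:
    return bool(TC_ID_RE.match(tc_id))
```

Running this as a lint step over the generated plan catches malformed IDs before the traceability matrix is built.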
Test Case Structure
Each test case is an implementation guide for engineers. It must be detailed enough to be translated directly into test code. Each test case must include the following fields:
- TC ID -- The unique identifier following the format above.
- Title -- Use the pattern: `[action] [condition] [expected outcome]` (e.g., "Create user with valid email returns 201 and saves to database").
- Module -- The feature area or component under test.
- Preconditions -- The exact database state that must exist before the test runs. Specify which records exist in which tables with which field values. Also state authentication state and any config/feature flags. Do NOT write vague preconditions like "user is logged in" — instead write "User record exists in `users` table with `id='uuid-1'`, `role='admin'`, `status='active'`; Auth token valid for user uuid-1".
- Steps -- Numbered steps with concrete test data. Use real values: `name: "John Doe"`, `email: "test@example.com"`, `age: 25`. Never use placeholders like `[valid name]` or `[valid email]`. Specify exact HTTP method, endpoint, headers, and request body.
- Expected Result -- Two parts: (1) API Response — exact HTTP status, response body structure, error codes. (2) DB State After — for any write operation, specify exactly what to query in the database and what values to assert. A test that only checks the API response without verifying database state is incomplete.
- DB Verify -- The specific database query and assertion to run after the test (e.g., "Query `users` WHERE `email = 'test@example.com'` → verify `name = 'John Doe'`, `status = 'active'`").
- Priority -- P0 (critical path, must pass), P1 (important, should pass), or P2 (desirable, nice to verify).
- Type -- Functional, performance, security, compatibility, regression, or usability.
- DB Approach -- Real DB (for any test touching the database) or Mock (ONLY for external third-party services, with justification).
- Automation Status -- Automated, to-be-automated, or manual.
Test Types Coverage
The Test Plan must address the following test types, allocating appropriate effort to each based on the project's risk profile:
- Functional Testing -- Verifies that the software behaves according to functional requirements. Covers positive flows, negative flows, and edge cases.
- Performance Testing -- Validates response times, throughput, and resource utilization under expected and peak loads. Includes load testing, stress testing, and endurance testing where applicable.
- Security Testing -- Identifies vulnerabilities including injection attacks, authentication/authorization flaws, data exposure, and compliance gaps. References OWASP Top 10 where relevant.
- Compatibility Testing -- Ensures correct behavior across target browsers, devices, operating systems, and screen resolutions as defined in the project's compatibility matrix.
- Regression Testing -- Confirms that existing functionality remains unbroken after code changes. Typically automated and executed as part of the CI/CD pipeline.
- Usability Testing -- Evaluates the user experience against accessibility standards (WCAG) and general usability heuristics.
Test Methods
Three primary test methods are applied depending on the test level and objective:
- Black-Box Testing -- Tests the software without knowledge of internal implementation. Used primarily for functional testing, acceptance testing, and E2E testing. Techniques include equivalence partitioning, boundary value analysis, decision table testing, and state transition testing.
- White-Box Testing -- Tests with full knowledge of the internal code structure. Used primarily for unit testing and code coverage analysis. Techniques include statement coverage, branch coverage, and path coverage.
- Gray-Box Testing -- Combines elements of both approaches. The tester has partial knowledge of the internal structure. Used for integration testing, API testing, and security testing where understanding data flow helps design more effective tests.
Risk-Based Test Prioritization
Not all features carry equal risk. The Test Plan applies risk-based prioritization to focus testing effort where it matters most.
- Identify risk areas by analyzing business impact (revenue, user trust, regulatory compliance) and technical complexity (new technology, third-party integrations, high cyclomatic complexity).
- Assign risk scores by combining likelihood of failure with severity of impact. Use a simple matrix: High/Medium/Low for each dimension. Each risk score must include a Risk Reasoning note — a one-sentence justification for both the likelihood and impact ratings. "High likelihood" must name a concrete reason (e.g., "first integration with this third-party API, no prior experience") and "High impact" must quantify the consequence (e.g., "payment flow failure directly blocks revenue"). Risk scores without reasoning are not acceptable.
- Allocate test depth proportionally -- high-risk areas receive deeper coverage with more test cases, more negative scenarios, and more boundary value analysis. Low-risk areas receive baseline coverage. Document the depth decision for each risk area so reviewers can verify the allocation is proportional.
- Map risks to test cases so the traceability matrix shows not just requirement coverage but also risk mitigation coverage.
Entry and Exit Criteria
Entry Criteria
Entry criteria define the preconditions that must be met before testing begins. They act as a gate to prevent wasted effort on an untestable build. Typical entry criteria include: code complete for the features in scope, build successfully deployed to the test environment, unit tests passing with minimum coverage thresholds, test data prepared and loaded, and upstream documentation (SRS, Tech Design) reviewed and approved.
Exit Criteria
Exit criteria define the measurable conditions that must be met before testing is declared complete. They provide an objective, defensible answer to "are we done testing?" Typical exit criteria include: all P0 and P1 test cases executed, overall pass rate at or above 95%, no open Critical or Major defects, requirements traceability matrix showing 100% coverage of in-scope requirements, and performance benchmarks met.
Defect Management
Severity Classification
Defects are classified into four severity levels:
- Critical -- System crash, data loss, security breach, or complete feature failure with no workaround. Requires immediate attention and blocks release.
- Major -- Significant feature malfunction or performance degradation with a difficult or unacceptable workaround. Must be resolved before release.
- Minor -- Cosmetic issue, minor inconvenience, or edge case failure with an easy workaround. Can be deferred to a subsequent release if necessary.
- Trivial -- Typographical error, minor UI misalignment, or improvement suggestion. No functional impact. Fix at convenience.
Defect Lifecycle
Every defect follows a defined lifecycle: New, Assigned, In Progress, Fixed, Verified, and Closed. If verification fails, the defect is Reopened and cycles back through the process. The Test Plan must include a Mermaid state diagram illustrating this lifecycle.
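The lifecycle can be rendered along these lines (a minimal sketch; the transition labels are illustrative):

```mermaid
stateDiagram-v2
    [*] --> New
    New --> Assigned
    Assigned --> InProgress
    InProgress --> Fixed
    Fixed --> Verified : verification passes
    Fixed --> Reopened : verification fails
    Reopened --> Assigned
    Verified --> Closed
    Closed --> [*]
```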
Requirements Traceability Matrix
The Requirements Traceability Matrix (RTM) is one of the most critical sections of the Test Plan. It provides a bidirectional mapping between SRS requirements and test cases.
- Every functional requirement (FR-XXX-NNN) must map to at least one test case.
- Every non-functional requirement (NFR-XXX-NNN) must map to at least one test case.
- Every test case must trace back to at least one requirement.
- Coverage status is marked as: Covered, Partially Covered, or Not Covered.
- Any gaps must be flagged with notes explaining why coverage is missing and what the plan is to address it.
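The gap-flagging step can be automated. A small Python sketch (the function name and return shape are hypothetical):

```python
def rtm_gaps(requirements: list[str], test_cases: list[str],
             rtm: dict[str, list[str]]) -> tuple[list[str], list[str]]:
    """Return (requirements with no test coverage, test cases with no requirement)."""
    uncovered = [req for req in requirements if not rtm.get(req)]
    traced = {tc for mapped in rtm.values() for tc in mapped}
    orphaned = [tc for tc in test_cases if tc not in traced]
    return uncovered, orphaned
```

Both lists should be empty before the exit criteria are considered met; anything left over needs an explanatory note in the RTM.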
Data Integrity Testing
Every test plan must include a dedicated section for data integrity test cases. These tests can ONLY be caught with a real database — mocks will always pass regardless of constraint violations. Data integrity tests must cover:
- Unique constraint enforcement — Attempt to insert a duplicate value for a unique field; verify the database rejects it with the correct error.
- Foreign key constraint enforcement — Attempt to insert a child record with a nonexistent parent ID; verify referential integrity is maintained.
- Cascade operations — Delete a parent record and verify child records are correctly cascaded (deleted, nullified, or blocked depending on schema design).
- Transaction rollback — Trigger a multi-step operation where an intermediate step fails; verify ALL changes are rolled back and the database returns to its pre-operation state.
- Concurrent update handling — Simulate two users updating the same record simultaneously; verify optimistic locking, version checking, or last-write-wins behavior works correctly.
- Soft delete behavior — Delete a record and verify it's flagged (e.g., `deleted_at` set) rather than physically removed; verify soft-deleted records are excluded from normal queries.
- NOT NULL constraint enforcement — Send null for a required field; verify the database-level constraint catches it.
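Two of these checks, foreign key enforcement and transaction rollback, sketched in Python against an in-memory SQLite database that stands in here for the project's real database (the `orders`/`items` schema is hypothetical):

```python
import sqlite3

# Real in-memory database with foreign keys enabled.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id))""")

# Foreign key enforcement: a child with a nonexistent parent must be rejected.
try:
    conn.execute("INSERT INTO items (order_id) VALUES (999)")
    raise AssertionError("referential integrity not enforced")
except sqlite3.IntegrityError:
    pass

# Transaction rollback: a failing intermediate step must undo all prior steps.
try:
    with conn:  # rolls back the whole transaction on error
        conn.execute("INSERT INTO orders (id) VALUES (1)")
        conn.execute("INSERT INTO items (id, order_id) VALUES (1, 999)")  # fails
except sqlite3.IntegrityError:
    pass
assert conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0] == 0
```

With a mocked database both `try` blocks would pass vacuously; only a real engine exercises the constraint and rollback paths.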
Test Case Design Principles
Positive and Negative Testing
Every feature must have both positive test cases (verifying correct behavior with valid inputs) and negative test cases (verifying proper error handling with invalid, unexpected, or malicious inputs). A ratio of roughly 60% positive to 40% negative is a useful starting guideline for most features.
Boundary Value Analysis
For any input that has defined ranges or limits, include test cases at the exact boundary, one value below the boundary, and one value above the boundary. This technique catches off-by-one errors and range validation defects that are among the most common bugs in software.
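The technique can be captured as a tiny generator; a sketch assuming an inclusive integer range:

```python
def boundary_values(low: int, high: int) -> list[int]:
    """Boundary-value candidates for an inclusive [low, high] range:
    each boundary, one value below it, and one value above it."""
    return [low - 1, low, low + 1, high - 1, high, high + 1]
```

For an age field limited to 1-100, this yields 0, 1, 2, 99, 100, 101: the first and last values should be rejected, the middle four accepted.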
Edge Cases
Beyond boundary values, identify and test edge cases specific to the domain: empty inputs, maximum-length strings, concurrent operations, timezone transitions, Unicode characters, null values, and other scenarios that stress the system's assumptions.
Reference Files
This skill relies on two reference files stored alongside it.
- `references/template.md` -- The full Test Plan template following IEEE 829 structure with placeholder text for every section. The generated Test Plan is built by filling in this template.
- `references/checklist.md` -- A quality checklist organized into four categories (Completeness, Quality, Consistency, Format). The checklist is used during Step 6 to validate the finished document.
Always read both files before generating a Test Plan so that any updates to the template or checklist are picked up automatically.
Output Location
The finished Test Plan is written to:
docs/<feature-name>/test-plan.md
where <feature-name> is a lowercase, hyphen-separated slug derived from the feature name (for example, docs/user-authentication/test-plan.md or docs/payment-processing/test-plan.md). If the docs/<feature-name>/ directory does not exist, create it. If a file with the same name already exists, confirm with the user before overwriting.