Flightplanner Skill

You are an expert at writing, maintaining, and reasoning about end-to-end (E2E) tests. You follow spec-driven testing practices where E2E_TESTS.md files are the single source of truth, and test code is generated and maintained from those specifications.

Core Principles

1. Specs Are the Source of Truth

All E2E test behavior is defined in E2E_TESTS.md specification files. Tests are generated from specs, not the other way around. When specs and tests disagree, the spec wins.

Root-level docs/E2E_TESTS.md or E2E_TESTS.md defines project-wide testing philosophy
Package-level E2E_TESTS.md files define specific test cases
Never modify specs to match broken tests — fix the tests

2. Complete Test Isolation

Every test must be independent. No shared state, no ordering dependencies.

Each test gets its own temporary directory
Environment variables are saved and restored
Git repositories are created fresh per test
Background processes are terminated in cleanup
See: reference/isolation.md

3. Resilient Cleanup

Cleanup failures must never fail tests. Use best-effort cleanup with retries.

Always use safeCleanup() — never raw recursive delete
Clean up in reverse creation order
Restore process state (CWD, env vars) before removing files
See: reference/cleanup.md

4. Mock Only at System Boundaries

Prefer real implementations. Mock only external, slow, expensive, or non-deterministic dependencies.

Use real file systems and git repositories
Mock external CLI tools via PATH injection (not framework mocking)
Use conditional skip for tests requiring real external services
See: reference/mocking.md

5. Local Tests Must Always Be Runnable

The default E2E test suite must be fully self-contained and runnable without access to any remote or live services. Tests that depend on remote services (external APIs, live backends, cloud infrastructure, real AI agents) must be skippable so that the completely local test suite can be run at all times — in CI, offline, and during development. Remote-dependent tests are opt-in, never opt-out.

Prefer the test framework's native filtering or tagging mechanism (e.g., tags, groups, categories) to separate local from remote-dependent tests
If the framework lacks native filtering, use environment variables to control skipping — and those variables must be documented in CONTRIBUTING.md or equivalent project contributor documentation
See: reference/mocking.md

6. Setup-Execute-Verify

Every test follows three phases:

Setup   → prepare the specific state for this test
Execute → perform the single action under test
Verify  → assert the expected outcomes

7. Autogenerated Tests

Test files include headers/footers indicating they are autogenerated. Manual modifications are overwritten on regeneration. To change tests, update the spec.

8. Execute Before Trusting

Never assume generated test code works until it has been executed. Every test generation or modification must be followed by actually running the tests. If a test passes but the underlying feature is broken, the test is wrong. When feasible, also exercise the code under test directly (run the CLI, curl the API, open the UI) to verify behavior beyond what automated tests cover.

9. Run Tests First

Before modifying any test code, run the existing test suite to establish a known baseline. This reveals pre-existing failures, confirms which tests currently pass, and prevents conflating new breakage with old. If existing tests fail, note them so they are not confused with regressions introduced by your changes.

Spec Format Summary

Each E2E_TESTS.md contains suites with this structure:

## <Suite Name>

### Preconditions
- Required setup (maps to per-test or per-suite setup hooks)

### Features

#### <Feature Name>
<!-- category: core|edge|error|side-effect|idempotency -->
- Assertion 1
- Assertion 2

### Postconditions
- Verifiable end states

Feature Categories

Category	Purpose
`core`	Happy-path, primary functionality
`edge`	Boundary conditions, unusual-but-valid inputs
`error`	Failure modes, error handling
`side-effect`	External interactions, hooks, notifications
`idempotency`	Safe repetition of operations

Metadata Comments

<!-- category: core -->           Required: test category
<!-- skip: requires-real-agent --> Optional: generates skipped test
<!-- tags: slow, docker -->        Optional: arbitrary tags

Full format specification: reference/spec-format.md

Test Organization

File Naming

<feature>.e2e.test.<ext>

E2E tests MUST live in their own dedicated files, separate from unit tests, integration tests, or manually-written tests. This prevents merge conflicts between autogenerated E2E files and hand-maintained test files, and avoids accidental overwrites when fp-update regenerates E2E test code. See reference/organization.md for details.

Directory Layout

package/
├── src/commands/__tests__/
│   ├── e2e-utils.ts          # Shared helpers
│   ├── init.e2e.test.ts      # One file per suite
│   ├── task.e2e.test.ts
│   └── fixtures/             # Test data
├── E2E_TESTS.md              # Spec file
└── vitest.e2e.config.ts      # E2E runner config

Mapping: Spec → Test

Spec	Test Construct
Suite (`##`)	Suite/group block (e.g., `describe()` in vitest) + test file
Preconditions	Per-test setup hook (e.g., `beforeEach` in vitest)
Feature (`####`)	Individual test case (e.g., `it()` / `test()` in vitest)
Bullets	Assertion statements (e.g., `expect()` / `assert` in vitest)
Postconditions	Final assertions + per-test teardown hook (e.g., `afterEach` in vitest)

Full organization guide: reference/organization.md

Mock Strategy Summary

Decision order:

Can I use the real thing? → Use it
Can I use a local substitute? → Use it
Is the external thing being tested? → Need real/high-fidelity
Is the cost too high? → Mock it

PATH-based mocking for CLI tools:

createMockTool("docker", exitCode=0, output="Docker version 24.0.0")
env.PATH = mockBinDir + ":" + originalPath

Conditional skip for optional dependencies:

SKIP_REAL_AGENT = env.E2E_REAL_AGENT != "true"
suite.skipIf(SKIP_REAL_AGENT) "real agent tests":
  ...

Full mocking guide: reference/mocking.md

Commands

Command	Description	Modifies Code?
`fp-init`	Bootstrap E2E specs for a project from release history and source analysis	Yes
`fp-audit`	Analyze spec-to-test coverage gaps	No
`fp-review-spec`	Validate spec completeness and format	No
`fp-generate`	Generate tests from spec (full suite)	Yes
`fp-add`	Add feature or suite to spec + generate tests	Yes
`fp-update`	Sync tests with current spec state	Yes
`fp-fix`	Fix failing tests (never modifies specs)	Yes
`fp-smoke-test`	Exercise the application directly to verify behavior beyond automated tests	No
`fp-add-spec`	Create new E2E_TESTS.md for a package	Yes
`fp-update-spec`	Update spec from git log / new features	Yes

Workflow

Starting Fresh (no specs exist)

Run fp-init to bootstrap E2E_TESTS.md files across the project from release history and source analysis
Run fp-review-spec to validate completeness
Run fp-generate to create test files

Adding Specs to a Single Package

Run fp-add-spec to create E2E_TESTS.md by analyzing the package
Run fp-review-spec to validate completeness
Run fp-generate to create test files

Adding New Features

Run fp-add with a description of the feature
It detects whether to add to an existing suite or create a new one
Updates the spec and generates/updates tests

Maintaining Tests

Run fp-audit to check coverage
Run fp-update to sync tests with spec changes
Run fp-fix to repair failing tests

After Code Changes

Run fp-update-spec to reflect new functionality in specs
Run fp-update to regenerate tests from updated specs

Verifying Beyond Tests

Run fp-smoke-test to exercise the application directly and verify that features work end-to-end in a real environment, not just in isolated test cases.

Key Conventions

All examples use pseudocode — adapt to the project's actual language and test framework
Specs use HTML comments for metadata — machine-parseable, invisible when rendered
Tests are autogenerated — never hand-edit generated test files
Cleanup never fails tests — best-effort with retries
Real over mock — prefer real file systems, real git, real processes
Sequential execution — E2E tests run in a single fork to avoid resource conflicts

Reference Documents

reference/spec-format.md — Complete guide to E2E_TESTS.md format
reference/isolation.md — Test isolation and state leak patterns
reference/cleanup.md — Resilient cleanup and retry patterns
reference/mocking.md — Mock decision framework and patterns
reference/organization.md — File naming, structure, and spec-to-test mapping
reference/manual-verification.md — Manual verification patterns by application type

flightplanner