# PR Test Plan (`cmd-pr-test-plan`)
Generate a manual test plan for the changes in the current branch. The plan should focus on what a developer/reviewer needs to manually verify — real user flows, integration behavior, and observable outcomes. Leave input validation, error branches, and edge cases to unit tests.
## Instructions

### Step 1: Detect base branch

Try these methods in order:

```bash
BASE_BRANCH=$(gh repo view --json defaultBranchRef -q '.defaultBranchRef.name' 2>/dev/null)
```

```bash
BASE_BRANCH=$(git remote show origin 2>/dev/null | grep "HEAD branch" | cut -d: -f2 | xargs)
```

If both fail, ask the user.
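As a rough sketch, the two detection methods and the final fallback can be chained in one snippet (assuming `gh` and `git` are available and `origin` is the upstream remote):

```bash
# Prefer the GitHub default branch; fall back to the remote HEAD reported by git.
BASE_BRANCH=$(gh repo view --json defaultBranchRef -q '.defaultBranchRef.name' 2>/dev/null)
if [ -z "$BASE_BRANCH" ]; then
  BASE_BRANCH=$(git remote show origin 2>/dev/null | grep "HEAD branch" | cut -d: -f2 | xargs)
fi
# If both methods came up empty, stop and ask the user for the base branch.
if [ -z "$BASE_BRANCH" ]; then
  echo "Could not detect the base branch; please provide it." >&2
fi
```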
### Step 2: Gather change context

Run all of these and capture the results:

```bash
git diff $BASE_BRANCH...HEAD --name-only
git diff $BASE_BRANCH...HEAD --stat
git log $BASE_BRANCH..HEAD --oneline
```
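A minimal sketch of capturing these into shell variables (the variable names are illustrative, and `BASE_BRANCH` is assumed to be set from Step 1):

```bash
# Hold the change context so it can be referenced while drafting the plan.
CHANGED_FILES=$(git diff "$BASE_BRANCH"...HEAD --name-only)
DIFF_STAT=$(git diff "$BASE_BRANCH"...HEAD --stat)
COMMITS=$(git log "$BASE_BRANCH"..HEAD --oneline)

# Quick sanity check: how many files changed and how many commits are on the branch.
printf '%s\n' "$CHANGED_FILES" | wc -l
printf '%s\n' "$COMMITS" | wc -l
```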
### Step 3: Detect project tooling

Check what's available in the project so you can reference real commands (not generic guesses):

- Makefile targets:

  ```bash
  make help 2>/dev/null || grep -E '^[a-z_-]+:.*##' Makefile makefiles/*.mk 2>/dev/null
  ```

- Package manager: Look for `pyproject.toml` (uv/pip), `package.json` (npm/pnpm), `Cargo.toml` (cargo), `go.mod` (go)
- Test runners: Look for `pytest.ini`, `pyproject.toml [tool.pytest]`, `jest.config.*`, `.mocharc.*`
- Project docs: Read `AGENTS.md`, `CLAUDE.md`, `CONTRIBUTING.md`, or `README.md` for project-specific test/build instructions
- CI config: Check `.github/workflows/`, `Makefile`, or `Taskfile.yml` for existing test commands

Prefer project Makefile targets and documented commands over raw tool invocations. If the project has `make test_unit`, use that instead of `uv run pytest tests/unit/`.
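A quick probe along these lines can surface what tooling exists (the file list below is illustrative, not exhaustive):

```bash
# Check for common build/test/config files; presence of a file hints at the right commands.
for f in Makefile Taskfile.yml pyproject.toml package.json Cargo.toml go.mod \
         pytest.ini jest.config.js .mocharc.yml \
         AGENTS.md CLAUDE.md CONTRIBUTING.md README.md; do
  [ -e "$f" ] && echo "found: $f"
done

# Existing CI workflows often encode the canonical test commands.
ls .github/workflows/ 2>/dev/null
```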
### Step 4: Categorize changes and confirm with user
Group changed files into categories. Common categories (adapt based on actual changes):
- Feature code -- new commands, API routes, services, UI components
- Configuration / docs -- config files, markdown, schemas, manifests
- Tests -- new or modified test files
- Build / deploy -- Makefiles, CI, Dockerfiles, scripts
- Deletions -- removed files or deprecated code
Present the detected categories to the user with a summary of what changed in each. Ask them to confirm or adjust before generating the full plan.
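A rough way to seed this grouping from the diff is a path-based pass like the one below (the path patterns are assumptions; adapt them to the repo's actual layout):

```bash
# Bucket changed files by path to propose initial categories.
git diff "$BASE_BRANCH"...HEAD --name-only | awk '
  /^tests?\//                   { print "tests:        " $0; next }
  /(^|\/)(Makefile|Dockerfile)/ { print "build/deploy: " $0; next }
  /^\.github\//                 { print "build/deploy: " $0; next }
  /\.(md|ya?ml|toml|json)$/     { print "config/docs:  " $0; next }
                                { print "feature:      " $0 }'
```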
Example confirmation format:
I found 3 change areas in this branch:
1. CLI agent mode -- new --agent flag on setup command (cli/commands/setup.py, cli/cli.py)
2. Skills restructuring -- SKILL.md rewrite, new reference docs, deleted shell scripts
3. Test fixes -- E2E test stability improvements (4 test files)
Should I generate the test plan for all 3, or would you like to adjust?
### Step 5: Generate the test plan
For each confirmed category, generate a test section following these rules:
#### Severity & importance markers
Tag every test step with one of these emojis in the step title:
| Emoji | Meaning | When to use |
|---|---|---|
| 🔴 | Critical | Core functionality — if this fails, the feature is broken |
| 🟢 | Expected | Standard behavior that should work — moderate confidence but worth verifying |
| 🔵 | Nice-to-have | Polish, UX, non-blocking — skip if short on time |
Example: 1a. 🔴 **Pre-register a profile end-to-end**
#### Formatting rules

- Numbered sections with separator lines (`---`) between them
- Numbered sub-steps within each section (1a, 1b, 1c...)
- Each sub-step has an emoji tag + bold title describing what to test
- Each sub-step has a copy-paste command in a fenced code block (or manual UI steps if applicable)
- Each sub-step has a "Verify:" line stating what success looks like
- One command per code block -- never stack multiple commands in one block with comments between them
- Use Makefile targets when available instead of raw tool commands
- For commands requiring env vars, put them inline: `GROVE_API_URL=http://localhost:8000 make test_e2e_suite`
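Put together, a sub-step following these rules might look like the example below (the make target is hypothetical; use whatever Step 3 found in the actual project):

1a. 🔴 **Run the setup command in agent mode end-to-end**

```bash
# Hypothetical target and flag shown for illustration only.
make run_setup ARGS="--agent"
```

Verify: the command completes without errors and the agent-mode output appears.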
#### What to focus on (and what to skip)
DO include — things you must verify manually:
- Happy-path user flows end-to-end (the main thing the feature does)
- Integration points — does component A actually talk to component B correctly?
- State transitions — does data persist, propagate, and display correctly across the system?
- Resumption / retry behavior — if a multi-step process fails midway, does retry work?
- UI rendering — does the new section/field/page show up and look right?
- API response shape — do new fields appear in real responses?
- Existing behavior preserved — does the change break anything that was already working?
DO NOT include — leave these for unit tests:
- Invalid input validation (wrong types, missing fields, malformed data)
- Boundary values and off-by-one checks
- Error message wording verification
- Permission/auth edge cases (401/403 responses)
- Schema validation failures
#### Quick smoke test section
Always end with a "Quick Smoke Test" section -- the 2-3 commands a reviewer would run if they only have 60 seconds. Tag each with the appropriate emoji.
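A hypothetical example of such a section (targets are placeholders; substitute the project's real commands):

1. 🔴 Fast test suite passes:

```bash
make test_unit
```

2. 🟢 Changed CLI entry point still launches:

```bash
make run 2>&1 | head -20
```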
### Step 6: Write output

- Write the plan to `TEST_PLAN.md` in the repo root
- Print a summary to the terminal showing the section count and a one-liner per section
Terminal summary format:
Wrote TEST_PLAN.md with 4 sections:
1. CLI Agent Mode -- 4 test steps (🔴×2, 🟢×1, 🔵×1)
2. Skills Restructuring -- 3 test steps (🔴×1, 🟢×2)
3. Automated Tests -- 2 test steps (🟢×2)
4. Quick Smoke Test -- 3 commands
Run `cat TEST_PLAN.md` to view the full plan.
## Style Reference

Follow the same style used in `cmd-pr-description`:
- Bold the what, plain text the how
- No fluff -- every step must verify something real that a human needs to see
- Copy-paste ready -- a reviewer should never need to edit a command
- Separate code blocks -- one command per block, bold header above it