trust-but-verify
Trust But Verify
Verify that a feature implementation actually matches its plan by testing it in a real browser.
Core principle: Plans describe intent. Code describes implementation. Only the browser shows reality. This skill bridges all three — reading the plan, analyzing the diff, and verifying the result in a live browser.
When to Use
- After completing a feature branch and before merging
- When a plan exists in docs/plans/ for the current work
- When the diff touches frontend source files (UI changes)
- When you want confidence that the UI matches the spec
- When recommended by superpowers:finishing-a-development-branch
Not for: Backend-only changes, API-only work, or branches without a plan.
Process
digraph trust_but_verify {
"Phase 1:\nGather Context\n(subagent)" [shape=box];
"Phase 2:\nPreflight Check\n(main)" [shape=box];
"Phase 3:\nBrowser Verification\n(main + MCP)" [shape=box];
"Phase 4:\nReport\n(subagent)" [shape=box];
"Print summary\n+ link to report" [shape=doublecircle];
"Phase 1:\nGather Context\n(subagent)" -> "Phase 2:\nPreflight Check\n(main)";
"Phase 2:\nPreflight Check\n(main)" -> "Phase 3:\nBrowser Verification\n(main + MCP)";
"Phase 3:\nBrowser Verification\n(main + MCP)" -> "Phase 4:\nReport\n(subagent)";
"Phase 4:\nReport\n(subagent)" -> "Print summary\n+ link to report";
}
Phase 0: Check for App Navigator
Before starting, check if ~/.claude/skills/app-navigator/app-map.md exists.
If it does NOT exist:
"I notice the app hasn't been mapped yet. Running /app-navigator setup first will map all your routes, build login playbooks, and document UI patterns, which makes verification much faster and more accurate. Would you like to run /app-navigator setup first, or proceed without it?"
If the user says yes, invoke the app-navigator skill. When it completes, continue to Phase 1. If the user says no, proceed without it — Phase 1 will derive pages from the plan and diff only.
If it exists: proceed to Phase 1.
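The Phase 0 existence check can be sketched as below. This is a minimal illustration, not part of the skill itself: the function name `app_map_exists` and its `home` parameter are hypothetical, introduced only so the check can be run against a temporary directory.

```python
from pathlib import Path
from typing import Optional

# Location of the app map relative to the home directory (taken from the text above).
APP_MAP_RELATIVE = Path(".claude/skills/app-navigator/app-map.md")


def app_map_exists(home: Optional[Path] = None) -> bool:
    """Return True if the app-navigator map file is present.

    `home` is a hypothetical override for testing; it defaults to the
    current user's home directory.
    """
    base = home if home is not None else Path.home()
    return (base / APP_MAP_RELATIVE).is_file()
```

A False result corresponds to the "does NOT exist" branch, where the user is offered the setup run before Phase 1.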
Phase 1: Gather Context
Dispatch a subagent (type: general-purpose) with the prompt template from ./analysis-prompt.md.
The subagent reads:
- The ExecPlan from docs/plans/ (find the most recent plan matching the branch name or topic)
- git diff main...HEAD to see what files changed
- gh pr view to get PR description (if a PR exists)
- ~/.claude/skills/app-navigator/app-map.md (if it exists)
- ~/.claude/skills/app-navigator/playbooks/ (if they exist)
The subagent returns a verification checklist — a structured markdown document listing:
- Pages/routes to visit
- UI elements to verify on each page
- Happy path interactions to perform
- Edge cases to test
- Error states to trigger
- Responsive checks needed (only for pages in the diff)
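One way to decide which pages qualify for responsive checks is to filter the diff's file list down to frontend sources. The sketch below assumes a set of frontend extensions (`FRONTEND_SUFFIXES`) that the skill does not specify; both helper names are hypothetical.

```python
import subprocess
from pathlib import PurePosixPath

# Assumed frontend extensions; adjust to match the project.
FRONTEND_SUFFIXES = {
    ".tsx", ".ts", ".jsx", ".js", ".vue", ".svelte", ".css", ".scss", ".html",
}


def changed_files(base: str = "main") -> list[str]:
    """List files changed on the current branch relative to `base`."""
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def frontend_files(paths: list[str]) -> list[str]:
    """Keep only paths whose extension is in the assumed frontend set."""
    return [p for p in paths if PurePosixPath(p).suffix.lower() in FRONTEND_SUFFIXES]
```

Calling `frontend_files(changed_files())` inside a repository would give the candidate files; an empty result suggests the branch is backend-only and outside this skill's scope.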
Phase 2: Preflight Check
App URL: Read ~/.claude/projects/<project>/memory/reference_local_auth.md.
- If found: extract the App URL
- If not found: ask the user for the local app URL (e.g. http://localhost:5173), save it
Dev server: Check if the app is reachable:
curl -s -o /dev/null -w "%{http_code}" <App URL> 2>/dev/null || echo "unreachable"
If unreachable:
"The dev server at <App URL> isn't reachable. You'll need to start it. Want me to start it, or will you handle it?"
Setup: Create the verification output directory:
mkdir -p docs/verification
Gate: Do not proceed to Phase 3 until the server is confirmed reachable.
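The reachability gate can also be expressed in Python when a shell is not convenient. This is a sketch under stated assumptions: the function name `probe` and the 5-second default timeout are illustrative. Unlike the curl command above, `urlopen` follows redirects, so a redirecting server reports the final status.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> str:
    """Return the HTTP status code as a string, or "unreachable"."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return str(response.status)
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status; it is still reachable.
        return str(err.code)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, or a malformed URL.
        return "unreachable"
```

Phase 3 would start only when `probe(app_url)` returns something other than "unreachable".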
Phase 3: Browser Verification
Initial Load & Authentication:
- Navigate to the App URL with mcp__playwright__browser_navigate
- mcp__playwright__browser_snapshot to see what's on screen
- If there's a login form: ask the user for credentials (email/password), fill the form, submit, wait for redirect. Save credentials to reference_local_auth.md for future runs.
- If there's a workspace/org selector or first-run setup: handle it (select first option or ask user which to pick)
- If the app loads directly: proceed, no auth needed
- If login fails or redirects back to login: ask user to verify credentials
- On future runs, if reference_local_auth.md has saved credentials, use them automatically. Only ask the user again if they fail.
For each checklist item:
- Navigate to the target page
  - Use mcp__playwright__browser_navigate with the full URL
  - mcp__playwright__browser_wait_for with text set to a known element on the target page
  - If page doesn't load in 30 seconds: record FAIL, move to next item
- Verify elements
  - mcp__playwright__browser_snapshot to get the page structure
  - Check each expected element from the checklist against the snapshot
  - Record PASS/FAIL for each element
- Happy path interactions
  - Follow the checklist's step-by-step interaction sequence
  - Use mcp__playwright__browser_click, browser_fill_form, browser_type, browser_select_option as needed
  - After each interaction, browser_snapshot to verify the expected outcome
  - Record PASS/FAIL/CONCERN for each interaction
- Edge cases and error states
  - Follow the checklist's edge case scenarios
  - Test empty states, invalid input, boundary values
  - Record results
- Responsive checks (only for pages that changed in the diff)
  - mcp__playwright__browser_resize to 1440x900 (desktop) -- screenshot
  - mcp__playwright__browser_resize to 768x1024 (tablet) -- screenshot
  - mcp__playwright__browser_resize to 375x812 (mobile) -- screenshot
  - Record any layout issues
  - mcp__playwright__browser_resize to 1440x900 (reset to desktop before next item)
- Screenshots
  - mcp__playwright__browser_take_screenshot at key states
  - Save to docs/verification/screenshots/<branch>/ with naming: <page-slug>-<state>-<viewport>.png
  - Create the directory if it doesn't exist: mkdir -p docs/verification/screenshots/<branch>
- Session handling
  - If any page redirects to login: re-authenticate using the same login flow, then resume from the current checklist item
Collect all results as structured markdown to pass to Phase 4.
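The screenshot naming convention and the three viewport sizes used in Phase 3 can be captured in a small helper. This is a sketch only: `VIEWPORTS` and `screenshot_path` are hypothetical names, and the helper builds a path string without creating any files.

```python
from pathlib import PurePosixPath

# Viewport sizes from the responsive checks: name -> (width, height).
VIEWPORTS = {"desktop": (1440, 900), "tablet": (768, 1024), "mobile": (375, 812)}


def screenshot_path(branch: str, page_slug: str, state: str, viewport: str) -> str:
    """Build docs/verification/screenshots/<branch>/<page-slug>-<state>-<viewport>.png."""
    if viewport not in VIEWPORTS:
        raise ValueError(f"unknown viewport: {viewport!r}")
    directory = PurePosixPath("docs/verification/screenshots") / branch
    return str(directory / f"{page_slug}-{state}-{viewport}.png")
```

Note that a branch name containing `/` yields nested directories under this scheme, which `mkdir -p` handles.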
Phase 4: Report
Dispatch a subagent (type: general-purpose) with the prompt template from ./report-prompt.md.
Pass the subagent:
- The verification results markdown from Phase 3
- The original plan reference
- Screenshot file paths
- Branch name and PR link (if any)
The subagent writes the full report to docs/verification/YYYY-MM-DD-<branch-slug>.md (where <branch-slug> is the branch name with / replaced by -) and returns a concise summary.
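The report filename rule can be sketched as follows. The function name `report_path` and the `today` parameter are illustrative assumptions; the slug rule (replace `/` with `-`) and the directory come from the text above.

```python
from datetime import date
from typing import Optional


def report_path(branch: str, today: Optional[date] = None) -> str:
    """Build docs/verification/YYYY-MM-DD-<branch-slug>.md for a branch."""
    day = today if today is not None else date.today()
    slug = branch.replace("/", "-")  # branch slug: "/" becomes "-"
    return f"docs/verification/{day.isoformat()}-{slug}.md"
```

For example, a branch named `feature/login-form` verified on 15 January 2024 maps to `docs/verification/2024-01-15-feature-login-form.md`.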
Print the summary in conversation. Include:
- The summary counts (PASS/CONCERN/FAIL/OUT-OF-SCOPE)
- Any FAIL items with one-line descriptions
- Link to the full report file
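A possible shape for that conversation summary is sketched below. The `Finding` record and the `summarize` function are hypothetical; the skill only requires the counts, the FAIL one-liners, and the report link, and does not prescribe a data structure.

```python
from collections import Counter
from dataclasses import dataclass

# Status labels from the summary counts above.
STATUSES = ("PASS", "CONCERN", "FAIL", "OUT-OF-SCOPE")


@dataclass
class Finding:
    feature: str
    status: str
    note: str = ""


def summarize(findings: list[Finding], report_file: str) -> str:
    """Return the counts line, one line per FAIL, and the report link."""
    counts = Counter(f.status for f in findings)
    lines = [" / ".join(f"{s}: {counts[s]}" for s in STATUSES)]
    lines += [f"FAIL: {f.feature}: {f.note}" for f in findings if f.status == "FAIL"]
    lines.append(f"Full report: {report_file}")
    return "\n".join(lines)
```

Statuses with no findings still appear with a count of zero, so the summary line keeps the same shape across runs.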
Cleanup: Close the browser session with mcp__playwright__browser_close.
Report Format
The full report follows this structure:
# Verification Report: <branch-name>
**Date:** YYYY-MM-DD
**Plan:** <link to ExecPlan>
**PR:** <link if exists>
**Branch:** <branch> (N commits ahead of main)
## Summary
- X items verified and working
- X concerns noted
- X mismatches or failures
- X out-of-scope observations
## Detailed Findings
### Working as Expected
| Feature | Page | What was verified | Screenshot |
|---------|------|-------------------|------------|
### Mismatches / Broken
| Feature | Expected (from plan) | Actual | Severity | Screenshot |
|---------|---------------------|--------|----------|------------|
### Concerns
| Feature | Issue | Suggestion | Screenshot |
|---------|-------|------------|------------|
### Out of Scope
| Observation | Where | Notes |
|-------------|-------|-------|
## Edge Cases & Error States Tested
| Scenario | Result | Notes |
|----------|--------|-------|
## Responsive Checks
| Page | Desktop | Tablet | Mobile | Notes |
|------|---------|--------|--------|-------|
Red Flags
- Never skip the preflight check. Always verify the server is reachable before browser work.
- Never hardcode credentials. Read from memory or ask the user.
- Never modify code. This skill only verifies -- it does not fix issues it finds.
- Never run services without asking. Check reachability first, ask permission.
- Don't test pages unrelated to the plan. Stay scoped to what changed.
- Don't spend more than 30 seconds per interaction. Record FAIL and move on.
Integration
- Depends on: app-navigator (optional but recommended -- provides playbooks and app map)
- Credential source: Detected automatically from the browser. Saved to reference_local_auth.md in project memory after first successful login.
- Recommended by: superpowers:finishing-a-development-branch
- Can be invoked after: superpowers:executing-plans, superpowers:subagent-driven-development