verifying-implementations
Verifying Implementations
Your job is to try to break the implementation, not to confirm that it works.
CRITICAL: Do NOT modify the project. You are strictly prohibited from creating, modifying, or deleting any project files during verification. Use temporary directories if you need scratch space.
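One way to get scratch space without touching the project is to work on a throwaway copy. The sketch below is a minimal illustration, not a prescribed tool: the helper name `make_scratch_copy` and the choice to skip `.git` are assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path


def make_scratch_copy(project_dir: str) -> Path:
    """Copy the project into a fresh temporary directory and return the copy's path."""
    scratch_root = Path(tempfile.mkdtemp(prefix="verify-"))
    scratch_copy = scratch_root / "project"
    # copytree only reads from the source tree; the original project is left untouched.
    # Skipping .git is an assumption to keep the copy small, not a requirement.
    shutil.copytree(project_dir, scratch_copy, ignore=shutil.ignore_patterns(".git"))
    return scratch_copy
```

Any command that writes files (builds, generated fixtures, snapshot updates) can then be run with the returned path as its working directory, so the real project stays unmodified.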
Workflow
Copy this checklist and track progress:
Verification:
- [ ] 1. List all behaviors the implementation should produce
- [ ] 2. List edge cases and failure modes
- [ ] 3. Run commands and record evidence for each check
- [ ] 4. Apply adversarial probes
- [ ] 5. Re-run a sample of checks to confirm consistency
- [ ] 6. Issue verdict
1. Choose verification strategy by change type
| Change type | Strategy |
|---|---|
| Tests | Run the full suite. Read actual output line by line. |
| Frontend | Start dev server → check console for errors → check network tab → test user flows |
| Backend/API | Start server → send actual HTTP requests → check response codes and bodies |
| CLI/Scripts | Run with representative inputs, edge cases, and malformed input |
| Bug fixes | First reproduce the original bug → then verify the fix → then run regression tests |
| Config changes | Verify the config loads correctly, test with missing/malformed config |
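As an illustration of the Backend/API row, the sketch below sends real HTTP requests and captures the status code and body of each response. `StandInHandler` and its `/health` route are hypothetical stand-ins so the example runs on its own; in an actual verification the requests would go to the server under test.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StandInHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the service being verified."""

    def do_GET(self):
        known = self.path == "/health"
        payload = {"status": "ok"} if known else {"error": "not found"}
        body = json.dumps(payload).encode()
        self.send_response(200 if known else 404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence request logging so only recorded evidence is printed


def send_get(port: int, path: str) -> tuple[int, str]:
    """Send an actual GET request to localhost and return (status code, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.read().decode()
    finally:
        conn.close()


def probe_stand_in() -> list[tuple[str, int, str]]:
    """Start the stand-in server on a free port, probe a known and an unknown path."""
    server = HTTPServer(("127.0.0.1", 0), StandInHandler)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return [(path, *send_get(server.server_port, path)) for path in ("/health", "/missing")]
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for path, status, body in probe_stand_in():
        print(path, status, body)
```

Probing an unknown path alongside the known one checks that the error response is as deliberate as the success response.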
2. Record evidence for EVERY check
A check without a command and observed output is not a check — it's a guess.
### Check: [what you're verifying]
**Command:** [exact command run]
**Output:** [actual terminal/browser output observed]
**Result:** PASS | FAIL
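Filling in the record by hand invites paraphrased output. A small helper can run the command and emit the record in the format above. This is a sketch under two assumptions: the helper name `run_check` is invented for the example, and PASS is judged by exit code alone, which is often too weak on its own (many checks also need the output inspected).

```python
import shlex
import subprocess
import sys


def run_check(title: str, command: list[str], expect_exit: int = 0) -> str:
    """Run a command and return a record in the Check/Command/Output/Result format."""
    proc = subprocess.run(command, capture_output=True, text=True, timeout=120)
    output = (proc.stdout + proc.stderr).strip()
    # Assumption: the exit code decides the result; inspect the output as well when it matters.
    result = "PASS" if proc.returncode == expect_exit else "FAIL"
    return (
        f"### Check: {title}\n"
        f"**Command:** {shlex.join(command)}\n"
        f"**Output:** {output} (exit code {proc.returncode})\n"
        f"**Result:** {result}"
    )


if __name__ == "__main__":
    print(run_check("interpreter starts", [sys.executable, "-c", "print('ok')"]))
```

Because the command and output come straight from the process, the record shows what actually ran and not what was expected to run.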
3. Recognize your own failure patterns
Verification avoidance — You read the code, narrate what it does, write "PASS," and move on. Reading is not verification. Run it.
Seduced by the first 80% — The UI looks polished, the happy path works, so you conclude it's done. But state persistence is broken, error handling is missing, or edge cases crash. Test beyond the happy path.
Watch for these rationalizations and stop yourself:
- "The code looks correct based on my reading" — run it.
- "The implementer's tests already pass" — the implementer is an LLM. Verify independently.
- "This is probably fine" — probably is not verified. Run it.
- "This would take too long" — not your call. Verify.
- "I don't have the right tools" — check what's actually available before giving up.
4. Adversarial probes
Go beyond happy-path testing:
- Boundary values: 0, -1, empty string, very long strings (10K chars), Unicode, special characters, MAX_INT
- Idempotency: run the same mutating operation twice — does it break or produce duplicates?
- Missing data: what happens when expected files, configs, env vars, or DB records don't exist?
- Concurrency: parallel requests to create-if-not-exists paths — race conditions? duplicate entries?
- Malformed input: invalid JSON, wrong types, extra fields, missing required fields
- Permissions: what happens without auth? With expired auth? With wrong role?
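Two of the probes above, boundary values and idempotency, can be sketched as small reusable helpers. The names `probe_boundaries` and `probe_idempotency` are hypothetical, `sys.maxsize` stands in for MAX_INT, and the targets in the usage section are toy examples, not part of any real project.

```python
import sys
from typing import Callable

# Boundary inputs taken from the list above; sys.maxsize stands in for MAX_INT.
BOUNDARY_INPUTS = [0, -1, "", "x" * 10_000, "héllo ✓", "'; DROP TABLE--", sys.maxsize]


def probe_boundaries(target: Callable[[object], object]) -> list[tuple[str, str]]:
    """Call target with every boundary input and record each outcome, crashes included."""
    findings = []
    for value in BOUNDARY_INPUTS:
        label = repr(value)[:40]
        try:
            findings.append((label, f"returned {target(value)!r}"[:80]))
        except Exception as exc:  # a crash is evidence, so record it and keep going
            findings.append((label, f"raised {type(exc).__name__}: {exc}"[:80]))
    return findings


def probe_idempotency(operation: Callable[[], None], snapshot: Callable[[], object]) -> bool:
    """Run a mutating operation twice; return True if the second run changed nothing."""
    operation()
    after_first = snapshot()
    operation()
    return snapshot() == after_first


if __name__ == "__main__":
    # Hypothetical target that assumes a non-empty string.
    def first_char_upper(value):
        return value[0].upper()

    for label, outcome in probe_boundaries(first_char_upper):
        print(f"{label:<42} {outcome}")

    users = []  # hypothetical store whose "create" appends blindly
    print(probe_idempotency(lambda: users.append("alice"), lambda: list(users)))  # prints False
```

Recording every outcome, without stopping at the first exception, keeps one crash from hiding the others.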
5. Verdict
End your verification with exactly one of:
- VERDICT: PASS — every check has a command, output, and PASS result
- VERDICT: FAIL — at least one check failed. List every failure with command and output.
- VERDICT: PARTIAL — some checks passed, some could not be verified due to environment limitations (no database access, no browser, etc.). List exactly what could not be verified and why.
Rules:
- Every PASS must have a command and observed output. No exceptions.
- You cannot assign yourself PASS based on code reading alone.
- If you cannot run a check, it is PARTIAL, never PASS.
- After issuing FAIL, list concrete steps to fix each failure.
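The rules above can be applied mechanically. The sketch below is one reading of them, not an official implementation: the `Check` record and `verdict` function are hypothetical, and two choices are assumptions, namely that FAIL takes precedence when failed and unverifiable checks are mixed, and that an empty list of checks yields PARTIAL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Check:
    name: str
    command: Optional[str]  # None when the check could not be run
    output: Optional[str]   # None when no output was observed
    passed: Optional[bool]  # None when the result is unknown


def verdict(checks: list[Check]) -> str:
    """Derive the final verdict line from the recorded checks."""
    if not checks:
        return "VERDICT: PARTIAL"  # assumption: no evidence can never be a PASS
    if any(c.passed is False for c in checks):
        return "VERDICT: FAIL"  # assumption: a failure outranks unverifiable checks
    # A PASS without a command and observed output does not count as verified.
    unverified = [c for c in checks if c.command is None or c.output is None or c.passed is None]
    return "VERDICT: PARTIAL" if unverified else "VERDICT: PASS"
```

A check marked as passed from code reading alone has no command or output, so it pulls the verdict down to PARTIAL, never up to PASS.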
Scripts
- run-checks.sh — Auto-detects test framework, linter, type checker, and build system for the current project. Run it to get a structured PASS/FAIL report in one shot instead of manually discovering and running each tool.
- ./run-checks.sh: run all checks
- ./run-checks.sh --tests-only: skip lint and type checks
- ./run-checks.sh --lint-only: skip tests
References
For the full report template, adversarial probe library, and per-change-type verification strategies:
- Verification report template — Complete markdown report template with check format, adversarial probe library (input validation, state, error handling, security), and minimum check counts by change type.