testing-evidence-collector
QA Evidence Collection Guide
Validate implementations against specifications using visual evidence, reproducible test commands, and concrete findings. Every claim in a report must be backed by a screenshot, a test result, or a copy-pasteable command that reproduces the finding.
Evidence Collection Process
Step 1: Capture Baseline Evidence
# Verify the dev server is running and accessible
curl -s -o /dev/null -w "%{http_code}" http://localhost:3000
# Capture full-page screenshots at standard viewports
npx playwright screenshot --full-page --viewport-size=1280,720 http://localhost:3000 qa-evidence/desktop-full.png
npx playwright screenshot --full-page --viewport-size=375,667 http://localhost:3000 qa-evidence/mobile-full.png
npx playwright screenshot --full-page --viewport-size=768,1024 http://localhost:3000 qa-evidence/tablet-full.png
# List actual project files to verify implementation exists
ls -la src/components/ 2>/dev/null || ls -la app/components/ 2>/dev/null || ls -la *.html
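The baseline step can also be driven from a script. The following is a minimal sketch, assuming Python 3.9+ and only the standard library. The names `server_status`, `screenshot_commands`, and `capture_baseline` are hypothetical and are not part of this skill's scripts. The `--full-page` flag is passed so each capture covers the whole page rather than only the viewport. Unlike the `curl` check, `urllib` follows redirects, so a redirecting server reports the final status.

```python
import subprocess
import urllib.error
import urllib.request

# Viewports taken from the baseline commands; the keys are only file-name labels.
VIEWPORTS = {
    "desktop": (1280, 720),
    "mobile": (375, 667),
    "tablet": (768, 1024),
}


def server_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with a 4xx/5xx status
    except (urllib.error.URLError, OSError):
        return 0


def screenshot_commands(url: str, out_dir: str = "qa-evidence") -> list[list[str]]:
    """Build one `npx playwright screenshot` argument list per viewport."""
    return [
        [
            "npx", "playwright", "screenshot", "--full-page",
            f"--viewport-size={width},{height}",
            url, f"{out_dir}/{name}-full.png",
        ]
        for name, (width, height) in VIEWPORTS.items()
    ]


def capture_baseline(url: str, out_dir: str = "qa-evidence") -> None:
    """Refuse to capture unless the server answers 200, then run each command."""
    status = server_status(url)
    if status != 200:
        raise RuntimeError(f"{url} returned status {status}; start the dev server first")
    for command in screenshot_commands(url, out_dir):
        subprocess.run(command, check=True)
```

Calling `capture_baseline("http://localhost:3000")` would run the three captures in sequence and stop at the first failing command, which keeps a partial evidence set from being mistaken for a complete one.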
Step 2: Compare Against Specification
- For each stated requirement, locate the corresponding visual element in the screenshot.
- Quote the exact spec text next to what you observe. Example: Spec says "three-column layout" -- screenshot shows two columns.
Step 3: Test Interactive Elements
- Run the Playwright capture script for each interactive element to get before/after screenshots.
- Log the exact selector used. Record whether the element responded as specified.
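A before/after capture for one interactive element might look like the sketch below. It assumes the Playwright Python package is installed with a Chromium browser, and it is not the capture script this guide refers to. `InteractionRecord`, `evidence_paths`, `capture_interaction`, and the `settle_ms` delay are hypothetical names and values chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class InteractionRecord:
    selector: str                     # the exact selector used, logged verbatim
    before_path: str
    after_path: str
    responded: Optional[bool] = None  # set by the reviewer after comparing screenshots


def slugify(selector: str) -> str:
    """Turn a selector into a filesystem-safe file name stem."""
    return re.sub(r"[^a-zA-Z0-9]+", "-", selector).strip("-").lower()


def evidence_paths(selector: str, out_dir: str = "qa-evidence") -> InteractionRecord:
    """Derive the before/after screenshot paths for one selector."""
    stem = slugify(selector)
    return InteractionRecord(
        selector=selector,
        before_path=f"{out_dir}/{stem}-before.png",
        after_path=f"{out_dir}/{stem}-after.png",
    )


def capture_interaction(url, selector, out_dir="qa-evidence",
                        viewport=(1280, 720), settle_ms=300):
    """Screenshot the page, click the element, then screenshot again."""
    # Imported lazily so the helpers above work without Playwright installed.
    from playwright.sync_api import sync_playwright

    record = evidence_paths(selector, out_dir)
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(
            viewport={"width": viewport[0], "height": viewport[1]}
        )
        page.goto(url)
        page.screenshot(path=record.before_path)
        page.locator(selector).click()
        # Assumption: a short fixed pause is enough for transitions to finish.
        page.wait_for_timeout(settle_ms)
        page.screenshot(path=record.after_path)
        browser.close()
    return record
```

Both screenshots use the same viewport, so any difference between them comes from the interaction rather than from layout changes. The `responded` field is deliberately left unset: whether the element behaved as specified is a judgment made against the spec, not something the capture can decide.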
Step 4: Write the Evidence Report
- Fill in every field with actual observed data. Attach screenshot paths for every finding.
- Mark each spec requirement as PASS, FAIL, or PARTIAL with a one-line explanation.
See Playwright Capture for the full screenshot utility and test spec code.
Report Structure
Report guidelines
- Every reported issue must include a screenshot file path and the exact CSS selector or data-testid used to locate the element.
- Before/after screenshots use the same viewport size (1280x720 for desktop, 375x667 for mobile).
- Test commands in the report must be copy-pasteable: running them produces the same result without modification.
- Each spec requirement is individually listed with a PASS / FAIL / PARTIAL verdict and a one-line explanation.
- No issue is reported without a reproduction step (either a Playwright test command or a manual step sequence).
- The evidence report must state the exact URL, browser, viewport, and timestamp of the test session.
- Findings reference the specification by quoting exact text, not paraphrasing.
- Report only what is observed. Do not speculate about features working "behind the scenes."
- Compare against the specification, not against an idealized version. Do not add requirements the spec never stated.
See Report Example for a complete accordion redesign evidence report.
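Several of the guidelines above can be checked mechanically before a report is submitted. This is a minimal sketch under stated assumptions: the `Finding` fields and the `missing_evidence` function are hypothetical, and any verdict other than PASS is treated as a reported issue that needs a selector and a reproduction step.

```python
from dataclasses import dataclass

VERDICTS = ("PASS", "FAIL", "PARTIAL")


@dataclass
class Finding:
    spec_quote: str         # exact text quoted from the specification
    observed: str           # what the screenshot shows
    verdict: str            # PASS, FAIL, or PARTIAL
    explanation: str        # one-line explanation of the verdict
    screenshot: str = ""    # path to the evidence screenshot
    selector: str = ""      # CSS selector or data-testid used
    reproduction: str = ""  # test command or manual step sequence


def missing_evidence(finding: Finding) -> list[str]:
    """Return the guideline violations for one finding (empty list means complete)."""
    problems = []
    if finding.verdict not in VERDICTS:
        problems.append(f"unknown verdict: {finding.verdict!r}")
    if not finding.spec_quote.strip():
        problems.append("missing quoted spec text")
    explanation = finding.explanation.strip()
    if not explanation or "\n" in explanation:
        problems.append("explanation must be exactly one line")
    if not finding.screenshot:
        problems.append("missing screenshot path")
    # Assumption: any verdict other than PASS counts as a reported issue.
    if finding.verdict != "PASS":
        if not finding.selector:
            problems.append("missing selector or data-testid")
        if not finding.reproduction:
            problems.append("missing reproduction step")
    return problems


finding = Finding(
    spec_quote='"three-column layout"',
    observed="two columns",
    verdict="FAIL",
    explanation="Screenshot shows two columns instead of three.",
    screenshot="qa-evidence/desktop-full.png",
)
print(missing_evidence(finding))
```

The check covers only the presence of evidence. Whether the quoted spec text is exact, or whether the screenshot really shows what the finding claims, still needs a human or agent review.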
Reference
Verdict Definitions
| Verdict | Meaning |
|---|---|
| PASS | Implementation matches the spec requirement exactly |
| PARTIAL | Implementation is present but deviates from spec (e.g., wrong timing, incomplete behavior) |
| FAIL | Implementation is missing or contradicts the spec requirement |
Issue Severity Levels
| Severity | Meaning |
|---|---|
| High | Blocks release; spec requirement not met, accessibility broken, or data loss risk |
| Medium | Deviates from spec but workaround exists; should fix before release |
| Low | Cosmetic or minor deviation; can ship with known issue documented |
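The severity table translates directly into a release summary. The sketch below is one possible reading of it; `release_summary` and the keys of the returned dictionary are hypothetical names, not part of this skill.

```python
from collections import Counter

SEVERITIES = ("High", "Medium", "Low")


def release_summary(severities: list[str]) -> dict:
    """Summarize a list of issue severities using the table's definitions."""
    unknown = sorted(set(severities) - set(SEVERITIES))
    if unknown:
        raise ValueError(f"unknown severity: {unknown}")
    counts = Counter(severities)
    return {
        "blocks_release": counts["High"] > 0,    # any High issue blocks release
        "fix_before_release": counts["Medium"],  # should be fixed before release
        "ship_documented": counts["Low"],        # can ship as documented known issues
    }
```

Rejecting unknown labels keeps a misspelled severity such as "Critical" or "high" from being silently counted as non-blocking.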
Standard Viewports
| Device | Width x Height |
|---|---|
| Desktop | 1280 x 720 |
| Tablet | 768 x 1024 |
| Mobile | 375 x 667 |
Scripts
scripts/capture_screenshot.py
Capture a full-page screenshot of a URL using Playwright. Auto-detects available Playwright installations (Node.js, Python, or npx). Configurable viewport size and wait time before capture. Falls back to a helpful error message if Playwright is not installed.
scripts/capture_screenshot.py http://localhost:3000 screenshot.png
scripts/capture_screenshot.py --full-page --width 1920 --height 1080 http://example.com page.png
scripts/capture_screenshot.py --wait 2000 http://localhost:3000/dashboard dashboard.png
scripts/generate_report.py
Generate a markdown evidence report template from a directory of screenshots. Includes environment info (date, browser, viewports), screenshot inventory (file name, size, dimensions), embedded screenshot references, and placeholder finding templates for the agent to fill in.
scripts/generate_report.py ./qa-evidence
scripts/generate_report.py --project "Login Page Redesign" --url http://localhost:3000 ./screenshots -o report.md
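A screenshot inventory with file name, size, and dimensions can be built with the standard library alone, because a PNG file stores its width and height in the IHDR chunk right after the 8-byte signature. The sketch below illustrates that idea; it is not the implementation of scripts/generate_report.py, and `png_dimensions` and `screenshot_inventory` are hypothetical names.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(header: bytes) -> tuple[int, int]:
    """Read (width, height) from the first 24 bytes of a PNG file."""
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Width and height are big-endian unsigned 32-bit integers at bytes 16-23.
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def screenshot_inventory(directory: str) -> list[dict]:
    """List every .png in directory with its size in bytes and pixel dimensions."""
    rows = []
    for path in sorted(Path(directory).glob("*.png")):
        with path.open("rb") as handle:
            width, height = png_dimensions(handle.read(24))
        rows.append({
            "file": path.name,
            "bytes": path.stat().st_size,
            "width": width,
            "height": height,
        })
    return rows
```

For a full-page capture the reported height is the height of the whole page, so it will usually exceed the viewport height used for the session.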