qa-browser-automation
QA Browser Automation
The agent drives Chrome MCP for live browser testing and uses four Python tools for deterministic health scoring, accessibility auditing, visual regression tracking, and report generation.
Quick Start
# Score QA findings (0-100 weighted across 10 categories)
python scripts/qa_health_scorer.py findings.json --threshold 85 --baseline .qa-baselines/latest.json --save-baseline --json
# Audit HTML for WCAG 2.1 violations
python scripts/accessibility_auditor.py page.html --level AA --json
# Track visual regressions
python scripts/visual_regression_tracker.py --init --baseline-dir ./baselines
python scripts/visual_regression_tracker.py --register ./baselines
python scripts/visual_regression_tracker.py --baseline ./baselines --current ./screenshots --threshold 5
# Generate full QA report
python scripts/test_report_generator.py session_data.json --format markdown -o report.md
Tools Overview
| Tool | Input | Output |
|---|---|---|
| qa_health_scorer.py | Findings JSON | Score 0-100, grade A-F, category breakdown, trend data |
| accessibility_auditor.py | HTML file (or stdin) | WCAG violations by level with remediation guidance |
| visual_regression_tracker.py | Baseline + current screenshot dirs | Pass/fail per page, change percentages |
| test_report_generator.py | Session data JSON | Markdown or JSON report with recommendations |
All tools support --json for machine output. Health scorer and regression tracker return exit code 1 on failure (CI-friendly).
Workflow 1: Full Application QA Sweep (11 Phases)
Phase 1-2: Pre-flight and authentication.
- Verify git status is clean. Abort if dirty.
- Create session directory: .qa-sessions/{timestamp}/
- Authenticate via Chrome MCP if needed.
Phase 3-4: Orient and explore.
- Use mcp__claude-in-chrome__read_page to build sitemap/page map.
- Navigate each route. Check read_console_messages for errors, read_network_requests for 4xx/5xx.
- Test all forms with valid data, empty submissions, and boundary values.
Phase 5: State testing.
- Verify loading states (skeleton screens, not blank), empty states (guides to first action), error states, success states, partial states.
- Four shadow paths per interaction: happy path, nil input, empty input, upstream error.
Phase 6: Cross-device and security.
- Resize to 320px, 768px, 1024px, 1440px, 1920px.
- Check touch targets (44x44px min), layout shifts.
- Verify security headers (CSP, HSTS, X-Frame-Options), cookie flags.
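The header and cookie checks in Phase 6 can be sketched as two small helpers. This is a minimal sketch, assuming the response headers have already been collected into a plain dict; the function names, the input shape, and the specific cookie flags checked (Secure, HttpOnly, SameSite) are assumptions for illustration and are not part of the four tools.

```python
# Hypothetical helpers: not part of the shipped scripts.
REQUIRED_HEADERS = (
    "Content-Security-Policy",    # CSP
    "Strict-Transport-Security",  # HSTS
    "X-Frame-Options",
)

# Assumption: the document only says "cookie flags"; these three are a common choice.
EXPECTED_COOKIE_FLAGS = ("Secure", "HttpOnly", "SameSite")


def missing_security_headers(headers: dict) -> list:
    """Return required security headers absent from a response.

    Header names are compared case-insensitively.
    """
    present = {name.lower() for name in headers}
    return [h for h in REQUIRED_HEADERS if h.lower() not in present]


def missing_cookie_flags(set_cookie: str) -> list:
    """Return expected flags missing from one Set-Cookie header value."""
    # Skip the first "name=value" pair; the rest are attributes.
    attributes = {
        part.strip().split("=", 1)[0].lower()
        for part in set_cookie.split(";")[1:]
    }
    return [f for f in EXPECTED_COOKIE_FLAGS if f.lower() not in attributes]


if __name__ == "__main__":
    sample = {
        "content-security-policy": "default-src 'self'",
        "X-Frame-Options": "DENY",
    }
    print(missing_security_headers(sample))
    print(missing_cookie_flags("sid=abc; Path=/; Secure; HttpOnly"))
```

Each missing header or flag would then be recorded as a finding with evidence, like any other issue in the sweep.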
Phase 7-8: Document and score.
- Record every finding with screenshot evidence. No finding without evidence.
- Classify by severity (P0-P4) and category (10 categories).
- Run: python scripts/qa_health_scorer.py findings.json --baseline .qa-baselines/latest.json
Phase 9: Triage and fix loop.
- P3/P4: AUTO-FIX, commit atomically, verify.
- P0/P1/P2: ASK, present evidence, propose fix, wait for approval.
- After each fix: re-run check. If fail: git revert.
- Hard stop at 50 fixes.
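The Phase 9 routing rule can be expressed as a small lookup. This is a minimal sketch, assuming each finding is a dict with a "severity" key holding a value from P0 to P4; the function names `triage_action` and `split_findings` are hypothetical.

```python
# Hypothetical helpers illustrating the Phase 9 triage rule.
AUTO_FIX = {"P3", "P4"}         # fixed without asking, one atomic commit each
ASK_FIRST = {"P0", "P1", "P2"}  # present evidence, propose fix, wait for approval


def triage_action(severity: str) -> str:
    """Return "auto-fix" or "ask" for a P0-P4 severity."""
    if severity in AUTO_FIX:
        return "auto-fix"
    if severity in ASK_FIRST:
        return "ask"
    raise ValueError(f"unknown severity: {severity!r}")


def split_findings(findings: list) -> tuple:
    """Partition findings into (auto_fix, needs_approval) lists."""
    auto_fix, needs_approval = [], []
    for finding in findings:
        if triage_action(finding["severity"]) == "auto-fix":
            auto_fix.append(finding)
        else:
            needs_approval.append(finding)
    return auto_fix, needs_approval
```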
Phase 10-11: Regression check and report.
- Re-visit fixed pages. Verify no new errors.
- Generate report: python scripts/test_report_generator.py session.json --save-baseline
Validation checkpoint: Health score >= 85. Zero P0 findings. WCAG AA >= 95%.
Workflow 2: Visual Regression Testing
# Set up baseline
python scripts/visual_regression_tracker.py --init --baseline-dir ./baselines
# Capture and register screenshots
python scripts/visual_regression_tracker.py --register ./baselines
# After changes, compare
python scripts/visual_regression_tracker.py --baseline ./baselines --current ./screenshots --threshold 5 --json
# Accept intentional changes
python scripts/visual_regression_tracker.py --update-baseline --baseline ./baselines --current ./screenshots
Pages exceeding the threshold (default 5%) are flagged as regressions. The tracker compares screenshots using SHA-256 hashing and byte-level comparison.
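One plausible reading of the hash-plus-byte comparison is sketched below: identical SHA-256 digests short-circuit to 0% change, otherwise the percentage of differing byte positions is compared against the threshold. How visual_regression_tracker.py actually combines the two is not stated here, so the function names and the exact percentage formula are assumptions.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def change_percent(baseline: bytes, current: bytes) -> float:
    """Percentage of byte positions that differ between two files.

    Assumption: bytes beyond the end of the shorter file count as changed.
    """
    if sha256_hex(baseline) == sha256_hex(current):
        return 0.0  # identical content, no need to compare bytes
    longest = max(len(baseline), len(current))
    differing = sum(a != b for a, b in zip(baseline, current))
    differing += abs(len(baseline) - len(current))
    return 100.0 * differing / longest


def is_regression(baseline: bytes, current: bytes, threshold: float = 5.0) -> bool:
    """Flag a page when its change percentage exceeds the threshold."""
    return change_percent(baseline, current) > threshold
```

Byte-level comparison of compressed image formats can report large changes for small visual differences, which is one reason to accept intentional changes with --update-baseline rather than raising the threshold.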
Workflow 3: Accessibility Audit
python scripts/accessibility_auditor.py page.html --level AA --json
curl -s https://example.com | python scripts/accessibility_auditor.py - --level AAA
What gets checked by level:
- A (Must Fix): Alt text, page language, form labels, headings, duplicate IDs, autoplay media
- AA (Should Fix): Color contrast (4.5:1 text, 3:1 large), heading hierarchy, focus visible, error identification
- AAA (Nice to Have): Enhanced contrast (7:1), extended audio, reading level
Each violation includes: WCAG criterion, severity, element selector, and remediation guidance.
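Two of the checks listed above can be sketched with the standard library: a few Level A structural checks (missing alt, missing page language, duplicate IDs) and the AA contrast ratio. The contrast formula follows the WCAG 2.1 definitions of relative luminance and contrast ratio; the class and function names are hypothetical, and this is not the implementation inside accessibility_auditor.py.

```python
from html.parser import HTMLParser


class StructuralChecker(HTMLParser):
    """Collects a few Level A issues while parsing."""

    def __init__(self) -> None:
        super().__init__()
        self.violations = []
        self._seen_ids = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "html" and not attr.get("lang"):
            self.violations.append("html: missing lang attribute")
        if tag == "img" and "alt" not in attr:
            self.violations.append("img: missing alt attribute")
        element_id = attr.get("id")
        if element_id:
            if element_id in self._seen_ids:
                self.violations.append(f"#{element_id}: duplicate id")
            self._seen_ids.add(element_id)


def audit_structure(html: str) -> list:
    """Return structural violations found in an HTML string."""
    checker = StructuralChecker()
    checker.feed(html)
    checker.close()
    return checker.violations


def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2.1)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple) -> float:
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(foreground: tuple, background: tuple) -> float:
    """WCAG contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    lums = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lums[0] + 0.05) / (lums[1] + 0.05)
```

Normal text passes AA when `contrast_ratio(...) >= 4.5`, and large text when it is at least 3.0.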
Testing Tiers
| Tier | Duration | Scope |
|---|---|---|
| Quick | 30s | Console errors, broken links, basic a11y, mobile resize |
| Standard | 2-5 min | + Top 10 routes, forms, contrast, Core Web Vitals |
| Deep | 10-20 min | + Full sitemap, state testing, WCAG AA, performance, visual regression, security headers |
| Exhaustive | 30+ min | + Every element, WCAG AAA, all pages performance, 5 breakpoints, auth edge cases, memory leaks |
Health Scoring System
10 weighted categories, score 0-100:
| Category | Weight | Measures |
|---|---|---|
| Functional | 18% | Forms, CRUD, navigation flows |
| Accessibility | 13% | WCAG compliance, keyboard nav |
| Console Errors | 12% | JS errors, unhandled rejections |
| UX Flow | 12% | Logical navigation, clear feedback |
| Performance | 12% | Core Web Vitals within thresholds |
| Visual Consistency | 10% | Layout shifts, alignment, z-index |
| Broken Links | 8% | HTTP 4xx/5xx, dead anchors |
| Content Quality | 5% | Spelling, placeholder text, truncation |
| Security Headers | 5% | CSP, HSTS, cookie flags |
| Mobile Responsive | 5% | Breakpoints, touch targets, no h-scroll |
Severity deductions: P0: -30, P1: -18, P2: -10, P3: -4, P4: -1.
Grades: A (90-100), B (80-89), C (70-79), D (60-69), F (0-59).
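How the weights and severity deductions combine is not spelled out above. The sketch below shows one plausible reading: every category starts at 100, each finding subtracts its severity deduction from its category (floored at 0), and the overall score is the weighted average. The category key names and this combination rule are assumptions; only the weights, deductions, grade bands, and the P3/functional defaults come from this document.

```python
# Weights (percent) from the category table; key names are hypothetical.
WEIGHTS = {
    "functional": 18, "accessibility": 13, "console_errors": 12,
    "ux_flow": 12, "performance": 12, "visual_consistency": 10,
    "broken_links": 8, "content_quality": 5, "security_headers": 5,
    "mobile_responsive": 5,
}
DEDUCTIONS = {"P0": 30, "P1": 18, "P2": 10, "P3": 4, "P4": 1}


def health_score(findings: list) -> float:
    """Weighted 0-100 score under the assumed per-category deduction rule."""
    category_scores = {name: 100 for name in WEIGHTS}
    for finding in findings:
        # Missing keys fall back to P3/functional, as noted in Troubleshooting.
        category = finding.get("category", "functional")
        severity = finding.get("severity", "P3")
        remaining = category_scores[category] - DEDUCTIONS[severity]
        category_scores[category] = max(0, remaining)
    total = sum(category_scores[name] * weight for name, weight in WEIGHTS.items())
    return total / 100


def grade(score: float) -> str:
    """Map a score to the A-F bands."""
    for floor, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= floor:
            return letter
    return "F"
```

Under this reading a single P0 functional finding still scores 94.6, which illustrates why the validation checkpoint requires zero P0 findings separately from the score threshold.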
Safety Controls
- Clean working tree required -- abort if git status is dirty.
- Max 50 fixes per session -- hard stop.
- Risk accumulator -- component (+5), style (+2), config (+8), revert (+15). Stop at 25% of budget.
- WTF heuristic -- 3 consecutive fix verification failures = stop entirely.
- Atomic commits -- one fix = one commit: fix(qa): [P{severity}] {description}
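The stop conditions above can be tracked with a small session object. This is a minimal sketch under two labeled assumptions: "25% of budget" is read as 25 points of a 100-point risk budget, and a failed verification adds the revert cost because the fix is reverted. The class, method, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

RISK_POINTS = {"component": 5, "style": 2, "config": 8, "revert": 15}
MAX_FIXES = 50
MAX_CONSECUTIVE_FAILURES = 3
RISK_LIMIT = 25  # Assumption: 25% of an assumed 100-point budget.


@dataclass
class FixSession:
    fixes: int = 0
    risk: int = 0
    consecutive_failures: int = 0

    def record_fix(self, kind: str, verified: bool) -> None:
        """Record one attempted fix of the given kind and its verification result."""
        self.fixes += 1
        self.risk += RISK_POINTS[kind]
        if verified:
            self.consecutive_failures = 0
        else:
            # Assumption: a failed fix is reverted, which adds the revert cost.
            self.consecutive_failures += 1
            self.risk += RISK_POINTS["revert"]

    def stop_reason(self) -> Optional[str]:
        """Return why the session must stop, or None to continue."""
        if self.fixes >= MAX_FIXES:
            return "fix limit reached"
        if self.consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
            return "3 consecutive verification failures"
        if self.risk >= RISK_LIMIT:
            return "risk limit reached"
        return None


def commit_message(severity_level: int, description: str) -> str:
    """Format an atomic commit message as fix(qa): [P{severity}] {description}."""
    return f"fix(qa): [P{severity_level}] {description}"
```

Checking `stop_reason()` before each new fix keeps the loop from starting work it would have to abandon.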
Troubleshooting
| Problem | Cause | Solution |
|---|---|---|
| Scorer exits code 1 with no errors | Score below --threshold (default 70) | Check score in output; lower threshold or fix findings |
| Auditor reports parse-error | Malformed HTML | Verify file is complete; check curl is not returning a redirect |
| Regression tracker 100% change on all pages | Baseline manifest empty | Run --init then --register before comparing |
| Findings default to P3/functional | Missing severity or category keys | Include both keys in each finding dict |
| Chrome MCP returns stale content after SPA nav | DOM updated without full page load | Wait for transition, call read_page again |
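To catch the silent P3/functional default before scoring, findings can be checked for the two required keys. This is a minimal sketch, assuming the findings have been loaded as a list of dicts; the top-level layout of findings.json is not documented here, and the function name `incomplete_findings` is hypothetical.

```python
REQUIRED_KEYS = ("severity", "category")


def incomplete_findings(findings: list) -> list:
    """Return (index, missing_keys) for each finding lacking a required key."""
    problems = []
    for index, finding in enumerate(findings):
        missing = [key for key in REQUIRED_KEYS if key not in finding]
        if missing:
            problems.append((index, missing))
    return problems


if __name__ == "__main__":
    sample = [
        {"severity": "P1", "category": "functional"},
        {"severity": "P2"},  # would silently default to "functional"
    ]
    for index, missing in incomplete_findings(sample):
        print(f"finding {index} is missing: {', '.join(missing)}")
```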
References
| Guide | Path |
|---|---|
| Browser Testing Methodology | references/browser_testing_methodology.md |
| WCAG Compliance Guide | references/wcag_compliance_guide.md |
| Performance Benchmarks | references/performance_benchmarks.md |
Integration Points
| Skill | Integration |
|---|---|
| code-reviewer | Health score and findings in PR review context |
| senior-frontend | Visual regression baselines align with component library |
| senior-devops | Health score gates CI/CD via exit code |
| senior-secops | Security header findings escalate to security review |
| incident-commander | P0 findings trigger incident response |
Last Updated: April 2026
Version: 2.1.0