QA Browser Automation

The agent drives Chrome MCP for live browser testing and uses four Python tools for deterministic health scoring, accessibility auditing, visual regression tracking, and report generation.

Quick Start

# Score QA findings (0-100 weighted across 10 categories)
python scripts/qa_health_scorer.py findings.json --threshold 85 --baseline .qa-baselines/latest.json --save-baseline --json

# Audit HTML for WCAG 2.1 violations
python scripts/accessibility_auditor.py page.html --level AA --json

# Track visual regressions
python scripts/visual_regression_tracker.py --init --baseline-dir ./baselines
python scripts/visual_regression_tracker.py --register ./baselines
python scripts/visual_regression_tracker.py --baseline ./baselines --current ./screenshots --threshold 5

# Generate full QA report
python scripts/test_report_generator.py session_data.json --format markdown -o report.md

Tools Overview

Tool	Input	Output
`qa_health_scorer.py`	Findings JSON	Score 0-100, grade A-F, category breakdown, trend data
`accessibility_auditor.py`	HTML file (or stdin)	WCAG violations by level with remediation guidance
`visual_regression_tracker.py`	Baseline + current screenshot dirs	Pass/fail per page, change percentages
`test_report_generator.py`	Session data JSON	Markdown or JSON report with recommendations

All tools support --json for machine-readable output. The health scorer and the regression tracker exit with code 1 on failure, so either can gate a CI job.

Workflow 1: Full Application QA Sweep (11 Phases)

Phase 1-2: Pre-flight and authentication.

Verify git status is clean. Abort if dirty.
Create session directory: .qa-sessions/{timestamp}/
Authenticate via Chrome MCP if needed.

Phase 3-4: Orient and explore.

Use mcp__claude-in-chrome__read_page to build sitemap/page map.
Navigate each route. Check read_console_messages for errors, read_network_requests for 4xx/5xx.
Test all forms with valid data, empty submissions, and boundary values.

Phase 5: State testing.

Verify loading states (skeleton screens, not blank), empty states (guides to first action), error states, success states, partial states.
Exercise four shadow paths per interaction: happy path, nil input, empty input, and upstream error.

Phase 6: Cross-device and security.

Resize to 320px, 768px, 1024px, 1440px, 1920px.
Check touch targets (44x44px min), layout shifts.
Verify security headers (CSP, HSTS, X-Frame-Options), cookie flags.
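
The header and cookie checks above can be sketched as a pair of pure functions over response data that has already been captured (for example from read_network_requests). This is a minimal sketch, not part of the shipped tools: the function names are hypothetical, and treating Secure and HttpOnly as the required "cookie flags" is an assumption, since the checklist does not name them.

```python
# Headers named in the Phase 6 checklist, as lowercase HTTP header names.
REQUIRED_HEADERS = (
    "content-security-policy",    # CSP
    "strict-transport-security",  # HSTS
    "x-frame-options",
)


def missing_security_headers(headers):
    """Return the required header names that are absent from `headers`."""
    present = {name.lower() for name in headers}
    return [name for name in REQUIRED_HEADERS if name not in present]


def insecure_cookies(set_cookie_values):
    """Return Set-Cookie values lacking Secure or HttpOnly (assumed flags)."""
    flagged = []
    for raw in set_cookie_values:
        # Skip the leading name=value pair; keep only attribute names.
        attrs = {part.strip().lower().split("=")[0] for part in raw.split(";")[1:]}
        if not {"secure", "httponly"} <= attrs:
            flagged.append(raw)
    return flagged
```

Each returned item would become one finding in the Security Headers category, with the captured response as evidence.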

Phase 7-8: Document and score.

Record every finding with screenshot evidence. No finding without evidence.
Classify by severity (P0-P4) and category (10 categories).
Run: python scripts/qa_health_scorer.py findings.json --baseline .qa-baselines/latest.json

Phase 9: Triage and fix loop.

P3/P4: AUTO-FIX, commit atomically, verify.
P0/P1/P2: ASK, present evidence, propose fix, wait for approval.
After each fix, re-run the check that caught the finding. If it still fails, git revert the fix commit.
Hard stop at 50 fixes.
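
The triage rules above can be sketched as a small loop. This is an illustration under stated assumptions, not the agent's actual implementation: `apply_fix`, `verify`, and `revert` are hypothetical callables supplied by the caller, and a finding with no severity is routed to the approval queue as a conservative choice.

```python
AUTO_FIX = {"P3", "P4"}  # fixed without asking, per Phase 9


def run_fix_loop(findings, apply_fix, verify, revert, max_fixes=50):
    """Auto-fix P3/P4 findings; return (fixed, reverted, needs_approval)."""
    # Everything that is not P3/P4 (including a missing severity) waits for approval.
    needs_approval = [f for f in findings if f.get("severity") not in AUTO_FIX]
    auto = [f for f in findings if f.get("severity") in AUTO_FIX]

    fixed, reverted = [], []
    for finding in auto[:max_fixes]:  # hard stop at 50 fix attempts
        apply_fix(finding)            # expected to commit atomically
        if verify(finding):           # re-run the check that caught it
            fixed.append(finding)
        else:
            revert(finding)           # e.g. git revert of the fix commit
            reverted.append(finding)
    return fixed, reverted, needs_approval
```

Passing the side-effecting steps in as callables keeps the routing logic testable without a browser or a git repository.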

Phase 10-11: Regression check and report.

Re-visit fixed pages. Verify no new errors.
Generate report: python scripts/test_report_generator.py session.json --save-baseline

Validation checkpoint: Health score >= 85. Zero P0 findings. WCAG AA >= 95%.
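
The checkpoint can be expressed as a single predicate. A minimal sketch, assuming "WCAG AA >= 95%" means a pass rate expressed as a percentage; the function name and its parameters are hypothetical.

```python
def passes_checkpoint(health_score, findings, wcag_aa_percent):
    """True when all three validation-checkpoint conditions hold."""
    has_p0 = any(f.get("severity") == "P0" for f in findings)
    return health_score >= 85 and not has_p0 and wcag_aa_percent >= 95
```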

Workflow 2: Visual Regression Testing

# Set up baseline
python scripts/visual_regression_tracker.py --init --baseline-dir ./baselines
# Capture and register screenshots
python scripts/visual_regression_tracker.py --register ./baselines
# After changes, compare
python scripts/visual_regression_tracker.py --baseline ./baselines --current ./screenshots --threshold 5 --json
# Accept intentional changes
python scripts/visual_regression_tracker.py --update-baseline --baseline ./baselines --current ./screenshots

Pages whose change percentage exceeds the threshold (default 5%) are flagged as regressions. The tracker compares screenshots using SHA-256 hashing and byte-level comparison.
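
To make the comparison concrete, here is one way a hash-then-bytes check could work. It is a sketch rather than the tracker's actual code: the function names are hypothetical, and defining the change percentage as the share of differing byte positions (with any length difference counted as changed) is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def byte_change_percent(baseline, current):
    """Percentage of byte positions that differ between two byte strings."""
    longest = max(len(baseline), len(current))
    if longest == 0:
        return 0.0
    differing = sum(a != b for a, b in zip(baseline, current))
    differing += abs(len(baseline) - len(current))  # extra bytes count as changed
    return 100.0 * differing / longest


def compare_page(baseline_path, current_path, threshold=5.0):
    """Cheap hash check first; fall back to byte comparison on mismatch."""
    if sha256_of(baseline_path) == sha256_of(current_path):
        return {"change_percent": 0.0, "regression": False}
    pct = byte_change_percent(
        Path(baseline_path).read_bytes(), Path(current_path).read_bytes()
    )
    return {"change_percent": pct, "regression": pct > threshold}
```

Byte-level comparison of compressed image formats is sensitive to encoder differences, so screenshots are best captured with the same browser and settings as the baseline.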

Workflow 3: Accessibility Audit

python scripts/accessibility_auditor.py page.html --level AA --json
curl -s https://example.com | python scripts/accessibility_auditor.py - --level AAA

What gets checked by level:

A (Must Fix): Alt text, page language, form labels, headings, duplicate IDs, autoplay media
AA (Should Fix): Color contrast (4.5:1 text, 3:1 large), heading hierarchy, focus visible, error identification
AAA (Nice to Have): Enhanced contrast (7:1), extended audio, reading level

Each violation includes: WCAG criterion, severity, element selector, and remediation guidance.
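
The contrast thresholds listed above come from the WCAG 2.1 contrast-ratio formula, which is worth seeing worked out. The sketch below follows the published definition of relative luminance; the function names are illustrative and are not taken from accessibility_auditor.py.

```python
def _linearize(channel_8bit):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.1 definition)."""
    c = channel_8bit / 255.0
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) tuple with 0-255 channels."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """Contrast ratio between two colors, from 1.0 up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_contrast(fg, bg, level="AA", large_text=False):
    """Check a color pair against the AA (4.5:1, 3:1 large) or AAA (7:1) minimum."""
    minimums = {
        "AA": 3.0 if large_text else 4.5,
        # WCAG 1.4.6 allows 4.5:1 for large text at AAA.
        "AAA": 4.5 if large_text else 7.0,
    }
    return contrast_ratio(fg, bg) >= minimums[level]
```

For example, #767676 text on a white background passes AA for body text, while the slightly lighter #777777 passes only at the large-text threshold.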

Testing Tiers

Tier	Duration	Scope
Quick	30s	Console errors, broken links, basic a11y, mobile resize
Standard	2-5 min	+ Top 10 routes, forms, contrast, Core Web Vitals
Deep	10-20 min	+ Full sitemap, state testing, WCAG AA, performance, visual regression, security headers
Exhaustive	30+ min	+ Every element, WCAG AAA, all pages performance, 5 breakpoints, auth edge cases, memory leaks

Health Scoring System

10 weighted categories, score 0-100:

Category	Weight	Measures
Functional	18%	Forms, CRUD, navigation flows
Accessibility	13%	WCAG compliance, keyboard nav
Console Errors	12%	JS errors, unhandled rejections
UX Flow	12%	Logical navigation, clear feedback
Performance	12%	Core Web Vitals within thresholds
Visual Consistency	10%	Layout shifts, alignment, z-index
Broken Links	8%	HTTP 4xx/5xx, dead anchors
Content Quality	5%	Spelling, placeholder text, truncation
Security Headers	5%	CSP, HSTS, cookie flags
Mobile Responsive	5%	Breakpoints, touch targets, no h-scroll

Severity deductions: P0: -30, P1: -18, P2: -10, P3: -4, P4: -1.

Grades: A (90-100), B (80-89), C (70-79), D (60-69), F (0-59).
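
The tables above give weights and deductions but not how they combine, so the following is one plausible reading rather than the scorer's actual algorithm: each category starts at 100, loses the listed deduction per finding (floored at 0), and the total is the weighted average. The category keys are hypothetical slugs; only the P3/functional defaults and the grade bands come from this document.

```python
# Weights from the category table (sum to 100). Keys are assumed slugs.
WEIGHTS = {
    "functional": 18, "accessibility": 13, "console_errors": 12,
    "ux_flow": 12, "performance": 12, "visual_consistency": 10,
    "broken_links": 8, "content_quality": 5, "security_headers": 5,
    "mobile_responsive": 5,
}
DEDUCTIONS = {"P0": 30, "P1": 18, "P2": 10, "P3": 4, "P4": 1}


def health_score(findings):
    """Weighted 0-100 score under the per-category deduction assumption."""
    category_scores = {name: 100 for name in WEIGHTS}
    for finding in findings:
        # Missing or unknown keys fall back to P3/functional.
        category = finding.get("category", "functional")
        if category not in WEIGHTS:
            category = "functional"
        deduction = DEDUCTIONS.get(finding.get("severity", "P3"), DEDUCTIONS["P3"])
        category_scores[category] = max(0, category_scores[category] - deduction)
    return sum(category_scores[c] * w for c, w in WEIGHTS.items()) / 100


def grade(score):
    """Map a 0-100 score to the letter grades listed above."""
    for letter, floor in (("A", 90), ("B", 80), ("C", 70), ("D", 60)):
        if score >= floor:
            return letter
    return "F"
```

Under this reading a single P0 in the Functional category drops that category to 70 and the overall score to 94.6, which is still an A; the zero-P0 rule in the validation checkpoint is what keeps such a build from passing.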

Safety Controls

Clean working tree required -- abort if git status dirty.
Max 50 fixes per session -- hard stop.
Risk accumulator -- component (+5), style (+2), config (+8), revert (+15). Stop at 25% of budget.
WTF heuristic -- 3 consecutive fix verification failures = stop entirely.
Atomic commits -- one fix = one commit: fix(qa): [P{severity}] {description}
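
The controls above can be combined into a small session guard. This is a sketch under several assumptions: the class and method names are hypothetical, a failed verification is assumed to imply a revert (and its +15), and `risk_limit` is a caller-supplied placeholder because the document does not define the budget behind "25% of budget".

```python
RISK_POINTS = {"component": 5, "style": 2, "config": 8, "revert": 15}


class SessionGuard:
    """Tracks fix count, consecutive failures, and accumulated risk."""

    def __init__(self, risk_limit, max_fixes=50, max_consecutive_failures=3):
        self.risk_limit = risk_limit  # assumption: the budget is not defined in this doc
        self.max_fixes = max_fixes
        self.max_consecutive_failures = max_consecutive_failures
        self.fixes = 0
        self.risk = 0
        self.consecutive_failures = 0

    def record_fix(self, kind, verified):
        """Record one fix attempt of `kind` ('component', 'style', or 'config')."""
        self.fixes += 1
        self.risk += RISK_POINTS[kind]
        if verified:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            self.risk += RISK_POINTS["revert"]  # failed fixes get reverted

    def should_stop(self):
        """Return a reason string when a control trips, else None."""
        if self.fixes >= self.max_fixes:
            return "max fixes"
        if self.consecutive_failures >= self.max_consecutive_failures:
            return "consecutive failures"
        if self.risk >= self.risk_limit:
            return "risk limit"
        return None


def commit_message(severity, description):
    """Format an atomic commit message, e.g. severity=3 gives '[P3]'."""
    return f"fix(qa): [P{severity}] {description}"
```

Calling `should_stop()` before each fix attempt gives the loop a single place to enforce all three limits.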

Troubleshooting

Problem	Cause	Solution
Scorer exits with code 1 and reports no errors	Score below `--threshold` (default 70)	Check the score in the output; fix findings or lower the threshold
Auditor reports `parse-error`	Malformed HTML	Verify the file is complete; check that curl is not returning a redirect
Regression tracker 100% change on all pages	Baseline manifest empty	Run `--init` then `--register` before comparing
Findings default to P3/functional	Missing `severity` or `category` keys	Include both keys in each finding dict
Chrome MCP returns stale content after SPA nav	DOM updated without full page load	Wait for transition, call `read_page` again

References

Guide	Path
Browser Testing Methodology	`references/browser_testing_methodology.md`
WCAG Compliance Guide	`references/wcag_compliance_guide.md`
Performance Benchmarks	`references/performance_benchmarks.md`

Integration Points

Skill	Integration
`code-reviewer`	Health score and findings in PR review context
`senior-frontend`	Visual regression baselines align with component library
`senior-devops`	Health score gates CI/CD via exit code
`senior-secops`	Security header findings escalate to security review
`incident-commander`	P0 findings trigger incident response

Last Updated: April 2026
Version: 2.1.0
