QA Browser Automation

The agent drives Chrome MCP for live browser testing and uses four Python tools for deterministic health scoring, accessibility auditing, visual regression tracking, and report generation.


Quick Start

# Score QA findings (0-100 weighted across 10 categories)
python scripts/qa_health_scorer.py findings.json --threshold 85 --baseline .qa-baselines/latest.json --save-baseline --json

# Audit HTML for WCAG 2.1 violations
python scripts/accessibility_auditor.py page.html --level AA --json

# Track visual regressions
python scripts/visual_regression_tracker.py --init --baseline-dir ./baselines
python scripts/visual_regression_tracker.py --register ./baselines
python scripts/visual_regression_tracker.py --baseline ./baselines --current ./screenshots --threshold 5

# Generate full QA report
python scripts/test_report_generator.py session_data.json --format markdown -o report.md

Tools Overview

Tool | Input | Output
qa_health_scorer.py | Findings JSON | Score 0-100, grade A-F, category breakdown, trend data
accessibility_auditor.py | HTML file (or stdin) | WCAG violations by level with remediation guidance
visual_regression_tracker.py | Baseline + current screenshot dirs | Pass/fail per page, change percentages
test_report_generator.py | Session data JSON | Markdown or JSON report with recommendations

All tools support --json for machine output. Health scorer and regression tracker return exit code 1 on failure (CI-friendly).
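
Because failure is signalled through the exit code, a CI step only needs to run the tool and check the return code. A minimal sketch, assuming the tools are invoked as subprocesses; the helper name `run_gate` is hypothetical and not part of the skill:

```python
import subprocess
import sys


def run_gate(command: list[str]) -> bool:
    """Run a QA tool and report whether it passed (exit code 0)."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the tool's output so the CI log shows why the gate failed.
        print(result.stdout, end="")
        print(result.stderr, end="", file=sys.stderr)
    return result.returncode == 0
```

For example, `run_gate([sys.executable, "scripts/qa_health_scorer.py", "findings.json", "--threshold", "85", "--json"])` would return False when the scorer exits with code 1.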


Workflow 1: Full Application QA Sweep (11 Phases)

Phase 1-2: Pre-flight and authentication.

  • Verify git status is clean. Abort if dirty.
  • Create session directory: .qa-sessions/{timestamp}/
  • Authenticate via Chrome MCP if needed.

Phase 3-4: Orient and explore.

  • Use mcp__claude-in-chrome__read_page to build sitemap/page map.
  • Navigate each route. Check read_console_messages for errors, read_network_requests for 4xx/5xx.
  • Test all forms with valid data, empty submissions, and boundary values.

Phase 5: State testing.

  • Verify loading states (skeleton screens, not blank), empty states (guides to first action), error states, success states, partial states.
  • Four shadow paths per interaction: happy path, nil input, empty input, error upstream.

Phase 6: Cross-device and security.

  • Resize to 320px, 768px, 1024px, 1440px, 1920px.
  • Check touch targets (44x44px min), layout shifts.
  • Verify security headers (CSP, HSTS, X-Frame-Options), cookie flags.
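
The header and cookie checks in this phase can be sketched as plain functions over captured response headers. This is an illustration only: the function names are hypothetical, and treating Secure, HttpOnly, and SameSite as the cookie flags of interest is an assumption, since the phase does not list them.

```python
# Header names for CSP, HSTS, and X-Frame-Options, lowercased for comparison.
REQUIRED_HEADERS = {
    "content-security-policy",
    "strict-transport-security",
    "x-frame-options",
}

# Assumed set of cookie flags to look for; not specified by the skill.
EXPECTED_COOKIE_FLAGS = ("secure", "httponly", "samesite")


def missing_security_headers(headers: dict[str, str]) -> list[str]:
    """Return the required security headers absent from a response."""
    present = {name.lower() for name in headers}
    return sorted(REQUIRED_HEADERS - present)


def missing_cookie_flags(set_cookie: str) -> list[str]:
    """Return expected flags absent from one Set-Cookie header value."""
    # The first segment is name=value; the remaining segments are attributes.
    attributes = {
        part.strip().split("=", 1)[0].lower()
        for part in set_cookie.split(";")[1:]
    }
    return [flag for flag in EXPECTED_COOKIE_FLAGS if flag not in attributes]
```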

Phase 7-8: Document and score.

  • Record every finding with screenshot evidence. No finding without evidence.
  • Classify by severity (P0-P4) and category (10 categories).
  • Run: python scripts/qa_health_scorer.py findings.json --baseline .qa-baselines/latest.json
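
Before scoring, it can help to confirm that every finding carries both a severity and a category. The sketch below assumes findings are a list of dicts; the `severity` and `category` keys come from the Troubleshooting notes, while the `title` and `evidence` keys, the category spelling, and the function name are hypothetical.

```python
REQUIRED_KEYS = ("severity", "category")


def incomplete_findings(findings: list[dict]) -> list[tuple[int, list[str]]]:
    """Return (index, missing keys) for each finding lacking a required key."""
    problems = []
    for index, finding in enumerate(findings):
        missing = [key for key in REQUIRED_KEYS if key not in finding]
        if missing:
            problems.append((index, missing))
    return problems


# Hypothetical findings: the second one omits both required keys.
example_findings = [
    {"severity": "P1", "category": "functional",
     "title": "Checkout form accepts empty email",
     "evidence": "screenshots/checkout-empty-email.png"},
    {"title": "Footer link returns 404"},
]
```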

Phase 9: Triage and fix loop.

  • P3/P4: AUTO-FIX, commit atomically, verify.
  • P0/P1/P2: ASK, present evidence, propose fix, wait for approval.
  • After each fix: re-run check. If fail: git revert.
  • Hard stop at 50 fixes.
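
The triage rule above reduces to a small decision function. A minimal sketch; the function name and the returned action strings are illustrative, not part of the skill:

```python
MAX_FIXES = 50                   # hard stop from the fix loop
AUTO_FIX = {"P3", "P4"}          # fixed without asking
ASK_FIRST = {"P0", "P1", "P2"}   # need approval before fixing


def triage(severity: str, fixes_applied: int) -> str:
    """Decide what to do with the next finding."""
    if fixes_applied >= MAX_FIXES:
        return "stop"
    if severity in AUTO_FIX:
        return "auto-fix"
    if severity in ASK_FIRST:
        return "ask"
    raise ValueError(f"unknown severity: {severity!r}")
```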

Phase 10-11: Regression check and report.

  • Re-visit fixed pages. Verify no new errors.
  • Generate report: python scripts/test_report_generator.py session.json --save-baseline

Validation checkpoint: Health score >= 85. Zero P0 findings. WCAG AA >= 95%.
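
As a sketch, the checkpoint is a conjunction of three conditions. The function and parameter names are hypothetical, and the WCAG AA figure is assumed to be a pass rate expressed as a percentage:

```python
def passes_checkpoint(health_score: float, p0_count: int,
                      wcag_aa_pass_rate: float) -> bool:
    """True only when all three checkpoint conditions hold."""
    return (
        health_score >= 85
        and p0_count == 0
        and wcag_aa_pass_rate >= 95.0
    )
```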


Workflow 2: Visual Regression Testing

# Set up baseline
python scripts/visual_regression_tracker.py --init --baseline-dir ./baselines
# Capture and register screenshots
python scripts/visual_regression_tracker.py --register ./baselines
# After changes, compare
python scripts/visual_regression_tracker.py --baseline ./baselines --current ./screenshots --threshold 5 --json
# Accept intentional changes
python scripts/visual_regression_tracker.py --update-baseline --baseline ./baselines --current ./screenshots

Pages whose change percentage exceeds the threshold (default 5%) are flagged as regressions. The tracker compares screenshots using SHA-256 hashing and byte-level comparison.
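
To illustrate how hashing and byte-level comparison can yield a change percentage, here is a minimal sketch. It is not the tracker's actual implementation: it assumes the percentage is the share of differing byte positions, with any length difference counted as changed bytes.

```python
import hashlib


def change_percentage(baseline: bytes, current: bytes) -> float:
    """Share of byte positions that differ, as a percentage."""
    # Identical hashes mean identical content, so skip the byte scan.
    if hashlib.sha256(baseline).digest() == hashlib.sha256(current).digest():
        return 0.0
    longest = max(len(baseline), len(current))
    differing = sum(a != b for a, b in zip(baseline, current))
    differing += abs(len(baseline) - len(current))  # extra or missing bytes
    return 100.0 * differing / longest


def is_regression(percent_changed: float, threshold: float = 5.0) -> bool:
    """A page regresses when its change exceeds the threshold."""
    return percent_changed > threshold
```

Byte-level comparison of compressed image formats such as PNG can report large changes for small visual differences, so the threshold may need tuning per project.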


Workflow 3: Accessibility Audit

python scripts/accessibility_auditor.py page.html --level AA --json
curl -s https://example.com | python scripts/accessibility_auditor.py - --level AAA

What gets checked by level:

  • A (Must Fix): Alt text, page language, form labels, headings, duplicate IDs, autoplay media
  • AA (Should Fix): Color contrast (4.5:1 text, 3:1 large), heading hierarchy, focus visible, error identification
  • AAA (Nice to Have): Enhanced contrast (7:1), extended audio, reading level

Each violation includes: WCAG criterion, severity, element selector, and remediation guidance.
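
The contrast thresholds above (4.5:1, 3:1, 7:1) refer to the WCAG contrast ratio, which is computed from the relative luminance of the two colors. A sketch of that formula as defined in WCAG 2.1, for six-digit hex colors; the function names are illustrative and this is not the auditor's own code:

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear value (WCAG 2.1)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a color given as '#rrggbb'."""
    digits = hex_color.lstrip("#")
    r, g, b = (int(digits[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical) up to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)
```

For instance, `contrast_ratio("#767676", "#ffffff")` comes out just above 4.5, so that gray passes AA for normal text on white, while `#777777` falls just short.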


Testing Tiers

Tier | Duration | Scope
Quick | 30s | Console errors, broken links, basic a11y, mobile resize
Standard | 2-5 min | Quick, plus: top 10 routes, forms, contrast, Core Web Vitals
Deep | 10-20 min | Standard, plus: full sitemap, state testing, WCAG AA, performance, visual regression, security headers
Exhaustive | 30+ min | Deep, plus: every element, WCAG AAA, all pages performance, 5 breakpoints, auth edge cases, memory leaks

Health Scoring System

10 weighted categories, score 0-100:

Category | Weight | Measures
Functional | 18% | Forms, CRUD, navigation flows
Accessibility | 13% | WCAG compliance, keyboard nav
Console Errors | 12% | JS errors, unhandled rejections
UX Flow | 12% | Logical navigation, clear feedback
Performance | 12% | Core Web Vitals within thresholds
Visual Consistency | 10% | Layout shifts, alignment, z-index
Broken Links | 8% | HTTP 4xx/5xx, dead anchors
Content Quality | 5% | Spelling, placeholder text, truncation
Security Headers | 5% | CSP, HSTS, cookie flags
Mobile Responsive | 5% | Breakpoints, touch targets, no h-scroll

Severity deductions: P0: -30, P1: -18, P2: -10, P3: -4, P4: -1.

Grades: A (90-100), B (80-89), C (70-79), D (60-69), F (0-59).
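
One way the weights, deductions, and grades could combine is sketched below. The weights, deductions, and grade bands are taken from this section; how the scorer actually applies deductions is not stated here, so the combination rule (each category starts at 100, loses its findings' deductions, floors at 0, then contributes by weight) and the category key spellings are assumptions.

```python
# Weights in percent; key spellings are illustrative.
WEIGHTS = {
    "functional": 18, "accessibility": 13, "console_errors": 12,
    "ux_flow": 12, "performance": 12, "visual_consistency": 10,
    "broken_links": 8, "content_quality": 5, "security_headers": 5,
    "mobile_responsive": 5,
}
DEDUCTIONS = {"P0": 30, "P1": 18, "P2": 10, "P3": 4, "P4": 1}


def health_score(findings: list[dict]) -> float:
    """Weighted 0-100 score under the assumed per-category deduction rule."""
    category_scores = {name: 100 for name in WEIGHTS}
    for finding in findings:
        category = finding["category"]
        penalty = DEDUCTIONS[finding["severity"]]
        category_scores[category] = max(0, category_scores[category] - penalty)
    return sum(WEIGHTS[name] * score for name, score in category_scores.items()) / 100


def grade(score: float) -> str:
    """Map a score to the A-F bands."""
    for letter, floor in (("A", 90), ("B", 80), ("C", 70), ("D", 60)):
        if score >= floor:
            return letter
    return "F"
```

Under these assumptions, a single P0 functional finding lowers the functional category to 70 and the overall score to 94.6.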


Safety Controls

  • Clean working tree required -- abort if git status dirty.
  • Max 50 fixes per session -- hard stop.
  • Risk accumulator -- component (+5), style (+2), config (+8), revert (+15). Stop at 25% of budget.
  • WTF heuristic -- 3 consecutive fix verification failures = stop entirely.
  • Atomic commits -- one fix = one commit: fix(qa): [P{severity}] {description}
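
Two of these controls are easy to sketch: the commit message format and the stop after three consecutive verification failures. The names below are hypothetical, and the severity is assumed to be passed as the number 0-4 so that the message reads `[P3]`. The risk accumulator is left out because its budget is not specified here.

```python
MAX_CONSECUTIVE_FAILURES = 3


def commit_message(severity_level: int, description: str) -> str:
    """Format one atomic fix commit, e.g. 'fix(qa): [P3] ...'."""
    return f"fix(qa): [P{severity_level}] {description}"


class FailureTracker:
    """Counts consecutive fix verification failures."""

    def __init__(self) -> None:
        self.consecutive_failures = 0

    def record(self, verified: bool) -> None:
        # A successful verification resets the streak.
        self.consecutive_failures = 0 if verified else self.consecutive_failures + 1

    def should_stop(self) -> bool:
        return self.consecutive_failures >= MAX_CONSECUTIVE_FAILURES
```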

Troubleshooting

Problem | Cause | Solution
Scorer exits with code 1 but reports no errors | Score below --threshold (default 70) | Check the score in the output; lower the threshold or fix findings
Auditor reports parse-error | Malformed HTML | Verify the file is complete; check that curl is not returning a redirect
Regression tracker reports 100% change on all pages | Baseline manifest empty | Run --init, then --register, before comparing
Findings default to P3/functional | Missing severity or category keys | Include both keys in each finding dict
Chrome MCP returns stale content after SPA nav | DOM updated without full page load | Wait for the transition, then call read_page again
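
The stale-content workaround amounts to polling until the page content differs from the previous snapshot. A generic sketch: `read` is a hypothetical Python callable standing in for however the page content is fetched, not the Chrome MCP tool itself.

```python
import time
from typing import Callable


def wait_for_change(read: Callable[[], str], previous: str,
                    timeout: float = 5.0, interval: float = 0.25) -> str:
    """Poll read() until it returns something other than `previous`."""
    deadline = time.monotonic() + timeout
    while True:
        content = read()
        if content != previous:
            return content
        if time.monotonic() >= deadline:
            raise TimeoutError("page content did not change before the timeout")
        time.sleep(interval)
```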

References

Guide | Path
Browser Testing Methodology | references/browser_testing_methodology.md
WCAG Compliance Guide | references/wcag_compliance_guide.md
Performance Benchmarks | references/performance_benchmarks.md

Integration Points

Skill | Integration
code-reviewer | Health score and findings in PR review context
senior-frontend | Visual regression baselines align with component library
senior-devops | Health score gates CI/CD via exit code
senior-secops | Security header findings escalate to security review
incident-commander | P0 findings trigger incident response

Last Updated: April 2026
Version: 2.1.0
