sg-visual-run
/sg-visual-run — Execute Visual Tests
Execute YAML test manifests using agent-browser (Playwright CLI). Hybrid execution: mechanical steps run directly, complex assertions delegate to LLM evaluation.
Recommended model: Sonnet 4.6. This skill runs scripted steps (click, fill, screenshot) + lightweight LLM assertions. Opus 4.7 provides no measurable quality gain here. Use /model sonnet before invoking to save Opus weekly quota.
Invocations
| Command | Behavior |
|---|---|
| /sg-visual-run | Interactive: asks what to test |
| /sg-visual-run <text> | Natural language: figures out what tests to run |
| /sg-visual-run --from-audit | Run tests for impacted_ui_routes from audit-results.json |
| /sg-visual-run --diff=main | Run tests impacted by changes since main |
| /sg-visual-run --all | Full suite (skip interactive menu) |
| /sg-visual-run --regressions | Re-run tests that failed last run |
For full flag parsing rules, interactive/natural-language/audit flows, and route-to-manifest matching: see references/invocation-modes.md.
Pre-flight
- Verify agent-browser --version is available
- Read visual-tests/_config.yaml; fail if missing (tell user to run /sg-visual-discover)
- Verify {base_url} is reachable: agent-browser open {base_url}, check no error
- Create {screenshots_dir} if missing
- Read visual-tests/_regressions.yaml (create empty if missing)
Build execution list
Priority order:
1. --from-audit → severity-ordered list from impacted_ui_routes (see invocation-modes.md)
2. --diff or "Only what changed" → diff-based routes + regressions
3. Natural language → intent analysis + generate missing tests
4. --regressions → from _regressions.yaml, ordered by last_failed desc
5. --all or "Full suite" → all manifests, regressions first, then priority high → medium → low
Always skip manifests with deprecated: true. Regressions among matched tests always run first (except --from-audit, where severity wins).
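The full-suite ordering can be sketched as a single stable sort. This is a minimal illustration, not the skill's actual implementation: it assumes each manifest has been loaded into a dict, and the `name` key, the `order_full_suite` function, and the handling of an unknown priority (sorted last) are hypothetical. Only `deprecated` and the high/medium/low priority values come from the text above.

```python
from typing import Iterable

# Rank for the priority field. Sorting unknown values last is an assumption.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


def order_full_suite(manifests: Iterable[dict], regression_names: set[str]) -> list[dict]:
    """Order manifests for a full-suite run.

    Drops manifests marked deprecated, puts known regressions first, then
    sorts by priority high -> medium -> low. sorted() is stable, so
    manifests with equal keys keep their input order.
    """
    active = [m for m in manifests if not m.get("deprecated", False)]
    return sorted(
        active,
        key=lambda m: (
            m["name"] not in regression_names,  # False (a regression) sorts first
            PRIORITY_RANK.get(m.get("priority"), len(PRIORITY_RANK)),
        ),
    )


if __name__ == "__main__":
    suite = [
        {"name": "settings", "priority": "low"},
        {"name": "checkout", "priority": "high"},
        {"name": "legacy", "priority": "high", "deprecated": True},
        {"name": "profile", "priority": "medium"},
    ]
    for manifest in order_full_suite(suite, {"settings"}):
        print(manifest["name"])
```

The --from-audit mode would need a different key, since there severity takes precedence over regressions.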
Execution strategy
All browser tests run sequentially in a single browser session. One login, one browser, one agent.
CRITICAL: NEVER call multiple agent-browser commands in parallel Bash calls. agent-browser uses a single Playwright daemon. Parallel calls cause navigations to race to the same page, producing wrong URLs and corrupted screenshots. Even separate Bash tool calls in the same message execute concurrently. Always chain browser commands sequentially.
Sequential execution with a single auth is also faster: no re-login overhead, no session conflicts, no retries.
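If the commands are driven from a script rather than typed by hand, the same rule means one blocking call at a time. The sketch below is one way to enforce that. `run_browser_sequence`, the `runner` parameter, and the 60-second timeout are illustrative assumptions, as is the idea that agent-browser prints its result on stdout.

```python
import subprocess
from typing import Callable, Sequence


def run_browser_sequence(
    commands: Sequence[Sequence[str]],
    runner: Callable = subprocess.run,
    timeout_s: float = 60.0,  # assumed value, not taken from the skill
) -> list[str]:
    """Run agent-browser commands strictly one after another.

    Each call blocks until the previous command has exited, so two
    navigations can never race on the shared browser session.
    Returns the stripped stdout of each command, in order.
    """
    outputs = []
    for args in commands:
        completed = runner(
            ["agent-browser", *args],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=True,  # a non-zero exit raises CalledProcessError
        )
        outputs.append(completed.stdout.strip())
    return outputs
```

A call such as `run_browser_sequence([["open", "http://localhost:3000"], ["get", "url"]])` would open the page and only then query the URL. The `runner` argument exists so the function can be exercised with a stub instead of a real browser.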
Session expiry detection
After EVERY agent-browser open {url}, verify navigation succeeded:
1. agent-browser get url
2. Compare actual vs expected URL
3. If actual != expected AND (actual == "/" OR contains "/login"):
→ Session expired. Re-login:
a. Execute _shared/login.yaml steps
b. Retry the original navigation
c. If still wrong URL: mark test as ERROR
4. If actual matches expected: continue
Catches silent session expiry — the most common failure mode in long runs (>30 min). Without this check, tests screenshot the wrong page and mark PASS.
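The check above can be sketched as two small functions. This is an illustration under assumptions: `open_url`, `get_url`, and `login` are hypothetical callables standing in for agent-browser open, agent-browser get url, and the _shared/login.yaml steps, and URLs are compared by exact string match, which a real run may need to relax (trailing slashes, query strings).

```python
from typing import Callable
from urllib.parse import urlparse


def looks_like_expired_session(actual_url: str, expected_url: str) -> bool:
    """True when navigation landed on the root or a login page instead of the target."""
    if actual_url == expected_url:
        return False
    path = urlparse(actual_url).path or "/"
    return path == "/" or "/login" in actual_url


def open_with_session_check(
    expected_url: str,
    open_url: Callable[[str], None],
    get_url: Callable[[], str],
    login: Callable[[], None],
) -> str:
    """Navigate, re-login once if the session looks expired, and report OK or ERROR."""
    open_url(expected_url)
    actual = get_url()
    if looks_like_expired_session(actual, expected_url):
        login()                 # step 3a: replay the shared login steps
        open_url(expected_url)  # step 3b: retry the original navigation
        actual = get_url()
        if actual != expected_url:
            return "ERROR"      # step 3c: still on the wrong page
    # Mismatches that do not look like a login redirect are not handled here;
    # the steps above only define behaviour for the expired-session case.
    return "OK"
```

Keeping the detection in its own pure function makes the redirect rule easy to unit-test without a browser.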
Progress reporting
Print a progress line after each test completes:
[sg-visual-run] Test {current}/{total} — {test-name} ({PASS|FAIL|STALE|ERROR}) — ~{remaining} min remaining
Remaining time = (elapsed_seconds / tests_completed) * tests_remaining / 60. Update after each test.
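As a worked example of that formula: 12 of 40 tests finished in 360 seconds gives 30 seconds per test, times 28 remaining tests, or 14 minutes. The helper below is a minimal sketch. The function name and the guard against zero completed tests are additions for illustration.

```python
def remaining_minutes(elapsed_seconds: float, tests_completed: int, tests_total: int) -> float:
    """Estimate remaining run time in minutes from the average time per finished test."""
    if tests_completed <= 0:
        raise ValueError("need at least one completed test to estimate remaining time")
    tests_remaining = tests_total - tests_completed
    return (elapsed_seconds / tests_completed) * tests_remaining / 60


if __name__ == "__main__":
    # 360 s / 12 tests = 30 s per test; 28 tests left -> 840 s -> 14 min
    print(remaining_minutes(360, 12, 40))  # prints 14.0
```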
Execution loop
For each manifest:
Step 0: Isolation
agent-browser open {base_url}
Every test starts from the base URL. No state carries over.
Step 1: Auth
If requires_auth: true, execute _shared/login.yaml steps.
Optimization: After first successful login, session persists if ALL: (1) no login form in snapshot (no username/password inputs), (2) authenticated UI element visible (user menu, avatar, logout), (3) no redirect away from protected URL. Any check fails → re-login. If auth fails mid-run, re-login and retry once.
Step 2: Execute steps
For each step in the manifest's steps: array, run the corresponding action. For action semantics, variable interpolation, include resolution, screenshot validation rules, and hybrid llm-* actions: see references/action-reference.md.
Screenshot validation is MANDATORY — never skip. Every screenshot must be read via the Read tool and visually inspected for errors before proceeding. A screenshot showing an error = test FAIL, regardless of other assertions. See action-reference.md for the full rule.
Step 3: Record result
PASS / FAIL / STALE / ERROR / SKIP — mapping in action-reference.md.
Browser crash recovery
If any agent-browser command returns non-zero exit code or times out:
- agent-browser close (ignore errors)
- agent-browser open {base_url}
- If requires_auth: re-login
- Retry the failed step once
- If retry fails → test ERROR, next test
- If 3 consecutive ERROR across different tests → abort entire run with "Browser unstable — check agent-browser installation"
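The retry-once rule and the three-strikes abort can be sketched as follows. For brevity the sketch treats each test as a single callable that returns its status, whereas the real loop retries the individual failed step. `BrowserCommandError`, `BrowserUnstable`, `run_with_recovery`, `run_all`, and `recover` are hypothetical names: `recover` stands for close, reopen {base_url}, and re-login if needed.

```python
from typing import Callable, Iterable

MAX_CONSECUTIVE_ERRORS = 3


class BrowserCommandError(RuntimeError):
    """Hypothetical: raised when an agent-browser command exits non-zero or times out."""


class BrowserUnstable(RuntimeError):
    """Hypothetical: raised to abort the whole run."""


def run_with_recovery(step: Callable[[], str], recover: Callable[[], None]) -> str:
    """Run a step; on a browser failure, recover and retry exactly once."""
    for attempt in range(2):  # first try plus one retry
        try:
            return step()
        except BrowserCommandError:
            if attempt == 0:
                recover()
    return "ERROR"


def run_all(tests: Iterable[tuple[str, Callable[[], str]]], recover: Callable[[], None]) -> dict[str, str]:
    """Run tests in order, aborting after three consecutive ERROR results."""
    results: dict[str, str] = {}
    consecutive_errors = 0
    for name, step in tests:
        status = run_with_recovery(step, recover)
        results[name] = status
        consecutive_errors = consecutive_errors + 1 if status == "ERROR" else 0
        if consecutive_errors >= MAX_CONSECUTIVE_ERRORS:
            raise BrowserUnstable(f"{consecutive_errors} consecutive ERROR results")
    return results
```

Any non-ERROR status resets the counter, so isolated errors spread across a long run do not trigger the abort.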
Update regressions
After all tests complete, update visual-tests/_regressions.yaml:
- FAIL / STALE / ERROR: update or add entry (consecutive_passes: 0)
- PASS for a test in regressions: increment consecutive_passes
- consecutive_passes >= 3: remove from regressions (resolved)
Full YAML format: see references/report-formats.md.
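The update rules can be sketched as a pure function over in-memory data. The entry layout here is an assumption for illustration: a dict keyed by test name holding `consecutive_passes` and `last_failed`. The actual on-disk schema is the one in references/report-formats.md, and `update_regressions` and `run_date` are hypothetical names.

```python
RESOLVED_AFTER = 3  # consecutive passes needed before an entry is dropped


def update_regressions(
    regressions: dict[str, dict],
    results: dict[str, str],
    run_date: str,
) -> dict[str, dict]:
    """Return a new regressions mapping after applying one run's results.

    FAIL / STALE / ERROR add or reset an entry; PASS increments the counter
    of an existing entry and removes it once it reaches RESOLVED_AFTER.
    Other statuses (for example SKIP) leave entries untouched.
    """
    updated = {name: dict(entry) for name, entry in regressions.items()}
    for name, status in results.items():
        if status in {"FAIL", "STALE", "ERROR"}:
            entry = updated.setdefault(name, {})
            entry["consecutive_passes"] = 0
            entry["last_failed"] = run_date
        elif status == "PASS" and name in updated:
            passes = updated[name].get("consecutive_passes", 0) + 1
            if passes >= RESOLVED_AFTER:
                del updated[name]  # resolved
            else:
                updated[name]["consecutive_passes"] = passes
    return updated
```

Returning a copy rather than mutating the input keeps the previous state available if writing the file fails partway through.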
Generate report
Write report to {report_path} (default: visual-tests/_results/report.md) with sections: Summary, Failures, Stale Tests, Generated Tests, Regressions Status, All Results.
Full template: references/report-formats.md.
Output
Display a concise summary: pass/total, failures (one line each), stale tests (with "run /sg-visual-discover" hint), generated tests. Full format: references/report-formats.md.
agent-browser reference
Basic commands (cover ~60% of tests)
| Command | Usage | Example |
|---|---|---|
| open <url> | Navigate | agent-browser open http://localhost:3000 |
| snapshot | Accessibility tree with refs | agent-browser snapshot |
| click <ref> | Click by ref | agent-browser click @e12 |
| fill <ref> <text> | Clear and fill input | agent-browser fill @e10 "alex" |
| upload <sel> <files> | Upload file | agent-browser upload "#file-input" ./test.md |
| eval <js> | Run JS in page | agent-browser eval 'document.querySelector("input").id' |
| screenshot <path> | Viewport screenshot | agent-browser screenshot /tmp/x.png |
| screenshot --full <path> | Full-page screenshot | agent-browser screenshot --full /tmp/x.png |
| wait <selector> [timeout] | Wait for element | agent-browser wait "#result" 5000 |
| find <text> | Find by visible text | agent-browser find "Submit" |
| get url | Current URL | agent-browser get url |
| close | Close browser | agent-browser close |
Advanced interactions
When to read references/advanced-interactions.md: whenever a test involves drag-and-drop, hover/tooltips, keyboard shortcuts, forms (checkbox/radio/select), file upload, network mocking, state manipulation (cookies/storage/feature flags), responsive/dark-mode testing, multi-tab/OAuth, visual regression, console error detection, iframe/Shadow DOM, or auth optimization.
Tests that only use click/fill/snapshot miss ~80% of real UI bugs. The reference documents 20 patterns with framework-specific recipes (e.g., @dnd-kit requires mouse move + activation distance — click does not work).
Final checklist
Before considering the run complete:
- Pre-flight passed (agent-browser, _config.yaml, base_url reachable)
- Browser opened (single session)
- Auth executed (if requires_auth)
- Every screenshot read and visually validated
- _regressions.yaml updated (failures added, 3 passes → removed)
- Report written to {report_path}
- Browser closed
- Summary displayed (pass/fail/stale)