/sg-visual-run — Execute Visual Tests

Execute YAML test manifests using agent-browser (Playwright CLI). Execution is hybrid: mechanical steps run directly, while complex assertions are delegated to LLM evaluation.

Recommended model: Sonnet 4.6. This skill runs scripted steps (click, fill, screenshot) + lightweight LLM assertions. Opus 4.7 provides no measurable quality gain here. Use /model sonnet before invoking to save Opus weekly quota.

Invocations

| Command | Behavior |
| --- | --- |
| `/sg-visual-run` | Interactive: asks what to test |
| `/sg-visual-run <text>` | Natural language: figures out what tests to run |
| `/sg-visual-run --from-audit` | Run tests for `impacted_ui_routes` from `audit-results.json` |
| `/sg-visual-run --diff=main` | Run tests impacted by changes since `main` |
| `/sg-visual-run --all` | Full suite (skip interactive menu) |
| `/sg-visual-run --regressions` | Re-run tests that failed last run |

For full flag parsing rules, interactive/natural-language/audit flows, and route-to-manifest matching: see references/invocation-modes.md.

Pre-flight

1. Verify that agent-browser is installed: agent-browser --version must succeed.
2. Read visual-tests/_config.yaml. Fail if it is missing and tell the user to run /sg-visual-discover.
3. Verify that {base_url} is reachable: run agent-browser open {base_url} and check that it returns no error.
4. Create {screenshots_dir} if it is missing.
5. Read visual-tests/_regressions.yaml (create an empty file if it is missing).

Build execution list

Priority order:

1. --from-audit → severity-ordered list from impacted_ui_routes (see invocation-modes.md)
2. --diff or "Only what changed" → diff-based routes + regressions
3. Natural language → intent analysis + generate missing tests
4. --regressions → from _regressions.yaml, ordered by last_failed desc
5. --all or "Full suite" → all manifests, regressions first, then priority high→medium→low

Always skip manifests with deprecated: true. Regressions among matched tests always run first (except --from-audit, where severity wins).
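
The ordering rule for the --all case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Manifest` class, `PRIORITY_RANK`, and `order_full_suite` are hypothetical names, and the sketch assumes each manifest carries a `priority` of high, medium, or low plus the `deprecated` flag mentioned above. It does not cover --from-audit, where severity wins.

```python
from dataclasses import dataclass

# Lower rank runs earlier. Unknown priorities sort last (an assumption).
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Manifest:
    """Hypothetical in-memory view of one YAML test manifest."""
    name: str
    priority: str = "medium"
    deprecated: bool = False


def order_full_suite(manifests, regression_names):
    """Drop deprecated manifests, then put regressions first, then high to low priority."""
    active = [m for m in manifests if not m.deprecated]
    return sorted(
        active,
        key=lambda m: (
            m.name not in regression_names,  # False sorts before True
            PRIORITY_RANK.get(m.priority, len(PRIORITY_RANK)),
        ),
    )
```

Because `sorted` is stable, manifests with the same regression status and priority keep their original relative order.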

Execution strategy

All browser tests run sequentially in a single browser session. One login, one browser, one agent.

CRITICAL: NEVER call multiple agent-browser commands in parallel Bash calls. agent-browser uses a single Playwright daemon. Parallel calls cause navigations to race to the same page, producing wrong URLs and corrupted screenshots. Even separate Bash tool calls in the same message execute concurrently. Always chain browser commands sequentially.

Sequential execution with a single auth is also faster: no re-login overhead, no session conflicts, no retries.
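
One way to make the "never in parallel" rule hard to violate is to route every browser command through a single helper that waits for each call to finish. The sketch below is an assumption about how a driver script could do this; `run_sequentially`, the injectable `runner` parameter, and the 60-second timeout are illustrative and not part of the skill.

```python
import subprocess


def run_sequentially(commands, runner=subprocess.run, timeout=60):
    """Run agent-browser commands one at a time and return each command's stdout.

    `commands` is a list of argument lists, e.g. [["open", url], ["get", "url"]].
    Each call blocks until the previous command has exited, so two navigations
    can never race inside the shared Playwright daemon.
    """
    outputs = []
    for args in commands:
        completed = runner(
            ["agent-browser", *args],
            check=True,           # raise on a non-zero exit code
            capture_output=True,
            text=True,
            timeout=timeout,      # assumed limit; tune per project
        )
        outputs.append(completed.stdout.strip())
    return outputs
```

Passing `runner` explicitly keeps the helper testable without a real browser; in normal use the default `subprocess.run` is enough.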

Session expiry detection

After EVERY agent-browser open {url}, verify navigation succeeded:

1. agent-browser get url
2. Compare actual vs expected URL
3. If actual != expected AND (path of actual == "/" OR actual contains "/login"):
   → Session expired. Re-login:
     a. Execute _shared/login.yaml steps
     b. Retry the original navigation
     c. If still wrong URL: mark test as ERROR
4. If actual matches expected: continue

This check catches silent session expiry, the most common failure mode in long runs (>30 min). Without it, tests screenshot the wrong page and are still marked PASS.
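
The comparison in steps 2 to 4 could look like the following Python sketch. It is an illustration under assumptions: `classify_navigation` is a hypothetical helper, "matches" is taken to mean same host and path (ignoring query string and trailing slash), and the `wrong_url` outcome is an addition for mismatches that do not look like a login redirect, which the steps above do not define.

```python
from urllib.parse import urlsplit


def _normalize_path(path: str) -> str:
    """Treat '' and '/' as the root path and ignore a trailing slash."""
    return path.rstrip("/") or "/"


def classify_navigation(expected_url: str, actual_url: str) -> str:
    """Return 'ok', 'session_expired', or 'wrong_url' for one navigation."""
    expected = urlsplit(expected_url)
    actual = urlsplit(actual_url)
    actual_path = _normalize_path(actual.path)

    if (actual.netloc, actual_path) == (expected.netloc, _normalize_path(expected.path)):
        return "ok"
    # Landing on the root page or a login page means the session was dropped.
    if actual_path == "/" or "/login" in actual.path:
        return "session_expired"
    return "wrong_url"
```

A caller would re-run the `_shared/login.yaml` steps on `session_expired`, retry the navigation once, and mark the test ERROR if the second attempt is still not `ok`.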

Progress reporting

Print a progress line after each test completes:

[sg-visual-run] Test {current}/{total} — {test-name} ({PASS|FAIL|STALE|ERROR}) — ~{remaining} min remaining

Remaining time in minutes = (elapsed_seconds / tests_completed) * tests_remaining / 60. Recompute it after each test.
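
As a worked example of the estimate, here is a minimal Python sketch. The function name `remaining_minutes` and the guard for zero completed tests are assumptions added for illustration.

```python
def remaining_minutes(elapsed_seconds: float, tests_completed: int, tests_total: int) -> float:
    """Estimate remaining run time in minutes from the average time per finished test."""
    if tests_completed <= 0:
        raise ValueError("at least one test must have completed")
    tests_remaining = tests_total - tests_completed
    average_seconds = elapsed_seconds / tests_completed
    return average_seconds * tests_remaining / 60


# 4 of 10 tests finished in 120 s: 30 s per test, 6 tests left, about 3 minutes.
print(remaining_minutes(120, 4, 10))  # 3.0
```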

Execution loop

For each manifest:

Step 0: Isolation

agent-browser open {base_url}

Every test starts from the base URL, so no page state carries over from the previous test.

Step 1: Auth

If requires_auth: true, execute _shared/login.yaml steps.

Optimization: after the first successful login, treat the session as still valid only if ALL of the following hold: (1) the snapshot shows no login form (no username/password inputs), (2) an authenticated UI element is visible (user menu, avatar, logout), (3) there was no redirect away from the protected URL. If any check fails, re-login. If auth fails mid-run, re-login and retry once.

Step 2: Execute steps

For each step in the manifest's steps: array, run the corresponding action. For action semantics, variable interpolation, include resolution, screenshot validation rules, and hybrid llm-* actions: see references/action-reference.md.

Screenshot validation is MANDATORY — never skip. Every screenshot must be read via the Read tool and visually inspected for errors before proceeding. A screenshot showing an error = test FAIL, regardless of other assertions. See action-reference.md for the full rule.

Step 3: Record result

PASS / FAIL / STALE / ERROR / SKIP — mapping in action-reference.md.

Browser crash recovery

If any agent-browser command returns a non-zero exit code or times out:

1. agent-browser close (ignore errors)
2. agent-browser open {base_url}
3. If requires_auth: re-login
4. Retry the failed step once
5. If retry fails → test ERROR, next test
6. If 3 consecutive ERROR across different tests → abort entire run with "Browser unstable — check agent-browser installation"
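
The retry-once rule and the consecutive-error abort can be sketched as two small Python helpers. Everything named here (`StepFailed`, `run_step_with_recovery`, `run_all`, and the callables passed in) is hypothetical; the sketch assumes a failed or timed-out agent-browser command is surfaced as a `StepFailed` exception.

```python
class StepFailed(Exception):
    """Hypothetical error for a non-zero exit code or a timeout."""


def run_step_with_recovery(step, recover) -> bool:
    """Run step(); on failure call recover() and retry exactly once."""
    try:
        step()
        return True
    except StepFailed:
        recover()  # close, reopen {base_url}, re-login if required
    try:
        step()
        return True
    except StepFailed:
        return False  # caller records the test as ERROR


def run_all(test_names, run_test, max_consecutive_errors=3):
    """Run tests in order and stop early once too many fail with ERROR in a row.

    Returns (results, aborted). When aborted is True the caller should print
    the "Browser unstable" message from the steps above.
    """
    results = {}
    consecutive_errors = 0
    for name in test_names:
        status = run_test(name)
        results[name] = status
        consecutive_errors = consecutive_errors + 1 if status == "ERROR" else 0
        if consecutive_errors >= max_consecutive_errors:
            return results, True
    return results, False
```

Any non-ERROR result resets the counter, so three errors spread across a long run do not trigger the abort.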

Update regressions

After all tests complete, update visual-tests/_regressions.yaml:

FAIL / STALE / ERROR: update or add entry (consecutive_passes: 0)
PASS for a test in regressions: increment consecutive_passes
consecutive_passes >= 3: remove from regressions (resolved)

Full YAML format: see references/report-formats.md.
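
The three update rules can be expressed as one pure function. This Python sketch is an assumption about the in-memory shape only: regressions are modelled as a dict keyed by test name with a `consecutive_passes` counter, other fields from the real YAML format (such as `last_failed`) are left out, and `update_regressions` is a hypothetical name. Reading and writing the YAML file is omitted.

```python
def update_regressions(regressions: dict, results: dict, resolve_after: int = 3) -> dict:
    """Return a new regressions mapping after applying one run's results.

    regressions: {test_name: {"consecutive_passes": int, ...}}
    results:     {test_name: "PASS" | "FAIL" | "STALE" | "ERROR" | "SKIP"}
    """
    updated = {name: dict(entry) for name, entry in regressions.items()}
    for name, status in results.items():
        if status in {"FAIL", "STALE", "ERROR"}:
            # Add a new entry or reset the counter of an existing one.
            updated.setdefault(name, {})["consecutive_passes"] = 0
        elif status == "PASS" and name in updated:
            passes = updated[name].get("consecutive_passes", 0) + 1
            if passes >= resolve_after:
                del updated[name]  # resolved
            else:
                updated[name]["consecutive_passes"] = passes
    return updated
```

Returning a copy instead of mutating the input keeps the previous state available if writing the file fails.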

Generate report

Write report to {report_path} (default: visual-tests/_results/report.md) with sections: Summary, Failures, Stale Tests, Generated Tests, Regressions Status, All Results.

Full template: references/report-formats.md.

Output

Display a concise summary: pass/total, failures (one line each), stale tests (with "run /sg-visual-discover" hint), generated tests. Full format: references/report-formats.md.

agent-browser reference

Basic commands (cover ~60% of tests)

| Command | Usage | Example |
| --- | --- | --- |
| `open <url>` | Navigate | `agent-browser open http://localhost:3000` |
| `snapshot` | Accessibility tree with refs | `agent-browser snapshot` |
| `click <ref>` | Click by ref | `agent-browser click @e12` |
| `fill <ref> <text>` | Clear and fill input | `agent-browser fill @e10 "alex"` |
| `upload <sel> <files>` | Upload file | `agent-browser upload "#file-input" ./test.md` |
| `eval <js>` | Run JS in page | `agent-browser eval 'document.querySelector("input").id'` |
| `screenshot <path>` | Viewport screenshot | `agent-browser screenshot /tmp/x.png` |
| `screenshot --full <path>` | Full-page screenshot | `agent-browser screenshot --full /tmp/x.png` |
| `wait <selector> [timeout]` | Wait for element | `agent-browser wait "#result" 5000` |
| `find <text>` | Find by visible text | `agent-browser find "Submit"` |
| `get url` | Current URL | `agent-browser get url` |
| `close` | Close browser | `agent-browser close` |

Advanced interactions

When to read references/advanced-interactions.md: whenever a test involves drag-and-drop, hover/tooltips, keyboard shortcuts, forms (checkbox/radio/select), file upload, network mocking, state manipulation (cookies/storage/feature flags), responsive/dark-mode testing, multi-tab/OAuth, visual regression, console error detection, iframe/Shadow DOM, or auth optimization.

Tests that only use click/fill/snapshot miss ~80% of real UI bugs. The reference documents 20 patterns with framework-specific recipes (e.g., @dnd-kit requires mouse move + activation distance — click does not work).

Final checklist

Before considering the run complete:

Pre-flight passed (agent-browser, _config.yaml, base_url reachable)
Browser opened (single session)
Auth executed (if requires_auth)
Every screenshot read and visually validated
_regressions.yaml updated (failures added, 3 passes → removed)
Report written to {report_path}
Browser closed
Summary displayed (pass/fail/stale)

Repository

bacoco/shipguard