Expect Test — AI Browser Testing

Run AI-generated browser tests against your dev server using expect-cli. No test files to write — the AI reads your git diff, generates a test plan, and executes it in a real Playwright-driven browser.

When to Use

  • After making UI changes — verify they work visually
  • After fixing bugs — confirm the fix in a real browser
  • To test user flows (login, navigation, forms)
  • As part of /validate to add browser verification
  • When you need visual confirmation, not just a passing build

Prerequisites

Requires Claude Code (already running if you're reading this).

Setup & Installation

# Check if expect-cli is installed
if ! command -v expect-cli &>/dev/null; then
  echo "Installing expect-cli..."
  npm install -g expect-cli@latest
fi

# Verify
expect-cli --version

Usage

Auto-detect changes and test

# Test unstaged changes (default) — auto-detects dev server
expect-cli

# Test with a specific instruction
expect-cli -m "test the login flow works correctly"

# Test a whole branch diff
expect-cli -t branch

# Skip plan review (CI mode)
expect-cli -y

# Show browser window during test
expect-cli --headed

# Verbose logging
expect-cli --verbose

Targeted testing

# Test specific feature
expect-cli -m "verify the Future/Past toggle on events calendar switches views correctly"

# Test a fix
expect-cli -m "confirm the modal has a solid dark background, not transparent"

# Test a flow
expect-cli -m "sign in as coach, navigate to perform, check events show only future events"

Running in tmux (for agents)

Always run in tmux — expect-cli has an interactive TUI that blocks the terminal.

SESSION="expect-$(date +%s)"
tmux new-session -d -s "$SESSION"

# Run with -y to skip interactive review
tmux send-keys -t "$SESSION" "expect-cli -m 'test description here' --headed -y --verbose 2>&1 | tee expect-results.log" C-m

# Monitor progress
sleep 30
tmux capture-pane -t "$SESSION" -p | tail -20

# Check results
grep -E "✔|✘|PASS|FAIL" expect-results.log
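
A fixed sleep can undershoot a slow run or waste time on a fast one. As a sketch, poll the log for a terminal marker instead (this assumes the tee target from the command above, with a 10-minute ceiling):

# Poll for up to 10 minutes instead of guessing a fixed sleep
for _ in $(seq 1 120); do
  grep -qE "✔|✘|PASS|FAIL" expect-results.log 2>/dev/null && break
  sleep 5
done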

CLI Reference

Flag                      Description
-m, --message <text>      Natural language test instruction
-f, --flow <slug>         Reuse a previously saved test flow
-y, --yes                 Skip plan review, run immediately
-a, --agent <provider>    AI provider: claude or codex (default: claude)
-t, --target <target>     What to test: unstaged, branch, or changes
--headed                  Show visible browser window
--verbose                 Enable verbose logging
--replay-host <url>       Override replay viewer host

Subcommand                Description
expect-cli init           First-time global install + agent skill setup
expect-cli audit          Run lint/type/format checks and suggest browser tests
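
Flags compose. For instance (login-flow is a hypothetical slug standing in for a flow you saved earlier):

# Re-run a previously saved flow against the whole branch diff, non-interactively
expect-cli -f login-flow -t branch -y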

How It Works

  1. Auto-detects dev server — scans localhost ports via lsof, recognizes Vite/Next/React/Angular (see the sketch after this list)
  2. Reads git diff — scopes test plan to what actually changed
  3. AI generates test plan — step-by-step browser actions + assertions
  4. Playwright executes — real browser, real clicks, real screenshots
  5. Reports pass/fail — with session recording for replay
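
As an illustrative sketch of step 1 only (expect-cli's real detection heuristic is internal and may differ), a lsof port probe looks like this:

# Probe common dev-server ports; this port list is an assumption, not
# expect-cli's actual heuristic
for port in 3000 4200 5173 8080; do
  if lsof -nP -iTCP:"$port" -sTCP:LISTEN >/dev/null 2>&1; then
    echo "candidate dev server: http://localhost:$port"
  fi
done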

Replay Recordings & Evidence

Expect-cli automatically saves session recordings to disk:

# Default location
ls /tmp/expect-replays/

# Override with env var
EXPECT_REPLAY_OUTPUT_PATH=/path/to/dir expect-cli -m "test X" -y

Artifacts saved per session:

File           Format                  Usage
*.html         HTML replay viewer      Open in browser to replay the full test session
*.ndjson       rrweb event recording   Raw event data for programmatic analysis
*.webm         Playwright video        Actual browser screen recording
*-steps.json   Step metadata           Pass/fail per step with timestamps
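
The step metadata is convenient for scripting. A minimal jq sketch, assuming each entry carries name and status fields (the actual schema may differ):

STEPS=$(ls -t /tmp/expect-replays/*-steps.json 2>/dev/null | head -1)
# "status" and "name" are assumed field names; adjust to the real schema
jq -r '.[] | "\(.status)\t\(.name)"' "$STEPS"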

To collect evidence after a test run:

# Find latest replay
LATEST=$(ls -t /tmp/expect-replays/*.html 2>/dev/null | head -1)
echo "Replay: $LATEST"

# Open replay in browser
open "$LATEST"

# Copy video for PR evidence
cp /tmp/expect-replays/*.webm ./test-evidence/

Note: Screenshots are NOT saved to disk (they exist only as base64 in the AI's context). Use the .webm video for evidence extraction (see below).

Publishing Interactive Replay (Recommended)

The HTML replay is the best evidence format — reviewers can scrub through the entire session interactively. Publish via /here-now:

SESSION_ID=$(basename -s .html "$(ls -t /tmp/expect-replays/*.html | head -1)")
mkdir -p /tmp/replay
cp /tmp/expect-replays/${SESSION_ID}.html /tmp/replay/
cp /tmp/expect-replays/${SESSION_ID}.ndjson.js /tmp/replay/
cp /tmp/expect-replays/${SESSION_ID}.ndjson /tmp/replay/

# Publish (3 files: HTML shell + ndjson.js recording + ndjson raw data)
$HOME/{{TOOL_DIR}}/skills/here-now/scripts/publish.sh /tmp/replay/ --client claude

# The replay URL is: https://{slug}.here.now/{SESSION_ID}.html
# Post to PR comment with the direct link

Important: The HTML file loads the .ndjson.js recording by deriving its filename from the HTML's own name. All three files must be published together, and the HTML must be accessed by its original filename (not as index.html).

Expiry: Anonymous publishes expire in 24h. Include the claim URL in the PR comment so the user can make it permanent.
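
Before posting the link, a quick reachability check avoids dead links in the PR. A sketch, where the slug comes from publish.sh's output:

REPLAY_URL="https://<slug>.here.now/${SESSION_ID}.html"  # substitute the slug printed by publish.sh
curl -sfo /dev/null "$REPLAY_URL" && echo "replay reachable" || echo "replay NOT reachable"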

Extracting Screenshots from Recordings

Since expect-cli doesn't save per-step screenshots to disk, extract key frames from the .webm video using ffmpeg. This produces shareable PNG evidence for PR comments.

WEBM=$(ls -t /tmp/expect-replays/*.webm | head -1)
mkdir -p /tmp/screenshots

# Extract frames at key timestamps (adjust based on test step timing from logs)
# Timing guide: step 1 finish ≈ step1_duration, step 2 ≈ step1 + step2, etc.
ffmpeg -i "$WEBM" -ss 25 -frames:v 1 /tmp/screenshots/01-login.png -y 2>/dev/null
ffmpeg -i "$WEBM" -ss 50 -frames:v 1 /tmp/screenshots/02-home.png -y 2>/dev/null
ffmpeg -i "$WEBM" -ss 80 -frames:v 1 /tmp/screenshots/03-feature.png -y 2>/dev/null

# Validate screenshots show real content (not splash/blank)
for f in /tmp/screenshots/*.png; do
  dims=$(magick identify -format "%wx%h" "$f" 2>/dev/null)
  echo "$(basename "$f"): $dims ($(wc -c < "$f") bytes)"
done

Timestamp estimation: Sum step durations from the test log. If step 1 took 25s and step 2 took 17s, the step 2 completion frame is at ~42s.
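
A small loop makes that arithmetic mechanical. A sketch with example durations (substitute the per-step values you read from the verbose log):

DURATIONS=(25 17 30)  # seconds per step, read from the test log (example values)
t=0; i=1
for d in "${DURATIONS[@]}"; do
  t=$((t + d))
  ffmpeg -i "$WEBM" -ss "$t" -frames:v 1 "/tmp/screenshots/step${i}.png" -y 2>/dev/null
  i=$((i + 1))
done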

Uploading Evidence for PR Comments

Option 1: imgbb (preferred for small batches)

IMGBB_API_KEY="${IMGBB_API_KEY:?Set IMGBB_API_KEY to your imgbb API key}"
for file in /tmp/screenshots/*.png; do
  sleep 3  # Rate limit
  url=$(curl -s -X POST "https://api.imgbb.com/1/upload?key=${IMGBB_API_KEY}" \
    --form "image=@${file}" | jq -r '.data.url // empty')
  echo "$(basename "$file") → $url"
done

Option 2: GitHub Draft Release (when imgbb rate-limits)

gh release create pr${PR}-evidence --repo OWNER/REPO \
  --title "PR #${PR} Evidence" --notes "Browser test screenshots" \
  --draft /tmp/screenshots/*.png

# Get download URLs
gh release view pr${PR}-evidence --repo OWNER/REPO --json assets \
  --jq '.assets[] | "\(.name) → \(.url)"'

Posting to PR

BASE="https://github.com/OWNER/REPO/releases/download/TAG"
gh pr comment $PR --repo OWNER/REPO --body "$(cat <<EOF
## ✅ Browser Test Evidence

![login](${BASE}/01-login.png)
![feature](${BASE}/02-feature.png)

<details><summary>All screenshots</summary>

![step3](${BASE}/03-step.png)

</details>
EOF
)"

Full Evidence Workflow (end-to-end)

# 1. Run test
SESSION="expect-$(date +%s)"
tmux new-session -d -s "$SESSION"
tmux send-keys -t "$SESSION" "EXPECT_REPLAY_OUTPUT_PATH=/tmp/expect-replays expect-cli -m 'test description' --headed -y --verbose 2>&1 | tee /tmp/expect.log" C-m

# 2. Wait for completion (or poll the log, as shown in the tmux section above)
sleep 120 && tmux capture-pane -t "$SESSION" -p | tail -20

# 3. Extract frames
WEBM=$(ls -t /tmp/expect-replays/*.webm | head -1)
ffmpeg -i "$WEBM" -ss 30 -frames:v 1 /tmp/screenshots/step1.png -y 2>/dev/null
ffmpeg -i "$WEBM" -ss 60 -frames:v 1 /tmp/screenshots/step2.png -y 2>/dev/null

# 4. Validate (read with Claude to verify content)
# Use Read tool on each PNG to verify it shows the expected UI state

# 5a. Publish interactive replay (PREFERRED)
SESSION_ID=$(basename -s .html "$(ls -t /tmp/expect-replays/*.html | head -1)")
mkdir -p /tmp/replay
cp /tmp/expect-replays/${SESSION_ID}.{html,ndjson.js,ndjson} /tmp/replay/
$HOME/{{TOOL_DIR}}/skills/here-now/scripts/publish.sh /tmp/replay/ --client claude
# Post the replay URL to PR

# 5b. Upload static screenshots (FALLBACK if replay not suitable)
gh release create prN-evidence --title "PR #N Evidence" \
  --notes "Browser test screenshots" --draft /tmp/screenshots/*.png
gh pr comment N --body $'## Evidence\n![step1](url)\n![step2](url)'

Integration with /validate

When /validate is invoked, this skill adds browser verification as Step 2.5:

Step 2.5: Browser Verification (via expect-test)
- Auto-detect running dev server
- Generate browser tests from the implementation plan
- Execute tests and capture screenshots
- Report visual verification results alongside code validation (example invocation below)
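
In practice that step reduces to one non-interactive invocation; the exact command /validate issues is an assumption based on the flags documented above:

expect-cli -t unstaged -m "verify the changes in this diff against the implementation plan" -y --verbose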

Result Interpretation

✔ = Test passed
✘ = Test failed (with description of what went wrong)

Results include:

  • Pass/fail status per test step
  • Screenshots at each step
  • Session recording URL for replay
  • Specific failure descriptions (e.g., "element not found", "wrong color value")

Limitations

  • Early stage tool (v0.0.x) — may crash on complex scenarios
  • macOS/Linux only (uses lsof for port detection)
  • Requires a running dev server on localhost
  • No custom Playwright config — browser is fully managed
  • Interactive TUI — must use -y flag or run in tmux for automation

Environment

Variable         Description
NO_TELEMETRY=1   Disable analytics/telemetry
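
For example, a one-off run with telemetry disabled:

NO_TELEMETRY=1 expect-cli -m "test the signup form" -y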