Expect Test — AI Browser Testing

Run AI-generated browser tests against your dev server using expect-cli. No test files to write — the AI reads your git diff, generates a test plan, and executes it in a real Playwright-driven browser.

When to Use

  • After making UI changes — verify they work visually
  • After fixing bugs — confirm the fix in a real browser
  • To test user flows (login, navigation, forms)
  • As part of /validate to add browser verification
  • When you need visual confirmation, not just a passing build

Prerequisites

Requires Claude Code (already running if you're reading this).

Setup & Installation

# Check if expect-cli is installed
if ! command -v expect-cli &>/dev/null; then
  echo "Installing expect-cli..."
  npm install -g expect-cli@latest
fi

# Verify
expect-cli --version

Usage

Auto-detect changes and test

# Test unstaged changes (default) — auto-detects dev server
expect-cli

# Test with a specific instruction
expect-cli -m "test the login flow works correctly"

# Test a whole branch diff
expect-cli -t branch

# Skip plan review (CI mode)
expect-cli -y

# Show browser window during test
expect-cli --headed

# Verbose logging
expect-cli --verbose

Targeted testing

# Test specific feature
expect-cli -m "verify the Future/Past toggle on events calendar switches views correctly"

# Test a fix
expect-cli -m "confirm the modal has a solid dark background, not transparent"

# Test a flow
expect-cli -m "sign in as coach, navigate to perform, check events show only future events"

Running in tmux (for agents)

Always run in tmux — expect-cli has an interactive TUI that blocks the terminal.

SESSION="expect-$(date +%s)"
tmux new-session -d -s "$SESSION"

# Run with -y to skip interactive review
tmux send-keys -t "$SESSION" "expect-cli -m 'test description here' --headed -y --verbose 2>&1 | tee expect-results.log" C-m

# Monitor progress
sleep 30
tmux capture-pane -t "$SESSION" -p | tail -20

# Check results
grep -E "✔|✘|PASS|FAIL" expect-results.log
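
A fixed sleep can undershoot a slow run or waste time on a fast one. As a sketch, poll the log for a terminal marker instead (this assumes the tee target from the command above, with a 10-minute ceiling):

# Poll for up to 10 minutes instead of guessing a fixed sleep
for _ in $(seq 1 120); do
  grep -qE "✔|✘|PASS|FAIL" expect-results.log 2>/dev/null && break
  sleep 5
done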

CLI Reference

Flag                      Description
-m, --message <text>      Natural language test instruction
-f, --flow <slug>         Reuse a previously saved test flow
-y, --yes                 Skip plan review, run immediately
-a, --agent <provider>    AI provider: claude or codex (default: claude)
-t, --target <target>     What to test: unstaged, branch, or changes
--headed                  Show visible browser window
--verbose                 Enable verbose logging
--replay-host <url>       Override replay viewer host

Subcommand                Description
expect-cli init           First-time global install + agent skill setup
expect-cli audit          Run lint/type/format checks and suggest browser tests
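
Flags compose. For instance (login-flow is a hypothetical slug standing in for a flow you saved earlier):

# Re-run a previously saved flow against the whole branch diff, non-interactively
expect-cli -f login-flow -t branch -y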

How It Works

  1. Auto-detects dev server — scans localhost ports via lsof, recognizes Vite/Next/React/Angular (see the sketch after this list)
  2. Reads git diff — scopes test plan to what actually changed
  3. AI generates test plan — step-by-step browser actions + assertions
  4. Playwright executes — real browser, real clicks, real screenshots
  5. Reports pass/fail — with session recording for replay
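
As an illustrative sketch of step 1 only (expect-cli's real detection heuristic is internal and may differ), a lsof port probe looks like this:

# Probe common dev-server ports; this port list is an assumption, not
# expect-cli's actual heuristic
for port in 3000 4200 5173 8080; do
  if lsof -nP -iTCP:"$port" -sTCP:LISTEN >/dev/null 2>&1; then
    echo "candidate dev server: http://localhost:$port"
  fi
done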

Replay Recordings & Evidence

Expect-cli automatically saves session recordings to disk:

# Default location
ls /tmp/expect-replays/

# Override with env var
EXPECT_REPLAY_OUTPUT_PATH=/path/to/dir expect-cli -m "test X" -y

Artifacts saved per session:

File           Format                  Usage
*.html         HTML replay viewer      Open in browser to replay the full test session
*.ndjson       rrweb event recording   Raw event data for programmatic analysis
*.webm         Playwright video        Actual browser screen recording
*-steps.json   Step metadata           Pass/fail per step with timestamps
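
The step metadata is convenient for scripting. A minimal jq sketch, assuming each entry carries name and status fields (the actual schema may differ):

STEPS=$(ls -t /tmp/expect-replays/*-steps.json 2>/dev/null | head -1)
# "status" and "name" are assumed field names; adjust to the real schema
jq -r '.[] | "\(.status)\t\(.name)"' "$STEPS"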

To collect evidence after a test run:

# Find latest replay
LATEST=$(ls -t /tmp/expect-replays/*.html 2>/dev/null | head -1)
echo "Replay: $LATEST"

# Open replay in browser
open "$LATEST"

# Copy video for PR evidence
cp /tmp/expect-replays/*.webm ./test-evidence/

Note: Screenshots are NOT saved to disk (they exist only as base64 in the AI's context). Use the .webm video for evidence extraction (see below).

Publishing Interactive Replay (Recommended)

The HTML replay is the best evidence format — reviewers can scrub through the entire session interactively. Publish via /here-now:

SESSION_ID=$(basename -s .html "$(ls -t /tmp/expect-replays/*.html | head -1)")
mkdir -p /tmp/replay
cp /tmp/expect-replays/${SESSION_ID}.html /tmp/replay/
cp /tmp/expect-replays/${SESSION_ID}.ndjson.js /tmp/replay/
cp /tmp/expect-replays/${SESSION_ID}.ndjson /tmp/replay/

# Publish (3 files: HTML shell + ndjson.js recording + ndjson raw data)
$HOME/{{TOOL_DIR}}/skills/here-now/scripts/publish.sh /tmp/replay/ --client claude

# The replay URL is: https://{slug}.here.now/{SESSION_ID}.html
# Post to PR comment with the direct link

Important: The HTML file loads the .ndjson.js recording by deriving its filename from the HTML's own name. All three files must be published together, and the HTML must be accessed by its original filename (not as index.html).

Expiry: Anonymous publishes expire in 24h. Include the claim URL in the PR comment so the user can make it permanent.
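
Before posting the link, a quick reachability check avoids dead links in the PR. A sketch, where the slug comes from publish.sh's output:

REPLAY_URL="https://<slug>.here.now/${SESSION_ID}.html"  # substitute the slug printed by publish.sh
curl -sfo /dev/null "$REPLAY_URL" && echo "replay reachable" || echo "replay NOT reachable"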

Extracting Screenshots from Recordings

Since expect-cli doesn't save per-step screenshots to disk, extract key frames from the .webm video using ffmpeg. This produces shareable PNG evidence for PR comments.

WEBM=$(ls -t /tmp/expect-replays/*.webm | head -1)
mkdir -p /tmp/screenshots

# Extract frames at key timestamps (adjust based on test step timing from logs)
# Timing guide: step 1 finish ≈ step1_duration, step 2 ≈ step1 + step2, etc.
ffmpeg -i "$WEBM" -ss 25 -frames:v 1 /tmp/screenshots/01-login.png -y 2>/dev/null
ffmpeg -i "$WEBM" -ss 50 -frames:v 1 /tmp/screenshots/02-home.png -y 2>/dev/null
ffmpeg -i "$WEBM" -ss 80 -frames:v 1 /tmp/screenshots/03-feature.png -y 2>/dev/null

# Validate screenshots show real content (not splash/blank)
for f in /tmp/screenshots/*.png; do
  dims=$(magick identify -format "%wx%h" "$f" 2>/dev/null)
  echo "$(basename "$f"): $dims ($(wc -c < "$f") bytes)"
done

Timestamp estimation: Sum step durations from the test log. If step 1 took 25s and step 2 took 17s, the step 2 completion frame is at ~42s.
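
A small loop makes that arithmetic mechanical. A sketch with example durations (substitute the per-step values you read from the verbose log):

DURATIONS=(25 17 30)  # seconds per step, read from the test log (example values)
t=0; i=1
for d in "${DURATIONS[@]}"; do
  t=$((t + d))
  ffmpeg -i "$WEBM" -ss "$t" -frames:v 1 "/tmp/screenshots/step${i}.png" -y 2>/dev/null
  i=$((i + 1))
done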

Uploading Evidence for PR Comments

Option 1: imgbb (preferred for small batches)

IMGBB_API_KEY="${IMGBB_API_KEY:?Set IMGBB_API_KEY to your imgbb API key}"
for file in /tmp/screenshots/*.png; do
  sleep 3  # Rate limit
  url=$(curl -s -X POST "https://api.imgbb.com/1/upload?key=${IMGBB_API_KEY}" \
    --form "image=@${file}" | jq -r '.data.url // empty')
  echo "$(basename "$file") → $url"
done

Option 2: GitHub Draft Release (when imgbb rate-limits)

gh release create pr${PR}-evidence --repo OWNER/REPO \
  --title "PR #${PR} Evidence" --notes "Browser test screenshots" \
  --draft /tmp/screenshots/*.png

# Get download URLs
gh release view pr${PR}-evidence --repo OWNER/REPO --json assets \
  --jq '.assets[] | "\(.name) → \(.url)"'

Posting to PR

BASE="https://github.com/OWNER/REPO/releases/download/TAG"
gh pr comment $PR --repo OWNER/REPO --body "$(cat <<EOF
## ✅ Browser Test Evidence

![login](${BASE}/01-login.png)
![feature](${BASE}/02-feature.png)

<details><summary>All screenshots</summary>

![step3](${BASE}/03-step.png)

</details>
EOF
)"

Full Evidence Workflow (end-to-end)

# 1. Run test
SESSION="expect-$(date +%s)"
tmux new-session -d -s "$SESSION"
tmux send-keys -t "$SESSION" "EXPECT_REPLAY_OUTPUT_PATH=/tmp/expect-replays expect-cli -m 'test description' --headed -y --verbose 2>&1 | tee /tmp/expect.log" C-m

# 2. Wait for completion (or poll the log, as shown in the tmux section above)
sleep 120 && tmux capture-pane -t "$SESSION" -p | tail -20

# 3. Extract frames
WEBM=$(ls -t /tmp/expect-replays/*.webm | head -1)
ffmpeg -i "$WEBM" -ss 30 -frames:v 1 /tmp/screenshots/step1.png -y 2>/dev/null
ffmpeg -i "$WEBM" -ss 60 -frames:v 1 /tmp/screenshots/step2.png -y 2>/dev/null

# 4. Validate (read with Claude to verify content)
# Use Read tool on each PNG to verify it shows the expected UI state

# 5a. Publish interactive replay (PREFERRED)
SESSION_ID=$(basename -s .html "$(ls -t /tmp/expect-replays/*.html | head -1)")
mkdir -p /tmp/replay
cp /tmp/expect-replays/${SESSION_ID}.{html,ndjson.js,ndjson} /tmp/replay/
$HOME/{{TOOL_DIR}}/skills/here-now/scripts/publish.sh /tmp/replay/ --client claude
# Post the replay URL to PR

# 5b. Upload static screenshots (FALLBACK if replay not suitable)
gh release create prN-evidence --title "PR #N Evidence" \
  --notes "Browser test screenshots" --draft /tmp/screenshots/*.png
gh pr comment N --body $'## Evidence\n![step1](url)\n![step2](url)'

Integration with /validate

When /validate is invoked, this skill adds browser verification as Step 2.5:

Step 2.5: Browser Verification (via expect-test)
- Auto-detect running dev server
- Generate browser tests from the implementation plan
- Execute tests and capture screenshots
- Report visual verification results alongside code validation (example invocation below)
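
In practice that step reduces to one non-interactive invocation; the exact command /validate issues is an assumption based on the flags documented above:

expect-cli -t unstaged -m "verify the changes in this diff against the implementation plan" -y --verbose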

Result Interpretation

✔ = Test passed
✘ = Test failed (with description of what went wrong)

Results include:

  • Pass/fail status per test step
  • Screenshots at each step
  • Session recording URL for replay
  • Specific failure descriptions (e.g., "element not found", "wrong color value")

Limitations

  • Early stage tool (v0.0.x) — may crash on complex scenarios
  • macOS/Linux only (uses lsof for port detection)
  • Requires a running dev server on localhost
  • No custom Playwright config — browser is fully managed
  • Interactive TUI — must use -y flag or run in tmux for automation

Environment

Variable         Description
NO_TELEMETRY=1   Disable analytics/telemetry
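
For example, a one-off run with telemetry disabled:

NO_TELEMETRY=1 expect-cli -m "test the signup form" -y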