Web Application Testing Skill

Overview

This skill adapts Anthropic's webapp-testing skill for the agent-studio framework. It provides a systematic approach to testing web applications locally using Playwright, choosing the right strategy based on the type of web content being tested.

Source repository: https://github.com/anthropics/skills
License: MIT
Tool: Playwright (Python API)

When to Use

When verifying frontend functionality of a web application
When debugging UI behavior or visual rendering issues
When capturing screenshots for visual regression testing
When checking browser console for JavaScript errors
When testing form submissions, navigation flows, and interactive elements
When generating automated test scripts for web applications

Iron Law

NEVER INSPECT DOM BEFORE WAITING FOR NETWORKIDLE ON DYNAMIC APPS

Dynamic web applications load content asynchronously. Inspecting the DOM before the page has stabilized will produce incorrect or incomplete results. Always call page.wait_for_load_state('networkidle') before inspecting rendered content on dynamic apps.

Decision Tree: Choose Your Approach

Is the target a static HTML file?
  YES → Approach A: Direct Read
  NO → Is there a running dev server?
    YES → Approach B: Reconnaissance-then-Action
    NO → Approach C: Helper Script First
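
The decision tree above can be sketched as a small function. This is a minimal sketch rather than part of the original skill: `choose_approach` and `is_port_open` are hypothetical helpers, and treating any local path ending in `.html` or `.htm` as a static file is an assumed heuristic.

```python
import socket
from pathlib import Path


def is_port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_approach(target: str, server_running: bool) -> str:
    """Map the decision tree onto a label: 'A', 'B', or 'C'."""
    is_url = target.startswith(("http://", "https://"))
    # Assumed heuristic: a local path ending in .html/.htm is a static file
    if not is_url and Path(target).suffix.lower() in {".html", ".htm"}:
        return "A"  # Direct Read
    # Anything else needs a server: reuse a running one, otherwise start one
    return "B" if server_running else "C"
```

A caller might pass `is_port_open("localhost", 3000)` as `server_running`, where port 3000 simply mirrors the examples in this document.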

Approach A: Static HTML Files

For static HTML files that do not require a server:

Read the HTML file directly
Identify CSS selectors for elements of interest
Write a Playwright script to open the file and verify content

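from pathlib import Path

from playwright.sync_api import sync_playwright

# Placeholder path: point this at the HTML file under test
html_path = Path("index.html").resolve()

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    # as_uri() builds a correctly escaped file:// URL on every platform
    page.goto(html_path.as_uri())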

    # Verify content
    title = page.title()
    assert title == "Expected Title"

    # Check element exists
    element = page.query_selector("h1.main-heading")
    assert element is not None

    # Capture screenshot
    page.screenshot(path="screenshot.png")
    browser.close()

Approach B: Running Server (Reconnaissance-then-Action)

When a dev server is already running:

Reconnaissance: Navigate to the app and discover the page structure
Wait for stability: page.wait_for_load_state('networkidle')
Inspect: Query elements, read content, check console logs
Act: Interact with forms, buttons, navigation
Verify: Assert expected outcomes

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()

    # Capture console messages
    console_messages = []
    page.on("console", lambda msg: console_messages.append(
        f"[{msg.type}] {msg.text}"
    ))

    # Navigate and wait for stability
    page.goto("http://localhost:3000")
    page.wait_for_load_state("networkidle")

    # Discover page structure
    headings = page.query_selector_all("h1, h2, h3")
    buttons = page.query_selector_all("button")
    forms = page.query_selector_all("form")
    links = page.query_selector_all("a[href]")

    # Print discovered elements
    for h in headings:
        print(f"Heading: {h.text_content()}")
    for btn in buttons:
        print(f"Button: {btn.text_content()}")

    # Screenshot before interaction
    page.screenshot(path="before-interaction.png")

    # Interact with form
    page.fill("input[name='email']", "test@example.com")
    page.fill("input[name='password']", "testpass123")
    page.click("button[type='submit']")
    page.wait_for_load_state("networkidle")

    # Screenshot after interaction
    page.screenshot(path="after-interaction.png")

    # Check for console errors (match the "[error]" prefix, not the message text)
    errors = [m for m in console_messages if m.startswith("[error]")]
    if errors:
        print(f"Console errors found: {errors}")

    browser.close()
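
Console capture can be factored into a reusable helper. The sketch below is one possible shape, not the skill's own API: `attach_console_recorder` and `failures` are hypothetical names. It relies only on Playwright's `page.on("console", ...)` and `page.on("pageerror", ...)` events, and accepts any object with a compatible `on` method.

```python
def attach_console_recorder(page) -> list:
    """Record console messages and uncaught page errors as (kind, text) pairs."""
    records = []
    # "console" fires for console.log/warn/error calls made by the page
    page.on("console", lambda msg: records.append((msg.type, msg.text)))
    # "pageerror" fires for uncaught exceptions thrown in the page
    page.on("pageerror", lambda exc: records.append(("pageerror", str(exc))))
    return records


def failures(records: list) -> list:
    """Keep only the entries a test report should flag."""
    return [r for r in records if r[0] in ("error", "pageerror")]
```

Attach the recorder before `page.goto(...)` so that messages emitted during the initial load are not missed.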

Approach C: No Running Server (Helper Script)

When the web app needs a server started:

import subprocess
import time
from playwright.sync_api import sync_playwright

# Start the server
server_proc = subprocess.Popen(
    ["npm", "run", "dev"],
    cwd="/path/to/project",
    # DEVNULL, not PIPE: a pipe that is never read can fill up and block the server
    stdout=subprocess.DEVNULL,
    stderr=subprocess.DEVNULL,
    shell=False  # Security: always shell=False
)

# Wait for server to be ready
# Placeholder only: a fixed sleep is flaky, so prefer polling the URL until it responds
time.sleep(5)

try:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto("http://localhost:3000")
        page.wait_for_load_state("networkidle")

        # Run tests...
        page.screenshot(path="test-result.png")
        browser.close()
finally:
    server_proc.terminate()
    server_proc.wait()
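
The "health check loop" mentioned in the script can be sketched with the standard library alone. `wait_for_server` is a hypothetical helper, and the 30-second default timeout and 2-second per-request timeout are assumptions to tune per project. Proxies are bypassed on the assumption that the target is a local dev server.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str, timeout: float = 30.0, interval: float = 0.25) -> bool:
    """Poll url until the server answers or timeout seconds elapse."""
    # Ignore proxy environment variables: the target is assumed to be local
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=2.0):
                return True
        except urllib.error.HTTPError:
            # A 4xx/5xx reply still means the server is accepting requests
            return True
        except OSError:
            # Connection refused or timed out: not ready yet (URLError is an OSError)
            time.sleep(interval)
    return False
```

Called right after `subprocess.Popen(...)`, it returns as soon as the server responds, and a `False` result can be turned into an explicit failure before any browser is launched. The short sleep here is only a polling interval, not a guess at the startup time.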

Common Test Patterns

Pattern 1: Visual Regression Check

# Take baseline screenshot
page.screenshot(path="baseline.png", full_page=True)

# After changes, take comparison screenshot
page.screenshot(path="current.png", full_page=True)

# Compare (use image diff library)
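
As a stand-in for the comparison step, the crudest possible check is byte-for-byte equality of the two files. The sketch below uses only the standard library, and `file_digest` and `screenshots_identical` are hypothetical names. It is strict by design: any difference in the encoded image fails the check, so a tolerance-based pixel diff would still need an image library, which is not shown here.

```python
import hashlib
from pathlib import Path


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def screenshots_identical(baseline: str, current: str) -> bool:
    """True only when both files are byte-for-byte identical."""
    return file_digest(baseline) == file_digest(current)
```

For example, `assert screenshots_identical("baseline.png", "current.png")` would flag any change between the two captures.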

Pattern 2: Form Validation Testing

# Test empty submission
page.click("button[type='submit']")
error_msgs = page.query_selector_all(".error-message")
assert len(error_msgs) > 0, "Expected validation errors for empty form"

# Test invalid email
page.fill("input[type='email']", "not-an-email")
page.click("button[type='submit']")
email_error = page.query_selector("input[type='email']:invalid")
assert email_error is not None

Pattern 3: Navigation Flow Testing

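# Test navigation links
# Collect hrefs up front: element handles can go stale once the page navigates
hrefs = [a.get_attribute("href") for a in page.query_selector_all("nav a[href]")]
for href in hrefs:
    # Re-query the link on the freshly loaded page before clicking it
    page.click(f'nav a[href="{href}"]')
    page.wait_for_load_state("networkidle")
    assert page.url.endswith(href), f"Expected URL to end with {href}"
    page.go_back()
    page.wait_for_load_state("networkidle")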
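
A plain suffix check on the URL breaks for relative hrefs such as `../pricing`, or when the router adds a trailing slash. One possible alternative is to resolve the href against the page it was found on and compare the parts. `resolves_to` is a hypothetical helper. It deliberately ignores query strings and fragments, so it will not distinguish routes in a hash-routed app.

```python
from urllib.parse import urljoin, urlsplit


def resolves_to(current_url: str, base_url: str, href: str) -> bool:
    """Check that current_url is where href leads when followed from base_url."""
    expected = urlsplit(urljoin(base_url, href))
    actual = urlsplit(current_url)
    # Compare scheme, host, and path; tolerate a trailing slash on either side
    return (
        expected.scheme == actual.scheme
        and expected.netloc == actual.netloc
        and expected.path.rstrip("/") == actual.path.rstrip("/")
    )
```

In the loop, this would mean recording `page.url` before the click and asserting `resolves_to(page.url, previous_url, href)` after the load settles.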

Pattern 4: Responsive Design Testing

viewports = [
    {"width": 375, "height": 812, "name": "mobile"},
    {"width": 768, "height": 1024, "name": "tablet"},
    {"width": 1920, "height": 1080, "name": "desktop"},
]

for vp in viewports:
    page.set_viewport_size({"width": vp["width"], "height": vp["height"]})
    page.wait_for_load_state("networkidle")
    page.screenshot(path=f"responsive-{vp['name']}.png")

Critical Pitfalls

Do NOT inspect DOM before networkidle: Dynamic apps load content asynchronously. Early inspection gives incomplete results.
Do NOT use shell=True: When spawning server processes, always use shell=False with array arguments for security.
Do NOT hardcode waits: Use page.wait_for_selector() or page.wait_for_load_state() instead of time.sleep().
Do NOT ignore console errors: Always capture and report browser console errors -- they indicate real issues.
Do NOT forget cleanup: Always terminate server processes in a finally block.

Prerequisites

Ensure Playwright is installed:

pip install playwright
playwright install chromium

Integration with Agent-Studio

Recommended Workflow

Use webapp-testing to verify frontend behavior
Feed screenshot evidence to code-reviewer for visual review
Use tdd skill to generate test suites from discovered patterns
Use accessibility skill to verify WCAG compliance

Complementary Skills

Skill	Relationship
`tdd`	Generate test suites from webapp-testing discoveries
`accessibility`	WCAG compliance verification after functional testing
`frontend-expert`	UI/UX pattern guidance for test design
`chrome-browser`	Alternative browser automation approach
`test-generator`	Generate test code from testing patterns

Iron Laws

NEVER INSPECT DOM BEFORE NETWORKIDLE — Dynamic web applications load content asynchronously. Inspecting the DOM before the page has stabilized produces incorrect or incomplete results.
NEVER use shell=True when spawning server processes — always use shell=False with array arguments (SE-01 security requirement).
ALWAYS capture console errors — browser console errors indicate real issues; never ignore them in test reports.
ALWAYS terminate server processes in finally blocks — leaked server processes corrupt future test runs and consume resources.
NEVER hardcode waits — use page.wait_for_selector() or page.wait_for_load_state() instead of time.sleep().

Anti-Patterns

Anti-Pattern	Why It Fails	Correct Approach
Inspecting DOM before networkidle	Dynamic content not yet loaded; assertions produce false negatives	Always `page.wait_for_load_state('networkidle')` before inspection
Using `time.sleep()` for waits	Flaky — too short on slow machines, too long on fast ones	Use explicit waits: `wait_for_selector`, `wait_for_load_state`
Ignoring browser console errors	Real JS errors go undetected; test passes but app is broken	Always capture and report console errors in every test run
Using `shell=True` for server processes	Command injection vulnerability	Always `shell=False` with list arguments
Not cleaning up server processes	Port conflicts, resource leaks on subsequent runs	Use `try/finally` to guarantee `server_proc.terminate()`
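
The last two rows of the table can be combined into a single context manager, so that cleanup cannot be forgotten. This is a minimal sketch under stated assumptions: `managed_server` is a hypothetical name, and the 5-second grace period before a forced kill is an arbitrary choice.

```python
import subprocess
from contextlib import contextmanager
from typing import Optional


@contextmanager
def managed_server(cmd: list, cwd: Optional[str] = None):
    """Start cmd without a shell and guarantee it is stopped on exit."""
    proc = subprocess.Popen(
        cmd,
        cwd=cwd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        shell=False,  # list arguments, no shell interpolation
    )
    try:
        yield proc
    finally:
        proc.terminate()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            # The process ignored the polite request; force it down
            proc.kill()
            proc.wait()
```

Usage would look like `with managed_server(["npm", "run", "dev"], cwd="/path/to/project"):` around the Playwright block. Note that terminating the parent process may not stop child processes it spawned on every platform, so a port check after exit can be a useful extra safeguard.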

Puppeteer MCP Browser Automation

For agent-native browser automation without Python, use the Puppeteer MCP server from modelcontextprotocol/servers:

Setup

Add to .claude/settings.json under mcpServers:

"puppeteer": {
  "command": "npx",
  "args": ["-y", "@modelcontextprotocol/server-puppeteer"]
}

MCP Tool Reference

Tool	Purpose
`puppeteer_navigate`	Navigate to URL, wait for page load
`puppeteer_screenshot`	Capture screenshot (full page or element)
`puppeteer_click`	Click on CSS selector
`puppeteer_fill`	Fill form input with value
`puppeteer_select`	Select dropdown option by value
`puppeteer_hover`	Hover over element
`puppeteer_evaluate`	Execute JavaScript in browser context

Usage Pattern

// Navigate and capture state
mcp__puppeteer__puppeteer_navigate({ url: 'http://localhost:3000' });
mcp__puppeteer__puppeteer_screenshot({ name: 'initial-state', fullPage: true });

// Interact with forms
mcp__puppeteer__puppeteer_fill({ selector: 'input[name="email"]', value: 'test@example.com' });
mcp__puppeteer__puppeteer_click({ selector: 'button[type="submit"]' });
mcp__puppeteer__puppeteer_screenshot({ name: 'after-submit' });

// Evaluate page state
mcp__puppeteer__puppeteer_evaluate({
  script:
    'JSON.stringify({ title: document.title, errors: [...document.querySelectorAll(".error")].map(e => e.textContent) })',
});

When to Use Puppeteer MCP vs Playwright Python

Scenario	Use
Quick page verification in agent flow	Puppeteer MCP
Complex test suites with assertions	Playwright Python
Screenshot capture as evidence	Puppeteer MCP
Form interaction and navigation	Either
CI test automation	Playwright Python
Agent-embedded browser checks	Puppeteer MCP

Memory Protocol (MANDATORY)

Before starting:

Read .claude/context/memory/learnings.md

Check for:

Existing test scripts or Playwright configurations in the project
Known page selectors from previous sessions
Previously discovered console errors or flaky test patterns

After completing:

Testing pattern found -> .claude/context/memory/learnings.md
Test flakiness or browser issue -> .claude/context/memory/issues.md
Decision about test strategy -> .claude/context/memory/decisions.md
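
Writing to memory can be reduced to a single call so that it is harder to skip. The sketch below assumes the three files named above live in `.claude/context/memory`. The `record` helper and the dated-bullet entry format are hypothetical, since this document does not specify how entries should look.

```python
from datetime import date
from pathlib import Path

MEMORY_DIR = Path(".claude/context/memory")  # directory named in this document


def record(kind: str, note: str, memory_dir: Path = MEMORY_DIR) -> Path:
    """Append a dated bullet to learnings.md, issues.md, or decisions.md."""
    if kind not in ("learnings", "issues", "decisions"):
        raise ValueError(f"unknown memory file: {kind}")
    memory_dir.mkdir(parents=True, exist_ok=True)
    target = memory_dir / f"{kind}.md"
    # Assumed entry format: "- YYYY-MM-DD: note"
    with target.open("a", encoding="utf-8") as fh:
        fh.write(f"- {date.today().isoformat()}: {note}\n")
    return target
```

For instance, `record("issues", "login form test flaky when run before networkidle")` would append one line to `issues.md`.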

ASSUME INTERRUPTION: If it's not in memory, it didn't happen.