qa-team
QA Team Skill
Purpose [LEVEL 1]
This skill helps you create agentic outside-in tests that verify application behavior from an external user's perspective without any knowledge of internal implementation. Using the gadugi-agentic-test framework, you write declarative YAML scenarios that AI agents execute, observe, and validate.
Key Principle: Tests describe WHAT should happen, not HOW it's implemented. Agents figure out the execution details.
When to Use This Skill [LEVEL 1]
Perfect For
- Smoke Tests: Quick validation that critical user flows work
- Behavior-Driven Testing: Verify features from user perspective
- Cross-Platform Testing: Same test logic for CLI, TUI, Web, Electron
- Refactoring Safety: Tests remain valid when implementation changes
- AI-Powered Testing: Let agents handle complex interactions
- Documentation as Tests: YAML scenarios double as executable specs
Use This Skill When
- Starting a new project and defining expected behaviors
- Refactoring code and need tests that won't break with internal changes
- Testing user-facing applications (CLI tools, TUIs, web apps, desktop apps)
- Writing acceptance criteria that can be automatically verified
- Need tests that non-developers can read and understand
- Want to catch regressions in critical user workflows
- Testing complex multi-step interactions
Don't Use This Skill When
- Need unit tests for internal functions (use test-gap-analyzer instead)
- Testing performance or load characteristics
- Need precise timing or concurrency control
- Testing non-interactive batch processes
- Implementation details matter more than behavior
Core Concepts [LEVEL 1]
Outside-In Testing Philosophy
Traditional Inside-Out Testing:
# Tightly coupled to implementation
def test_calculator_add():
calc = Calculator()
result = calc.add(2, 3)
assert result == 5
assert calc.history == [(2, 3, 5)] # Knows internal state
Agentic Outside-In Testing:
# Implementation-agnostic behavior verification
scenario:
name: "Calculator Addition"
steps:
- action: launch
target: "./calculator"
- action: send_input
value: "add 2 3"
- action: verify_output
contains: "Result: 5"
Benefits:
- Tests survive refactoring (internal changes don't break tests)
- Readable by non-developers (YAML is declarative)
- Platform-agnostic (same structure for CLI/TUI/Web/Electron)
- AI agents handle complexity (navigation, timing, screenshots)
The Gadugi Agentic Test Framework [LEVEL 2]
Gadugi-agentic-test is a Python framework that:
- Parses YAML test scenarios with declarative steps
- Dispatches to specialized agents (CLI, TUI, Web, Electron agents)
- Executes actions (launch, input, click, wait, verify)
- Collects evidence (screenshots, logs, output captures)
- Validates outcomes against expected results
- Generates reports with evidence trails
Architecture:
YAML Scenario → Scenario Loader → Agent Dispatcher → Execution Engine
↓
[CLI Agent, TUI Agent, Web Agent, Electron Agent]
↓
Observers → Comprehension Agent
↓
Evidence Report
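The flow can be illustrated with a minimal Python sketch. The class and function names below are hypothetical and for illustration only; they are not the framework's real API, and only show how a scenario's type field selects an agent that then executes each step in order.
# Illustrative sketch only: hypothetical names, not the gadugi-agentic-test API.
import yaml  # assumes PyYAML is available

class CliAgent:
    """Stand-in for an agent that executes CLI actions (launch, send_input, ...)."""
    def run_step(self, step: dict) -> None:
        print(f"[cli] executing '{step['action']}' with {step}")

AGENTS = {"cli": CliAgent}  # a real dispatcher would also map tui/web/electron

def run_scenario(path: str) -> None:
    with open(path) as f:
        scenario = yaml.safe_load(f)["scenario"]
    agent = AGENTS[scenario["type"]]()   # dispatch on the scenario's type field
    for step in scenario["steps"]:       # steps execute sequentially
        agent.run_step(step)

# run_scenario("test-hello.yaml")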
Progressive Disclosure Levels [LEVEL 1]
This skill teaches testing in four levels:
- Level 1: Fundamentals - Basic single-action tests, simple verification
- Level 2: Intermediate - Multi-step flows, conditional logic, error handling
- Level 3: Advanced - Custom agents, visual regression, performance validation
- Level 4: Parity & Shadowing - Side-by-side A/B comparison, remote observable runs, rollout divergence logging
Each example is marked with its level. Start at Level 1 and progress as needed.
Side-by-Side Parity and A/B Validation [LEVEL 2]
QA Team is the primary skill formerly named outside-in-testing. Use it for standard outside-in scenarios and for parity loops where you compare a legacy implementation against its replacement, or approach A against approach B, as an external user would observe them.
Use QA Team for parity work when
- migrating Python to Rust, old CLI to new CLI, or v1 to v2 behavior
- validating a rewrite before switching defaults
- comparing branch A vs branch B using the same user scenarios
- running observable side-by-side sessions in paired virtual TTYs
- logging rollout divergences in shadow mode without failing the run
Recommended parity loop
- Define shared user-facing scenarios first.
- Run both implementations in isolated sandboxes.
- Compare stdout, stderr, exit code, JSON outputs, and filesystem side effects.
- Re-run in `--observable` mode when you need paired tmux panes for debugging.
- Use `--ssh-target <host>` when parity must happen on a remote environment such as `azlin`.
- Use `--shadow-mode --shadow-log <file>` during rollout to log divergences without blocking execution.
Command pattern to reuse
If the repo already has a parity harness, extend it instead of inventing a second one. A good baseline is:
python tests/parity/validate_cli_parity.py \
--scenario tests/parity/scenarios/feature.yaml \
--python-repo /path/to/legacy-repo \
--rust-binary /path/to/new-binary \
--observable
For remote parity:
python tests/parity/validate_cli_parity.py \
--ssh-target azlin \
--scenario tests/parity/scenarios/feature.yaml \
--python-repo /remote/path/to/legacy-repo \
--rust-binary /remote/path/to/new-binary
For rollout shadow logging:
python tests/parity/validate_cli_parity.py \
--scenario tests/parity/scenarios/feature.yaml \
--python-repo /path/to/legacy-repo \
--rust-binary /path/to/new-binary \
--shadow-mode \
--shadow-log /tmp/feature-shadow.jsonl
Quick Start [LEVEL 1]
Installation
Prerequisites (for native module compilation):
# macOS
xcode-select --install
# Ubuntu/Debian
sudo apt-get install -y build-essential python3
# Windows: Install Visual Studio Build Tools with "Desktop development with C++"
Install the framework:
# Install globally for CLI access
npm install -g @gadugi/agentic-test
# Or install locally in your project
npm install @gadugi/agentic-test
# Verify installation
gadugi-test --version
Your First Test (CLI Example)
Create test-hello.yaml:
scenario:
name: "Hello World CLI Test"
description: "Verify CLI prints greeting"
type: cli
prerequisites:
- "./hello-world executable exists"
steps:
- action: launch
target: "./hello-world"
- action: verify_output
contains: "Hello, World!"
- action: verify_exit_code
expected: 0
Run the test:
gadugi-test run test-hello.yaml
Output:
✓ Scenario: Hello World CLI Test
✓ Step 1: Launched ./hello-world
✓ Step 2: Output contains "Hello, World!"
✓ Step 3: Exit code is 0
PASSED (3/3 steps successful)
Evidence saved to: ./evidence/test-hello-20250116-093045/
Understanding the YAML Structure [LEVEL 1]
Every test scenario has this structure:
scenario:
name: "Descriptive test name"
description: "What this test verifies"
type: cli | tui | web | electron
# Optional metadata
tags: [smoke, critical, auth]
timeout: 30s
# What must be true before test runs
prerequisites:
- "Condition 1"
- "Condition 2"
# The test steps (executed sequentially)
steps:
- action: action_name
parameter1: value1
parameter2: value2
- action: verify_something
expected: value
# Optional cleanup
cleanup:
- action: stop_application
Application Types and Agents [LEVEL 2]
CLI Applications [LEVEL 1]
Use Case: Command-line tools, scripts, build tools, package managers
Supported Actions:
- `launch` - Start the CLI program
- `send_input` - Send text or commands via stdin
- `send_signal` - Send OS signals (SIGINT, SIGTERM)
- `wait_for_output` - Wait for specific text in stdout/stderr
- `verify_output` - Check stdout/stderr contains/matches expected text
- `verify_exit_code` - Validate process exit code
- `capture_output` - Save output for later verification
Example (see examples/cli/calculator-basic.yaml):
scenario:
name: "CLI Calculator Basic Operations"
type: cli
steps:
- action: launch
target: "./calculator"
args: ["--mode", "interactive"]
- action: send_input
value: "add 5 3\n"
- action: verify_output
contains: "Result: 8"
timeout: 2s
- action: send_input
value: "multiply 4 7\n"
- action: verify_output
contains: "Result: 28"
- action: send_input
value: "exit\n"
- action: verify_exit_code
expected: 0
TUI Applications [LEVEL 1]
Use Case: Terminal user interfaces (htop, vim, tmux, custom dashboard TUIs)
Supported Actions:
- `launch` - Start TUI application
- `send_keypress` - Send keyboard input (arrow keys, enter, ctrl+c, etc.)
- `wait_for_screen` - Wait for specific text to appear on screen
- `verify_screen` - Check screen contents match expectations
- `capture_screenshot` - Save terminal screenshot (ANSI art)
- `navigate_menu` - Navigate menu structures
- `fill_form` - Fill TUI form fields
Example (see examples/tui/file-manager-navigation.yaml):
scenario:
name: "TUI File Manager Navigation"
type: tui
steps:
- action: launch
target: "./file-manager"
- action: wait_for_screen
contains: "File Manager v1.0"
timeout: 3s
- action: send_keypress
value: "down"
times: 3
- action: verify_screen
contains: "> documents/"
description: "Third item should be selected"
- action: send_keypress
value: "enter"
- action: wait_for_screen
contains: "documents/"
timeout: 2s
- action: capture_screenshot
save_as: "documents-view.txt"
Web Applications [LEVEL 1]
Use Case: Web apps, dashboards, SPAs, admin panels
Supported Actions:
- `navigate` - Go to URL
- `click` - Click element by selector or text
- `type` - Type into input fields
- `wait_for_element` - Wait for element to appear
- `verify_element` - Check element exists/contains text
- `verify_url` - Validate current URL
- `screenshot` - Capture browser screenshot
- `scroll` - Scroll page or element
Example (see examples/web/dashboard-smoke-test.yaml):
scenario:
name: "Dashboard Smoke Test"
type: web
steps:
- action: navigate
url: "http://localhost:3000/dashboard"
- action: wait_for_element
selector: "h1.dashboard-title"
timeout: 5s
- action: verify_element
selector: "h1.dashboard-title"
contains: "Analytics Dashboard"
- action: verify_element
selector: ".widget-stats"
count: 4
description: "Should have 4 stat widgets"
- action: click
selector: "button.refresh-data"
- action: wait_for_element
selector: ".loading-spinner"
disappears: true
timeout: 10s
- action: screenshot
save_as: "dashboard-loaded.png"
Electron Applications [LEVEL 2]
Use Case: Desktop apps built with Electron (VS Code, Slack, Discord clones)
Supported Actions:
- `launch` - Start Electron app
- `window_action` - Interact with windows (focus, minimize, close)
- `menu_click` - Click application menu items
- `dialog_action` - Handle native dialogs (open file, save, confirm)
- `ipc_send` - Send IPC message to main process
- `verify_window` - Check window state/properties
- All web actions (since Electron uses Chromium)
Example (see examples/electron/single-window-basic.yaml):
scenario:
name: "Electron Single Window Test"
type: electron
steps:
- action: launch
target: "./dist/my-app"
wait_for_window: true
timeout: 10s
- action: verify_window
title: "My Application"
visible: true
- action: menu_click
path: ["File", "New Document"]
- action: wait_for_element
selector: ".document-editor"
- action: type
selector: ".document-editor"
value: "Hello from test"
- action: menu_click
path: ["File", "Save"]
- action: dialog_action
type: save_file
filename: "test-document.txt"
- action: verify_window
title_contains: "test-document.txt"
Test Scenario Anatomy [LEVEL 2]
Metadata Section
scenario:
name: "Clear descriptive name"
description: "Detailed explanation of what this test verifies"
type: cli | tui | web | electron
# Optional fields
tags: [smoke, regression, auth, payment]
priority: high | medium | low
timeout: 60s # Overall scenario timeout
retry_on_failure: 2 # Retry count
# Environment requirements
environment:
variables:
API_URL: "http://localhost:8080"
DEBUG: "true"
files:
- "./config.json must exist"
Prerequisites
Prerequisites are conditions that must be true before the test runs. The framework validates these before execution.
prerequisites:
- "./application binary exists"
- "Port 8080 is available"
- "Database is running"
- "User account test@example.com exists"
- "File ./test-data.json exists"
If prerequisites fail, the test is skipped (not failed).
Steps
Steps execute sequentially. Each step has:
- action: Required - the action to perform
- Parameters: Action-specific parameters
- description: Optional - human-readable explanation
- timeout: Optional - step-specific timeout
- continue_on_failure: Optional - don't fail scenario if step fails
steps:
# Simple action
- action: launch
target: "./app"
# Action with multiple parameters
- action: verify_output
contains: "Success"
timeout: 5s
description: "App should print success message"
# Continue even if this fails
- action: click
selector: ".optional-button"
continue_on_failure: true
Verification Actions [LEVEL 1]
Verification actions check expected outcomes. They fail the test if expectations aren't met.
Common Verifications:
# CLI: Check output contains text
- action: verify_output
contains: "Expected text"
# CLI: Check output matches regex
- action: verify_output
matches: "Result: \\d+"
# CLI: Check exit code
- action: verify_exit_code
expected: 0
# Web/TUI: Check element exists
- action: verify_element
selector: ".success-message"
# Web/TUI: Check element contains text
- action: verify_element
selector: "h1"
contains: "Welcome"
# Web: Check URL
- action: verify_url
equals: "http://localhost:3000/dashboard"
# Web: Check element count
- action: verify_element
selector: ".list-item"
count: 5
# Electron: Check window state
- action: verify_window
title: "My App"
visible: true
focused: true
Cleanup Section
Cleanup runs after all steps complete (success or failure). Use for teardown actions.
cleanup:
- action: stop_application
force: true
- action: delete_file
path: "./temp-test-data.json"
- action: reset_database
connection: "test_db"
Advanced Patterns [LEVEL 2]
Conditional Logic
Execute steps based on conditions:
steps:
- action: launch
target: "./app"
- action: verify_output
contains: "Login required"
id: login_check
# Only run if login_check passed
- action: send_input
value: "login admin password123\n"
condition: login_check.passed
Variables and Templating [LEVEL 2]
Define variables and use them throughout the scenario:
scenario:
name: "Test with Variables"
type: cli
variables:
username: "testuser"
api_url: "http://localhost:8080"
steps:
- action: launch
target: "./app"
args: ["--api", "${api_url}"]
- action: send_input
value: "login ${username}\n"
- action: verify_output
contains: "Welcome, ${username}!"
Loops and Repetition [LEVEL 2]
Repeat actions multiple times:
steps:
- action: launch
target: "./app"
# Repeat action N times
- action: send_keypress
value: "down"
times: 5
# Loop over list
- action: send_input
value: "${item}\n"
for_each:
- "apple"
- "banana"
- "cherry"
Error Handling [LEVEL 2]
Handle expected errors gracefully:
steps:
- action: send_input
value: "invalid command\n"
# Verify error message appears
- action: verify_output
contains: "Error: Unknown command"
expected_failure: true
# App should still be running
- action: verify_running
expected: true
Multi-Step Workflows [LEVEL 2]
Complex scenarios with multiple phases:
scenario:
name: "E-commerce Purchase Flow"
type: web
steps:
# Phase 1: Authentication
- action: navigate
url: "http://localhost:3000/login"
- action: type
selector: "#username"
value: "test@example.com"
- action: type
selector: "#password"
value: "password123"
- action: click
selector: "button[type=submit]"
- action: wait_for_url
contains: "/dashboard"
# Phase 2: Product Selection
- action: navigate
url: "http://localhost:3000/products"
- action: click
text: "Add to Cart"
nth: 1
- action: verify_element
selector: ".cart-badge"
contains: "1"
# Phase 3: Checkout
- action: click
selector: ".cart-icon"
- action: click
text: "Proceed to Checkout"
- action: fill_form
fields:
"#shipping-address": "123 Test St"
"#city": "Testville"
"#zip": "12345"
- action: click
selector: "#place-order"
- action: wait_for_element
selector: ".order-confirmation"
timeout: 10s
- action: verify_element
selector: ".order-number"
exists: true
Level 3: Advanced Topics [LEVEL 3]
Custom Comprehension Agents
The framework uses AI agents to interpret application output and determine if tests pass. You can customize these agents for domain-specific logic.
Default Comprehension Agent:
- Observes raw output (text, HTML, screenshots)
- Applies general reasoning to verify expectations
- Returns pass/fail with explanation
Custom Comprehension Agent (see examples/custom-agents/custom-comprehension-agent.yaml):
scenario:
name: "Financial Dashboard Test with Custom Agent"
type: web
# Define custom comprehension logic
comprehension_agent:
model: "gpt-4"
system_prompt: |
You are a financial data validator. When verifying dashboard content:
1. All monetary values must use proper formatting ($1,234.56)
2. Percentages must include % symbol
3. Dates must be in MM/DD/YYYY format
4. Negative values must be red
5. Chart data must be logically consistent
Be strict about formatting and data consistency.
examples:
- input: "Total Revenue: 45000"
output: "FAIL - Missing currency symbol and comma separator"
- input: "Total Revenue: $45,000.00"
output: "PASS - Correctly formatted"
steps:
- action: navigate
url: "http://localhost:3000/financial-dashboard"
- action: verify_element
selector: ".revenue-widget"
use_custom_comprehension: true
description: "Revenue should be properly formatted"
Visual Regression Testing [LEVEL 3]
Compare screenshots against baseline images:
scenario:
name: "Visual Regression - Homepage"
type: web
steps:
- action: navigate
url: "http://localhost:3000"
- action: wait_for_element
selector: ".page-loaded"
- action: screenshot
save_as: "homepage.png"
- action: visual_compare
screenshot: "homepage.png"
baseline: "./baselines/homepage-baseline.png"
threshold: 0.05 # 5% difference allowed
highlight_differences: true
Performance Validation [LEVEL 3]
Measure and validate performance metrics:
scenario:
name: "Performance - Dashboard Load Time"
type: web
performance:
metrics:
- page_load_time
- first_contentful_paint
- time_to_interactive
steps:
- action: navigate
url: "http://localhost:3000/dashboard"
measure_timing: true
- action: verify_performance
metric: page_load_time
less_than: 3000 # 3 seconds
- action: verify_performance
metric: first_contentful_paint
less_than: 1500 # 1.5 seconds
Multi-Window Coordination (Electron) [LEVEL 3]
Test applications with multiple windows:
scenario:
name: "Multi-Window Chat Application"
type: electron
steps:
- action: launch
target: "./chat-app"
- action: menu_click
path: ["Window", "New Chat"]
- action: verify_window
count: 2
- action: window_action
window: 1
action: focus
- action: type
selector: ".message-input"
value: "Hello from window 1"
- action: click
selector: ".send-button"
- action: window_action
window: 2
action: focus
- action: wait_for_element
selector: ".message"
contains: "Hello from window 1"
timeout: 5s
IPC Testing (Electron) [LEVEL 3]
Test Inter-Process Communication between renderer and main:
scenario:
name: "Electron IPC Communication"
type: electron
steps:
- action: launch
target: "./my-app"
- action: ipc_send
channel: "get-system-info"
- action: ipc_expect
channel: "system-info-reply"
timeout: 3s
- action: verify_ipc_payload
contains:
platform: "darwin"
arch: "x64"
Custom Reporters [LEVEL 3]
Generate custom test reports:
scenario:
name: "Test with Custom Reporting"
type: cli
reporting:
format: custom
template: "./report-template.html"
include:
- screenshots
- logs
- timing_data
- video_recording
email:
enabled: true
recipients: ["team@example.com"]
on_failure_only: true
steps:
# ... test steps ...
Framework Integration [LEVEL 2]
Running Tests
Single test:
gadugi-test run test-scenario.yaml
Multiple tests:
gadugi-test run tests/*.yaml
With options:
gadugi-test run test.yaml \
--verbose \
--evidence-dir ./test-evidence \
--retry 2 \
--timeout 60s
CI/CD Integration
GitHub Actions (.github/workflows/agentic-tests.yml):
name: Agentic Tests
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install gadugi-agentic-test
run: npm install -g @gadugi/agentic-test
- name: Run tests
run: gadugi-test run tests/agentic/*.yaml
- name: Upload evidence
if: always()
uses: actions/upload-artifact@v3
with:
name: test-evidence
path: ./evidence/
Evidence Collection
The framework automatically collects evidence for debugging:
evidence/
└── scenario-name-20250116-093045/
    ├── scenario.yaml          # Original test scenario
    ├── execution-log.json     # Detailed execution log
    ├── screenshots/           # All captured screenshots
    │   ├── step-1.png
    │   ├── step-3.png
    │   └── step-5.png
    ├── output-captures/       # CLI/TUI output
    │   ├── stdout.txt
    │   └── stderr.txt
    ├── timing.json            # Performance metrics
    └── report.html            # Human-readable report
Best Practices [LEVEL 2]
1. Start Simple, Add Complexity
Begin with basic smoke tests, then add detail:
# Level 1: Basic smoke test
steps:
- action: launch
target: "./app"
- action: verify_output
contains: "Ready"
# Level 2: Add interaction
steps:
- action: launch
target: "./app"
- action: send_input
value: "command\n"
- action: verify_output
contains: "Success"
# Level 3: Add error handling and edge cases
steps:
- action: launch
target: "./app"
- action: send_input
value: "invalid\n"
- action: verify_output
contains: "Error"
- action: send_input
value: "command\n"
- action: verify_output
contains: "Success"
2. Use Descriptive Names and Descriptions
# Bad
scenario:
name: "Test 1"
steps:
- action: click
selector: "button"
# Good
scenario:
name: "User Login Flow - Valid Credentials"
description: "Verifies user can log in with valid email and password"
steps:
- action: click
selector: "button[type=submit]"
description: "Submit login form"
3. Verify Critical Paths Only
Don't test every tiny detail. Focus on user-facing behavior:
# Bad - Tests implementation details
- action: verify_element
selector: ".internal-cache-status"
contains: "initialized"
# Good - Tests user-visible behavior
- action: verify_element
selector: ".welcome-message"
contains: "Welcome back"
4. Use Prerequisites for Test Dependencies
scenario:
name: "User Profile Edit"
prerequisites:
- "User testuser@example.com exists"
- "User is logged in"
- "Database is seeded with test data"
steps:
# Test assumes prerequisites are met
- action: navigate
url: "/profile"
5. Keep Tests Independent
Each test should set up its own state and clean up:
scenario:
name: "Create Document"
steps:
# Create test user (don't assume exists)
- action: api_call
endpoint: "/api/users"
method: POST
data: { email: "test@example.com" }
# Run test
- action: navigate
url: "/documents/new"
# ... test steps ...
cleanup:
# Remove test user
- action: api_call
endpoint: "/api/users/test@example.com"
method: DELETE
6. Use Tags for Organization
scenario:
name: "Critical Payment Flow"
tags: [smoke, critical, payment, e2e]
# Run with: gadugi-test run --tags critical
7. Add Timeouts Strategically
steps:
# Quick operations - short timeout
- action: click
selector: "button"
timeout: 2s
# Network operations - longer timeout
- action: wait_for_element
selector: ".data-loaded"
timeout: 10s
# Complex operations - generous timeout
- action: verify_element
selector: ".report-generated"
timeout: 60s
Testing Strategies [LEVEL 2]
Smoke Tests
Minimal tests that verify critical functionality works:
scenario:
name: "Smoke Test - Application Starts"
tags: [smoke]
steps:
- action: launch
target: "./app"
- action: verify_output
contains: "Ready"
timeout: 5s
Run before every commit: gadugi-test run --tags smoke
Happy Path Tests
Test the ideal user journey:
scenario:
name: "Happy Path - User Registration"
steps:
- action: navigate
url: "/register"
- action: type
selector: "#email"
value: "newuser@example.com"
- action: type
selector: "#password"
value: "SecurePass123!"
- action: click
selector: "button[type=submit]"
- action: wait_for_url
contains: "/welcome"
Error Path Tests
Verify error handling:
scenario:
name: "Error Path - Invalid Login"
steps:
- action: navigate
url: "/login"
- action: type
selector: "#email"
value: "invalid@example.com"
- action: type
selector: "#password"
value: "wrongpassword"
- action: click
selector: "button[type=submit]"
- action: verify_element
selector: ".error-message"
contains: "Invalid credentials"
Regression Tests
Prevent bugs from reappearing:
scenario:
name: "Regression - Issue #123 Password Reset"
tags: [regression, bug-123]
description: "Verifies password reset email is sent (was broken in v1.2)"
steps:
- action: navigate
url: "/forgot-password"
- action: type
selector: "#email"
value: "user@example.com"
- action: click
selector: "button[type=submit]"
- action: verify_element
selector: ".success-message"
contains: "Reset email sent"
Philosophy Alignment [LEVEL 2]
This skill follows amplihack's core principles:
Ruthless Simplicity
- YAML over code: Declarative tests are simpler than programmatic tests
- No implementation details: Tests describe WHAT, not HOW
- Minimal boilerplate: Each test is focused and concise
Modular Design (Bricks & Studs)
- Self-contained scenarios: Each YAML file is independent
- Clear contracts: Steps have well-defined inputs/outputs
- Composable actions: Reuse actions across different test types
Zero-BS Implementation
- No stubs: Every example in this skill is a complete, runnable test
- Working defaults: Tests run with minimal configuration
- Clear errors: Framework provides actionable error messages
Outside-In Thinking
- User perspective: Tests verify behavior users care about
- Implementation agnostic: Refactoring doesn't break tests
- Behavior-driven: Focus on outcomes, not internals
Common Pitfalls and Solutions [LEVEL 2]
Pitfall 1: Over-Specifying
Problem: Test breaks when UI changes slightly
# Bad - Too specific
- action: verify_element
selector: "div.container > div.row > div.col-md-6 > span.text-primary.font-bold"
contains: "Welcome"
Solution: Use flexible selectors
# Good - Focused on behavior
- action: verify_element
selector: ".welcome-message"
contains: "Welcome"
Pitfall 2: Missing Waits
Problem: Test fails intermittently due to timing
# Bad - No wait for async operation
- action: click
selector: ".load-data-button"
- action: verify_element
selector: ".data-table" # May not exist yet!
Solution: Always wait for dynamic content
# Good - Wait for element to appear
- action: click
selector: ".load-data-button"
- action: wait_for_element
selector: ".data-table"
timeout: 10s
- action: verify_element
selector: ".data-table"
Pitfall 3: Testing Implementation Details
Problem: Test coupled to internal state
# Bad - Tests internal cache state
- action: verify_output
contains: "Cache hit ratio: 85%"
Solution: Test user-visible behavior
# Good - Tests response time
- action: verify_response_time
less_than: 100ms
description: "Fast response indicates caching works"
Pitfall 4: Flaky Assertions
Problem: Assertions depend on exact timing or formatting
# Bad - Exact timestamp match will fail
- action: verify_output
contains: "Created at: 2025-11-16 09:30:45"
Solution: Use flexible patterns
# Good - Match pattern, not exact value
- action: verify_output
matches: "Created at: \\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}"
Pitfall 5: Not Cleaning Up
Problem: Tests leave artifacts that affect future runs
# Bad - No cleanup
steps:
- action: create_file
path: "./test-data.json"
- action: launch
target: "./app"
Solution: Always use cleanup section
# Good - Cleanup ensures clean slate
steps:
- action: create_file
path: "./test-data.json"
- action: launch
target: "./app"
cleanup:
- action: delete_file
path: "./test-data.json"
Example Library [LEVEL 1]
This skill includes 15 complete working examples organized by application type and complexity level:
CLI Examples
- calculator-basic.yaml [LEVEL 1] - Simple CLI arithmetic operations
- cli-error-handling.yaml [LEVEL 2] - Error messages and recovery
- cli-interactive-session.yaml [LEVEL 2] - Multi-turn interactive CLI
TUI Examples
- file-manager-navigation.yaml [LEVEL 1] - Basic TUI keyboard navigation
- tui-form-validation.yaml [LEVEL 2] - Complex form filling and validation
- tui-performance-monitoring.yaml [LEVEL 3] - TUI performance dashboard testing
Web Examples
- dashboard-smoke-test.yaml [LEVEL 1] - Simple web dashboard verification
- web-authentication-flow.yaml [LEVEL 2] - Multi-step login workflow
- web-visual-regression.yaml [LEVEL 2] - Screenshot-based visual testing
Electron Examples
- single-window-basic.yaml [LEVEL 1] - Basic Electron window test
- multi-window-coordination.yaml [LEVEL 2] - Multiple window orchestration
- electron-menu-testing.yaml [LEVEL 2] - Application menu interactions
- electron-ipc-testing.yaml [LEVEL 3] - Main/renderer IPC testing
Custom Agent Examples
- custom-comprehension-agent.yaml [LEVEL 3] - Domain-specific validation logic
- custom-reporter-integration.yaml [LEVEL 3] - Custom test reporting
See examples/ directory for full example code with inline documentation.
Framework Freshness Check [LEVEL 3]
This skill embeds knowledge of gadugi-agentic-test version 0.1.0. To check if a newer version exists:
# Run the freshness check script
python scripts/check-freshness.py
# Output if outdated:
# WARNING: Embedded framework version is 0.1.0
# Latest GitHub version is 0.2.5
#
# New features in 0.2.5:
# - Native Playwright support for web testing
# - Video recording for all test types
# - Parallel test execution
#
# Update with: npm update -g @gadugi/agentic-test
The script checks the GitHub repository for releases and compares against the embedded version. This ensures you're aware of new features and improvements.
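A minimal sketch of such a check, assuming the project publishes GitHub releases; the URL, version constant, and logic below are illustrative, not the bundled check-freshness.py script.
# Illustrative freshness check; not the bundled scripts/check-freshness.py.
import json
import urllib.request

EMBEDDED_VERSION = "0.1.0"  # version this skill documents
RELEASES_URL = "https://api.github.com/repos/rysweet/gadugi-agentic-test/releases/latest"

def check_freshness() -> None:
    with urllib.request.urlopen(RELEASES_URL, timeout=10) as resp:
        latest = json.load(resp)["tag_name"].lstrip("v")
    if latest != EMBEDDED_VERSION:
        print(f"WARNING: embedded framework version is {EMBEDDED_VERSION}; latest is {latest}")
    else:
        print(f"Embedded framework version {EMBEDDED_VERSION} is current")

if __name__ == "__main__":
    check_freshness()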
When to Update This Skill:
- New framework version adds significant features
- Breaking changes in YAML schema
- New application types supported
- Agent capabilities expand
Integration with Other Skills [LEVEL 2]
Works Well With
test-gap-analyzer:
- Use test-gap-analyzer to find untested functions
- Write outside-in tests for critical user-facing paths
- Use unit tests (from test-gap-analyzer) for internal functions
philosophy-guardian:
- Ensure test YAML follows ruthless simplicity
- Verify tests focus on behavior, not implementation
pr-review-assistant:
- Include outside-in tests in PR reviews
- Verify tests cover changed functionality
- Check test readability and clarity
module-spec-generator:
- Generate module specs that include outside-in test scenarios
- Use specs as templates for test YAML
Example Combined Workflow
# 1. Analyze coverage gaps
claude "Use test-gap-analyzer on ./src"
# 2. Write outside-in tests for critical paths
claude "Use qa-team to create web tests for authentication"
# 3. Verify philosophy compliance
claude "Use philosophy-guardian to review new test files"
# 4. Include in PR
git add tests/agentic/
git commit -m "Add outside-in tests for auth flow"
Troubleshooting [LEVEL 2]
Test Times Out
Symptom: Test exceeds timeout and fails
Causes:
- Application takes longer to start than expected
- Network requests are slow
- Element never appears (incorrect selector)
Solutions:
# Increase timeout
- action: wait_for_element
selector: ".slow-loading-element"
timeout: 30s # Increase from default
# Add intermediate verification
- action: launch
target: "./app"
- action: wait_for_output
contains: "Initializing..."
timeout: 5s
- action: wait_for_output
contains: "Ready"
timeout: 20s
Element Not Found
Symptom: verify_element or click fails with "element not found"
Causes:
- Incorrect CSS selector
- Element not yet rendered (timing issue)
- Element in iframe or shadow DOM
Solutions:
# Add wait before interaction
- action: wait_for_element
selector: ".target-element"
timeout: 10s
- action: click
selector: ".target-element"
# Use more specific selector
- action: click
selector: "button[data-testid='submit-button']"
# Handle iframe
- action: switch_to_iframe
selector: "iframe#payment-frame"
- action: click
selector: ".pay-now-button"
Test Passes Locally, Fails in CI
Symptom: Test works on dev machine but fails in CI environment
Causes:
- Different screen size (web/Electron)
- Missing dependencies
- Timing differences (slower CI machines)
- Environment variable differences
Solutions:
# Set explicit viewport size (web/Electron)
scenario:
environment:
viewport:
width: 1920
height: 1080
# Add longer timeouts in CI
- action: wait_for_element
selector: ".element"
timeout: 30s # Generous for CI
# Verify prerequisites
prerequisites:
- "Chrome browser installed"
- "Environment variable API_KEY is set"
Output Doesn't Match Expected
Symptom: verify_output fails even though output looks correct
Causes:
- Extra whitespace or newlines
- ANSI color codes in output
- Case sensitivity
Solutions:
# Use flexible matching
- action: verify_output
matches: "Result:\\s+Success" # Allow flexible whitespace
# Strip ANSI codes
- action: verify_output
contains: "Success"
strip_ansi: true
# Case-insensitive match
- action: verify_output
contains: "success"
case_sensitive: false
Reference: Action Catalog [LEVEL 3]
CLI Actions
| Action | Parameters | Description |
|---|---|---|
| `launch` | `target`, `args`, `cwd`, `env` | Start CLI application |
| `send_input` | `value`, `delay` | Send text to stdin |
| `send_signal` | `signal` | Send OS signal (SIGINT, SIGTERM, etc.) |
| `wait_for_output` | `contains`, `matches`, `timeout` | Wait for text in stdout/stderr |
| `verify_output` | `contains`, `matches`, `stream` | Check output content |
| `verify_exit_code` | `expected` | Validate exit code |
| `capture_output` | `save_as`, `stream` | Save output to file |
TUI Actions
| Action | Parameters | Description |
|---|---|---|
| `launch` | `target`, `args`, `terminal_size` | Start TUI application |
| `send_keypress` | `value`, `times`, `modifiers` | Send keyboard input |
| `wait_for_screen` | `contains`, `timeout` | Wait for text on screen |
| `verify_screen` | `contains`, `matches`, `region` | Check screen content |
| `capture_screenshot` | `save_as` | Save terminal screenshot |
| `navigate_menu` | `path` | Navigate menu structure |
| `fill_form` | `fields` | Fill TUI form fields |
Web Actions
| Action | Parameters | Description |
|---|---|---|
| `navigate` | `url`, `wait_for_load` | Go to URL |
| `click` | `selector`, `text`, `nth` | Click element |
| `type` | `selector`, `value`, `delay` | Type into input |
| `wait_for_element` | `selector`, `timeout`, `disappears` | Wait for element |
| `verify_element` | `selector`, `contains`, `count`, `exists` | Check element state |
| `verify_url` | `equals`, `contains`, `matches` | Validate URL |
| `screenshot` | `save_as`, `selector`, `full_page` | Capture screenshot |
| `scroll` | `selector`, `direction`, `amount` | Scroll page/element |
| `select_option` | `selector`, `value` | Select dropdown option |
| `checkbox` | `selector`, `checked` | Check/uncheck checkbox |
Electron Actions
| Action | Parameters | Description |
|---|---|---|
| `launch` | `target`, `args`, `wait_for_window` | Start Electron app |
| `window_action` | `window`, `action` | Interact with windows |
| `menu_click` | `path` | Click menu items |
| `dialog_action` | `type`, `action`, `filename` | Handle dialogs |
| `ipc_send` | `channel`, `data` | Send IPC message |
| `ipc_expect` | `channel`, `timeout` | Wait for IPC message |
| `verify_window` | `title`, `visible`, `focused`, `count` | Check window state |
| All web actions | (see Web Actions) | Electron includes Chromium |
Common Parameters
| Parameter | Type | Description |
|---|---|---|
| `timeout` | Duration | Maximum wait time (e.g., "5s", "2m") |
| `description` | String | Human-readable step explanation |
| `continue_on_failure` | Boolean | Don't fail scenario if step fails |
| `id` | String | Step identifier for conditionals |
| `condition` | Expression | Execute step only if condition true |
Quick Reference: YAML Template [LEVEL 1]
scenario:
# Required fields
name: "Test Name"
description: "What this test verifies"
type: cli | tui | web | electron
# Optional metadata
tags: [smoke, critical]
timeout: 60s
# What must be true before running
prerequisites:
- "Condition 1"
- "Condition 2"
# Environment setup
environment:
variables:
VAR_NAME: "value"
# Variables for templating
variables:
username: "testuser"
# Test steps (executed in order)
steps:
- action: launch
target: "./app"
- action: send_input
value: "command\n"
- action: verify_output
contains: "Success"
timeout: 5s
# Cleanup (always runs)
cleanup:
- action: stop_application
Skill Usage Examples [LEVEL 1]
Invoke This Skill
Claude, use the qa-team skill to create a CLI test for my calculator app.
Claude, use qa-team to generate web tests for the login flow.
Claude, create TUI tests using the qa-team skill for file navigation.
What You'll Get
- Complete YAML test scenario matching your requirements
- Inline documentation explaining each section
- Best practices applied (timeouts, waits, verification)
- Appropriate complexity level (1, 2, or 3 based on needs)
- Instructions for running the test
Example Request and Response
User: "Create a web test for user login with valid credentials"
Claude (using this skill):
scenario:
name: "User Login - Valid Credentials"
description: "Verifies user can successfully log in with valid email and password"
type: web
tags: [smoke, auth, critical]
prerequisites:
- "User test@example.com exists with password 'TestPass123!'"
- "Application is running on http://localhost:3000"
steps:
# Navigate to login page
- action: navigate
url: "http://localhost:3000/login"
description: "Open login page"
# Wait for page to fully load
- action: wait_for_element
selector: "form.login-form"
timeout: 5s
description: "Wait for login form to render"
# Fill in email field
- action: type
selector: "input[name='email']"
value: "test@example.com"
description: "Enter email address"
# Fill in password field
- action: type
selector: "input[name='password']"
value: "TestPass123!"
description: "Enter password"
# Submit form
- action: click
selector: "button[type='submit']"
description: "Click login button"
# Wait for redirect to dashboard
- action: wait_for_url
contains: "/dashboard"
timeout: 10s
description: "Should redirect to dashboard after successful login"
# Verify user is logged in
- action: verify_element
selector: ".user-profile"
contains: "test@example.com"
description: "User email should appear in profile widget"
# Capture evidence
- action: screenshot
save_as: "successful-login.png"
description: "Save screenshot of logged-in state"
# To run this test:
# 1. Save as test-login-valid.yaml
# 2. Run: gadugi-test run test-login-valid.yaml
# 3. Check evidence in ./evidence/user-login-valid-credentials-TIMESTAMP/
Related Resources [LEVEL 1]
Official Documentation
- GitHub Repository: https://github.com/rysweet/gadugi-agentic-test
- Framework Docs: See repo README and docs/ folder
- Issue Tracker: https://github.com/rysweet/MicrosoftHackathon2025-AgenticCoding/issues/1356
Level 4: Shadow Environment Integration [LEVEL 4]
Run your outside-in tests in isolated shadow environments to validate changes before pushing. This combines the behavioral testing power of gadugi-agentic-test with the clean-state isolation of shadow environments.
Why Use Shadow Environments for Testing
- Clean State: Fresh container, no host pollution
- Local Changes: Test uncommitted code exactly as-is
- Multi-Repo: Coordinate changes across multiple repos
- CI Parity: What shadow sees ≈ what CI will see
Shadow Testing Workflow
For complete shadow environment documentation, see the shadow-testing skill. Here's how to integrate it with outside-in tests:
Pattern 1: CLI Tests in Shadow (Amplifier)
# Create shadow with your local library changes
shadow.create(local_sources=["~/repos/my-lib:org/my-lib"])
# Run outside-in test scenarios inside shadow
shadow.exec(shadow_id, "gadugi-test run test-scenario.yaml")
# Extract evidence
shadow.extract(shadow_id, "/evidence", "./test-evidence")
# Cleanup
shadow.destroy(shadow_id)
Pattern 2: CLI Tests in Shadow (Standalone)
# Create shadow with local changes
amplifier-shadow create --local ~/repos/my-lib:org/my-lib --name test
# Run your test scenarios
amplifier-shadow exec test "gadugi-test run test-scenario.yaml"
# Extract results
amplifier-shadow extract test /evidence ./test-evidence
# Cleanup
amplifier-shadow destroy test
Pattern 3: Multi-Repo Integration Test
# test-multi-repo.yaml
scenario:
name: "Multi-Repo Integration Test"
type: cli
prerequisites:
- "Shadow environment with core-lib and cli-tool"
steps:
- action: launch
target: "cli-tool"
- action: send_input
value: "process --lib core-lib\n"
- action: verify_output
contains: "Success: Using core-lib"
# Setup shadow with both repos
amplifier-shadow create \
--local ~/repos/core-lib:org/core-lib \
--local ~/repos/cli-tool:org/cli-tool \
--name multi-test
# Run test that exercises both
amplifier-shadow exec multi-test "gadugi-test run test-multi-repo.yaml"
Pattern 4: Web App Testing in Shadow
# test-web-app.yaml
scenario:
name: "Web App with Local Library"
type: web
steps:
- action: navigate
url: "http://localhost:3000"
- action: click
selector: "button.process"
- action: verify_element
selector: ".result"
contains: "Processed with v2.0" # Your local version
# Shadow with library changes
amplifier-shadow create --local ~/repos/my-lib:org/my-lib --name web-test
# Start web app inside shadow (uses your local lib)
amplifier-shadow exec web-test "
cd /workspace &&
git clone https://github.com/org/web-app &&
cd web-app &&
npm install && # Pulls your local my-lib via git URL rewriting
npm start &
"
# Wait for app to start, then run tests
amplifier-shadow exec web-test "sleep 5 && gadugi-test run test-web-app.yaml"
Verification Best Practices
When running tests in shadow, always verify your local sources are being used:
# After shadow.create, check snapshot commits
shadow.status(shadow_id)
# Shows: snapshot_commits: {"org/my-lib": "abc1234..."}
# When your test installs dependencies, verify commit matches
# Look in test output for: my-lib @ git+...@abc1234
Complete Example: Library Change Validation
# test-library-change.yaml - Outside-in test
scenario:
name: "Validate Library Breaking Change"
type: cli
description: "Test that dependent app still works with new library API"
steps:
- action: launch
target: "/workspace/org/dependent-app/cli.py"
- action: send_input
value: "process data.json\n"
- action: verify_output
contains: "Processed successfully"
description: "New library API should still work"
- action: verify_exit_code
expected: 0
# Complete workflow
# 1. Create shadow with your breaking change
amplifier-shadow create --local ~/repos/my-lib:org/my-lib --name breaking-test
# 2. Install dependent app (pulls your local lib)
amplifier-shadow exec breaking-test "
cd /workspace &&
git clone https://github.com/org/dependent-app &&
cd dependent-app &&
pip install -e . && # This installs git+https://github.com/org/my-lib (your local version)
echo 'Ready to test'
"
# 3. Run outside-in test
amplifier-shadow exec breaking-test "gadugi-test run test-library-change.yaml"
# If test passes, your breaking change is compatible!
# If test fails, you've caught the issue before pushing
When to Use Shadow Integration
Use shadow + outside-in tests when:
- ✅ Testing library changes with dependent projects
- ✅ Validating multi-repo coordinated changes
- ✅ Need clean-state validation before pushing
- ✅ Want to catch integration issues early
- ✅ Testing that setup/install procedures work
Don't use shadow for:
- ❌ Simple unit tests (too much overhead)
- ❌ Tests of already-committed code (shadow adds no value)
- ❌ Performance testing (container overhead skews results)
Learn More
For complete shadow environment documentation, including:
- Shell scripts for DIY setup
- Docker Compose examples
- Multi-language support (Python, Node, Rust, Go)
- Troubleshooting and verification techniques
Load the shadow-testing skill:
Claude, use the shadow-testing skill to set up a shadow environment
Or for Amplifier users, the shadow tool is built-in:
shadow.create(local_sources=["~/repos/lib:org/lib"])
Related Skills
- shadow-testing: Complete shadow environment setup and usage
- test-gap-analyzer: Find untested code paths
- philosophy-guardian: Review test philosophy compliance
- pr-review-assistant: Include tests in PR reviews
- module-spec-generator: Generate specs with test scenarios
Further Reading
- Outside-in vs inside-out testing approaches
- Behavior-driven development (BDD) principles
- AI-powered testing best practices
- Test automation patterns
- Shadow environment testing methodology
Level 4: A/B Comparison Harness & Self-Improving Audit [LEVEL 4]
Reusable tools for running any two CLI implementations side-by-side and quantifying their behavioral differences. Works for any scenario where you need to compare two things:
- Migration parity: Python→Rust, v1→v2, monolith→microservice
- A/B testing: Compare two approaches to the same CLI
- Feature flag validation: Same binary, different configs
- Version regression: Compare release N with release N+1
- Canary validation: Compare canary build against stable
Tools
Located in .claude/scenarios/ab-comparison/:
| File | Purpose |
|---|---|
| `ab_comparison_harness.py` | Run two CLIs side-by-side in isolated sandboxes |
| `ab_audit_cycle.py` | Self-improving loop: identify → categorize → fix → re-validate |
| `SCENARIO_FORMAT.md` | YAML scenario format specification |
| `example_scenario.yaml` | Example scenario for a calculator CLI |
Quick Start
# Compare any two CLI implementations
python .claude/scenarios/ab-comparison/ab_comparison_harness.py \
--a "python -m myapp.cli" \
--b "./target/debug/myapp" \
--scenario tests/scenarios/*.yaml
# A/B with feature flag
python .claude/scenarios/ab-comparison/ab_comparison_harness.py \
--a "myapp --feature-flag=off" \
--b "myapp --feature-flag=on" \
--scenario tests/scenarios/regression.yaml
# Shadow mode: log divergences without failing (for production rollout)
python .claude/scenarios/ab-comparison/ab_comparison_harness.py \
--a "myapp-stable" \
--b "myapp-canary" \
--scenario tests/scenarios/*.yaml \
--shadow-mode \
--shadow-log /tmp/canary-divergences.jsonl
# Self-improving audit cycle (iterate until convergence)
python .claude/scenarios/ab-comparison/ab_audit_cycle.py \
--a "python -m myapp.cli" \
--b "./target/debug/myapp" \
--scenarios-dir tests/scenarios/
# Legacy flags (--legacy/--candidate) also work
python .claude/scenarios/ab-comparison/ab_comparison_harness.py \
--legacy "python -m myapp.cli" \
--candidate "./target/debug/myapp" \
--scenario tests/scenarios/*.yaml
Scenario Format (Summary)
cases:
- name: "unique-test-name"
category: "install" # For audit grouping
argv: ["subcommand", "--flag"] # Same args for both sides
timeout: 15 # Seconds
env:
PATH: "${SANDBOX_ROOT}/bin:${PATH}" # Template variables
setup: | # Bash script run in both sandboxes
mkdir -p bin
echo '#!/bin/bash' > bin/stub && chmod +x bin/stub
compare:
- exit_code # Integer match
- stdout # Text or JSON-semantic
- stderr # Text match
- "fs:path/to/file" # File/dir hash comparison
- "jsonfs:path/to/file.json" # JSON semantic comparison
See SCENARIO_FORMAT.md for the full specification.
Key Design Decisions
- Isolated sandboxes: Each side gets its own HOME, TMPDIR, and PATH. No collisions between the two runs, or with the host.
- JSON-semantic comparison: `stdout` comparison parses both sides as JSON when possible, ignoring key ordering, and falls back to exact text comparison if either side is not JSON (see the sketch after this list).
- Stub binary pattern: Test launcher behavior by providing fake binaries that capture their args and env vars to files, then compare those files.
- Timeout resilience: Captures partial output on timeout (exit code 124) instead of crashing the harness. Critical for testing commands that may hang in sandboxed environments.
- Shadow mode: Logs divergences to append-only JSONL without failing the run. Use during rollout to monitor real workloads.
- Audit categorization: Groups divergences by keyword inference and generates prioritized fix workstream specs.
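A minimal sketch of the JSON-semantic comparison idea (an assumption about the behavior described above, not the harness's actual code): parse both stdout captures as JSON and compare the resulting values, which ignores key ordering and insignificant whitespace, and fall back to exact text comparison when either side is not valid JSON.
import json

def stdout_matches(a: str, b: str) -> bool:
    """Compare two stdout captures JSON-semantically, with a text fallback."""
    try:
        # Equal parsed values ignore key order and whitespace differences
        return json.loads(a) == json.loads(b)
    except (json.JSONDecodeError, ValueError):
        # At least one side is not JSON: require an exact text match
        return a == b

# Key order differs, but the JSON content is equivalent
assert stdout_matches('{"ok": true, "n": 1}', '{"n": 1, "ok": true}')
assert not stdout_matches("Result: 5", "Result: 6")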
Self-Improvement Cycle Pattern
┌─────────────┐
│  IDENTIFY   │  Run all scenarios, collect divergences
└──────┬──────┘
       ▼
┌─────────────┐
│ CATEGORIZE  │  Group by area: install (5), launch (3), recipe (1)
└──────┬──────┘
       ▼
┌─────────────┐
│   SPECIFY   │  Generate fix specs: [CRITICAL] fix-install, [HIGH] fix-launch
└──────┬──────┘
       ▼
┌─────────────┐
│     FIX     │  Agent/human fixes code
└──────┬──────┘
       ▼
┌─────────────┐
│ RE-VALIDATE │  Run audit again → check progress (5→2 divergences)
└──────┬──────┘
       ▼
  Converged? ──no──→ back to FIX
       │
      yes
       ▼
     DONE
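The loop above could be driven by a small wrapper like the sketch below; the function signature and divergence-record shape are assumptions for illustration, not the ab_audit_cycle.py implementation.
# Conceptual sketch of the audit loop; record shape and names are assumptions.
from collections import Counter
from typing import Callable, Dict, List

def audit_until_converged(run_scenarios: Callable[[], List[Dict]],
                          max_rounds: int = 5) -> bool:
    """Repeat IDENTIFY -> CATEGORIZE -> (FIX outside) -> RE-VALIDATE."""
    for round_no in range(1, max_rounds + 1):
        divergences = run_scenarios()                           # IDENTIFY
        if not divergences:
            print(f"round {round_no}: converged")               # DONE
            return True
        by_area = Counter(d["category"] for d in divergences)   # CATEGORIZE
        print(f"round {round_no}: open divergences {dict(by_area)}")
        # SPECIFY + FIX happen outside this loop (agent or human edits code);
        # the next iteration RE-VALIDATEs by re-running the scenarios.
    return False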
Real-World Results
Used for amplihack-rs migration validation (rysweet/amplihack-rs#25):
- 118/118 comparison tests across 12 tiers
- 619/619 hook golden file tests
- 9/9 shadow harness cases
- 5 Python bugs discovered and fixed
- 1 recipe orchestrator bug discovered and fixed
- Full launcher behavioral match achieved
Changelog [LEVEL 3]
Version 1.2.0 (2026-03-11)
- NEW: Level 4 - A/B Comparison Harness & Self-Improving Audit
- Generic `ab_comparison_harness.py` for any side-by-side CLI comparison (migration parity, A/B testing, feature flags, version regression, canary)
- Generic `ab_audit_cycle.py` self-improvement loop
- `SCENARIO_FORMAT.md` YAML specification
- Supports `--a`/`--b` flags (also `--legacy`/`--candidate` for backward compat)
- Example scenario file
- Extracted from amplihack-rs parity validation (118/118 tests)
Version 1.1.0 (2026-01-29)
- NEW: Level 4 - Shadow Environment Integration
- Added complete shadow testing workflow patterns
- Integration examples for Amplifier native and standalone CLI
- Multi-repo integration test patterns
- Web app testing in shadow environments
- Complete workflow example for library change validation
- References to shadow-testing skill for deep-dive documentation
Version 1.0.0 (2025-11-16)
- Initial skill release
- Support for CLI, TUI, Web, and Electron applications
- 15 complete working examples
- Progressive disclosure levels (1, 2, 3)
- Embedded gadugi-agentic-test framework documentation (v0.1.0)
- Freshness check script for version monitoring
- Full integration with amplihack philosophy
- Comprehensive troubleshooting guide
- Action reference catalog
Remember: Outside-in tests verify WHAT your application does, not HOW it does it. Focus on user-visible behavior, and your tests will remain stable across refactorings while providing meaningful validation of critical workflows.
Start at Level 1 with simple smoke tests, and progressively add complexity only when needed. The framework's AI agents handle the hard parts - you just describe what should happen.