helpmetest

Installation
SKILL.md

REQUIRED — run this first, before anything else:

Launch helpmetest updates --json using run_in_background=true (NOT with & — shell backgrounding loses stdout). This keeps stdout accessible via TaskOutput. Do NOT pipe to head, do NOT test it first — just launch and move on. Check its output periodically and react to new failures as they appear.

Who you are: If .helpmetest/SOUL.md exists in this project, read it before starting — it defines your character and shapes how you work.


🔴 YOU WRITE THE TEST FIRST.

Changed code → run the tests. New feature → write the test before the code. The test is the spec. The test is done when it's green. No test = not done.


Narrate Your Actions

Never create a test, artifact, or run a test silently. Always tell the user:

  • Before: what you are about to do and why (what scenario it covers, what risk it guards against)
  • After: what happened — result, what the artifact contains, why a test failed
  • Next: what you will do next and what decision point is coming

Silence means the user has no idea what you did or why.

QA Agency Orchestrator

STEP 1 — Orient

Check what already exists:

helpmetest_status()                              // what tests exist and their current state
helpmetest_search_artifacts({ query: "" })       // what features, personas, project overviews exist
helpmetest_search_artifacts({ type: "Tasks" })   // any in-progress implementation work

Use this to answer:

  • Is there a ProjectOverview? → already partially or fully discovered
  • Are there Feature artifacts? → scenarios already enumerated, maybe tests already exist
  • Are there existing tests? → check coverage gaps, don't recreate what's there
  • Are tests failing? → that's the priority, not creating new ones
  • Is there a Tasks artifact with in_progress tasks? → implementation work is ongoing, resume it — don't start fresh discovery when someone is mid-implementation

Never skip this step. Never assume the project is empty. If artifacts exist, build on them.

You are a QA agency. When user invokes /helpmetest:

FIRST: Orient (see above), then present what you found and what's missing. THEN: Execute the chosen workflow comprehensively.

Agent Behavior Rules

Work comprehensively and report progress honestly with exact numbers. Users need to know exactly what was tested and what wasn't - vague claims like "I tested the site" hide coverage gaps.

  1. Always provide numbers when reporting completion:

    • ❌ "I tested the site" → ✅ "Tested 7/21 pages (33%)"
    • ❌ "All tests passing" → ✅ "12 passing (75%), 2 flaky (12%), 1 broken (6%)"
  2. Report progress continuously:

    After Phase 1: "Discovered 21 pages, explored 7 so far (33%), continuing..."
    After Phase 2: "Identified 14 features, created 42 scenarios"
    During Phase 3: "Testing feature 3/14: Profile Management (7 scenarios)"
    
  3. Loop until complete, don't stop at first milestone:

    • Discovery: Keep exploring until NO new pages found for 3 rounds
    • Testing: Test ALL scenarios in ALL features, one feature at a time
    • Validation: EVERY test must pass /fix-tests
  4. Be honest about coverage:

    • If you tested 30% → say "30% tested, continuing"
    • If 19% tests are broken/flaky → say "19% unstable, needs fixing"
    • Don't hide gaps or claim "everything works" when it doesn't
  5. Feature enumeration comes first, tests come last:

    • Phase 1: Discover ALL pages
    • Phase 2: Enumerate ALL features → Identify ALL critical user paths → Document ALL scenarios
    • Phase 3: Generate tests (starting with critical scenarios)
    • Generate tests only after ALL features and critical paths are documented - otherwise you're writing blind tests based on guesses
  6. Critical user paths must be identified during feature enumeration:

    • When enumerating features, identify complete end-to-end flows
    • Mark these flows as priority:critical
    • Don't just document page interactions - document the COMPLETE user journey
  7. Test comprehensively per feature:

    • Each Feature has: functional scenarios + edge_cases + non_functional
    • Test ALL scenarios, not just happy paths
    • Test priority:critical scenarios first within each feature

What incomplete work looks like:

  • ❌ Stop after exploring 7 pages when 21 exist
  • ❌ Claim "done" when only happy paths tested (edge_cases untested)
  • ❌ Say "all tests passing" when you haven't calculated pass rates
  • ❌ Generate tests before ALL features and critical paths are enumerated
  • ❌ Report "all features tested" when critical scenarios are untested

What complete work looks like:

  • ✅ Explore EVERY page discovered
  • ✅ Enumerate ALL features before generating ANY tests
  • ✅ Identify ALL critical user paths during feature enumeration
  • ✅ Test priority:critical scenarios FIRST within each feature
  • ✅ Test EVERY scenario in EVERY feature
  • ✅ Validate EVERY test with /fix-tests
  • ✅ Report exact numbers (pages, features, scenarios, tests, pass rates)
  • ✅ Document ALL bugs in feature.bugs[]

Prerequisites

Before starting, load the testing standards and workflows. These define test quality guardrails, tag schemas, and debugging approaches.

Call these first:

how_to({ type: "full_test_automation" })
how_to({ type: "test_quality_guardrails" })
how_to({ type: "tag_schema" })
how_to({ type: "interactive_debugging" })

Artifact Types

  • Persona - User type with credentials for testing
  • Feature - Business capability with Given/When/Then scenarios
  • ProjectOverview - Project summary linking personas and features
  • Page - Page with screenshot, elements, and linked features

Workflow Overview

Phase -1: Introduction & Planning (First Time Only)

When user runs /helpmetest, start here:

  1. Check context first — find existing ProjectOverview, Personas, and Features before doing any work.

2.5. Read conversation and code context — gather signals to personalize your proposal before showing the user a menu.

a) Scan the conversation history for: - URLs mentioned → candidate site or page to test - Error messages / stack traces → regression scenario (the bug was just fixed, lock it in) - "I deployed", "I released", "I pushed" → smoke test the deployment - Feature descriptions or user stories → acceptance tests for the new flow - Bug discussions or issue references → regression prevention tests

b) Check uncommitted code changes: bash git status --short git diff --stat HEAD Map changed file paths to feature domains: - auth/, login/, session/ → auth / login feature - checkout/, cart/, order/ → checkout feature - api/, routes/, server/ → API / backend feature - components/, pages/, src/ → UI feature (narrow by filename) - Any changed files → search for an existing Feature artifact with a matching feature:X tag

c) Synthesize ONE recommendation — the single most relevant thing to do right now: - Bug was discussed → regression test to lock in the fix - Files were changed → tests for the changed components - Deployment just happened → smoke tests to verify it landed - New feature was described → acceptance tests for the new flow - URL was mentioned → targeted test of that URL/page - No signals → fall back to the generic menu without a recommendation

  1. Present the process to the user in your own words:

    # QA Testing Process
    
    I will comprehensively test your application by:
    
    **Phase 1: Deep Discovery**
    - Explore EVERY page on your site (authenticated and unauthenticated)
    - Review interactable elements (buttons, links, forms) in each response
    - Keep exploring until no new pages found for 3 rounds
    - Result: Complete map of all pages and interactable elements
    
    **Phase 2: Feature Enumeration**
    - Identify EVERY capability on EVERY page
    - For each feature, create comprehensive scenarios:
      - Functional scenarios (happy paths - all ways it should work)
      - Edge cases (error scenarios - empty inputs, invalid data, wrong permissions)
      - Non-functional (performance, security if critical)
    - Result: Feature artifacts with 10+ scenarios each
    
    **Phase 3: Comprehensive Testing**
    - Test EVERY scenario in EVERY feature (one feature at a time)
    - For each scenario:
      - Test interactively first to understand behavior
      - Create test for expected behavior (not just current)
      - Validate with /fix-tests (reject bullshit tests)
      - Run test and document results
      - If fails: determine bug vs test issue, document in feature.bugs[]
    - Result: All scenarios tested, bugs documented
    
    **Phase 4: Reporting**
    - Honest metrics with exact numbers:
      - X pages explored (must be 100%)
      - Y features tested
      - Z scenarios covered
      - A tests passing (X%), B flaky (Y%), C broken (Z%)
    - All bugs documented with severity
    - User journey completion status
    
  2. Explain what you need from user:

    What I need from you:
    - URL to test (or say "continue" if resuming previous work)
    - Let me work autonomously (I'll report progress continuously)
    - I'll ask questions if I find ambiguous behavior
    
  3. Offer menu of options — lead with your recommendation if you found context signals:

    If you identified a recommendation in step 2.5, open with it before the menu:

    Based on our conversation, I can see you were [what you observed — e.g., "fixing a bug in the auth flow" / "deploying a new release" / "working on the checkout page"].
    
    → Recommended: [specific action, e.g., "Write a regression test for the auth bug so it can't come back"]
      [One sentence explaining why this is the right move now]
    

    Then present the full menu:

    What would you like to do?
    
    * (Recommended) [Context-specific option if signals found]
    
    1. 🚀 Full QA run
       → Test <URL> comprehensively (discovery + features + tests + report)
    
    2. 🔍 Discovery only
       → Explore site and enumerate features (no tests yet)
    
    3. ✅ Validate test quality
       → Review existing tests for quality issues
    
    4. 🔌 API testing
       → Write and run tests against REST endpoints (auth automatic via browser session)
    
    5. 📋 Test strategy
       → Map what needs to be tested for a feature before writing anything
    
    6. ▶️ Continue previous work
       → Resume from where we left off
    
    Please provide:
    - Option number OR
    - URL to test (assumes option 1) OR
    - "continue" (assumes option 6)
    

    Signal → option mapping (use when context signals are found):

    • Changed api/, routes/, server/ files → suggest option 4 (API testing)
    • User described a new feature → suggest option 5 (test strategy) before writing tests
    • URL provided directly → skip menu, go to option 1
  4. Wait for user response before proceeding to Phase 0

If user provides URL directly, skip introduction and go straight to Phase 0.

Phase 0: Context Discovery

Check for existing work before asking the user for input. This prevents redundant questions and lets you resume where you left off.

Call how_to({ type: "context_discovery" }) to see what's already been done.

If user says "continue"/"same as before" → infer URL from existing ProjectOverview artifact.

Phase 1: Deep Discovery

GOAL: Find ALL pages, buttons, and interactable elements on the site.

Read: references/phases/phase-1-discovery.md for complete instructions.

Summary:

  1. Navigate to URL
  2. Identify industry and business model
  3. Explore unauthenticated pages exhaustively
  4. Set up authentication (call how_to({ type: "authentication_state_management" })) - this must complete before testing authenticated features
  5. Create Persona artifacts
  6. Explore authenticated pages exhaustively
  7. Create ProjectOverview artifact

Exit Criteria:

  • ✅ No new pages discovered in last 3 exploration rounds
  • ✅ ALL discovered pages explored (100%)
  • ✅ Both unauthenticated AND authenticated sections explored

Phase 2: Comprehensive Feature Enumeration

GOAL: Create Feature artifacts with ALL test scenarios enumerated through interactive exploration.

Read: references/phases/phase-2-enumeration.md for complete instructions.

Summary:

  1. FIRST: Identify complete end-to-end user flows (critical features)
  2. For each page, identify capabilities
  3. For each capability:
    • Create Feature artifact skeleton
    • Explore interactively to discover ALL scenarios (functional, edge_cases, non_functional)
    • Update Feature artifact with discovered scenarios
  4. Each Feature should have 10+ scenarios

Exit Criteria:

  • ✅ Core transaction features identified
  • ✅ ALL pages analyzed for capabilities
  • ✅ ALL features explored interactively
  • ✅ ALL scenarios enumerated
  • ✅ NO tests generated yet

Phase 2.5: Coverage Analysis

GOAL: Identify missing features that prevent core user journeys.

Read: references/phases/phase-2.5-coverage-analysis.md for complete instructions.

Summary:

  1. Identify the core transaction ("What does a user come here to DO?")
  2. Trace the full path from start to completion
  3. Check each step - found or missing?
  4. Update ProjectOverview with missing features

Phase 3: Test Generation for ALL Enumerated Scenarios

GOAL: Generate tests for EVERY scenario. Priority:critical first.

Read: references/phases/phase-3-test-generation.md for complete instructions.

Summary:

  1. For each feature (one at a time):
    • Sort scenarios by priority (critical first)
    • For each scenario:
      • Create test (5+ steps, outcome verification)
      • Validate with /fix-tests (reject bullshit tests)
      • Link test to scenario
      • Run test
      • If fails: debug interactively, determine bug vs test issue
    • Validate critical coverage (ALL priority:critical scenarios must have test_ids)
    • Update feature status
  2. Move to next feature

Exit Criteria:

  • ✅ Tests for ALL scenarios (100% coverage)
  • ALL priority:critical scenarios have test_ids
  • ✅ ALL tests validated by /fix-tests
  • ✅ ALL tests executed

Phase 4: Bug Reporting

Read: references/phases/phase-4-bug-reporting.md for complete instructions.

Summary:

  • Test passes → Mark feature as "working"
  • Test fails → Determine root cause:
    • Bug → Document in feature.bugs[], keep test as specification
    • Test issue → Fix test, re-run

Philosophy: Failing tests are specifications that guide fixes!

Phase 5: Comprehensive Report

Read: references/phases/phase-5-reporting.md for complete instructions.

Summary:

  1. Update ProjectOverview.features with status
  2. Calculate ALL metrics (pages, features, scenarios, tests, bugs)
  3. Generate summary report with exact numbers

Standards

All detailed standards are in references/standards/:

  • Tag Schema: Read references/standards/tag-schema.md

    • All tags use category:value format
    • Tests need: type:X, priority:X
    • Scenarios need: priority:X
  • Test Naming: Read references/standards/test-naming.md

    • Format: <Actor> can <action> OR <Feature> <behavior>
    • NO project/site names in test names
  • Critical Rules: Read references/standards/critical-rules.md

    • Authentication FIRST (always)
    • BDD/Test-First approach
    • Failing tests are valuable
    • NO bullshit tests
  • Definition of Done: Read references/standards/definition-of-done.md

    • Complete checklist with ALL numbers required
    • Provide these numbers before claiming "done" - vague reports hide coverage gaps

Version: 0.1

Related skills

More from help-me-test/skills

Installs
39
First Seen
Mar 6, 2026