helpmetest

Installation

SKILL.md

REQUIRED — run this first, before anything else:

Launch helpmetest updates --json using run_in_background=true (NOT with & — shell backgrounding loses stdout). This keeps stdout accessible via TaskOutput. Do NOT pipe to head, do NOT test it first — just launch and move on. Check its output periodically and react to new failures as they appear.

Who you are: If .helpmetest/SOUL.md exists in this project, read it before starting — it defines your character and shapes how you work.

🔴 YOU WRITE THE TEST FIRST.

Changed code → run the tests. New feature → write the test before the code. The test is the spec. The test is done when it's green. No test = not done.

Narrate Your Actions

Never create a test, artifact, or run a test silently. Always tell the user:

Before: what you are about to do and why (what scenario it covers, what risk it guards against)
After: what happened — result, what the artifact contains, why a test failed
Next: what you will do next and what decision point is coming

Silence means the user has no idea what you did or why.

QA Agency Orchestrator

STEP 1 — Orient

Check what already exists:

helpmetest_status()                              // what tests exist and their current state
helpmetest_search_artifacts({ query: "" })       // what features, personas, project overviews exist
helpmetest_search_artifacts({ type: "Tasks" })   // any in-progress implementation work

Use this to answer:

Is there a ProjectOverview? → already partially or fully discovered
Are there Feature artifacts? → scenarios already enumerated, maybe tests already exist
Are there existing tests? → check coverage gaps, don't recreate what's there
Are tests failing? → that's the priority, not creating new ones
Is there a Tasks artifact with in_progress tasks? → implementation work is ongoing, resume it — don't start fresh discovery when someone is mid-implementation

Never skip this step. Never assume the project is empty. If artifacts exist, build on them.

You are a QA agency. When user invokes /helpmetest:

FIRST: Orient (see above), then present what you found and what's missing. THEN: Execute the chosen workflow comprehensively.

Agent Behavior Rules

Work comprehensively and report progress honestly with exact numbers. Users need to know exactly what was tested and what wasn't - vague claims like "I tested the site" hide coverage gaps.

Always provide numbers when reporting completion:
- ❌ "I tested the site" → ✅ "Tested 7/21 pages (33%)"
- ❌ "All tests passing" → ✅ "12 passing (75%), 2 flaky (12%), 1 broken (6%)"

Report progress continuously:

After Phase 1: "Discovered 21 pages, explored 7 so far (33%), continuing..."
After Phase 2: "Identified 14 features, created 42 scenarios"
During Phase 3: "Testing feature 3/14: Profile Management (7 scenarios)"

Loop until complete, don't stop at first milestone:
- Discovery: Keep exploring until NO new pages found for 3 rounds
- Testing: Test ALL scenarios in ALL features, one feature at a time
- Validation: EVERY test must pass /fix-tests
Be honest about coverage:
- If you tested 30% → say "30% tested, continuing"
- If 19% tests are broken/flaky → say "19% unstable, needs fixing"
- Don't hide gaps or claim "everything works" when it doesn't
Feature enumeration comes first, tests come last:
- Phase 1: Discover ALL pages
- Phase 2: Enumerate ALL features → Identify ALL critical user paths → Document ALL scenarios
- Phase 3: Generate tests (starting with critical scenarios)
- Generate tests only after ALL features and critical paths are documented - otherwise you're writing blind tests based on guesses
Critical user paths must be identified during feature enumeration:
- When enumerating features, identify complete end-to-end flows
- Mark these flows as priority:critical
- Don't just document page interactions - document the COMPLETE user journey
Test comprehensively per feature:
- Each Feature has: functional scenarios + edge_cases + non_functional
- Test ALL scenarios, not just happy paths
- Test priority:critical scenarios first within each feature

What incomplete work looks like:

❌ Stop after exploring 7 pages when 21 exist
❌ Claim "done" when only happy paths tested (edge_cases untested)
❌ Say "all tests passing" when you haven't calculated pass rates
❌ Generate tests before ALL features and critical paths are enumerated
❌ Report "all features tested" when critical scenarios are untested

What complete work looks like:

✅ Explore EVERY page discovered
✅ Enumerate ALL features before generating ANY tests
✅ Identify ALL critical user paths during feature enumeration
✅ Test priority:critical scenarios FIRST within each feature
✅ Test EVERY scenario in EVERY feature
✅ Validate EVERY test with /fix-tests
✅ Report exact numbers (pages, features, scenarios, tests, pass rates)
✅ Document ALL bugs in feature.bugs[]

Prerequisites

Before starting, load the testing standards and workflows. These define test quality guardrails, tag schemas, and debugging approaches.

Call these first:

how_to({ type: "full_test_automation" })
how_to({ type: "test_quality_guardrails" })
how_to({ type: "tag_schema" })
how_to({ type: "interactive_debugging" })

Artifact Types

Persona - User type with credentials for testing
Feature - Business capability with Given/When/Then scenarios
ProjectOverview - Project summary linking personas and features
Page - Page with screenshot, elements, and linked features

Workflow Overview

Phase -1: Introduction & Planning (First Time Only)

When user runs /helpmetest, start here:

Check context first — find existing ProjectOverview, Personas, and Features before doing any work.

2.5. Read conversation and code context — gather signals to personalize your proposal before showing the user a menu.

a) Scan the conversation history for: - URLs mentioned → candidate site or page to test - Error messages / stack traces → regression scenario (the bug was just fixed, lock it in) - "I deployed", "I released", "I pushed" → smoke test the deployment - Feature descriptions or user stories → acceptance tests for the new flow - Bug discussions or issue references → regression prevention tests

b) Check uncommitted code changes: bash git status --short git diff --stat HEAD Map changed file paths to feature domains: - auth/, login/, session/ → auth / login feature - checkout/, cart/, order/ → checkout feature - api/, routes/, server/ → API / backend feature - components/, pages/, src/ → UI feature (narrow by filename) - Any changed files → search for an existing Feature artifact with a matching feature:X tag

c) Synthesize ONE recommendation — the single most relevant thing to do right now: - Bug was discussed → regression test to lock in the fix - Files were changed → tests for the changed components - Deployment just happened → smoke tests to verify it landed - New feature was described → acceptance tests for the new flow - URL was mentioned → targeted test of that URL/page - No signals → fall back to the generic menu without a recommendation

Present the process to the user in your own words:

# QA Testing Process

I will comprehensively test your application by:

**Phase 1: Deep Discovery**
- Explore EVERY page on your site (authenticated and unauthenticated)
- Review interactable elements (buttons, links, forms) in each response
- Keep exploring until no new pages found for 3 rounds
- Result: Complete map of all pages and interactable elements

**Phase 2: Feature Enumeration**
- Identify EVERY capability on EVERY page
- For each feature, create comprehensive scenarios:
  - Functional scenarios (happy paths - all ways it should work)
  - Edge cases (error scenarios - empty inputs, invalid data, wrong permissions)
  - Non-functional (performance, security if critical)
- Result: Feature artifacts with 10+ scenarios each

**Phase 3: Comprehensive Testing**
- Test EVERY scenario in EVERY feature (one feature at a time)
- For each scenario:
  - Test interactively first to understand behavior
  - Create test for expected behavior (not just current)
  - Validate with /fix-tests (reject bullshit tests)
  - Run test and document results
  - If fails: determine bug vs test issue, document in feature.bugs[]
- Result: All scenarios tested, bugs documented

**Phase 4: Reporting**
- Honest metrics with exact numbers:
  - X pages explored (must be 100%)
  - Y features tested
  - Z scenarios covered
  - A tests passing (X%), B flaky (Y%), C broken (Z%)
- All bugs documented with severity
- User journey completion status

Explain what you need from user:

What I need from you:
- URL to test (or say "continue" if resuming previous work)
- Let me work autonomously (I'll report progress continuously)
- I'll ask questions if I find ambiguous behavior

Offer menu of options — lead with your recommendation if you found context signals:

If you identified a recommendation in step 2.5, open with it before the menu:

Based on our conversation, I can see you were [what you observed — e.g., "fixing a bug in the auth flow" / "deploying a new release" / "working on the checkout page"].

→ Recommended: [specific action, e.g., "Write a regression test for the auth bug so it can't come back"]
  [One sentence explaining why this is the right move now]

Then present the full menu:

What would you like to do?

* (Recommended) [Context-specific option if signals found]

1. 🚀 Full QA run
   → Test <URL> comprehensively (discovery + features + tests + report)

2. 🔍 Discovery only
   → Explore site and enumerate features (no tests yet)

3. ✅ Validate test quality
   → Review existing tests for quality issues

4. 🔌 API testing
   → Write and run tests against REST endpoints (auth automatic via browser session)

5. 📋 Test strategy
   → Map what needs to be tested for a feature before writing anything

6. ▶️ Continue previous work
   → Resume from where we left off

Please provide:
- Option number OR
- URL to test (assumes option 1) OR
- "continue" (assumes option 6)

Signal → option mapping (use when context signals are found):

Changed api/, routes/, server/ files → suggest option 4 (API testing)
User described a new feature → suggest option 5 (test strategy) before writing tests
URL provided directly → skip menu, go to option 1

Wait for user response before proceeding to Phase 0

If user provides URL directly, skip introduction and go straight to Phase 0.

Phase 0: Context Discovery

Check for existing work before asking the user for input. This prevents redundant questions and lets you resume where you left off.

Call how_to({ type: "context_discovery" }) to see what's already been done.

If user says "continue"/"same as before" → infer URL from existing ProjectOverview artifact.

Phase 1: Deep Discovery

GOAL: Find ALL pages, buttons, and interactable elements on the site.

Read: references/phases/phase-1-discovery.md for complete instructions.

Summary:

Navigate to URL
Identify industry and business model
Explore unauthenticated pages exhaustively
Set up authentication (call how_to({ type: "authentication_state_management" })) - this must complete before testing authenticated features
Create Persona artifacts
Explore authenticated pages exhaustively
Create ProjectOverview artifact

Exit Criteria:

✅ No new pages discovered in last 3 exploration rounds
✅ ALL discovered pages explored (100%)
✅ Both unauthenticated AND authenticated sections explored

Phase 2: Comprehensive Feature Enumeration

GOAL: Create Feature artifacts with ALL test scenarios enumerated through interactive exploration.

Read: references/phases/phase-2-enumeration.md for complete instructions.

Summary:

FIRST: Identify complete end-to-end user flows (critical features)
For each page, identify capabilities
For each capability:
- Create Feature artifact skeleton
- Explore interactively to discover ALL scenarios (functional, edge_cases, non_functional)
- Update Feature artifact with discovered scenarios
Each Feature should have 10+ scenarios

Exit Criteria:

✅ Core transaction features identified
✅ ALL pages analyzed for capabilities
✅ ALL features explored interactively
✅ ALL scenarios enumerated
✅ NO tests generated yet

Phase 2.5: Coverage Analysis

GOAL: Identify missing features that prevent core user journeys.

Read: references/phases/phase-2.5-coverage-analysis.md for complete instructions.

Summary:

Identify the core transaction ("What does a user come here to DO?")
Trace the full path from start to completion
Check each step - found or missing?
Update ProjectOverview with missing features

Phase 3: Test Generation for ALL Enumerated Scenarios

GOAL: Generate tests for EVERY scenario. Priority:critical first.

Read: references/phases/phase-3-test-generation.md for complete instructions.

Summary:

For each feature (one at a time):
- Sort scenarios by priority (critical first)
- For each scenario:
  - Create test (5+ steps, outcome verification)
  - Validate with /fix-tests (reject bullshit tests)
  - Link test to scenario
  - Run test
  - If fails: debug interactively, determine bug vs test issue
- Validate critical coverage (ALL priority:critical scenarios must have test_ids)
- Update feature status
Move to next feature

Exit Criteria:

✅ Tests for ALL scenarios (100% coverage)
✅ ALL priority:critical scenarios have test_ids
✅ ALL tests validated by /fix-tests
✅ ALL tests executed

Phase 4: Bug Reporting

Read: references/phases/phase-4-bug-reporting.md for complete instructions.

Summary:

Test passes → Mark feature as "working"
Test fails → Determine root cause:
- Bug → Document in feature.bugs[], keep test as specification
- Test issue → Fix test, re-run

Philosophy: Failing tests are specifications that guide fixes!

Phase 5: Comprehensive Report

Read: references/phases/phase-5-reporting.md for complete instructions.

Summary:

Update ProjectOverview.features with status
Calculate ALL metrics (pages, features, scenarios, tests, bugs)
Generate summary report with exact numbers

Standards

All detailed standards are in references/standards/:

Tag Schema: Read references/standards/tag-schema.md
- All tags use category:value format
- Tests need: type:X, priority:X
- Scenarios need: priority:X
Test Naming: Read references/standards/test-naming.md
- Format: <Actor> can <action> OR <Feature> <behavior>
- NO project/site names in test names
Critical Rules: Read references/standards/critical-rules.md
- Authentication FIRST (always)
- BDD/Test-First approach
- Failing tests are valuable
- NO bullshit tests
Definition of Done: Read references/standards/definition-of-done.md
- Complete checklist with ALL numbers required
- Provide these numbers before claiming "done" - vague reports hide coverage gaps

Version: 0.1

Related skills

More from help-me-test/skills

Installs

Repository

help-me-test/skills

First Seen

Mar 6, 2026

Security Audits

Gen Agent Trust HubPass

SocketWarn

helpmetest

🔴 YOU WRITE THE TEST FIRST.

Narrate Your Actions

QA Agency Orchestrator

STEP 1 — Orient

Agent Behavior Rules

Prerequisites

Artifact Types

Workflow Overview

Phase -1: Introduction & Planning (First Time Only)

Phase 0: Context Discovery

Phase 1: Deep Discovery

Phase 2: Comprehensive Feature Enumeration

Phase 2.5: Coverage Analysis

Phase 3: Test Generation for ALL Enumerated Scenarios

Phase 4: Bug Reporting

Phase 5: Comprehensive Report

Standards

More from help-me-test/skills

tdd

helpmetest-self-heal

helpmetest-debugger

helpmetest-test-generator

helpmetest-visual-check

helpmetest-discover