fix-tests
Who you are: If `.helpmetest/SOUL.md` exists, read it — it defines your character.
No MCP? Use the `helpmetest <command>` CLI instead. See README for CLI reference.
🔴 YOU WRITE THE TEST FIRST.
Changed code → run the tests. New feature → write the test before the code. The test is the spec. The test is done when it's green. No test = not done.
Narrate Your Actions
Never silently create a test or artifact, or run a test. Always tell the user:
- Before: what you are about to do and why (what scenario it covers, what risk it guards against)
- After: what happened — result, what the artifact contains, why a test failed
- Next: what you will do next and what decision point is coming
Silence means the user has no idea what you did or why.
Fix Tests
One skill for everything wrong with your test suite. Reads the situation, picks the right mode.
Prerequisites — Always Do This First
```
helpmetest_status()
helpmetest_search_artifacts({ query: "" })
helpmetest_search_artifacts({ query: "Memory" })
how_to({ type: "context_discovery" })
how_to({ type: "interactive_debugging" })
how_to({ type: "debugging_self_healing" })
```
Check git state:
```
git log --oneline -10
git diff --stat HEAD
```
Read the Situation → Pick the Mode
After orient, classify:
| Signal | Mode |
|---|---|
| "Something broke" / "it stopped working" / vague signal | Triage first (see below) |
| One specific test named by user, or one test failing | Debug |
| Multiple tests failing after a deploy or UI change | Heal |
| Tests passing but code changed — drift suspected | Sync |
| "Is this test any good?" / reviewing test quality | Validate |
| Mixed (failures + drift + quality issues) | All modes, in order |
Triage (when you don't know what's wrong)
Gather fast, diagnose specifically, then switch to the right mode.
Collect everything in parallel:
```
helpmetest_status()      // failing tests, health checks
git log --oneline -10    // recent commits
git diff --stat HEAD     // uncommitted changes
```
Map what you find to a root cause:
- Test issue — test fails but feature works. Selector changed, timing off, stale after refactor → Debug or Heal mode
- App bug — feature itself is broken. 500 errors, missing data, broken flow → document in Feature.bugs[], tell user
- Regression — worked before a specific commit. Identify the commit, scope blast radius → Debug mode + recommend rollback or hotfix
- Environment — auth state expired, proxy down, env var missing → fix setup, re-run auth test
- Coverage gap — "it's broken" but no test exists → create a Feature artifact, run `/tdd`
State the diagnosis once before acting: "Based on [evidence], the problem is [specific cause]. The fix is [action]." Then switch to the right mode.
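To make the mapping concrete, here is a minimal TypeScript sketch of the evidence-to-mode decision; the `Evidence` shape and its field names are hypothetical, not part of the helpmetest API:

```typescript
// Hypothetical evidence shape gathered during triage — illustrative only.
interface Evidence {
  failingTests: string[];        // from helpmetest_status()
  featureWorksManually: boolean; // verified interactively in the browser
  brokeAfterCommit?: string;     // commit hash, from git log correlation
  envHealthy: boolean;           // auth state, proxy, env vars all fine
  testExists: boolean;           // is there any coverage at all?
}

// Mirrors the root-cause table above: most specific signal wins.
function pickMode(e: Evidence): string {
  if (!e.testExists) return "Coverage gap → create Feature artifact, run /tdd";
  if (!e.envHealthy) return "Environment → fix setup, re-run auth test";
  if (e.brokeAfterCommit) return "Regression → Debug mode + recommend rollback or hotfix";
  if (!e.featureWorksManually) return "App bug → document in Feature.bugs[]";
  return e.failingTests.length > 1 ? "Heal mode" : "Debug mode";
}
```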
Mode: Debug — One Test, Root Cause
Golden Rule: Always reproduce interactively before fixing. Never guess.
Tasks Artifact
Create before starting:
```json
{
  "type": "Tasks",
  "name": "Tasks: Debug [test name]",
  "content": {
    "overview": "Debug failing test [test-id]. Root cause → fix or document bug.",
    "tasks": [
      { "id": "1.0", "title": "Understand the failure", "status": "pending", "priority": "critical" },
      { "id": "2.0", "title": "Reproduce interactively", "status": "pending", "priority": "critical" },
      { "id": "3.0", "title": "Determine root cause", "status": "pending", "priority": "critical" },
      { "id": "4.0", "title": "Fix test OR document bug", "status": "pending", "priority": "critical" }
    ]
  }
}
```
Phase 1: Understand
- `helpmetest_open_test` + `helpmetest_status({ id, testRunLimit: 10 })`
- Read the error. Classify: selector? timing? assertion? state? API?
- Check recent git changes — map changed files to likely failure causes
- Load the Feature artifact the test belongs to
Phase 2: Reproduce Interactively
Run steps one at a time via `helpmetest_run_interactive_command` (a sketch follows the list below):
```
As <auth_state>
Go To <url>
# → observe after each step
```
Stop at the failing step. Investigate based on error type:
- Element not found: Try alternate selectors — is element gone (bug) or selector changed (test issue)?
- Not interactable: Check visibility, scroll, multiple matches, disabled state
- Assertion failed: What's actually displayed? Behavior changed intentionally?
- Timeout: App slow or broken?
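A minimal TypeScript sketch of this step-by-step loop; the tool's argument shape is assumed here, and the steps and selector are illustrative:

```typescript
// Assumed signature — the real MCP tool may accept a different shape.
declare function helpmetest_run_interactive_command(
  args: { command: string }
): Promise<unknown>;

// Hypothetical reproduction of a login failure, one observable step at a time.
async function reproduce(): Promise<void> {
  const steps = [
    "As admin_user",
    "Go To https://app.example.com/login",
    "Click [data-testid='submit-btn']", // suppose the recorded run fails here
  ];
  for (const step of steps) {
    const result = await helpmetest_run_interactive_command({ command: step });
    console.log(step, "→", result); // observe after each step; stop at the first failure
  }
}
```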
Phase 3: Root Cause
- Selector changed → fix selector
- Timing → add wait (see the sketch after this list)
- State/auth → verify auth state restoration
- API error → document bug
- Test isolation (alternating PASS/FAIL, shared state) → make idempotent
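For the timing case, the fix is usually an explicit wait before the assertion, not a fixed sleep. A hypothetical before/after, written as step strings (selectors illustrative):

```typescript
// Before: races the async load, so the test fails intermittently.
const flaky = [
  "Go To /search",
  "Get Text [data-testid='results']",
];

// After: waits for the element, then asserts against loaded content.
const fixed = [
  "Go To /search",
  "Wait For [data-testid='results']",
  "Get Text [data-testid='results']",
];
```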
Phase 4A: Fix Test
- Validate fix interactively first — run the complete corrected flow
- Update via `helpmetest_upsert_test`
- Run via `helpmetest_run_test` to confirm
- Update Feature artifact
Phase 4B: Document Bug
Add to Feature.bugs[]:
```json
{
  "name": "Brief description",
  "given": "Precondition",
  "when": "Action taken",
  "then": "Expected outcome",
  "actual": "What actually happens",
  "severity": "blocker|critical|major|minor",
  "url": "http://example.com/page",
  "tags": []
}
```
Update Feature.status → "broken" or "partial".
Mode: Heal — Bulk Failures After Deploy
Don't fix blindly — classify first, then fix fast.
Tasks Artifact
```json
{
  "type": "Tasks",
  "name": "Tasks: Heal Session [date]",
  "content": {
    "overview": "Healing [N] failing tests.",
    "tasks": [
      { "id": "1.0", "title": "[test-id]: [test name]", "status": "pending", "priority": "critical",
        "notes": "[error summary from last run]" }
    ],
    "notes": ["SelfHealing artifact: self-healing-log"]
  }
}
```
Startup: Fix All Existing Failures
- Get all failing tests from `helpmetest_status`
- For each failing test:
  - Classify failure type
  - Fixable (selector change, timing, form structure): investigate → fix → verify → document in SelfHealing artifact
  - Not fixable (auth broken, 500 errors, missing pages): document as bug in Feature artifact
- After processing all failures, enter monitoring mode
Fixable vs Not:
- Fixable: selector changed, timing issue, form added/removed, button moved, test isolation
- Not fixable: auth broken, server errors, missing features, API endpoints removed
Monitoring Mode
```
listen_to_events({ type: "test_run_completed" })
```
When a test fails: classify → fix if fixable → document if not → resume listening.
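Conceptually the loop looks like this — a sketch assuming `listen_to_events` can be consumed as an async stream; the event shape and the helper functions are hypothetical:

```typescript
// Hypothetical event shape and helpers — the real API may differ.
interface TestRunEvent { testId: string; passed: boolean; error?: string }

declare function listen_to_events(args: { type: string }): AsyncIterable<TestRunEvent>;
declare function classify(e: TestRunEvent): "fixable" | "not_fixable";
declare function fixAndVerify(testId: string): Promise<void>; // fix → re-run → log in SelfHealing
declare function documentBug(e: TestRunEvent): Promise<void>; // record in Feature.bugs[]

async function monitor(): Promise<void> {
  for await (const event of listen_to_events({ type: "test_run_completed" })) {
    if (event.passed) continue; // only failures need action
    if (classify(event) === "fixable") {
      await fixAndVerify(event.testId);
    } else {
      await documentBug(event);
    }
    // loop continues → resume listening
  }
}
```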
SelfHealing Artifact
```json
{
  "type": "SelfHealing",
  "id": "self-healing-log",
  "name": "SelfHealing: Test Maintenance Log",
  "content": {
    "fixed": [
      { "test_id": "test-login", "pattern_detected": "selector_change",
        "fix_applied": "Updated selector to [data-testid='submit-btn']",
        "verification_result": "Test passed on re-run", "timestamp": "..." }
    ],
    "not_fixed": [
      { "test_id": "test-checkout", "issue_type": "server_error",
        "error_message": "500 on POST /api/checkout",
        "why_not_fixable": "Application bug, not a test issue",
        "recommendation": "Investigate checkout API endpoint" }
    ],
    "summary": { "total_processed": 5, "fixed": 3, "not_fixable": 2, "last_run": "..." }
  }
}
```
Mode: Sync — Drift Audit After Refactor
Tests may be passing but wrong — stale assertions, removed features, changed behavior.
Discrepancy Types
Failure-based:
1. Code Broke It — test was passing, code change caused regression → fix code
2. Test Is Stale — code intentionally changed, test hasn't caught up → fix test
3. Not Deployed — fix in local code, not shipped yet → tag pending-deploy
4. Removed Feature — test exercises what no longer exists → delete test

Passing but suspicious:

5. False Positive — passes but assertions too weak to verify anything
6. Flaky — passes sometimes, fails sometimes with no code change
7. Duplicate Coverage — two tests cover the exact same scenario
Coverage gaps:
8. Missing Test — feature exists, no test coverage
9. Scenario Gap — Feature artifact has scenario but test_ids is empty
10. Scenario Drift — tests and code agree but Feature artifact documents old behavior
11. Selector / Schema Drift — test's selectors or API shape no longer matches code
Workflow
- Run all tests: `helpmetest_status` → get IDs → run each
- For each test and each Feature artifact, check for the discrepancy types above
- Record: type, test, Feature, what the test expects vs what the code does, git evidence (a record sketch follows this list)
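A uniform record shape keeps the report easy to assemble — a hypothetical TypeScript sketch, not a prescribed helpmetest schema:

```typescript
// Hypothetical record for one sync finding — field names are illustrative.
interface Discrepancy {
  type:
    | "code_broke_it" | "test_is_stale" | "not_deployed" | "removed_feature"
    | "false_positive" | "flaky" | "duplicate_coverage"
    | "missing_test" | "scenario_gap" | "scenario_drift" | "selector_drift";
  testId?: string;     // absent for coverage gaps
  feature: string;     // Feature artifact the finding belongs to
  testExpects: string; // what the test asserts
  codeDoes: string;    // what the code actually does now
  gitEvidence: string; // commit hash or diff hunk that explains the drift
}
```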
Sync Report (present before resolving)
```
🔄 Sync Report · <project> · <date>
<N> failing · <N> flaky · <N> gaps · <N> passing

💥 Failures
Code Broke It · <N> tests
  <test name>
  issue: <one line>

🕳 Gaps
Missing Test · <N>
  <feature name> — <what it does>
```
Wait for user to confirm, then resolve one by one.
Resolution Options (per discrepancy)
```
#3 of 12 · TEST IS STALE
📋 <test name>
   expects <what test asserts>
   code now <what code does> · <file> · <commit>

1 · Fix the test [code leads]
2 · Fix the code [test leads]
3 · Skip
4 · Delete test
5 · Document bug
6 · Not deployed
```
If user says "fix all selector drifts" — apply across the category without asking per item.
Mode: Validate — Test Quality Review
The core question: would this test fail if the feature broke? If not → reject.
The Business Value Test (MOST IMPORTANT)
- "What business capability does this test verify?"
- "If this test passes but the feature is broken, is that possible?"
If answer to #2 is YES → IMMEDIATE REJECTION
Anti-Patterns (Auto-Reject)
- Only navigation + element counting
- Click + wait for element that was already visible
- Form field presence check without filling + submitting
- Page load + title check only
- UI element visible without verifying it works
Minimum Quality Requirements
- ≥ 5 meaningful steps
- ≥ 2 assertions (Get Text, Should Be, Wait For)
- Verifies state change (before/after OR API response OR persistence)
- Has `[Documentation]` with a `PROTECTS:` line naming the specific user complaint
- Uses stable selectors
- Tags: `priority:?` and `feature:?` required (a concrete example follows this list)
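As an illustration of a test that clears this bar — the steps, selectors, and assertion spelling are hypothetical, written in the step style used earlier:

```typescript
// 7 meaningful steps, 2 assertions, verifies persistence across a reload.
const profileRenameTest = [
  "As registered_user",
  "Go To /profile",
  "Fill Text [data-testid='display-name'] Ada",
  "Click [data-testid='save-btn']",
  "Wait For [data-testid='toast-saved']",         // assertion 1: state change confirmed
  "Go To /profile",                               // reload to check persistence
  "Get Text [data-testid='display-name'] == Ada", // assertion 2: value survived the reload
];
```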
Mutation Resistance Check
Mentally introduce a realistic bug (e.g. "save button onClick removed") and ask: does this test catch it? If not → score 7+.
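For instance, suppose the save button's onClick handler is removed — a hypothetical pair of tests, one that survives the mutation and one that catches it:

```typescript
// Survives the mutation: the button still renders, so this stays green → score 7+.
const presenceOnly = [
  "Go To /profile",
  "Wait For [data-testid='save-btn']",
];

// Catches the mutation: nothing saves, the confirmation never appears → test fails.
const behavioral = [
  "Go To /profile",
  "Fill Text [data-testid='display-name'] Ada",
  "Click [data-testid='save-btn']",
  "Wait For [data-testid='toast-saved']",
];
```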
Bullshit Score (1–10)
| Score | Meaning |
|---|---|
| 1–3 | Solid — behavioral assertions, mutation-resistant |
| 4–6 | Mediocre — some value but weak |
| 7–9 | Mostly bullshit — navigation only, no real behavior |
| 10 | Pure bullshit — single Go To, Sleep with no assertion |
Score ≤ 4 → PASS. Score ≥ 5 → REJECT
Output: Single Test
```
[score]/10 — ✅ PASS / ❌ REJECT
Test ID: [id]
Reason: [one sentence]
[What to fix if rejected]
```
Output: Batch
Table grouped by tier (Solid / Mediocre / Bullshit), then action menu:
```
Reply with numbers to act:
1. Delete [N] score-10 tests
2. Fix [N] misleading test names
3. Fix [N] vacuous assertions
4. Rewrite [N] mediocre tests
5. Investigate [N] failing tests
all — do everything
```
When user replies: execute without asking further. Delete score-10 tests immediately. For rewrites, show diff then call helpmetest_upsert_test.
Key Principles
- Reproduce before fixing — never guess, always verify interactively
- Code may not be deployed — check `git diff HEAD` before calling something broken
- Tests and code are both sources of truth — neither wins automatically
- Don't weaken assertions to make tests pass — fix the root cause
- All findings go into Feature artifacts — a bug mentioned only in chat doesn't exist
- Update Feature.status after any change: "working" | "broken" | "partial"
Version: 0.1