# tdd
Who you are: If `.helpmetest/SOUL.md` exists, read it — it defines your character.

No MCP? Use `helpmetest <command>` instead of MCP tools.
🔴 YOU WRITE THE TEST FIRST.
Changed code → run the tests. New feature → write the test before the code. The test is the spec. The test is done when it's green. No test = not done.
## Narrate Your Actions
Never create a test, artifact, or run a test silently. Always tell the user:
- Before: what you are about to do and why (what scenario it covers, what risk it guards against)
- After: what happened — result, what the artifact contains, why a test failed
- Next: what you will do next and what decision point is coming
Silence means the user has no idea what you did or why.
## Tests — Write, Generate, Fix

### Orient First (Always)
Before doing anything, check what already exists:
```
helpmetest_status()
helpmetest_search_artifacts({ query: "" })
helpmetest_search_artifacts({ type: "Tasks" })
```
- Tests already failing? → that's the priority, not creating new ones
- Tasks artifact in progress? → resume it, don't start over
- Feature artifacts exist? → use them, don't re-discover
### Use Cases

#### "I need to build something" (TDD)
New feature, bug fix, or refactor. Tests come first — they define what "done" means.
1. Create a Tasks artifact to track the work:
```
{
  "id": "tasks-[feature-name]",
  "type": "Tasks",
  "content": {
    "overview": "What this implements and why",
    "tasks": [
      { "id": "1.0", "title": "Write all tests first", "status": "pending", "priority": "critical" },
      { "id": "2.0", "title": "Implement to make tests pass", "status": "pending" },
      { "id": "3.0", "title": "All green — review for gaps", "status": "pending" }
    ]
  }
}
```
2. Create a Feature artifact with all scenarios before writing a single test:
```
{
  "id": "feature-[name]",
  "type": "Feature",
  "content": {
    "goal": "What this feature does",
    "functional": [
      { "name": "User can do X", "given": "...", "when": "...", "then": "...", "tags": ["priority:critical"], "test_ids": [] }
    ],
    "edge_cases": [],
    "bugs": []
  }
}
```
3. Plan coverage before writing a single test. For every feature, enumerate scenarios across four types:
| Type | Always? | Question to ask |
|---|---|---|
| Critical path | ✅ always | What does a real user do when everything works? |
| Error paths | ✅ always | What are the 3 most likely ways this breaks in prod? |
| Boundary conditions | if data/logic | Where does behavior change based on a threshold? |
| Edge cases | selectively | What would a QA engineer test that a dev wouldn't? |
Mark each scenario: write immediately (critical/high) / write before launch (medium) / skip (cosmetic, already covered, unreliable). Don't test everything — test what would hurt if it broke.
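As an illustration, a login feature's enumeration might land in the Feature artifact's `content` like this — a sketch with hypothetical scenarios, not a required shape:

```
"functional": [
  { "name": "User can log in with valid credentials", "given": "registered user", "when": "submits correct email and password", "then": "lands on dashboard, session created", "tags": ["priority:critical"], "test_ids": [] },
  { "name": "Wrong password shows clear error", "given": "registered user", "when": "submits wrong password", "then": "sees error, stays on login page, NOT authenticated", "tags": ["priority:critical"], "test_ids": [] }
],
"edge_cases": [
  { "name": "Email with trailing whitespace still logs in", "given": "registered user", "when": "submits email padded with spaces", "then": "login succeeds", "tags": ["priority:medium"], "test_ids": [] }
]
```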
4. Write ALL tests — happy paths, edge cases, errors — before implementing anything. Failing tests are your spec.
5. Implement incrementally — pick the highest-priority failing test, make it pass, move to the next.
6. Done when all tests are green and you've reviewed for missing edge cases.
"Write tests for an existing feature"
Feature exists (or was just built by someone else). Your job is tests only.
1. Read the Feature artifact — `helpmetest_get_artifact({ id: "feature-X" })`. If none exists, create one first based on what you know.
2. Explore interactively before writing — run the scenario step by step using `helpmetest_run_interactive_command`. A test written after seeing real behavior uses real selectors and reflects actual timing. A test written from a description is a guess.
```
As <persona>
Go To <url>
# Execute each Given/When/Then step, observe what actually happens
```
3. Before writing each test, answer this out loud:
"If this test passes but the feature is actually broken, what user complaint would we miss until a customer reports it?"
Write that answer as the PROTECTS: line in [Documentation]. This is the contract the test makes with the user — not optional boilerplate. If you can't answer it in one sentence, the scenario needs more thought, not a test.
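A minimal sketch of such a line, for a hypothetical wrong-password test:

```
[Documentation]    PROTECTS: users who mistype their password being silently logged in, or left with no error message at all
```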
4. Write tests for priority:critical scenarios first, then high, then medium. For each:
- 5+ meaningful steps
- Verify business outcomes (data saved, state changed) — not just that an element is visible
- Use `Create Fake Email` for any registration/email fields — never hardcode one
- `[Documentation]` must start with `PROTECTS: <what user complaint this catches>`
5. Validate each test with fix-tests before linking it to the scenario. A test that passes when the feature is broken must be rewritten — it is not done until the validator says PASS.
6. Link tests back — add each test ID to scenario.test_ids in the Feature artifact only after it passes validation.
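After validation, the linked scenario entry might look like this (test ID is hypothetical):

```
{ "name": "Wrong password shows clear error", "given": "...", "when": "...", "then": "...", "tags": ["priority:critical"], "test_ids": ["wrong-password-shows-error"] }
```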
7. Run and fix — see "Fix broken tests" below if a newly-written test fails.
"Fix broken tests" / "Tests are failing"
First: understand the failure pattern
Check recent code changes:
```
git diff --stat HEAD
git log --oneline -5
```
Map changed files to likely causes:
- `components/`, `pages/` → selector changes
- `auth/`, `session/` → auth state issues
- `api/`, `routes/` → backend errors or changed response shapes
Then get test history: `helpmetest_status({ id: "test-id", testRunLimit: 10 })`
Classify:
- Consistent failure after a code change → selector/behavior changed
- Intermittent PASS/FAIL with changing errors → isolation issue (shared state, test order dependency)
- Timeout / element not visible → timing issue
- Auth/session error → state not restored correctly
- Backend error in test output → real bug, not a test issue
**Reproduce interactively — always do this before fixing**
Run the failing steps one at a time:
```
As <persona>
Go To <url>
# Execute each step, observe what actually happens at the point of failure
```
For "element not found": list all elements of that type, try alternate selectors. For "wrong value": check what's actually displayed vs what the test expected. For timeouts: try longer waits, check whether the element ever appears.
**Decide: test issue or app bug?**
- Test issue (selector changed, timing, wrong expectation) → fix the test, validate the fix interactively before saving
- App bug (feature is actually broken) → document in `feature.bugs[]`, update `Feature.status` to "broken" or "partial"
**Many tests broke after a UI change?**
Work through them systematically one by one. For each:
- Classify the failure (usually selector or timing)
- Reproduce interactively
- Fix
- Run to confirm
Don't shotgun-fix by guessing — one wrong fix creates two broken tests.
**Tests are out of date after a refactor?**
- Get test list: `helpmetest_status()`
- For each failing test, check whether the Feature artifact scenario still matches intended behavior
- If the code is the source of truth → update the test
- If the test was right and the refactor broke behavior → document the regression
## Writing Tests

### Structure
```
As <persona>    # auth state — always first
Go To <url>

# Given — establish precondition
<steps>

# When — perform the action
<steps>

# Then — verify the outcome
<assertions>

# Persistence check (if relevant)
Reload
<re-assert that state survived>
```
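Here is a sketch of a full test in this shape — app URL, selectors, and persona are all hypothetical:

```
User can update profile email
    [Documentation]
    ...    Given: logged-in user on the profile page
    ...    When: they change their email and save
    ...    Then: the new email is shown and survives a reload
    ...    Risk: profile email changes silently fail — user sees stale email with no indication
    [Tags]    priority:high    feature:profile    type:e2e

    As user
    Go To https://app.example.com/profile
    # A fresh fake email keeps the test repeatable on every run
    ${email}=    Create Fake Email
    # When — the user edits and saves their email
    Fill Text    input[name=email]    ${email}
    Click    button[type=submit]
    # Then — the saved value is what the user typed
    ${shown}=    Get Text    input[name=email]
    Should Contain    ${shown}    ${email}
    # Persistence — the change must survive a reload, not just sit in the form
    Reload
    ${shown}=    Get Text    input[name=email]
    Should Contain    ${shown}    ${email}
```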
### Documentation format
Every test must have [Documentation] with four explicit lines:
```
[Documentation]
...    Given: <precondition — what state the system is in before the action>
...    When: <action — what the user/system does>
...    Then: <outcome — what is asserted, specifically>
...    Risk: <what silent failure this catches — user complaint if this test were deleted>
```
Full example:
```
[Documentation]
...    Given: registered user with valid account
...    When: submits login form with wrong password
...    Then: sees "Invalid email or password" error, remains on login page, is NOT authenticated
...    Risk: silent login failure — attacker gets in, or user is confused with no feedback
```
Given/When/Then — what makes them good:
- ✅ `Given: registered user with valid account` — specific precondition
- ✅ `When: submits login form with wrong password` — exact action
- ✅ `Then: sees "Invalid email or password" error, remains on login page, is NOT authenticated` — concrete, multiple assertions named
- ❌ `Given: user is logged in | When: they do something | Then: it works` — vague, tells you nothing when the test fails
Language rules — the description must be readable by a product manager:
- ✅ Write in terms of user actions and visible outcomes
- ❌ NEVER put CSS selectors, XPath, or DOM class names (`.keyword-line.current`, `#submit-btn`, `div[data-id]`)
- ❌ NEVER put JavaScript internals, variable names, or debug APIs (`replayDebug.currentKeyword`, `window.__state`)
- ❌ NEVER put Robot Framework syntax, keyword names, or technical implementation details
- The test body is where selectors live. The description is where the product manager lives.
Wrong:
```
Given: replay loaded with .banner-keywords visible
When: user clicks .banner-next and replayDebug.currentKeyword changes
Then: .keyword-line.current text matches window.replayDebug.currentKeyword.split(/ {2,}/)[0]
```
Right:
```
Given: a test replay is open and paused
When: user steps forward then backward through keywords using the navigation buttons
Then: the highlighted keyword in the banner matches the current playback position after each navigation
```
Risk — good examples:
- ✅ `Risk: users completing checkout get charged without receiving an order confirmation`
- ✅ `Risk: users typing wrong passwords are silently logged in or shown a blank screen`
- ✅ `Risk: profile email changes silently fail — user sees stale email with no indication`
- ❌ `Risk: the login form breaks` — too vague: what breaks? who notices?
- ❌ `Risk: the form doesn't submit` — that's what the test does, not what it protects against
### Inline comments
Every non-obvious step must have a # comment above it written for a product manager, not an engineer.
Comments explain why a step exists, what the user is experiencing, or what the test is checking — not what the keyword does.
```
# User opens an existing test replay — this is the entry point for debugging failed tests
Go To    https://helpmetest.example.com/test/some-test

# The banner shows the current keyword being replayed — it must stay in sync with what's actually executing
Click    css=.banner-next

# After stepping forward, the highlighted keyword in the banner must change to match the new position
# If this fails, users see the wrong keyword highlighted while debugging — they investigate the wrong step
${label}=    Get Text    css=.keyword-line.current .keyword-text
Should Contain    ${label}    ${expected_keyword}
```
Comments are mandatory for:
- Any `Javascript` call — explain what user-visible state it reads or changes
- Any `Hover` or `Sleep` — explain why the UI requires this (hover to reveal hidden elements, sleep for animation)
- Any multi-step assertion group — explain what the group collectively verifies
- Any setup step that isn't obvious navigation
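For instance, a hover-and-wait sequence with the required comments — selectors and timing are illustrative:

```
# Hover reveals the row's action menu — the delete button isn't in the DOM until then
Hover    css=.order-row
# The menu animates in; clicking before it settles hits an invisible element
Sleep    0.5s
Click    css=.order-row .delete-btn
```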
### What makes a good test

- ✅ Verifies a business outcome — data saved, filter applied, order created
- ✅ Would FAIL if the feature is broken
- ✅ 5+ meaningful steps
- ✅ Checks state change, not just that a button exists

- ❌ Just navigates to a page and counts elements
- ❌ Clicks something without checking what happened
- ❌ Passes when the feature is broken
### Test naming

Format: `User can <action>` or `<Feature> <behavior>`

- ✅ `User can update profile email`
- ✅ `Cart total updates when quantity changes`
- ❌ `MyApp Login Test`
- ❌ `SiteName Checkout`
### Auth

Use `Save As <StateName>` once to capture auth state. Reuse with `As <StateName>` in every test — never re-authenticate inside tests.
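A sketch of the one-time capture and reuse — URL, selectors, and credential variables are hypothetical:

```
# One-time setup: authenticate normally, then capture the session
Go To https://app.example.com/login
Fill Text    input[name=email]    ${ADMIN_EMAIL}
Fill Text    input[name=password]    ${ADMIN_PASSWORD}
Click    button[type=submit]
Save As admin

# Every other test starts from the saved state
As admin
```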
### Emails

Use `Create Fake Email` — never hardcode `test@example.com`. Hardcoded emails break on second run.
```
${email}=    Create Fake Email
Fill Text    input[name=email]    ${email}
${code}=    Get Email Verification Code    ${email}
```
### Localhost

If testing a local server, set up the proxy first:

```
helpmetest_proxy({ action: "start", domain: "dev.local", sourcePort: 3000 })
```
Verify it works before writing any tests. See the proxy skill for details.
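A minimal smoke check, assuming the `dev.local` domain above:

```
Go To https://dev.local
# If the local app's page renders through the tunnel, the proxy is routing correctly
Get Text    css=body
```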
### Tags

- `priority:critical|high|medium|low`
- `feature:[feature-name]`
- `type:e2e|smoke|regression`
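Applied in a test, following the Robot Framework-style syntax used in the examples above:

```
[Tags]    priority:critical    feature:checkout    type:e2e
```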
## Done means
- ✅ All tests passing
- ✅ All `priority:critical` scenarios have `test_ids`
- ✅ Every test has `Given:`, `When:`, `Then:`, `Risk:` lines in `[Documentation]`
- ✅ Every test passed `/fix-tests` before being linked
- ✅ Bugs documented in `feature.bugs[]`
- ✅ `Feature.status` updated (`working`/`broken`/`partial`)
- ✅ Tasks artifact all done
## Final summary format
Never end with "N tests created, M passing." End with this:
```
## What you can now trust works
- <user-facing statement> (test: <id>)
- <user-facing statement> (test: <id>)

## What's still unprotected
- <what could silently break with no test catching it>

## Bugs found
- <bug description> — documented in feature.bugs[]
```
If you can't write the first section in user-facing language, your tests are not done.
## More from help-me-test/skills
### helpmetest
Full site QA — discover, enumerate features, write and run tests, report bugs. Use when user says 'test this site', 'qa this', 'check site', 'find bugs', or provides a URL and wants comprehensive coverage. This is the orchestrator — it covers everything from first visit through final report.

### helpmetest-self-heal
Autonomous test maintenance agent. Monitors test failures and fixes them automatically. Always use this when tests start failing after a UI or code change — it's far more systematic than trying to fix tests manually one by one. Use when user mentions 'fix failing tests', 'heal tests', 'auto-fix', 'monitor test health', 'tests broke after deploy', or the test suite has multiple failures needing systematic repair. Distinguishes fixable test issues (selector changes, timing) from real application bugs.

### helpmetest-test-generator
Use this skill when the user wants tests written for a specific feature or flow. Triggers on: "write tests for X", "generate tests for checkout", "create tests for login", "add tests for registration", "we have scenarios — now write the tests", or any request to produce automated test coverage for a known feature. Also triggers when discovery is done and the user is ready to move from documenting scenarios to actually testing them. Not for: exploring a site to discover what to test, judging whether an existing test is good, or debugging a failing test.

### helpmetest-visual-check
Instant visual verification via screenshots. For quick checks like 'does button look blue', 'is layout centered', 'header look right on mobile'. Fast alternative to formal testing — just look and confirm. Use when user wants visual inspection without creating test files.

### proxy
Set up HelpMeTest proxy tunnels for local development testing. Use when user needs to test localhost, wants to substitute production URLs with local ports, or needs to route multiple services. Use when user says 'set up proxy', 'test localhost', 'tunnel to local', or before running tests against local development servers.