aiden-test-feature
Task: Test Feature & Generate Demo Report
Analyze the current branch's changes, scope the coverage level with the user, build an explicit testing strategy, start the app's dev server, test the feature in a real browser using agent-browser, capture screenshots and video (always — no exceptions), upload everything to S3, and generate a structured test report (always — never skip).
Phase 0: Environment Setup
1. Get artifact context IDs
Use the active task context when it is present in the prompt. Otherwise:
- The task ID is available from the AIDEN_TASK_ID environment variable.
- The conversation ID is available from the AIDEN_SESSION_ID environment variable.
- Resolve teamId with Aiden MCP context/tools before creating the report.
When calling create_test_report, always pass taskId, conversationId, and teamId explicitly. Do not substitute one ID for another.
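A minimal sketch of the environment-variable fallback, assuming a POSIX shell. The helper name missing_ids is hypothetical, and teamId is not covered because it is resolved through Aiden MCP context rather than the environment.

```shell
# Hypothetical helper: report which artifact context IDs are missing
# from the environment. Prints one variable name per missing ID and
# returns non-zero if anything is missing.
missing_ids() {
  rc=0
  for name in AIDEN_TASK_ID AIDEN_SESSION_ID; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      rc=1
    fi
  done
  return "$rc"
}
```

Running missing_ids before Phase 3 gives an early signal that an ID has to come from the prompt's task context instead.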
2. Verify agent-browser is available
agent-browser is a CLI tool by Vercel Labs (npm: agent-browser) — it is NOT a skill or MCP tool. It provides headless browser automation via bash commands (open, click, fill, screenshot, record). In sandboxes it is pre-installed by runtime-bootstrap.ts.
command -v agent-browser >/dev/null 2>&1 && echo "agent-browser: OK" || echo "agent-browser: MISSING"
If missing, install it:
npm install -g agent-browser && agent-browser install
If installation fails (e.g. no network, no npm), STOP and tell the user:
"agent-browser is not available and could not be installed. Install it manually:
npm install -g agent-browser && agent-browser install"
Do NOT proceed to Phase 2 without a working agent-browser — all browser testing depends on it.
3. Create workspace
All captures (screenshots, videos) MUST be written to /tmp/aiden-captures/.
mkdir -p /tmp/aiden-captures
CRITICAL — file path rules:
- ONLY write capture files to /tmp/aiden-captures/. Never anywhere else.
- NEVER write to .claude/, project directories, reports/, or any path inside the repo. .claude/ is a sensitive system directory; writing to it will be blocked and will abort the test run.
- If you are tempted to create a reports/ or screenshots/ folder anywhere other than /tmp/, stop and use /tmp/aiden-captures/ instead.
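One way to make the path rules hard to break is to build every capture path through a single helper. This is a sketch only; capture_path and CAPTURE_DIR are hypothetical names, not part of agent-browser or Aiden.

```shell
CAPTURE_DIR=/tmp/aiden-captures

# Hypothetical helper: turn a bare file name into a path inside the
# capture directory, rejecting empty names and anything containing a
# slash (which could point outside the directory).
capture_path() {
  case "$1" in
    ""|*/*|.|..)
      echo "refusing capture name: '$1'" >&2
      return 1
      ;;
  esac
  printf '%s/%s\n' "$CAPTURE_DIR" "$1"
}
```

For example, capture_path step-01.png prints /tmp/aiden-captures/step-01.png, while capture_path .claude/shot.png is refused.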
Phase 0.5: Coverage Scoping & Context Gathering (MANDATORY: complete before Phase 1)
Never skip this phase. You must understand what the user wants before writing a single test step.
1. Ask about coverage level
Use AskUserQuestion to ask the user:
What level of test coverage do you want for this feature?
- light — Happy path only. Quick smoke test to verify the main flow works.
- standard — Happy path + key edge cases + basic error states. (Default)
- comprehensive — Full coverage: happy path, all edge cases, error states, non-happy paths, boundary values, accessibility, responsiveness.
Also: are there any specific flows, known bugs, or risky areas you want me to focus on?
Wait for the response before continuing.
2. Ask targeted follow-up questions
Based on the feature (which you may not know yet — do a quick git diff --stat first to get a hint), use AskUserQuestion to ask 1–3 targeted follow-up questions. Examples:
- "Is there an authenticated state I should test? If yes, what test credentials should I use?"
- "Are there any known edge cases or previous bugs related to this feature?"
- "What's the definition of 'working correctly' for this feature — what should I see?"
- "Are there any specific non-happy paths you're concerned about (e.g. invalid input, network errors, empty states)?"
- "Any pages or user roles I should specifically include or exclude?"
Do not skip questions if context is unclear. Ask. A well-scoped test is 10× more valuable than a blind one.
3. Confirm the plan
After gathering answers, summarize back to the user (no tool call needed — just a short message):
- Coverage level chosen
- Specific areas / flows to focus on
- Any known risks or edge cases to target
- Rough count of test scenarios expected
Only proceed to Phase 1 after this confirmation.
Phase 1: Discover What to Test
1. Analyze the branch
Run these commands to understand what changed:
git log main..HEAD --oneline 2>/dev/null || git log HEAD~5..HEAD --oneline
git diff main...HEAD --stat 2>/dev/null || git diff HEAD~5 --stat
Read the actual diff to understand the feature or bug fix. Identify:
- What part of the app is affected (which pages, components, API routes)
- What the expected behavior change is
- What URL path to navigate to for testing
2. Discover and start the dev server
Look at the project to figure out how to run it:
- Read package.json: check scripts.dev, scripts.start, scripts.serve
- Check for docker-compose.yml / docker-compose.yaml / compose.yml
- Check for Makefile (look for dev or serve targets)
- Check for Procfile, .env, Pipfile, requirements.txt, Gemfile
- Check for framework-specific files: next.config.*, vite.config.*, nuxt.config.*, angular.json, manage.py, config/routes.rb
Start the dev server in the background. Common patterns:
# Node.js
npm run dev &
# or: pnpm dev &, yarn dev &, npx next dev &, npx vite &
# Python
python manage.py runserver &
# or: flask run &, uvicorn main:app &
# Ruby
bundle exec rails server &
# Docker
docker compose up -d
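The discovery checks above could be sketched as a small marker-file probe. The helper name guess_start_cmd is hypothetical, it covers only a few of the listed cases, and the package.json check is a crude grep for a "dev" key rather than real JSON parsing.

```shell
# Hypothetical helper: guess a start command from marker files in a
# project directory. Prints nothing and returns 1 when no marker is
# recognised.
guess_start_cmd() {
  dir=${1:-.}
  if [ -f "$dir/package.json" ] && grep -q '"dev"[[:space:]]*:' "$dir/package.json"; then
    echo "npm run dev"
  elif [ -f "$dir/docker-compose.yml" ] || [ -f "$dir/docker-compose.yaml" ] || [ -f "$dir/compose.yml" ]; then
    echo "docker compose up -d"
  elif [ -f "$dir/manage.py" ]; then
    echo "python manage.py runserver"
  elif [ -f "$dir/config/routes.rb" ]; then
    echo "bundle exec rails server"
  else
    return 1
  fi
}
```

Treat the result as a starting guess; the project's own README or Makefile takes precedence when they disagree.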
3. Wait for the server
Poll until the server is responding:
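# Replace PORT with the discovered port; gives up after about 60 seconds
ready=0
for i in $(seq 1 30); do
  if curl -sf http://localhost:PORT >/dev/null 2>&1; then ready=1; break; fi
  sleep 2
done
[ "$ready" -eq 1 ] || echo "dev server did not respond on PORT; check its logs"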
Check common ports if unclear: 3000, 5173, 8080, 4200, 8000, 4000, 3001, 8888.
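A minimal sketch of probing those common ports in order, assuming curl is installed. The helper name find_dev_port is hypothetical. Because of curl's -f flag, a server whose root path answers with an HTTP error status is treated as not responding here.

```shell
# Hypothetical helper: print the first common dev-server port that
# answers an HTTP request on localhost, or return 1 if none does.
find_dev_port() {
  for port in 3000 5173 8080 4200 8000 4000 3001 8888; do
    if curl -sf --max-time 2 "http://localhost:$port" >/dev/null 2>&1; then
      echo "$port"
      return 0
    fi
  done
  return 1
}
```

The first match is not necessarily the app under test, so confirm it against the dev server's own startup output when several services are running.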
4. Determine the app URL
- Parse dev server stdout/stderr for "Local:" or "ready on" messages with URLs
- Check .env or .env.local for PORT or VITE_PORT or similar
- Test common ports with curl -sf http://localhost:PORT >/dev/null
- If you cannot determine it, ask the user via AskUserQuestion
5. Summarize before proceeding
Tell the user:
- What you found in the diff (feature/fix summary)
- What URL you will test
- What test scenarios you plan to cover
Phase 1.5: Build Testing Strategy & Create Todos (MANDATORY)
Before opening the browser, you must have an explicit plan. Do not improvise test steps on the fly.
1. Draft the testing strategy
Based on:
- The coverage level the user chose in Phase 0.5
- The diff analysis from Phase 1
- The context gathered from the user
Write out the full test plan as a structured list. For each scenario, note:
- Scenario name — short label (e.g. "Happy path: create item")
- Type — happy path / non-happy path / edge case / error state / visual / navigation
- Steps — what to do
- Expected outcome — what "pass" looks like
- Screenshot needed — yes/no
Coverage requirements by level:
- light: happy path only (1–3 scenarios)
- standard: happy path + 2–4 non-happy paths + 1–2 error states
- comprehensive: happy path + all non-happy paths + all error states + boundary values + empty states + responsiveness + navigation
Non-happy path examples to consider:
- Invalid / missing required inputs
- Submitting with no data / empty state
- Duplicate entries (if applicable)
- Permission denied / unauthorized access
- Network error / API failure simulation (if testable)
- Rapid repeated actions (double-click, spam submit)
- Long input strings / special characters
- Back button / browser navigation mid-flow
2. Create todos for each scenario
Use TodoWrite to create a todo entry for each test scenario so progress is tracked. Each todo should be the scenario name.
3. Present the strategy
Output the full testing strategy as a numbered list before proceeding. The user should be able to see exactly what will be tested before Phase 2 starts.
Phase 2: Test with agent-browser
agent-browser is a CLI tool: invoke it via bash, not as an MCP tool or skill.
Docs: https://github.com/vercel-labs/agent-browser
Key commands: open, snapshot, click, fill, screenshot, record, wait, find, close
1. Open the app
agent-browser --session test-feature open http://localhost:PORT
agent-browser --session test-feature wait --load networkidle
2. Start video recording — MANDATORY
Always start a video recording before any interaction. No exceptions.
agent-browser --session test-feature record start /tmp/aiden-captures/happy-path.webm
If video recording fails for technical reasons, note the failure but continue — screenshots are still required.
3. Take initial screenshot — MANDATORY
Always capture the initial page state before any interaction.
agent-browser --session test-feature screenshot /tmp/aiden-captures/step-00-initial-state.png
4. Execute each scenario from the testing strategy
Work through every scenario defined in Phase 1.5. For each scenario:
- Mark the corresponding todo as in-progress
- Use agent-browser snapshot -i to discover interactive elements
- Use agent-browser click @eN, agent-browser fill @eN "text", etc. to interact
- Use agent-browser wait --load networkidle or agent-browser wait 1500 between actions
- Take a screenshot after every significant state change; never go more than 2 meaningful actions without a screenshot:
agent-browser --session test-feature screenshot /tmp/aiden-captures/step-NN-description.png
Name screenshots descriptively: step-02-form-filled.png, step-03-submit-clicked.png, step-04-success-state.png
- Re-snapshot after navigation or DOM changes (refs go stale)
- If elements are hard to find by ref, use semantic locators:
agent-browser --session test-feature find text "Submit" click
agent-browser --session test-feature find role button click --name "Save"
- Mark each todo as complete or failed based on outcome
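To keep screenshot names consistent with the step-NN-description.png pattern, the name can be generated rather than typed. This is a sketch; shot_name is a hypothetical helper, and the step number should be passed as a plain decimal (2, not 08) because printf would read a leading zero as octal.

```shell
# Hypothetical helper: build "step-NN-description.png" from a step
# number and a free-text description. The description is lower-cased
# and every run of other characters becomes a single hyphen.
shot_name() {
  slug=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]' | tr -cs 'a-z0-9' '-' | sed 's/^-//; s/-$//')
  printf 'step-%02d-%s.png\n' "$1" "$slug"
}
```

For instance, shot_name 2 "Form filled" yields step-02-form-filled.png, which can be appended to /tmp/aiden-captures/ for the screenshot command.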
Mandatory coverage checklist (execute ALL that apply to the coverage level chosen):
Happy path (always required):
- Main feature flow works end-to-end as expected
- Success state / confirmation is visible
- Data is persisted / reflected correctly after action
Non-happy paths (required for standard + comprehensive):
- Empty / missing required inputs — form validation fires, error messages shown
- Invalid input values — correct rejection, no crash
- Boundary values — minimum and maximum accepted values
- Duplicate / conflicting data (if applicable)
- Unauthorized / permission-denied state (if applicable)
- Empty list / zero-state view (if applicable)
- Rapid repeated actions (double-click submit, spam button)
- Long strings / special characters in text inputs
Error states (required for standard + comprehensive):
- API / network failure behavior (if simulatable)
- Partial failure — what happens if only part of the action succeeds
- Graceful degradation — app does not crash, user sees a meaningful message
Visual & navigation (required for comprehensive):
- Layout is correct, no overflow or broken UI
- Responsive on narrower viewport (resize if possible)
- Links and navigation work; back button does not break state
- Loading states / spinners shown while async work is in progress
5. Stop recording and clean up
agent-browser --session test-feature record stop
agent-browser --session test-feature screenshot /tmp/aiden-captures/final-state.png
agent-browser --session test-feature close
Keep the recording under 2 minutes. If the feature requires more exploration, split into multiple recordings (e.g. happy-path.webm, error-states.webm).
6. Fix WebM duration metadata (MANDATORY if ffmpeg is available)
WebM files recorded by agent-browser often have duration = Infinity in the container header — the video player then shows 0:00. Fix every .webm file by remuxing it through ffmpeg, which reads the entire file, computes the real duration, and writes it into the output header:
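if command -v ffmpeg >/dev/null 2>&1; then
  for f in /tmp/aiden-captures/*.webm; do
    [ -e "$f" ] || continue  # no recordings matched the glob
    # Remux without re-encoding so ffmpeg rewrites the duration header
    if ffmpeg -y -i "$f" -c copy "${f%.webm}-fixed.webm" 2>/dev/null; then
      mv "${f%.webm}-fixed.webm" "$f"
    else
      rm -f "${f%.webm}-fixed.webm"
      echo "ffmpeg remux failed for $f; uploading as-is"
    fi
  done
fi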
If ffmpeg is not available, skip this step and continue — the video will still play, it just won't show the correct duration in the player.
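One way to confirm the remux took effect, sketched under the assumption that ffprobe (distributed with ffmpeg) is installed. The helper name has_duration is hypothetical, and the check relies on ffprobe reporting N/A when the container carries no duration.

```shell
# Hypothetical helper: succeed only if ffprobe reports a concrete
# duration for the given file.
has_duration() {
  d=$(ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1" 2>/dev/null) || return 1
  case "$d" in
    ""|N/A) return 1 ;;
    *)      return 0 ;;
  esac
}
```

A failed check is not a reason to stop: the file can still be uploaded as-is, as described above.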
Phase 3: Upload Captures to S3
For each captured file, use the mcp__aiden__get_upload_url MCP tool to get a presigned S3 URL, then curl PUT the file directly to S3.
Per-file upload flow
1. Get the file size (needed by the MCP tool):
SIZE=$(stat -c%s "/tmp/aiden-captures/step-01.png" 2>/dev/null || stat -f%z "/tmp/aiden-captures/step-01.png")
2. Call the MCP tool to get a presigned upload URL:
Tool: mcp__aiden__get_upload_url
Parameters: {
  "taskId": "<active task ID or value of AIDEN_TASK_ID>",
  "filename": "step-01.png",
  "mimeType": "image/png",
  "size": <SIZE from step 1>
}
Returns:
{ "uploadUrl": "https://...", "s3Key": "sandbox-captures/..." }
3. Upload the file to S3 using the presigned URL:
curl -sf -X PUT "<uploadUrl>" \
  -H "Content-Type: image/png" \
  --data-binary @/tmp/aiden-captures/step-01.png
4. Save the s3Key; you will pass it to create_test_report in Phase 4.
Repeat for each screenshot and video file. Common MIME types:
- Screenshots: image/png
- Video recordings: video/webm
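The shell side of the per-file flow can be sketched as three small helpers. The names mime_for, file_size and put_capture are hypothetical; the presigned URL still has to come from the mcp__aiden__get_upload_url tool call, which a shell script cannot make itself.

```shell
# Hypothetical helper: map a capture file name to its MIME type.
# Only the two types listed above are handled.
mime_for() {
  case "$1" in
    *.png)  echo "image/png" ;;
    *.webm) echo "video/webm" ;;
    *)      echo "unsupported capture type: $1" >&2; return 1 ;;
  esac
}

# File size in bytes: GNU stat first, then the BSD/macOS form.
file_size() {
  stat -c%s "$1" 2>/dev/null || stat -f%z "$1"
}

# Hypothetical helper: PUT one file to a presigned URL.
# $1 = uploadUrl returned by the MCP tool, $2 = local file path.
put_capture() {
  mime=$(mime_for "$2") || return 1
  curl -sf -X PUT "$1" -H "Content-Type: $mime" --data-binary "@$2"
}
```

The same MIME type must be passed to the MCP tool and sent in the Content-Type header, which is why both go through mime_for in this sketch.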
If uploads fail: Continue to Phase 4 anyway — omit the screenshotUrl, videoUrl, and screenshotUrls fields. The structured report is still valuable without media.
Phase 4: Create Structured Test Report (MANDATORY — NEVER SKIP)
CRITICAL: You MUST call mcp__aiden__create_test_report to complete this skill. This is non-negotiable.
- Do NOT output the report as markdown text
- Do NOT summarize findings in chat only
- Do NOT skip this phase for any reason — not because of time, not because uploads failed, not because testing was partial
- The report MUST be persisted via this MCP tool call so the UI renders it as an interactive artifact
- Skipping this step means the entire test session is wasted and untracked
If uploads failed in Phase 3, create the report anyway — omit the media URLs but include all steps, issues, and summary text.
Gather the data from the previous phases and call the mcp__aiden__create_test_report MCP tool with it:
Tool: mcp__aiden__create_test_report
Parameters: {
"title": "<short description of what was tested>",
"taskId": "<active task ID or value of AIDEN_TASK_ID, if set>",
"conversationId": "<active conversation ID or value of AIDEN_SESSION_ID>",
"teamId": "<resolved team ID>",
"branch": "<current branch name from git>",
"baseBranch": "main",
"commits": [
{ "sha": "<commit sha>", "message": "<commit message>" }
],
"changedFiles": [
{ "path": "src/components/Feature.tsx", "description": "Added new feature component" }
],
"appUrl": "<the URL you tested>",
"summary": "<1-2 paragraph summary of what was tested and the outcome>",
"status": "pass | fail | mixed",
"steps": [
{
"name": "Navigate to feature page",
"status": "pass | fail | skip",
"screenshotUrl": "<s3Key from Phase 3 — e.g. sandbox-captures/org/task/step-01.png — omit if upload failed>",
"notes": "Page loaded correctly"
}
],
"issues": [
{
"title": "Button misaligned on mobile",
"severity": "critical | high | medium | low",
"description": "The submit button overflows on viewports < 375px",
"screenshotUrl": "<s3Key from Phase 3 showing the issue — omit if upload failed>"
}
],
"videoUrl": "<s3Key from Phase 3 for the happy-path recording — omit if upload failed>",
"screenshotUrls": ["<s3Key 1 from Phase 3>", "<s3Key 2 from Phase 3>"]
}
Field guidelines
- status: "pass" if all steps passed, "fail" if any critical step failed, "mixed" if some passed and some failed
- steps: one entry per distinct test action (navigate, click, fill, verify). Include the screenshot URL for that step if you took one.
- issues: only include actual problems found. Each issue should have a severity and clear description.
- screenshotUrls: flat list of ALL screenshot s3Keys from Phase 3 (for gallery display). Use the s3Key field returned by get_upload_url, not the uploadUrl. Omit if no uploads succeeded.
- commits and changedFiles: from git analysis in Phase 1
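The status rule can be sketched as a small function over the step statuses. This is one possible reading, not the definitive rule: steps carry no criticality flag in the schema, so this sketch assumes "fail" means no step passed and at least one failed, and "mixed" means both passes and failures occurred. The name overall_status is hypothetical.

```shell
# Hypothetical helper: derive the report status from step statuses
# passed as arguments ("pass", "fail" or "skip"). Skipped steps are
# ignored, so an empty or all-skipped list yields "pass" here.
overall_status() {
  passed=0
  failed=0
  for s in "$@"; do
    case "$s" in
      pass) passed=$((passed + 1)) ;;
      fail) failed=$((failed + 1)) ;;
    esac
  done
  if [ "$failed" -eq 0 ]; then
    echo pass
  elif [ "$passed" -eq 0 ]; then
    echo fail
  else
    echo mixed
  fi
}
```

If a failed step blocks the feature's main flow, reporting "fail" instead of "mixed" is the safer choice regardless of what this sketch returns.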
Output
After calling mcp__aiden__create_test_report, tell the user:
- The test report title, so the user can find the generated artifact in the Aiden UI
- A link to the artifact when you have enough context to form one (usually /teams/<teamId>/docs?artifactId=<artifactId>)
- A brief summary: what was tested, how many steps passed/failed, any issues found
- List any bugs or concerns discovered during testing
Do not lead with a raw UUID. Only include the artifact/report ID as secondary debug context if no usable title or link is available.
Do NOT render the full report as markdown. The MCP tool call creates the report as a structured artifact that the UI displays interactively.