# Analyze Test Failures

Analyze failing test cases with a balanced, investigative approach.
## Context

When tests fail, there are two primary possibilities:

- **False positive**: the test itself is incorrect
- **True positive**: the test discovered a genuine bug

Assuming tests are wrong by default is a dangerous anti-pattern that defeats the purpose of testing.
## Analysis Process

### 1. Initial Analysis

- Read the failing test carefully, understanding its intent
- Examine the test's assertions and expected behavior
- Review the error message and stack trace
### 2. Investigate the Implementation

- Check the actual implementation being tested
- Trace through the code path that leads to the failure
- Verify that the implementation matches its documented behavior
### 3. Apply Critical Thinking

For each failing test, ask:

- What behavior is the test trying to verify?
- Is this behavior clearly documented or implied by the API design?
- Does the current implementation actually provide this behavior?
- Could this be an edge case the implementation missed?
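A practical way to answer these questions is to probe the implementation with inputs beyond the single failing one: if related inputs also misbehave, the failure is likely systematic rather than a quirk of one test. A minimal sketch (the validator and its pattern are hypothetical, invented for illustration):

```typescript
// Hypothetical implementation under investigation: a naive email check
// whose pattern forbids dots on either side of the "@".
const isValidEmail = (s: string): boolean =>
  /^[A-Za-z0-9]+@[A-Za-z0-9]+$/.test(s);

// Probe with inputs that should clearly pass or clearly fail.
const shouldPass = ["user@example.com", "a.b@mail.example.org"];
const shouldFail = ["not-an-email", "user@", "@example.com"];

const falseNegatives = shouldPass.filter((s) => !isValidEmail(s));
const falsePositives = shouldFail.filter((s) => isValidEmail(s));

// Every valid address is rejected while invalid ones behave as expected:
// evidence of a systematic implementation bug, not a one-off bad test.
console.log(falseNegatives.length); // 2
console.log(falsePositives.length); // 0
```

When several inputs that should pass all fail for the same structural reason, the balance of evidence shifts from "bad test expectation" toward "implementation bug".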
### 4. Make a Determination

Classify the failure as one of:

| Classification | Meaning |
|---|---|
| Test Bug | Test's expectations are incorrect |
| Implementation Bug | Code doesn't behave as it should |
| Ambiguous | Intended behavior is unclear |
### 5. Document Reasoning

Provide a clear explanation including:

- Evidence supporting the conclusion
- The specific mismatch between expectation and reality
- A recommended fix (to the test or the implementation)
## Example Analyses

### Example 1: Ambiguous Behavior

**Scenario**: A test expects `calculateDiscount(100, 0.2)` to return `20`, but it returns `80`.

**Analysis**:

- The test assumes the function returns the discount amount
- The implementation returns the price after the discount
- The function name is ambiguous between the two readings

**Determination**: Ambiguous

**Recommendation**: Check the documentation or clarify the intended behavior.
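The two readings can be made concrete with a pair of hypothetical implementations (the names are invented for illustration; the source does not show the real code):

```typescript
// Reading A (what the test assumed): return the discount amount.
function discountAmount(price: number, rate: number): number {
  return price * rate;
}

// Reading B (what the implementation did): return the price after discount.
function discountedPrice(price: number, rate: number): number {
  return price - price * rate;
}

console.log(discountAmount(100, 0.2));  // 20, the test's expectation
console.log(discountedPrice(100, 0.2)); // 80, the observed behavior
```

Both functions are defensible under a name like `calculateDiscount`, which is why the failure is classified as ambiguous rather than blamed on either side.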
### Example 2: Implementation Bug

**Scenario**: A test expects `validateEmail("user@example.com")` to return `true`, but it returns `false`.

**Analysis**:

- The test provides a valid email format
- The implementation's regex is missing support for dots in the domain
- Other valid emails also fail

**Determination**: Implementation Bug

**Recommendation**: Fix the regex to properly validate email addresses per the RFC standards.
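A sketch of the kind of regex gap described above (both patterns are hypothetical; real-world validation per RFC 5322 is considerably more involved than either):

```typescript
// Hypothetical buggy pattern: no "." allowed in the domain, so every
// dotted domain such as "example.com" is rejected.
const buggyPattern = /^[A-Za-z0-9]+@[A-Za-z0-9]+$/;

// Sketch of a fix: allow any non-space, non-"@" characters and require
// at least one dot in the domain part.
const fixedPattern = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

console.log(buggyPattern.test("user@example.com")); // false: the reported failure
console.log(fixedPattern.test("user@example.com")); // true
console.log(fixedPattern.test("not-an-email"));     // false
```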
### Example 3: Test Bug

**Scenario**: A test expects `divide(10, 0)` to return `0`, but it throws an error.

**Analysis**:

- The test assumes division by zero returns 0
- The implementation throws a `DivisionByZeroError`
- Standard mathematical behavior is to treat division by zero as undefined, i.e. an error

**Determination**: Test Bug

**Recommendation**: Update the test to expect an error, not `0`.
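Here the fix goes to the test, not the implementation. A hypothetical sketch of both sides (`DivisionByZeroError` is modeled as a custom error class for illustration):

```typescript
// Hypothetical implementation: division by zero is an error, not 0.
class DivisionByZeroError extends Error {}

function divide(a: number, b: number): number {
  if (b === 0) throw new DivisionByZeroError("cannot divide by zero");
  return a / b;
}

// Corrected test expectation: assert that the call throws the specific
// error, instead of expecting a sentinel value like 0.
let threw = false;
try {
  divide(10, 0);
} catch (e) {
  threw = e instanceof DivisionByZeroError;
}

console.log(threw);         // true: the updated expectation holds
console.log(divide(10, 2)); // 5: normal division is unaffected
```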
## Output Format

For each failing test, provide:

- **Test**: [test name/description]
- **Failure**: [what failed and how]
- **Investigation**:
  - Test expects: [expected behavior]
  - Implementation does: [actual behavior]
  - Root cause: [why they differ]
- **Determination**: [Test Bug | Implementation Bug | Ambiguous]
- **Recommendation**: [specific fix to either the test or the implementation]
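If the analyses are produced or consumed programmatically, the fields above map naturally onto a small record type (a sketch; the type and field names are invented for illustration):

```typescript
type Determination = "Test Bug" | "Implementation Bug" | "Ambiguous";

interface FailureAnalysis {
  test: string;               // test name/description
  failure: string;            // what failed and how
  testExpects: string;        // expected behavior
  implementationDoes: string; // actual behavior
  rootCause: string;          // why they differ
  determination: Determination;
  recommendation: string;     // specific fix to test or implementation
}

// The division-by-zero case, expressed as a record.
const analysis: FailureAnalysis = {
  test: "divide returns 0 for division by zero",
  failure: "expected 0 but an error was thrown",
  testExpects: "divide(10, 0) === 0",
  implementationDoes: "throws DivisionByZeroError",
  rootCause: "the test assumed a sentinel value for an undefined operation",
  determination: "Test Bug",
  recommendation: "update the test to expect an error, not 0",
};

console.log(analysis.determination); // "Test Bug"
```

The union type makes the three-way classification exhaustive, so a compiler can flag any analysis that omits or misspells its determination.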
## Key Principles

- **NEVER** automatically assume the test is wrong
- **ALWAYS** consider that the test might have found a real bug
- When uncertain, lean toward investigating the implementation
- Tests are often your specification: they define expected behavior
- A failing test is a gift: it is either catching a bug or clarifying requirements
## Related Skills

- `test-failure-mindset`: set an investigative approach for the session
- `comprehensive-test-review`: full test suite review