hard-fix
Hard Fix
Comprehensive investigation workflow for bugs that resist normal debugging.
Usage
/hard-fix login keeps failing after auth changes
/hard-fix race condition in checkout - tried 3 fixes already
/hard-fix # Uses recent conversation context
Do NOT shortcut this workflow:
- "I think I already know the fix" -- If you knew, you wouldn't need this skill
- "Let me just try one more thing first" -- You've already tried. Follow the systematic process
- "I only need one of these investigation methods" -- Parallel investigation is the point, run ALL agents
Circuit Breaker Rule: If 3 sequential fix attempts have failed for the same issue:
- STOP attempting more fixes
- Document what was tried and why each failed
- This signals a systemic/architectural issue, not a localized bug
- Recommend architectural review rather than continuing to patch
Gotchas
- The Phase 0 doc search redirects errors with 2>/dev/null on docs/log/ -- if the directory doesn't exist, the search silently returns nothing and gives false confidence that there are no prior known issues.
- Phase 7 logging requires user confirmation that the fix works. If the session ends before confirmation, no log is written and the institutional-knowledge loop breaks.
Workflow
Phase 0: Pre-Check Internal Documentation
Before full investigation, check internal docs for known issues and gotchas:
Check past issues:
grep -ri "<keywords>" docs/log/ 2>/dev/null | head -10
ls -la docs/log/ 2>/dev/null | grep -i "<related_terms>"
Check project documentation:
grep -ri "<keywords>" docs/ *.md 2>/dev/null | head -10
Look for documented gotchas, known issues, or patterns related to the problem area.
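The checks above can be wrapped in a small helper that fails loudly when docs/log/ is missing, instead of silently finding nothing (a sketch addressing the 2>/dev/null gotcha noted earlier; the keyword argument is caller-supplied):

```shell
# Sketch: Phase 0 past-issue search that warns instead of silently
# returning nothing when docs/log/ does not exist.
search_past_issues() {
  local keywords="$1"
  if [ ! -d docs/log ]; then
    echo "WARNING: docs/log/ not found - no prior-issue history to search" >&2
    return 1
  fi
  grep -ri "$keywords" docs/log/ | head -10
}
```

Run as `search_past_issues "<keywords>"` from the repo root; the non-zero exit distinguishes "no history directory" from "directory exists but no matches".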
If a matching past issue is found:
- Read the full log file
- Present the previous solution to the user
- Ask: "We encountered this before. Should I apply the previous solution, or run a fresh investigation?"
If relevant documentation is found:
- Read the relevant sections
- Check if documented patterns/gotchas apply to this issue
If no match or user wants fresh investigation, continue.
Phase 1: Gather Context
Ask clarifying questions if needed:
- What behavior are you seeing vs. expecting?
- What fixes have already been tried?
- When did this start happening? (recent change, always broken, etc.)
- Are there error messages or logs?
Keep questions minimal - only ask what's essential.
Phase 2: Parallel Investigation
Launch ALL of these simultaneously using the Task tool:
| Agent | Skill/Tool | Focus |
|---|---|---|
| Research | research-online | External solutions, known issues, library bugs |
| Debug | debug-log | Add logging to trace the actual execution path |
| History | review-history | Git blame, recent changes, past issue logs |
| Library Source | Read library code | Undocumented behavior, actual implementation |
| Opinion 1 | second-opinion | Fresh perspective on the problem |
For detailed agent prompt templates, see references/templates.md.
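As a rough sketch of where the History agent starts (the actual prompts live in references/templates.md; the file argument here stands in for whatever code is implicated):

```shell
# Sketch: first git checks the History agent runs against a suspect file.
history_snapshot() {
  local file="$1"
  git log --oneline -10 -- "$file"       # recent commits touching the file
  git log --diff-filter=M --oneline -5   # recent modifications repo-wide
}
```

For example, `history_snapshot src/auth/refresh.ts` (path hypothetical) surfaces candidate commits to feed into git blame and the past-issue log search.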
Phase 3: Synthesize Findings
Wait for all agents. Combine into a root cause theory with evidence.
The synthesis must trace to mechanism, not stop at symptoms — good synthesis names the root cause with corroborating evidence from multiple agents (debug timing, git history, library source, research). See references/templates.md for BAD/GOOD synthesis examples.
Phase 4: Validate Theory
Run second-opinion with your synthesis and proposed fix. Check for blind spots.
Phase 5: Present to User
For the presentation template, see references/templates.md.
Ask: "Should I proceed with the recommended fix?"
Phase 6: Implement and Verify
- Implement the recommended fix
- Keep all debug logging in place — do NOT remove debug logs added during investigation
- Ask user to test/verify
- Wait for user confirmation that the fix worked
- Only after user confirms the fix works, remove the debug logging added in Phase 2
Do NOT remove debug logs until user confirms the fix is working. If the fix fails, the logs are essential for the next investigation round. Do NOT proceed to Phase 7 until user confirms the fix is working.
Phase 7: Log for Future Reference (after user confirms fix)
Only after user confirms fix works, write to docs/log/YYYY-MM-DD-{Issue}.md.
For the log template, see references/templates.md.
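A minimal helper matching that filename pattern (the log body should follow the template in references/templates.md; the summary argument here is a stand-in):

```shell
# Sketch: create the Phase 7 log file as docs/log/YYYY-MM-DD-{Issue}.md.
log_fix() {
  local issue_slug="$1" summary="$2"
  mkdir -p docs/log
  local file="docs/log/$(date +%F)-${issue_slug}.md"
  printf '# %s\n\n%s\n' "$issue_slug" "$summary" > "$file"
  echo "$file"   # print the path so the caller can confirm where it landed
}
```

Because Phase 0 greps this directory, a consistent slug and a one-line root-cause summary make future matches much more likely.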
Examples
Token refresh bug after multiple failed fix attempts:
/hard-fix auth token refresh fails silently after 3 fix attempts
Launches parallel agents: research finds a known axios issue with token queuing, debug-log traces the refresh timing, history reveals the bug started after PR #234 moved refresh to background, and library source confirms axios doesn't queue requests during refresh. Synthesizes a root cause (race condition) with a concrete fix using axios-auth-refresh.
Race condition traced to library update via git history:
/hard-fix race condition in order processing since last deploy
History agent pinpoints a dependency bump that changed default concurrency behavior. Debug logging captures the interleaved execution order, and library source inspection confirms the breaking change in the new version. Presents the version diff and a targeted fix to restore the previous behavior.
Troubleshooting
Parallel investigation agents do not converge on root cause
Solution: Run second-opinion with a consolidated summary of all agent findings and the conflicting theories. If agents still disagree, prioritize the evidence from the debug-log agent (actual runtime behavior) over static analysis, and test the most likely theory first.
Root cause is in a third-party dependency
Solution: Pin the dependency to the last known working version as an immediate fix, then file an issue upstream with reproduction steps. If a patch is needed sooner, fork the dependency or apply a local patch using patch-package (JS) or the equivalent for your ecosystem.
Notes
- This is a heavyweight process - use for genuinely stuck problems
- The parallel investigation is key - each source provides different insights
- Only log confirmed fixes - don't log until user verifies it works
- Past issue logs are gold - check them first