tools-debugging-root-cause
Root Cause Tracing
Overview
Bugs often surface deep in a stack trace—far away from the code that actually caused them. This skill teaches you to walk the chain backward until you reach the true trigger, then reinforce defenses so the issue cannot recur.
Core principle: Never patch just the symptom. Follow the evidence upstream and fix the source.
When to Use
- Error occurs deep inside execution (nested call stack, worker, async task).
- Stack trace is long or unhelpful; unclear where invalid data originated.
- Behavior differs across tests/runs and you need to identify the polluting test.
- You suspect earlier code (setup, beforeEach, fixtures) seeded bad state.
Decision flow:
Bug surfaces deep? → yes → Can you trace back? → yes → Trace to original trigger → Add defense-in-depth → Done
↘ no → Temporary symptom fix (last resort)
Tracing Process
- Observe the symptom
  - Capture the exact error message, location, and context (git init failed in ...).
- Identify the immediate cause
  - Locate the line/function performing the failing action (execFileAsync('git', ...)).
- Ask who called it
  - Walk up the call graph (stack trace, search references). Document each hop.
- Inspect parameters/state
  - Determine what inputs were passed (e.g., projectDir = '').
- Continue upstream
  - Repeat until you find where the bad value originated (e.g., fixture accessed before setup ran).
- Fix at the source
  - Correct the earliest point, then add guards down the stack to prevent reintroduction.
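The running example in the steps above (an empty projectDir reaching git init because a fixture was read before setup ran) could be fixed at its origin roughly as follows. This is a minimal sketch: createFixture, setup, and tempDir are hypothetical names chosen for illustration, not part of any particular test framework.

```typescript
// Hypothetical test fixture: the directory is only valid after setup() has run.
function createFixture() {
  let current = ''; // '' is the "not set up yet" state that caused the bug

  return {
    setup(dir: string): void {
      current = dir;
    },
    // Fix at the source: fail loudly on the first premature read instead of
    // letting '' travel down the stack until git init trips over it.
    get tempDir(): string {
      if (current === '') {
        throw new Error('tempDir read before setup() ran');
      }
      return current;
    },
  };
}
```

With this in place the failure points at the line that read the fixture too early, which is the original trigger, rather than at the git call several frames below it.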
Instrumentation & Stack Traces
When static tracing stalls, instrument the suspicious operation before it runs:
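import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

async function gitInit(directory: string) {
  // Capture the call chain before the risky call so the log survives a failure.
  const stack = new Error().stack;
  console.error('DEBUG git init', {
    directory,
    cwd: process.cwd(),
    nodeEnv: process.env.NODE_ENV,
    stack,
  });
  await execFileAsync('git', ['init'], { cwd: directory });
}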
- Use console.error in tests (loggers may be muted).
- Pipe stderr when running tests: npm test 2>&1 | grep 'DEBUG git init'.
- Analyze stack entries to find the initiating test or module.
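Scanning a long stack by eye is error-prone, so the last step can be partly automated. The sketch below assumes V8-style frames (lines beginning with "at ") and test files named *.test.* or *.spec.*; findTestFrames and its default pattern are hypothetical and should be adapted to your suite.

```typescript
// Return the stack frames that mention a test file, in the order they appear
// in the stack (innermost first). The default pattern is an assumption.
function findTestFrames(
  stack: string | undefined,
  pattern: RegExp = /\.(test|spec)\.[cm]?[jt]sx?/,
): string[] {
  if (!stack) return [];
  return stack
    .split('\n')
    .map((line) => line.trim())
    // Keep only frame lines ("at ...") that reference a test file.
    .filter((line) => line.startsWith('at ') && pattern.test(line));
}
```

Passing the stack captured in the instrumented function to this helper narrows the log output to the frames most likely to name the initiating test.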
Finding Polluting Tests
If you know an artifact is created but not which test caused it, bisect:
./find-polluter.sh '.git' 'src/**/*.test.ts'
The script runs tests individually and stops at the first polluter. Adapt the paths to your suite.
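The script's contents are not shown here. As a rough illustration of the same idea in TypeScript, the sketch below runs each test file in isolation and checks for the artifact after each run. findPolluter and both callbacks are hypothetical: the caller supplies how a single file is run and how the artifact is detected.

```typescript
// Runs each test file in isolation and returns the first file after which
// the artifact exists, or null if none of them creates it.
function findPolluter(
  testFiles: string[],
  runTest: (file: string) => void,
  artifactExists: () => boolean,
): string | null {
  if (artifactExists()) {
    throw new Error('Artifact already present; clean up before searching');
  }
  for (const file of testFiles) {
    try {
      runTest(file);
    } catch {
      // A failing test can still pollute, so fall through to the check.
    }
    if (artifactExists()) return file;
  }
  return null;
}
```

In practice runTest would spawn your test runner for one file, and artifactExists would check the filesystem for the unwanted path (for example a stray .git directory).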
Defense-in-Depth
Once the root cause is fixed, add guards at multiple layers:
- Validate inputs at entry points (e.g., Project.create() rejects empty dirs).
- Add sanity checks in intermediary managers (workspace/session).
- Add environment guards (e.g., forbid git init outside temp directories in tests).
- Keep lightweight instrumentation (stack logging) for early warning.
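Two of these layers can be sketched as small assertion helpers. The names assertNonEmptyDir and assertSafeForGitInit are hypothetical, and the environment guard assumes that tests set NODE_ENV to 'test' and create their working directories under os.tmpdir(). It compares paths lexically and does not resolve symlinks.

```typescript
import * as os from 'node:os';
import * as path from 'node:path';

// Entry-point validation: reject empty or whitespace-only directories.
function assertNonEmptyDir(dir: string): string {
  if (dir.trim() === '') {
    throw new Error('Directory must be a non-empty path');
  }
  return dir;
}

// Environment guard: in tests, only allow paths strictly inside the OS temp dir.
function assertSafeForGitInit(dir: string, env = process.env.NODE_ENV): void {
  assertNonEmptyDir(dir);
  if (env !== 'test') return;

  const tmpRoot = path.resolve(os.tmpdir());
  const resolved = path.resolve(dir);
  const rel = path.relative(tmpRoot, resolved);
  // '' means the temp root itself; '..' prefixes or absolute results mean outside.
  const outside =
    rel === '' ||
    rel === '..' ||
    rel.startsWith('..' + path.sep) ||
    path.isAbsolute(rel);
  if (outside) {
    throw new Error(`Refusing git init outside ${tmpRoot} in tests: ${resolved}`);
  }
}
```

Calling assertSafeForGitInit(directory) immediately before the git call means that even if an upstream check is later removed, an empty or misplaced directory is still caught before anything is written to disk.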
Verification Checklist
- Root cause documented (which function/test injected bad state).
- Symptom disappears after fix.
- Additional validation/guards added at key layers.
- Re-run entire test suite (or repro steps) to confirm no regressions.
- If instrumentation was added temporarily, either keep it gated (for future tracing) or remove once confident.
Stack Trace Tips
- Log before the risky action; failures might skip later logs.
- Include directory paths, environment vars, and timestamps for each log entry.
- new Error().stack captures the current call chain without throwing (V8/Node.js keeps only the first Error.stackTraceLimit frames, 10 by default).
- When working across async boundaries, capture the stack inside the async function to preserve context.
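For deep call chains the default frame limit in V8 can cut the trace off before it reaches the initiating test. A minimal sketch of a capture helper that temporarily raises the limit is shown below; captureStack is a hypothetical name, and Error.stackTraceLimit is a V8-specific property, so this assumes Node.js or another V8-based runtime.

```typescript
// Capture the current call chain with a temporarily raised frame limit.
// Error.stackTraceLimit is V8-specific; other engines may ignore it.
function captureStack(label: string, limit = 50): string {
  const previous = Error.stackTraceLimit;
  Error.stackTraceLimit = limit;
  try {
    // Frames are collected when the Error is constructed, so the raised
    // limit only needs to be in effect for this one statement.
    return new Error(label).stack ?? '';
  } finally {
    Error.stackTraceLimit = previous; // always restore the global setting
  }
}
```

Calling captureStack as the first statement of an async function, before any await, records the synchronous callers that led into it.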
Outcome
By tracing backward and reinforcing defenses, you turn “mystery” bugs into deterministic issues. The result: the bug becomes impossible or at least loudly detected long before it corrupts deeper systems.