Root Cause Tracing

Overview

Bugs often surface deep in a stack trace—far away from the code that actually caused them. This skill teaches you to walk the chain backward until you reach the true trigger, then reinforce defenses so the issue cannot recur.

Core principle: Never patch just the symptom. Follow the evidence upstream and fix the source.

When to Use

Error occurs deep inside execution (nested call stack, worker, async task).
Stack trace is long or unhelpful; unclear where invalid data originated.
Behavior differs across tests/runs and you need to identify the polluting test.
You suspect earlier code (setup, beforeEach, fixtures) seeded bad state.

Decision flow:

Bug surfaces deep? → yes → Can you trace back?
    yes → Trace to original trigger → Add defense-in-depth → Done
    no  → Temporary symptom fix (last resort)

Tracing Process

1. Observe the symptom
   - Capture the exact error message, location, and context (git init failed in ...).
2. Identify immediate cause
   - Locate the line/function performing the failing action (execFileAsync('git', ...)).
3. Ask who called it
   - Walk up the call graph (stack trace, search references). Document each hop.
4. Inspect parameters/state
   - Determine what inputs were passed (e.g., projectDir = '').
5. Continue upstream
   - Repeat until you find where the bad value originated (e.g., fixture accessed before setup ran).
6. Fix at the source
   - Correct the earliest point, then add guards down the stack to prevent reintroduction.
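
The last two steps can be sketched as follows, assuming the bad value was an empty projectDir read from a test fixture before setup ran. TestProjectFixture, setup, and the 'project-' prefix are hypothetical names chosen for illustration, not part of any real API.

```typescript
import { mkdtempSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical fixture: holds the temp directory a test suite works in.
class TestProjectFixture {
  private dir: string | undefined;

  // Called from the suite's setup hook (e.g. beforeEach).
  setup(): void {
    this.dir = mkdtempSync(join(tmpdir(), 'project-'));
  }

  // Throws instead of returning '' when read before setup() has run,
  // so the failure points at the real trigger rather than at git.
  get projectDir(): string {
    if (!this.dir) {
      throw new Error('projectDir accessed before setup() ran');
    }
    return this.dir;
  }
}
```

With a guard like this, the error surfaces at the line that read the fixture too early, which is the source, instead of several calls later inside git init.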

Instrumentation & Stack Traces

When static tracing stalls, instrument the suspicious operation before it runs:

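import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

async function gitInit(directory: string) {
  // Capture the call chain before the risky operation runs.
  const stack = new Error().stack;
  console.error('DEBUG git init', {
    directory,
    cwd: process.cwd(),
    nodeEnv: process.env.NODE_ENV,
    stack,
  });

  await execFileAsync('git', ['init'], { cwd: directory });
}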

Use console.error in tests (loggers may be muted).
Pipe stderr when running tests: npm test 2>&1 | grep 'DEBUG git init'.
Analyze stack entries to find the initiating test or module.
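
Picking the initiating test out of a logged stack can be automated with a small filter. This is a minimal sketch: findTestFrames is a hypothetical helper, and it assumes test files are named like *.test.ts or *.spec.js, so adjust the pattern to your suite.

```typescript
// Return the stack frames that point at test files, in stack order.
// Assumes test files match *.test.* or *.spec.* with a JS/TS extension.
function findTestFrames(stack: string | undefined): string[] {
  if (!stack) return [];
  return stack
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.startsWith('at '))
    .filter((line) => /\.(test|spec)\.[cm]?[jt]sx?/.test(line));
}
```

Passing the stack captured in gitInit to this helper leaves only the frames inside test files, which is usually enough to name the test that started the chain.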

Finding Polluting Tests

If you know an artifact is created but not which test caused it, isolate the culprit by running test files one at a time:

./find-polluter.sh '.git' 'src/**/*.test.ts'

The script runs each test file individually and stops at the first one that leaves the artifact behind. Adapt the paths to your suite.
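
The same idea can be sketched in TypeScript. This is not the contents of find-polluter.sh, only an illustration of the scan it describes; findPolluter and runTestFile are hypothetical names, and the default runner assumes that `npm test -- <file>` runs a single test file in your project.

```typescript
import { execFileSync } from 'node:child_process';
import { existsSync } from 'node:fs';

// Runs one test file; assumes `npm test -- <file>` works in your project.
function runTestFile(file: string): void {
  try {
    execFileSync('npm', ['test', '--', file], { stdio: 'ignore' });
  } catch {
    // A failing test is fine here; only leftover artifacts matter.
  }
}

// Returns the first test file after which the artifact exists, or null.
function findPolluter(
  testFiles: string[],
  artifactPath: string,
  run: (file: string) => void = runTestFile,
  exists: (path: string) => boolean = existsSync,
): string | null {
  if (exists(artifactPath)) {
    throw new Error(`${artifactPath} already exists; remove it before scanning`);
  }
  for (const file of testFiles) {
    run(file);
    if (exists(artifactPath)) return file;
  }
  return null;
}
```

The runner and the existence check are parameters so the scan can be tried against fakes before pointing it at a real suite.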

Defense-in-Depth

Once the root cause is fixed, add guards at multiple layers:

Validate inputs at entry points (e.g., Project.create() rejects empty dirs).
Add sanity checks in intermediary managers (workspace/session).
Add environment guards (e.g., forbid git init outside temp directories in tests).
Keep lightweight instrumentation (stack logging) for early warning.
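
A minimal sketch of two of these layers, input validation and an environment guard, is shown here. assertSafeGitInitDir is a hypothetical helper, and it assumes tests create their working directories under os.tmpdir() and run with NODE_ENV set to 'test'.

```typescript
import { tmpdir } from 'node:os';
import { resolve, sep } from 'node:path';

// Hypothetical guard, meant to be called right before `git init` runs.
function assertSafeGitInitDir(
  directory: string,
  env: string | undefined = process.env.NODE_ENV,
): string {
  // Layer 1: reject empty input instead of letting it fall back to cwd.
  if (directory.trim() === '') {
    throw new Error('git init refused: directory is empty');
  }
  // Layer 2: in tests, only allow directories under the OS temp root.
  const resolved = resolve(directory);
  const tempRoot = resolve(tmpdir());
  if (env === 'test' && !resolved.startsWith(tempRoot + sep)) {
    throw new Error(`git init refused in tests outside ${tempRoot}: ${resolved}`);
  }
  return resolved;
}
```

Each check is cheap on its own; the point is that a bad value which slips past one layer is still stopped by the next.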

Verification Checklist

Root cause documented (which function/test injected bad state).
Symptom disappears after fix.
Additional validation/guards added at key layers.
Re-run entire test suite (or repro steps) to confirm no regressions.
If instrumentation was added temporarily, either keep it gated (for future tracing) or remove once confident.

Stack Trace Tips

Log before the risky action; failures might skip later logs.
Include directory paths, environment vars, and timestamps for each log entry.
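new Error().stack captures the current call chain without throwing. V8 keeps only 10 frames by default, so raise Error.stackTraceLimit when the trigger sits deeper.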
When working across async boundaries, capture stack inside the async function to preserve context.
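
The tips above can be combined into one small helper. This is a sketch under stated assumptions: traceBefore and the TRACE_DEBUG environment variable are hypothetical names, and the frame limit of 50 is an arbitrary choice.

```typescript
// Hypothetical helper: builds and logs one entry before a risky action.
interface TraceEntry {
  label: string;
  timestamp: string;
  cwd: string;
  nodeEnv: string | undefined;
  context: Record<string, unknown>;
  stack: string | undefined;
}

function traceBefore(
  label: string,
  context: Record<string, unknown> = {},
): TraceEntry | null {
  // Gated by an assumed env var so it can stay in the codebase quietly.
  if (process.env.TRACE_DEBUG !== '1') return null;

  // Temporarily raise the frame limit (V8 defaults to 10 frames).
  const previousLimit = Error.stackTraceLimit;
  Error.stackTraceLimit = 50;
  const stack = new Error(label).stack;
  Error.stackTraceLimit = previousLimit;

  const entry: TraceEntry = {
    label,
    timestamp: new Date().toISOString(),
    cwd: process.cwd(),
    nodeEnv: process.env.NODE_ENV,
    context,
    stack,
  };
  console.error('DEBUG trace', entry);
  return entry;
}
```

Calling traceBefore('git init', { directory }) as the first statement of the async function records the stack from inside that function, before the operation has a chance to fail.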

Outcome

By tracing backward and reinforcing defenses, you turn “mystery” bugs into deterministic issues. The bug becomes impossible, or is at least detected loudly long before it corrupts deeper systems.
