Root Cause Tracing

Overview

Bugs often manifest deep in the call stack (git init in wrong directory, file created in wrong location, database opened with wrong path). Your instinct is to fix where the error appears, but that's treating a symptom.

Core principle: Trace backward through the call chain until you find the original trigger, then fix at the source.

When to Use

Use root-cause-tracing when:

  • Error happens deep in execution (not at entry point)
  • Stack trace shows long call chain
  • Unclear where invalid data originated
  • systematic-debugging Phase 1 leads you here

Relationship with systematic-debugging:

  • root-cause-tracing is a SUB-SKILL of systematic-debugging
  • Use during systematic-debugging Phase 1, Step 5 (Trace Data Flow)
  • Can also be used standalone if you KNOW the bug is a deep-stack issue
  • After tracing to source, return to systematic-debugging Phase 2

When NOT to use:

  • Bug appears at entry point → Use systematic-debugging Phase 1 directly
  • You haven't started systematic-debugging yet → Start there first
  • Root cause is obvious → Just fix it
  • Still gathering evidence → Continue systematic-debugging Phase 1

The Tracing Process

  1. Observe Symptom: Error: git init failed in /Users/jesse/project/packages/core
  2. Find Immediate Cause: await execFileAsync('git', ['init'], { cwd: projectDir })
  3. Ask: What Called This? WorktreeManager.createSessionWorktree(projectDir) → Session.initializeWorkspace() → Session.create() → test at Project.create()
  4. Keep Tracing Up: projectDir = '' (empty!) → resolves to process.cwd() → source code directory!
  5. Find Original Trigger: const context = setupCoreTest() returns { tempDir: '' } → accessed before beforeEach!
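
The trigger in step 5 can be reproduced in miniature. The sketch below is a hypothetical reconstruction, not the project's actual code: `fakeBeforeEach` stands in for the test runner's hook, and the body of `setupCoreTest` and the temp path are assumptions. Only the names `setupCoreTest` and `tempDir` come from the trace above.

```typescript
// Hypothetical stand-in for the test runner's beforeEach: hooks are queued
// now and only executed later, just before each test body runs.
const queuedHooks: Array<() => void> = [];
function fakeBeforeEach(hook: () => void): void {
  queuedHooks.push(hook);
}

// Assumed shape of the helper: tempDir starts empty and is only filled in
// once the queued hook has run.
function setupCoreTest(): { tempDir: string } {
  const context = { tempDir: "" };
  fakeBeforeEach(() => {
    context.tempDir = "/tmp/core-test-example"; // illustrative path
  });
  return context;
}

const context = setupCoreTest();

// BUG: read at module top level, before any hook has run.
const projectDirReadTooEarly = context.tempDir;

// The runner executes the hooks later, right before the test body.
queuedHooks.forEach((hook) => hook());
const projectDirInsideTest = context.tempDir;

console.error("read too early:", JSON.stringify(projectDirReadTooEarly));
console.error("read inside test:", JSON.stringify(projectDirInsideTest));
```

The early read captures the empty string, which is the value that later travels down the call chain to `git init`.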

Adding Stack Traces

When you can't trace manually, add instrumentation before the problematic operation:

console.error('DEBUG git init:', { directory, cwd: process.cwd(), stack: new Error().stack });

Critical: Use console.error() in tests (logger may not show). Run: npm test 2>&1 | grep 'DEBUG'
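
To keep the log in front of the risky call rather than after a failure, the instrumentation can be wrapped in a small helper. This is a minimal sketch: `traceBefore` and `createWorktree` are hypothetical names, and the real `git init` call is replaced by a placeholder return value so the example runs anywhere.

```typescript
// Hypothetical helper: logs caller-supplied context plus cwd and the current
// call stack, then runs the operation. The log is emitted BEFORE the
// operation, so it survives even if the operation throws.
function traceBefore<T>(
  label: string,
  details: Record<string, unknown>,
  operation: () => T,
): T {
  const payload = { ...details, cwd: process.cwd(), stack: new Error().stack };
  console.error(`DEBUG ${label}:`, payload);
  return operation();
}

// Usage sketch: the callback is where the real code would run `git init`.
function createWorktree(directory: string): string {
  return traceBefore("git init", { directory }, () => {
    return `initialized:${directory}`; // placeholder for the real operation
  });
}
```

Because the message starts with `DEBUG`, the `grep 'DEBUG'` filter above still picks it up.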

Analyze: Look for test file names, line numbers, patterns (same test? same parameter?).
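
When many stacks have been captured, the analysis can be partly automated. The following is a rough sketch under two assumptions: frames look like V8 output (`at fn (path:line:col)`) and test files end in `.test.ts`. The function names and the sample paths are made up for illustration, and paths containing spaces are not handled.

```typescript
// Extracts "file:line" for every stack frame that points into a test file.
function testFramesFromStack(stack: string): string[] {
  const frames: string[] = [];
  for (const line of stack.split("\n")) {
    const match = line.match(/([^\s()]+\.test\.ts):(\d+):\d+/);
    if (match) {
      frames.push(`${match[1]}:${match[2]}`);
    }
  }
  return frames;
}

// Counts how often each test location appears across the captured stacks,
// which helps answer "same test? same line?".
function tallyTestFrames(stacks: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const stack of stacks) {
    for (const frame of testFramesFromStack(stack)) {
      counts[frame] = (counts[frame] ?? 0) + 1;
    }
  }
  return counts;
}

// Illustrative stack; the paths are hypothetical.
const sampleStack = [
  "Error",
  "    at createSessionWorktree (/repo/src/worktree-manager.ts:42:11)",
  "    at Object.<anonymous> (/repo/src/project.test.ts:17:25)",
].join("\n");
```

A location that dominates the tally is a good candidate for the test that triggers the bad call.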

Finding Which Test Causes Pollution

If something appears during tests but you don't know which test creates it, use the bisection script @find-polluter.sh:

./find-polluter.sh '.git' 'src/**/*.test.ts'

Runs tests one-by-one, stops at first polluter. See script for usage.
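
The one-by-one search can be sketched as follows. This is not the contents of find-polluter.sh; it is a hypothetical TypeScript rendering of the same idea, with `runTest` and `pollutionExists` injected by the caller. The up-front check for pre-existing pollution is an added assumption, not something the script is stated to do.

```typescript
// Runs each test file in order and returns the first one after which the
// pollution check turns true, or null if no file triggers it.
function findPolluter(
  testFiles: string[],
  runTest: (file: string) => void,
  pollutionExists: () => boolean,
): string | null {
  // Assumption: a dirty starting state would blame the wrong test,
  // so refuse to search until it has been cleaned up.
  if (pollutionExists()) {
    throw new Error("Pollution already present; clean up before searching");
  }
  for (const file of testFiles) {
    runTest(file);
    if (pollutionExists()) {
      return file; // first test after which the pollution appears
    }
  }
  return null;
}
```

In a Node script, `runTest` could shell out to the test runner with `execFileSync` and `pollutionExists` could call `fs.existsSync('.git')`; both are left to the caller so the search logic stays independent of any particular runner.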

Real Example: Empty projectDir

Symptom: .git in packages/core/ (source code)

Trace chain: git init in process.cwd() ← empty cwd ← WorktreeManager ← Session.create() ← test accessed context.tempDir before beforeEach ← setupCoreTest() returns { tempDir: '' }

Root cause: Top-level variable initialization accessing empty value

Fix: Made tempDir a getter that throws if accessed before beforeEach

Defense-in-depth: (1) Project.create() validates (2) WorkspaceManager validates (3) NODE_ENV guard (4) Stack trace logging
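
A getter-based fix of this kind might look like the sketch below. The actual implementation is not shown in this document, so `createCoreTestContext`, `setTempDir`, and the error message are hypothetical. The point is only that an early read fails loudly instead of yielding an empty string.

```typescript
interface CoreTestContext {
  readonly tempDir: string;
}

// Hypothetical factory: returns the context handed to tests plus a setter
// that the beforeEach hook would call once the temp directory exists.
function createCoreTestContext(): {
  context: CoreTestContext;
  setTempDir: (dir: string) => void;
} {
  let tempDir: string | undefined;
  const context: CoreTestContext = {
    get tempDir(): string {
      if (!tempDir) {
        // Fail at the source instead of letting '' flow down the call chain.
        throw new Error("tempDir accessed before beforeEach ran");
      }
      return tempDir;
    },
  };
  return {
    context,
    setTempDir: (dir: string) => {
      tempDir = dir;
    },
  };
}
```

With this shape, a top-level `context.tempDir` read throws at module load, so the error points at the test file rather than at `git init` several layers down.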

Key Principle

Flow: Found immediate cause → Can trace up? (yes → trace backwards) → Is this source? (no → keep tracing | yes → fix at source) → Add validation at each layer → Bug impossible

NEVER fix just where the error appears. Trace back to find the original trigger.
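
The "add validation at each layer" step can be illustrated with two small guards. This is a sketch under stated assumptions: the function names are hypothetical, the exact behavior of the NODE_ENV guard is not specified in this document, and the path check assumes POSIX-style separators.

```typescript
// Layer 1 (entry point): reject empty directories before they travel deeper.
function assertValidProjectDir(projectDir: string): void {
  if (projectDir.trim() === "") {
    throw new Error("projectDir must not be empty");
  }
}

// Layer 2 (environment guard, assumed behavior): while running tests,
// refuse to run git init anywhere outside the temp root.
function assertSafeForGitInit(
  directory: string,
  nodeEnv: string | undefined,
  tempRoot: string,
): void {
  assertValidProjectDir(directory);
  const insideTempRoot =
    directory === tempRoot || directory.startsWith(`${tempRoot}/`);
  if (nodeEnv === "test" && !insideTempRoot) {
    throw new Error(
      `Refusing git init outside ${tempRoot} during tests: ${directory}`,
    );
  }
}
```

A caller would pass `process.env.NODE_ENV` and `os.tmpdir()` for the last two arguments. Each guard is redundant once the source is fixed, which is the point: if the root cause regresses, the bad value is stopped at the first layer it reaches.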

Stack Trace Tips

  • In tests: console.error() not logger (may be suppressed)
  • Before operation: Log before dangerous op, not after fail
  • Include context: Directory, cwd, env vars, timestamps
  • Capture stack: new Error().stack shows complete chain

Real-World Impact

5-level trace → fixed at source (getter validation) → 4 layers defense → 1847 tests, zero pollution


Blocker Criteria

STOP and report if:

| Decision Type | Blocker Condition | Required Action |
|---|---|---|
| Incomplete trace | Cannot trace call chain back to original trigger | STOP and add instrumentation before proceeding |
| Missing source access | Cannot access code that calls problematic function | STOP and report missing context |
| Complex async flow | Call chain involves async/event-driven code that breaks linear trace | STOP and instrument with timestamps before continuing |
| External dependency | Root cause appears to be in external library or service | STOP and report external dependency issue |

Cannot Be Overridden

The following requirements CANNOT be waived:

  • MUST trace backward through call chain to find original trigger
  • MUST NOT fix where error appears without finding true source
  • MUST add instrumentation when manual tracing is insufficient
  • MUST use defense-in-depth: validate at multiple layers after finding root cause
  • MUST verify fix eliminates the symptom completely

Severity Calibration

| Severity | Condition | Required Action |
|---|---|---|
| CRITICAL | Error causes data corruption or test pollution across suite | MUST trace to source immediately, cannot proceed until resolved |
| HIGH | Error affects multiple call sites or components | MUST complete full trace before any fix attempt |
| MEDIUM | Error isolated to single execution path | MUST trace at least 3 levels up before fixing |
| LOW | Error is cosmetic or easily reproducible | Should trace to source, can fix incrementally |

Pressure Resistance

| User Says | Your Response |
|---|---|
| "Just fix it where the error appears" | "MUST NOT fix at symptom location. Tracing to root cause prevents the bug from manifesting elsewhere." |
| "We don't have time for full tracing" | "CANNOT skip tracing. Fixing symptoms creates whack-a-mole debugging that takes longer overall." |
| "The fix works in my test" | "MUST verify fix eliminates root cause, not just masks symptom in one scenario." |
| "Add a try-catch and move on" | "CANNOT suppress errors without tracing. Error indicates invalid state that will cause problems elsewhere." |

Anti-Rationalization Table

| Rationalization | Why It's WRONG | Required Action |
|---|---|---|
| "The error message tells us enough" | Error location ≠ error source. Message describes symptom, not cause. | MUST trace call chain backward |
| "I know this code, the fix is obvious" | Familiarity breeds assumptions. Obvious fixes often mask deeper issues. | MUST verify with actual tracing |
| "Tracing is overkill for this simple bug" | Simple symptoms often have complex causes. Skipping trace leads to regression. | MUST trace regardless of apparent simplicity |
| "Adding logging slows things down" | Temporary instrumentation is cheap. Untraced bugs cause expensive debugging sessions. | MUST add instrumentation when needed |
| "The stack trace shows the problem" | Stack trace shows where, not why. Original invalid data may be layers above. | MUST trace data flow, not just call stack |