# Systematic Debugging

## When to Use

Invoke this methodology automatically when:

- A test fails and the cause isn't immediately obvious
- Unexpected behavior occurs in production or development
- An error message doesn't directly point to the fix
- Multiple potential causes exist

## Core Principles

1. **Hypothesize before acting** - Form explicit hypotheses about the root cause before changing code
2. **Test hypotheses systematically** - Validate or eliminate each hypothesis with evidence
3. **Parallelize investigation** - Use subagents for concurrent read-only exploration
4. **Preserve test integrity** - Never weaken tests to make them pass

## Debugging Scope Ladder

Always prefer the smallest, most reproducible scope that demonstrates the bug. Work up the ladder only when the smaller scope can't reproduce or doesn't apply:

| Priority | Scope | When to Use | Command |
|---|---|---|---|
| 1 | Unit test | Logic errors, algorithm bugs, single-function issues | `cargo test -p freenet -- specific_test` |
| 2 | Mocked unit test | Transport/ring logic needing isolation | Unit test with `MockNetworkBridge` / `MockRing` |
| 3 | Simulation test | Multi-node behavior, state machines, race conditions | `cargo test -p freenet --test simulation_integration -- --test-threads=1` |
| 4 | SimNetwork + FaultConfig | Fault tolerance, message loss, network partitions | `SimNetwork` with configured fault injection |
| 5 | fdev single-process | Quick multi-peer CI validation | `cargo run -p fdev -- test --seed 42 single-process` |
| 6 | freenet-test-network | 20+ peer large-scale behavior | Docker-based `freenet-test-network` |
| 7 | Real network | Issues that only manifest with real UDP/NAT/latency | Manual multi-peer test across machines |

**Why this order matters:**

- Lower scopes are faster, deterministic, and reproducible by anyone
- Higher scopes require more infrastructure and time, and may not be accessible to all contributors
- Gateway logs, aggregate telemetry, and production metrics are not available to every developer — don't assume access to these when designing reproduction steps

## Debugging Workflow

### Phase 0: Claim the Issue

If you're working on a GitHub issue, check if it's already assigned before starting. If someone else is assigned, stop and inform the user — don't duplicate effort. If unassigned, assign it to yourself so others know it's being worked on:

```bash
gh issue view <ISSUE> --repo freenet/<REPO>  # Check assignees
gh issue edit <ISSUE> --repo freenet/<REPO> --add-assignee @me
```

### Phase 1: Reproduce and Isolate

1. **Reproduce the failure** — Confirm the bug exists and is reproducible
2. **Use the scope ladder** — Start at the smallest scope that can demonstrate the bug:
   - Can you write a unit test? Try that first
   - Needs multiple nodes? Use the simulation framework with a deterministic seed
   - Only happens under fault conditions? Use `SimNetwork` with `FaultConfig`
   - Can't reproduce in simulation? Then escalate to real network testing
3. **Record the seed** — When using simulation tests, always record the seed value for reproducibility
4. **Gather initial evidence** — Read error messages, logs, stack traces

**Simulation-first approach for distributed bugs:**

```bash
# Run simulation tests deterministically
cargo test -p freenet --features simulation_tests --test sim_network -- --test-threads=1

# With logging to observe event sequences
RUST_LOG=info cargo test -p freenet --features simulation_tests --test sim_network -- --nocapture --test-threads=1

# Reproduce with a specific seed
cargo run -p fdev -- test --seed 0xDEADBEEF single-process
```

### Phase 2: Form Hypotheses

Before touching any code, explicitly list potential causes:

```text
Hypotheses:
1. [Most likely] The X component isn't handling Y case
2. [Possible] Race condition between A and B
3. [Less likely] Configuration mismatch in Z
```

Rank by likelihood based on evidence. Avoid anchoring on the first idea.

**Freenet-specific hypothesis patterns:**

- **State machine bugs** — Invalid transitions in operations (CONNECT, GET, PUT, UPDATE, SUBSCRIBE)
- **Ring/routing errors** — Incorrect peer selection, distance calculations, topology issues
- **Transport issues** — UDP packet loss handling, encryption/decryption, connection lifecycle
- **Contract execution** — WASM sandbox issues, state verification failures
- **Determinism violations** — Code using `std::time::Instant::now()` instead of `TimeSource`, or `rand::random()` instead of `GlobalRng`
- **Silent failure / fire-and-forget** — A spawned task dies with no error propagation (check: is the `JoinHandle` stored and polled? what happens if the task exits?), a broadcast is sent to zero targets with no warning, or channel overflow silently drops messages. Look for: `tokio::spawn` without `.await`/`.abort()`, `let _ = sender.send(...)`, missing logging on empty target sets
- **Resource exhaustion** — `HashMap`/`Vec`/channel entries inserted but never removed, causing unbounded memory growth or channel backpressure. Check: is there a cleanup path for every insert? Is cleanup triggered on both success AND failure/timeout? Run sustained operations and assert collection sizes stay bounded
- **Incomplete wiring** — A feature only works for some operation types (e.g., router feedback wired for GET but not subscribe/put/update). When debugging "X doesn't work for operation Y," check all enum variants in the dispatch path — commented-out arms, `_ => Irrelevant` catch-alls, and missing match arms are common
- **TTL/timing race conditions** — Two time-dependent operations where the first can expire before the second completes (e.g., a transient TTL expires before the CONNECT handshake, an interest TTL expires before subscription renewal, a broadcast fires before subscriptions complete). Check: what happens if operation A takes longer than timeout B?
- **Regressions from "safe" changes** — A seemingly harmless change (code simplification, removing a feature flag, changing defaults) breaks an invariant that nothing tests. When a recent commit looks innocent, check what implicit behaviors it removed
- **Mock/test divergence** — The bug can't be reproduced in tests because the mock runtime behaves differently from production. Check: does the mock skip side effects (e.g., BSC emission)? Does the test use a different code path than production (e.g., explicit subscribe vs background subscribe)? Does the mock socket behave differently from real UDP?
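The fire-and-forget pattern can be demonstrated in plain std Rust: a panic inside a spawned thread whose `JoinHandle` is dropped never propagates to the caller (the panic message still prints to stderr, but the program carries on as if nothing failed). This is a minimal sketch; the function names are illustrative, not Freenet APIs, and `std::thread` stands in for the async `tokio::spawn` case.

```rust
use std::thread;
use std::time::Duration;

/// Fire-and-forget: the JoinHandle is dropped, so if the task panics,
/// nothing propagates to the caller. The failure is effectively invisible
/// (the async analogue is tokio::spawn without .await or .abort()).
fn spawn_and_forget() {
    let _ = thread::spawn(|| {
        panic!("task died, and nobody is watching");
    });
    // Give the doomed thread time to run; the caller never learns it failed.
    thread::sleep(Duration::from_millis(50));
}

/// Keeping the handle and joining it surfaces the panic as an Err,
/// so the caller can log, retry, or propagate the failure.
fn spawn_and_watch() -> bool {
    let handle = thread::spawn(|| {
        panic!("task died, but this time we notice");
    });
    handle.join().is_err() // true: the panic was observed
}

fn main() {
    spawn_and_forget(); // returns "successfully" despite the panic inside
    assert!(spawn_and_watch()); // the watched version reports the failure
    println!("watched task's panic was detected");
}
```

The same review question applies to channels and broadcasts: if the receiving side is gone or the target set is empty, does anything log it, or does the send result get discarded with `let _ =`?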

See Module-Specific Debugging Guide for detailed bug patterns, data collection strategies, and test approaches per module.

### Phase 3: Investigate Systematically

For each hypothesis:

1. Identify what evidence would confirm or refute it
2. Gather that evidence (logs, code reading, adding debug output)
3. Update the hypothesis ranking based on findings
4. Move to the next hypothesis if the current one is eliminated

**Freenet-specific data gathering:**

| What You Need | How to Get It | Access |
|---|---|---|
| Event sequences | `RUST_LOG=info` + `--nocapture` on simulation tests | Everyone |
| Network message patterns | `sim.get_network_stats()` in simulation tests | Everyone |
| Convergence behavior | `sim.await_convergence(timeout, poll, min_contracts)` | Everyone |
| Virtual time state | `sim.virtual_time().now_nanos()` | Everyone |
| Git history of affected code | `git log --oneline -20 -- path/to/file.rs` | Everyone |
| Fault injection results | `SimNetwork` + `FaultConfig`, then inspect stats | Everyone |
| Gateway logs | Access to running gateway node | Limited — not all contributors |
| Aggregate telemetry | Production monitoring dashboards | Limited — core team only |
| Real network packet captures | Physical access to test machines | Limited — specific environments |

For module-specific data gathering techniques, see Module-Specific Debugging Guide — it covers observation APIs, `#[freenet_test]` event capture, `RUST_LOG` targets, and fault injection per module.

**Parallel investigation with subagents:**

Use general-purpose agents with codebase-investigator instructions for independent, read-only investigations. Spawn multiple in parallel, each with a specific focus.

Spawn investigators in parallel using the Task tool (`subagent_type="general-purpose"`):

```text
1. "You are a codebase-investigator. [Include agents/codebase-investigator.md instructions]
    Search for similar error handling patterns in the codebase related to [bug description]"

2. "You are a codebase-investigator. [Include agents/codebase-investigator.md instructions]
    Check git history for recent changes to [affected module/files]"

3. "You are a codebase-investigator. [Include agents/codebase-investigator.md instructions]
    Read and analyze [test file] and related fixtures for [component]"
```

**Guidelines:**

- Each investigator focuses on one hypothesis or evidence type
- Only parallelize read-only tasks — code changes must be sequential
- Investigators report findings; you synthesize and decide next steps

### Phase 4: Fix and Verify

1. **Fix the root cause** — Not symptoms
2. **Verify with deterministic reproduction** — Re-run the failing test with the same seed
3. **Check for regressions** — `cargo test -p freenet`
4. **Consider edge cases** — Does the fix handle similar scenarios?
5. **Verify determinism** — If you added new code, ensure it uses `TimeSource` and `GlobalRng` (not `std::time` / `rand` directly)
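The determinism rule can be illustrated with an injected clock. The `Clock` trait below is a simplified stand-in for an abstraction like `TimeSource` (the real trait's shape may differ); the point is that core logic never calls `Instant::now()` directly, so a test can advance time deterministically instead of sleeping.

```rust
use std::cell::Cell;
use std::time::{Duration, Instant};

/// Illustrative clock abstraction (a stand-in for something like TimeSource).
trait Clock {
    fn now(&self) -> Instant;
}

/// Production clock: wraps the real time.
struct SystemClock;
impl Clock for SystemClock {
    fn now(&self) -> Instant {
        Instant::now()
    }
}

/// Test clock: time only moves when the test says so, which makes
/// timeout logic fully deterministic and reproducible.
struct ManualClock {
    now: Cell<Instant>,
}
impl ManualClock {
    fn new() -> Self {
        Self { now: Cell::new(Instant::now()) }
    }
    fn advance(&self, by: Duration) {
        self.now.set(self.now.get() + by);
    }
}
impl Clock for ManualClock {
    fn now(&self) -> Instant {
        self.now.get()
    }
}

/// Example core logic: has this operation exceeded its TTL?
fn timed_out(clock: &dyn Clock, started: Instant, ttl: Duration) -> bool {
    clock.now().duration_since(started) >= ttl
}

fn main() {
    let clock = ManualClock::new();
    let started = clock.now();
    let ttl = Duration::from_secs(30);
    assert!(!timed_out(&clock, started, ttl)); // no virtual time has passed
    clock.advance(Duration::from_secs(31)); // deterministic, no real waiting
    assert!(timed_out(&clock, started, ttl));
    println!("timeout logic verified without sleeping");
}
```

The same injection pattern applies to randomness: routing code that takes an injected RNG instead of calling `rand::random()` stays reproducible under a fixed seed.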

### Phase 5: Test Coverage Analysis

Always ask: "Why didn't CI catch this?"

Freenet has multiple test layers:

| Layer | Scope | What It Catches |
|---|---|---|
| Unit tests (~1000) | Individual functions | Logic errors, algorithm bugs |
| Integration tests (~80) | Component interactions | Interface mismatches, data flow bugs |
| Simulation tests | Multi-node deterministic | State machine bugs, race conditions, protocol errors |
| fdev single-process | Quick multi-peer | Basic distributed behavior |
| freenet-test-network | 20+ peers in Docker | Scale-dependent bugs, realistic network behavior |
| Real network tests | Physical machines | NAT traversal, real latency, UDP behavior |

If a bug reached production or manual testing, there's a gap. Investigate:

1. **Which test layer should have caught this?**
   - Logic error → unit test
   - Component interaction bug → integration test
   - Distributed/state machine behavior → simulation test with `#[freenet_test]`
   - Fault tolerance → `SimNetwork` with `FaultConfig`
   - Scale-dependent → `freenet-test-network`
2. **Why didn't the existing tests catch it?**
   - Tests use a different topology/configuration than production
   - Tests mock components that exhibit the bug in real usage
   - Simulation doesn't inject the right fault conditions
   - Test assertions are too weak to detect the failure
   - Determinism violation — the code path bypasses `TimeSource`/`GlobalRng`
3. **Document the gap** — Include in the issue/PR:
   - What test would have caught this
   - Why existing tests didn't
   - Whether a new test should be added to prevent regression
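When the gap calls for a new regression test, the "assert collection sizes stay bounded" idea from Phase 2 is one concrete shape it can take. A minimal sketch in plain Rust; `OpTracker` and its methods are hypothetical stand-ins, not real Freenet types.

```rust
use std::collections::HashMap;

/// Illustrative in-flight operation tracker. The regression risk it guards
/// against: entries inserted when an operation starts but never removed on
/// the failure/timeout path, so the map grows without bound.
struct OpTracker {
    in_flight: HashMap<u64, String>,
}

impl OpTracker {
    fn new() -> Self {
        Self { in_flight: HashMap::new() }
    }
    fn start(&mut self, id: u64, desc: &str) {
        self.in_flight.insert(id, desc.to_string());
    }
    /// Cleanup must run on success AND on failure/timeout.
    fn finish(&mut self, id: u64) {
        self.in_flight.remove(&id);
    }
    fn len(&self) -> usize {
        self.in_flight.len()
    }
}

/// Regression-style check: run sustained operations, some of which "fail",
/// and make sure every start is paired with a finish on both paths.
fn run_sustained_ops(tracker: &mut OpTracker, total: u64) {
    for id in 0..total {
        tracker.start(id, "get");
        let failed = id % 3 == 0; // simulate a mix of failures and successes
        // Whether the op succeeded or failed, the entry must be removed.
        let _ = failed;
        tracker.finish(id);
    }
}

fn main() {
    let mut tracker = OpTracker::new();
    run_sustained_ops(&mut tracker, 10_000);
    // If cleanup were skipped on the failure path, entries would accumulate.
    assert_eq!(tracker.len(), 0);
    println!("collection stayed bounded after sustained operations");
}
```

A test like this would have caught the leak class directly: it exercises volume rather than a single happy-path operation, which is exactly what most unit tests skip.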

## Anti-Patterns to Avoid

**Jumping to conclusions**

- Wrong: See an error, immediately change code that seems related
- Right: Form a hypothesis, gather evidence, then act

**Tunnel vision**

- Wrong: Spend hours on one theory despite contradicting evidence
- Right: Set time bounds, pivot when evidence points elsewhere

**Weakening tests**

- Wrong: Test fails, so reduce assertions or add exceptions to make it pass
- Right: Understand why the test expects what it does, fix the code to meet that expectation
- Exception: The test itself has a bug or tests incorrect behavior (rare, requires clear justification)

**Sequential investigation when parallel is possible**

- Wrong: Read file A, wait, read file B, wait, read file C
- Right: Spawn codebase-investigator agents to read A, B, and C concurrently, then synthesize findings

**Fixing without understanding**

- Wrong: Copy a fix from Stack Overflow that makes the error go away
- Right: Understand why the fix works and whether it addresses the root cause

**Skipping the scope ladder**

- Wrong: Jump straight to real network debugging when the bug could be reproduced in a unit test
- Right: Start small — unit test, then simulation, then real network

**Breaking determinism**

- Wrong: Use `std::time::Instant::now()` or `rand::random()` in core logic
- Right: Use the `TimeSource` trait and `GlobalRng` so simulation tests remain reproducible

**Assuming data access**

- Wrong: "Check the gateway logs to see what happened" (not everyone has gateway access)
- Right: Design reproduction steps using simulation tests and `RUST_LOG` that any contributor can run

## Checklist Before Declaring "Fixed"

- [ ] Root cause identified and understood
- [ ] Fix addresses root cause, not symptoms
- [ ] Original failure no longer reproduces
- [ ] No new test failures introduced
- [ ] Test added if one didn't exist (when practical)
- [ ] No test assertions weakened or disabled
- [ ] Answered "why didn't CI catch this?" and documented the test gap