Reproduction Trace Instrumenter

Overview

This skill instruments source code to capture detailed execution traces for bug reproduction. It records function calls, variable values, control flow, and program state, then generates replay scripts to deterministically reproduce the bug for diagnosis.

Workflow

1. Identify the Bug Context

Before instrumentation, understand:

  • What is the bug or failure being investigated?
  • Which code paths are likely involved?
  • What inputs trigger the bug?
  • Is the bug deterministic or intermittent?

2. Instrument the Code

Use the appropriate instrumenter for your language:

Python Instrumentation

python scripts/python_instrumenter.py <source_file.py> -o <instrumented_file.py>

Options:

  • --no-functions: Disable function call tracing
  • --no-variables: Disable variable assignment tracing
  • --no-control-flow: Disable control flow tracing
  • --exclude <patterns>: Exclude functions whose names match any of the space-separated patterns (e.g., __init__ test_*)

Example:

# Full instrumentation
python scripts/python_instrumenter.py app.py -o app_instrumented.py

# Minimal instrumentation (functions only)
python scripts/python_instrumenter.py app.py -o app_instrumented.py --no-variables --no-control-flow

# Exclude test functions and double-underscore (dunder) methods
python scripts/python_instrumenter.py app.py -o app_instrumented.py --exclude test_ __

3. Run the Instrumented Code

Execute the instrumented program with the inputs that trigger the bug:

python app_instrumented.py

The execution trace is saved automatically to trace.json when the program exits.

Trace Output:

  • trace.json: Complete execution trace with all recorded events
  • Console output: Summary of trace recording

4. Analyze the Trace

Generate a human-readable summary:

python scripts/replay_generator.py trace.json --summary

This shows:

  • Total number of events
  • Event type distribution
  • Function call sequence
  • Maximum call depth
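
The same statistics can be computed directly from a loaded trace, which is handy when you want them inside another tool. This is a minimal sketch that assumes the structure shown in the Trace Format section; summarize_trace is a hypothetical helper, not part of replay_generator.py.

```python
from collections import Counter


def summarize_trace(trace: dict) -> dict:
    """Compute basic statistics from a trace dict (hypothetical helper)."""
    events = trace.get("traces", [])
    return {
        # Total number of events
        "total_events": len(events),
        # Event type distribution
        "event_types": dict(Counter(e["type"] for e in events)),
        # Function call sequence, in recorded order
        "call_sequence": [
            e["data"]["function"] for e in events if e["type"] == "function_entry"
        ],
        # Maximum call depth (0 for an empty trace)
        "max_depth": max((e.get("depth", 0) for e in events), default=0),
    }
```

To use it, load trace.json with json.load and pass the resulting dict to summarize_trace.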

5. Generate Replay Script

Create a replay script to reproduce the bug:

python scripts/replay_generator.py trace.json -o replay.py

Run the replay script:

python replay.py

The replay script executes the same sequence of operations, allowing you to:

  • Reproduce the bug consistently
  • Add breakpoints at specific steps
  • Modify values to test hypotheses
  • Understand the execution flow
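
To make the idea of replay concrete, here is a minimal sketch of one possible replay strategy: re-invoking each top-level recorded call with its recorded arguments. It assumes the event layout from the Trace Format section. The replay_calls helper and the functions mapping are hypothetical, and the script that replay_generator.py actually emits may work differently.

```python
from typing import Any, Callable, Dict, List


def replay_calls(trace: dict, functions: Dict[str, Callable[..., Any]]) -> List[Any]:
    """Re-invoke each top-level recorded call with its recorded arguments."""
    results = []
    for event in sorted(trace["traces"], key=lambda e: e["seq"]):
        # Only depth-0 entries are replayed; nested calls then happen naturally.
        if event["type"] == "function_entry" and event["depth"] == 0:
            func = functions[event["data"]["function"]]
            results.append(func(**event["data"]["arguments"]))
    return results


def calculate_total(price, tax_rate):
    """Stand-in for the function under investigation."""
    return price + price * tax_rate


recorded = {
    "traces": [
        {
            "seq": 1,
            "type": "function_entry",
            "depth": 0,
            "data": {
                "function": "calculate_total",
                "arguments": {"price": 100, "tax_rate": 0.08},
            },
        }
    ]
}
results = replay_calls(recorded, {"calculate_total": calculate_total})
```

Replaying only the outermost calls keeps the sketch short, but it reproduces a bug only when the recorded arguments fully determine the behavior.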

Configuration

Use the trace configuration template to customize instrumentation:

cp assets/trace_config_template.json trace_config.json
# Edit trace_config.json as needed

Key Configuration Options:

Instrumentation Level:

  • trace_functions: Record function entry/exit
  • trace_variables: Record variable assignments
  • trace_control_flow: Record if/else, loops
  • trace_exceptions: Record exception handling

Filtering:

  • exclude_patterns: Function name patterns to skip
  • exclude_modules: Modules to skip entirely
  • max_string_length: Truncate long strings
  • max_call_depth: Limit trace depth

Performance:

  • buffer_size: Events to buffer before writing
  • async_write: Write traces asynchronously
  • max_trace_size_mb: Maximum trace file size
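
As an illustration of how the filtering options could be applied, the sketch below uses three of the keys listed above. The values are made up, glob-style matching of exclude_patterns is an assumption, and should_trace and truncate are hypothetical helpers; check the template and the instrumenter for the actual semantics.

```python
from fnmatch import fnmatchcase

# Hypothetical values; the real defaults live in assets/trace_config_template.json.
config = {
    "exclude_patterns": ["test_*", "__*"],
    "max_string_length": 20,
    "max_call_depth": 3,
}


def should_trace(function_name: str, depth: int, cfg: dict) -> bool:
    """Return False for excluded names or calls deeper than max_call_depth."""
    if depth > cfg["max_call_depth"]:
        return False
    # Assumption: patterns are shell-style globs matched case-sensitively.
    return not any(fnmatchcase(function_name, p) for p in cfg["exclude_patterns"])


def truncate(value: object, cfg: dict) -> object:
    """Shorten long strings so they do not bloat the trace."""
    limit = cfg["max_string_length"]
    if isinstance(value, str) and len(value) > limit:
        return value[:limit] + "..."
    return value
```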

Instrumentation Levels

Choose the appropriate level based on your needs:

Minimal (Functions Only)

python scripts/python_instrumenter.py app.py -o app_inst.py --no-variables --no-control-flow
  • Overhead: 5-15%
  • Use when: You need to understand call sequence only
  • Trace size: Small

Standard (Functions + Variables)

python scripts/python_instrumenter.py app.py -o app_inst.py --no-control-flow
  • Overhead: 20-50%
  • Use when: You need to track state changes
  • Trace size: Medium

Full (Everything)

python scripts/python_instrumenter.py app.py -o app_inst.py
  • Overhead: 50-200%
  • Use when: You need complete execution details
  • Trace size: Large
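
Actual overhead depends on the program, so it can be worth measuring before you choose a level. The sketch below times a script in a fresh interpreter and expresses the slowdown as a percentage. run_seconds and overhead_percent are hypothetical helpers, and a single run is noisy, so repeat the measurement in practice.

```python
import subprocess
import sys
import time


def run_seconds(script: str) -> float:
    """Wall-clock time to run a script in a fresh interpreter."""
    start = time.perf_counter()
    subprocess.run([sys.executable, script], check=True, capture_output=True)
    return time.perf_counter() - start


def overhead_percent(baseline: float, instrumented: float) -> float:
    """Relative slowdown: a run that goes from 1.0 s to 1.5 s is 50 percent."""
    return (instrumented - baseline) / baseline * 100.0
```

For example, overhead_percent(run_seconds("app.py"), run_seconds("app_inst.py")) compares the original against the instrumented build.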

Common Use Cases

Use Case 1: Intermittent Bug Reproduction

User: "I have a bug that only happens sometimes. Help me capture what's happening."
→ Instrument with full tracing
→ Run multiple times until bug occurs
→ Analyze the trace from the failing run
→ Generate replay script to reproduce consistently

Use Case 2: Understanding Complex Control Flow

User: "I don't understand why this function returns the wrong value."
→ Instrument with functions + variables
→ Run with problematic input
→ Review trace to see variable values at each step
→ Identify where the logic goes wrong

Use Case 3: Debugging Production Issues

User: "Users report a crash but I can't reproduce it locally."
→ Instrument production code (minimal level for performance)
→ Deploy and wait for crash
→ Retrieve trace.json from crashed instance
→ Generate replay script to reproduce locally

Use Case 4: Regression Testing

User: "I fixed a bug. How do I ensure it doesn't come back?"
→ Capture trace of the bug before fix
→ Generate replay script
→ Use replay script as regression test
→ Run after each code change
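
One way to wire a replay script into a test suite is to run it in a subprocess and check its exit code. This sketch assumes replay.py exits with a non-zero status (for example through an uncaught exception) while the bug is present. That behavior is not guaranteed by the source, so verify it for your replay script first. replay_passes is a hypothetical helper.

```python
import subprocess
import sys


def replay_passes(script_path: str, timeout: float = 60.0) -> bool:
    """Run a replay script in a fresh interpreter; True if it exits with code 0."""
    completed = subprocess.run(
        [sys.executable, script_path],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return completed.returncode == 0


# In a pytest-style test module this could be used as:
#
#     def test_bug_does_not_regress():
#         assert replay_passes("replay.py")
```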

Trace Format

Traces are stored in JSON format with the following structure:

{
  "traces": [
    {
      "seq": 1,
      "timestamp": "2024-01-15T10:30:45.123",
      "type": "function_entry",
      "depth": 0,
      "data": {
        "function": "calculate_total",
        "arguments": {"price": 100, "tax_rate": 0.08}
      }
    },
    {
      "seq": 2,
      "timestamp": "2024-01-15T10:30:45.125",
      "type": "variable_assignment",
      "depth": 1,
      "data": {
        "variable": "tax",
        "value": 8.0,
        "type": "float"
      }
    }
  ],
  "metadata": {
    "total_events": 2,
    "max_depth": 1
  }
}
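
For analysis code it can help to load these events into typed objects. The following is a minimal sketch based only on the fields shown above; TraceEvent and parse_trace are hypothetical names rather than part of the skill's scripts.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class TraceEvent:
    seq: int
    timestamp: datetime
    type: str
    depth: int
    data: Dict[str, Any]


def parse_trace(text: str) -> List[TraceEvent]:
    """Parse trace JSON text into events ordered by sequence number."""
    raw = json.loads(text)
    events = [
        TraceEvent(
            seq=e["seq"],
            timestamp=datetime.fromisoformat(e["timestamp"]),
            type=e["type"],
            depth=e["depth"],
            data=e["data"],
        )
        for e in raw["traces"]
    ]
    return sorted(events, key=lambda ev: ev.seq)
```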

Best Practices

  1. Start Minimal: Begin with function-level tracing, add detail as needed

  2. Focus on Bug Area: Use --exclude to skip irrelevant code paths

  3. Test Instrumentation: Verify instrumented code behaves the same as original

  4. Manage Trace Size: Use filtering to keep traces manageable

  5. Validate Replay: Ensure replay script reproduces the bug consistently

  6. Clean Up: Remove instrumentation before committing code

Limitations

  1. Observer Effect: Instrumentation may change timing and behavior

    • Minimize by using lower instrumentation levels
    • Be aware of race conditions in concurrent code
  2. Performance Overhead: Instrumented code runs slower

    • Use sampling or selective instrumentation for performance-critical code
  3. Trace Size: Full traces can be very large

    • Apply filtering and size limits
    • Focus on specific code regions
  4. Non-Determinism: Some bugs involve external factors

    • Record external inputs (network, file system, time)
    • Use deterministic mode in configuration
  5. Language Support: Currently supports Python only

    • See references/instrumentation_techniques.md for other languages
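
Recording external inputs, as suggested under Non-Determinism, can be sketched as a thin wrapper that logs the result of each nondeterministic call on the first run and serves the logged values on replay. InputRecorder is a hypothetical class written for illustration; it is not the skill's deterministic mode.

```python
import time
from typing import Any, Callable, List, Optional


class InputRecorder:
    """Record results of nondeterministic calls, then replay them in order."""

    def __init__(self, recorded: Optional[List[Any]] = None):
        self.replaying = recorded is not None
        self.log: List[Any] = list(recorded) if recorded is not None else []
        self._cursor = 0

    def call(self, func: Callable[[], Any]) -> Any:
        if self.replaying:
            # Serve the recorded value; raises IndexError if the replay
            # makes more calls than the recorded run did.
            value = self.log[self._cursor]
            self._cursor += 1
            return value
        value = func()  # real call; remember what it returned
        self.log.append(value)
        return value


# Record run: the real clock is consulted and its values are logged.
recorder = InputRecorder()
first = recorder.call(time.time)
second = recorder.call(time.time)

# Replay run: the same values come back without touching the clock.
replayer = InputRecorder(recorded=recorder.log)
replayed = [replayer.call(time.time), replayer.call(time.time)]
```

The same pattern applies to network responses or file reads, as long as the recorded values can be serialized alongside the trace.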

Advanced Topics

Custom Instrumentation

Modify scripts/python_instrumenter.py to add custom tracing:

  • Trace specific function arguments
  • Record custom metrics
  • Add conditional breakpoints
  • Integrate with logging frameworks

Multi-Process Tracing

For programs with multiple processes:

  • Instrument each process separately
  • Use process ID in trace filenames
  • Merge traces for analysis
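
The steps above can be sketched as follows. The trace.<pid>.json naming scheme and both helpers are assumptions made for illustration. The merge also assumes that every process writes timestamps in the same ISO 8601 format and reads the same clock.

```python
import json
import os
from pathlib import Path
from typing import List


def trace_path_for_process(directory: Path) -> Path:
    """Return a per-process trace filename such as trace.12345.json."""
    return directory / f"trace.{os.getpid()}.json"


def merge_traces(paths: List[Path]) -> List[dict]:
    """Merge per-process traces into one list ordered by timestamp."""
    merged = []
    for path in paths:
        pid = path.name.split(".")[1]  # relies on the trace.<pid>.json naming
        for event in json.loads(path.read_text())["traces"]:
            merged.append({**event, "pid": pid})
    # ISO 8601 timestamps in one consistent format sort correctly as strings;
    # pid and seq break ties between events with equal timestamps.
    return sorted(merged, key=lambda e: (e["timestamp"], e["pid"], e["seq"]))
```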

Distributed System Tracing

For distributed systems:

  • Add correlation IDs to trace events
  • Synchronize timestamps across nodes
  • Use distributed tracing tools (Jaeger, Zipkin)
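
Attaching a correlation ID to each event can be sketched with the standard contextvars module, so the ID follows a request through threads and async tasks. start_request and make_event are hypothetical helpers. Passing the ID between services, and integration with tools such as Jaeger or Zipkin, is outside this sketch.

```python
import contextvars
import uuid
from typing import Optional

# Holds the ID of the request currently being handled (None outside a request).
_correlation_id = contextvars.ContextVar("correlation_id", default=None)


def start_request(incoming_id: Optional[str] = None) -> str:
    """Reuse the caller's ID if one was sent, otherwise create a new one."""
    cid = incoming_id or uuid.uuid4().hex
    _correlation_id.set(cid)
    return cid


def make_event(event_type: str, data: dict) -> dict:
    """Build a trace event tagged with the current correlation ID."""
    return {
        "type": event_type,
        "data": data,
        "correlation_id": _correlation_id.get(),
    }
```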

Resources

scripts/python_instrumenter.py

AST-based Python code instrumenter that:

  • Parses Python source code
  • Inserts tracing calls at key points
  • Generates instrumented code with embedded trace runtime
  • Supports configurable instrumentation levels
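
To illustrate the general technique, here is a minimal sketch of AST-based instrumentation using the standard ast module. It is not the code in scripts/python_instrumenter.py. EntryTracer and _trace_entry are hypothetical names, and only function entry is traced.

```python
import ast

SOURCE = '''
def calculate_total(price, tax_rate):
    tax = price * tax_rate
    return price + tax
'''


class EntryTracer(ast.NodeTransformer):
    """Insert a _trace_entry(name, locals) call at the top of each function."""

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        self.generic_visit(node)  # instrument nested functions first
        call = ast.parse(f"_trace_entry({node.name!r}, dict(locals()))").body[0]
        node.body.insert(0, call)
        return node


events = []


def _trace_entry(name, arguments):
    """Hypothetical trace runtime: append one event per function entry."""
    events.append(
        {"type": "function_entry", "function": name, "arguments": arguments}
    )


# Parse, transform, compile, and run the instrumented code in one namespace.
tree = EntryTracer().visit(ast.parse(SOURCE))
ast.fix_missing_locations(tree)
namespace = {"_trace_entry": _trace_entry}
exec(compile(tree, "<instrumented>", "exec"), namespace)
result = namespace["calculate_total"](100, 0.08)
```

At function entry, locals() contains only the arguments, which is why it can stand in for the argument record. A fuller instrumenter would also handle async functions and insert the call after a docstring so that __doc__ is preserved.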

scripts/replay_generator.py

Trace replay script generator that:

  • Reads execution traces from JSON
  • Generates executable Python replay scripts
  • Provides trace summaries and statistics
  • Enables deterministic bug reproduction

references/instrumentation_techniques.md

Comprehensive guide covering:

  • Instrumentation approaches (source, bytecode, dynamic)
  • What to trace and how to filter
  • Trace reduction strategies
  • Deterministic replay techniques
  • Language-specific considerations
  • Performance optimization
  • Best practices and common pitfalls

Read this reference when you need a deeper understanding of instrumentation theory, want to implement instrumenters for other languages, or need to optimize trace performance.

assets/trace_config_template.json

Configuration template for customizing:

  • Instrumentation levels
  • Filtering rules
  • Performance settings
  • Replay options

Copy and modify this template to create custom trace configurations for specific use cases.
