fix-ocaml-gc
Fixing OCaml Garbage Collector Bugs
This skill provides guidance for diagnosing and fixing bugs in the OCaml garbage collector runtime, with emphasis on memory management issues in sweeping, allocation, and free-list handling code.
When to Use This Skill
- Debugging segfaults or memory corruption in OCaml's C runtime
- Fixing bugs in the GC's sweeping or allocation logic
- Working with size-classed memory pools and free block management
- Investigating pointer arithmetic issues in memory managers
- Analyzing run-length encoded free lists
Environment Setup
Establish Context First
Before making any changes:
- Check repository type: Verify if the project is a git repository before running git commands
- Understand the build system: Read build documentation (e.g.,
HACKING.adoc,INSTALL,Makefile) to understand compilation steps - Verify file paths: Always use absolute paths when reading files to avoid path resolution errors
- Identify OCaml version: Different OCaml versions have different GC implementations (especially OCaml 5.x with multicore support)
Build Configuration
When building the OCaml compiler:
- Use appropriate timeouts: OCaml bootstrap compilation is lengthy; use timeouts of 10+ minutes for full builds
- Consider background builds: For long compilations, run in background and monitor progress
- Incremental builds: After initial bootstrap, use
makewithoutworldtarget for faster iteration
# Initial configuration
./configure
# Full build (may take 10+ minutes)
make world
# Incremental rebuild after changes
make
Debugging Approach
Locating GC Code
Key files in OCaml's runtime for GC-related issues:
runtime/shared_heap.c- Shared heap management (OCaml 5.x)runtime/major_gc.c- Major GC implementationruntime/minor_gc.c- Minor GC implementationruntime/memory.c- Memory allocation primitivesruntime/gc_ctrl.c- GC control and statistics
Understanding Memory Layout
When analyzing GC bugs, understand these concepts:
- Block headers: OCaml blocks have headers containing size (
wosize) and tag information - Size classes: Memory pools organize blocks by size class for efficient allocation
- Free lists: Free blocks may be linked or use run-length encoding
- Header repurposing: Free block headers may repurpose fields (e.g.,
wosizefor run-length counts)
Common Bug Patterns
Pointer Arithmetic Mismatches
A frequent bug pattern occurs when code uses header-derived sizes inappropriately:
// INCORRECT: Using header size for pool blocks
p += Whsize_hd(hd); // May read repurposed field
// CORRECT: Using known block size for size-classed pools
p += wh; // Use the fixed size class width
Root cause: In size-classed pools, all blocks have a fixed size wh determined by the size class. However, free block headers may repurpose the wosize field for other purposes (e.g., run-length encoding of contiguous free blocks). Using Whsize_hd(hd) reads this repurposed value instead of the actual block size.
Symptoms of Pointer Arithmetic Bugs
- Segfaults during sweeping or compaction
- Memory corruption that appears intermittent
- Crashes only occurring with certain heap sizes or allocation patterns
Systematic Code Analysis
When investigating a bug:
- Trace the iteration: Follow pointer advancement through loops
- Identify size sources: Determine where block sizes come from (header vs. pool metadata)
- Check free block handling: Special attention to how free blocks differ from allocated blocks
- Verify invariants: Ensure pointer stays within valid memory regions
Verification Strategies
Testing the Fix
- Compilation test: Ensure the runtime compiles without errors
- Basic testsuite: Run the basic test suite to catch regressions
# Run basic tests (use quotes for DIR variable)
make -C testsuite DIR='tests/basic' all
- Full testsuite: For comprehensive verification, run the complete test suite
Shell Command Pitfalls
When running tests via Makefiles:
- Quote variable assignments: Use
DIR='tests/basic'instead ofDIR=tests/basicto avoid shell interpretation issues - Watch for escaping: Makefile variables may need different quoting than direct shell commands
Search for Similar Bugs
After fixing a bug, search for similar patterns:
# Search for similar pointer arithmetic patterns
grep -n "Whsize_hd" runtime/*.c
grep -n "+= wh" runtime/*.c
Common Pitfalls
- Timeout too short: OCaml compilation needs extended timeouts (10+ minutes for full build)
- Relative paths: Always use absolute paths when reading files
- Git assumptions: Check if directory is a git repository before using git commands
- Incomplete verification: After fixing one instance, search for similar patterns elsewhere
- Shell quoting: Makefile variable assignments require careful quoting
- Header semantics: Remember that header fields may have different meanings for free vs. allocated blocks
Minimal Fix Principle
When fixing GC bugs:
- Understand the root cause: Ensure full understanding before changing code
- Make minimal changes: Change only what's necessary to fix the bug
- Preserve existing behavior: Avoid refactoring or "improving" surrounding code
- Document the fix: Ensure the change is self-explanatory or add a comment if needed
Verification Checklist
Before considering a fix complete:
- Code compiles without errors or warnings
- Basic testsuite passes
- Searched for similar patterns in the codebase
- Verified the fix addresses the root cause, not just symptoms
- Considered edge cases (empty pools, boundary conditions, different size classes)