adaptive-rejection-sampler
Adaptive Rejection Sampler
Overview
This skill provides guidance for implementing Adaptive Rejection Sampling (ARS) algorithms. ARS is a method for generating samples from log-concave probability distributions by constructing piecewise linear upper and lower envelopes of the log-density function. The skill focuses on procedural approaches, performance optimization, and verification strategies rather than providing implementation code.
When to Use This Skill
Use this skill when:
- Implementing adaptive rejection sampling from scratch
- Working with log-concave distribution samplers
- Building statistical sampling algorithms that require envelope construction
- Debugging or optimizing existing ARS implementations
- The task involves R, Python, or other statistical computing environments
Implementation Approach
Phase 1: Algorithm Design Before Coding
Before writing any code:
-
Understand the mathematical foundations
- Review the ARS algorithm requirements for log-concave functions
- Understand how piecewise linear envelopes are constructed from tangent lines
- Identify the squeeze function optimization
-
Design with performance in mind
- Plan iteration limits and safeguards from the start
- Consider worst-case computational complexity of the sampling loop
- Design for timeout constraints that may be stricter than development testing
-
Plan the module structure
- Log-concavity verification module
- Envelope construction and update module
- Sampling loop with rejection logic
- Initialization point selection
Phase 2: Critical Implementation Considerations
Log-Concavity Checking
When implementing log-concavity verification:
- Use appropriate tolerance values for numerical comparison
- Consider that some valid distributions have constant second derivatives (e.g., exponential distribution)
- Use
<= toleranceinstead of strict<comparisons when checking for non-positive second derivatives - Test with edge cases like exponential, normal, and gamma distributions
Initialization Points
Proper initialization significantly affects sampling quality and performance:
- Handle shifted distributions by adjusting initialization points relative to the mode
- Consider the support bounds when selecting initial points
- Use multiple initial points to ensure good envelope coverage
- Test with distributions that have modes far from the origin
Iteration Limits and Safeguards
Critical for preventing infinite loops and timeouts:
- Implement maximum iteration limits in all sampling loops
- Add progress indicators or logging for long-running computations
- Include timeout protection mechanisms
- Design early exit conditions for convergence
Phase 3: Performance-First Development
Timeout Considerations
- External test frameworks may use shorter timeouts than development testing
- If development tests use 180-second timeouts, production may use 60 seconds
- Profile performance early to catch issues before they become blocking
- Test with varying timeout constraints to ensure robustness
Computational Efficiency
- Analyze the computational complexity of the sampling loop
- Optimize envelope updates to minimize recalculation
- Consider caching frequently computed values
- Profile with realistic sample sizes
Verification Strategies
Unit Testing Approach
-
Test each module independently
- Log-concavity checker with known log-concave and non-log-concave functions
- Envelope construction with simple distributions
- Sampling loop with controlled random seeds
-
Test known distributions
- Standard normal distribution
- Exponential distribution (constant second derivative edge case)
- Truncated normal distributions
- Gamma distributions with various parameters
-
Test edge cases explicitly
- Non-function inputs
- Negative sample counts
- Invalid bounds (lower > upper)
- Very small and very large sample sizes
Performance Testing
- Match testing timeouts to expected production constraints
- Test worst-case behavior, not just average case
- Profile with different random seeds to catch stochastic failures
- Measure time per sample to identify performance degradation
Integration Testing
- Test the complete pipeline from input to samples
- Verify statistical properties of generated samples (mean, variance, distribution shape)
- Use statistical tests (KS test, chi-square) to verify sample quality
- Test with distributions that have known analytical moments
Common Pitfalls and Mistakes
Critical Mistakes to Avoid
-
Missing iteration limits
- Always include maximum iteration safeguards in rejection sampling loops
- An unbounded rejection loop will cause timeouts on difficult distributions
-
Inadequate performance testing
- Development tests passing does not guarantee production success
- Test with the same timeout constraints as the target environment
-
Tolerance issues in log-concavity checking
- Strict inequality checks (
< 0) may incorrectly reject valid distributions - Use tolerant comparisons (
<= tolerance) for numerical stability
- Strict inequality checks (
-
Poor initialization for shifted distributions
- Hard-coded initialization points fail for distributions with non-zero modes
- Always compute initialization relative to the distribution's characteristics
-
Incomplete file verification
- After writing large code blocks, verify the complete content was written
- Truncated files can cause subtle bugs
Debugging Approach
When tests fail or timeout:
-
Systematic debugging over trial-and-error
- Understand root causes before implementing fixes
- Use logging to identify where time is being spent
-
Isolate the problematic component
- Test log-concavity checking separately
- Test envelope construction separately
- Test sampling loop with mock envelopes
-
Check for infinite loops
- Add iteration counters to all loops
- Log when iteration limits are approached
- Verify exit conditions are reachable
Quality Checklist
Before considering the implementation complete:
- All sampling loops have maximum iteration limits
- Log-concavity checking uses appropriate tolerances
- Initialization points adapt to distribution characteristics
- Performance tested with target timeout constraints
- Edge cases for invalid inputs are handled
- Statistical properties of samples verified
- Code verified to be completely written (no truncation)
- Tested with multiple random seeds