Adaptive Rejection Sampler

Overview

This skill provides guidance for implementing Adaptive Rejection Sampling (ARS) algorithms. ARS is a method for generating samples from log-concave probability distributions by constructing piecewise linear upper and lower envelopes of the log-density function. The skill focuses on procedural approaches, performance optimization, and verification strategies rather than providing implementation code.

When to Use This Skill

Use this skill when:

Implementing adaptive rejection sampling from scratch
Working with log-concave distribution samplers
Building statistical sampling algorithms that require envelope construction
Debugging or optimizing existing ARS implementations
The task involves R, Python, or other statistical computing environments

Implementation Approach

Phase 1: Algorithm Design Before Coding

Before writing any code:

Understand the mathematical foundations
- Review the ARS algorithm requirements for log-concave functions
- Understand how piecewise linear envelopes are constructed from tangent lines
- Identify the squeeze function optimization
Design with performance in mind
- Plan iteration limits and safeguards from the start
- Consider worst-case computational complexity of the sampling loop
- Design for timeout constraints that may be stricter than development testing
Plan the module structure
- Log-concavity verification module
- Envelope construction and update module
- Sampling loop with rejection logic
- Initialization point selection

Phase 2: Critical Implementation Considerations

Log-Concavity Checking

When implementing log-concavity verification:

Use appropriate tolerance values for numerical comparison
Consider that some valid distributions have constant second derivatives (e.g., exponential distribution)
Use <= tolerance instead of strict < comparisons when checking for non-positive second derivatives
Test with edge cases like exponential, normal, and gamma distributions

Initialization Points

Proper initialization significantly affects sampling quality and performance:

Handle shifted distributions by adjusting initialization points relative to the mode
Consider the support bounds when selecting initial points
Use multiple initial points to ensure good envelope coverage
Test with distributions that have modes far from the origin

Iteration Limits and Safeguards

Critical for preventing infinite loops and timeouts:

Implement maximum iteration limits in all sampling loops
Add progress indicators or logging for long-running computations
Include timeout protection mechanisms
Design early exit conditions for convergence

Phase 3: Performance-First Development

Timeout Considerations

External test frameworks may use shorter timeouts than development testing
If development tests use 180-second timeouts, production may use 60 seconds
Profile performance early to catch issues before they become blocking
Test with varying timeout constraints to ensure robustness

Computational Efficiency

Analyze the computational complexity of the sampling loop
Optimize envelope updates to minimize recalculation
Consider caching frequently computed values
Profile with realistic sample sizes

Verification Strategies

Unit Testing Approach

Test each module independently
- Log-concavity checker with known log-concave and non-log-concave functions
- Envelope construction with simple distributions
- Sampling loop with controlled random seeds
Test known distributions
- Standard normal distribution
- Exponential distribution (constant second derivative edge case)
- Truncated normal distributions
- Gamma distributions with various parameters
Test edge cases explicitly
- Non-function inputs
- Negative sample counts
- Invalid bounds (lower > upper)
- Very small and very large sample sizes

Performance Testing

Match testing timeouts to expected production constraints
Test worst-case behavior, not just average case
Profile with different random seeds to catch stochastic failures
Measure time per sample to identify performance degradation

Integration Testing

Test the complete pipeline from input to samples
Verify statistical properties of generated samples (mean, variance, distribution shape)
Use statistical tests (KS test, chi-square) to verify sample quality
Test with distributions that have known analytical moments

Common Pitfalls and Mistakes

Critical Mistakes to Avoid

Missing iteration limits
- Always include maximum iteration safeguards in rejection sampling loops
- An unbounded rejection loop will cause timeouts on difficult distributions
Inadequate performance testing
- Development tests passing does not guarantee production success
- Test with the same timeout constraints as the target environment
Tolerance issues in log-concavity checking
- Strict inequality checks (< 0) may incorrectly reject valid distributions
- Use tolerant comparisons (<= tolerance) for numerical stability
Poor initialization for shifted distributions
- Hard-coded initialization points fail for distributions with non-zero modes
- Always compute initialization relative to the distribution's characteristics
Incomplete file verification
- After writing large code blocks, verify the complete content was written
- Truncated files can cause subtle bugs

Debugging Approach

When tests fail or timeout:

Systematic debugging over trial-and-error
- Understand root causes before implementing fixes
- Use logging to identify where time is being spent
Isolate the problematic component
- Test log-concavity checking separately
- Test envelope construction separately
- Test sampling loop with mock envelopes
Check for infinite loops
- Add iteration counters to all loops
- Log when iteration limits are approached
- Verify exit conditions are reachable

Quality Checklist

Before considering the implementation complete:

All sampling loops have maximum iteration limits
Log-concavity checking uses appropriate tolerances
Initialization points adapt to distribution characteristics
Performance tested with target timeout constraints
Edge cases for invalid inputs are handled
Statistical properties of samples verified
Code verified to be completely written (no truncation)
Tested with multiple random seeds

adaptive-rejection-sampler