NYC
skills/letta-ai/skills/rstan-to-pystan

rstan-to-pystan

SKILL.md

RStan to PyStan Conversion

This skill provides guidance for converting RStan (R interface to Stan) code to PyStan (Python interface to Stan), focusing on the significant API differences between the two libraries.

Key Insight: Stan Model Code is Language-Agnostic

The Stan modeling language itself is identical between RStan and PyStan. The Stan model code (data blocks, parameters, model, generated quantities) can typically be copied directly. The conversion challenge lies in the wrapper code that:

  • Prepares data for the model
  • Calls the sampler with correct parameters
  • Extracts and processes posterior samples

Pre-Conversion Checklist

Before writing any conversion code:

  1. Verify system dependencies for PyStan

    • PyStan 3.x requires a C++ compiler (g++ on Linux, clang on macOS)
    • Install with: apt-get install g++ or equivalent
    • Missing compiler causes cryptic compilation errors at runtime
  2. Identify PyStan version to target

    • PyStan 2.x API mirrors RStan closely
    • PyStan 3.x has a completely different API (recommended for new projects)
    • This guide focuses on PyStan 3.x conversion
  3. Read the complete R script before converting

    • Identify all hyperparameters used
    • Note any data transformations
    • Understand the expected output format

Hyperparameter Mapping (RStan to PyStan 3.x)

Create an explicit mapping table before coding:

RStan Parameter PyStan 3.x Equivalent Notes
iter N/A (see calculation) Total iterations including warmup
warmup num_warmup Number of warmup iterations
chains num_chains Number of MCMC chains
thin N/A PyStan 3 does not support thinning directly
seed random_seed Random seed for reproducibility
control=list(adapt_delta=X) N/A Not directly available in PyStan 3
control=list(max_treedepth=X) N/A Not directly available in PyStan 3

Critical calculation for num_samples:

PyStan num_samples = (RStan iter - RStan warmup) / RStan thin

Example: If RStan uses iter=2000, warmup=1000, thin=2:

  • Effective samples = (2000 - 1000) / 2 = 500
  • Use num_samples=500 in PyStan 3

Sample Extraction: Critical API Difference

This is the most common source of errors in conversion.

RStan sample extraction:

# Returns matrix with shape (n_iterations, n_parameters)
samples <- extract(fit)$parameter_name
# For vector parameters, shape is (n_iterations, param_length)

PyStan 3.x sample extraction:

# Returns array with shape (param_length, n_samples) - TRANSPOSED!
samples = fit['parameter_name']
# Single parameters have shape (n_samples,)

Verification approach:

# Always add shape debugging in first version
for param in ['rho', 'beta', 'sigma']:
    print(f"{param} shape: {fit[param].shape}")

Step-by-Step Conversion Process

Step 1: Analyze the RStan Script

  • Extract the Stan model code (between stan() or in separate .stan file)
  • Document all hyperparameters with their values
  • Identify data preparation steps
  • Note the expected output format and calculations

Step 2: Set Up Python Environment

# Verify compiler availability first
import subprocess
result = subprocess.run(['g++', '--version'], capture_output=True)
if result.returncode != 0:
    raise RuntimeError("g++ not found - install C++ compiler first")

# Then install and import
import stan  # PyStan 3.x

Step 3: Translate the Stan Model

  • Copy Stan model code directly (it's language-agnostic)
  • Store as a Python string with proper escaping
  • Verify all data block variables match your Python data preparation

Step 4: Prepare Data Dictionary

# RStan uses list(), PyStan uses dict
# Ensure all variable names match the Stan data block exactly
data = {
    'N': len(y),
    'K': X.shape[1],
    'y': y.tolist(),  # Convert numpy arrays to lists
    'X': X.tolist()
}

Step 5: Build and Sample

# PyStan 3.x pattern
posterior = stan.build(stan_code, data=data)
fit = posterior.sample(
    num_chains=chains,
    num_samples=num_samples,  # Calculated from RStan params
    num_warmup=warmup,
    random_seed=seed
)

Step 6: Extract Results with Shape Verification

# ALWAYS verify shapes before computing statistics
samples = fit['parameter_name']
print(f"Shape: {samples.shape}")  # Debug first

# For posterior means:
# - If shape is (n_samples,): use samples.mean()
# - If shape is (param_dim, n_samples): use samples.mean(axis=1)

Common Pitfalls and Solutions

Pitfall 1: Assuming RStan-like sample shapes

  • Symptom: Getting 1000 values when expecting 3 (or vice versa)
  • Cause: PyStan 3 returns (param_dim, n_samples) not (n_samples, param_dim)
  • Solution: Always check .shape before computing statistics

Pitfall 2: Missing C++ compiler

  • Symptom: Compilation errors when building Stan model
  • Cause: PyStan compiles Stan to C++ at runtime
  • Solution: Install g++ before attempting to run PyStan code

Pitfall 3: Incorrect num_samples calculation

  • Symptom: Different number of posterior samples than expected
  • Cause: Not accounting for thinning in calculation
  • Solution: Use formula: num_samples = (iter - warmup) / thin

Pitfall 4: Data type mismatches

  • Symptom: Stan compilation or runtime errors about data types
  • Cause: NumPy arrays not converted to Python lists
  • Solution: Use .tolist() on numpy arrays in data dictionary

Pitfall 5: Ignoring sampling warnings

  • Symptom: Warnings about Cholesky decomposition, divergences, etc.
  • Cause: Can indicate model issues or poor sampling
  • Solution: While often transient, log warnings and verify results make sense

Verification Strategy

After conversion, verify the following:

  1. Sample counts match expectations

    expected_samples = (rstan_iter - rstan_warmup) // rstan_thin
    assert fit['param'].shape[-1] == expected_samples
    
  2. Parameter dimensions are correct

    # For a 3-dimensional parameter vector
    assert fit['beta'].shape[0] == 3
    
  3. Posterior statistics are reasonable

    # Compare means, check they're in expected ranges
    posterior_mean = fit['param'].mean(axis=-1)
    print(f"Posterior mean: {posterior_mean}")
    
  4. Output format matches requirements

    • Verify JSON/CSV structure
    • Check key names and data types
    • Ensure numerical precision is adequate

Minimal Test Pattern

Before full conversion, write a minimal test to understand PyStan's behavior:

import stan

# Simple model to verify API understanding
code = """
data { int N; array[N] real y; }
parameters { real mu; real<lower=0> sigma; }
model { y ~ normal(mu, sigma); }
"""

data = {'N': 10, 'y': [1.0]*10}
posterior = stan.build(code, data=data)
fit = posterior.sample(num_chains=1, num_samples=100)

# Verify you understand the output structure
print(f"mu shape: {fit['mu'].shape}")
print(f"sigma shape: {fit['sigma'].shape}")
Weekly Installs
18
Repository
letta-ai/skills
First Seen
Jan 24, 2026
Installed on
claude-code13
codex12
gemini-cli12
opencode12
antigravity11
cursor9