Sharpe Ratio Non-IID Corrections

Generalized Sharpe ratio inference framework for non-Normal, serially correlated returns. Reference implementation of López de Prado, Lipton & Zoonekynd (2026).

Self-Evolving Skill: This skill improves through use. If instructions are wrong, parameters drifted, or a workaround was needed — fix this file immediately, don't defer. Only update for real, reproducible issues.

Quick Start

# Run all 18 validation tests (paper Exhibit 1 + numerical examples)
uv run --python 3.13 --with numpy --with scipy --with numba \
  python plugins/quant-research/skills/sharpe-ratio-non-iid-corrections/references/sharpe_numba.py

Paper Metadata

Field	Value
Title	How to Use the Sharpe Ratio
Authors	Marcos López de Prado, Alexander Lipton, Vincent Zoonekynd
Series	ADIA Lab Research Paper Series, No. 19
Date	March 7, 2026
SSRN	5520741
Replication	https://github.com/zoonek/2025-sharpe-ratio
Source PDF	lopez-de-prado-sharpe-2026.pdf

References

Topic	Reference File
Full Paper Transcription (82 equations)	how-to-use-the-sharpe-ratio-2026.md
Equation → Implementation Tracker	sharpe-paper-tracker.md
Numba JIT Implementation (850+ lines)	sharpe_numba.py
Source PDF	lopez-de-prado-sharpe-2026.pdf
Paper Figures	media/ (8 figures from pp. 12-45)

Preflight: Experimentation Context Assessment (CRITICAL)

Before computing DSR or SFDR, Claude Code MUST assess the experimentation context to determine the correct parameters. These inputs require subjective judgment that cannot be derived from code alone.

Decision: DSR vs SFDR

Situation	Use	Why
You have return series for all K strategies tested	DSR	Can compute K_eff from eigenvalues of correlation matrix
Strategy is the survivor of a long, unrecoverable experimentation history	SFDR	K is unknowable; SFDR sets a threshold without needing K
Evaluating a single strategy with no selection involved	PSR	No multiple-testing correction needed (K=1)

Default for this project: Use SFDR. Our strategies are survivors of hundreds of regime explorations across WFO configurations, feature sets, model architectures, and metric cutoffs. The full experimentation history is unrecoverable.

Parameters Requiring Judgment

Before calling sfdr_threshold(), assess each parameter by examining the project context:

Parameter	What to assess	How to assess it
P[H₁] (prior: fraction of genuine strategies)	"Out of all ideas ever tried in this project, what fraction were genuinely profitable?"	Check git history, experiment logs, signal-archaeology skill, WFO result archives. Conservative default: 0.05 (1 in 20 ideas is real).
SR₁ (alternative: what a real strategy's Sharpe looks like)	"When a strategy IS real, what Sharpe does it typically achieve?"	Look at validated production strategies, published benchmarks for the asset class. Crypto range bars: 0.3–0.5 monthly.
q (false discovery tolerance)	"What false positive rate can I tolerate?"	Research: 0.05. Production capital allocation: 0.01. Exploratory screening: 0.10.
γ₃, γ₄, ρ (return distribution shape)	Skewness, Pearson kurtosis, lag-1 autocorrelation	Compute directly from the return series under evaluation. These are objective — no judgment needed.
T (sample length)	Number of return observations	Count from data. Objective.

Preflight Checklist

When this skill is invoked, Claude Code should:

Identify the evaluation context: Is this a single new strategy, a WFO fold comparison, or a survivor from extensive search?
Determine DSR vs SFDR: If return series for all candidates exist → DSR with K_eff. Otherwise → SFDR.
Elicit P[H₁]: Search for experimentation history (git log, experiment catalogues, the signal-archaeology skill). Count approximate ideas tried vs ideas that worked. If unknowable, use 0.05.
Estimate SR₁: From validated production strategies or asset-class benchmarks.
Set q: Based on the decision's consequence (research paper vs capital deployment).
Compute objective parameters: γ₃, γ₄, ρ, T directly from the return series.
Report the assessment: State all parameter choices and reasoning before computing, so the user can override.

Example Preflight Output

SFDR Preflight Assessment:
  Context:    Survivor of ~300 WFO configs × 5 feature sets × 3 architectures
  Method:     SFDR (experimentation history unrecoverable)
  P[H₁]:     0.05 (conservative — ~15 genuine signals from ~300 ideas)
  SR₁:       0.4 (typical monthly Sharpe for validated crypto range bar strategies)
  q:         0.05 (research-grade threshold)
  γ₃:        -2.448 (computed from return series)
  γ₄:        10.164 (Pearson kurtosis, computed from return series)
  ρ:         0.20 (lag-1 autocorrelation, computed from return series)
  T:         24 months
  → SFDR threshold: SR_c = 0.760
  → Observed SR: 0.456
  → VERDICT: FAIL (observed SR below SFDR threshold)

The user may override any parameter. If they disagree with P[H₁] or SR₁, recompute with their values.

Key Formulas

All equations use the paper's non-IID variance (Eq 2-3) with Pearson kurtosis convention (γ₄=3 for Gaussian).

Eq	Name	Formula	Function
2-3	SR Variance	V[SR̂] = (1/T)·(a − b·γ₃·SR + c·(γ₄−1)/4·SR²)	`sr_variance()`
9	PSR	Φ((SR̂ − SR₀) / σ[SR₀])	`psr()`
11	MinTRL	(a − b·γ₃·SR₀ + c·(γ₄−1)/4·SR₀²) · (z_{1−α}/(SR̂−SR₀))²	`min_trl()`
13	Critical SR	SR₀ + σ[SR₀]·z_{1−α}	`critical_sr()`
15	Power	1 − Φ((SR_c − SR₁) / σ[SR₁])	`power()`
17	β (Type II)	Φ((z_{1−α}·√(a(ρ)) − SR₁·√T) / √(a − b·γ₃·SR₁ + c·(γ₄−1)/4·SR₁²))	`power()` → `1 - power`
21	pFDR	(1 + (1−β)·P[H₁]/(α·P[H₀]))⁻¹	`pfdr()`
24	oFDR	p·P[H₀] / (p·P[H₀] + (1−Φ[z*(SR₁)])·P[H₁])	`ofdr()`
28	E[max SR]	SR₀ + √V · ((1−γ)·Φ⁻¹[1−1/K] + γ·Φ⁻¹[1−1/(Ke)])	`expected_max_sr()`
29-31	DSR	PSR with SR₀ = E[max{SR̂_k}], σ = √V[max{SR̂_k}]	`dsr()`
32-33	SFDR	Find SR_c such that pFDR(SR_c) = q	`sfdr_threshold()`

Where a = (1+ρ)/(1−ρ), b = (1+ρ+ρ²)/(1−ρ²), c = (1+ρ²)/(1−ρ²) are AR(1) variance coefficients (ar1_variance_coeffs()).

Numerical Example (Paper Exhibit 1)

Hedge fund: T=24 months, γ₃=−2.448, γ₄=10.164 (Pearson), SR=0.036/0.079≈0.456, ρ=0.2.

Quantity	Equation	Value	Notes
σ[SR̂] (non-Gaussian)	Eq 3	0.379	vs 0.214 Gaussian — 77% wider
PSR (SR₀=0)	Eq 9	0.966	Still significant despite wider CI
PSR (SR₀=0.1)	Eq 9+5	0.900	Harder benchmark reduces confidence
MinTRL (SR₀=0)	Eq 11	19.543 months	T=24 > 19.5 → sufficient
MinTRL (SR₀=0.1)	Eq 11	39.369 months	More than doubles for SR₀ closer to SR̂
β (power, SR₁=0.5)	Eq 17	0.411	vs 0.224 IID Normal — 84% higher
pFDR (P[H₁]=0.1)	Eq 21	0.433	43.3% false discovery when true strategies rare
oFDR (SR₁=0.5)	Eq 24	0.361	Even 3.4% p-value → 36% observed FDR
Power (1−β)	Eq 15	0.589	Low for 24-month track record

Implementation Architecture

6-tier dependency hierarchy (Numba JIT, Rust-ready):

Tier 0: norm_cdf, norm_ppf, erfinv, Brent's method
Tier 1: sr_variance (Eqs 3/58)
Tier 2: psr (Eq 9), min_trl (Eq 11), critical_sr (Eq 13)
Tier 3: power (Eq 15), moments_mk (Eqs 62-65), expected_max_sr (Eq 28), var_max_sr
Tier 4: pfdr (Eq 18-21), ofdr (Eqs 22-24), fwer (Eq 25)
Tier 5: dsr (Eqs 29-31), sfdr_threshold (Eqs 32-33)  ← APEX

Related Skills

Skill	Relationship
opendeviation-eval-metrics	Consumes PSR, DSR, MinTRL for range bar evaluation; has quick-ref formulas (ρ=0 case)
adaptive-wfo-epoch	Uses DSR for WFE validation across walk-forward folds
evolutionary-metric-ranking	DSR as one of the metrics in multi-objective ranking

Post-Execution Reflection

After this skill completes, check before closing:

Did the command succeed? — If not, fix the instruction or error table that caused the failure.
Did parameters or output change? — If the underlying tool's interface drifted, update Usage examples and Parameters table to match.
Was a workaround needed? — If you had to improvise (different flags, extra steps), update this SKILL.md so the next invocation doesn't need the same workaround.

Only update if the issue is real and reproducible — not speculative.