Skill: Benchmarking & Performance
When to use this skill
- After adding or modifying a strategy
- To validate that a strategy is profitable
- To compare different configurations
- Before going from paper trading to live
Available scripts
| Script | Usage |
|---|---|
| `scripts/quick_benchmark.sh SYMBOL [DAYS]` | Quick benchmark |
| `scripts/validate_strategy.sh STRATEGY` | Multi-period validation |
Key metrics to monitor
Profitability metrics
| Metric | Description | Acceptable threshold |
|---|---|---|
| Total Return | Total return over period | > 0% |
| Win Rate | % of winning trades | > 50% (trend following) or > 40% (mean reversion) |
| Profit Factor | Gross profit / gross loss | > 1.5 |
| Average Trade | Average P&L per trade | > 0 |
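As a rough illustration of how the three trade-level metrics in this table relate to each other, here is a minimal sketch that derives them from a list of per-trade P&L values. The `Profitability` struct and `profitability` function are hypothetical names for this example, not the project's API, and treating a run with no losing trades as an infinite profit factor is an assumption.

```rust
/// Trade-level profitability metrics (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub struct Profitability {
    /// Fraction of trades with P&L > 0, in [0, 1].
    pub win_rate: f64,
    /// Gross profit divided by gross loss.
    pub profit_factor: f64,
    /// Mean P&L per trade.
    pub average_trade: f64,
}

/// Computes the metrics from per-trade P&L values.
/// Returns `None` for an empty slice, where none of the ratios is defined.
pub fn profitability(trade_pnls: &[f64]) -> Option<Profitability> {
    if trade_pnls.is_empty() {
        return None;
    }
    let n = trade_pnls.len() as f64;
    let wins = trade_pnls.iter().filter(|&&p| p > 0.0).count() as f64;
    let gross_profit: f64 = trade_pnls.iter().filter(|&&p| p > 0.0).sum();
    let gross_loss: f64 = trade_pnls.iter().filter(|&&p| p < 0.0).map(|&p| -p).sum();

    // Assumption: with no losing trades the ratio is reported as infinite.
    let profit_factor = if gross_loss > 0.0 {
        gross_profit / gross_loss
    } else {
        f64::INFINITY
    };

    Some(Profitability {
        win_rate: wins / n,
        profit_factor,
        average_trade: trade_pnls.iter().sum::<f64>() / n,
    })
}
```

For example, the trades `[100.0, -50.0, 200.0, -50.0]` give a win rate of 0.5, a profit factor of 3.0 (300 gained against 100 lost), and an average trade of 50.0.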
Risk metrics
| Metric | Description | Acceptable threshold |
|---|---|---|
| Sharpe Ratio | Risk-adjusted return | > 1.0 (good), > 2.0 (very good) |
| Sortino Ratio | Like Sharpe, but penalizes only downside volatility | > 1.5 |
| Max Drawdown | Largest peak-to-trough decline in equity | < 20% |
| Time in Market | % of time with position | Depends on strategy |
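The risk metrics above can be sketched as follows. This is not the code in `src/domain/performance/metrics.rs`; it is a minimal illustration under stated assumptions: a risk-free rate of 0, the sample standard deviation for Sharpe, a downside deviation measured against a target return of 0 for Sortino, and a caller-supplied `periods_per_year` for annualization (252 is a common choice for daily returns). All function names are hypothetical.

```rust
/// Annualized Sharpe ratio from per-period returns.
/// Assumes a risk-free rate of 0 and the sample standard deviation (n - 1).
pub fn sharpe_ratio(returns: &[f64], periods_per_year: f64) -> Option<f64> {
    let n = returns.len();
    if n < 2 {
        return None; // standard deviation is undefined
    }
    let mean = returns.iter().sum::<f64>() / n as f64;
    let variance =
        returns.iter().map(|&r| (r - mean).powi(2)).sum::<f64>() / (n as f64 - 1.0);
    let std_dev = variance.sqrt();
    if std_dev == 0.0 {
        return None;
    }
    Some(mean / std_dev * periods_per_year.sqrt())
}

/// Annualized Sortino ratio: like Sharpe, but only negative returns
/// contribute to the deviation (target return assumed to be 0).
pub fn sortino_ratio(returns: &[f64], periods_per_year: f64) -> Option<f64> {
    if returns.is_empty() {
        return None;
    }
    let n = returns.len() as f64;
    let mean = returns.iter().sum::<f64>() / n;
    let downside_dev =
        (returns.iter().map(|&r| r.min(0.0).powi(2)).sum::<f64>() / n).sqrt();
    if downside_dev == 0.0 {
        return None; // no losing periods, ratio is undefined
    }
    Some(mean / downside_dev * periods_per_year.sqrt())
}

/// Maximum drawdown of an equity curve, as a fraction of the running peak
/// (0.25 means a 25% decline from the highest value seen so far).
pub fn max_drawdown(equity: &[f64]) -> f64 {
    let mut peak = f64::MIN;
    let mut worst = 0.0_f64;
    for &value in equity {
        peak = peak.max(value);
        if peak > 0.0 {
            worst = worst.max((peak - value) / peak);
        }
    }
    worst
}
```

On the equity curve `[100, 120, 90, 110, 130, 117]` the drawdown from 120 to 90 is 25% and the one from 130 to 117 is 10%, so `max_drawdown` reports 0.25.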
Interpretation
Sharpe Ratio:
< 0.5 → Bad, don't use
0.5-1 → Mediocre, needs improvement
1-2 → Good
2-3 → Very good
> 3 → Excellent (or suspicious, check overfitting)
Max Drawdown:
< 10% → Conservative
10-20% → Moderate
20-30% → Aggressive
> 30% → Dangerous
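The bands above can be encoded as a small lookup, which is handy when a report should print a label next to each number. The sketch below is illustrative only: the enum and function names are hypothetical, and the handling of values that fall exactly on a boundary (for example a Sharpe Ratio of exactly 1.0 counted as "Good") is an assumption, since the ranges above do not say which side is inclusive.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum SharpeBand {
    Bad,       // < 0.5
    Mediocre,  // 0.5-1
    Good,      // 1-2
    VeryGood,  // 2-3
    Excellent, // > 3 (or suspicious, check overfitting)
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum DrawdownBand {
    Conservative, // < 10%
    Moderate,     // 10-20%
    Aggressive,   // 20-30%
    Dangerous,    // > 30%
}

/// Maps a Sharpe Ratio to its band. Lower bounds are treated as inclusive,
/// except that exactly 3.0 still counts as VeryGood (assumption).
pub fn sharpe_band(sharpe: f64) -> SharpeBand {
    if sharpe < 0.5 {
        SharpeBand::Bad
    } else if sharpe < 1.0 {
        SharpeBand::Mediocre
    } else if sharpe < 2.0 {
        SharpeBand::Good
    } else if sharpe <= 3.0 {
        SharpeBand::VeryGood
    } else {
        SharpeBand::Excellent
    }
}

/// Maps a max drawdown, given as a fraction (0.15 = 15%), to its band.
pub fn drawdown_band(max_drawdown: f64) -> DrawdownBand {
    if max_drawdown < 0.10 {
        DrawdownBand::Conservative
    } else if max_drawdown < 0.20 {
        DrawdownBand::Moderate
    } else if max_drawdown <= 0.30 {
        DrawdownBand::Aggressive
    } else {
        DrawdownBand::Dangerous
    }
}
```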
Benchmark commands
Simple benchmark
# Backtest on one symbol
cargo run --bin benchmark -- --symbol AAPL --days 365
# Backtest on multiple symbols
cargo run --bin benchmark -- --symbols "AAPL,GOOGL,MSFT" --days 365
Advanced benchmark
# Parallel mode (multi-core)
cargo run --bin benchmark -- --parallel --symbols "AAPL,GOOGL,MSFT"
# With sequential comparison
cargo run --bin benchmark -- --compare-sequential
# Parameter matrix
cargo run --bin benchmark_matrix
Additional scripts
# Stock benchmark
./scripts/benchmark_stocks.sh
# Market regime benchmark
./scripts/run_regime_benchmarks.sh
# Automatic benchmark
./scripts/auto_benchmark.sh
Strategy validation workflow
Step 1: Initial backtest
cargo run --bin benchmark -- --strategy <STRATEGY> --days 365
Verify:
- Sharpe Ratio > 1.0
- Max Drawdown < 20%
- Win Rate consistent with strategy type
- Profit Factor > 1.5
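The four checks above can be turned into an automated gate. The following is a minimal sketch, assuming the backtest results have already been parsed into a struct; `BacktestReport` and `failed_checks` are hypothetical names, and the real output format of the `benchmark` binary is not shown here. Because the acceptable win rate depends on the strategy type, it is passed in as a parameter.

```rust
/// Headline numbers from one backtest (hypothetical type, for illustration).
pub struct BacktestReport {
    pub sharpe: f64,
    /// Fraction of the peak, e.g. 0.12 for 12%.
    pub max_drawdown: f64,
    /// Fraction of winning trades, in [0, 1].
    pub win_rate: f64,
    pub profit_factor: f64,
}

/// Returns the Step 1 checks that the report fails; an empty vector means
/// the strategy passes the initial backtest.
pub fn failed_checks(report: &BacktestReport, min_win_rate: f64) -> Vec<&'static str> {
    let mut failures = Vec::new();
    // Each condition is written as `!(value passes)` so that a NaN metric
    // is reported as a failure instead of slipping through.
    if !(report.sharpe > 1.0) {
        failures.push("Sharpe Ratio > 1.0");
    }
    if !(report.max_drawdown < 0.20) {
        failures.push("Max Drawdown < 20%");
    }
    if !(report.win_rate > min_win_rate) {
        failures.push("Win Rate consistent with strategy type");
    }
    if !(report.profit_factor > 1.5) {
        failures.push("Profit Factor > 1.5");
    }
    failures
}
```

Returning the list of failed checks, instead of a single boolean, makes it clear which threshold to look at first when a strategy is rejected.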
Step 2: Test on different periods
# Bull period
cargo run --bin benchmark -- --start 2021-01-01 --end 2021-12-31
# Bear period
cargo run --bin benchmark -- --start 2022-01-01 --end 2022-12-31
# Volatile period
cargo run --bin benchmark -- --start 2020-02-01 --end 2020-04-30
The strategy should be profitable, or at least keep its losses small, in all three market conditions (bull, bear, and volatile).
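One way to make the "all conditions" requirement checkable is to collect the total return of each period and flag any period whose loss exceeds a tolerance. This is a sketch under assumptions: `PeriodResult`, `failing_periods`, and the `max_loss` tolerance are hypothetical, and the tolerance value itself is a judgment call that this document does not fix.

```rust
/// Result of one backtest period (hypothetical type, for illustration).
pub struct PeriodResult<'a> {
    pub label: &'a str,
    /// Total return over the period as a fraction, e.g. -0.04 for -4%.
    pub total_return: f64,
}

/// Returns the labels of the periods that lost more than `max_loss`
/// (a positive fraction, e.g. 0.05 to tolerate losses of up to 5%).
pub fn failing_periods<'a>(results: &[PeriodResult<'a>], max_loss: f64) -> Vec<&'a str> {
    results
        .iter()
        // Negated comparison so that a NaN return counts as a failure.
        .filter(|r| !(r.total_return >= -max_loss))
        .map(|r| r.label)
        .collect()
}
```

With made-up returns of +18% (bull), -4% (bear), and -12% (volatile) and a tolerance of 5%, only the volatile period is flagged.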
Step 3: Multi-symbol test
cargo run --bin benchmark -- --symbols "AAPL,MSFT,GOOGL,AMZN,META"
Verify result consistency across different assets.
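"Consistency" is easier to judge with a couple of summary numbers. The sketch below assumes one Sharpe Ratio per symbol has already been collected, and reports the share of symbols with a positive Sharpe Ratio, the weakest symbol, and the spread between best and worst. The `Consistency` struct and `consistency` function are hypothetical, and these particular summary statistics are one possible choice, not a prescribed method.

```rust
/// Cross-symbol summary (hypothetical type, for illustration).
#[derive(Debug)]
pub struct Consistency<'a> {
    /// Fraction of symbols with a Sharpe Ratio above 0.
    pub positive_share: f64,
    /// Symbol with the lowest Sharpe Ratio, and that ratio.
    pub worst: (&'a str, f64),
    /// Highest minus lowest Sharpe Ratio across symbols.
    pub sharpe_spread: f64,
}

/// Summarizes per-symbol Sharpe Ratios. Returns `None` for an empty input.
pub fn consistency<'a>(sharpe_by_symbol: &[(&'a str, f64)]) -> Option<Consistency<'a>> {
    // `total_cmp` gives a total order on f64, so NaN values cannot panic here.
    let worst = *sharpe_by_symbol.iter().min_by(|a, b| a.1.total_cmp(&b.1))?;
    let best = *sharpe_by_symbol.iter().max_by(|a, b| a.1.total_cmp(&b.1))?;
    let positive = sharpe_by_symbol.iter().filter(|(_, s)| *s > 0.0).count() as f64;

    Some(Consistency {
        positive_share: positive / sharpe_by_symbol.len() as f64,
        worst,
        sharpe_spread: best.1 - worst.1,
    })
}
```

A large spread, or a single symbol carrying most of the performance, suggests the strategy is fitted to one asset and is not capturing a general effect.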
Step 4: Stress test
Test on crash periods:
- COVID crash: February-March 2020
- 2022 Bear market: January-October 2022
- Flash crashes: Verify resilience
Pitfalls to avoid
Overfitting
Symptoms:
- Sharpe Ratio > 3 on backtest
- Performance degrades in live/forward test
- Too many optimized parameters
Solutions:
- Use train/test split
- Test on out-of-sample data
- Prefer simple strategies
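For time series, a train/test split should be chronological: parameters are tuned on the earlier part and evaluated once on the later, untouched part. A random shuffle would leak future information into the training set. Here is a minimal sketch, with `chronological_split` as a hypothetical helper name:

```rust
/// Splits a time-ordered series into (train, test) without shuffling.
/// `train_fraction` is clamped to [0, 1]; the first part is used for tuning
/// parameters, the second part is held out as out-of-sample data.
pub fn chronological_split<T>(series: &[T], train_fraction: f64) -> (&[T], &[T]) {
    let fraction = train_fraction.clamp(0.0, 1.0);
    // Round down so the test set never loses an element to rounding.
    let cut = (series.len() as f64 * fraction).floor() as usize;
    series.split_at(cut)
}
```

With eight bars and a fraction of 0.75, the first six bars form the training set and the last two form the test set. A strategy whose metrics collapse on the test part is likely overfitted.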
Look-ahead bias
Symptom: Using future data in decisions
Solution: Verify indicators only use past data
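A concrete way to verify this is to write indicators so that the value used for a decision at bar `t` depends only on bars before `t`. The sketch below shows a simple moving average built that way; `trailing_sma` is a hypothetical name, and excluding the current bar's close is an assumption that fits strategies which decide at the open of the bar.

```rust
/// Simple moving average available for a decision at bar `t`.
/// The value at index `t` averages `closes[t - window..t]`, so it never
/// includes the close of bar `t` itself or any later bar.
pub fn trailing_sma(closes: &[f64], window: usize) -> Vec<Option<f64>> {
    (0..closes.len())
        .map(|t| {
            if window == 0 || t < window {
                None // not enough history yet
            } else {
                let sum: f64 = closes[t - window..t].iter().sum();
                Some(sum / window as f64)
            }
        })
        .collect()
}
```

A quick regression test for look-ahead bias follows from this property: changing the last close in the input must not change any value the indicator produced at or before that bar.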
Survivorship bias
Symptom: Only testing on assets that still exist
Solution: Include delisted assets in backtests
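Including delisted assets means building the tradable universe as it existed on each backtest date, not as it exists today. The sketch below assumes a listing table with listing and delisting dates is available; the `Listing` struct, `universe_as_of` function, and the placeholder symbols in the example are all hypothetical. Dates are ISO `YYYY-MM-DD` strings, which compare correctly as plain strings.

```rust
/// One row of a listing table (hypothetical type, for illustration).
pub struct Listing<'a> {
    pub symbol: &'a str,
    /// First trading date, ISO format `YYYY-MM-DD`.
    pub listed: &'a str,
    /// Delisting date, or `None` if the asset still trades.
    pub delisted: Option<&'a str>,
}

/// Returns the symbols that were tradable on `date`, including assets that
/// were delisted later. ISO dates sort lexicographically, so string
/// comparison is enough here.
pub fn universe_as_of<'a>(listings: &[Listing<'a>], date: &str) -> Vec<&'a str> {
    listings
        .iter()
        .filter(|l| l.listed <= date && l.delisted.map_or(true, |d| date < d))
        .map(|l| l.symbol)
        .collect()
}
```

For a table where `BBB` was delisted on 2021-03-15, a backtest dated 2020-06-01 still includes `BBB`, while one dated 2023-01-01 does not.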
Key files
| File | Description |
|---|---|
| `src/bin/benchmark.rs` | Main benchmark CLI |
| `src/bin/benchmark_matrix.rs` | Parameter matrix tests |
| `src/application/optimization/parallel_benchmark.rs` | Parallel execution |
| `src/application/optimization/benchmark_metrics.rs` | Benchmark metrics |
| `src/domain/performance/metrics.rs` | Sharpe, Sortino, Drawdown calculation |
| `benchmark_results/` | Saved results |
Checklist before production
- Positive backtests on 2+ years of data
- Sharpe Ratio > 1.0 on different periods
- Acceptable Max Drawdown (< 20% recommended)
- Tested on bull, bear AND sideways markets
- No sign of overfitting
- Paper trading validated for 1+ month