portfolio-optimization
Portfolio Optimization
Overview
This skill provides guidance for implementing high-performance portfolio optimization algorithms using Python C extensions. It covers the workflow for creating C extensions that interface with NumPy arrays, proper verification strategies, and common pitfalls to avoid when optimizing numerical computations.
When to Apply This Skill
Apply this skill when:
- Implementing portfolio risk calculations (variance, volatility, Sharpe ratio)
- Optimizing matrix-vector operations for large asset portfolios
- Creating C extensions for Python numerical code
- Performance requirements specify speedup ratios (e.g., >= 1.2x)
- Working with covariance matrices and portfolio weights
Recommended Workflow
Phase 1: Codebase Understanding
Before writing any code:
- Read all relevant source files completely - Understand the baseline implementation, data structures, and expected interfaces
- Identify the mathematical operations - Common operations include:
- Matrix-vector multiplication (covariance matrix times weights)
- Dot products (weights times returns)
- Square root operations (for volatility from variance)
- Understand the test suite - Know what correctness tolerances are expected (e.g., 1e-10) and what performance benchmarks must be met
- Document the input/output contracts - Array shapes, data types (typically float64), and return value specifications
Phase 2: Implementation Planning
Consider these factors before implementation:
-
Why C provides speedup:
- Eliminates Python interpreter overhead
- Enables direct memory access without bounds checking
- Allows compiler optimizations (vectorization, loop unrolling)
- Reduces temporary array allocations
-
Design decisions to make:
- Whether to use NumPy C API for zero-copy array access
- Memory layout assumptions (C-contiguous vs Fortran-contiguous)
- Error handling strategy for type mismatches and dimension errors
-
Potential algorithmic optimizations:
- Cache-friendly memory access patterns (row-major iteration for C arrays)
- SIMD vectorization opportunities
- Minimizing Python-to-C data conversion overhead
Phase 3: C Extension Implementation
When implementing the C extension:
-
Include proper headers:
Python.h(must be first)numpy/arrayobject.hfor NumPy array access
-
Initialize NumPy in the module init function:
- Call
import_array()to initialize NumPy C API
- Call
-
Use NumPy C API for array access:
PyArray_DATA()for getting data pointerPyArray_DIM()for dimensionsPyArray_STRIDE()for memory strides- Check
PyArray_IS_C_CONTIGUOUS()for memory layout
-
Implement robust error handling:
- Validate array dimensions match expected shapes
- Check data types (expect
NPY_FLOAT64for double precision) - Handle non-contiguous arrays (either reject or handle strides)
- Set appropriate Python exceptions on error
Phase 4: Python Wrapper Implementation
Create a Python module that:
- Imports the C extension module
- Provides a clean interface matching the baseline API
- Handles any necessary array preparation (ensuring contiguity)
- Documents the interface clearly
Phase 5: Verification Strategy
Critical: Verify every change completely
-
After editing files, re-read them - Confirm edits were applied correctly, especially for multi-line changes
-
Test incrementally:
- Build the C extension first and verify it compiles
- Test individual functions before running full benchmarks
- Use small test cases for correctness verification before scaling up
-
Correctness verification:
- Compare outputs against baseline implementation
- Use appropriate numerical tolerances (typically 1e-10 for double precision)
- Test with known inputs where expected outputs can be calculated manually
-
Performance verification:
- Run benchmarks with representative data sizes
- Verify speedup meets requirements across different portfolio sizes
- Test edge cases: small portfolios (n=1, n=10), large portfolios (n=5000+)
Edge Cases to Handle
Ensure the implementation addresses:
- Empty portfolios (n=0) - Return appropriate default or error
- Single-asset portfolios (n=1) - Degenerate case for covariance
- Dimension mismatches - Weights vector length vs covariance matrix dimensions
- Invalid inputs:
- Non-square covariance matrices
- NaN or infinity values in inputs
- Negative variance (mathematically invalid)
- Memory considerations:
- Non-contiguous NumPy arrays
- Memory allocation failures in C code
- Large portfolios that may stress memory
Common Pitfalls to Avoid
Code Completeness
- Never truncate code in edit operations - always provide complete implementations
- Verify file contents after editing to confirm changes applied correctly
- Document all design choices explicitly
Testing Approach
- Avoid going directly from implementation to full benchmark testing
- Test each function individually before integration testing
- Do not rely solely on "tests pass" for validation - understand why they pass
C Extension Specific
- Always check NumPy array types before accessing data
- Handle reference counting properly to avoid memory leaks
- Initialize NumPy API with
import_array()in module init - Use
PyErr_SetString()to set exceptions on errors
Performance Validation
- Verify speedup is consistent across different input sizes
- Profile if further optimizations might be needed
- Consider the overhead of Python-to-C transitions for small inputs
Build and Test Commands
Typical workflow commands:
# Build the C extension
python setup.py build_ext --inplace
# Run correctness tests
python -c "from portfolio_optimized import *; # test calls"
# Run benchmark
python benchmark.py
# Run full test suite
pytest test_portfolio.py -v
Verification Checklist
Before considering the task complete:
- All source files read and understood
- C extension compiles without warnings
- Individual functions tested for correctness
- Numerical results match baseline within tolerance
- Performance meets speedup requirements
- Edge cases explicitly tested or handled
- Error handling implemented for invalid inputs
- File contents verified after all edits
- No memory leaks in C code (proper reference counting)