algo-risk-credit
Credit Scoring Model
Overview
Credit scoring models predict the probability of default (PD) from borrower characteristics using logistic regression or gradient boosting. Output: a score (300-850 range) or PD (0-1). Used for loan approval, pricing, and portfolio risk management.
When to Use
Trigger conditions:
- Building a scorecard for loan/credit approval decisions
- Predicting default probability for risk-based pricing
- Evaluating existing credit models for discriminatory power
When NOT to use:
- For corporate bankruptcy prediction (use Altman Z-Score)
- For market risk measurement (use VaR)
Algorithm
IRON LAW: A Credit Model Must Discriminate AND Be Calibrated
Discrimination (AUC): correctly ranking good vs bad borrowers.
Calibration: predicted PD matches actual default rates.
A model with AUC=0.85 but predicted PD 2x actual default rate will
cause systematic over/under-pricing. Need BOTH properties.
Phase 1: Input Validation
Collect: borrower features (income, debt ratio, credit history length, delinquency count, utilization), outcome variable (default within 12-24 months). Handle: missing values, class imbalance (typically 2-5% default rate). Gate: Sufficient defaults (300+ events), features available at decision time.
Phase 2: Core Algorithm
- Feature engineering: WOE (Weight of Evidence) binning for logistic regression, or direct encoding for GBDT
- Train model: logistic regression (interpretable, regulatory-preferred) or GBDT (higher accuracy)
- Calibrate: Platt scaling on holdout, ensure predicted PD matches actual default rate by decile
- Convert to score: Score = offset + factor × log(odds), scaled to 300-850 range
Phase 3: Verification
Evaluate: AUC (>0.70 acceptable, >0.80 good), KS statistic, Gini coefficient. Population stability index (PSI) for monitoring drift. Gate: AUC > 0.70, calibration acceptable, no discriminatory bias in protected attributes.
Phase 4: Output
Return score, PD, and key risk drivers.
Output Format
{
"score": 680,
"pd": 0.035,
"risk_grade": "B",
"top_risk_factors": [{"factor": "high_utilization", "impact": -45}, {"factor": "short_history", "impact": -30}],
"metadata": {"model": "logistic_regression", "auc": 0.78, "vintage": "2024-Q3"}
}
Examples
Sample I/O
Input: Borrower: income=$60K, DTI=35%, 5yr credit history, 0 delinquencies, 60% utilization Expected: Score ~680, PD ~3.5%, Grade B (some risk from high utilization)
Edge Cases
| Input | Expected | Why |
|---|---|---|
| No credit history (thin file) | High uncertainty, default to conservative | Insufficient data for scoring |
| All features identical | Same score regardless of outcome | Model can't differentiate — need more features |
| Major economy shift | PSI > 0.25, model needs recalibration | Population has shifted from training distribution |
Gotchas
- Reject inference: Training data only includes approved applicants. Rejected applicants' outcomes are unknown, creating selection bias. Use reject inference techniques.
- Fair lending: Models must not discriminate by protected attributes (race, gender, age). Even proxy variables (zip code ≈ race) can create disparate impact. Test with fairness metrics.
- Through-the-door vs on-the-books: TTD samples include all applicants; OTB only approved ones. Model purpose determines which sample to use.
- Vintage analysis: Default rates vary by economic conditions. A 2019-trained model may not predict well in a recession. Track model performance by vintage.
- Regulatory requirements: Financial regulators (Basel, OCC, FDIC) have specific requirements for model validation, documentation, and fair lending testing.
References
- For WOE binning methodology, see
references/woe-binning.md - For reject inference techniques, see
references/reject-inference.md