# Backend Principle Eng Python ML Pro Max

Principal-level guidance for Python AI/ML backends, training pipelines, and inference services. Emphasizes data integrity, reproducibility, and production reliability.
## When to Apply
- Designing or refactoring ML training or inference systems
- Reviewing ML code for data leakage, evaluation quality, and reliability
- Building feature pipelines, batch scoring, or real-time serving
- Incident response for model regressions or data drift
## Priority Model (highest to lowest)
| Priority | Category | Goal | Signals |
|---|---|---|---|
| 1 | Data Quality & Leakage | Trust the data | Clean splits, lineage, leakage checks |
| 2 | Correctness & Reproducibility | Same inputs, same outputs | Versioned data, pinned deps, deterministic runs |
| 3 | Reliability & Resilience | Stable training and serving | Timeouts, retries, graceful degradation |
| 4 | Model Evaluation & Safety | Real-world performance | Offline + online eval, bias checks |
| 5 | Performance & Cost | Efficient training/inference | GPU utilization, batching, cost budgets |
| 6 | Observability & Monitoring | Fast detection | Drift, latency, error budgets |
| 7 | Security & Privacy | Protect sensitive data | Access controls, data minimization |
| 8 | Operability & MLOps | Sustainable delivery | CI/CD, model registry, rollback |
## Quick Reference (Rules)

### 1. Data Quality & Leakage (CRITICAL)

- **lineage**: Track dataset provenance and transformations
- **leakage**: Strict train/val/test separation with time-based splits when needed
- **features**: Feature definitions are versioned and documented
- **validation**: Schema and distribution checks on every data ingest
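The leakage rule above can be sketched as a small split helper. The function name `time_based_split`, the `event_time` key, and the list-of-dicts record shape are illustrative assumptions, not part of this skill:

```python
from datetime import datetime

def time_based_split(rows, cutoff, key="event_time"):
    """Split records so everything after `cutoff` is held out.

    Prevents temporal leakage: no observation from the future can
    appear in training. `rows` is a list of dicts, each carrying a
    datetime under `key`.
    """
    train = [r for r in rows if r[key] <= cutoff]
    test = [r for r in rows if r[key] > cutoff]
    # Sanity check: the newest training row precedes the oldest test row.
    if train and test:
        assert max(r[key] for r in train) <= min(r[key] for r in test)
    return train, test
```

A random shuffle-split would silently leak future signal into training whenever features aggregate over time; splitting on a timestamp cutoff avoids that by construction.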
### 2. Correctness & Reproducibility (CRITICAL)

- **versioning**: Data, code, and model versions are pinned
- **determinism**: Fixed seeds and deterministic ops where possible
- **config**: Single source of truth for hyperparameters
- **artifact**: Immutable model artifacts and metadata
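A minimal sketch of the determinism and config rules, using only the standard library; the helper names `seed_everything` and `fingerprint_config` are assumptions for illustration (a real stack would also seed NumPy, PyTorch, etc.):

```python
import hashlib
import json
import random

def fingerprint_config(config: dict) -> str:
    """Stable hash of hyperparameters; store it alongside the model artifact
    so any run can be traced back to the exact config that produced it."""
    canonical = json.dumps(config, sort_keys=True)  # key order must not matter
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

def seed_everything(seed: int) -> None:
    """Seed the stdlib RNG; extend with numpy/torch seeding in a real stack."""
    random.seed(seed)
```

Hashing a canonicalized config gives a single source of truth: two runs with the same fingerprint provably trained from the same hyperparameters.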
### 3. Reliability & Resilience (CRITICAL)

- **timeouts**: Explicit timeouts for all external calls
- **retries**: Bounded retries with jitter
- **fallbacks**: Safe fallback models or rules when inference fails
- **idempotency**: Safe retries for batch scoring
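The "bounded retries with jitter" rule can be sketched as a plain wrapper; `retry_with_jitter` is a hypothetical name, and production code would typically use a library such as tenacity instead:

```python
import random
import time

def retry_with_jitter(fn, attempts=3, base_delay=0.1, max_delay=2.0):
    """Call `fn`, retrying on any exception up to `attempts` times.

    Uses capped exponential backoff with full jitter, then re-raises
    the final failure so the caller still sees a bounded error.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # retries are bounded: surface the last error
            # Full jitter: sleep a random fraction of the capped backoff.
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))
```

The jitter matters under load: without it, many clients that failed together retry together and re-spike the dependency.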
### 4. Model Evaluation & Safety (HIGH)

- **offline-eval**: Metrics aligned to product goals
- **online-eval**: Shadow or canary before full rollout
- **bias**: Bias and fairness checks for sensitive domains
- **calibration**: Calibrate probabilities for decision thresholds
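To make the calibration rule concrete, here is a dependency-free sketch of expected calibration error (ECE), a common way to check whether predicted probabilities match observed frequencies; the binning scheme and function name are illustrative choices:

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """ECE: weighted average of |mean confidence - accuracy| per bin.

    `probs` are predicted positive-class probabilities in [0, 1],
    `labels` are 0/1 ground truth. Lower is better-calibrated.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
        bins[idx].append((p, y))
    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        conf = sum(p for p, _ in bucket) / len(bucket)
        acc = sum(y for _, y in bucket) / len(bucket)
        ece += len(bucket) / total * abs(conf - acc)
    return ece
```

A model can have a good AUC yet still be badly calibrated; if decision thresholds are applied to its probabilities, a check like this belongs in offline evaluation.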
### 5. Performance & Cost (HIGH)

- **batching**: Batch inference to improve throughput
- **caching**: Cache features and embeddings when safe
- **profiling**: Profile training and inference hot spots
- **cost-budgets**: Define and enforce cost ceilings
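A minimal sketch of the batching rule (on Python 3.12+ `itertools.batched` covers the same need; `score_batch` below is a hypothetical model call):

```python
def batched(items, batch_size):
    """Yield fixed-size batches from a sequence; the last may be smaller."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def score_all(items, score_batch, batch_size=32):
    """Run a batch-scoring function over all items, one batch at a time.

    Amortizes per-call overhead (model dispatch, GPU transfer, RPC)
    across `batch_size` inputs instead of paying it per item.
    """
    results = []
    for batch in batched(items, batch_size):
        results.extend(score_batch(batch))
    return results
```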
### 6. Observability & Monitoring (HIGH)

- **drift**: Monitor data and concept drift
- **latency**: Track P95/P99 for inference
- **quality**: Monitor model quality against ground truth
- **alerts**: SLO-based alerts with runbooks
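One common drift signal is the Population Stability Index (PSI) between a training-time feature histogram and the live one. A dependency-free sketch, assuming both inputs are already binned into matching proportions; the alert thresholds in the docstring are a widely cited rule of thumb, not a universal standard:

```python
import math

def psi(expected_pct, actual_pct, eps=1e-6):
    """Population Stability Index over matching histogram bins.

    Rule of thumb (hedged): < 0.1 stable, 0.1-0.25 moderate drift,
    > 0.25 significant drift worth alerting on.
    """
    total = 0.0
    for e, a in zip(expected_pct, actual_pct):
        e = max(e, eps)  # guard against log(0) on empty bins
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total
```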
### 7. Security & Privacy (HIGH)

- **access**: Least privilege for data and model artifacts
- **pii**: Redact or tokenize sensitive fields
- **secrets**: Use vault/KMS; never in code or logs
- **compliance**: Retention and deletion policies
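The tokenization rule can be sketched for one field type; the function name, the email regex, and the inline salt are illustrative assumptions (in practice the salt would come from a KMS or vault, never from code):

```python
import hashlib
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def tokenize_pii(text: str, salt: str = "rotate-me") -> str:
    """Replace email addresses with a stable salted token.

    The same address always maps to the same token, so joins and
    deduplication still work downstream, but the raw value never
    reaches logs or feature stores.
    """
    def _token(m):
        digest = hashlib.sha256((salt + m.group()).encode()).hexdigest()[:10]
        return f"<email:{digest}>"
    return EMAIL_RE.sub(_token, text)
```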
### 8. Operability & MLOps (MEDIUM)

- **registry**: Model registry with lineage and approvals
- **rollout**: Canary, blue/green, or shadow deployments
- **rollback**: Fast revert on regression
- **ci-cd**: Automated tests for data, training, and serving
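The registry and rollback rules fit together: if versions are immutable and registered up front, rollback is just a pointer move, not a rebuild. A toy in-memory sketch (a real system would use something like the MLflow Model Registry; all names here are hypothetical):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ModelRegistry:
    """Minimal registry: versions are immutable once registered,
    and the serving pointer can move independently of training."""
    versions: dict = field(default_factory=dict)
    current: Optional[str] = None

    def register(self, version: str, metadata: dict) -> None:
        if version in self.versions:
            raise ValueError(f"version {version} already registered")
        self.versions[version] = metadata

    def promote(self, version: str) -> None:
        assert version in self.versions
        self.current = version

    def rollback(self, version: str) -> None:
        # Fast revert: repoint serving at a previously approved version.
        assert version in self.versions, "can only roll back to a registered version"
        self.current = version
```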
## Execution Workflow
- Define product goals, metrics, and safety constraints
- Validate data sources and prevent leakage
- Define features and versioned pipelines
- Train with reproducible configs and tracked artifacts
- Evaluate offline, then validate online via shadow or canary
- Deploy with monitoring for drift, latency, and quality
- Establish rollback and retraining triggers
## Language-Specific Guidance

See `references/python-ml-core.md` for stack defaults, MLOps patterns, and tooling.
- Weekly Installs: 10
- Repository: prakharmnnit/sk…personas
- First Seen: Feb 8, 2026
- Installed on: opencode (10), kilo (10), junie (10), gemini-cli (10), antigravity (10), cline (10)