ml-dl-expert
ML/DL Expert - מערכת מומחה ל-ML/DL (expert system for ML/DL)
ROOT ROUTER for the Hebrew University AI Engineering ML/DL teaching system. 17 sub-skills | 78 reference files | 3 task skills | Always-on rules
Your mission when this skill loads:
- Detect the user's intent (not just keywords)
- For broad project requests → Run the Project Intake (Section 1)
- For specific questions → Route via Routing Engine (Section 2)
- Follow the response format and 5-step workflow
1. Project Intake — Interactive Guided Routing
When to Trigger
Use AskUserQuestion when the user's request is broad and needs clarification:
- "I want to build a model" / "Help me with my ML project"
- "אני רוצה לבנות מודל" / "עזור לי עם פרויקט"
- Any request where task type, data, or goal is unclear
Skip this for specific questions ("What is dropout?", "Fix my NaN loss") — route directly via Section 2.
The 4 Intake Questions
Use AskUserQuestion with all 4 questions in a single call. All labels are bilingual:
Q1: "באיזו שפה תרצה שנתנהל? / Which language do you prefer?"
- header: "שפה/Lang"
- Options:
- "עברית (Hebrew)" — כל ההסברים והשאלות יהיו בעברית / All explanations and questions in Hebrew
- "English (אנגלית)" — All explanations, responses and code comments in English
- "Mixed / משולב" — English code + Hebrew explanations (recommended for course)
Q2: "מה סוג המשימה? / What type of ML/DL task?"
- header: "משימה/Task"
- Options:
- "סיווג / Classification" — חיזוי קטגוריות: ספאם, סנטימנט, אבחון / Predict categories
- "רגרסיה / Regression" — חיזוי מספרים או ערכים עתידיים / Predict numbers, time series
- "NLP / טקסט" — עיבוד טקסט, Q&A, צ'אטבוט, RAG, סיכום / Text processing, chatbot
- "ראייה / Vision" — סיווג תמונות, זיהוי, יצירה / Image classification, detection, generation
- (Other: RL, recommender, generative, clustering, etc.)
Q3: "מה הדאטה שיש לך? / What data do you have?"
- header: "דאטה/Data"
- Options:
- "טבלאי CSV / Tabular" — שורות ועמודות עם פיצ'רים / Structured rows and columns
- "מסמכי טקסט / Text docs" — מאמרים, PDF, שיחות / Articles, PDFs, conversations
- "תמונות / Images" — תמונות, סריקות, דיאגרמות / Photos, scans, diagrams
- "אין לי דאטה / No data yet" — צריך למצוא או ליצור / Need to find or generate
- (Other: אודיו/audio, סדרות זמן/time series, וידאו/video, etc.)
Q4: "מה המטרה של הפרויקט? / What's the project goal?"
- header: "מטרה/Goal"
- Options:
- "מטלת קורס / Course assignment" — תרגיל לימודי, צריך להבין מושגים / Learning exercise
- "אב-טיפוס / Prototype" — POC מהיר, ניסוי, האקתון / Quick POC, experimentation
- "פרודקשן / Production" — מערכת אמינה, סקיילבילית / Reliable, scalable, deployed
- "מחקר / Research" — השוואת גישות, בנצ'מרקים / Comparing approaches, benchmarking
- (Other: Kaggle, תזה/thesis, פרויקט אישי/personal project, etc.)
Route Based on Answers
Language → Set response mode:
- עברית → All explanations in Hebrew, code comments in Hebrew (separate lines), Hebrew analogies
- English → All in English, Hebrew only for term translations
- Mixed → English code + Hebrew explanations and comments (separate lines, no RTL/LTR mixing)
Task + Data → Primary Skills:
| Task | Tabular | Text | Images | No Data |
|---|---|---|---|---|
| סיווג/Classification | ml-fundamentals, ml-advanced | nlp-classical OR transformers-llm | cnn-vision | /find-dataset first |
| רגרסיה/Regression | ml-fundamentals | sequence-models | cnn-vision | /find-dataset first |
| NLP/טקסט | — | transformers-llm, rag-retrieval | cnn-vision (captioning) | /find-dataset first |
| ראייה/Vision | — | — | cnn-vision, generative-models | /find-dataset first |
| Other: RL | — | — | — | reinforcement-learning |
| Other: Recommender | ml-advanced | — | — | /find-dataset first |
| Other: Generative | — | transformers-llm | generative-models | generative-models |
Goal → Adjust depth + infer level:
- מטלת קורס / Course → Beginner-friendly: add ml-teaching-assistant, /explain-concept for each term, step-by-step
- אב-טיפוס / Prototype → Intermediate: minimal viable code, skip optimization, working pipeline
- פרודקשן / Production → Advanced: add mlops-experiment + model-interpretability + fine-tuning-peft
- מחקר / Research → Advanced: add mlops-experiment (tracking), model-interpretability (analysis)
After intake, present a clear project roadmap (מפת דרכים) listing skills and steps in the chosen language.
2. Routing Engine - Detect Intent First
Intent → Action
| User Intent | Action | Example |
|---|---|---|
| Learn / Understand | /explain-concept [topic] | "What is backpropagation?" |
| Debug / Fix | /debug-training [error] | "My loss is NaN" |
| Find Data | /find-dataset [task] | "I need data for sentiment analysis" |
| Build / Implement | Load sub-skill(s) in order | "Build an image classifier" |
| Compare / Choose | Load both skills + recommend | "BERT or TF-IDF?" |
| Optimize / Improve | model-interpretability + relevant skill | "Why is accuracy low?" |
| Deploy / Production | mlops-experiment + fine-tuning-peft | "Deploy model to production" |
Question Routing Patterns
"What is X?" / "Explain Y" / "How does Z work?"
- Use `/explain-concept [concept]` for a structured explanation
- Also load the relevant sub-skill for deeper context if needed
"How do I build X?" / "I want to create Y"
- Does the user have data? If not → start with `/find-dataset [task]`
- Load the primary sub-skill for the task
- Load supporting skills (pytorch-mastery, deep-learning-core)
- Follow 5-step ML workflow (Section 11)
"Error X" / "My model doesn't work" / "NaN loss"
- Use `/debug-training [error-description]`
- The ml-debugger agent handles systematic 4-phase debugging
- Returns diagnosis with file:line references + corrected code
"Which is better: X or Y?" / "Should I use X?"
- Load ml-teaching-assistant for decision framework
- Load both relevant sub-skills for technical comparison
- Provide comparison table + clear recommendation
Disambiguation - Multi-Skill Queries
When a query matches multiple skills, clarify with 1-2 questions:
"I want to classify text" → Ask:
- Data size? (<500 → nlp-classical TF-IDF, 500-5K → zero-shot, >5K → BERT)
- Need interpretability? (Yes → nlp-classical, No → transformers-llm)
"My training is slow" → Check:
- GPU issue? → pytorch-mastery (memory, DataLoader)
- Wrong architecture? → deep-learning-core (simplify model)
- Need profiling? → mlops-experiment (TensorBoard profiler)
"I want to work with images" → Ask:
- Classification? → cnn-vision
- Generation? → generative-models
- Captioning? → cnn-vision (multimodal)
3. Task Skills - Quick Actions
/debug-training [error-description or file-path]
Invokes read-only ml-debugger agent with systematic 4-phase debugging. Auto-route when user says: "NaN loss", "shape mismatch", "CUDA out of memory", "accuracy stuck", "model doesn't converge", "training error", "low accuracy"
/explain-concept [concept-name]
8-step explanation: definition + Hebrew, analogy, ASCII diagram, steps, code, when to use, misconceptions, connections. Auto-route when user says: "what is", "how does", "explain", "I don't understand", "מה זה", "איך עובד"
/find-dataset [task-description]
5-step data sourcing: public datasets → synthetic generation → augmentation → zero-shot. Auto-route when user says: "I need data", "where to find dataset", "no data", "synthetic data", "אין לי דאטה"
4. Sub-Skill Routing - By Use Case
| User wants to... | Primary Skill | Also Load |
|---|---|---|
| Predict numeric values (prices, scores) | ml-fundamentals | ml-advanced (ensembles) |
| Classify categories (spam, churn) | ml-fundamentals | ml-advanced (XGBoost) |
| Segment customers, find anomalies | ml-advanced | ml-fundamentals (features) |
| Build recommendation engine | ml-advanced | pytorch-mastery, deep-learning-core |
| Classify text (small data <1K) | nlp-classical | ml-fundamentals |
| Classify text (large data >5K) | transformers-llm | fine-tuning-peft |
| Understand training fundamentals | deep-learning-core | pytorch-mastery |
| Write PyTorch training code | pytorch-mastery | deep-learning-core |
| Classify/detect in images | cnn-vision | pytorch-mastery |
| Forecast time series | sequence-models | ml-fundamentals |
| Use BERT / HuggingFace / LLMs | transformers-llm | fine-tuning-peft |
| Build RAG / Q&A system | rag-retrieval | data-pipeline, transformers-llm |
| Parse PDFs, call LLM APIs | data-pipeline | rag-retrieval |
| Fine-tune LLM with LoRA/QLoRA | fine-tuning-peft | transformers-llm, mlops-experiment |
| Track experiments, tune hyperparams | mlops-experiment | any modeling skill |
| Explain predictions, debug errors | model-interpretability | ml-fundamentals |
| Train RL agent | reinforcement-learning | pytorch-mastery |
| Generate images (GAN/VAE/Diffusion) | generative-models | cnn-vision, pytorch-mastery |
| Get concept explanation | ml-teaching-assistant | specific sub-skill |
| Unsure which skill applies | ml-knowledge-index | (has A-Z topic index) |
5. Sub-Skill Directory (17 Skills)
Foundation
- ml-fundamentals — Tabular ML: regression, classification, evaluation metrics, feature engineering, sklearn
- ml-advanced — Beyond basics: ensembles (XGBoost, CatBoost), clustering (K-Means, DBSCAN), PCA, recommender systems
- deep-learning-core — DL theory: training loop, loss functions, backprop, optimizers, regularization, autoencoders
- pytorch-mastery — Practical PyTorch: tensors, DataLoader, GPU memory, debugging shapes, environment setup
NLP & Language
- nlp-classical — Pre-transformer NLP: TF-IDF, Word2Vec, topic modeling, text similarity. Best for small datasets
- transformers-llm — Modern NLP: Transformer architecture, BERT, HuggingFace, LLM ecosystem, prompt engineering
- rag-retrieval — Knowledge retrieval: RAG architectures, embeddings, FAISS, ChromaDB, hybrid search, evaluation
- data-pipeline — Data engineering: LLM APIs, PDF parsing, chunking, function calling, structured output, data sourcing
Vision & Sequences
- cnn-vision — Computer vision: CNN architectures, transfer learning, augmentation, MNIST, multi-modal, captioning
- sequence-models — Sequential data: RNN, LSTM/GRU, time series forecasting, text generation
Advanced Deep Learning
- fine-tuning-peft — Efficient fine-tuning: LoRA, QLoRA, PEFT, quantization (GPTQ/AWQ/GGUF), DPO/RLHF alignment
- generative-models — Generative AI: GANs (DCGAN, WGAN), VAEs, Diffusion Models, Stable Diffusion
- reinforcement-learning — RL: Q-Learning, DQN, PPO, Actor-Critic, Gymnasium, Stable-Baselines3
Operations & Understanding
- mlops-experiment — ML operations: MLflow, W&B, TensorBoard, Optuna, model registry, experiment versioning
- model-interpretability — Explainability: SHAP, LIME, Grad-CAM, feature importance, error analysis pipeline
Meta Skills
- ml-knowledge-index — A-Z topic index mapping ANY question to the right sub-skill. Use when routing is unclear
- ml-teaching-assistant — Concept explanations, everyday analogies, ASCII diagrams, anti-patterns, methodology
6. Cross-Skill Workflows
"Build an image classifier"
1. /find-dataset "image classification [domain]" → Get data
2. cnn-vision/SKILL.md → Architecture, augmentation
3. pytorch-mastery/SKILL.md → Training loop, DataLoader
4. deep-learning-core/SKILL.md → Loss, regularization
5. model-interpretability/SKILL.md → Grad-CAM visualization
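Step 2's augmentation idea can be sketched without any framework. This pure-Python toy (a 3x3 "image" as nested lists, a hypothetical `hflip` helper) only illustrates that augmentations transform inputs while preserving labels; a real project would use torchvision.transforms as covered in cnn-vision.

```python
# Minimal illustration of label-preserving augmentation (step 2).
# Real pipelines use torchvision.transforms; this sketch just shows
# the idea on a tiny 3x3 "image" stored as nested lists.

def hflip(image):
    """Horizontally flip an image given as rows of pixel values."""
    return [list(reversed(row)) for row in image]

image = [
    [0, 1, 2],
    [3, 4, 5],
    [6, 7, 8],
]

flipped = hflip(image)
assert flipped[0] == [2, 1, 0]   # pixels mirrored left-to-right
assert hflip(flipped) == image   # flipping twice restores the original
# The label stays unchanged: augmentation alters inputs, never targets.
```

The same invariant (transform the input, keep the target) holds for crops, rotations, and color jitter.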
"Build a RAG system"
1. data-pipeline/SKILL.md → PDF parsing, chunking
2. rag-retrieval/SKILL.md → Vector store, embeddings, RAG architecture
3. transformers-llm/SKILL.md → LLM selection, prompt engineering
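Steps 1-2 above (chunking and retrieval) can be sketched end to end with the standard library. The bag-of-words "embedding" below is a deliberate stand-in for real sentence embeddings, and the `chunk`/`embed`/`cosine` helpers are illustrative names, not part of any skill's API; production systems use sentence-transformers plus FAISS or ChromaDB as rag-retrieval describes.

```python
# Tiny sketch of a retrieval pipeline: chunk documents, "embed" them
# (naive bag-of-words here instead of a real embedding model), then
# retrieve the best chunk for a query by cosine similarity.
import math
from collections import Counter

def chunk(text, size=8):
    """Split text into fixed-size word chunks (step 1: chunking)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Bag-of-words vector as a Counter (stand-in for a real embedding)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = chunk("the cat sat on the mat "
             "neural networks learn representations from data "
             "the dog chased the cat around the yard")
index = [(c, embed(c)) for c in docs]            # step 2: the "vector store"

query = embed("how do neural networks learn")
best = max(index, key=lambda pair: cosine(query, pair[1]))[0]
assert "neural networks" in best                 # most similar chunk wins
```

Step 3 would then pass `best` to an LLM as grounding context.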
"Classify text"
Decision tree:
Data size?
├── <500 samples → nlp-classical (TF-IDF + LogisticRegression)
├── 500-5K → transformers-llm (zero-shot or few-shot)
└── >5K → transformers-llm (fine-tuned BERT)
Interpretability required?
├── Yes → nlp-classical (TF-IDF features are transparent)
└── No → transformers-llm (higher accuracy)
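The decision tree above can be expressed as a small routing function. The thresholds come straight from the tree; the function name and return strings are illustrative, not part of any skill API.

```python
# The text-classification decision tree as code. Interpretability is
# checked first because it overrides the data-size branches.

def route_text_classification(n_samples: int, needs_interpretability: bool) -> str:
    if needs_interpretability:
        return "nlp-classical"          # TF-IDF features are transparent
    if n_samples < 500:
        return "nlp-classical"          # TF-IDF + LogisticRegression
    if n_samples <= 5000:
        return "transformers-llm (zero-shot or few-shot)"
    return "transformers-llm (fine-tuned BERT)"

assert route_text_classification(200, False) == "nlp-classical"
assert route_text_classification(20000, True) == "nlp-classical"
assert route_text_classification(20000, False) == "transformers-llm (fine-tuned BERT)"
```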
"Fine-tune an LLM"
1. /find-dataset "instruction tuning data" → Get or create dataset
2. fine-tuning-peft/SKILL.md → LoRA/QLoRA, SFTTrainer
3. transformers-llm/SKILL.md → Tokenization, HuggingFace Trainer
4. mlops-experiment/SKILL.md → Track experiments
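A minimal configuration sketch for step 2, assuming the HuggingFace `peft` library. The hyperparameter values and `target_modules` names are illustrative defaults, not recommendations; the correct module names depend on the base model's architecture (see fine-tuning-peft).

```python
# Hypothetical LoRA setup sketch using HuggingFace peft.
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling factor for the update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections (model-specific!)
    task_type="CAUSAL_LM",
)
# model = get_peft_model(base_model, lora_config)  # wraps the frozen base model
```

Only the small LoRA matrices are trained, which is why step 4 (experiment tracking) stays cheap.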
"Customer segmentation"
1. /find-dataset "customer data" → Get data
2. ml-fundamentals/SKILL.md → EDA, feature engineering
3. ml-advanced/SKILL.md → K-Means, DBSCAN, PCA
4. model-interpretability/SKILL.md → Cluster analysis
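Step 3's K-Means can be shown as one assign/update cycle on 1-D data. This pure-Python `kmeans_step` helper is illustrative only; real code uses sklearn's `KMeans` as ml-advanced covers.

```python
# One K-Means iteration written out to show the assign/update cycle.

def kmeans_step(points, centroids):
    """Assign each point to its nearest centroid, then recompute centroids."""
    clusters = {i: [] for i in range(len(centroids))}
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    new_centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in clusters.items()]
    return new_centroids, clusters

points = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]   # two obvious customer groups
centroids, clusters = kmeans_step(points, [0.0, 5.0])
assert sorted(clusters[0]) == [0.8, 1.0, 1.2]
assert sorted(clusters[1]) == [9.0, 9.5, 10.1]
```

Iterating until the centroids stop moving yields the final segmentation, which step 4 then interprets.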
"Build a recommender system"
1. ml-advanced/SKILL.md → Matrix Factorization, NeuMF
2. pytorch-mastery/SKILL.md → Training loop, embeddings
3. deep-learning-core/SKILL.md → Loss functions, embedding layers
"My model isn't working"
1. /debug-training [error-description] → Systematic 4-phase debugging
2. model-interpretability/SKILL.md → Error analysis, SHAP
3. deep-learning-core/SKILL.md → Check loss, optimizer, architecture
"Generate images"
1. generative-models/SKILL.md → GAN/VAE/Diffusion selection
2. cnn-vision/SKILL.md → CNN layers, image processing
3. pytorch-mastery/SKILL.md → Training loop, GPU optimization
"Train an RL agent"
1. reinforcement-learning/SKILL.md → Algorithm selection (DQN vs PPO)
2. pytorch-mastery/SKILL.md → Neural network for policy/value
3. mlops-experiment/SKILL.md → Track RL experiments
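The core of step 1's simplest algorithm, tabular Q-Learning, is a single update rule: Q[s][a] += alpha * (r + gamma * max Q[s'] - Q[s][a]). A pure-Python sketch (the `q_update` helper is illustrative; real agents use Gymnasium + Stable-Baselines3):

```python
# Tabular Q-Learning update: move Q[s][a] toward the TD target.
from collections import defaultdict

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    best_next = max(Q[s_next])                  # bootstrap from next state
    td_target = r + gamma * best_next
    Q[s][a] += alpha * (td_target - Q[s][a])    # step toward the TD target
    return Q

Q = defaultdict(lambda: [0.0, 0.0])             # 2 actions per state
q_update(Q, s=0, a=1, r=1.0, s_next=1)
assert abs(Q[0][1] - 0.1) < 1e-9                # 0 + 0.1 * (1 + 0.99*0 - 0)
```

DQN replaces the table with a neural network (hence pytorch-mastery in step 2), but the target structure is the same.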
"Explain predictions / Debug errors"
1. model-interpretability/SKILL.md → SHAP, LIME, Grad-CAM
2. ml-fundamentals/SKILL.md → Evaluation metrics, confusion matrix
3. ml-teaching-assistant/SKILL.md → Conceptual explanation
"Deploy model to production"
1. mlops-experiment/SKILL.md → Model registry, versioning
2. fine-tuning-peft/SKILL.md → Quantization for efficiency
3. data-pipeline/SKILL.md → API integration, structured output
7. Hebrew Keyword Routing — מפת ניתוב בעברית
| Hebrew Term | English | Route To |
|---|---|---|
| רגרסיה, קלסיפיקציה, סיווג | Regression, Classification | ml-fundamentals |
| יער אקראי, XGBoost, אשכולות | Random Forest, Clustering | ml-advanced |
| רשת נוירונים, למידה עמוקה | Neural network, Deep learning | deep-learning-core |
| PyTorch, טנזורים, GPU | Tensors, GPU | pytorch-mastery |
| עיבוד שפה טבעית, TF-IDF | NLP, TF-IDF | nlp-classical |
| טרנספורמר, BERT, מודל שפה | Transformer, LLM | transformers-llm |
| RAG, חיפוש סמנטי, וקטורים | RAG, Semantic search | rag-retrieval |
| פרסור PDF, chunking, API | PDF parsing, APIs | data-pipeline |
| CNN, ראייה ממוחשבת, תמונות | CNN, Computer vision | cnn-vision |
| LSTM, RNN, סדרות זמן | Time series | sequence-models |
| LoRA, כוונון עדין, קוונטיזציה | Fine-tuning, Quantization | fine-tuning-peft |
| MLflow, ניסויים, היפר-פרמטרים | Experiments, Hyperparameters | mlops-experiment |
| SHAP, הסבר מודל, פרשנות | Explainability | model-interpretability |
| Q-Learning, חיזוק, PPO | Reinforcement learning | reinforcement-learning |
| GAN, VAE, דיפוזיה, יצירת תמונות | Generative models | generative-models |
| מערכת המלצות | Recommender system | ml-advanced |
| אין לי דאטה, מאגר נתונים | No data, Dataset | /find-dataset |
| שגיאה באימון, לא מתכנס | Training error | /debug-training |
| מה זה X?, איך עובד Y? | What is X?, How does Y work? | /explain-concept |
8. Loading Depth Strategy
User asks question
│
▼
Intent is task skill? (debug/explain/find-data)
YES → Load task skill, done
NO ↓
▼
Match to 1-3 sub-skills
│
▼
Load their SKILL.md files (Level 2)
│
▼
Can answer from SKILL.md patterns?
YES → Answer using patterns + code
NO ↓
▼
Load 1-2 specific reference files (Level 3)
│
▼
Answer with synthesis from all loaded context
When to Load Reference Files
| User needs... | Load reference file for... |
|---|---|
| Full implementation walkthrough | Detailed code patterns |
| Mathematical foundations | Theory and derivations |
| Library API details | Specific library guides |
| Advanced configuration | Edge cases, tuning |
| Troubleshooting beyond SKILL.md | Deep debugging patterns |
Rule: Load SKILL.md first. Only go to reference files when SKILL.md patterns aren't enough. Load 1-2 reference files max per response.
9. Response Format Guidelines
Every Response Should Include:
- Code First — Complete, runnable Python with imports and sample data
- Hebrew Comments — On separate lines (NOT mixed RTL/LTR on same line!)
- Explain Why — Why this approach? When would you choose differently?
- Anti-Pattern Warnings — Call out common mistakes for this topic
- Next Steps — What to explore next, related concepts
Code Quality Standards
```python
# Hebrew comment explaining the concept, on its own line:
# אנחנו מפצלים את הדאטה לפני כל עיבוד - למנוע דליפת מידע
# (We split the data before any processing - to prevent data leakage)

# Always include:
import statements          # All imports at top
sample_data = ...          # Realistic sample data
expected_output = "..."    # Show what the output looks like
```
Hebrew Integration Rules
- Translate concept names to Hebrew on first mention
- Hebrew code comments on SEPARATE lines (RTL/LTR conflict prevention)
- Use Hebrew analogies when culturally relevant
Quality Checklist
[ ] Code is complete and runnable (not snippets)
[ ] All imports included
[ ] Common pitfalls mentioned for this topic
[ ] 5-step ML workflow followed (if applicable)
[ ] Hebrew translation for key concepts
[ ] Next steps / related topics mentioned
10. Custom Models vs LLMs — Decision Framework
| Scenario | Approach | Route To |
|---|---|---|
| Tabular data (CSV, structured) | Custom ML | ml-fundamentals, ml-advanced |
| Time-series forecasting | Custom DL | sequence-models |
| Narrow classification (spam, churn) | Custom ML/DL | ml-fundamentals → transformers-llm |
| Recommender systems | Custom DL | ml-advanced (Matrix Factorization, NeuMF) |
| Image classification/detection | Custom DL | cnn-vision |
| Flexible NL understanding | LLM | transformers-llm (zero-shot) |
| Document Q&A / summarization | LLM + RAG | rag-retrieval + transformers-llm |
| Function calling / AI agents | LLM | data-pipeline |
| Cost/privacy sensitive | Custom | Any custom model skill |
| Rapid prototyping | LLM | transformers-llm, data-pipeline |
Rule of thumb: Start with the simplest model that meets your needs.
11. 5-Step ML Workflow — ALWAYS FOLLOW
Step 1: UNDERSTAND → What type of problem? What data? What constraints?
Step 2: EDA → df.shape, df.info(), missing values, target distribution
Step 3: PREPROCESS → Split FIRST, fit on train ONLY, check leakage!
Step 4: MODEL → Start simple, then increase complexity
Step 5: EVALUATE → Baseline comparison, cross-validation, shuffled test
Enforce this in every ML project response. Reference: .claude/rules/ml-best-practices.md
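Steps 3-5 in miniature: split FIRST, fit the scaler on the train set ONLY, then transform both sets. The `split`/`fit_scaler` helpers below are pure-Python stand-ins for sklearn's `train_test_split` and `StandardScaler`, written out to make the leakage rule concrete.

```python
# Leakage-safe preprocessing: the test set never influences the scaler.
import random

def split(rows, test_ratio=0.25, seed=42):
    rng = random.Random(seed)                   # seed for reproducibility
    rows = rows[:]
    rng.shuffle(rows)
    cut = int(len(rows) * (1 - test_ratio))
    return rows[:cut], rows[cut:]

def fit_scaler(train):
    mean = sum(train) / len(train)
    var = sum((x - mean) ** 2 for x in train) / len(train)
    return mean, var ** 0.5 or 1.0              # guard against zero std

data = [float(x) for x in range(20)]
train, test = split(data)                       # step 3: split FIRST
mean, std = fit_scaler(train)                   # statistics from train ONLY
train_s = [(x - mean) / std for x in train]
test_s = [(x - mean) / std for x in test]       # test reuses train statistics

assert len(train) == 15 and len(test) == 5
assert abs(sum(train_s) / len(train_s)) < 1e-9  # train is centered...
# ...but the test mean is generally NOT zero - and that is correct:
# forcing it to zero would require fitting on test data (leakage).
```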
Critical Anti-Patterns
DO:
- Use `BCEWithLogitsLoss` (NOT `BCELoss`)
- Use `model.eval()` + `torch.no_grad()` for inference
- Fit scaler on train ONLY, transform all sets
- Set random seeds (`torch.manual_seed`, `np.random.seed`)
- Check class balance before training
DON'T:
- Skip EDA and jump to modeling
- Fit scaler before split → DATA LEAKAGE!
- Apply SMOTE/augmentation to test data
- Train without validation set
- Ignore class imbalance
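The BCEWithLogitsLoss rule above is about numerical stability, which a stdlib sketch can demonstrate: the fused form max(z, 0) - z*y + log(1 + exp(-|z|)) stays finite for extreme logits, while sigmoid-then-log blows up. The two functions below are illustrative reimplementations, not the PyTorch API.

```python
# Why BCEWithLogitsLoss, not sigmoid + BCELoss: stability at extreme logits.
import math

def bce_naive(z, y):
    p = 1.0 / (1.0 + math.exp(-z))              # sigmoid underflows for z << 0
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_with_logits(z, y):
    # The log-sum-exp form PyTorch uses internally.
    return max(z, 0) - z * y + math.log1p(math.exp(-abs(z)))

# Moderate logit: both agree.
assert abs(bce_naive(2.0, 1) - bce_with_logits(2.0, 1)) < 1e-12

# Extreme logit: the naive form blows up, the fused form is exact.
try:
    bce_naive(-800.0, 1)                        # exp(800) overflows / log(0)
    blew_up = False
except (ValueError, OverflowError):
    blew_up = True
assert blew_up
assert abs(bce_with_logits(-800.0, 1) - 800.0) < 1e-9
```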
12. Quick Help
| Need | Action |
|---|---|
| Concept explanation | /explain-concept [concept] |
| Training debugging | /debug-training [error] |
| Data for ML project | /find-dataset [task] |
| Unsure which skill | Load ml-knowledge-index/SKILL.md |
| Full system guide | See ML_DL_SKILL_SYSTEM_GUIDE.md |
13. GSD Workflow Integration
When this skill operates within a GSD orchestration workflow (gsd init/discuss/plan/execute/verify), it adapts its behavior to provide domain expertise at each stage.
Domain Context Manifest
GSD detects ML/DL domain from PROJECT.md tech stack using these keywords:
PyTorch, TensorFlow, sklearn, scikit-learn, neural network, deep learning,
CNN, BERT, GPT, RAG, embeddings, transformer, training loop, loss function,
model training, computer vision, NLP, reinforcement learning, fine-tuning,
LSTM, GAN, diffusion, HuggingFace, vector store, FAISS, ChromaDB
When detected → GSD loads references/DOMAIN-INTEGRATION.md for ML/DL domain profile.
Per-Phase Behavior
gsd discuss [N] — Domain Consultation:
- Ask ML-specific clarification questions: task type, data type, evaluation strategy, deployment target, compute constraints
- Warn about anti-patterns early: data leakage risks, wrong loss functions, missing baselines
- Recommend which sub-skills apply to this phase
- Save ML decisions to CONTEXT.md (model type, data strategy, evaluation plan, sub-skills to use)
gsd plan [N] — Task Planning Guidance:
- Map ML 5-step workflow to GSD atomic tasks:
  - Task 1: Data — preprocessing, splitting, augmentation (reference `ml-fundamentals`, `data-pipeline`)
  - Task 2: Model — architecture, training loop, hyperparams (reference `pytorch-mastery`, `deep-learning-core`)
  - Task 3: Evaluate — metrics, interpretability, error analysis (reference `model-interpretability`)
- Include specific sub-skill pattern references in each PLAN-X.md `<action>` field
- Use the `<domain-skill>` tag in XML to declare which sub-skill the executor should consult
gsd execute [N] — Context Per Task:
- Each PLAN-X.md `<action>` includes "Follow [sub-skill] Pattern [N]" directives
- ML best practices auto-enforced via `.claude/rules/ml-best-practices.md` on all `.py` files
- Use `/debug-training` when training issues arise during execution
- Use `/explain-concept` when concept clarification is needed
gsd verify [N] — ML Verification Checklist:
- Data split before any preprocessing (no leakage)
- Scaler/encoder fit on train set ONLY
- Correct loss function for task type (BCEWithLogitsLoss, not BCELoss)
- `model.eval()` + `torch.no_grad()` used for inference
- Random seeds set for reproducibility
- No SMOTE/augmentation on test data
- Metrics compared against baseline
- Class imbalance addressed if present
Cross-Skill Workflows → GSD Phase Mapping
| ML Project Type | Phase 1 | Phase 2 | Phase 3 |
|---|---|---|---|
| Image Classifier | Data + augmentation (cnn-vision, ml-fundamentals) | Model + training (pytorch-mastery, deep-learning-core) | Evaluation + Grad-CAM (model-interpretability) |
| RAG System | Data pipeline + chunking (data-pipeline) | Vector store + retrieval (rag-retrieval) | LLM integration + eval (transformers-llm) |
| Fine-tune LLM | Data preparation (data-pipeline, transformers-llm) | LoRA/QLoRA training (fine-tuning-peft) | Evaluation + deployment (mlops-experiment) |
| Text Classifier | Data + EDA (ml-fundamentals, nlp-classical) | Model selection + training (transformers-llm) | Evaluation + interpretability (model-interpretability) |
| Recommender | Data + features (ml-fundamentals) | Matrix Factorization / NeuMF (ml-advanced) | Evaluation + A/B setup (mlops-experiment) |
| RL Agent | Environment setup (reinforcement-learning) | Algorithm + training (pytorch-mastery) | Evaluation + logging (mlops-experiment) |
Agent-Architect Integration
When building ML/DL agent systems through agent-architect within GSD:
- Phase 2 (Tools): Suggest ML custom MCP tools — model inference, evaluation metrics, data validation
- Phase 2 (Agents): Use ML domain prompts — "Senior ML Engineer", "Data Quality Analyst"
- Phase 3 (Orchestration): Define ML-specific workflows — data prep → train → evaluate → report
- Phase 4 (Guardrails): ML-specific — input validation, model versioning, drift detection, output confidence thresholds