# ML Engineering Guide

Production-grade ML/AI systems, MLOps, and model deployment.
## When to Use
- Deploying ML models to production
- Building ML platforms and infrastructure
- Implementing MLOps pipelines
- Integrating LLMs into production systems
- Setting up model monitoring and drift detection
## Tech Stack
| Category | Tools |
|---|---|
| ML Frameworks | PyTorch, TensorFlow, Scikit-learn, XGBoost |
| LLM Frameworks | LangChain, LlamaIndex, DSPy |
| Data Tools | Spark, Airflow, dbt, Kafka, Databricks |
| Deployment | Docker, Kubernetes, AWS/GCP/Azure |
| Monitoring | MLflow, Weights & Biases, Prometheus |
| Databases | PostgreSQL, BigQuery, Snowflake, Pinecone |
## Production Patterns

### Model Deployment Pipeline
```python
# Model serving with FastAPI
from fastapi import FastAPI
import torch

app = FastAPI()
model = torch.load("model.pth")
model.eval()  # disable dropout/batch-norm training behavior for inference

@app.post("/predict")
async def predict(data: dict):
    tensor = preprocess(data)  # preprocess: your feature-to-tensor transform
    with torch.no_grad():
        prediction = model(tensor)
    return {"prediction": prediction.tolist()}
```
### Feature Store Integration
```python
# Feast online feature lookup
from feast import FeatureStore

store = FeatureStore(repo_path=".")
features = store.get_online_features(
    features=["user_features:age", "user_features:location"],
    entity_rows=[{"user_id": 123}],
).to_dict()
```
### Model Monitoring
```python
# Drift detection with Evidently
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=ref_df, current_data=curr_df)
```
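To make the idea behind drift detection concrete, here is a hand-rolled Population Stability Index (PSI) for a single numeric feature, a stdlib-only sketch rather than Evidently's method; the 10-bucket binning and the 0.2 "significant drift" threshold are common rules of thumb, not library defaults.

```python
# Minimal Population Stability Index (PSI) sketch for one numeric feature.
# Illustrative only; binning and thresholds are conventions, not library defaults.
import math

def psi(reference, current, buckets=10):
    """PSI between two samples, using quantile bins from the reference."""
    ref_sorted = sorted(reference)
    # Quantile cut points taken from the reference distribution
    cuts = [ref_sorted[int(len(ref_sorted) * i / buckets)] for i in range(1, buckets)]

    def bucket_fracs(values):
        counts = [0] * buckets
        for v in values:
            idx = sum(v > c for c in cuts)  # index of the bin v falls into
            counts[idx] += 1
        # Smooth zero counts so the log term stays finite
        return [max(c, 1) / len(values) for c in counts]

    ref_f, cur_f = bucket_fracs(reference), bucket_fracs(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_f, cur_f))

reference = [i / 100 for i in range(1000)]
drifted = [v + 5.0 for v in reference]  # large shift -> large PSI
print(psi(reference, reference), psi(reference, drifted))
```

Identical samples score near 0; a shifted distribution scores well above the usual 0.2 alert threshold. In production you would compute this per feature over a sliding window and page when it trips.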
## MLOps Best Practices

### Development
- Test-driven development for ML pipelines
- Version control models and data
- Reproducible experiments with MLflow
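Versioning data alongside code can start as simple content-addressing: the stdlib-only sketch below fingerprints a dataset file so a training run can record exactly which bytes it saw (tools like DVC do this at scale; the `train.csv` filename is illustrative).

```python
# Content-hash a dataset so experiment runs can pin the exact data version.
# Stdlib-only sketch; DVC or lakeFS provide this at scale.
import hashlib
from pathlib import Path

def dataset_fingerprint(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's bytes, read in chunks to handle large files."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()[:12]  # short id, usable as an MLflow tag

# Example: write a tiny file and fingerprint it.
Path("train.csv").write_text("user_id,age\n123,42\n")
print(dataset_fingerprint("train.csv"))
```

Logging this fingerprint as a run tag makes "which data trained this model?" answerable months later.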
### Production
- A/B testing infrastructure
- Canary deployments for models
- Automated retraining pipelines
- Model monitoring and drift detection
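The canary idea above can be sketched as a deterministic traffic splitter; the 5% split and the `route` helper are illustrative, not a framework API.

```python
# Canary routing sketch: send a fixed fraction of requests to the candidate
# model. Hashing the request id keeps assignment sticky across retries,
# unlike random sampling.
import hashlib

def route(request_id: str, canary_fraction: float = 0.05) -> str:
    """Deterministically assign a request to 'canary' or 'stable'."""
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return "canary" if bucket < canary_fraction * 100 else "stable"

counts = {"stable": 0, "canary": 0}
for i in range(10_000):
    counts[route(f"req-{i}")] += 1
print(counts)  # roughly a 95/5 split
```

Promotion then becomes a config change: raise `canary_fraction` as the candidate's metrics hold, or drop it to 0 to roll back.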
## Performance Targets
| Metric | Target |
|---|---|
| P50 Latency | < 50ms |
| P95 Latency | < 100ms |
| P99 Latency | < 200ms |
| Throughput | > 1000 RPS |
| Availability | 99.9% |
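Checking these targets against recorded timings needs nothing beyond the stdlib; the sample latencies below are made up for illustration.

```python
# Computing latency percentiles from a window of request timings (stdlib only).
import statistics

latencies_ms = [12, 18, 25, 30, 35, 42, 55, 61, 78, 95, 110, 140]  # sample window

def percentile(samples, p):
    """p-th percentile via statistics.quantiles (inclusive interpolation)."""
    return statistics.quantiles(samples, n=100, method="inclusive")[p - 1]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
print(p50, p95, p99)
```

In a real service these come from a histogram in Prometheus rather than raw samples, but the interpretation is the same: compare each percentile against its SLO and alert on breach.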
## LLM Integration Patterns

### RAG System
```python
# Basic RAG with LangChain
from langchain.vectorstores import Pinecone
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import RetrievalQA

vectorstore = Pinecone.from_existing_index(
    index_name="docs",
    embedding=OpenAIEmbeddings(),
)
qa = RetrievalQA.from_chain_type(
    llm=llm,  # any LangChain-compatible LLM, e.g. ChatOpenAI()
    retriever=vectorstore.as_retriever(),
)
```
### Prompt Management
```python
# Structured prompts with DSPy
import dspy

class QA(dspy.Signature):
    """Answer questions based on context."""
    context = dspy.InputField()
    question = dspy.InputField()
    answer = dspy.OutputField()

qa = dspy.Predict(QA)
```
## Common Commands
```shell
# Development
python -m pytest tests/ -v --cov
python -m black src/
python -m pylint src/

# Training
python scripts/train.py --config prod.yaml
mlflow run . -P epochs=10

# Deployment
docker build -t model:v1 .
kubectl apply -f k8s/model-serving.yaml

# Monitoring
mlflow ui --port 5000
```
## Security & Compliance
- Authentication for model endpoints
- Data encryption (at rest & in transit)
- PII handling and anonymization
- GDPR/CCPA compliance
- Model access audit logging
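PII pseudonymization from the list above can be sketched with a keyed hash, so records stay joinable without exposing raw identifiers; the key value and field names are illustrative, and a real deployment also needs a key-rotation policy.

```python
# PII pseudonymization sketch: replace identifiers with keyed hashes so
# records remain joinable without exposing the raw value.
import hashlib
import hmac

SECRET_KEY = b"rotate-me"  # illustrative; store in a secrets manager, not in code

def pseudonymize(value: str) -> str:
    """HMAC-SHA256 so the mapping can't be rebuilt without the key."""
    return hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

record = {"email": "a@example.com", "age": 42}
safe = {**record, "email": pseudonymize(record["email"])}
print(safe)
```

A plain unkeyed hash would be reversible by brute force over known emails; the HMAC key is what makes the mapping one-way for anyone without it.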
Repository: eyadsibai/ltk (first seen Jan 28, 2026)