ML Engineering Guide

A guide to building production-grade ML/AI systems, covering MLOps pipelines, model deployment, and monitoring.

When to Use

  • Deploying ML models to production
  • Building ML platforms and infrastructure
  • Implementing MLOps pipelines
  • Integrating LLMs into production systems
  • Setting up model monitoring and drift detection

Tech Stack

| Category       | Tools                                       |
|----------------|---------------------------------------------|
| ML Frameworks  | PyTorch, TensorFlow, Scikit-learn, XGBoost  |
| LLM Frameworks | LangChain, LlamaIndex, DSPy                 |
| Data Tools     | Spark, Airflow, dbt, Kafka, Databricks      |
| Deployment     | Docker, Kubernetes, AWS/GCP/Azure           |
| Monitoring     | MLflow, Weights & Biases, Prometheus        |
| Databases      | PostgreSQL, BigQuery, Snowflake, Pinecone   |

Production Patterns

Model Deployment Pipeline

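# Model serving with FastAPI
from fastapi import FastAPI
import torch

app = FastAPI()
# Loads a fully pickled model. PyTorch >= 2.6 defaults to weights_only=True,
# so pass weights_only=False only for checkpoints you trust.
model = torch.load("model.pth", weights_only=False)
model.eval()  # inference mode: disables dropout and batch-norm updates

def preprocess(data: dict) -> torch.Tensor:
    # Placeholder: assumes a payload shaped like {"features": [[...], ...]}
    return torch.tensor(data["features"], dtype=torch.float32)

@app.post("/predict")
def predict(data: dict):
    # Plain def: FastAPI runs it in a threadpool, so the blocking
    # forward pass does not stall the event loop
    tensor = preprocess(data)
    with torch.no_grad():
        prediction = model(tensor)
    return {"prediction": prediction.tolist()}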
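
Accepting a raw `dict` lets malformed payloads reach the model. The sketch below shows one way to validate a request before building a tensor. It assumes a payload shaped like `{"features": [[...], ...]}`; the `validate_payload` helper and the `N_FEATURES` constant are hypothetical and not part of the original service.

```python
from typing import Any

N_FEATURES = 4  # hypothetical input width; set this to the model's real input size


def validate_payload(data: dict[str, Any]) -> list[list[float]]:
    """Return the feature rows as floats, or raise ValueError with a reason."""
    rows = data.get("features")
    if not isinstance(rows, list) or not rows:
        raise ValueError("'features' must be a non-empty list of rows")
    cleaned = []
    for i, row in enumerate(rows):
        if not isinstance(row, list) or len(row) != N_FEATURES:
            raise ValueError(f"row {i} must contain exactly {N_FEATURES} values")
        try:
            cleaned.append([float(x) for x in row])
        except (TypeError, ValueError):
            raise ValueError(f"row {i} contains a non-numeric value") from None
    return cleaned
```

In a FastAPI handler, the `ValueError` could be translated into an `HTTPException` with status 422 so clients get a clear error instead of a 500.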

Feature Store Integration

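# Feast feature store: fetch online features for one entity at serving time
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # directory containing feature_store.yaml
features = store.get_online_features(
    # Feature references use the "<feature_view>:<feature>" format
    features=["user_features:age", "user_features:location"],
    # One dict per entity; keys must match the entity's join key
    entity_rows=[{"user_id": 123}],
).to_dict()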
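
Model code usually wants one record per entity rather than one list per feature. The sketch below pivots a column-oriented result into rows, assuming `to_dict()` returns a mapping from feature name to a list with one value per entity row and `None` for missing values. The `columns_to_rows` helper and the sample values are invented for illustration.

```python
def columns_to_rows(columns: dict[str, list]) -> list[dict]:
    """Pivot {"name": [v0, v1, ...]} into [{"name": v0}, {"name": v1}, ...]."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all feature columns must have the same length")
    n_rows = lengths.pop() if lengths else 0
    return [{name: values[i] for name, values in columns.items()} for i in range(n_rows)]


# Hypothetical response for two entity rows (values invented for illustration)
sample = {"user_id": [123, 456], "age": [34, None], "location": ["NYC", "SF"]}
rows = columns_to_rows(sample)
```

Missing features stay as `None` here, so the caller decides whether to impute a default or reject the request.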

Model Monitoring

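# Drift detection with Evidently (legacy Report API; import paths changed in later releases)
# ref_df and curr_df are pandas DataFrames with the same columns:
# ref_df holds training-time data, curr_df holds recent production data
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=ref_df, current_data=curr_df)
report.save_html("drift_report.html")  # write the report to disk for review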
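
To make the idea of a drift score concrete, here is a minimal standard-library sketch of the Population Stability Index (PSI) for one numeric feature. It is an illustration of one common drift statistic, not necessarily the test Evidently applies; the `psi` function name and the binning choices are assumptions.

```python
import math


def psi(reference: list[float], current: list[float], bins: int = 10) -> float:
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant features

    def proportions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Clamp values outside the reference range into the edge bins
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps log() defined when a bin is empty
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))
```

A commonly cited rule of thumb treats PSI below 0.1 as stable, 0.1 to 0.25 as moderate shift, and above 0.25 as significant shift; thresholds should still be tuned per feature.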

MLOps Best Practices

Development

  • Test-driven development for ML pipelines
  • Version control models and data
  • Reproducible experiments with MLflow
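
Versioning data can start with a content hash recorded next to each experiment run, so a result can be traced back to the exact rows it was trained on. The following is a minimal sketch under that assumption; the `fingerprint` helper is hypothetical and dedicated data-versioning tools do considerably more.

```python
import hashlib
import json


def fingerprint(rows: list[dict]) -> str:
    """SHA-256 over rows, independent of key order within each row."""
    digest = hashlib.sha256()
    for row in rows:
        # sort_keys makes the serialization deterministic
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()
```

The resulting hex string could be logged with the run, for example through `mlflow.log_param`, so two runs with different fingerprints are known to have seen different data.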

Production

  • A/B testing infrastructure
  • Canary deployments for models
  • Automated retraining pipelines
  • Model monitoring and drift detection
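
Canary deployments and A/B tests both need a stable way to decide which model version serves a given user. One possible approach, sketched below, hashes the user ID into a bucket so the same user always lands on the same variant. The `route_model` function, the salt value, and the variant names are assumptions for illustration.

```python
import hashlib


def route_model(user_id: str, canary_percent: float, salt: str = "canary-v1") -> str:
    """Return "canary" for roughly canary_percent of users, else "stable"."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in [0, 10000)
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return "canary" if bucket < canary_percent * 100 else "stable"
```

Changing the salt reshuffles the assignment, which helps avoid exposing the same users to every experiment.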

Performance Targets

| Metric       | Target     |
|--------------|------------|
| P50 Latency  | < 50 ms    |
| P95 Latency  | < 100 ms   |
| P99 Latency  | < 200 ms   |
| Throughput   | > 1000 RPS |
| Availability | 99.9%      |
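
The latency targets can be checked against recorded request timings. The sketch below uses the nearest-rank percentile definition and the P50/P95/P99 limits from the table; the function names are hypothetical, and production systems would normally read these values from a metrics backend such as Prometheus rather than recompute them.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(math.ceil(pct * len(ordered) / 100), 1)
    return ordered[rank - 1]


TARGETS_MS = {50: 50.0, 95: 100.0, 99: 200.0}  # limits taken from the targets table


def check_latency(samples_ms: list[float]) -> dict[int, bool]:
    """Map each percentile to True when it is strictly under its target."""
    return {p: percentile(samples_ms, p) < limit for p, limit in TARGETS_MS.items()}
```

Because the targets are strict inequalities, a P50 of exactly 50 ms counts as a miss.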

LLM Integration Patterns

RAG System

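# Basic RAG with LangChain
# These are legacy import paths; recent LangChain releases move these
# classes into the langchain-pinecone and langchain-openai packages.
from langchain.vectorstores import Pinecone
from langchain.embeddings import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# Reads OPENAI_API_KEY from the environment
llm = ChatOpenAI(temperature=0)

# Connect to an index that already contains embedded documents
vectorstore = Pinecone.from_existing_index(
    index_name="docs",
    embedding=OpenAIEmbeddings()
)
# Retrieve relevant chunks, then let the LLM answer from them
qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever()
)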
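
The retriever's job is a nearest-neighbor search over embeddings. As a rough illustration of that step only, the sketch below ranks documents by cosine similarity using the standard library. The three-dimensional vectors, document IDs, and function names are invented; a real system would use a vector database and model-generated embeddings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the k document ids most similar to query_vec, best first."""
    ranked = sorted(index, key=lambda doc_id: cosine(query_vec, index[doc_id]), reverse=True)
    return ranked[:k]


# Toy "embeddings", invented for illustration
index = {
    "pricing": [0.9, 0.1, 0.0],
    "refunds": [0.7, 0.6, 0.1],
    "careers": [0.0, 0.1, 0.9],
}
top = retrieve([1.0, 0.2, 0.0], index)
```

The retrieved passages are then placed into the prompt alongside the question, which is the part the chain handles.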

Prompt Management

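# Structured prompts with DSPy
import dspy

# A language model must be configured before calling a predictor,
# for example with dspy.configure(lm=...)

class QA(dspy.Signature):
    """Answer questions based on context."""
    context = dspy.InputField()
    question = dspy.InputField()
    answer = dspy.OutputField()

qa = dspy.Predict(QA)
# Call with the input fields as keyword arguments; the result exposes .answer
# result = qa(context="...", question="...")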
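
For teams not using DSPy, prompt management can be as simple as keeping templates in a versioned registry instead of scattering string literals through the code. The sketch below shows that idea with the standard library; the `PROMPTS` registry, its keys, and the template text are hypothetical.

```python
from string import Template

# Hypothetical registry keyed by (name, version); contents are illustrative
PROMPTS = {
    ("qa", "v1"): Template("Context:\n$context\n\nQuestion: $question\nAnswer:"),
}


def render(name: str, version: str, **fields: str) -> str:
    """Fill a registered template; raises KeyError for an unknown prompt or a missing field."""
    return PROMPTS[(name, version)].substitute(**fields)
```

Pinning a version at each call site makes prompt changes explicit and lets old and new prompts be compared side by side.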

Common Commands

# Development
python -m pytest tests/ -v --cov   # --cov needs the pytest-cov plugin
python -m black src/
python -m pylint src/

# Training
python scripts/train.py --config prod.yaml
mlflow run . -P epochs=10          # expects an MLproject file in the current directory

# Deployment
docker build -t model:v1 .
kubectl apply -f k8s/model-serving.yaml

# Monitoring
mlflow ui --port 5000

Security & Compliance

  • Authentication for model endpoints
  • Data encryption (at rest & in transit)
  • PII handling and anonymization
  • GDPR/CCPA compliance
  • Model access audit logging
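
Two of these items lend themselves to a short illustration: keeping raw PII out of logs and features, and comparing endpoint credentials safely. The sketch below uses a keyed hash and a constant-time comparison from the standard library. Both function names are hypothetical, and key storage and rotation are out of scope.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret: bytes) -> str:
    """Keyed SHA-256 of a PII value: stable for joins, recomputable only with the key."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()


def is_authorized(presented_key: str, expected_key: str) -> bool:
    """Constant-time comparison, so timing does not reveal how much of the key matched."""
    return hmac.compare_digest(presented_key.encode("utf-8"), expected_key.encode("utf-8"))
```

Note that pseudonymized data is still treated as personal data under GDPR, so this reduces exposure without removing compliance obligations.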

Repository: eyadsibai/ltk
First seen: Jan 28, 2026