example-projects
ML Training Example Projects
Purpose: Provide complete, runnable example projects demonstrating ML training workflows from data preparation through deployment.
Activation Triggers:
- User requests example projects or starter templates
- User wants to see working sentiment classification code
- User needs text generation training examples
- User mentions RedAI trade classifier
- User wants reference implementations
- User needs to understand complete training workflows
Key Resources:
- scripts/setup-example.sh - Initialize and set up any example project
- scripts/run-training.sh - Execute training for any example
- scripts/test-inference.sh - Test trained models
- examples/sentiment-classification/ - Binary sentiment classification (IMDB-style)
- examples/text-generation/ - GPT-style text generation with LoRA
- examples/redai-trade-classifier/ - Financial trade classification with Modal deployment
- templates/ - Scaffolding for new projects
Available Example Projects
1. Sentiment Classification
Use Case: Binary sentiment analysis (positive/negative reviews)
Features:
- DistilBERT fine-tuning for text classification
- Custom dataset loading from JSON
- Training with validation metrics
- Model saving and inference
- Production-ready inference API
Files:
- train.py - Complete training script
- data.json - Sample training data (50 examples)
- inference.py - Inference server
- README.md - Setup and usage guide
Dataset Format:
{"text": "This movie was amazing!", "label": 1}
{"text": "Terrible waste of time", "label": 0}
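The two records above are shown one JSON object per line. The following is a minimal sketch of a loader for that layout, assuming the data file is in JSON Lines form (if data.json is instead a single JSON array, json.load on the whole file would be the right call). The helper name parse_records is hypothetical and is not taken from train.py.

```python
import json


def parse_records(lines):
    """Parse an iterable of JSON Lines strings into (texts, labels)."""
    texts, labels = [], []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        label = record["label"]
        if label not in (0, 1):
            raise ValueError(f"line {line_number}: label must be 0 or 1, got {label!r}")
        texts.append(record["text"])
        labels.append(label)
    return texts, labels


def load_data(path):
    """Read a JSON Lines file from disk (assumed layout, see above)."""
    with open(path, encoding="utf-8") as handle:
        return parse_records(handle)


sample = [
    '{"text": "This movie was amazing!", "label": 1}',
    '{"text": "Terrible waste of time", "label": 0}',
]
texts, labels = parse_records(sample)
print(texts, labels)
```

Rejecting labels outside 0 and 1 at load time surfaces format mistakes before training starts rather than as a shape or loss error later.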
2. Text Generation
Use Case: Fine-tune GPT-2 for custom text generation
Features:
- GPT-2 small model fine-tuning
- LoRA (Low-Rank Adaptation) for efficient training
- Custom tokenization
- Generation with temperature/top-p sampling
- Modal deployment configuration
Files:
- train.py - LoRA training script
- config.yaml - Hyperparameters and model config
- generate.py - Text generation script
- modal_deploy.py - Modal deployment
- README.md - Complete guide
Config Structure:
model:
  name: gpt2
  max_length: 512
training:
  epochs: 3
  batch_size: 4
  learning_rate: 2e-4
lora:
  r: 8
  alpha: 16
  dropout: 0.1
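To make the lora values concrete: LoRA adds two low-rank matrices per adapted weight, with shapes (r × d_in) and (d_out × r), and scales their product by alpha / r. The sketch below works through that arithmetic for r = 8 and alpha = 16. It assumes adapters are applied only to the fused attention projection (768 → 2304) in each of GPT-2 small's 12 blocks; which modules train.py actually targets is not stated here, so the layer choice is an assumption.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


r, alpha = 8, 16            # values from the lora section of the config
scaling = alpha / r         # factor applied to the low-rank update

# Assumption: adapt the fused query/key/value projection of GPT-2 small.
d_in, d_out, n_layers = 768, 3 * 768, 12

per_layer = lora_param_count(d_in, d_out, r)
total = per_layer * n_layers
print(f"scaling={scaling}, per_layer={per_layer}, total={total}")
```

Under these assumptions the adapters add a few hundred thousand trainable parameters, a small fraction of the base model, which is why LoRA training fits in less memory than full fine-tuning.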
3. RedAI Trade Classifier
Use Case: Financial trade classification (buy/sell/hold)
Features:
- Multi-class classification for trading signals
- Feature engineering from market data
- Class imbalance handling
- Modal deployment for production inference
- Real-time prediction API
Files:
- train.py - Training with class weighting
- modal_deploy.py - Complete Modal deployment
- data_preprocessing.py - Feature engineering
- README.md - Trading strategy guide
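The class weighting mentioned for train.py can be illustrated with inverse-frequency ("balanced") weights, w_c = N / (K · n_c), where N is the number of samples, K the number of classes, and n_c the count for class c. Whether train.py uses this exact formula is an assumption; the function name balanced_class_weights is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels, num_classes):
    """Inverse-frequency weights: rarer classes get proportionally larger weights."""
    counts = Counter(labels)
    total = len(labels)
    weights = []
    for cls in range(num_classes):
        if counts[cls] == 0:
            raise ValueError(f"class {cls} has no samples")
        weights.append(total / (num_classes * counts[cls]))
    return weights


# Toy label set using the 0 = sell, 1 = hold, 2 = buy encoding, dominated by "hold".
labels = [1] * 6 + [0] * 2 + [2] * 2
weights = balanced_class_weights(labels, num_classes=3)
print(weights)
```

The resulting list can be converted to a tensor and passed as the weight argument of a weighted loss such as torch.nn.CrossEntropyLoss, so that errors on the rare buy and sell classes count more than errors on hold.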
Model Input:
- Price features (open, high, low, close)
- Volume indicators
- Technical indicators (RSI, MACD, moving averages)
- Sentiment scores
Quick Start
Setup Any Example
# Initialize example project
./scripts/setup-example.sh <project-name>
# Options: sentiment-classification, text-generation, redai-trade-classifier
./scripts/setup-example.sh sentiment-classification
What it does:
- Creates project directory
- Copies example files
- Installs dependencies
- Downloads/prepares sample data
- Validates environment
Run Training
# Train model for any example
./scripts/run-training.sh <project-name>
# Examples:
./scripts/run-training.sh sentiment-classification
./scripts/run-training.sh text-generation
./scripts/run-training.sh redai-trade-classifier
Monitors:
- Training progress
- Loss curves
- Validation metrics
- GPU utilization
- Checkpoint saving
Test Inference
# Test trained model
./scripts/test-inference.sh <project-name> <input>
# Examples:
./scripts/test-inference.sh sentiment-classification "This product is great!"
./scripts/test-inference.sh text-generation "Once upon a time"
./scripts/test-inference.sh redai-trade-classifier market_data.json
Common Workflows
Start From Example Template
- Choose an example based on your use case:
  - Classification → sentiment-classification
  - Generation → text-generation
  - Financial ML → redai-trade-classifier
- Set up the project:
  ./scripts/setup-example.sh <example-name>
- Customize for your data:
  - Update data loading in train.py
  - Modify the model architecture if needed
  - Adjust hyperparameters in config
- Run training:
  ./scripts/run-training.sh <example-name>
- Deploy:
  - Local: Use inference.py
  - Production: Use modal_deploy.py
Extend Example with Custom Data
- Prepare data in example format
- Replace data files (data.json, config.yaml)
- Update preprocessing if needed
- Train with same script
- Test inference with new data
Deploy Example to Production
All examples include Modal deployment:
# Deploy to Modal
cd examples/<project-name>
modal deploy modal_deploy.py
# The endpoint URL is printed when the deploy finishes.
# List deployed apps
modal app list
Example Comparison
| Feature | Sentiment | Text Gen | Trade Classifier |
|---|---|---|---|
| Task Type | Binary Classification | Generation | Multi-class |
| Model | DistilBERT | GPT-2 + LoRA | Custom Transformer |
| Training Time | 5-10 min | 15-30 min | 10-20 min |
| GPU Required | Optional | Recommended | Required |
| Modal Deploy | ✅ | ✅ | ✅ |
| Custom Data | Easy | Moderate | Advanced |
Customization Guide
Sentiment Classification
Change dataset:
# In train.py, update load_data()
def load_data(path):
    # Your custom loading logic: fill texts (list of str) and labels (list of int)
    texts, labels = [], []
    return texts, labels
Change model:
# Replace DistilBERT with other models
model_name = "bert-base-uncased" # or roberta-base, etc.
Text Generation
Change generation style:
# In config.yaml
generation:
  temperature: 0.8  # Higher = more creative
  top_p: 0.9        # Nucleus sampling
  max_length: 200   # Output length
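How temperature and top_p interact can be shown on a toy distribution. The sketch below divides the logits by the temperature, applies softmax, then keeps the smallest set of highest-probability tokens whose cumulative mass reaches top_p and renormalizes. It is a plain-Python illustration of the idea, not the code path generate.py or the Transformers library uses; the function names are hypothetical.

```python
import math
import random


def top_p_filter(logits, temperature=0.8, top_p=0.9):
    """Return {token_index: probability} for the nucleus after temperature scaling."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    norm = sum(exps)
    probs = [e / norm for e in exps]

    # Walk tokens from most to least likely until the mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in ranked:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


def sample_token(logits, temperature=0.8, top_p=0.9, rng=random):
    """Draw one token index from the renormalized nucleus."""
    nucleus = top_p_filter(logits, temperature, top_p)
    indices = list(nucleus)
    return rng.choices(indices, weights=[nucleus[i] for i in indices], k=1)[0]


logits = [2.0, 1.0, 0.5, -1.0]
nucleus = top_p_filter(logits)
print(nucleus)
print(sample_token(logits, rng=random.Random(0)))
```

With these toy logits the least likely token falls outside the nucleus and can never be sampled; raising top_p toward 1.0 or raising the temperature lets more of the tail back in.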
Add custom prompts:
# In generate.py
prompts = [
    "Your custom prompt here",
    "Another prompt",
]
Trade Classifier
Add features:
# In data_preprocessing.py
def engineer_features(df):
    df['rsi'] = calculate_rsi(df['close'])
    df['macd'] = calculate_macd(df['close'])
    # Add your custom indicators
    return df
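The body of calculate_rsi is not shown above. As an illustration of what such an indicator computes, here is a dependency-free sketch of RSI over a list of closing prices. It uses simple averages of gains and losses over the window rather than Wilder's smoothing, returns a single value for the latest bar instead of a pandas column, and treats a completely flat window as 50; all three choices are assumptions, and the name rsi_from_closes is hypothetical.

```python
def rsi_from_closes(closes, period=14):
    """RSI over the last `period` price changes using simple averages.

    Returns None when fewer than period + 1 prices are available.
    """
    if len(closes) < period + 1:
        return None
    window = closes[-(period + 1):]
    changes = [b - a for a, b in zip(window, window[1:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        # No losses in the window: saturate at 100, or 50 if prices were flat.
        return 100.0 if avg_gain > 0 else 50.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


rising = [float(p) for p in range(100, 116)]   # 16 strictly rising closes
mixed = [10.0, 11.0, 10.0, 11.0, 10.0]         # equal gains and losses
print(rsi_from_closes(rising), rsi_from_closes(mixed, period=4))
```

For production features, a maintained implementation such as the one in the ta package listed under Dependencies is likely the better choice; the sketch is only meant to make the formula visible.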
Change strategy:
# Update labels in train.py
# 0 = sell, 1 = hold, 2 = buy
labels = your_strategy(prices, indicators)
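One simple stand-in for your_strategy is to label each bar by its forward return: buy when the price rises by more than a threshold over the next few bars, sell when it falls by more than the threshold, and hold otherwise. The sketch below assumes that rule; the function name, the horizon, and the 1% threshold are hypothetical and not taken from train.py.

```python
SELL, HOLD, BUY = 0, 1, 2  # label encoding from the comment above


def label_by_forward_return(prices, horizon=1, threshold=0.01):
    """Label each bar by the return `horizon` steps ahead.

    Bars without a full look-ahead window get no label, so the result
    has len(prices) - horizon entries.
    """
    labels = []
    for i in range(len(prices) - horizon):
        forward_return = prices[i + horizon] / prices[i] - 1.0
        if forward_return > threshold:
            labels.append(BUY)
        elif forward_return < -threshold:
            labels.append(SELL)
        else:
            labels.append(HOLD)
    return labels


prices = [100.0, 102.0, 101.9, 99.0, 99.0]
print(label_by_forward_return(prices))
```

Because the labels look into the future, the features paired with bar i should use only data available up to bar i; otherwise the classifier is trained on information it will not have at prediction time.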
Dependencies
Each example includes its own requirements.txt:
Sentiment Classification:
- transformers
- torch
- datasets
- scikit-learn
Text Generation:
- transformers
- peft (LoRA)
- torch
- modal (deployment)
Trade Classifier:
- transformers
- pandas
- numpy
- modal
- ta (technical analysis)
Troubleshooting
Training Fails
Issue: Out of memory
Fix: Reduce batch size in config
Issue: CUDA not available
Fix: Run on CPU, or install a CUDA-enabled PyTorch build and a matching GPU driver
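A quick way to confirm which device training will use is to ask PyTorch directly. The helper below is a small sketch, assuming the training scripts accept a device string; the name pick_device is hypothetical. It falls back to the CPU when no GPU is visible or when PyTorch is not installed.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch missing: nothing to place on a GPU anyway
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"training on {device}")
```

If this prints cpu on a machine that has a GPU, the installed PyTorch build is probably CPU-only or the driver is not visible to it.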
Inference Errors
Issue: Model not found
Fix: Check checkpoint path in inference script
Issue: Wrong input format
Fix: Validate input matches training data format
Deployment Issues
Issue: Modal authentication
Fix: Run modal token new to authenticate
Issue: Dependency conflicts
Fix: Use exact versions from requirements.txt
Resources
Scripts: All scripts are in scripts/ with execution permissions
Examples: Complete projects in examples/ directory
Templates: Scaffolding in templates/ for creating new projects
Documentation: Each example has detailed README.md
Supported Frameworks: PyTorch, Transformers, PEFT
Deployment Platforms: Modal, Local, FastAPI
Version: 1.0.0