ML Training Cost Calculator

Purpose: Provide production-ready cost estimation tools for ML training and inference across cloud GPU platforms (Modal, Lambda Labs, RunPod).

Activation Triggers:

Estimating training costs for ML models
Comparing GPU platform pricing
Calculating GPU hours for training jobs
Budgeting for ML projects
Optimizing inference costs
Evaluating cost-effectiveness of different GPU types
Planning resource allocation

Key Resources:

scripts/estimate-training-cost.sh - Calculate training costs based on model size, data, GPU type
scripts/estimate-inference-cost.sh - Estimate inference costs for production workloads
scripts/calculate-gpu-hours.sh - Convert training parameters to GPU hours
scripts/compare-platforms.sh - Compare costs across Modal, Lambda, RunPod
templates/cost-breakdown.json - Structured cost breakdown template
templates/platform-pricing.yaml - Up-to-date platform pricing data
examples/training-cost-estimate.md - Example training cost calculation
examples/inference-cost-estimate.md - Example inference cost analysis

Platform Pricing Overview

Modal (Serverless - Pay Per Second)

GPU Options:

T4: $0.000164/sec ($0.59/hr) - Development, small models
L4: $0.000222/sec ($0.80/hr) - Cost-effective training
A10: $0.000306/sec ($1.10/hr) - Mid-range training
A100 40GB: $0.000583/sec ($2.10/hr) - Large model training
A100 80GB: $0.000694/sec ($2.50/hr) - Very large models
H100: $0.001097/sec ($3.95/hr) - Cutting-edge training
H200: $0.001261/sec ($4.54/hr) - Latest generation
B200: $0.001736/sec ($6.25/hr) - Maximum performance
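
The hourly figures above are the per-second rates multiplied by 3,600. A minimal sketch of that conversion, assuming `awk` is available for the floating-point arithmetic; the function name `per_sec_to_hourly` is illustrative and is not one of the bundled scripts:

```shell
# Convert a per-second GPU rate (USD) to an hourly rate, rounded to cents.
# per_sec_to_hourly is a hypothetical helper, not part of scripts/.
per_sec_to_hourly() {
  awk -v rate="$1" 'BEGIN { printf "%.2f\n", rate * 3600 }'
}

per_sec_to_hourly 0.000164   # T4        -> 0.59
per_sec_to_hourly 0.000583   # A100 40GB -> 2.10
```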

Free Credits:

Starter: $30/month free
Startup credits: Up to $50,000 for eligible startups

Lambda Labs (On-Demand Hourly)

Single GPU:

1x A10: $0.31/hr - Cheapest single GPU option
1x V100 16GB: $0.55/hr - Same per-GPU rate as the 8x V100 cluster

8x GPU Clusters:

8x V100: $4.40/hr ($0.55/GPU) - Most affordable multi-GPU
8x A100 40GB: $10.32/hr ($1.29/GPU)
8x A100 80GB: $14.32/hr ($1.79/GPU)
8x H100: $23.92/hr ($2.99/GPU)
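
The per-GPU rates shown for the 8x clusters are the cluster price divided by the GPU count, which makes them directly comparable with single-GPU prices on the other platforms. A small sketch, with `per_gpu_rate` as an illustrative helper name:

```shell
# Per-GPU hourly rate for a multi-GPU instance.
# Arguments: cluster price per hour (USD), number of GPUs.
per_gpu_rate() {
  awk -v price="$1" -v gpus="$2" 'BEGIN { printf "%.2f\n", price / gpus }'
}

per_gpu_rate 10.32 8   # 8x A100 40GB -> 1.29
per_gpu_rate 23.92 8   # 8x H100      -> 2.99
```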

RunPod (Serverless - Pay Per Minute)

Key Features:

Pay-per-minute billing
FlashBoot <200ms cold-starts
Zero egress fees on storage
30+ GPU SKUs available

Cost Estimation Scripts

1. Estimate Training Cost

Script: scripts/estimate-training-cost.sh

Usage:

bash scripts/estimate-training-cost.sh \
  --model-size 7B \
  --dataset-size 10000 \
  --epochs 3 \
  --gpu t4 \
  --platform modal

Parameters:

--model-size: Model size (125M, 350M, 1B, 3B, 7B, 13B, 70B)
--dataset-size: Number of training samples
--epochs: Number of training epochs
--batch-size: Training batch size (default: auto-calculated)
--gpu: GPU type (t4, a10, a100-40gb, a100-80gb, h100)
--platform: Cloud platform (modal, lambda, runpod)
--peft: Use PEFT/LoRA (yes/no, default: no)
--mixed-precision: Use FP16/BF16 (yes/no, default: yes)

Output:

{
  "model": "7B",
  "dataset_size": 10000,
  "epochs": 3,
  "gpu": "T4",
  "platform": "Modal",
  "estimated_hours": 4.2,
  "cost_breakdown": {
    "compute_cost": 2.48,
    "storage_cost": 0.05,
    "total_cost": 2.53
  },
  "cost_optimizations": {
    "with_peft": 1.26,
    "savings_percentage": 50
  },
  "alternative_platforms": {
    "lambda_a10": 1.30,
    "runpod_t4": 2.52
  }
}

Calculation Methodology:

Estimates tokens per sample (avg 500 tokens)
Calculates total training tokens
Applies throughput rates per GPU type
Accounts for PEFT (90% memory reduction)
Accounts for mixed precision (2x speedup)
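
The steps above reduce to three multiplications and a division. The sketch below is one way to express them; it is not the contents of `scripts/estimate-training-cost.sh`. The function name and argument order are assumptions, the throughput is supplied by the caller in its effective form (after any PEFT or mixed-precision speedup), and the 500 tokens/sample default is the average quoted above.

```shell
# Rough training cost:
#   samples * tokens/sample * epochs -> total tokens
#   tokens / throughput              -> seconds
#   seconds / 3600 * hourly rate     -> USD
# Hypothetical helper. Arguments:
#   $1 samples  $2 epochs  $3 throughput (tokens/sec)  $4 hourly rate (USD)
#   $5 tokens per sample (optional, defaults to 500)
estimate_training() {
  awk -v n="$1" -v e="$2" -v tps="$3" -v rate="$4" -v tok="${5:-500}" 'BEGIN {
    tokens = n * tok * e
    hours  = tokens / tps / 3600
    printf "tokens=%.0f hours=%.2f cost=%.2f\n", tokens, hours, hours * rate
  }'
}

# Illustrative inputs only: 2,400 samples, 3 epochs, 1,000 tokens/sec, $2.10/hr.
estimate_training 2400 3 1000 2.10
# tokens=3600000 hours=1.00 cost=2.10
```

Because throughput is a direct divisor, any error in the assumed tokens/sec carries straight through to the cost, so it is worth measuring on a short trial run where possible.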

2. Estimate Inference Cost

Script: scripts/estimate-inference-cost.sh

Usage:

bash scripts/estimate-inference-cost.sh \
  --requests-per-day 1000 \
  --avg-latency 2 \
  --gpu t4 \
  --platform modal \
  --deployment serverless

Parameters:

--requests-per-day: Expected daily requests
--avg-latency: Average inference time (seconds)
--gpu: GPU type
--platform: Cloud platform
--deployment: Deployment type (serverless, dedicated)
--batch-inference: Batch requests (yes/no, default: no)

Output:

{
  "requests_per_day": 1000,
  "requests_per_month": 30000,
  "avg_latency_sec": 2,
  "gpu": "T4",
  "platform": "Modal Serverless",
  "cost_breakdown": {
    "daily_compute_seconds": 2000,
    "daily_cost": 0.33,
    "monthly_cost": 9.90,
    "cost_per_request": 0.00033
  },
  "scaling_analysis": {
    "requests_10k_day": 99.00,
    "requests_100k_day": 990.00
  },
  "dedicated_alternative": {
    "monthly_cost": 442.50,
    "break_even_requests_day": 44697
  }
}
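
The `cost_breakdown` fields in the sample output follow from requests x latency x per-second rate. A minimal sketch of that arithmetic, assuming a 30-day month and assuming each request occupies the GPU for exactly its latency, with no idle or cold-start time billed; the function name is illustrative:

```shell
# Serverless inference cost from daily request volume.
# Arguments: requests/day, avg latency (sec), GPU rate (USD/sec).
# Hypothetical helper; assumes a 30-day month and no billed idle time.
inference_cost() {
  awk -v req="$1" -v lat="$2" -v rate="$3" 'BEGIN {
    seconds = req * lat
    daily   = seconds * rate
    printf "daily_seconds=%.0f daily=%.2f monthly=%.2f per_request=%.5f\n", seconds, daily, daily * 30, daily / req
  }'
}

inference_cost 1000 2 0.000164   # Modal T4 at 1,000 requests/day
# daily_seconds=2000 daily=0.33 monthly=9.84 per_request=0.00033
```

Computed from the unrounded daily figure, the month comes to $9.84; the $9.90 in the sample output corresponds to multiplying the rounded $0.33 by 30.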

3. Calculate GPU Hours

Script: scripts/calculate-gpu-hours.sh

Usage:

bash scripts/calculate-gpu-hours.sh \
  --model-params 7B \
  --tokens-total 30M \
  --gpu a100-40gb

Parameters:

--model-params: Model parameters (125M, 350M, 1B, 3B, 7B, 13B, 70B)
--tokens-total: Total training tokens
--gpu: GPU type
--peft: Use PEFT (yes/no)
--multi-gpu: Number of GPUs (default: 1)

GPU Throughput Benchmarks:

T4 (16GB):
  - 7B full fine-tune: 150 tokens/sec
  - 7B with PEFT: 600 tokens/sec

A100 40GB:
  - 7B full fine-tune: 800 tokens/sec
  - 7B with PEFT: 3200 tokens/sec
  - 13B with PEFT: 1600 tokens/sec

A100 80GB:
  - 13B full fine-tune: 600 tokens/sec
  - 70B with PEFT: 400 tokens/sec

H100:
  - 70B with PEFT: 1200 tokens/sec

4. Compare Platforms

Script: scripts/compare-platforms.sh

Usage:

bash scripts/compare-platforms.sh \
  --training-hours 4 \
  --gpu-type a100-40gb

Output:

# Platform Cost Comparison

## Training Job: 4 hours on A100 40GB

| Platform | GPU Cost | Egress Fees | Total | Notes |
|----------|----------|-------------|-------|-------|
| Modal | $8.40 | $0.00 | $8.40 | Serverless, pay-per-second |
| Lambda | $5.16 | $0.00 | $5.16 | Cheapest for dedicated |
| RunPod | $8.00 | $0.00 | $8.00 | Pay-per-minute |

## Winner: Lambda Labs ($5.16)

Savings: $3.24 (38.6% vs Modal)

Recommendation: Use Lambda for long-running dedicated training, Modal for
serverless/bursty workloads.
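
The comparison is hours x hourly rate per platform, followed by picking the minimum. A sketch of that logic, reading `platform rate` pairs from standard input; the function name is illustrative, and the $2.00/hr RunPod rate is inferred from the $8.00 total in the table rather than taken from a price list:

```shell
# Print each platform's cost for a job, then the cheapest one.
# Argument: training hours. Stdin: lines of "<platform> <hourly_rate_usd>".
cheapest_platform() {
  awk -v hours="$1" '
    { cost = $2 * hours
      printf "%s %.2f\n", $1, cost
      if (NR == 1 || cost < best) { best = cost; name = $1 } }
    END { printf "winner=%s cost=%.2f\n", name, best }'
}

cheapest_platform 4 <<'EOF'
modal 2.10
lambda 1.29
runpod 2.00
EOF
# modal 8.40
# lambda 5.16
# runpod 8.00
# winner=lambda cost=5.16
```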

Cost Templates

Cost Breakdown Template

Template: templates/cost-breakdown.json

{
  "project_name": "ML Training Project",
  "cost_estimate": {
    "training": {
      "model_size": "7B",
      "training_runs": 4,
      "hours_per_run": 4.2,
      "gpu_type": "T4",
      "platform": "Modal",
      "cost_per_run": 2.48,
      "total_training_cost": 9.92
    },
    "inference": {
      "deployment_type": "serverless",
      "expected_requests_month": 30000,
      "gpu_type": "T4",
      "platform": "Modal",
      "monthly_cost": 9.90
    },
    "storage": {
      "model_artifacts_gb": 14,
      "dataset_storage_gb": 5,
      "monthly_storage_cost": 0.50
    },
    "total_monthly_cost": 20.32,
    "breakdown_percentage": {
      "training": 48.8,
      "inference": 48.7,
      "storage": 2.5
    }
  },
  "cost_optimizations_applied": {
    "peft_lora": "50% training cost reduction",
    "mixed_precision": "2x faster training",
    "serverless_inference": "Pay only for actual usage",
    "batch_inference": "Up to 10x reduction in inference cost"
  },
  "potential_savings": {
    "without_optimizations": 45.00,
    "with_optimizations": 20.32,
    "total_savings": 24.68,
    "savings_percentage": 54.8
  }
}
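
The `total_monthly_cost` and `breakdown_percentage` fields are derived from the three component costs. A short sketch of that derivation, with an illustrative function name; it takes the component costs as plain arguments rather than parsing the JSON:

```shell
# Total and percentage breakdown from component costs (USD).
# Arguments: training, inference, storage. Hypothetical helper.
cost_breakdown() {
  awk -v t="$1" -v i="$2" -v s="$3" 'BEGIN {
    total = t + i + s
    printf "total=%.2f training=%.1f%% inference=%.1f%% storage=%.1f%%\n", total, 100 * t / total, 100 * i / total, 100 * s / total
  }'
}

cost_breakdown 9.92 9.90 0.50
# total=20.32 training=48.8% inference=48.7% storage=2.5%
```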

Platform Pricing Data

Template: templates/platform-pricing.yaml

platforms:
  modal:
    billing: per-second
    free_credits: 30  # USD per month
    startup_credits: 50000  # USD for eligible startups
    gpus:
      t4:
        price_per_sec: 0.000164
        price_per_hour: 0.59
        vram_gb: 16
      a100_40gb:
        price_per_sec: 0.000583
        price_per_hour: 2.10
        vram_gb: 40
      h100:
        price_per_sec: 0.001097
        price_per_hour: 3.95
        vram_gb: 80

  lambda:
    billing: per-hour
    free_credits: 0
    minimum_billing: 1-hour
    gpus:
      a10_1x:
        price_per_hour: 0.31
        vram_gb: 24
      a100_40gb_1x:
        price_per_hour: 1.29
        vram_gb: 40
      a100_40gb_8x:
        price_per_hour: 10.32
        total_vram_gb: 320

  runpod:
    billing: per-minute
    free_credits: 0
    features:
      - zero_egress_fees
      - flashboot_200ms
    gpus:
      t4:
        price_per_hour: 0.60  # Approximate
        vram_gb: 16

Cost Estimation Examples

Example 1: Training 7B Model

File: examples/training-cost-estimate.md

Scenario:

Model: Llama 2 7B fine-tuning
Dataset: 10,000 samples (5M tokens)
Epochs: 3
Total tokens: 15M
Method: LoRA/PEFT

Cost Calculation:

bash scripts/estimate-training-cost.sh \
  --model-size 7B \
  --dataset-size 10000 \
  --epochs 3 \
  --gpu t4 \
  --platform modal \
  --peft yes

Results:

Training Time: 4.2 hours
Modal T4 Cost: $2.48
Alternative (Lambda A10): $1.30 (48% cheaper)

Optimization Impact:
- Without PEFT: $12.40 (5x the PEFT cost)
- With PEFT: $2.48
- Savings: $9.92 (80%)

Recommendation: Use Lambda A10 for cheapest option, or Modal T4 for serverless convenience.

Example 2: Production Inference

File: examples/inference-cost-estimate.md

Scenario:

Model: Custom 7B classifier
Expected traffic: 1,000 requests/day
Avg latency: 2 seconds per request
Growth: 10x in 6 months

Cost Calculation:

bash scripts/estimate-inference-cost.sh \
  --requests-per-day 1000 \
  --avg-latency 2 \
  --gpu t4 \
  --platform modal \
  --deployment serverless

Current (1K requests/day):

Serverless Modal T4:
- Daily cost: $0.33
- Monthly cost: $9.90
- Cost per request: $0.00033

Dedicated Lambda A10:
- Monthly cost: $223 (24/7 instance)
- Break-even: about 22,500 requests/day ($223 / ($0.00033 x 30 days))
- Not recommended for current traffic

After Growth (10K requests/day):

Serverless Modal T4:
- Monthly cost: $99.00
- Still cost-effective

Dedicated Lambda A10:
- Monthly cost: $223
- Break-even not yet reached (about 22,500 requests/day)
- Recommendation: Stay serverless until traffic approaches roughly 22,500 requests/day
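
The break-even point is the daily request volume at which serverless spend equals the flat cost of a dedicated instance. A minimal sketch, assuming a 30-day month and a per-request cost that stays constant as volume grows; `break_even_requests` is an illustrative name:

```shell
# Daily requests at which serverless cost equals a dedicated instance.
# Arguments: dedicated monthly cost (USD), serverless cost per request (USD).
# Assumes a 30-day month and a per-request cost that does not change with volume.
break_even_requests() {
  awk -v dedicated="$1" -v per_req="$2" \
    'BEGIN { printf "%.0f\n", dedicated / (per_req * 30) }'
}

# Lambda A10 at $223/month vs Modal T4 at $0.00033 per request.
break_even_requests 223 0.00033
```

Capacity is worth checking alongside cost: a single GPU serving 2-second requests back to back handles at most 86,400 / 2 = 43,200 requests per day.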

Cost Optimization Strategies

1. Use PEFT/LoRA

Savings: 50-90% training cost reduction

# Calculate savings
bash scripts/estimate-training-cost.sh --model-size 7B --peft no
# Cost: $12.40

bash scripts/estimate-training-cost.sh --model-size 7B --peft yes
# Cost: $2.48
# Savings: $9.92 (80%)
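
The savings figure is the difference between the two estimates, expressed as a share of the unoptimized cost. A small sketch of that calculation; the helper name `savings` is illustrative, and the same function applies to the platform and batch-inference comparisons elsewhere in this document:

```shell
# Absolute and percentage savings between two cost estimates (USD).
# Arguments: cost before optimization, cost after optimization.
savings() {
  awk -v before="$1" -v after="$2" 'BEGIN {
    saved = before - after
    printf "saved=%.2f pct=%.1f\n", saved, 100 * saved / before
  }'
}

savings 12.40 2.48   # full fine-tune vs PEFT  -> saved=9.92 pct=80.0
savings 8.40 5.16    # Modal vs Lambda, 4 h on A100 40GB -> saved=3.24 pct=38.6
```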

2. Mixed Precision Training

Savings: 2x faster training, 50% cost reduction

Automatically enabled in cost estimations with --mixed-precision yes

3. Platform Selection

Use Case Guidelines:

# Short jobs (<1 hour): Modal serverless
bash scripts/compare-platforms.sh --training-hours 0.5 --gpu-type t4
# Winner: Modal ($0.30 vs $0.31 for Lambda's cheapest GPU, the A10, billed for a full hour)

# Long jobs (4+ hours): Lambda dedicated
bash scripts/compare-platforms.sh --training-hours 4 --gpu-type a100-40gb
# Winner: Lambda ($5.16 vs Modal $8.40)

# Variable workloads: Modal serverless
# Pay only for actual usage, no idle cost
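
Billing granularity is what tips short jobs toward per-second platforms. The sketch below rounds a job's duration up to the platform's billing increment and applies a minimum charge. The function name is illustrative, and treating Lambda as whole-hour increments with a 1-hour minimum is an assumption based on the `billing` and `minimum_billing` fields in the pricing data.

```shell
# Cost of a job after rounding its duration up to the billing increment.
# Arguments: job length (whole seconds), hourly rate (USD),
#            billing increment (sec), minimum billed time (sec, optional).
billed_cost() {
  awk -v secs="$1" -v rate="$2" -v inc="$3" -v min="${4:-0}" 'BEGIN {
    units  = int((secs + inc - 1) / inc)   # ceiling division on whole seconds
    billed = units * inc
    if (billed < min) billed = min
    printf "%.2f\n", billed * rate / 3600
  }'
}

billed_cost 1200 0.59 1           # 20 min, per-second billing (T4 rate)  -> 0.20
billed_cost 1200 0.31 3600 3600   # 20 min, hourly billing, 1-hour minimum -> 0.31
billed_cost 1230 0.60 60          # 20.5 min, per-minute billing           -> 0.21
```

The per-second platform wins the 20-minute job despite a higher hourly rate, because the hourly platform charges for 40 unused minutes.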

4. Batch Inference

Savings: Up to 10x reduction in inference cost

# Single inference
bash scripts/estimate-inference-cost.sh \
  --requests-per-day 1000 \
  --avg-latency 2 \
  --batch-inference no
# Cost: $9.90/month

# Batch inference (10 requests per batch)
bash scripts/estimate-inference-cost.sh \
  --requests-per-day 1000 \
  --avg-latency 0.3 \
  --batch-inference yes
# Cost: $1.49/month
# Savings: $8.41 (85%)

Quick Reference: Cost Per Use Case

Small Model Training (< 1B params)

Best GPU: T4
Best Platform: Modal (serverless)
Typical Cost: $0.50-$2.00 per run
Time: 30 min - 2 hours

Medium Model Training (1B-7B params)

Best GPU: T4 (with PEFT) or A100 40GB
Best Platform: Lambda A10 (cheapest) or Modal T4 (convenience)
Typical Cost: $1.00-$8.00 per run
Time: 2-8 hours

Large Model Training (7B-70B params)

Best GPU: A100 80GB or H100 (with PEFT)
Best Platform: Lambda (dedicated) or Modal (serverless)
Typical Cost: $10-$100 per run
Time: 8-48 hours

Low-Traffic Inference (<1K requests/day)

Best Deployment: Modal serverless
Best GPU: T4
Typical Cost: $5-$15/month

High-Traffic Inference (>10K requests/day)

Best Deployment: Dedicated or batch serverless
Best GPU: A10 or A100
Typical Cost: $100-$500/month

Dependencies

Required for scripts:

# Bash 4.0+ (for associative arrays)
bash --version

# jq (for JSON processing)
sudo apt-get install jq

# bc (for floating-point calculations)
sudo apt-get install bc

# yq (for YAML processing; the PyPI package is a jq wrapper, so install jq first)
pip install yq

Best Practices Summary

Always estimate before training - Use cost scripts to avoid surprises
Use PEFT for large models - 50-90% cost savings
Enable mixed precision - Up to 2x speedup, usually with negligible quality loss
Choose platform based on workload:
- Modal: Serverless, short jobs, variable workloads
- Lambda: Long-running, dedicated, multi-GPU
- RunPod: Per-minute billing flexibility
Batch inference when possible - Up to 10x cost reduction
Apply for startup credits - Modal offers $50K free
Monitor actual costs - Compare estimates to actuals, optimize
Use smallest viable GPU - T4 often sufficient with PEFT

Supported Platforms: Modal, Lambda Labs, RunPod
GPU Types: T4, L4, A10, A100 (40GB/80GB), H100, H200, B200
Output Format: JSON cost breakdowns and markdown reports
Version: 1.0.0
