# LLM Training

Frameworks and techniques for training and finetuning large language models.
## Framework Comparison

| Framework | Best For | Multi-GPU | Memory Efficiency |
|---|---|---|---|
| Accelerate | Simple distributed | Yes | Basic |
| DeepSpeed | Large models, ZeRO | Yes | Excellent |
| PyTorch Lightning | Clean training loops | Yes | Good |
| Ray Train | Scalable, multi-node | Yes | Good |
| TRL | RLHF, reward modeling | Yes | Good |
| Unsloth | Fast LoRA finetuning | Limited | Excellent |
## Accelerate (HuggingFace)

Minimal wrapper for distributed training. Run `accelerate config` for interactive setup.

Key concept: Wrap the model, optimizer, and dataloader with `accelerator.prepare()`, and call `accelerator.backward(loss)` in place of `loss.backward()`.
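A minimal sketch of that pattern, with a toy linear model and random data standing in for a real LM and dataset:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()  # picks up settings from `accelerate config`

# Toy stand-ins for a real model and dataset
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
loader = DataLoader(dataset, batch_size=8)

# prepare() moves everything to the right device(s) and wraps for distributed runs
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for inputs, labels in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    accelerator.backward(loss)  # replaces loss.backward(); handles scaling and sync
    optimizer.step()
```

The same script then runs unchanged on one GPU or many via `accelerate launch train.py`.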
## DeepSpeed (Large Models)

Microsoft's optimization library for training massive models.
ZeRO Stages:
- Stage 1: Optimizer states partitioned across GPUs
- Stage 2: + Gradients partitioned
- Stage 3: + Parameters partitioned (for largest models, 100B+)
Key concept: Configure via a JSON file; each higher stage saves more memory but adds more communication overhead.
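A minimal sketch of a ZeRO-3 config, written here as a Python dict handed to the HuggingFace `Trainer` (the same keys go in a standalone `ds_config.json`; the CPU offload and `"auto"` values are illustrative choices, not requirements):

```python
from transformers import TrainingArguments

# ZeRO-3 config as a dict (same schema as a ds_config.json file);
# "auto" values are filled in from TrainingArguments by the Trainer.
ds_config = {
    "zero_optimization": {
        "stage": 3,                              # partition params, grads, optimizer state
        "offload_optimizer": {"device": "cpu"},  # optional: spill optimizer state to CPU
        "overlap_comm": True,                    # overlap communication with backward
    },
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

# Trainer accepts the dict directly, or a path to the JSON file
args = TrainingArguments(output_dir="out", bf16=True, deepspeed=ds_config)
```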
## TRL (RLHF/DPO)

HuggingFace's library for reinforcement learning from human feedback and related alignment methods.
Training types:
- SFT (Supervised Finetuning): Standard instruction tuning
- DPO (Direct Preference Optimization): Simpler than PPO-based RLHF; trains directly on preference pairs
- PPO: Classic RLHF with a learned reward model
Key concept: DPO is often preferred over PPO: it is simpler, needs no separate reward model, and learns directly from chosen/rejected response pairs.
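A minimal DPO sketch with TRL; the model name and the three-column toy dataset are placeholders, and exact `DPOTrainer` keyword names vary a little between TRL versions:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

name = "Qwen/Qwen2-0.5B-Instruct"  # placeholder; use your own base model
model = AutoModelForCausalLM.from_pretrained(name)
tokenizer = AutoTokenizer.from_pretrained(name)

# Preference data: each row pairs a prompt with a chosen and a rejected response
train = Dataset.from_dict({
    "prompt":   ["What is 2+2?"],
    "chosen":   ["2+2 equals 4."],
    "rejected": ["2+2 equals 5."],
})

args = DPOConfig(output_dir="dpo-out", beta=0.1)  # beta scales the implicit KL penalty
trainer = DPOTrainer(model=model, args=args, train_dataset=train,
                     processing_class=tokenizer)
trainer.train()
```

Note there is no reward model anywhere: the chosen/rejected pairs are the only supervision signal.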
## Unsloth (Fast LoRA)

Optimized LoRA finetuning: roughly 2x faster with about 60% less memory.

Key concept: Drop-in replacement for standard LoRA with automatic optimizations. Best for 7B-13B models.
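A minimal sketch of Unsloth's loading path, assuming one of its pre-quantized 4-bit checkpoints (the model name and LoRA hyperparameters here are illustrative):

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # illustrative pre-quantized checkpoint
    max_seq_length=2048,
    load_in_4bit=True,                         # QLoRA-style 4-bit base weights
)

# Attach LoRA adapters; only these low-rank matrices are trained
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                      # LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
# `model` now plugs into a standard trainer, e.g. TRL's SFTTrainer
```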
## Memory Optimization Techniques

| Technique | Memory Savings | Trade-off |
|---|---|---|
| Gradient checkpointing | ~30-50% | Slower training |
| Mixed precision (fp16/bf16) | ~50% | Minor precision loss |
| 4-bit quantization (QLoRA) | ~75% | Some quality loss |
| Flash Attention | ~20-40% | Requires compatible GPU |
| Gradient accumulation | Larger effective batch at fixed memory | More time per optimizer step |
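Several of these techniques compose directly through HuggingFace `TrainingArguments`; a sketch of a common combination (the specific values are illustrative):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,  # effective batch of 16 per GPU
    gradient_checkpointing=True,     # recompute activations during backward
    bf16=True,                       # mixed precision (use fp16=True on older GPUs)
)
```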
## Decision Guide

| Scenario | Recommendation |
|---|---|
| Simple finetuning | Accelerate + PEFT |
| 7B-13B models | Unsloth (fastest) |
| 70B+ models | DeepSpeed ZeRO-3 |
| RLHF/DPO alignment | TRL |
| Multi-node cluster | Ray Train |
| Clean code structure | PyTorch Lightning |
## Resources

- Accelerate: https://huggingface.co/docs/accelerate
- DeepSpeed: https://www.deepspeed.ai/
- TRL: https://huggingface.co/docs/trl
- Unsloth: https://github.com/unslothai/unsloth