unsloth-sft
Overview
Supervised Fine-Tuning (SFT) in Unsloth focuses on training models to follow instructions using specific formats. It provides tools for chat template mapping, multi-turn conversation synthesis via `conversation_extension`, and optimized dataset processing.
When to Use
- When training models on instruction-response datasets (e.g., Alpaca).
- When developing multi-turn conversational agents.
- When you need to standardize various dataset formats (ShareGPT, OpenAI) for training.
Decision Tree
- Is your dataset single-turn?
  - Yes: Use `conversation_extension` to synthetically create multi-turn samples.
  - No: Map columns using `standardize_sharegpt`.
- Are you training on Windows?
  - Yes: Set `dataset_num_proc = 1` in `SFTConfig`.
  - No: Use multiple processes for faster mapping.
- Want to increase multi-turn accuracy?
  - Yes: Enable masking of inputs to train on completions only (see the sketch below).
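A minimal sketch of the completions-only branch, assuming Unsloth's `train_on_responses_only` helper and Llama-3.1-style role headers (swap the marker strings for whatever chat template you actually use):

```python
from unsloth.chat_templates import train_on_responses_only

# Wrap an existing TRL SFTTrainer so that loss is computed only on
# assistant responses; instruction/user tokens are masked out of the labels.
# The marker strings below assume the Llama-3.1 chat template.
trainer = train_on_responses_only(
    trainer,  # an already-constructed SFTTrainer
    instruction_part="<|start_header_id|>user<|end_header_id|>\n\n",
    response_part="<|start_header_id|>assistant<|end_header_id|>\n\n",
)
```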
Workflows
Chat Template Implementation
- Select a template (e.g., 'chatml', 'llama-3.1') using `get_chat_template(tokenizer, chat_template='...')`.
- Map dataset columns using the `mapping` parameter (e.g., `mapping = {'role' : 'from', 'content' : 'value'}`).
- Apply the formatting function to the dataset using `dataset.map` with `batched=True`.
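A minimal sketch of these three steps, assuming a ShareGPT-style dataset with a `conversations` column (the column name and template choice are illustrative):

```python
from unsloth.chat_templates import get_chat_template

# Step 1: attach a chat template and tell Unsloth which dataset keys
# hold the role and message text (ShareGPT uses "from"/"value").
tokenizer = get_chat_template(
    tokenizer,
    chat_template="chatml",
    mapping={"role": "from", "content": "value"},
)

# Step 2: a formatting function that renders each conversation into a
# single training string via the tokenizer's chat template.
def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [
        tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False)
        for convo in convos
    ]
    return {"text": texts}

# Step 3: apply it across the dataset in batches.
dataset = dataset.map(formatting_prompts_func, batched=True)
```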
Multi-turn Data Preparation
- Load a standard single-turn dataset like Alpaca.
- Use `standardize_sharegpt(dataset)` to unify the role and content keys.
- Apply `conversation_extension=N` to randomly concatenate N rows into single multi-turn conversations.
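A sketch of this workflow, assuming Unsloth's `to_sharegpt` helper, which (in the official Alpaca notebook) performs the column merge and the `conversation_extension` concatenation in a single call:

```python
from datasets import load_dataset
from unsloth import to_sharegpt
from unsloth.chat_templates import standardize_sharegpt

# Load a single-turn instruction dataset (Alpaca schema:
# instruction / input / output).
dataset = load_dataset("yahma/alpaca-cleaned", split="train")

# Merge the prompt columns and synthesize multi-turn samples:
# conversation_extension=3 randomly concatenates 3 single-turn rows
# into one conversation. [[...]] marks the optional "input" column.
dataset = to_sharegpt(
    dataset,
    merged_prompt="{instruction}[[\nYour input is:\n{input}]]",
    output_column_name="output",
    conversation_extension=3,
)

# Unify the role/content keys so downstream chat-template mapping works.
dataset = standardize_sharegpt(dataset)
```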
Non-Obvious Insights
- Training on completions only (masking out inputs) significantly increases accuracy, particularly for multi-turn conversations where input context is repetitive.
- Standardizing datasets to ShareGPT format before mapping is the most robust way to ensure compatibility with Unsloth's internal formatting kernels.
- On Windows, `dataset_num_proc` must be 1; otherwise, multiprocessing overhead or library incompatibilities will cause trainer crashes.
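For the Windows point above, a minimal configuration sketch; `dataset_num_proc` is the relevant field on TRL's `SFTConfig` (all other hyperparameters omitted):

```python
from trl import SFTConfig

# Keep dataset mapping single-process on Windows to avoid
# multiprocessing-related trainer crashes.
training_args = SFTConfig(
    output_dir="outputs",
    dataset_num_proc=1,  # must be 1 on Windows
)
```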
Evidence
- "We introduced the conversation_extension parameter, which essentially selects some random rows in your single turn dataset, and merges them into 1 conversation!" Source
- "Training on completions only (masking out inputs) increases accuracy by quite a bit, especially for multi-turn conversational finetunes!" Source
Scripts
- scripts/unsloth-sft_tool.py: Python tool for formatting datasets into ShareGPT/ChatML format.
- scripts/unsloth-sft_tool.js: JavaScript logic for mapping Alpaca-style datasets to conversation formats.
Dependencies
- unsloth
- trl
- datasets
References
- [[references/README.md]]