unsloth-lora
Overview
Unsloth optimizes Low-Rank Adaptation (LoRA) by providing 16-bit trainable matrices that allow for efficient fine-tuning without updating all model weights. It supports standard LoRA and Rank-Stabilized LoRA (rsLoRA), utilizing specialized kernels to accelerate training and reduce memory overhead.
When to Use
- When fine-tuning large language models on consumer-grade or limited GPU hardware.
- When aiming to match full fine-tuning performance with significantly lower VRAM usage.
- When specialized scaling (rsLoRA) is required for higher rank stability.
Decision Tree
- Need to update all weights?
  - Yes: Use [[unsloth-fft]].
  - No: Proceed to LoRA.
- Using high rank (r > 64)?
  - Yes: Enable `use_rslora = True` for sqrt(r) scaling.
  - No: Use standard LoRA.
- Maximizing speed?
  - Yes: Set `lora_dropout = 0` to enable internal kernel optimizations.
Workflows
Optimizing LoRA Architecture
- Target all 7 major linear layers (q, k, v, o, gate, up, down) to match full fine-tuning performance.
- Initialize rank (r) between 16 and 32 for general tasks, or up to 128 for complex domain adaptation.
- Set lora_alpha equal to r or 2*r to maintain aggressive learning while ensuring numerical stability.
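The architecture guidance above can be sketched as a small config builder. `lora_config`, `alpha_mult`, and `TARGET_MODULES` are hypothetical names for illustration; the keyword arguments mirror those accepted by Unsloth's `get_peft_model`.

```python
# The 7 major linear layers recommended for matching full fine-tuning.
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj",
                  "gate_proj", "up_proj", "down_proj"]

def lora_config(r=16, alpha_mult=1):
    """Return kwargs for FastLanguageModel.get_peft_model (sketch)."""
    return dict(
        r=r,                        # 16-32 for general tasks, up to 128 for domain adaptation
        target_modules=TARGET_MODULES,
        lora_alpha=r * alpha_mult,  # alpha = r or 2*r
        lora_dropout=0,             # 0 unlocks Unsloth's fast kernels
        bias="none",
        use_rslora=False,           # standard LoRA scaling (alpha / r)
    )

cfg = lora_config(r=32, alpha_mult=2)
print(cfg["lora_alpha"])  # 64
```

In practice these kwargs would be passed through to `FastLanguageModel.get_peft_model(model, **cfg)`.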
Configuring Rank-Stabilized LoRA (rsLoRA)
- Set `use_rslora = True` in `get_peft_model` to enable sqrt(r) scaling.
- Increase rank (r) without the typical instability risks associated with high-alpha standard LoRA.
- Monitor training loss to ensure the model captures underlying patterns without memorization.
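The effect of rsLoRA scaling can be seen numerically. Per the quoted source, the effective scaling becomes `lora_alpha / sqrt(r)` instead of the standard `lora_alpha / r`; `lora_scaling` is a hypothetical helper illustrating the difference.

```python
import math

def lora_scaling(alpha, r, use_rslora=False):
    # Standard LoRA scales the adapter update by alpha / r, so the
    # contribution shrinks as rank grows. rsLoRA uses alpha / sqrt(r),
    # which keeps the update magnitude stable at high ranks.
    return alpha / math.sqrt(r) if use_rslora else alpha / r

# At r=64 with alpha=16, standard scaling collapses to 0.25,
# while rsLoRA keeps it at 2.0.
print(lora_scaling(16, 64))                    # 0.25
print(lora_scaling(16, 64, use_rslora=True))   # 2.0
```

This is why high ranks (r > 64) pair naturally with `use_rslora = True`: the update does not vanish as r increases.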
Non-Obvious Insights
- Setting `lora_dropout` to 0 is not just a parameter choice; it explicitly triggers internal Unsloth kernel-level optimizations that significantly speed up the training loop.
- Unsloth includes a custom gradient accumulation fix that ensures results are mathematically identical regardless of the batch size and accumulation step combination.
- For verifying weight updates, MD5 checksums or absolute difference sums are more reliable than `np.allclose()` because LoRA induces subtle Gaussian-distributed changes.
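The verification insight can be demonstrated with a minimal sketch. `weights_changed` is a hypothetical helper; the point is that `np.allclose()` with its default tolerances can report "unchanged" for perturbations smaller than its tolerance, while an exact byte-level checksum or an absolute-difference sum still detects them.

```python
import hashlib
import numpy as np

def weights_changed(before, after, tol=0.0):
    """Detect tiny weight updates that np.allclose can miss.

    Compares an exact MD5 of the raw bytes plus the summed absolute
    difference, instead of relying on allclose's default rtol/atol.
    """
    md5_before = hashlib.md5(before.tobytes()).hexdigest()
    md5_after = hashlib.md5(after.tobytes()).hexdigest()
    abs_diff = float(np.abs(after - before).sum())
    return md5_before != md5_after and abs_diff > tol

# A LoRA-sized perturbation (here a uniform 1e-6 shift for determinism;
# real LoRA updates are small and Gaussian-distributed).
w = np.ones((4, 4), dtype=np.float32)
w_updated = w + np.float32(1e-6)

print(np.allclose(w, w_updated))      # True  -- the tiny update is missed
print(weights_changed(w, w_updated))  # True  -- the update is detected
```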
Evidence
- "LoRA: Fine-tunes small, trainable matrices in 16-bit without updating all model weights." Source
- "For optimal performance, LoRA should be applied to all major linear layers: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj." Source
- "Set use_rslora = True... the effective scaling becomes lora_alpha / sqrt(r) instead of the standard lora_alpha / r." Source
Scripts
- scripts/unsloth-lora_tool.py: Python utility for configuring LoRA parameters in the Unsloth framework.
- scripts/unsloth-lora_tool.js: JavaScript helper for generating LoRA configuration objects.
Dependencies
- unsloth
- torch
- peft
- bitsandbytes
References
- [[references/README.md]]