# Finetuning

Adapting foundation models for specific tasks.
## When to Finetune

### DO Finetune

- Improve quality on a specific domain
- Reduce latency (smaller model)
- Reduce cost (fewer tokens)
- Ensure a consistent style
- Add specialized capabilities

### DON'T Finetune

- Prompt engineering is enough
- Insufficient data (fewer than ~1,000 examples)
- The task needs frequent updates
- RAG can solve the problem
## Memory Requirements

```python
def training_memory_gb(num_params_billion, precision="fp16"):
    """Rough estimate of full-finetuning memory for AdamW training."""
    bytes_per = {"fp32": 4, "fp16": 2, "int8": 1}
    params = num_params_billion * 1e9
    model = params * bytes_per[precision]
    optimizer = params * 4 * 2  # AdamW keeps two fp32 states per param
    gradients = params * bytes_per[precision]
    # Mixed-precision training also keeps an fp32 master copy of the weights
    master = params * 4 if precision != "fp32" else 0
    return (model + optimizer + gradients + master) / 1e9

# 7B model, full finetuning in fp16: ~112 GB!
# With LoRA:  ~16 GB
# With QLoRA: ~6 GB
```
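The LoRA and QLoRA figures can be estimated the same way: the base model is frozen, so only the small adapter carries gradients and optimizer state. A rough sketch (the helper name and the ~0.1% trainable fraction are illustrative assumptions; activations and framework overhead, which this ignores, account for the remaining few GB):

```python
def peft_memory_gb(num_params_billion, base_bytes_per_param,
                   trainable_fraction=0.001):
    """Rough memory estimate for LoRA-style finetuning with a frozen base."""
    params = num_params_billion * 1e9
    base = params * base_bytes_per_param              # frozen base weights only
    trainable = params * trainable_fraction
    # fp16 weights + fp16 grads + two fp32 AdamW states + fp32 master copy
    adapter = trainable * (2 + 2 + 8 + 4)
    return (base + adapter) / 1e9

lora = peft_memory_gb(7, base_bytes_per_param=2)     # fp16 base: ~14 GB + overhead
qlora = peft_memory_gb(7, base_bytes_per_param=0.5)  # 4-bit base: ~3.6 GB + overhead
```

The takeaway: almost all the savings come from not holding optimizer state for 7B parameters, and QLoRA shrinks the frozen weights on top of that.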
## LoRA (Low-Rank Adaptation)

```python
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=8,             # Rank (lower = fewer trainable params)
    lora_alpha=32,   # Scaling factor
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, config)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
# ~0.06% of a 7B model is trainable!
```
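The ~0.06% figure follows directly from the adapter shapes: each adapted d×d projection gains two low-rank matrices, A (r×d) and B (d×r). A quick back-of-the-envelope check, assuming Llama-7B-like dimensions (32 layers, hidden size 4096, two target modules per layer):

```python
def lora_param_count(r, hidden_size, num_layers, modules_per_layer):
    """Trainable params added by LoRA: two r x d matrices per adapted module."""
    per_module = 2 * r * hidden_size
    return per_module * modules_per_layer * num_layers

added = lora_param_count(r=8, hidden_size=4096, num_layers=32, modules_per_layer=2)
# 4,194,304 params, i.e. ~0.06% of 7B
```

Doubling `r` doubles the adapter size, which is why starting at r=8 and increasing only if quality demands it is the usual approach.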
## QLoRA (4-bit + LoRA)

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,         # quantize the quantization constants too
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # gradient checkpointing, input grads
model = get_peft_model(model, lora_config)      # same LoraConfig as above
# 7B finetuning on a 16 GB GPU!
```
## Training

```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    warmup_steps=100,
    fp16=True,
    gradient_checkpointing=True,   # trade recompute for activation memory
    optim="paged_adamw_8bit",      # paged 8-bit AdamW from bitsandbytes
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_data,
    eval_dataset=eval_data,
)
trainer.train()

# Merge the LoRA weights back into the base model
merged = model.merge_and_unload()
merged.save_pretrained("./finetuned")
```
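With gradient accumulation, the optimizer steps only once every `gradient_accumulation_steps` batches, so the batch size the learning rate effectively "sees" is a product of three factors. A small sketch (the helper and its GPU-count parameter are illustrative, not read from the config above):

```python
def effective_batch_size(per_device_batch, grad_accum_steps, num_gpus=1):
    """Examples contributing to each optimizer step."""
    return per_device_batch * grad_accum_steps * num_gpus

single = effective_batch_size(4, 4)      # 16 with the settings above
node = effective_batch_size(4, 4, 8)     # 128 on an 8-GPU node
```

This is why raising `gradient_accumulation_steps` is the usual lever when the per-device batch is capped by memory.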
## Model Merging

### Task Arithmetic

```python
def task_vector_merge(base, finetuned_models, scale=0.3):
    """Add each model's task vector (finetuned - base) to the base weights."""
    base_sd = base.state_dict()
    merged = {key: tensor.clone() for key, tensor in base_sd.items()}
    for ft in finetuned_models:
        ft_sd = ft.state_dict()
        for key in merged:
            task_vector = ft_sd[key] - base_sd[key]  # always relative to the base
            merged[key] += scale * task_vector
    return merged
```
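A toy numeric check of the idea, using plain dicts of floats in place of `state_dict()` tensors (the single-weight "models" are hypothetical):

```python
base = {"w": 1.0}
ft_a = {"w": 2.0}   # task vector +1.0
ft_b = {"w": 0.0}   # task vector -1.0

def merge(base, fts, scale=0.3):
    merged = dict(base)
    for ft in fts:
        for k in merged:
            merged[k] += scale * (ft[k] - base[k])  # task vector relative to base
    return merged

result = merge(base, [ft_a, ft_b])  # opposing task vectors cancel, w stays ~1.0
```

Because every task vector is measured against the same base, merge order doesn't matter and opposing adaptations cancel rather than compound.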
## Best Practices

- Start with a small rank (r=8)
- Use QLoRA when GPU memory is limited
- Monitor validation loss for overfitting
- Test merged models carefully
- Keep the base model for comparison