nlp-pretraining

NLP Pretraining/Fine-tuning Best Practices

Fine-tuning recipe:

  • Use pre-trained checkpoints (HuggingFace hub)
  • AdamW optimizer, lr=2e-5 to 5e-5
  • Linear warmup (6% of total steps) + linear decay
  • Batch size: 16-32 (use gradient accumulation for larger effective batch)
  • 3-5 epochs for classification, 1-2 for generation
  • Weight decay: 0.01
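
The schedule arithmetic implied by the recipe above can be sketched in plain Python. The helper names (`finetune_schedule`, `lr_at`) and the example dataset size are illustrative, not any particular library's API:

```python
# Illustrative helpers for the recipe above (names/numbers are hypothetical).

def finetune_schedule(num_examples, epochs=3, per_device_batch=16,
                      grad_accum=2, warmup_frac=0.06):
    """Return (total_steps, warmup_steps, effective_batch)."""
    effective_batch = per_device_batch * grad_accum        # 16 * 2 = 32
    steps_per_epoch = -(-num_examples // effective_batch)  # ceiling division
    total_steps = steps_per_epoch * epochs
    warmup_steps = int(warmup_frac * total_steps)          # 6% linear warmup
    return total_steps, warmup_steps, effective_batch

def lr_at(step, total_steps, warmup_steps, peak_lr=2e-5):
    """Linear warmup to peak_lr, then linear decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

total, warmup, eff = finetune_schedule(num_examples=25_000)
print(total, warmup, eff)   # 2346 140 32
```

Gradient accumulation here trades memory for effective batch size: two accumulation steps at per-device batch 16 give the same effective batch of 32 as doubling the hardware batch would.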

Parameter-efficient methods:

  • LoRA: r=8-64, alpha=16-128, apply to q/v projections
  • Prefix tuning: 10-20 prefix tokens
  • Adapters: bottleneck dimension 64-256
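
A toy NumPy sketch of what the LoRA entry above means: a frozen weight `W` gets a low-rank trainable update `(alpha/r) * B @ A`. This follows the usual LoRA convention (zero-init `B` so training starts from the frozen model); it is a sketch, not the `peft` library API:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 768                  # hidden size of a typical base encoder (assumed)
r, alpha = 8, 16         # rank and scaling from the ranges above

W = rng.standard_normal((d, d))          # frozen pre-trained projection
A = rng.standard_normal((r, d)) * 0.01   # trainable, small random init
B = np.zeros((d, r))                     # trainable, zero init -> delta starts at 0

def lora_forward(x, W, A, B, alpha, r):
    # Frozen base projection plus the scaled low-rank update.
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.standard_normal((2, d))
# With B = 0, the adapted layer reproduces the frozen layer exactly.
assert np.allclose(lora_forward(x, W, A, B, alpha, r), x @ W.T)

# Trainable params per adapted layer vs. full fine-tuning of that layer:
print(2 * d * r, "vs", d * d)   # 12288 vs 589824
```

At r=8 the adapter trains roughly 2% of the parameters of the layer it wraps, which is why applying it only to the q/v projections is usually enough.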

Evaluation:

  • Classification: accuracy, F1 (macro for imbalanced)
  • Generation: perplexity, BLEU/ROUGE, human evaluation
  • Use multiple seeds and report mean +/- std
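
The evaluation bullets can be made concrete with a small dependency-free sketch; `macro_f1` and the per-seed scores below are hypothetical illustrations, not results:

```python
from statistics import mean, stdev

def macro_f1(y_true, y_pred, labels):
    """Unweighted average of per-class F1 -- fair to minority classes."""
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return mean(f1s)

# Hypothetical test-set scores from five runs differing only in random seed.
scores = [0.81, 0.84, 0.79, 0.83, 0.82]
print(f"macro-F1: {mean(scores):.3f} +/- {stdev(scores):.3f}")
# macro-F1: 0.818 +/- 0.019
```

Reporting mean +/- std over seeds matters because fine-tuning small datasets is notoriously seed-sensitive; a single run can over- or under-state quality by several points.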