knowledge-distillation
Originally from ovachiever/droid-tings
Knowledge Distillation: Compressing LLMs
When to Use This Skill
Use Knowledge Distillation when you need to:
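- Compress a large model into a much smaller one (e.g., 70B → 7B parameters) while retaining most of the teacher's performance (90%+ is a common target; actual retention varies by task)
- Transfer capabilities from proprietary models (GPT-4) to open-source models (LLaMA, Mistral)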
- Reduce inference costs by deploying smaller student models
- Create specialized models by distilling domain-specific knowledge
- Improve small models using synthetic data from large teachers
Key Techniques: Temperature scaling, soft targets, reverse KL divergence (reverse KLD, used by MiniLLM), logit distillation, response distillation
Papers: Hinton et al. 2015 (arXiv 1503.02531), MiniLLM (arXiv 2306.08543), KD Survey (arXiv 2402.13116)
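To make the techniques listed above concrete, here is a minimal sketch of temperature scaling, soft targets, and the two KL directions for a single token position, using only the Python standard library. The function names, the `alpha` weighting, and the default temperatures are illustrative assumptions, not values taken from the papers. The combined loss follows a common Hinton-style formulation (soft-target term scaled by T², plus hard-label cross-entropy). Note also that MiniLLM optimizes reverse KL over whole generated sequences, so the `reverse_kl` function here only illustrates the direction of the divergence, not the full method.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a flatter distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Assumed Hinton-style loss: alpha * T^2 * KL(teacher || student)
    + (1 - alpha) * cross-entropy against the hard label."""
    p_teacher = softmax(teacher_logits, temperature)  # soft targets
    p_student = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    # The hard-label term uses the student's unscaled (T = 1) distribution.
    hard = -math.log(softmax(student_logits)[label] + 1e-12)
    return alpha * soft + (1 - alpha) * hard

def reverse_kl(student_logits, teacher_logits, temperature=1.0):
    """KL(student || teacher): the direction that penalizes the student for
    placing mass where the teacher places little (mode-seeking)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return kl_divergence(p_student, p_teacher)

if __name__ == "__main__":
    # Hypothetical logits over a 3-token vocabulary.
    teacher = [4.0, 1.0, 0.5]
    student = [2.0, 1.5, 1.0]
    print("soft targets (T=2):", softmax(teacher, temperature=2.0))
    print("distillation loss:", distillation_loss(student, teacher, label=0))
    print("reverse KL:", reverse_kl(student, teacher))
```

In practice these quantities are computed over batches of logits with a tensor library; the scalar version is meant only to show what each term measures.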