quantizing-models-bitsandbytes
Originally from ovachiever/droid-tings
bitsandbytes - LLM Quantization
Quick start
bitsandbytes reduces the memory needed for LLM weights by roughly 50% (8-bit) or 75% (4-bit) relative to 16-bit precision, typically with <1% accuracy loss.
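To make the 50% and 75% figures concrete, here is a rough back-of-the-envelope sketch. It counts weight storage only and assumes a 16-bit baseline; the 7B parameter count and the helper name `weight_memory_gib` are illustrative assumptions, not part of bitsandbytes. Real usage will be somewhat higher because of quantization constants, activations, and the KV cache.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GiB (hypothetical helper)."""
    return num_params * bits_per_weight / 8 / 1024**3


NUM_PARAMS = 7e9  # assumed model size, for illustration only

# 16-bit weights are the baseline the percentages are measured against.
baseline = weight_memory_gib(NUM_PARAMS, 16)

for bits in (16, 8, 4):
    mem = weight_memory_gib(NUM_PARAMS, bits)
    saving = 1 - mem / baseline
    print(f"{bits:>2}-bit: {mem:5.1f} GiB ({saving:.0%} smaller than 16-bit)")
```

For the assumed 7B model this works out to about 13 GiB at 16-bit, 6.5 GiB at 8-bit, and 3.3 GiB at 4-bit.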
Installation:
pip install bitsandbytes transformers accelerate
8-bit quantization (50% memory reduction):
from transformers import AutoModelForCausalLM, BitsAndBytesConfig