gptq

Originally from ovachiever/droid-tings

GPTQ (Generative Pre-trained Transformer Quantization)

Post-training quantization method that compresses LLM weights to 4-bit precision with minimal accuracy loss, using group-wise quantization (each small group of weights gets its own scale and zero point).
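To make "group-wise quantization" concrete, here is a minimal sketch in plain Python. It shows only the storage format: each group of weights is mapped to 4-bit integer codes with its own scale and minimum. It uses simple round-to-nearest, which is a deliberate simplification; GPTQ itself additionally adjusts the remaining weights to compensate for rounding error, and that step is not shown. The function names and the group size of 128 are illustrative assumptions, not taken from any library.

```python
import random

def quantize_groupwise(weights, group_size=128, bits=4):
    """Round-to-nearest quantization with one (scale, minimum) pair per group.

    Hypothetical helper for illustration; not the GPTQ solver itself.
    """
    qmax = 2**bits - 1  # 15 for 4-bit codes
    groups = []
    for start in range(0, len(weights), group_size):
        chunk = weights[start:start + group_size]
        lo, hi = min(chunk), max(chunk)
        # Fall back to 1.0 when every weight in the group is identical.
        scale = (hi - lo) / qmax or 1.0
        codes = [min(qmax, max(0, round((x - lo) / scale))) for x in chunk]
        groups.append((codes, scale, lo))
    return groups

def dequantize_groupwise(groups):
    """Reconstruct approximate weights from the codes and per-group parameters."""
    return [c * scale + lo for codes, scale, lo in groups for c in codes]

random.seed(0)
w = [random.gauss(0.0, 1.0) for _ in range(512)]
groups = quantize_groupwise(w)
w_hat = dequantize_groupwise(groups)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
print(f"groups: {len(groups)}, max reconstruction error: {max_err:.4f}")
```

Because each group has its own scale, one outlier weight only coarsens the grid for its own group rather than for the whole row, which is the usual motivation for grouping.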

When to use GPTQ

Use GPTQ when:

Need to fit large models (70B+) on limited GPU memory
Want 4× memory reduction with <2% accuracy loss
Deploying on consumer GPUs (RTX 4090, 3090)
Need faster inference (3-4× speedup vs FP16)
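The 4× memory figure follows from simple arithmetic on weight storage, sketched below for a 70B-parameter model. The helper name is hypothetical, and the overhead estimate assumes one FP16 scale plus one 4-bit zero point per group of 128 weights, which is a common layout but an assumption here. The numbers cover weights only; activations and the KV cache are extra.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Weight storage in gigabytes (1 GB = 1e9 bytes). Hypothetical helper."""
    return n_params * bits_per_weight / 8 / 1e9

n_params = 70e9  # 70B parameters

fp16_gb = weight_memory_gb(n_params, 16)
int4_gb = weight_memory_gb(n_params, 4)

# Assumed per-group metadata: FP16 scale (16 bits) + 4-bit zero point,
# shared across a group of 128 weights.
overhead_bits = (16 + 4) / 128
int4_grouped_gb = weight_memory_gb(n_params, 4 + overhead_bits)

print(f"FP16 weights:            {fp16_gb:.1f} GB")
print(f"4-bit weights (ideal):   {int4_gb:.1f} GB")
print(f"4-bit weights + groups:  {int4_grouped_gb:.1f} GB")
print(f"Effective reduction:     {fp16_gb / int4_grouped_gb:.2f}x")
```

Under these assumptions the per-group metadata pulls the effective reduction slightly below the ideal 4×.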

Use AWQ instead when:

Need slightly better accuracy (<1% loss)
Have newer GPUs (Ampere, Ada)
Want Marlin kernel support (2× faster on some GPUs)

Repository

davila7/claude-…emplates

First Seen

Jan 21, 2026

gptq — davila7/claude-code-templates