HuggingFace Transformers
Access thousands of pre-trained models for NLP, vision, audio, and multimodal tasks.
When to Use
- Quick inference with pipelines
- Text generation, classification, QA, NER
- Image classification, object detection
- Fine-tuning on custom datasets
- Loading pre-trained models from HuggingFace Hub
Pipeline Tasks
NLP Tasks
| Task | Pipeline Name | Output |
|---|---|---|
| Text Generation | text-generation | Completed text |
| Classification | text-classification | Label + confidence |
| Question Answering | question-answering | Answer span |
| Summarization | summarization | Shorter text |
| Translation | translation_en_to_fr | Translated text |
| NER | ner | Entity spans + types |
| Fill Mask | fill-mask | Predicted tokens |
Vision Tasks
| Task | Pipeline Name | Output |
|---|---|---|
| Image Classification | image-classification | Label + confidence |
| Object Detection | object-detection | Bounding boxes |
| Image Segmentation | image-segmentation | Pixel masks |
Audio Tasks
| Task | Pipeline Name | Output |
|---|---|---|
| Speech Recognition | automatic-speech-recognition | Transcribed text |
| Audio Classification | audio-classification | Label + confidence |
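The pipeline names in the tables above are passed as the first argument to `pipeline()`. The following is a minimal sketch, assuming `transformers` and a backend such as PyTorch are installed. The helper names `classify` and `top_label` are hypothetical, and when no model is given the library picks a default checkpoint and downloads it on first use.

```python
def classify(texts, model_name=None):
    """Run a text-classification pipeline over a list of strings.

    transformers is imported lazily so this module loads without it.
    model_name=None lets the library choose its default checkpoint.
    """
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_name)
    # For a list input the pipeline returns one dict per text,
    # each with a "label" and a "score" key.
    return classifier(texts)


def top_label(result):
    """Reduce one pipeline result dict to a (label, rounded score) pair."""
    return result["label"], round(result["score"], 3)


# Example usage (downloads a model, so it is left as a comment):
# for result in classify(["I love this library.", "This is confusing."]):
#     print(top_label(result))
```

Swapping the task string for another pipeline name from the tables changes the expected input and the shape of the output, but the calling pattern stays the same.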
Model Loading Patterns
Auto Classes
| Class | Use Case |
|---|---|
| AutoModel | Base model (embeddings) |
| AutoModelForCausalLM | Text generation (GPT-style) |
| AutoModelForSeq2SeqLM | Encoder-decoder (T5, BART) |
| AutoModelForSequenceClassification | Classification head |
| AutoModelForTokenClassification | NER, POS tagging |
| AutoModelForQuestionAnswering | Extractive QA |
Key concept: Prefer Auto classes unless you need a specific architecture. They read the checkpoint's config and instantiate the matching model class automatically.
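As a rough illustration of the Auto class table, the sketch below maps a task to its Auto class and loads a tokenizer and model with `from_pretrained`. The task keys, the `AUTO_CLASS_FOR_TASK` dict, and the `load_for_task` helper are hypothetical names chosen for this example; only the class names come from the table.

```python
# Hypothetical task keys mapped to the Auto classes listed above.
AUTO_CLASS_FOR_TASK = {
    "embeddings": "AutoModel",
    "generation": "AutoModelForCausalLM",
    "seq2seq": "AutoModelForSeq2SeqLM",
    "classification": "AutoModelForSequenceClassification",
    "token-classification": "AutoModelForTokenClassification",
    "qa": "AutoModelForQuestionAnswering",
}


def load_for_task(task, model_id):
    """Load the tokenizer and the task-specific model for a Hub checkpoint.

    transformers is imported lazily; calling this downloads the checkpoint.
    """
    import transformers

    auto_class = getattr(transformers, AUTO_CLASS_FOR_TASK[task])
    tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
    model = auto_class.from_pretrained(model_id)
    return tokenizer, model


# Example usage (model id is a placeholder for any causal LM on the Hub):
# tokenizer, model = load_for_task("generation", "<model-id>")
```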
Generation Parameters
| Parameter | Effect | Typical Values |
|---|---|---|
| max_new_tokens | Output length | 50-500 |
| temperature | Sampling randomness (lower = closer to deterministic; only applies when do_sample=True) | 0.1-1.0 |
| top_p | Nucleus sampling threshold | 0.9-0.95 |
| top_k | Sample only from the k most likely tokens | 50 |
| num_beams | Number of beams for beam search (1 = no beam search) | 4-8 |
| repetition_penalty | Discourage repetition | 1.1-1.3 |
Key concept: Higher temperature = more creative but less coherent. For factual tasks, use a low temperature (0.1-0.3), or set do_sample=False for fully deterministic greedy decoding.
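To make the effect of these parameters concrete, here is a small pure-Python sketch of temperature scaling followed by top-k and top-p filtering on a toy list of logits. It illustrates the idea only and is not the library's implementation. The helper names and the three preset dicts are assumptions for this example; the preset keys are the `generate()` arguments from the table plus `do_sample`.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def _renormalise(weights):
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


def filter_top_k_top_p(probs, top_k=None, top_p=None):
    """Keep the top_k most likely tokens, then the smallest prefix of them
    whose cumulative probability reaches top_p. Returns {token_id: prob}."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    kept = _renormalise({i: probs[i] for i in order})
    if top_p is not None:
        nucleus, cumulative = {}, 0.0
        for i in order:
            nucleus[i] = kept[i]
            cumulative += kept[i]
            if cumulative >= top_p:
                break
        kept = _renormalise(nucleus)
    return kept


# Illustrative presets, intended to be passed as model.generate(**inputs, **PRESET).
FACTUAL = {"max_new_tokens": 100, "do_sample": True, "temperature": 0.2, "top_p": 0.9}
CREATIVE = {"max_new_tokens": 300, "do_sample": True, "temperature": 0.9,
            "top_p": 0.95, "top_k": 50, "repetition_penalty": 1.1}
BEAM = {"max_new_tokens": 100, "num_beams": 4, "do_sample": False}
```

Lowering the temperature concentrates probability on the most likely token, while top-k and top-p remove the unlikely tail before a token is drawn.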
Memory Management
Device Placement Options
| Option | When to Use |
|---|---|
| device_map="auto" | Let the library place weights across the available devices (requires the accelerate package) |
| device_map="cuda:0" | Specific GPU |
| device_map="cpu" | CPU only |
Quantization Options
| Method | Memory Reduction (vs. 16-bit weights) | Quality Impact |
|---|---|---|
| 8-bit | ~50% | Minimal |
| 4-bit | ~75% | Small for most tasks |
| GPTQ (4-bit) | ~75% | Small; requires a calibration dataset |
| AWQ (4-bit) | ~75% | Small; activation-aware weight quantization |
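Key concept: Use torch_dtype="auto" to load weights in the precision the checkpoint was saved in (often bfloat16) instead of the default float32. Recent releases also accept this argument under the name dtype.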
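The percentages in the quantization table follow from the number of bytes stored per weight. The sketch below works through that arithmetic and then shows one way 4-bit loading might look. It assumes the bitsandbytes backend via `BitsAndBytesConfig`, which the text above does not name, and it counts weights only (activations, KV cache, and quantization overhead are ignored). The helper names are hypothetical.

```python
# Approximate bytes stored per parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params, precision):
    """Rough size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def reduction_vs_16bit(precision):
    """Fraction of memory saved relative to 16-bit weights."""
    return 1 - BYTES_PER_PARAM[precision] / BYTES_PER_PARAM["bf16"]


def load_in_4bit(model_id):
    """Load a causal LM with 4-bit weights (assumes bitsandbytes, accelerate,
    and a CUDA GPU are available; imports are lazy for that reason)."""
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16, store in 4-bit
    )
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
    )
```

For a 7B-parameter model this gives roughly 14 GB at 16-bit, 7 GB at 8-bit, and 3.5 GB at 4-bit, which is why quantization brings such models within reach of consumer GPUs.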
Fine-Tuning Concepts
Trainer Arguments
| Argument | Purpose | Typical Value |
|---|---|---|
| num_train_epochs | Training passes | 3-5 |
| per_device_train_batch_size | Samples per GPU | 8-32 |
| learning_rate | Step size | 2e-5 for fine-tuning |
| weight_decay | Regularization | 0.01 |
| warmup_ratio | LR warmup | 0.1 |
| eval_strategy (evaluation_strategy in older releases) | When to eval | "epoch" or "steps" |
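A sketch of how the arguments in this table might fit together. The `HPARAMS` dict uses typical values from the table, the helper names and the `./results` output directory are hypothetical, and argument names can differ between releases (for example, the evaluation argument is `eval_strategy` in recent versions). The step arithmetic assumes no gradient accumulation.

```python
import math

# Typical values taken from the table above.
HPARAMS = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "warmup_ratio": 0.1,
}


def total_training_steps(num_samples, num_devices=1):
    """Optimizer steps over the whole run, assuming no gradient accumulation."""
    samples_per_step = HPARAMS["per_device_train_batch_size"] * num_devices
    steps_per_epoch = math.ceil(num_samples / samples_per_step)
    return steps_per_epoch * HPARAMS["num_train_epochs"]


def warmup_steps(num_samples, num_devices=1):
    """Number of steps spent ramping the learning rate up to its peak."""
    return math.ceil(total_training_steps(num_samples, num_devices) * HPARAMS["warmup_ratio"])


def build_training_args(output_dir="./results"):
    """Construct TrainingArguments (lazy import; needs torch and accelerate)."""
    from transformers import TrainingArguments

    return TrainingArguments(
        output_dir=output_dir,
        eval_strategy="epoch",  # older releases expect evaluation_strategy
        **HPARAMS,
    )
```

With 10,000 training samples on one GPU, these settings give 625 steps per epoch, 1,875 steps in total, and 188 warmup steps.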
Fine-Tuning Strategies
| Strategy | Memory | Quality | Use Case |
|---|---|---|---|
| Full fine-tuning | High | Best | Small models, enough data |
| LoRA | Low | Good | Large models, limited GPU |
| QLoRA | Very Low | Good | 7B+ models on consumer GPU |
| Prefix tuning | Low | Moderate | When you can't modify weights |
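The "Low" memory entry for LoRA comes from training two small low-rank matrices per adapted layer instead of the full weight matrix. The arithmetic below is a sketch of that parameter count. The function names are hypothetical, and the 4096x4096 layer size is an illustrative assumption rather than a figure from this document.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the two LoRA factors: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def trainable_fraction(d_in, d_out, rank):
    """LoRA parameters as a fraction of the frozen full weight matrix."""
    return lora_trainable_params(d_in, d_out, rank) / (d_in * d_out)


# Example: one 4096 x 4096 projection adapted with rank 8.
full = 4096 * 4096                             # 16,777,216 frozen weights
adapter = lora_trainable_params(4096, 4096, 8)  # 65,536 trainable weights
```

At rank 8 the adapter holds under 0.4% of the layer's parameters, so gradients and optimizer state are needed only for that small slice.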
Tokenization Concepts
| Parameter | Purpose |
|---|---|
| padding | Make sequences same length |
| truncation | Cut sequences to max_length |
| max_length | Maximum tokens (model-specific) |
| return_tensors | Output format ("pt", "tf", "np") |
Key concept: Always use the tokenizer that matches the model. Different models use different vocabularies, so token IDs from one tokenizer are meaningless to another model.
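To show what padding, truncation, and max_length do to a batch, here is a pure-Python sketch that operates on lists of token IDs. It is a simplified model of the behavior, not the tokenizer's implementation: real tokenizers also handle special tokens and may pad on the left. The function name and the pad ID of 0 are assumptions for illustration.

```python
def pad_and_truncate(batch, max_length, pad_id=0):
    """Truncate each sequence to max_length, then right-pad to the longest one.

    Returns input_ids plus an attention_mask (1 = real token, 0 = padding).
    """
    truncated = [ids[:max_length] for ids in batch]
    width = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (width - len(ids)) for ids in truncated]
    attention_mask = [[1] * len(ids) + [0] * (width - len(ids)) for ids in truncated]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


encoded = pad_and_truncate([[5, 6, 7, 8, 9], [1, 2]], max_length=4)

# The equivalent call on a real tokenizer would look like:
# tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
```

The attention mask is what lets the model ignore padded positions, which is why it is returned alongside the IDs.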
Best Practices
| Practice | Why |
|---|---|
| Use pipelines for inference | Handles preprocessing automatically |
| Use device_map="auto" | Automatic placement across available GPUs, with CPU offload when needed |
| Batch inputs | Better throughput |
| Use quantization for large models | Run 7B+ on consumer GPUs |
| Match tokenizer to model | Vocabularies differ between models |
| Use Trainer for fine-tuning | Built-in best practices |
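As a sketch of the "Batch inputs" practice, the helper below splits a list into fixed-size chunks. The names `batched` and `classify_in_batches` are hypothetical. Pipelines also accept a `batch_size` argument directly, which is usually the simpler option.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items, each with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def classify_in_batches(texts, batch_size=8):
    """Run a text-classification pipeline with internal batching (lazy import;
    downloads the library's default checkpoint on first use)."""
    from transformers import pipeline

    classifier = pipeline("text-classification")
    return classifier(texts, batch_size=batch_size)
```

Larger batches generally improve GPU throughput until memory runs out, so the best batch size depends on the model and the hardware.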
Resources
- Docs: https://huggingface.co/docs/transformers
- Model Hub: https://huggingface.co/models
- Course: https://huggingface.co/course
Repository: eyadsibai/ltk
First Seen: Jan 28, 2026