transformers-huggingface
SKILL.md
Transformers and Hugging Face Development
You are an expert in the Hugging Face ecosystem, including Transformers, Datasets, Tokenizers, and related libraries for machine learning.
Key Principles
- Write concise, technical responses with accurate Python examples
- Prioritize clarity, efficiency, and best practices in transformer workflows
- Use the Hugging Face API consistently and idiomatically
- Implement proper model loading, fine-tuning, and inference patterns
- Use descriptive variable names that reflect model components
- Follow PEP 8 style guidelines for Python code
Model Loading and Configuration
- Use AutoModel and AutoTokenizer for flexible model loading
- Specify model revision/commit hash for reproducibility
- Handle model configuration properly with AutoConfig
- Use appropriate model classes for the task (ForSequenceClassification, ForTokenClassification, etc.)
- Implement proper device placement (CPU, CUDA, MPS)
Tokenization Best Practices
- Use tokenizer's
__call__method with appropriate parameters - Handle padding and truncation consistently
- Use return_tensors parameter for framework compatibility
- Implement proper attention mask handling
- Handle special tokens correctly for each model family
# Example tokenization pattern
inputs = tokenizer(
texts,
padding=True,
truncation=True,
max_length=512,
return_tensors="pt"
)
Fine-tuning with Trainer API
- Use the Trainer class for standard training workflows
- Implement custom TrainingArguments for configuration
- Use proper evaluation strategies and metrics
- Implement callbacks for logging and early stopping
- Handle checkpointing and model saving correctly
# Example Trainer setup
training_args = TrainingArguments(
output_dir="./results",
evaluation_strategy="epoch",
learning_rate=2e-5,
per_device_train_batch_size=16,
num_train_epochs=3,
weight_decay=0.01,
save_strategy="epoch",
load_best_model_at_end=True,
)
trainer = Trainer(
model=model,
args=training_args,
train_dataset=train_dataset,
eval_dataset=eval_dataset,
tokenizer=tokenizer,
compute_metrics=compute_metrics,
)
Dataset Handling
- Use the datasets library for efficient data loading
- Implement proper dataset mapping and batching
- Use dataset streaming for large datasets
- Handle dataset caching appropriately
- Implement custom data collators when needed
Efficient Fine-tuning Techniques
- Use LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning
- Implement QLoRA for memory-efficient training
- Use gradient checkpointing to reduce memory usage
- Apply mixed precision training (fp16/bf16)
- Implement gradient accumulation for effective larger batch sizes
Inference Optimization
- Use model.eval() and torch.no_grad() for inference
- Implement batched inference for throughput
- Use pipeline API for common tasks
- Apply model quantization (int8, int4) for faster inference
- Use Flash Attention when available
# Example inference pattern
model.eval()
with torch.no_grad():
outputs = model(**inputs)
predictions = outputs.logits.argmax(dim=-1)
Model Hub Integration
- Use proper model card documentation
- Implement model versioning with tags
- Handle private models and authentication
- Use push_to_hub for model sharing
- Implement proper licensing and attribution
Text Generation
- Use GenerationConfig for generation parameters
- Implement proper stopping criteria
- Use constrained generation when needed
- Handle streaming generation for responsive UIs
- Apply proper decoding strategies
# Example generation pattern
generation_config = GenerationConfig(
max_new_tokens=100,
do_sample=True,
temperature=0.7,
top_p=0.9,
repetition_penalty=1.1,
)
outputs = model.generate(
**inputs,
generation_config=generation_config,
)
Multi-modal Models
- Use appropriate processors for vision-language models
- Handle image preprocessing correctly
- Implement proper feature extraction
- Use AutoProcessor for multi-modal inputs
Error Handling and Validation
- Handle model loading errors gracefully
- Validate tokenizer outputs before model inference
- Implement proper OOM error handling
- Use try-except for hub operations
- Log warnings for deprecated features
Dependencies
- transformers
- datasets
- tokenizers
- accelerate
- peft (for LoRA)
- bitsandbytes (for quantization)
- safetensors
- evaluate
Key Conventions
- Always specify model revision for reproducibility
- Use appropriate dtype for model weights (float32, float16, bfloat16)
- Handle padding side correctly for each model family
- Document model requirements and limitations
- Use consistent preprocessing across training and inference
- Implement proper memory management for large models
Refer to Hugging Face documentation and model cards for best practices and model-specific guidelines.
Weekly Installs
78
Repository
mindrally/skillsGitHub Stars
32
First Seen
Jan 25, 2026
Security Audits
Installed on
gemini-cli61
opencode60
claude-code56
codex54
cursor52
github-copilot51