
HuggingFace Transformers

Access thousands of pre-trained models for NLP, vision, audio, and multimodal tasks.

When to Use

  • Quick inference with pipelines
  • Text generation, classification, QA, NER
  • Image classification, object detection
  • Fine-tuning on custom datasets
  • Loading pre-trained models from HuggingFace Hub

Pipeline Tasks

NLP Tasks

| Task | Pipeline Name | Output |
|------|---------------|--------|
| Text Generation | text-generation | Completed text |
| Classification | text-classification | Label + confidence |
| Question Answering | question-answering | Answer span |
| Summarization | summarization | Shorter text |
| Translation | translation_en_to_fr | Translated text |
| NER | ner | Entity spans + types |
| Fill Mask | fill-mask | Predicted tokens |
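A minimal sketch of how these task names map onto pipeline calls. The input strings are illustrative; each pipeline downloads a default model on first use, so in practice you would usually pass model=... explicitly:

```python
# Sketch: NLP inference via pipelines. Requires `pip install transformers`.
# Task names mirror the table above.

NLP_TASKS = {
    "generate": "text-generation",
    "classify": "text-classification",
    "qa": "question-answering",
    "summarize": "summarization",
    "ner": "ner",
    "fill_mask": "fill-mask",
}

def classify(text: str):
    """Return [{'label': ..., 'score': ...}] for a single input."""
    from transformers import pipeline  # lazy import: heavy dependency
    clf = pipeline(NLP_TASKS["classify"])  # downloads a default model on first use
    return clf(text)

def answer(question: str, context: str):
    """Extractive QA: returns {'answer': ..., 'score': ..., 'start': ..., 'end': ...}."""
    from transformers import pipeline
    qa = pipeline(NLP_TASKS["qa"])
    return qa(question=question, context=context)
```

For example, classify("Great library!") returns a label-plus-score list from whatever default sentiment model the installed transformers version selects.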

Vision Tasks

| Task | Pipeline Name | Output |
|------|---------------|--------|
| Image Classification | image-classification | Label + confidence |
| Object Detection | object-detection | Bounding boxes |
| Image Segmentation | image-segmentation | Pixel masks |

Audio Tasks

| Task | Pipeline Name | Output |
|------|---------------|--------|
| Speech Recognition | automatic-speech-recognition | Transcribed text |
| Audio Classification | audio-classification | Label + confidence |
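Vision and audio pipelines follow the same calling pattern as the NLP ones; a sketch (inputs can be local paths, URLs, or raw arrays, and the default models are whatever the installed transformers version picks):

```python
# Sketch: vision and audio pipelines. Requires `pip install transformers`
# plus Pillow for images and an audio backend such as librosa/ffmpeg for audio.

def classify_image(image: str):
    """Image classification: returns [{'label': ..., 'score': ...}, ...]."""
    from transformers import pipeline  # lazy import
    return pipeline("image-classification")(image)

def transcribe(audio: str):
    """Speech recognition: returns {'text': '...'}."""
    from transformers import pipeline
    return pipeline("automatic-speech-recognition")(audio)

def top_label(results):
    """Pure helper: pick the highest-scoring label from classification output."""
    return max(results, key=lambda r: r["score"])["label"]
```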

Model Loading Patterns

Auto Classes

| Class | Use Case |
|-------|----------|
| AutoModel | Base model (embeddings) |
| AutoModelForCausalLM | Text generation (GPT-style) |
| AutoModelForSeq2SeqLM | Encoder-decoder (T5, BART) |
| AutoModelForSequenceClassification | Classification head |
| AutoModelForTokenClassification | NER, POS tagging |
| AutoModelForQuestionAnswering | Extractive QA |

Key concept: Always use Auto classes unless you need a specific architecture; they read the checkpoint's config and instantiate the matching architecture automatically.
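A sketch of the Auto-class loading pattern; the gpt2 checkpoint name is just an example:

```python
# Sketch: loading a tokenizer/model pair with Auto classes.
# Requires `pip install transformers` (and torch).

def load_causal_lm(name: str = "gpt2"):
    """Load a tokenizer and a text-generation model for the given checkpoint."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # lazy import
    tokenizer = AutoTokenizer.from_pretrained(name)
    # AutoModelForCausalLM inspects the checkpoint's config.json and
    # instantiates the matching architecture (GPT-2 here).
    model = AutoModelForCausalLM.from_pretrained(name)
    return tokenizer, model

# Pure mapping from task to Auto class name, mirroring the table above.
AUTO_CLASS_FOR_TASK = {
    "generation": "AutoModelForCausalLM",
    "seq2seq": "AutoModelForSeq2SeqLM",
    "classification": "AutoModelForSequenceClassification",
    "token_classification": "AutoModelForTokenClassification",
    "qa": "AutoModelForQuestionAnswering",
}
```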


Generation Parameters

| Parameter | Effect | Typical Values |
|-----------|--------|----------------|
| max_new_tokens | Output length | 50-500 |
| temperature | Randomness (lower = more deterministic) | 0.1-1.0 |
| top_p | Nucleus sampling threshold | 0.9-0.95 |
| top_k | Limit vocabulary per step | 50 |
| num_beams | Beam search width (use with do_sample=False) | 4-8 |
| repetition_penalty | Discourage repetition | 1.1-1.3 |

Key concept: Higher temperature = more creative but less coherent. For factual tasks, use low temperature (0.1-0.3).
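The table above can be sketched as two presets passed to model.generate(); the exact values are illustrative starting points, not tuned settings, and the gpt2 checkpoint is an example:

```python
# Sketch: generation presets derived from the parameter table.
# Requires `pip install transformers` (and torch).

def generation_kwargs(factual: bool) -> dict:
    """Low temperature for factual tasks, higher for creative ones."""
    return {
        "max_new_tokens": 200,
        "do_sample": True,
        "temperature": 0.2 if factual else 0.9,
        "top_p": 0.95,
        "repetition_penalty": 1.1,
    }

def generate(prompt: str, factual: bool = True, name: str = "gpt2"):
    from transformers import AutoModelForCausalLM, AutoTokenizer  # lazy import
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **generation_kwargs(factual))
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```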


Memory Management

Device Placement Options

| Option | When to Use |
|--------|-------------|
| device_map="auto" | Let the library decide GPU allocation |
| device_map="cuda:0" | Specific GPU |
| device_map="cpu" | CPU only |
Quantization Options

| Method | Memory Reduction | Quality Impact |
|--------|------------------|----------------|
| 8-bit | ~50% | Minimal |
| 4-bit | ~75% | Small for most tasks |
| GPTQ | ~75% | Small; needs calibration data |
| AWQ | ~75% | Small; activation-aware |

Key concept: Use torch_dtype="auto" to automatically use the model's native precision (often bfloat16).
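Device placement and quantization combine into one loading call; a sketch assuming accelerate and bitsandbytes are installed (the Llama checkpoint name is illustrative and gated on the Hub):

```python
# Sketch: automatic device placement plus 4-bit quantization.
# Requires `pip install transformers accelerate bitsandbytes` and a CUDA GPU.

def load_quantized(name: str = "meta-llama/Llama-2-7b-hf"):
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig  # lazy import

    quant = BitsAndBytesConfig(
        load_in_4bit=True,               # ~75% memory reduction (see table)
        bnb_4bit_quant_type="nf4",       # NormalFloat4, the QLoRA default
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=True,  # also quantize the quantization constants
    )
    return AutoModelForCausalLM.from_pretrained(
        name,
        torch_dtype="auto",   # use the checkpoint's native precision
        device_map="auto",    # spread layers across available GPUs/CPU
        quantization_config=quant,
    )
```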


Fine-Tuning Concepts

Trainer Arguments

| Argument | Purpose | Typical Value |
|----------|---------|---------------|
| num_train_epochs | Training passes | 3-5 |
| per_device_train_batch_size | Samples per GPU | 8-32 |
| learning_rate | Step size | 2e-5 for fine-tuning |
| weight_decay | Regularization | 0.01 |
| warmup_ratio | LR warmup | 0.1 |
| eval_strategy (evaluation_strategy before v4.41) | When to eval | "epoch" or "steps" |
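A sketch of wiring these arguments into a Trainer; the model and dataset arguments are placeholders, and it assumes transformers >= 4.41 (older versions spell eval_strategy as evaluation_strategy):

```python
# Sketch: Trainer setup using the typical values from the table.
# Requires `pip install transformers` (and torch, datasets).

HYPERPARAMS = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "learning_rate": 2e-5,
    "weight_decay": 0.01,
    "warmup_ratio": 0.1,
}

def build_trainer(model, train_ds, eval_ds, output_dir: str = "./out"):
    from transformers import Trainer, TrainingArguments  # lazy import
    args = TrainingArguments(
        output_dir=output_dir,
        eval_strategy="epoch",  # evaluate at the end of every epoch
        **HYPERPARAMS,
    )
    return Trainer(model=model, args=args,
                   train_dataset=train_ds, eval_dataset=eval_ds)
```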

Fine-Tuning Strategies

| Strategy | Memory | Quality | Use Case |
|----------|--------|---------|----------|
| Full fine-tuning | High | Best | Small models, enough data |
| LoRA | Low | Good | Large models, limited GPU |
| QLoRA | Very Low | Good | 7B+ models on consumer GPU |
| Prefix tuning | Low | Moderate | When you can't modify weights |
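LoRA can be sketched with the peft library (pip install peft); the rank/alpha values are common starting points, and the target_modules names are a model-specific assumption (LLaMA-style attention projections):

```python
# Sketch: wrapping a causal LM with LoRA adapters via peft.
# Requires `pip install transformers peft` (and torch).

LORA_DEFAULTS = {
    "r": 8,            # adapter rank: lower = fewer trainable parameters
    "lora_alpha": 16,  # scaling factor, commonly 2x the rank
    "lora_dropout": 0.05,
}

def add_lora(model):
    from peft import LoraConfig, get_peft_model  # lazy import
    config = LoraConfig(
        task_type="CAUSAL_LM",
        target_modules=["q_proj", "v_proj"],  # assumption: LLaMA-style layers
        **LORA_DEFAULTS,
    )
    # Freezes the base weights; only the small adapter matrices train.
    return get_peft_model(model, config)
```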

Tokenization Concepts

| Parameter | Purpose |
|-----------|---------|
| padding | Make sequences same length |
| truncation | Cut sequences to max_length |
| max_length | Maximum tokens (model-specific) |
| return_tensors | Output format ("pt", "tf", "np") |
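These parameters combine into one tokenizer call; a sketch (the bert-base-uncased checkpoint name is illustrative):

```python
# Sketch: batch tokenization with the settings from the table.
# Requires `pip install transformers` (and torch for "pt" tensors).

TOKENIZER_KWARGS = {
    "padding": True,         # pad to the longest sequence in the batch
    "truncation": True,      # cut to the model's max length
    "return_tensors": "pt",  # PyTorch tensors ("tf" / "np" also supported)
}

def tokenize_batch(texts, name: str = "bert-base-uncased"):
    from transformers import AutoTokenizer  # lazy import
    tokenizer = AutoTokenizer.from_pretrained(name)  # must match the model
    # Returns a dict with input_ids, attention_mask, etc., as tensors.
    return tokenizer(texts, **TOKENIZER_KWARGS)
```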

Key concept: Always use the tokenizer that matches the model—different models use different vocabularies.


Best Practices

| Practice | Why |
|----------|-----|
| Use pipelines for inference | Handles preprocessing automatically |
| Use device_map="auto" | Optimal GPU memory distribution |
| Batch inputs | Better throughput |
| Use quantization for large models | Run 7B+ on consumer GPUs |
| Match tokenizer to model | Vocabularies differ between models |
| Use Trainer for fine-tuning | Built-in best practices |

Resources

Repository: eyadsibai/ltk
First seen: Jan 28, 2026