Hugging Face

The Hugging Face Hub of pretrained models, paired with the Transformers library for inference and fine-tuning.

When to Use

  • NLP tasks (classification, NER, QA)
  • Using pretrained models
  • Fine-tuning transformers
  • Model hosting and inference

Quick Start

from transformers import pipeline

# Sentiment classification (the model is fine-tuned on SST-2, not zero-shot)
classifier = pipeline("text-classification", model="distilbert-base-uncased-finetuned-sst-2-english")
result = classifier("This movie is fantastic!")
print(result)  # [{'label': 'POSITIVE', 'score': 0.99}]

Core Concepts

Pipelines

from transformers import pipeline

# Text generation
generator = pipeline("text-generation", model="gpt2")
output = generator("Once upon a time", max_length=50)  # max_length counts prompt + generated tokens

# Summarization
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
summary = summarizer(long_text, max_length=130, min_length=30)  # long_text: any article-length string

# Question answering
qa = pipeline("question-answering")  # no model pinned: falls back to the pipeline's default QA checkpoint
answer = qa(question="What is Python?", context="Python is a programming language...")
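Pipelines also accept batched input and a device argument. A minimal sketch, assuming a CUDA GPU at index 0 (use device=-1 for CPU):

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=0,  # assumption: a GPU is available; -1 runs on CPU
)
# A list in gives a list of {'label': ..., 'score': ...} dicts out, one per text
results = classifier(["Great movie!", "Terrible plot."])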

Model Loading

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model(**inputs)
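The outputs above hold raw logits, and the classification head on a bare bert-base-uncased is freshly initialized, so predictions are only meaningful after fine-tuning. A minimal sketch of turning logits into a label, assuming the PyTorch backend:

import torch

# Softmax over the last dimension converts logits to class probabilities
probs = torch.softmax(outputs.logits, dim=-1)
predicted = probs.argmax(dim=-1).item()
print(model.config.id2label[predicted], probs.max().item())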

Common Patterns

Fine-tuning with Trainer

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    evaluation_strategy="epoch"
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

trainer.train()
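The train_dataset and eval_dataset above must already be tokenized. A minimal sketch using the datasets library; the IMDb dataset is an illustrative choice, not part of the original:

from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    # Pad/truncate every example to the model's fixed max length
    return tokenizer(batch["text"], padding="max_length", truncation=True)

dataset = load_dataset("imdb")
tokenized = dataset.map(tokenize, batched=True)
train_dataset = tokenized["train"]
eval_dataset = tokenized["test"]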

Inference API

import os
import requests

API_URL = "https://api-inference.huggingface.co/models/gpt2"
HF_TOKEN = os.environ["HF_TOKEN"]  # read the access token from the environment
headers = {"Authorization": f"Bearer {HF_TOKEN}"}

response = requests.post(API_URL, headers=headers, json={"inputs": "Hello"})
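The raw response still needs decoding. A minimal sketch, assuming the usual text-generation response shape (a list of {"generated_text": ...} dicts):

response.raise_for_status()  # surfaces HTTP errors (401 bad token, 503 while the model loads)
result = response.json()
print(result[0]["generated_text"])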

Best Practices

Do:

  • Use Auto classes for flexibility
  • Cache models locally
  • Use device_map="auto" for large models
  • Quantize for deployment (see the combined sketch below)
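A minimal sketch combining both recommendations, assuming the accelerate and bitsandbytes packages are installed (the gpt2 checkpoint is illustrative):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# device_map="auto" spreads weights across available GPUs and CPU;
# load_in_4bit quantizes weights to cut memory roughly 4x vs fp16
quant_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    device_map="auto",
    quantization_config=quant_config,
)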

Don't:

  • Load a model and tokenizer by hand when a pipeline already covers the task
  • Ignore the tokenizer's special tokens
  • Skip input validation
  • Fine-tune a large model without considering which layers to freeze

Troubleshooting

Issue            Cause             Solution
OOM error        Model too large   Use quantization or a smaller model
Slow inference   No GPU            Enable CUDA or use a quantized model
Token mismatch   Wrong tokenizer   Load the tokenizer that matches the model checkpoint
