skills/nico-martin/skills/transformers-js

transformers-js

Originally fromhuggingface/skills
SKILL.md

Transformers.js - Machine Learning for JavaScript

Transformers.js enables running state-of-the-art machine learning models directly in JavaScript, both in browsers and Node.js environments, with no server required.

When to Use This Skill

Use this skill when you need to:

  • Run ML models for text analysis, generation, or translation in JavaScript
  • Perform image classification, object detection, or segmentation
  • Implement speech recognition or audio processing
  • Build multimodal AI applications (text-to-image, image-to-text, etc.)
  • Run models client-side in the browser without a backend

Installation

NPM Installation

npm install @huggingface/transformers

Browser Usage (CDN)

<script type="module">
  import { pipeline } from 'https://cdn.jsdelivr.net/npm/@huggingface/transformers';
</script>

Core Concepts

1. Pipeline API

The pipeline API is the easiest way to use models. It groups together preprocessing, model inference, and postprocessing:

import { pipeline } from '@huggingface/transformers';

// Create a pipeline for a specific task
const pipe = await pipeline('sentiment-analysis');

// Use the pipeline
const result = await pipe('I love transformers!');
// Output: [{ label: 'POSITIVE', score: 0.999817686 }]

// IMPORTANT: Always dispose when done to free memory
await classifier.dispose();

⚠️ Memory Management: All pipelines must be disposed with pipe.dispose() when finished to prevent memory leaks. See examples in Code Examples for cleanup patterns across different environments.

2. Model Selection

You can specify a custom model as the second argument:

const pipe = await pipeline(
  'sentiment-analysis',
  'Xenova/bert-base-multilingual-uncased-sentiment'
);

Finding Models:

Browse available Transformers.js models on Hugging Face Hub:

Tip: Filter by task type, sort by trending/downloads, and check model cards for performance metrics and usage examples.

3. Device Selection

Choose where to run the model:

// Run on CPU (default for WASM)
const pipe = await pipeline('sentiment-analysis', 'model-id');

// Run on GPU (WebGPU - experimental)
const pipe = await pipeline('sentiment-analysis', 'model-id', {
  device: 'webgpu',
});

4. Quantization Options

Control model precision vs. performance:

// Use quantized model (faster, smaller)
const pipe = await pipeline('sentiment-analysis', 'model-id', {
  dtype: 'q4',  // Options: 'fp32', 'fp16', 'q8', 'q4'
});

Supported Tasks

Note: All examples below show basic usage.

Natural Language Processing

Text Classification

const classifier = await pipeline('text-classification');
const result = await classifier('This movie was amazing!');

Named Entity Recognition (NER)

const ner = await pipeline('token-classification');
const entities = await ner('My name is John and I live in New York.');

Question Answering

const qa = await pipeline('question-answering');
const answer = await qa({
  question: 'What is the capital of France?',
  context: 'Paris is the capital and largest city of France.'
});

Text Generation

const generator = await pipeline('text-generation', 'onnx-community/gemma-3-270m-it-ONNX');
const text = await generator('Once upon a time', {
  max_new_tokens: 100,
  temperature: 0.7
});

For streaming and chat: See Text Generation Guide for:

  • Streaming token-by-token output with TextStreamer
  • Chat/conversation format with system/user/assistant roles
  • Generation parameters (temperature, top_k, top_p)
  • Browser and Node.js examples
  • React components and API endpoints

Translation

const translator = await pipeline('translation', 'Xenova/nllb-200-distilled-600M');
const output = await translator('Hello, how are you?', {
  src_lang: 'eng_Latn',
  tgt_lang: 'fra_Latn'
});

Summarization

const summarizer = await pipeline('summarization');
const summary = await summarizer(longText, {
  max_length: 100,
  min_length: 30
});

Zero-Shot Classification

const classifier = await pipeline('zero-shot-classification');
const result = await classifier('This is a story about sports.', ['politics', 'sports', 'technology']);

Computer Vision

Image Classification

const classifier = await pipeline('image-classification');
const result = await classifier('https://example.com/image.jpg');
// Or with local file
const result = await classifier(imageUrl);

Object Detection

const detector = await pipeline('object-detection');
const objects = await detector('https://example.com/image.jpg');
// Returns: [{ label: 'person', score: 0.95, box: { xmin, ymin, xmax, ymax } }, ...]

Image Segmentation

const segmenter = await pipeline('image-segmentation');
const segments = await segmenter('https://example.com/image.jpg');

Depth Estimation

const depthEstimator = await pipeline('depth-estimation');
const depth = await depthEstimator('https://example.com/image.jpg');

Zero-Shot Image Classification

const classifier = await pipeline('zero-shot-image-classification');
const result = await classifier('image.jpg', ['cat', 'dog', 'bird']);

Audio Processing

Automatic Speech Recognition

const transcriber = await pipeline('automatic-speech-recognition');
const result = await transcriber('audio.wav');
// Returns: { text: 'transcribed text here' }

Audio Classification

const classifier = await pipeline('audio-classification');
const result = await classifier('audio.wav');

Text-to-Speech

const synthesizer = await pipeline('text-to-speech', 'Xenova/speecht5_tts');
const audio = await synthesizer('Hello, this is a test.', {
  speaker_embeddings: speakerEmbeddings
});

Multimodal

Image-to-Text (Image Captioning)

const captioner = await pipeline('image-to-text');
const caption = await captioner('image.jpg');

Document Question Answering

const docQA = await pipeline('document-question-answering');
const answer = await docQA('document-image.jpg', 'What is the total amount?');

Zero-Shot Object Detection

const detector = await pipeline('zero-shot-object-detection');
const objects = await detector('image.jpg', ['person', 'car', 'tree']);

Feature Extraction (Embeddings)

const extractor = await pipeline('feature-extraction');
const embeddings = await extractor('This is a sentence to embed.');
// Returns: tensor of shape [1, sequence_length, hidden_size]

// For sentence embeddings (mean pooling)
const extractor = await pipeline('feature-extraction', 'onnx-community/all-MiniLM-L6-v2-ONNX');
const embeddings = await extractor('Text to embed', { pooling: 'mean', normalize: true });

Finding and Choosing Models

Browsing the Hugging Face Hub

Discover compatible Transformers.js models on Hugging Face Hub:

Base URL (all models):

https://huggingface.co/models?library=transformers.js&sort=trending

Filter by task using the pipeline_tag parameter:

Task URL
Text Generation https://huggingface.co/models?pipeline_tag=text-generation&library=transformers.js&sort=trending
Text Classification https://huggingface.co/models?pipeline_tag=text-classification&library=transformers.js&sort=trending
Translation https://huggingface.co/models?pipeline_tag=translation&library=transformers.js&sort=trending
Summarization https://huggingface.co/models?pipeline_tag=summarization&library=transformers.js&sort=trending
Question Answering https://huggingface.co/models?pipeline_tag=question-answering&library=transformers.js&sort=trending
Image Classification https://huggingface.co/models?pipeline_tag=image-classification&library=transformers.js&sort=trending
Object Detection https://huggingface.co/models?pipeline_tag=object-detection&library=transformers.js&sort=trending
Image Segmentation https://huggingface.co/models?pipeline_tag=image-segmentation&library=transformers.js&sort=trending
Speech Recognition https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&library=transformers.js&sort=trending
Audio Classification https://huggingface.co/models?pipeline_tag=audio-classification&library=transformers.js&sort=trending
Image-to-Text https://huggingface.co/models?pipeline_tag=image-to-text&library=transformers.js&sort=trending
Feature Extraction https://huggingface.co/models?pipeline_tag=feature-extraction&library=transformers.js&sort=trending
Zero-Shot Classification https://huggingface.co/models?pipeline_tag=zero-shot-classification&library=transformers.js&sort=trending

Sort options:

  • &sort=trending - Most popular recently
  • &sort=downloads - Most downloaded overall
  • &sort=likes - Most liked by community
  • &sort=modified - Recently updated

Choosing the Right Model

Consider these factors when selecting a model:

1. Model Size

  • Small (< 100MB): Fast, suitable for browsers, limited accuracy
  • Medium (100MB - 500MB): Balanced performance, good for most use cases
  • Large (> 500MB): High accuracy, slower, better for Node.js or powerful devices

2. Quantization Models are often available in different quantization levels:

  • fp32 - Full precision (largest, most accurate)
  • fp16 - Half precision (smaller, still accurate)
  • q8 - 8-bit quantized (much smaller, slight accuracy loss)
  • q4 - 4-bit quantized (smallest, noticeable accuracy loss)

3. Task Compatibility Check the model card for:

  • Supported tasks (some models support multiple tasks)
  • Input/output formats
  • Language support (multilingual vs. English-only)
  • License restrictions

4. Performance Metrics Model cards typically show:

  • Accuracy scores
  • Benchmark results
  • Inference speed
  • Memory requirements

Example: Finding a Text Generation Model

// 1. Visit: https://huggingface.co/models?pipeline_tag=text-generation&library=transformers.js&sort=trending

// 2. Browse and select a model (e.g., onnx-community/gemma-3-270m-it-ONNX)

// 3. Check model card for:
//    - Model size: ~270M parameters
//    - Quantization: q4 available
//    - Language: English
//    - Use case: Instruction-following chat

// 4. Use the model:
import { pipeline } from '@huggingface/transformers';

const generator = await pipeline(
  'text-generation',
  'onnx-community/gemma-3-270m-it-ONNX',
  { dtype: 'q4' } // Use quantized version for faster inference
);

const output = await generator('Explain quantum computing in simple terms.', {
  max_new_tokens: 100
});

await generator.dispose();

Tips for Model Selection

  1. Start Small: Test with a smaller model first, then upgrade if needed
  2. Check ONNX Support: Ensure the model has ONNX files (look for onnx folder in model repo)
  3. Read Model Cards: Model cards contain usage examples, limitations, and benchmarks
  4. Test Locally: Benchmark inference speed and memory usage in your environment
  5. Community Models: Look for models by Xenova (Transformers.js maintainer) or onnx-community
  6. Version Pin: Use specific git commits in production for stability:
    const pipe = await pipeline('task', 'model-id', { revision: 'abc123' });
    

Advanced Configuration

Environment Configuration (env)

The env object provides comprehensive control over Transformers.js execution, caching, and model loading.

Quick Overview:

import { env } from '@huggingface/transformers';

// View version
console.log(env.version); // e.g., '3.8.1'

// Common settings
env.allowRemoteModels = true;  // Load from Hugging Face Hub
env.allowLocalModels = false;  // Load from file system
env.localModelPath = '/models/'; // Local model directory
env.useFSCache = true;         // Cache models on disk (Node.js)
env.useBrowserCache = true;    // Cache models in browser
env.cacheDir = './.cache';     // Cache directory location

Configuration Patterns:

// Development: Fast iteration with remote models
env.allowRemoteModels = true;
env.useFSCache = true;

// Production: Local models only
env.allowRemoteModels = false;
env.allowLocalModels = true;
env.localModelPath = '/app/models/';

// Custom CDN
env.remoteHost = 'https://cdn.example.com/models';

// Disable caching (testing)
env.useFSCache = false;
env.useBrowserCache = false;

For complete documentation on all configuration options, caching strategies, cache management, pre-downloading models, and more, see:

Configuration Reference

Working with Tensors

import { AutoTokenizer, AutoModel } from '@huggingface/transformers';

// Load tokenizer and model separately for more control
const tokenizer = await AutoTokenizer.from_pretrained('bert-base-uncased');
const model = await AutoModel.from_pretrained('bert-base-uncased');

// Tokenize input
const inputs = await tokenizer('Hello world!');

// Run model
const outputs = await model(inputs);

Batch Processing

const classifier = await pipeline('sentiment-analysis');

// Process multiple texts
const results = await classifier([
  'I love this!',
  'This is terrible.',
  'It was okay.'
]);

Browser-Specific Considerations

WebGPU Usage

WebGPU provides GPU acceleration in browsers:

const pipe = await pipeline('text-generation', 'onnx-community/gemma-3-270m-it-ONNX', {
  device: 'webgpu',
  dtype: 'fp32'
});

Note: WebGPU is experimental. Check browser compatibility and file issues if problems occur.

WASM Performance

Default browser execution uses WASM:

// Optimized for browsers with quantization
const pipe = await pipeline('sentiment-analysis', 'model-id', {
  dtype: 'q8'  // or 'q4' for even smaller size
});

Progress Tracking & Loading Indicators

Models can be large (ranging from a few MB to several GB) and consist of multiple files. Track download progress by passing a callback to the pipeline() function:

import { pipeline } from '@huggingface/transformers';

// Track progress for each file
const fileProgress = {};

function onProgress(info) {
  console.log(`${info.status}: ${info.file}`);
  
  if (info.status === 'progress') {
    fileProgress[info.file] = info.progress;
    console.log(`${info.file}: ${info.progress.toFixed(1)}%`);
  }
  
  if (info.status === 'done') {
    console.log(`${info.file} complete`);
  }
}

// Pass callback to pipeline
const classifier = await pipeline('sentiment-analysis', null, {
  progress_callback: onProgress
});

Progress Info Properties:

interface ProgressInfo {
  status: 'initiate' | 'download' | 'progress' | 'done' | 'ready';
  name: string;      // Model id or path
  file: string;      // File being processed
  progress?: number; // Percentage (0-100, only for 'progress' status)
  loaded?: number;   // Bytes downloaded (only for 'progress' status)
  total?: number;    // Total bytes (only for 'progress' status)
}

For complete examples including browser UIs, React components, CLI progress bars, and retry logic, see:

Pipeline Options - Progress Callback

Error Handling

try {
  const pipe = await pipeline('sentiment-analysis', 'model-id');
  const result = await pipe('text to analyze');
} catch (error) {
  if (error.message.includes('fetch')) {
    console.error('Model download failed. Check internet connection.');
  } else if (error.message.includes('ONNX')) {
    console.error('Model execution failed. Check model compatibility.');
  } else {
    console.error('Unknown error:', error);
  }
}

Performance Tips

  1. Reuse Pipelines: Create pipeline once, reuse for multiple inferences
  2. Use Quantization: Start with q8 or q4 for faster inference
  3. Batch Processing: Process multiple inputs together when possible
  4. Cache Models: Models are cached automatically (see Caching Reference for details on browser Cache API, Node.js filesystem cache, and custom implementations)
  5. WebGPU for Large Models: Use WebGPU for models that benefit from GPU acceleration
  6. Prune Context: For text generation, limit max_new_tokens to avoid memory issues
  7. Clean Up Resources: Call pipe.dispose() when done to free memory

Memory Management

IMPORTANT: Always call pipe.dispose() when finished to prevent memory leaks.

const pipe = await pipeline('sentiment-analysis');
const result = await pipe('Great product!');
await pipe.dispose();  // ✓ Free memory (100MB - several GB per model)

When to dispose:

  • Application shutdown or component unmount
  • Before loading a different model
  • After batch processing in long-running apps

Models consume significant memory and hold GPU/CPU resources. Disposal is critical for browser memory limits and server stability.

For detailed patterns (React cleanup, servers, browser), see Code Examples

Troubleshooting

Model Not Found

  • Verify model exists on Hugging Face Hub
  • Check model name spelling
  • Ensure model has ONNX files (look for onnx folder in model repo)

Memory Issues

  • Use smaller models or quantized versions (dtype: 'q4')
  • Reduce batch size
  • Limit sequence length with max_length

WebGPU Errors

  • Check browser compatibility (Chrome 113+, Edge 113+)
  • Try dtype: 'fp16' if fp32 fails
  • Fall back to WASM if WebGPU unavailable

Reference Documentation

This Skill

Official Transformers.js

Best Practices

  1. Always Dispose Pipelines: Call pipe.dispose() when done - critical for preventing memory leaks
  2. Start with Pipelines: Use the pipeline API unless you need fine-grained control
  3. Test Locally First: Test models with small inputs before deploying
  4. Monitor Model Sizes: Be aware of model download sizes for web applications
  5. Handle Loading States: Show progress indicators for better UX
  6. Version Pin: Pin specific model versions for production stability
  7. Error Boundaries: Always wrap pipeline calls in try-catch blocks
  8. Progressive Enhancement: Provide fallbacks for unsupported browsers
  9. Reuse Models: Load once, use many times - don't recreate pipelines unnecessarily
  10. Graceful Shutdown: Dispose models on SIGTERM/SIGINT in servers

Quick Reference: Task IDs

Task Task ID
Text classification text-classification or sentiment-analysis
Token classification token-classification or ner
Question answering question-answering
Fill mask fill-mask
Summarization summarization
Translation translation
Text generation text-generation
Text-to-text generation text2text-generation
Zero-shot classification zero-shot-classification
Image classification image-classification
Image segmentation image-segmentation
Object detection object-detection
Depth estimation depth-estimation
Image-to-image image-to-image
Zero-shot image classification zero-shot-image-classification
Zero-shot object detection zero-shot-object-detection
Automatic speech recognition automatic-speech-recognition
Audio classification audio-classification
Text-to-speech text-to-speech or text-to-audio
Image-to-text image-to-text
Document question answering document-question-answering
Feature extraction feature-extraction
Sentence similarity sentence-similarity

This skill enables you to integrate state-of-the-art machine learning capabilities directly into JavaScript applications without requiring separate ML servers or Python environments.

Weekly Installs
5
First Seen
Feb 18, 2026
Installed on
amp5
github-copilot5
codex5
kimi-cli5
gemini-cli5
opencode5