Vision Model Training on Hugging Face Jobs

Train object detection, image classification, and SAM/SAM2 segmentation models on managed cloud GPUs. No local GPU setup is required, and results are automatically saved to the Hugging Face Hub.

When to Use This Skill

Use this skill when users want to:

Fine-tune object detection models (D-FINE, RT-DETR v2, DETR, YOLOS) on cloud GPUs or locally
Fine-tune image classification models (timm: MobileNetV3, MobileViT, ResNet, ViT/DINOv3, or any Transformers classifier) on cloud GPUs or locally
Fine-tune SAM or SAM2 models for segmentation / image matting using bounding-box or point prompts
Train bounding-box detectors on custom datasets
Train image classifiers on custom datasets
Train segmentation models on custom mask datasets with prompts
Run vision training jobs on Hugging Face Jobs infrastructure
Ensure trained vision models are permanently saved to the Hub
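
As a quick reference, the task-to-model coverage listed above can be sketched as a small lookup. This is only an illustration: the task labels (`object-detection`, `image-classification`, `segmentation`) and the helper `families_for` are hypothetical names chosen for this sketch, not part of the skill or of any Hugging Face API. The model families are the ones named in the list.

```python
# Hypothetical lookup built only from the model families named in the list above.
# The task keys are illustrative labels, not identifiers used by the skill itself.
SUPPORTED_FAMILIES = {
    "object-detection": ["D-FINE", "RT-DETR v2", "DETR", "YOLOS"],
    "image-classification": ["MobileNetV3", "MobileViT", "ResNet", "ViT/DINOv3"],
    "segmentation": ["SAM", "SAM2"],
}

# Prompt types the list mentions for SAM/SAM2 fine-tuning.
SEGMENTATION_PROMPTS = ("bbox", "point")


def families_for(task: str) -> list[str]:
    """Return the model families listed for a task, or raise on an unknown task."""
    try:
        return SUPPORTED_FAMILIES[task]
    except KeyError:
        known = ", ".join(sorted(SUPPORTED_FAMILIES))
        raise ValueError(f"unknown task {task!r}; expected one of: {known}") from None


if __name__ == "__main__":
    # Print the coverage table, one task per line.
    for task in SUPPORTED_FAMILIES:
        print(f"{task}: {', '.join(families_for(task))}")
```

The classification entry is not exhaustive, since the list also allows any Transformers classifier; a request outside these three tasks is a signal that this skill may not apply.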
