gemma-tuner-multimodal
Gemma Multimodal Fine-Tuner
Skill by ara.so — Daily 2026 Skills collection.
Fine-tune Gemma 4 and Gemma 3n models on text, images, and audio data entirely on Apple Silicon using the Metal Performance Shaders (MPS) backend. Large datasets can be streamed from Google Cloud Storage (GCS) or BigQuery instead of being copied to local storage.
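Streaming means training pulls examples from the remote source in small batches, so only the current batch has to be held locally. The tool's own GCS/BigQuery loader is not shown here. The following is a generic, stdlib-only sketch of the batching pattern, and `iter_batches` is a hypothetical helper name:

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield fixed-size batches from any iterable without materializing it.

    Hypothetical helper: only one batch is held in memory at a time, so the
    source can be a lazy stream much larger than local RAM or disk.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(rows)
    while True:
        # Pull at most batch_size items from the underlying stream.
        batch = list(islice(it, batch_size))
        if not batch:
            return  # source exhausted
        yield batch


if __name__ == "__main__":
    # A generator stands in for a remote source such as GCS objects or BigQuery rows.
    source = (f"example-{i}" for i in range(10))
    for batch in iter_batches(source, batch_size=4):
        print(batch)
```

Because the source is consumed lazily, the final batch may be shorter than `batch_size`. A real trainer would decide whether to keep or drop that partial batch.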
What It Does
- Text LoRA: instruction tuning or completion-style fine-tuning from a local CSV file
- Image + Text LoRA: image captioning and visual question answering (VQA) from a local CSV file
- Audio + Text LoRA: the only Apple-Silicon-native path for this modality
- Cloud streaming: train on terabytes of data from GCS/BigQuery without making a local copy
- MPS-native: no NVIDIA GPU required; runs on MacBook Pro, MacBook Air, and Mac Studio
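The CSV layout for these modes is not spelled out here. The following minimal sketch assumes hypothetical column names `prompt`, `response`, `image_path`, and `audio_path`; check the project's own documentation for the real schema. For illustration, a single CSV can describe all three modes by leaving the media columns blank for text-only rows:

```python
import csv
import io
from dataclasses import dataclass
from typing import Iterator, Optional, TextIO


@dataclass
class Example:
    prompt: str
    response: str
    image_path: Optional[str] = None  # hypothetical column, set for image + text rows
    audio_path: Optional[str] = None  # hypothetical column, set for audio + text rows

    @property
    def modality(self) -> str:
        # This sketch assumes at most one media column is filled per row.
        if self.image_path:
            return "image+text"
        if self.audio_path:
            return "audio+text"
        return "text"


def read_examples(f: TextIO) -> Iterator[Example]:
    """Parse rows from a CSV with hypothetical columns prompt, response,
    and optional image_path / audio_path."""
    for row in csv.DictReader(f):
        yield Example(
            prompt=row["prompt"],
            response=row["response"],
            # Empty cells or absent columns both become None.
            image_path=row.get("image_path") or None,
            audio_path=row.get("audio_path") or None,
        )


if __name__ == "__main__":
    data = io.StringIO(
        "prompt,response,image_path,audio_path\n"
        "Translate to French: good morning,Bonjour,,\n"
        "Describe the image.,A cat on a sofa.,images/cat.jpg,\n"
        "Transcribe the clip.,hello world,,audio/clip.wav\n"
    )
    for ex in read_examples(data):
        print(ex.modality, "|", ex.prompt)
```

Keeping media as file paths rather than inline bytes keeps the CSV small. The images or audio clips are loaded only when a batch is built.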
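All three modes train LoRA adapters instead of updating the full model. In the standard LoRA formulation, a frozen weight matrix W of shape d_out × d_in is adapted as W + B·A, where A has shape r × d_in and B has shape d_out × r. Each adapted matrix therefore adds only r·(d_in + d_out) trainable parameters instead of d_in·d_out. The α/r scaling factor adds no parameters. The sketch below only does that arithmetic. The 2048 × 2048 shape and rank 16 are illustrative values, not Gemma's actual dimensions or this tool's defaults:

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    W stays frozen; only A (rank x d_in) and B (d_out x rank) are learned.
    """
    return rank * d_in + d_out * rank


def full_params(d_in: int, d_out: int) -> int:
    """Parameters updated by full fine-tuning of the same matrix."""
    return d_in * d_out


if __name__ == "__main__":
    # Illustrative (hypothetical) square projection and LoRA rank.
    d, r = 2048, 16
    lora = lora_trainable_params(d, d, r)
    full = full_params(d, d)
    print(f"LoRA: {lora:,} params, full: {full:,} params, ratio: {100 * lora / full:.2f}%")
```

The trainable count grows linearly in the matrix dimensions rather than quadratically. This is why adapter training needs far less optimizer state and gradient memory than full fine-tuning.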