modal-finetuning
Modal Fine-Tuning
Quick Start
- Verify the actual local Modal environment before writing code.
modal --version
python -c "import modal,sys; print(modal.__version__); print(sys.executable)"
modal profile current
- Do not assume the default python interpreter matches the environment behind the modal CLI.
- Classify the training goal before writing code.
  - Adapt a text model with supervised fine-tuning, LoRA, or QLoRA.
  - Train a diffusion LoRA for image generation or style adaptation.
  - Fine-tune a YOLO-style vision model on labeled images.
  - Run GRPO as RL-style post-training.
- Read references/training-playbook.md and then exactly one primary workflow reference.
  - LLM SFT, LoRA, or QLoRA: references/llm-sft-and-lora.md
  - Diffusion LoRA or YOLO: references/vision-and-diffusion-finetuning.md
  - GRPO: references/rl-post-training.md
- Ground every implementation in the actual task: base model, dataset location and format, GPU type and count, checkpoint destination, secrets, and what should happen after training finishes.
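The environment check in the first step can also be scripted. The following is a minimal sketch using only the standard library; the function name describe_modal_environment is hypothetical and not part of Modal. It reports which interpreter is running, where the modal CLI resolves on PATH, and where the modal package would be imported from, so a mismatch between the two is visible before any code is written.

```python
import importlib.util
import shutil
import sys


def describe_modal_environment() -> dict:
    """Collect where the modal CLI and the modal package resolve from.

    Hypothetical helper for illustration; it only inspects the local
    machine and never contacts Modal.
    """
    spec = importlib.util.find_spec("modal")
    return {
        "interpreter": sys.executable,
        # None if no `modal` executable is found on PATH.
        "cli_path": shutil.which("modal"),
        # None if this interpreter cannot import the modal package.
        "package_path": spec.origin if spec else None,
    }


if __name__ == "__main__":
    for key, value in describe_modal_environment().items():
        print(f"{key}: {value}")
```

If cli_path and package_path point into different environments, the CLI and the default interpreter are probably not using the same Modal installation.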
Choose the Workflow
- Use the LLM path for supervised fine-tuning, LoRA, or QLoRA on a single node. Prefer Unsloth or another PEFT-style workflow unless the user explicitly needs full-parameter updates. Read references/llm-sft-and-lora.md.
- Use the diffusion path when the goal is image-generation adaptation, style transfer, or a small concept or brand LoRA. Keep checkpoints and sample outputs easy to inspect later. Read references/vision-and-diffusion-finetuning.md.
- Use the YOLO path when the task is labeled vision training for detection or related downstream evaluation. Keep dataset staging and exported weights separate from later serving. Read references/vision-and-diffusion-finetuning.md.
- Use GRPO only for reinforcement-learning-style post-training. Prefer TRL for the simpler single-node path, and use verl when the task needs a more explicit rollout or trainer split, or more advanced vLLM-backed RL plumbing. Read references/rl-post-training.md.
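The routing above can be written down as a small lookup, which makes the "shared playbook plus exactly one workflow reference" rule explicit. This is only a sketch: the goal keys and the function name references_for are assumptions chosen for illustration, while the file paths are the ones named in this skill.

```python
SHARED_REFERENCE = "references/training-playbook.md"

# Goal keys are illustrative labels, not identifiers defined by the skill.
WORKFLOW_REFERENCES = {
    "llm-sft": "references/llm-sft-and-lora.md",
    "lora": "references/llm-sft-and-lora.md",
    "qlora": "references/llm-sft-and-lora.md",
    "diffusion-lora": "references/vision-and-diffusion-finetuning.md",
    "yolo": "references/vision-and-diffusion-finetuning.md",
    "grpo": "references/rl-post-training.md",
}


def references_for(goal: str) -> list:
    """Return the shared playbook followed by the one primary reference."""
    key = goal.strip().lower()
    if key not in WORKFLOW_REFERENCES:
        known = ", ".join(sorted(WORKFLOW_REFERENCES))
        raise ValueError(f"unknown training goal {goal!r}; expected one of: {known}")
    return [SHARED_REFERENCE, WORKFLOW_REFERENCES[key]]
```

For example, references_for("GRPO") returns the training playbook and references/rl-post-training.md, and an unrecognized goal raises an error instead of silently picking a workflow.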
Default Rules
- Prefer PEFT methods such as LoRA or QLoRA before full fine-tuning unless the user explicitly needs weight updates across the whole model.
- Start with a cheap smoke test on a tiny ungated model and tiny dataset before a long or expensive run. Use it to validate image builds, trainer API compatibility, dataset formatting, checkpoint paths, and one saved sample artifact.
- Persist datasets, pretrained weights, checkpoints, merged adapters, and sample outputs in Modal Volumes. Do not rely on ephemeral container disk for anything that must survive retries or later inspection.
- Keep Hugging Face, Weights & Biases, Roboflow, and similar credentials in Modal Secrets.
- Design long runs so they can resume from checkpoints. Add retries only when resume behavior is correct.
- Set timeout= intentionally for long training jobs and keep one stateful container per training run by default. Prefer single_use_containers=True when retries should start from a fresh container.
- Use @app.local_entrypoint or a plain local launcher to expose hyperparameters and dataset switches as CLI arguments instead of hard-coding every experiment.
- Keep the first version single-node unless the user explicitly asks for clusters. Modal multi-node training is a separate advanced path and is currently a beta workflow.
- Store final artifacts in a layout that makes handoff obvious: base model cache, dataset cache, checkpoint tree, and final exported weights or adapters.
- Separate training outputs from later serving. Hand the exported model or adapter off to a serving workflow instead of mixing training and HTTP serving in one artifact.
- If the task is really about serving the tuned model behind HTTP, OpenAI-compatible APIs, vLLM, or SGLang, stop and use modal-llm-serving.
- If the task is really about detached job orchestration, queueing, .map, .spawn, or @modal.batched, stop and use modal-batch-processing.
- If the task is really about sandbox lifecycle, tunnels, Sandbox.exec(...), or secure runtime code execution infrastructure, stop and use modal-sandbox.
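Two of the rules above, an obvious artifact layout and resuming from checkpoints, can be sketched with the standard library alone. The directory names and both function names below are hypothetical, and the checkpoint-<step> naming is an assumption borrowed from the Hugging Face Trainer convention; adjust both to whatever the chosen trainer actually writes. The volume_root argument stands for the path where a Modal Volume is mounted inside the container.

```python
import re
from pathlib import Path
from typing import Optional

# Assumes checkpoints are saved as directories named checkpoint-<step>.
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def artifact_layout(volume_root: str, run_name: str) -> dict:
    """Return one possible layout under a mounted Volume (names are illustrative)."""
    root = Path(volume_root)
    run_dir = root / "runs" / run_name
    return {
        "base_models": root / "base-models",  # cached pretrained weights
        "datasets": root / "datasets",        # staged training data
        "checkpoints": run_dir / "checkpoints",
        "exports": run_dir / "exports",       # final weights or adapters
    }


def latest_checkpoint(checkpoint_dir: Path) -> Optional[Path]:
    """Return the checkpoint-<step> directory with the highest step, or None."""
    if not checkpoint_dir.is_dir():
        return None
    best_step, best_path = -1, None
    for child in checkpoint_dir.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), child
    return best_path
```

A training function could call latest_checkpoint at startup and pass the result to its trainer's resume option, starting from scratch when it returns None. Retries are only safe to enable once that path has been exercised in the smoke test.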
Validate
- Run npx skills add . --list after editing the package metadata or skill descriptions.
- Keep evals/evals.json and evals/trigger-evals.json aligned with the actual workflow boundaries of the skill.
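A cheap pre-check before the validation steps above is to confirm that both eval files exist and parse as JSON. The sketch below assumes nothing about their schema, which this skill does not describe; the function name check_eval_files is hypothetical.

```python
import json
from pathlib import Path

EVAL_FILES = ("evals/evals.json", "evals/trigger-evals.json")


def check_eval_files(skill_root: str) -> list:
    """Return human-readable problems; an empty list means both files parsed."""
    problems = []
    for relative in EVAL_FILES:
        path = Path(skill_root) / relative
        if not path.is_file():
            problems.append(f"missing: {relative}")
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"invalid JSON in {relative}: {exc}")
    return problems
```

This only catches missing or malformed files. Whether the eval cases still match the workflow boundaries has to be reviewed by hand.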
References
- Read references/training-playbook.md first for shared storage, GPU, secret, and resume rules.
- Read references/llm-sft-and-lora.md for single-node LLM SFT, LoRA, and QLoRA guidance grounded in the Unsloth example.
- Read references/vision-and-diffusion-finetuning.md for YOLO and Diffusers LoRA training flows.
- Read references/rl-post-training.md for GRPO with TRL or verl.
- Read references/example-patterns.md for compact adaptation templates.
- Read references/troubleshooting.md for common training failures and recovery paths.