iOS Machine Learning Router

You MUST use this skill for ANY on-device machine learning or speech-to-text work.

When to Use

Use this router when:

Converting PyTorch/TensorFlow models to CoreML
Deploying ML models on-device
Compressing models (quantization, palettization, pruning)
Working with large language models (LLMs)
Implementing KV-cache for transformers
Using MLTensor for model stitching
Building speech-to-text features
Transcribing audio (live or recorded)

Boundary with ios-ai and ios-vision

ios-ml vs. ios-ai vs. ios-vision: know the difference.

Developer Intent	Router
"Use Apple Intelligence / Foundation Models"	ios-ai — Apple's on-device LLM
"Run my own ML model on device"	ios-ml — CoreML conversion + deployment
"Add text generation with @Generable"	ios-ai — Foundation Models structured output
"Deploy a custom LLM with KV-cache"	ios-ml — Custom model optimization
"Use Vision framework for image analysis"	ios-vision — Not ML deployment
"Use pre-trained Apple NLP models"	ios-ai — Apple's models, not custom

Rule of thumb: If the developer is converting/compressing/deploying their own model → ios-ml. If they're using Apple's built-in AI → ios-ai. If they're doing computer vision → ios-vision.
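The rule of thumb above can be read as a lookup from developer intent to router. The sketch below is illustrative only: the `Intent` enum and `pick_router` function are hypothetical names, not part of any router implementation.

```python
from enum import Enum


class Intent(Enum):
    """Hypothetical intent categories taken from the rule of thumb."""
    OWN_MODEL = "own_model"  # converting, compressing, or deploying a custom model
    APPLE_AI = "apple_ai"    # Apple Intelligence, Foundation Models, Apple NLP models
    VISION = "vision"        # Vision framework image analysis


# Router names are the ones used in the boundary table.
ROUTER_FOR_INTENT = {
    Intent.OWN_MODEL: "ios-ml",
    Intent.APPLE_AI: "ios-ai",
    Intent.VISION: "ios-vision",
}


def pick_router(intent: Intent) -> str:
    """Return the router that owns the given developer intent."""
    return ROUTER_FOR_INTENT[intent]
```

For instance, "Deploy a custom LLM with KV-cache" is an `Intent.OWN_MODEL` request, so `pick_router` returns `"ios-ml"`.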

Routing Logic

CoreML Work

Implementation patterns → /skill coreml

Model conversion workflow
MLTensor for model stitching
Stateful models with KV-cache
Multi-function models (adapters/LoRA)
Async prediction patterns
Compute unit selection

API reference → /skill coreml-ref

CoreML Tools Python API
MLModel lifecycle
MLTensor operations
MLComputeDevice availability
State management APIs
Performance reports

Diagnostics → /skill coreml-diag

Model won't load
Slow inference
Memory issues
Compression accuracy loss
Compute unit problems

Speech Work

Implementation patterns → /skill speech

SpeechAnalyzer setup (iOS 26+)
SpeechTranscriber configuration
Live transcription
File transcription
Volatile vs finalized results
Model asset management

Decision Tree

Implementing / converting ML models? → coreml
CoreML API reference? → coreml-ref
Debugging ML issues (load, inference, compression)? → coreml-diag
Speech-to-text / transcription? → speech
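As a rough illustration, the decision tree can be sketched as an ordered list of keyword rules where the first match wins. The keyword lists, the `RULES` table, and the `route` function are assumptions made for this example. The rule order is also a choice of the sketch: the more specific skills (speech, diagnostics, reference) are checked before the general coreml skill so that a request such as "My compressed model has bad accuracy" lands on coreml-diag rather than coreml.

```python
from typing import Optional

# Ordered rules: (skill name, substrings that trigger it). First match wins.
# The keyword choices are illustrative assumptions, not an official mapping.
RULES = [
    ("speech", ("speech", "transcri")),
    ("coreml-diag", ("won't load", "slow", "memory", "accuracy", "debug")),
    ("coreml-ref", ("api reference", "what's", "what is")),
    ("coreml", ("convert", "compress", "kv-cache", "deploy", "quantiz",
                "palettiz", "prun", "mltensor", "coreml")),
]


def route(request: str) -> Optional[str]:
    """Return the skill for a request, or None if no rule matches."""
    text = request.lower()
    for skill, keywords in RULES:
        if any(keyword in text for keyword in keywords):
            return skill
    return None
```

A request that matches no rule returns `None`, which a caller could treat as "outside this router".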

Anti-Rationalization

Thought	Reality
"CoreML is just load and predict"	CoreML also involves compression, stateful models, compute unit selection, and async prediction. The coreml skill covers all of these.
"My model is small, no optimization needed"	Even small models benefit from compute unit selection and async prediction. The coreml skill has the patterns.
"I'll just use SFSpeechRecognizer"	iOS 26 introduces SpeechAnalyzer, with better accuracy and offline support. The speech skill covers this modern API.

Critical Patterns

coreml:

Model conversion (PyTorch → CoreML)
Compression (palettization, quantization, pruning)
Stateful KV-cache for LLMs
Multi-function models for adapters
MLTensor for pipeline stitching
Async concurrent prediction

coreml-diag:

Load failures and caching
Inference performance issues
Memory pressure from models
Accuracy degradation from compression

speech:

SpeechAnalyzer + SpeechTranscriber setup
AssetInventory model management
Live transcription with volatile results
Audio format conversion

Example Invocations

User: "How do I convert a PyTorch model to CoreML?" → Invoke: /skill coreml

User: "Compress my model to fit on iPhone" → Invoke: /skill coreml

User: "Implement KV-cache for my language model" → Invoke: /skill coreml

User: "Model loads slowly on first launch" → Invoke: /skill coreml-diag

User: "My compressed model has bad accuracy" → Invoke: /skill coreml-diag

User: "Add live transcription to my app" → Invoke: /skill speech

User: "Transcribe audio files with SpeechAnalyzer" → Invoke: /skill speech

User: "What's MLTensor and how do I use it?" → Invoke: /skill coreml-ref
