local-tts
Local TTS Skill
Generate high-quality speech audio locally using Apple Silicon MLX acceleration and the Kokoro-82M model. No API keys or recurring costs.
Quick Start
# Generate MP3 from text
uv run --with mlx-audio --with pydub skills/local-tts/scripts/generate_audio.py \
--text "Hello, this is a test." \
--output ~/Desktop/test.mp3
# Generate from file
uv run --with mlx-audio --with pydub skills/local-tts/scripts/generate_audio.py \
--file /tmp/script.txt \
--voice af_heart \
--output ~/Desktop/podcast.mp3
# List available voices
uv run --with mlx-audio skills/local-tts/scripts/list_voices.py
Parameters
| Parameter | Required | Default | Description |
|---|---|---|---|
| --text | One of text/file | - | Text to convert |
| --file | One of text/file | - | Path to text file |
| --voice | No | af_heart | Voice preset |
| --output | Yes | - | Output file path (.mp3, .wav) |
| --model | No | Kokoro-82M-bf16 | Model to use |
| --list-voices | No | - | Show available voices |
Voice Presets
American English Female (prefix: af_)
- af_heart - Warm, friendly (default)
- af_bella - Soft, calm
- af_nova - Clear, professional
- af_river - Clear, confident
- af_sarah - Soft, expressive
American English Male (prefix: am_)
- am_adam - Clear, professional
- am_echo - Deep, smooth
- am_liam - Articulate, conversational
- am_michael - Soft, measured
British English (prefix: bf_, bm_)
- bf_emma - Clear, refined female
- bm_daniel - Clear, professional male
- bm_george - Distinguished male
See references/voices.md for full list.
Output Format
{
"success": true,
"file": "/Users/hagelk/Desktop/podcast.mp3",
"voice": "af_heart",
"model": "Kokoro-82M-bf16",
"characters": 9824,
"chunks": 20,
"duration_seconds": 612.5,
"generation_time": 45.2
}
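The fields above are enough to derive a realtime factor and an average chunk size. The following is a minimal sketch, assuming the script emits this JSON object as text that the caller can capture (for example from stdout). The `summarize` helper is hypothetical and not part of the skill.

```python
import json

# Example payload copied from the Output Format section above.
RAW_RESULT = """
{
  "success": true,
  "file": "/Users/hagelk/Desktop/podcast.mp3",
  "voice": "af_heart",
  "model": "Kokoro-82M-bf16",
  "characters": 9824,
  "chunks": 20,
  "duration_seconds": 612.5,
  "generation_time": 45.2
}
"""


def summarize(result_json: str) -> dict:
    """Parse a result object and derive a few convenience figures (hypothetical helper)."""
    result = json.loads(result_json)
    if not result.get("success"):
        raise RuntimeError("TTS generation reported failure")
    return {
        "file": result["file"],
        # Seconds of audio produced per second of wall-clock generation time.
        "realtime_factor": result["duration_seconds"] / result["generation_time"],
        # Average number of input characters per generated chunk.
        "chars_per_chunk": result["characters"] / result["chunks"],
    }


summary = summarize(RAW_RESULT)
print(f"{summary['file']}: {summary['realtime_factor']:.1f}x realtime, "
      f"{summary['chars_per_chunk']:.0f} chars per chunk")
```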
Performance
| Hardware | Speed | Notes |
|---|---|---|
| M3 Pro 36GB | ~3-4x realtime | First run slower (model loading) |
| M1/M2 Mac Mini 8GB | ~1.5x realtime | Works well for briefings |
| M1/M2 Mac Mini 16GB | ~2x realtime | Comfortable headroom |
Technical Details
- Model: Kokoro-82M-bf16 (~200MB download on first run)
- Sample rate: 24kHz mono
- Chunking: Text split at ~400 chars per chunk for quality
- Concatenation: Chunks joined seamlessly via pydub
- Formats: MP3, WAV, M4A, OGG
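To illustrate the chunking step, here is a rough sketch of one way to split text into pieces of about 400 characters without cutting sentences in half. This is an assumption about the general approach, not the actual logic in generate_audio.py. The function name `chunk_text` and the sentence-boundary rule are hypothetical.

```python
import re


def chunk_text(text: str, max_chars: int = 400) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut mid-word.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars or not current:
            # Still fits, or this is the first sentence of a new chunk.
            current = candidate
        else:
            # Adding the sentence would exceed the limit: close the chunk.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


sample = "This is a short sentence. " * 40
for i, chunk in enumerate(chunk_text(sample)):
    print(i, len(chunk))
```

Each chunk would then be synthesized separately and the resulting audio segments concatenated, which the skill does with pydub.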
Important Notes
- MUST use --with flags - Do not use PEP 723 inline deps. mlx-audio requires uv's cached environment.
- First run is slower - Model downloads ~200MB and espeak dependencies initialize.
- Model cached at: ~/.cache/huggingface/hub/models--mlx-community--Kokoro-82M-bf16/
Integration with Morning Briefing
The morning-briefing skill uses this for podcast generation:
uv run --with mlx-audio --with pydub skills/local-tts/scripts/generate_audio.py \
--file /tmp/morning_briefing_podcast.txt \
--voice af_heart \
--output ~/Desktop/morning_briefing.mp3
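A calling skill could assemble the same command programmatically. The sketch below only builds the argument list from the flags documented in Parameters. The helper `build_tts_command` is hypothetical, and treating --text and --file as mutually exclusive is an assumption based on the "One of text/file" wording.

```python
import shlex
from pathlib import Path
from typing import Optional

# Script path as used in the commands above, relative to the repository root.
SCRIPT = "skills/local-tts/scripts/generate_audio.py"


def build_tts_command(
    output: str,
    text: Optional[str] = None,
    file: Optional[str] = None,
    voice: str = "af_heart",
) -> list[str]:
    """Build the uv invocation for generate_audio.py (hypothetical helper)."""
    # Assumption: exactly one of text or file must be supplied.
    if (text is None) == (file is None):
        raise ValueError("pass exactly one of text or file")
    # Dependencies are passed with --with flags, as the Important Notes require.
    cmd = ["uv", "run", "--with", "mlx-audio", "--with", "pydub", SCRIPT]
    cmd += ["--text", text] if text is not None else ["--file", file]
    # Expand "~" here because no shell will do it when the list is run directly.
    cmd += ["--voice", voice, "--output", str(Path(output).expanduser())]
    return cmd


command = build_tts_command(
    output="~/Desktop/morning_briefing.mp3",
    file="/tmp/morning_briefing_podcast.txt",
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where uv and the skill are installed.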