qwen-tts
Qwen TTS Skill
Text-to-speech using Qwen3-TTS CustomVoice model, running locally on Apple Silicon via MLX.
Overview
- Model:
mlx-community/Qwen3-TTS-12Hz-0.6B-CustomVoice-4bit(4-bit quantized, ~600MB) - Runtime: MLX (Apple Silicon GPU acceleration)
- Speakers: 9 built-in voices
- Output: WAV audio files
- Auto-cleanup: Files older than 24 hours are removed automatically
First-time Deployment
Prerequisites
- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.10+
1. Create virtual environment
cd /path/to/skills/skills/qwen-tts
python3 -m venv venv
source venv/bin/activate
2. Install dependencies
模型从 ModelScope 镜像下载(国内更快):
pip install -r scripts/requirements.txt
3. Pre-download model (optional)
首次运行时会自动下载模型。如需提前下载:
source venv/bin/activate
export HF_ENDPOINT="https://hf-mirror.com"
python3 -c "from mlx_audio.tts.utils import load_model; load_model('mlx-community/Qwen3-TTS-12Hz-0.6B-CustomVoice-4bit')"
4. Verify
source venv/bin/activate
python3 scripts/tts.py "你好,这是一段测试语音。" --output /tmp
# Should produce a WAV file in /tmp/
Troubleshooting
| 问题 | 解决方法 |
|---|---|
ModuleNotFoundError: mlx |
确认使用 Apple Silicon Mac,MLX 不支持 Intel Mac |
| 模型下载缓慢 | 设置 export HF_ENDPOINT="https://hf-mirror.com" |
| 内存不足 | 4-bit 模型约需 1.5GB 内存,关闭其他大型应用 |
| 无声音输出 | 检查输出文件是否为 0 字节,可能是文本过短 |
Configuration
⚠️ 以下为示例默认值。请根据实际使用场景修改 speaker 和 instruct。
- Speaker:
Serena(示例) - Instruct:
撒娇语气(示例) - Language:
Chinese - Speed:
1.0
Available Speakers
| Speaker | Language |
|---|---|
Serena |
Chinese |
Vivian |
Chinese |
Uncle_Fu |
Chinese |
Eric |
Chinese |
Dylan |
Chinese |
Ryan |
English |
Aiden |
English |
Ono_Anna |
Japanese |
Sohee |
Korean |
Available Instructs (emotion/style)
撒娇语气— coquettish冷静分析— calm analysis惊讶— surprised兴奋— excited神秘— mysterious开心— happy委屈— wronged/sad
Also supports free-form natural language instructions, e.g. 用特别愤怒的语气说.
Usage
# Default settings
python3 scripts/tts.py "你好!"
# Custom speaker
python3 scripts/tts.py "Hello!" --speaker Ryan --language English
# Custom emotion
python3 scripts/tts.py "其实我真的有发现..." --instruct 冷静分析
# Full customization
python3 scripts/tts.py "哥哥,你回来啦!" \
--speaker Serena \
--instruct 撒娇语气 \
--speed 1.0
# Custom output directory
python3 scripts/tts.py "测试" --output /tmp
# Skip auto-cleanup of old files
python3 scripts/tts.py "测试" --no-cleanup
Audio Output
- Default directory:
~/tts-output/(override with$QWEN_TTS_OUTPUT_DIR) - File naming:
tts_{timestamp}_{index}.wav - Auto-cleanup: Files older than 24 hours removed on each run (disable with
--no-cleanup)
Integration
- Generate audio using the TTS script
- Send the audio file as a voice message (Telegram, etc.)
- Old files are cleaned up automatically
More from stvlynn/skills
xiaohongshu
小红书搜索、发布、获取帖子详情。使用本地 MCP 服务器访问小红书内容,需要先登录。适用于搜索旅游攻略、美食推荐、获取帖子详情等场景。
31create-sticker
Generate LINE-style stickers of a character with background removal. Creates creative, unique poses with consistent character design via Google Gemini, removes solid background to transparent PNG. Use when user asks for sticker, 贴纸, LINE sticker.
25pv-tool
Operate the bundled PV Tool kinetic typography web app for lyric videos, promotional videos, and motion-graphics text overlays. Use this whenever the user mentions PV Tool, wants to run or preview the app locally, build it for distribution, experiment with PV templates and effects, or work with lyric text, media backgrounds, audio-reactive overlays, or LRC input inside PV Tool.
1tip-gui-skill
Reuse local Youtu-Tip GUI capabilities through a safe adapter CLI so OpenClaw/Codex-style agents can inspect desktop GUI state and perform guarded single-step actions on macOS.
1