qwen-asr

Qwen ASR Skill

Speech-to-text using the Qwen3-ASR model, running locally on Apple Silicon behind a FastAPI service.

Overview

  • Model: mlx-community/Qwen3-ASR-0.6B-4bit (4-bit quantized, ~400MB)
  • Runtime: MLX (Apple Silicon GPU acceleration via Metal)
  • Service: FastAPI on http://localhost:8100
  • Languages: Chinese, English, Japanese, Korean, and more

First-time Deployment

Prerequisites

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • Python 3.10+
  • Docker is not required (runs natively)

1. Create virtual environment

cd /path/to/skills/skills/qwen-asr
python3 -m venv venv
source venv/bin/activate

2. Install dependencies

Install the Python dependencies. The model itself is downloaded from a ModelScope/HuggingFace mirror (faster from mainland China):

pip install -r service/requirements.txt

3. Start the service

bash service/start.sh

On first start, the model (about 400MB) is downloaded automatically from hf-mirror.com; later starts use the local cache.

4. Verify

# Check service health
curl http://localhost:8100/health

# Show model information
curl http://localhost:8100/info

# Test transcription (using an online audio sample)
curl -X POST "http://localhost:8100/transcribe_url?audio_url=https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-ASR-Repo/asr_en.wav"
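
Because the model takes a while to load on first start, a script that launches the service may want to wait before sending audio. The following is a minimal sketch using only the standard library. The helper names `probe_health` and `wait_until_ready` are hypothetical, and it assumes that `/health` answers with HTTP 200 once the service is usable; whether that endpoint also reflects model loading is not stated here.

```python
import time
import urllib.error
import urllib.request


def probe_health(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200 (assumed readiness signal)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status such as 503.
        return False


def wait_until_ready(probe, attempts: int = 30, delay: float = 1.0, sleep=time.sleep) -> bool:
    """Call probe() up to `attempts` times, sleeping `delay` seconds between failed tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Example usage (performs real HTTP requests, so it is left commented out):
# ready = wait_until_ready(lambda: probe_health("http://localhost:8100"))
# print("service ready" if ready else "service did not become ready")
```

The probe is passed in as a callable so the retry loop can be exercised without a running service.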

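5. (Optional) Run as a system service

On macOS, a launchd agent can start the service automatically at login:

# Create the plist (adjust the path to match your checkout)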
cat > ~/Library/LaunchAgents/com.qwen.asr.plist << 'PLIST'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.qwen.asr</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>-c</string>
        <string>cd /path/to/skills/skills/qwen-asr && bash service/start.sh</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>StandardOutPath</key>
    <string>/tmp/qwen-asr.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/qwen-asr.err</string>
</dict>
</plist>
PLIST

launchctl load ~/Library/LaunchAgents/com.qwen.asr.plist
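
Since the path inside the plist has to be edited by hand, one alternative is to generate the file. This is a sketch using the standard-library `plistlib`; the function name `build_launchd_plist` and its parameters are hypothetical, and the keys simply mirror the hand-written plist above. Note that `plistlib` sorts keys alphabetically by default, so the key order differs from the hand-written file.

```python
import plistlib
import shlex


def build_launchd_plist(skill_dir: str, label: str = "com.qwen.asr") -> bytes:
    """Return XML plist bytes with the same keys as the hand-written agent definition."""
    job = {
        "Label": label,
        # shlex.quote keeps the shell command valid if the path contains spaces.
        "ProgramArguments": [
            "/bin/bash",
            "-c",
            f"cd {shlex.quote(skill_dir)} && bash service/start.sh",
        ],
        "RunAtLoad": True,
        "KeepAlive": True,
        "StandardOutPath": "/tmp/qwen-asr.log",
        "StandardErrorPath": "/tmp/qwen-asr.err",
    }
    return plistlib.dumps(job, fmt=plistlib.FMT_XML)


# Example usage (writes into the user's LaunchAgents directory, so it is commented out):
# from pathlib import Path
# target = Path("~/Library/LaunchAgents/com.qwen.asr.plist").expanduser()
# target.write_bytes(build_launchd_plist("/path/to/skills/skills/qwen-asr"))
```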

Troubleshooting

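| Problem | Solution |
| --- | --- |
| Port 8100 already in use | Run lsof -i :8100 to find the process holding the port, or change the port number in start.sh |
| Model download is slow | The start script already sets HF_ENDPOINT=https://hf-mirror.com |
| 503 Model not loaded | The model is still loading; the first load takes roughly 10-30 seconds |
| ModuleNotFoundError: mlx | Confirm you are running on an Apple Silicon Mac |
| Empty transcription result | Check the audio format (wav/mp3/ogg/flac are supported); the clip may also be too short |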
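
For the port conflict case, the check can also be done from Python before starting the service. This is a small sketch with a hypothetical helper, `port_in_use`; it only tells you that something is listening on the port, not which process it is, so `lsof` is still needed to identify it.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


# Example usage:
# if port_in_use(8100):
#     print("port 8100 is taken; stop the other process or change the port in start.sh")
```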

Service Endpoints

| Endpoint | Method | Description |
| --- | --- | --- |
| /health | GET | Health check |
| /info | GET | Model information |
| /transcribe | POST | Transcribe uploaded audio file |
| /transcribe_url | POST | Transcribe audio from URL |

Usage

CLI Client

# Transcribe local file
python3 scripts/asr.py audio.wav

# Transcribe from URL
python3 scripts/asr.py --url "https://example.com/audio.wav"

# Specify language
python3 scripts/asr.py audio.wav --lang zh

# Check service status
python3 scripts/asr.py --check

curl

# Upload file
curl -X POST "http://localhost:8100/transcribe" \
  -F "file=@audio.wav"

# From URL with language
curl -X POST "http://localhost:8100/transcribe_url?audio_url=<URL>&language=zh"
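
The curl call above passes the audio URL as a query parameter. If that URL contains its own `?`, `&`, or spaces, it needs to be percent-encoded first, or the service may read a truncated value. Below is a sketch of building the request URL safely with the standard library. `build_transcribe_url` is a hypothetical helper; the parameter names `audio_url` and `language` are the ones used in the examples here.

```python
from urllib.parse import urlencode


def build_transcribe_url(audio_url, language=None, base_url="http://localhost:8100"):
    """Return a /transcribe_url request URL with percent-encoded query parameters."""
    params = {"audio_url": audio_url}
    if language:
        params["language"] = language
    # urlencode escapes reserved characters such as ':', '/', '?', '&' and '='.
    return f"{base_url}/transcribe_url?{urlencode(params)}"


# Example usage:
# print(build_transcribe_url("https://example.com/a.wav?token=1&x=2", language="zh"))
```

The resulting string can be passed to `curl -X POST` in place of the hand-assembled URL.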

Python

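import requests

# From URL (requests percent-encodes the query parameters)
response = requests.post(
    "http://localhost:8100/transcribe_url",
    params={"audio_url": "https://example.com/audio.wav", "language": "zh"},
)
response.raise_for_status()  # fail early on errors such as 503 while the model loads
text = response.json()["text"]

# From file (sent as multipart/form-data under the "file" field)
with open("audio.wav", "rb") as f:
    response = requests.post("http://localhost:8100/transcribe", files={"file": f})
response.raise_for_status()
text = response.json()["text"]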

Output Format

{
  "text": "transcribed text here",
  "chunks": [],
  "processing_time": 0.123
}
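
Callers that want typed access to the response can wrap it in a small dataclass. This is a sketch only: the class name `Transcription` is hypothetical, the structure of individual `chunks` entries is not documented here and is left as a plain list, and `processing_time` is assumed to be a number (its unit is not stated).

```python
import json
from dataclasses import dataclass, field


@dataclass
class Transcription:
    text: str
    chunks: list = field(default_factory=list)  # entry structure is not documented; kept as-is
    processing_time: float = 0.0  # assumed numeric; unit not stated in the source

    @classmethod
    def from_json(cls, payload: str) -> "Transcription":
        """Parse a JSON response body, tolerating missing optional fields."""
        data = json.loads(payload)
        return cls(
            text=data["text"],
            chunks=data.get("chunks", []),
            processing_time=float(data.get("processing_time", 0.0)),
        )


# Example usage:
# result = Transcription.from_json(response.text)
# print(result.text)
```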

Service Management

# Start
bash service/start.sh

# Check status
python3 scripts/asr.py --check

# Stop (find and kill process)
lsof -ti :8100 | xargs kill

Repository: stvlynn/skills