# Use Local Whisper (`use-local-whisper`)
Switches voice transcription from OpenAI's Whisper API to local whisper.cpp. Runs entirely on-device — no API key, no network, no cost.
**Channel support:** Currently WhatsApp only. The transcription module (`src/transcription.ts`) uses Baileys types for audio download. Other channels (Telegram, Discord, etc.) would need their own audio-download logic before this skill can serve them.
> **Note:** The Homebrew package is `whisper-cpp`, but the CLI binary it installs is `whisper-cli`.
## Prerequisites
- `voice-transcription` skill must be applied first (WhatsApp channel)
- macOS with Apple Silicon (M1+) recommended
- `whisper-cpp` installed: `brew install whisper-cpp` (provides the `whisper-cli` binary)
- `ffmpeg` installed: `brew install ffmpeg`
- A GGML model file downloaded to `data/models/`
## Phase 1: Pre-flight
### Check if already applied
Check whether `src/transcription.ts` already uses `whisper-cli`:

```sh
grep 'whisper-cli' src/transcription.ts && echo "Already applied" || echo "Not applied"
```
If already applied, skip to Phase 3 (Verify).
### Check dependencies are installed

```sh
whisper-cli --help >/dev/null 2>&1 && echo "WHISPER_OK" || echo "WHISPER_MISSING"
ffmpeg -version >/dev/null 2>&1 && echo "FFMPEG_OK" || echo "FFMPEG_MISSING"
```
If either is missing, install via Homebrew:

```sh
brew install whisper-cpp ffmpeg
```
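The two checks above can be rolled into one reusable sketch. The `check` helper is illustrative, not part of the repo:

```sh
# Illustrative pre-flight helper: report each dependency's status.
check() {
  command -v "$1" >/dev/null 2>&1 && echo "$1: OK" || echo "$1: MISSING"
}
check whisper-cli
check ffmpeg
```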
### Check for model file

```sh
ls data/models/ggml-*.bin 2>/dev/null || echo "NO_MODEL"
```

If no model exists, download the base model (148MB, good balance of speed and accuracy):

```sh
mkdir -p data/models
curl -L -o data/models/ggml-base.bin "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin"
```
For better accuracy at the cost of speed, use `ggml-small.bin` (466MB) or `ggml-medium.bin` (1.5GB).
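The larger models follow the same URL pattern as the base-model download above, so the fetch command can be derived from a size name. `MODEL` here is an illustrative variable, not something the repo reads:

```sh
# Derive the download command for a given model size (base, small, or medium).
MODEL="${MODEL:-small}"
URL="https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-${MODEL}.bin"
echo "curl -L -o data/models/ggml-${MODEL}.bin \"$URL\""
```

Run the printed command to perform the actual download.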
## Phase 2: Apply Code Changes
### Ensure WhatsApp fork remote

```sh
git remote -v
```

If `whatsapp` is missing, add it:

```sh
git remote add whatsapp https://github.com/qwibitai/nanoclaw-whatsapp.git
```
### Merge the skill branch

```sh
git fetch whatsapp skill/local-whisper
git merge whatsapp/skill/local-whisper || {
  git checkout --theirs package-lock.json
  git add package-lock.json
  git merge --continue
}
```
This modifies `src/transcription.ts` to use the `whisper-cli` binary instead of the OpenAI API.
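The Phase 1 grep doubles as a post-merge verification. A small sketch, with an illustrative `applied` helper that is not part of the repo:

```sh
# Illustrative check: does a file mention a marker string?
applied() {
  grep -q "$2" "$1" 2>/dev/null && echo "APPLIED" || echo "NOT_APPLIED"
}
applied src/transcription.ts whisper-cli
```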
### Validate

```sh
npm run build
```
## Phase 3: Verify
### Ensure launchd PATH includes Homebrew

The NanoClaw launchd service runs with a restricted `PATH`. `whisper-cli` and `ffmpeg` live in `/opt/homebrew/bin/` (Apple Silicon) or `/usr/local/bin/` (Intel), which may not be in the plist's `PATH`.

Check the current `PATH`:

```sh
grep -A1 'PATH' ~/Library/LaunchAgents/com.nanoclaw.plist
```

If `/opt/homebrew/bin` is missing, add it to the `<string>` value inside the `PATH` key in the plist. Then reload:

```sh
launchctl unload ~/Library/LaunchAgents/com.nanoclaw.plist
launchctl load ~/Library/LaunchAgents/com.nanoclaw.plist
```
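For reference, a `PATH` entry in a launchd plist typically sits under an `EnvironmentVariables` dict. A sketch of what the edited section might look like; the exact keys around it depend on your plist:

```xml
<!-- Inside ~/Library/LaunchAgents/com.nanoclaw.plist (sketch) -->
<key>EnvironmentVariables</key>
<dict>
    <key>PATH</key>
    <string>/opt/homebrew/bin:/usr/local/bin:/usr/bin:/bin</string>
</dict>
```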
### Build and restart

```sh
npm run build
launchctl kickstart -k gui/$(id -u)/com.nanoclaw
```
### Test

Send a voice note in any registered group. The agent should receive it as `[Voice: <transcript>]`.
### Check logs

```sh
tail -f logs/nanoclaw.log | grep -i -E "voice|transcri|whisper"
```
Look for:

- `Transcribed voice message` — successful transcription
- `whisper.cpp transcription failed` — check model path, ffmpeg, or PATH
## Configuration

Environment variables (optional, set in `.env`):
| Variable | Default | Description |
|---|---|---|
| `WHISPER_BIN` | `whisper-cli` | Path to whisper.cpp binary |
| `WHISPER_MODEL` | `data/models/ggml-base.bin` | Path to GGML model file |
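In shell terms, the defaults compose like this. The actual resolution happens inside `src/transcription.ts`; the variable handling here is an illustrative sketch:

```sh
# Fall back to the documented defaults when the env vars are unset.
WHISPER_BIN="${WHISPER_BIN:-whisper-cli}"
WHISPER_MODEL="${WHISPER_MODEL:-data/models/ggml-base.bin}"
echo "binary: $WHISPER_BIN, model: $WHISPER_MODEL"
```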
## Troubleshooting

**"whisper.cpp transcription failed":** Ensure both `whisper-cli` and `ffmpeg` are in `PATH`. The launchd service uses a restricted `PATH` — see Phase 3 above. Test manually:

```sh
ffmpeg -f lavfi -i anullsrc=r=16000:cl=mono -t 1 -f wav /tmp/test.wav -y
whisper-cli -m data/models/ggml-base.bin -f /tmp/test.wav --no-timestamps
```
**Transcription works in dev but not as a service:** The launchd plist `PATH` likely doesn't include `/opt/homebrew/bin`. See "Ensure launchd PATH includes Homebrew" in Phase 3.
**Slow transcription:** The base model processes ~30s of audio in under a second on M1+. If slower, check CPU usage — another process may be competing.
**Wrong language:** whisper.cpp auto-detects language. To force a language, you can set `WHISPER_LANG` and modify `src/transcription.ts` to pass `-l $WHISPER_LANG`.
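That change could build the flag conditionally, so an unset variable preserves auto-detection. A sketch; note that `WHISPER_LANG` is not read by the upstream code, so this assumes you wire it up yourself:

```sh
# Build an optional language flag; empty WHISPER_LANG keeps auto-detection.
LANG_FLAG=""
[ -n "${WHISPER_LANG:-}" ] && LANG_FLAG="-l ${WHISPER_LANG}"
echo whisper-cli -m "${WHISPER_MODEL:-data/models/ggml-base.bin}" -f voice.wav $LANG_FLAG --no-timestamps
```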