parakeet
Parakeet Dictation Skill
Local speech-to-text powered by NVIDIA Parakeet TDT 0.6B V3 (~600MB model, 100% offline).
Two Modes
1. Handy App (Primary — Push-to-Talk into Any Text Field)
Handy is a free, open-source Tauri app (Rust + React) that provides push-to-talk dictation with Parakeet V3 built in. Inference runs through transcribe-rs (ONNX Runtime, int8 quantized).
brew install --cask handy
- Default hotkey: ⌥Space (Option-Space) on macOS, Ctrl-Space on Windows/Linux
- Modes: Push-to-talk (hold) or toggle (press to start/stop)
- Select Parakeet V3 in Settings → Models (auto-downloads ~478MB)
- Grant microphone + accessibility permissions
- Includes VAD (Silero), model management UI
- Additional models: Whisper (Small/Medium/Turbo/Large), Moonshine, SenseVoice
- Models stored at ~/Library/Application Support/com.pais.handy/models/
2. CLI Scripts (Claude Code File Transcription & Terminal Dictation)
CLI scripts remain for headless/terminal use within Claude Code. These use NeMo/PyTorch.
Performance
| System | Speed | Engine |
|---|---|---|
| Handy (M4 Max) | ~30x realtime | transcribe-rs / ONNX int8 |
| Handy (Zen 3) | ~20x realtime | transcribe-rs / ONNX int8 |
| Handy (Skylake i5) | ~5x realtime | transcribe-rs / ONNX int8 |
| NeMo CLI (MPS) | Varies | NeMo / PyTorch |
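To read the Speed column: an "Nx realtime" factor means roughly N seconds of audio are transcribed per second of compute. The sketch below only illustrates that arithmetic with the approximate figures from the table; the function name is hypothetical and is not part of Handy or the CLI scripts.

```python
def estimated_transcription_seconds(audio_seconds: float, realtime_factor: float) -> float:
    """Estimate wall-clock transcription time from a realtime factor."""
    if realtime_factor <= 0:
        raise ValueError("realtime_factor must be positive")
    return audio_seconds / realtime_factor


# A 60-minute recording at the approximate speeds listed in the table.
one_hour = 60 * 60
for system, factor in [("M4 Max", 30), ("Zen 3", 20), ("Skylake i5", 5)]:
    seconds = estimated_transcription_seconds(one_hour, factor)
    print(f"{system}: ~{seconds / 60:.0f} min")
```

Under these assumptions, an hour of audio takes about 2, 3, and 12 minutes respectively. Real timings depend on audio content and system load.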
- Accuracy: 6.05% WER (Word Error Rate)
- Languages: 25 European languages with automatic detection (no prompting)
- Privacy: 100% local processing, no cloud API
- License: CC BY 4.0 (model), MIT (Handy app)
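For context on the accuracy figure, WER is conventionally defined as (substitutions + deletions + insertions) / number of reference words, computed from a word-level edit distance. The sketch below implements that standard definition only. How the 6.05% figure was measured (datasets, text normalization) is not stated here, so the helper name and whitespace tokenization are assumptions for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word dropped)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the quick brown fox", "the quick brown dog")` gives 0.25: one substitution across four reference words.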
Commands
Transcribe Audio File
/parakeet path/to/audio.wav
/parakeet ~/recordings/interview.mp3
/parakeet meeting.m4a
Supported formats: .wav, .mp3, .m4a, .flac, .ogg, .aac
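A caller may want to reject unsupported files before invoking the transcription script. The sketch below checks a path against the extensions listed above. Whether transcribe.py performs a similar check is not stated, and the case-insensitive match is an assumption.

```python
from pathlib import Path

# Extensions copied from the supported-formats list above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".ogg", ".aac"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is in the supported list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

With this helper, `is_supported_audio("meeting.m4a")` is true and `is_supported_audio("notes.txt")` is false.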
Live Dictation (Terminal)
/parakeet
/parakeet dictate
Record from microphone until Enter is pressed, then transcribe.
Check Installation
/parakeet check
Verify that Parakeet is properly installed and that the model can load.
Setup
Handy (Push-to-Talk UI)
brew install --cask handy
Launch from Applications, select Parakeet V3 model, configure hotkey.
CLI Scripts (Prerequisites)
- Parakeet Dictate repo at ~/Programming/parakeet-dictate/ with Python venv
- Install dependencies:
  cd ~/Programming/parakeet-dictate
  uv venv && uv pip install -r requirements.txt
- (Optional) Set custom path:
  export PARAKEET_HOME=/path/to/parakeet-dictate
Implementation
When this skill is invoked:
- For audio files: Run the transcription script
  cd ~/.claude/skills/parakeet/scripts && \
    ${PARAKEET_HOME:-~/Programming/parakeet-dictate}/.venv/bin/python transcribe.py "<filepath>"
- For live dictation: Run the dictation script
  cd ~/.claude/skills/parakeet/scripts && \
    ${PARAKEET_HOME:-~/Programming/parakeet-dictate}/.venv/bin/python dictate.py
- For checking setup: Run the check script
  cd ~/.claude/skills/parakeet/scripts && \
    ${PARAKEET_HOME:-~/Programming/parakeet-dictate}/.venv/bin/python check_setup.py
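The dispatch described above can be sketched as a small function that maps the /parakeet argument to a script invocation. This is an illustration under stated assumptions, not the skill's actual implementation: `build_command` is a hypothetical helper, and treating any argument other than `dictate` or `check` as a file path is inferred from the Commands section.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

SCRIPTS_DIR = "~/.claude/skills/parakeet/scripts"
DEFAULT_HOME = "~/Programming/parakeet-dictate"


def build_command(argument: str = "",
                  env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Map a /parakeet argument to the script command to run (hypothetical)."""
    env = os.environ if env is None else env
    # Mirror ${PARAKEET_HOME:-default}: use the default when unset or empty.
    home = Path(env.get("PARAKEET_HOME") or DEFAULT_HOME).expanduser()
    python = str(home / ".venv" / "bin" / "python")

    arg = argument.strip()
    if arg in ("", "dictate"):
        return [python, "dictate.py"]
    if arg == "check":
        return [python, "check_setup.py"]
    # Assumption: anything else is a path to an audio file.
    return [python, "transcribe.py", str(Path(arg).expanduser())]


# Possible usage (not executed here):
#   import subprocess
#   subprocess.run(build_command("meeting.m4a"),
#                  cwd=os.path.expanduser(SCRIPTS_DIR), check=True)
```

Passing the command as a list avoids shell quoting problems with file paths that contain spaces.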
Model Caches
| System | Cache Location | Size | Engine |
|---|---|---|---|
| Handy | ~/Library/Application Support/com.pais.handy/models/ | ~478MB | transcribe-rs (ONNX int8) |
| NeMo CLI | ~/.cache/nemo/ | ~1.2GB | NeMo / PyTorch |
Model caches are separate. Handy's Parakeet V3 int8 model structure:
parakeet-tdt-0.6b-v3-int8/
├── encoder-model.int8.onnx
├── decoder_joint-model.int8.onnx
├── nemo128.onnx (audio preprocessor)
└── vocab.txt
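If Handy reports a missing or corrupt model, one quick sanity check is to confirm that the four files listed above are present. The sketch below does only that. The helper name is hypothetical, and it does not validate file contents or sizes.

```python
from pathlib import Path
from typing import List

# File names taken from the directory listing above.
EXPECTED_FILES = (
    "encoder-model.int8.onnx",
    "decoder_joint-model.int8.onnx",
    "nemo128.onnx",
    "vocab.txt",
)


def missing_model_files(model_dir: str) -> List[str]:
    """Return the expected model files that are absent from model_dir."""
    root = Path(model_dir).expanduser()
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means every expected file exists. Point `model_dir` at the `parakeet-tdt-0.6b-v3-int8` folder inside Handy's models directory.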
Troubleshooting
"No module named nemo"
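This error means the script ran under a Python interpreter without NeMo installed. Run it with the Parakeet virtual environment's interpreter ($PARAKEET_HOME/.venv/bin/python); the skill's commands use it automatically.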
"MPS not available"
Metal (MPS) acceleration on Apple Silicon requires PyTorch 2.0+. When MPS is unavailable, the scripts fall back to CPU automatically.
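A fallback of this kind is commonly written as below. This is a sketch of the general pattern, not the scripts' actual code: `pick_device` is a hypothetical name, and the preference order (CUDA, then MPS, then CPU) is an assumption.

```python
def pick_device() -> str:
    """Pick the best available PyTorch device name, falling back to CPU."""
    try:
        import torch
    except ImportError:
        # No PyTorch in this interpreter; nothing to accelerate.
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is missing on older PyTorch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

Running `print(pick_device())` inside the Parakeet virtual environment shows which backend would be selected under these assumptions.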
"Permission denied: microphone"
Grant microphone access in System Settings → Privacy & Security → Microphone (on macOS 12 and earlier: System Preferences → Security & Privacy → Privacy → Microphone).
Model download slow
The Parakeet model downloads on first use (~478MB for Handy, ~1.2GB for NeMo). Subsequent runs use cache.
Configuration
| Variable | Default | Description |
|---|---|---|
| PARAKEET_HOME | ~/Programming/parakeet-dictate | Parakeet Dictate installation path |
Dependencies
Handy: brew install --cask handy (standalone, no other deps)
CLI scripts require:
- Parakeet Dictate repo at $PARAKEET_HOME (default: ~/Programming/parakeet-dictate)
- Python virtual environment at $PARAKEET_HOME/.venv
- NeMo toolkit with ASR support (nemo_toolkit[asr]>=2.0.0)
- PyTorch 2.0+ (for MPS/CUDA acceleration)
- soundfile and sounddevice for audio handling