whisper-voice
Whisper Voice — Live Speech-to-Text Mac App
Goal
Build and run a native macOS menu bar app that captures live microphone audio, transcribes it offline using WhisperKit (on Apple Silicon), and auto-types the text wherever the cursor is.
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_size | string | No | Whisper model: tiny, base (default), small |
| language | string | No | "en" (default) or "hi" for Hindi mode |
| chunk_duration | float | No | Seconds per audio chunk (default: 3.0) |
Process
1. Build the app
cd AiwithDhruv_Voice/WhisperAiwithDhruv
swift build
2. Run the app
swift run WhisperAiwithDhruv
# Or open in Xcode: open Package.swift → Cmd+R
3. First launch setup
- Grant microphone permission when prompted
- Grant Accessibility in System Settings → Privacy → Accessibility
- Wait for model download (~140MB for base model)
4. Usage
- Cmd+Shift+Space — Toggle recording on/off
- Click mic icon in menu bar for controls
- Speak — text auto-types at cursor position
- Toggle Hindi mode for Hindi/Hinglish input
Outputs
| Name | Type | Description |
|---|---|---|
| transcribed_text | string | Live transcribed text typed at cursor |
| history | array | Last 50 transcription entries in menu bar |
Edge Cases
- No mic: Shows error in menu bar dropdown
- Accessibility denied: Auto-type disabled, manual copy from history
- Silence: VAD skips silent chunks (energy-based threshold)
- Hallucinations: Filters common Whisper artifacts ("Thank you.", "...")
- Model not downloaded: Shows download progress bar
Environment
- macOS 14+ (Sonoma)
- Apple Silicon (M1/M2/M3/M4)
- Xcode 15+ (for building)
- No API keys needed (fully offline)
Schema
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_size | string | No | tiny / base / small |
| language | string | No | en / hi |
| chunk_duration | float | No | 2.0 - 8.0 seconds |
| silence_threshold | float | No | 0.002 - 0.05 |
Outputs
| Name | Type | Description |
|---|---|---|
| transcription | string | Live text output |
| auto_typed | boolean | Whether text was injected at cursor |
Credentials
| Name | Source |
|---|---|
| None | Fully offline, no API keys |
Composable With
video-edit (add transcription captions), send-telegram (send transcriptions to phone)
Cost
Free — runs entirely on-device. Model download is one-time (~140MB for base).
More from aiagentwithdhruv/skills
image-to-video
Generate AI video from static images using Kling 3.0, Hailuo, Luma Ray3, Runway Gen-4.5, and 8 other tools. Covers free vs paid tools, prompt writing (motion-only), camera control, and face stability. Use when user asks to animate an image, create AI video, or convert photo to video.
91mac-control
MCP server for AI-powered macOS control — apps, display, audio, files, screenshots, clipboard
60gmaps-leads
Scrape Google Maps for B2B leads with deep website enrichment and contact extraction. Use when user asks to find local businesses, scrape Google Maps, generate contractor lists, or build local service business databases.
42excalidraw-visuals
Use when someone asks for a hand-drawn visual, PNG image, rendered diagram, visual explanation, or says "excalidraw image" or "excalidraw visual". This generates PNG images, not editable files.
34video-edit
Complete video editing toolkit - silence removal, auto-captions, vertical crop, YouTube clipping, 3D transitions, and social media compression. Use when user asks to edit video, remove silences, add captions/subtitles, crop to vertical/shorts, download YouTube clips, compress video, or create video teasers.
29design-website
Generate a premium mockup website for a prospect using the buildinamsterdam.com template style. Use when user asks to design a website, create a mockup, or build a prospect website.
27