whisper-voice

Installation

SKILL.md

Whisper Voice — Live Speech-to-Text Mac App

Goal

Build and run a native macOS menu bar app that captures live microphone audio, transcribes it offline using WhisperKit (on Apple Silicon), and auto-types the text wherever the cursor is.

Inputs

Name	Type	Required	Description
model_size	string	No	Whisper model: tiny, base (default), small
language	string	No	"en" (default) or "hi" for Hindi mode
chunk_duration	float	No	Seconds per audio chunk (default: 3.0)

Process

1. Build the app

cd AiwithDhruv_Voice/WhisperAiwithDhruv
swift build

2. Run the app

swift run WhisperAiwithDhruv
# Or open in Xcode: open Package.swift → Cmd+R

3. First launch setup

Grant microphone permission when prompted
Grant Accessibility in System Settings → Privacy → Accessibility
Wait for model download (~140MB for base model)

4. Usage

Cmd+Shift+Space — Toggle recording on/off
Click mic icon in menu bar for controls
Speak — text auto-types at cursor position
Toggle Hindi mode for Hindi/Hinglish input

Outputs

Name	Type	Description
transcribed_text	string	Live transcribed text typed at cursor
history	array	Last 50 transcription entries in menu bar

Edge Cases

No mic: Shows error in menu bar dropdown
Accessibility denied: Auto-type disabled, manual copy from history
Silence: VAD skips silent chunks (energy-based threshold)
Hallucinations: Filters common Whisper artifacts ("Thank you.", "...")
Model not downloaded: Shows download progress bar

Environment

macOS 14+ (Sonoma)
Apple Silicon (M1/M2/M3/M4)
Xcode 15+ (for building)
No API keys needed (fully offline)

Schema

Inputs

Name	Type	Required	Description
model_size	string	No	tiny / base / small
language	string	No	en / hi
chunk_duration	float	No	2.0 - 8.0 seconds
silence_threshold	float	No	0.002 - 0.05

Outputs

Name	Type	Description
transcription	string	Live text output
auto_typed	boolean	Whether text was injected at cursor

Credentials

Name	Source
None	Fully offline, no API keys

Composable With

video-edit (add transcription captions), send-telegram (send transcriptions to phone)

Cost

Free — runs entirely on-device. Model download is one-time (~140MB for base).

Related skills

More from aiagentwithdhruv/skills

Installs

Repository

aiagentwithdhruv/skills

GitHub Stars

First Seen

Mar 6, 2026

Security Audits

Gen Agent Trust HubWarn

SocketPass

SnykPass

whisper-voice

Whisper Voice — Live Speech-to-Text Mac App

Goal

Inputs

Process

1. Build the app

2. Run the app

3. First launch setup

4. Usage

Outputs

Edge Cases

Environment

Schema

Inputs

Outputs

Credentials

Composable With

Cost

More from aiagentwithdhruv/skills

image-to-video

mac-control

gmaps-leads

excalidraw-visuals

video-edit

design-website