Speech-to-Text Transcription

You can transcribe audio files to text using the OpenAI Whisper CLI tool.

Prerequisites

The following must be installed on the system:

openai-whisper (provides the `whisper` command)
ffmpeg (used by Whisper to decode audio files)

Verify the installation with `whisper --help`.
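The prerequisite check can also be scripted. The following is a minimal sketch in POSIX shell; `missing_tools` is a hypothetical helper name introduced here for illustration, not part of Whisper. It only checks that the commands are on `PATH`, not that they work correctly.

```shell
# Report which required tools are missing from PATH.
# missing_tools is a hypothetical helper, not something Whisper provides.
missing_tools() {
  missing=""
  for tool in "$@"; do
    # command -v succeeds only if the shell can resolve the command name.
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  # Strip the leading space before printing the result.
  printf '%s\n' "${missing# }"
}

result=$(missing_tools whisper ffmpeg)
if [ -n "$result" ]; then
  echo "Missing: $result"
else
  echo "whisper and ffmpeg are available"
fi
```

An empty result means both tools were found; otherwise the output names the ones that still need to be installed.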

To transcribe an audio file:

whisper <audio_file> --model base --output_format txt --output_dir /tmp

The transcript will be saved as a .txt file in the output directory.
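To read the transcript afterwards you need its file name. The sketch below assumes that Whisper names the output after the audio file's base name with the extension replaced (for example `audio.ogg` becomes `audio.txt`). Recent releases appear to behave this way, but older versions may keep the original extension, so treat the result as a guess to verify. `transcript_path` is a hypothetical helper name.

```shell
# Predict where the transcript for an audio file should be written.
# Assumption: the output file is <output_dir>/<audio base name without extension>.txt
transcript_path() {
  audio_file=$1
  output_dir=$2
  base=$(basename "$audio_file")
  # ${base%.*} drops the last extension, e.g. audio.ogg -> audio
  printf '%s/%s.txt\n' "$output_dir" "${base%.*}"
}

transcript_path /path/to/audio.ogg /tmp   # prints /tmp/audio.txt
```

If the predicted file does not exist after transcription, listing the output directory is the safer way to find it.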

| Option | Description | Values |
| --- | --- | --- |
| `--model` | Model size (larger = more accurate but slower) | `tiny`, `base`, `small`, `medium`, `large` |
| `--language` | Source language (auto-detected if omitted) | `zh`, `en`, `ja`, `ko`, etc. |
| `--output_format` | Output format | `txt`, `srt`, `vtt`, `json` |
| `--output_dir` | Directory for output files | Any writable path |
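As an illustration of how these options combine, the sketch below assembles a command line from the four settings in the table. `whisper_command` is a hypothetical helper that only prints the command for inspection. It does not run Whisper, and the printed form is not safe to re-execute for paths containing spaces.

```shell
# Print the whisper command line for the given settings.
# An empty or omitted language leaves --language off so Whisper auto-detects it.
whisper_command() {
  audio_file=$1
  model=$2
  format=$3
  output_dir=$4
  language=${5:-}
  # Rebuild the positional parameters as the command to print.
  set -- whisper "$audio_file" --model "$model" --output_format "$format" --output_dir "$output_dir"
  if [ -n "$language" ]; then
    set -- "$@" --language "$language"
  fi
  printf '%s\n' "$*"
}

whisper_command interview.mp3 small srt ./subs en
# prints: whisper interview.mp3 --model small --output_format srt --output_dir ./subs --language en
```

Here `interview.mp3` and `./subs` are placeholder names. Omitting the fifth argument produces a command without `--language`.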

tiny/base: Fast, suitable for clear speech in common languages
small: Good balance of speed and accuracy
medium/large: Best accuracy, recommended for noisy audio or uncommon languages

When you receive a message containing an audio file path (e.g., `[User sent a voice message: /path/to/audio.ogg]`), use this skill to transcribe it: run `whisper` on that path, then read the resulting .txt file from the output directory.

If whisper is not installed, inform the user that speech-to-text requires installing openai-whisper and ffmpeg.
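The whole flow, from incoming message to transcript, might look like the sketch below. It assumes the message uses exactly the bracketed form shown in the example, and that the transcript is named after the audio file's base name (worth verifying against your Whisper version). `extract_audio_path` and `transcribe_message` are hypothetical helper names.

```shell
# Pull the audio path out of a "[User sent a voice message: <path>]" marker.
# Prints nothing if the message contains no such marker.
extract_audio_path() {
  printf '%s\n' "$1" |
    sed -n 's/.*\[User sent a voice message: \([^]]*\)].*/\1/p'
}

# Transcribe the audio file referenced in a message and print the transcript.
transcribe_message() {
  audio_file=$(extract_audio_path "$1")
  if [ -z "$audio_file" ]; then
    echo "No audio file path found in message" >&2
    return 1
  fi
  if ! command -v whisper >/dev/null 2>&1; then
    echo "Speech-to-text requires installing openai-whisper and ffmpeg." >&2
    return 1
  fi
  # Same invocation as in the basic usage above; progress output is discarded.
  whisper "$audio_file" --model base --output_format txt --output_dir /tmp >/dev/null || return 1
  base=$(basename "$audio_file")
  # Assumption: the transcript is /tmp/<base name without extension>.txt
  cat "/tmp/${base%.*}.txt"
}
```

Calling `transcribe_message '[User sent a voice message: /path/to/audio.ogg]'` would then print the transcript, or an error message on standard error if the path is missing or Whisper is not installed.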