transcribe-audio

Audio Transcription

Transcribe audio files to Markdown and support post-processing of the resulting transcripts (Q&A, action items, summaries).

Workflow

1. Identify Audio Files

Find audio files matching the user's request:

  • Single file: user specifies path directly
  • Batch: find <dir> -maxdepth 1 -type f \( -name "*.mp3" -o -name "*.wav" -o -name "*.ogg" -o -name "*.m4a" -o -name "*.flac" -o -name "*.webm" \) | sort
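As a minimal sketch of how the batch case might be driven from a shell, the find command above can be wrapped in a function and consumed line by line. The names list_audio_files and process_batch are hypothetical helpers, not part of the skill.

```shell
# Sketch: list supported audio files directly inside a directory, sorted by name.
# list_audio_files is a hypothetical helper wrapping the find command above.
list_audio_files() {
  find "$1" -maxdepth 1 -type f \( \
    -name '*.mp3' -o -name '*.wav' -o -name '*.ogg' \
    -o -name '*.m4a' -o -name '*.flac' -o -name '*.webm' \) | sort
}

# Read one path per line; IFS= and -r keep spaces and backslashes intact.
# Paths that contain newlines are not handled by this sketch.
process_batch() {
  list_audio_files "$1" | while IFS= read -r audio; do
    printf 'found: %s\n' "$audio"
  done
}
```

Quoting the patterns and reading with `IFS= read -r` keeps file names with spaces intact, which is common for meeting recordings.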

2. Check for Existing Transcripts

For each audio file, check if a sibling .md file exists (e.g. meeting.mp3 → meeting.md):

  • Exists + user wants transcription: Ask whether to re-transcribe or use existing
  • Exists + user wants analysis: Read the existing .md directly — no need to transcribe
  • Does not exist: Proceed with transcription
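The sibling check above could be sketched roughly as follows. The helper names transcript_path and transcript_status are hypothetical, and the sketch assumes every audio file name carries an extension.

```shell
# Sketch: map an audio file to its sibling transcript and report whether it exists.
# transcript_path and transcript_status are hypothetical helper names.
transcript_path() {
  # Strip the final extension and append .md, e.g. dir/meeting.mp3 -> dir/meeting.md
  printf '%s.md\n' "${1%.*}"
}

transcript_status() {
  if [ -f "$(transcript_path "$1")" ]; then
    echo "exists"     # ask the user (transcription) or read it directly (analysis)
  else
    echo "missing"    # proceed with transcription
  fi
}
```

Only the existence check is mechanical; whether to re-transcribe or reuse an existing transcript still depends on what the user asked for.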

3. Transcribe

Run the script for each file:

./scripts/transcribe.sh <audio-file> [custom-prompt] > <output.md>

  • Output file: same name as audio, with .md extension, same directory
  • Default prompt handles speaker identification, timestamps, summary, action items
  • Pass a custom prompt as the second argument when the user requests different output or a focused transcription (see below)

The script outputs the transcript to stdout and progress to stderr. Capture stdout to the .md file.
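One way to capture stdout without leaving a half-written transcript behind is sketched below. The wrapper transcribe_to_md, the TRANSCRIBE_CMD override, and the .partial suffix are assumptions made for illustration; the skill itself only specifies the redirect shown above.

```shell
# Sketch: capture the transcript (stdout) in the sibling .md file.
# TRANSCRIBE_CMD is a hypothetical override; by default it is the skill's script.
TRANSCRIBE_CMD="${TRANSCRIBE_CMD:-./scripts/transcribe.sh}"

transcribe_to_md() {
  # $1 = audio file, $2 = optional custom prompt
  out="${1%.*}.md"
  tmp="${out}.partial"   # assumed temporary name, not defined by the skill
  # Only stdout is redirected, so progress on stderr still reaches the terminal.
  if "$TRANSCRIBE_CMD" "$1" ${2:+"$2"} > "$tmp"; then
    mv "$tmp" "$out"     # publish the transcript only after a successful run
  else
    rm -f "$tmp"         # avoid a partial .md that step 2 would treat as a transcript
    return 1
  fi
}
```

Writing to a temporary file first is a design choice of this sketch: a failed or timed-out run then leaves no .md file for the existing-transcript check to find.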

Focused Transcription

When the user asks about a specific topic (e.g. "tell me about the Miro discussion", "what was said about budgets?"), pass a focused prompt as the second argument instead of producing a full transcription and then searching or reading through it:

./scripts/transcribe.sh <audio-file> "Focus on the parts of this audio that discuss <TOPIC>. Provide:
1. A detailed transcript of just those sections (with speaker labels and timestamps)
2. A summary of what was said about <TOPIC>
3. Any decisions, action items, or open questions related to <TOPIC>
Skip unrelated parts of the audio." > <output-focus.md>

  • Output file for focused transcripts: use a suffix to avoid overwriting the full transcript, e.g. meeting.focus-miro.md
  • When to use: The user asks about a specific topic AND there is no existing full transcript to search, OR the user explicitly asks to re-transcribe with a focus
  • When NOT to use: A full transcript already exists — just read it and answer the question directly
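The naming convention and the when-to-use rule above might be expressed as the following sketch. Both focus_output_path and choose_mode are hypothetical helpers, and the slug rule (lowercase, with non-alphanumeric runs collapsed to a single dash) is an assumption chosen to reproduce the meeting.focus-miro.md example.

```shell
# Sketch: derive a focused-transcript file name from the audio path and a topic.
focus_output_path() {
  # $1 = audio file, $2 = topic
  slug=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]' \
    | tr -cs 'a-z0-9' '-' | sed 's/^-//; s/-$//')
  printf '%s.focus-%s.md\n' "${1%.*}" "$slug"
}

# Sketch: decide between a focused transcription and reading the full transcript.
choose_mode() {
  # $1 = audio file, $2 = "yes" if the user explicitly asked to re-transcribe with a focus
  if [ "$2" = "yes" ] || [ ! -f "${1%.*}.md" ]; then
    echo "focused"
  else
    echo "read-existing"
  fi
}
```

For instance, focus_output_path "meeting.mp3" "Miro" yields meeting.focus-miro.md, so the full transcript meeting.md is never overwritten.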

4. Post-Processing

After transcription (or when an existing transcript is available), support any follow-up:

  • Read the .md file and answer questions about the content
  • Extract action items or TODOs
  • Provide additional summaries or analysis
  • Compare across multiple transcripts

Key Details

  • Supported formats: .mp3, .wav, .ogg, .m4a, .flac, .webm
  • API: Gemini via Portkey (API key read from the pass entry api/portkey-claude)
  • Timeout: 600s per file — long recordings take time
  • Max file size: 200MB per file
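A pre-flight check against the format and size limits above could look like the following sketch. The helper names are hypothetical, and treating 200MB as 200 × 1024 × 1024 bytes is an assumption, since the limit's exact unit is not stated here.

```shell
# Sketch: reject files the transcription step would not accept.
MAX_BYTES=$((200 * 1024 * 1024))   # assumption: 200MB interpreted as 200 MiB

is_supported_audio() {
  # Case-sensitive match, mirroring the -name patterns used for batch discovery.
  case "$1" in
    *.mp3|*.wav|*.ogg|*.m4a|*.flac|*.webm) return 0 ;;
    *) return 1 ;;
  esac
}

within_size_limit() {
  [ -f "$1" ] || return 1
  # wc -c reports the byte count; strip padding that some implementations add.
  size=$(wc -c < "$1" | tr -d '[:space:]')
  [ "$size" -le "$MAX_BYTES" ]
}
```

Running both checks before calling the script avoids spending up to the 600s timeout on a file that would be rejected anyway.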

Script Execution: Scripts should be executed from the skill directory. All scripts use Nix shebangs so no manual dependency installation is required.

More from markus1189/nixos-config

First Seen
Mar 29, 2026