parakeet

SKILL.md

Parakeet-MLX Audio Transcription

This skill provides instructions for using parakeet-mlx to convert audio files to text using NVIDIA's Parakeet automatic speech recognition model optimized for Apple's MLX framework.

Overview

Parakeet-MLX brings NVIDIA's Parakeet ASR model to Apple Silicon, enabling fast, high-quality on-device speech-to-text conversion similar to Whisper but optimized for Apple's MLX framework.

Key Features:

  • High-quality transcription with timestamps
  • Fast processing (e.g., 1+ hour audio in ~53 seconds)
  • On-device processing (no cloud API required)
  • Output in SRT subtitle format
  • No installation required - run directly with uvx

Basic Usage

Transcribe an Audio File

The simplest way to transcribe an audio file is:

uvx parakeet-mlx /path/to/audio/file.mp3

This will:

  1. Download the model on first run (~2.5GB)
  2. Process the audio file
  3. Generate an SRT subtitle file with timestamped transcription

Example:

# Transcribe a podcast episode
uvx parakeet-mlx podcast_episode.mp3

# Transcribe an interview
uvx parakeet-mlx interview.wav

# Transcribe a meeting recording
uvx parakeet-mlx meeting_recording.m4a

Important Notes

First Run

The first time you run parakeet-mlx, it will download a ~2.5GB model file. This may take several minutes depending on your internet connection. Subsequent runs will use the cached model and start immediately.

Performance

Parakeet-MLX is optimized for Apple Silicon and provides excellent performance:

  • A 65MB, 1 hour 1 minute 28 second podcast was transcribed in 53 seconds
  • Performance scales with audio duration, not file size
  • Processing happens entirely on-device

Output Format

The tool generates an SRT (SubRip Subtitle) file with the same name as your input file:

Input: podcast.mp3 Output: podcast.srt

The SRT format includes:

  • Sequential subtitle numbers
  • Timestamp ranges (start --> end)
  • Transcribed text for each segment

Example SRT output:

1
00:00:00,000 --> 00:00:03,500
Welcome to the podcast. Today we're discussing...

2
00:00:03,500 --> 00:00:08,200
The latest developments in machine learning...

Supported Audio Formats

While the documentation explicitly mentions MP3, parakeet-mlx likely supports common audio formats including:

  • MP3
  • WAV
  • M4A
  • FLAC
  • OGG

If you encounter format issues, consider converting your audio file to MP3 or WAV first using tools like ffmpeg.

Working with Output

View the Transcription

# View the entire transcription
cat output.srt

# View just the text (remove timestamps)
grep -v "^[0-9]*$" output.srt | grep -v "^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]"

Convert SRT to Plain Text

If you need plain text without timestamps:

# Using the helper script (recommended)
python3 srt_to_text.py output.srt > transcript.txt

# Or with awk
awk 'NF && !/^[0-9]+$/ && !/^[0-9]{2}:[0-9]{2}:[0-9]{2}/' output.srt > transcript.txt

Helper Scripts

srt_to_text.py

This skill includes a Python helper script for converting SRT files to plain text. Located at skills/parakeet/srt_to_text.py.

Usage:

# Basic conversion (outputs to stdout)
python3 srt_to_text.py transcript.srt

# Save to file
python3 srt_to_text.py transcript.srt --output transcript.txt

# Preserve paragraph breaks between subtitle blocks
python3 srt_to_text.py transcript.srt --paragraphs

# Pipe to other tools
python3 srt_to_text.py transcript.srt | llm "Summarize this"

Features:

  • Strips subtitle numbers and timestamps
  • Joins text into continuous prose (or preserves paragraphs with --paragraphs)
  • Supports stdin input for piping
  • Clean output suitable for LLM processing

See also: The llm skill includes an audio-to-article.yaml template designed to clean up transcripts from this helper into polished articles.

Use with Other Tools

The SRT format is widely supported and can be:

  • Imported into video editing software
  • Used with subtitle players
  • Converted to other formats (VTT, ASS, etc.)
  • Processed with text analysis tools

Common Workflows

Transcribe Multiple Files

# Process all MP3 files in a directory
for file in *.mp3; do
    echo "Processing $file..."
    uvx parakeet-mlx "$file"
done

Transcribe and Extract Text

# Transcribe and immediately extract plain text
uvx parakeet-mlx audio.mp3
awk 'NF && !/^[0-9]+$/ && !/^[0-9]{2}:[0-9]{2}:[0-9]{2}/' audio.srt > audio.txt

Check Transcription Quality

After transcription, review the SRT file to verify:

  • Proper segmentation of speech
  • Accurate timestamps
  • Text quality and accuracy

Troubleshooting

Model Download Issues

If the model download fails or is interrupted:

  1. Check your internet connection
  2. Ensure you have ~2.5GB of free disk space
  3. Try running the command again (it may resume download)

Performance Issues

If transcription is slow:

  • Ensure you're running on Apple Silicon (M1/M2/M3)
  • Close other resource-intensive applications
  • For very long files, consider splitting them into smaller segments

Audio Format Not Supported

If you get an error about unsupported format:

# Convert to MP3 using ffmpeg
ffmpeg -i input.m4a -acodec libmp3lame -ab 192k output.mp3
uvx parakeet-mlx output.mp3

Use Cases

Parakeet-MLX is ideal for:

  • Podcast transcription - Generate searchable text from episodes
  • Interview documentation - Convert recorded interviews to text
  • Meeting notes - Transcribe meeting recordings for reference
  • Content creation - Generate captions or show notes
  • Accessibility - Create subtitles for audio/video content
  • Research - Analyze spoken content at scale

Advantages Over Other Tools

vs. Whisper:

  • Optimized specifically for Apple MLX framework
  • Potentially faster on Apple Silicon
  • Similar quality output

vs. Cloud APIs:

  • No API costs
  • Complete privacy (on-device processing)
  • No internet required after model download
  • No file size limits

vs. Manual Transcription:

  • Dramatically faster (hours to minutes)
  • Consistent quality
  • Includes precise timestamps

Tips for Best Results

  1. Audio Quality Matters: Clear audio with minimal background noise produces better results
  2. Speaker Clarity: Single speaker or well-separated multi-speaker audio works best
  3. Review Output: Always review the transcription for accuracy, especially for technical terms or names
  4. Use Timestamps: The SRT format's timestamps are valuable for referencing specific moments
  5. Batch Processing: Process multiple files in sequence for efficiency

References

Notes

  • Parakeet-MLX runs entirely on-device, ensuring privacy
  • The tool is designed for Apple Silicon; performance on Intel Macs may vary
  • First run requires internet for model download
  • Output quality is reported as "very high" for clear audio
  • The tool is optimized for speech recognition, not music or other audio types
Weekly Installs
8
GitHub Stars
1
First Seen
Jan 24, 2026
Installed on
gemini-cli6
claude-code5
antigravity5
codex5
cursor5
opencode5