Transcript Fixer

Two-phase correction pipeline: deterministic dictionary rules (instant, free) followed by AI-powered error detection. Corrections accumulate in ~/.transcript-fixer/corrections.db, improving accuracy over time.

What each phase is actually good at (calibration, not a rule): the dictionary shines on recurring errors — product names, common homophones, anything you've corrected before — at zero cost and zero latency. But on a fresh database, on high-quality ASR (e.g. transcripts from a strong engine like Whisper, Otter, or Feishu / Tencent-Meeting), or in specialized domains (finance, medical, legal), the dictionary often matches almost nothing — the errors that remain are proper nouns and domain terms it has never seen. There, the AI pass does essentially all the real work. Treat Stage 1 as a cheap pre-filter for known repeats, not as the primary corrector, and don't be alarmed when it changes only a handful of lines on a clean transcript.

Prerequisites

All scripts use PEP 723 inline metadata — uv run auto-installs dependencies. Requires uv (install guide).

Quick Start

# First time: Initialize database
uv run scripts/fix_transcription.py --init