to-markdown

Convert files to Markdown using markitdown — Microsoft's utility that extracts and structures content from many file formats.

Supported Formats

| Format | Notes |
|---|---|
| PDF | Text extracted; table structure may be approximate |
| Word (.docx) | Clean conversion including tables |
| Excel (.xlsx) | Sheets as Markdown tables |
| PowerPoint (.pptx) | Slide text and structure |
| HTML | Cleaned readable content |
| Images | OCR (requires LLM vision for best results) |
| Audio | Transcription via SpeechRecognition |
| CSV / JSON / XML | Structured text |
| YouTube URLs | Transcript extraction |
| EPUB | Chapter text |
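If an agent needs to decide up front whether a file is worth passing to markitdown, a small extension check that mirrors the Supported Formats table can help. This is a minimal sketch: `md_supported` is a hypothetical helper, the image and audio extensions are assumptions, markitdown itself may accept formats this list omits, and YouTube URLs are not covered.

```shell
# Hypothetical helper: exit 0 if the file's extension matches a format from
# the Supported Formats table, 1 otherwise. Image/audio extensions are assumed.
md_supported() {
  local ext
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    pdf|docx|xlsx|pptx|html|htm|csv|json|xml|epub) return 0 ;;
    jpg|jpeg|png) return 0 ;;  # images (assumed extensions)
    wav|mp3) return 0 ;;       # audio (assumed extensions)
    *) return 1 ;;
  esac
}
# Usage: md_supported "$INPUT_FILE" && markitdown "$INPUT_FILE" 2>/dev/null
```

The match is case-insensitive, so `report.PDF` and `report.pdf` are treated the same.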

Usage

# Convert to stdout
markitdown input.pdf 2>/dev/null

# Save to file
markitdown input.pdf -o output.md 2>/dev/null

# Or redirect
markitdown input.docx 2>/dev/null > output.md

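Always use 2>/dev/null to suppress noisy font/parser warnings that don't affect output quality. This also hides genuine errors, so if a conversion comes back empty, rerun it without 2>/dev/null to see what went wrong.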
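Because `2>/dev/null` discards everything on stderr, a failed conversion can look the same as an empty one. A minimal sketch of a middle ground, assuming a POSIX shell with `mktemp`: the hypothetical `md_convert` function captures stderr to a temporary file and replays it only when markitdown exits non-zero or writes nothing.

```shell
# Hypothetical wrapper: keep markitdown's warnings out of sight on success,
# but show them if the command fails or the output file ends up empty.
md_convert() {
  local src="$1" dst="$2" errlog
  errlog=$(mktemp) || return 1
  if markitdown "$src" > "$dst" 2> "$errlog" && [ -s "$dst" ]; then
    rm -f "$errlog"
    return 0
  fi
  echo "markitdown failed or produced no output for: $src" >&2
  cat "$errlog" >&2
  rm -f "$errlog"
  return 1
}
# Usage: md_convert input.pdf output.md
```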

Workflow

  1. Check if markitdown is installed:

    which markitdown || echo "not installed"
    
  2. Install if missing (with all format support):

    pip install 'markitdown[all]'
    
  3. Run the conversion with stderr suppressed:

    markitdown "$INPUT_FILE" 2>/dev/null
    
  4. Handle the output based on user intent:

    • Saving to the knowledge base → write to the appropriate .md file
    • Quick review → show in conversation
    • Multiple files → loop and convert each
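Steps 1 to 3 can be folded into one shell function. This is a sketch, not part of the skill: `to_markdown` is a hypothetical name, it assumes `pip` installs the `markitdown` entry point somewhere on `PATH`, and it uses `command -v`, the POSIX counterpart of `which`.

```shell
# Hypothetical helper for workflow steps 1-3: install markitdown with all
# optional format support if it is missing, then convert one file to stdout.
to_markdown() {
  if ! command -v markitdown >/dev/null 2>&1; then
    pip install 'markitdown[all]' || return 1
  fi
  markitdown "$1" 2>/dev/null
}
# Usage: to_markdown "$INPUT_FILE" > output.md
```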

Output Quality Notes

  • PDFs: Text is extracted faithfully but table cells may land on separate lines (PDF doesn't encode table structure). If the user needs clean tables from a PDF, note this limitation.
  • Word/Excel: Usually clean output with proper table formatting.
  • Complex layouts: Multi-column PDFs or heavily formatted documents may have scrambled reading order.
  • Scanned PDFs: Image-only PDFs produce no text without OCR/LLM vision integration.
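For the scanned-PDF case, one cheap heuristic is to treat output that contains no non-whitespace characters as "no extractable text". In this sketch `has_text` is a hypothetical helper, and it cannot tell a scanned PDF from a failed conversion: both come back as no text.

```shell
# Hypothetical check: exit 0 if markitdown's output contains at least one
# non-whitespace character, 1 if it is empty or whitespace-only.
has_text() {
  markitdown "$1" 2>/dev/null | grep -q '[^[:space:]]'
}
# Usage:
#   has_text scan.pdf || echo "No text layer; needs OCR/LLM vision" >&2
```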

Common Use Cases

Import a document into knowledge base:

markitdown report.pdf 2>/dev/null > knowledge/competitive/raw/report.md
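The redirect above fails if the target folder does not exist yet. A minimal sketch using a hypothetical `import_to_kb` helper that creates the folder, names the output after the input file, and removes the partial file if conversion fails; the folder path is whatever you pass in.

```shell
# Hypothetical helper: convert SRC into DIR/<basename without extension>.md.
import_to_kb() {
  local src="$1" dir="$2" base out
  base=$(basename "$src")
  out="$dir/${base%.*}.md"
  mkdir -p "$dir" || return 1
  if ! markitdown "$src" 2>/dev/null > "$out"; then
    rm -f "$out"  # don't leave an empty or partial file behind
    return 1
  fi
}
# Usage: import_to_kb report.pdf knowledge/competitive/raw
```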

Convert a Word doc someone sent you:

markitdown meeting-notes.docx 2>/dev/null > notes.md

Batch convert a directory:

for f in docs/*.pdf; do
  [ -e "$f" ] || continue  # skip the literal pattern if docs/ has no PDFs
  markitdown "$f" 2>/dev/null > "${f%.pdf}.md"
done
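To batch several formats at once and keep results out of the source folder, a sketch along these lines may help. `batch_convert` is hypothetical and the extension list is only an example; note that files sharing a name stem (e.g. `report.pdf` and `report.docx`) write to the same `report.md`.

```shell
# Hypothetical batch sketch: convert PDF, DOCX, and PPTX files from SRC into
# DST. Unmatched glob patterns are skipped; failed conversions are reported
# on stderr and their partial output removed. Files sharing a name stem
# write to the same .md file, so the last one processed wins.
batch_convert() {
  local src="$1" dst="$2" f base failed=0
  mkdir -p "$dst" || return 1
  for f in "$src"/*.pdf "$src"/*.docx "$src"/*.pptx; do
    [ -e "$f" ] || continue  # pattern matched nothing
    base=$(basename "$f")
    if ! markitdown "$f" 2>/dev/null > "$dst/${base%.*}.md"; then
      echo "failed: $f" >&2
      rm -f "$dst/${base%.*}.md"
      failed=$((failed + 1))
    fi
  done
  [ "$failed" -eq 0 ]  # non-zero exit if any file failed
}
# Usage: batch_convert docs docs-md
```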

Check what a PDF contains before deciding what to do with it:

markitdown document.pdf 2>/dev/null | head -50
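Beyond `head -50`, a rough summary can help decide whether a document is mostly prose or mostly tables. `md_summary` is a hypothetical helper and its counts are heuristics: headings are lines starting with `#`, table rows are lines starting with `|`.

```shell
# Hypothetical triage helper: print line, heading, and pipe-table-row counts
# for the converted output of one file.
md_summary() {
  local tmp
  tmp=$(mktemp) || return 1
  markitdown "$1" 2>/dev/null > "$tmp"
  printf 'lines: %s\n' "$(wc -l < "$tmp" | tr -d ' ')"
  printf 'headings: %s\n' "$(grep -c '^#' "$tmp")"
  printf 'table rows: %s\n' "$(grep -c '^|' "$tmp")"
  rm -f "$tmp"
}
# Usage: md_summary document.pdf
```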

Source: steveclarke/dotfiles (first seen Apr 15, 2026)