markitdown

Convert files and documents in the formats listed below to Markdown using Microsoft's markitdown CLI. The output is tuned for LLM consumption: clean, structured, and ready to reason over.

Supported Formats

| Format | Extensions | Notes |
|---|---|---|
| PDF | .pdf | Text extraction; scanned PDFs may be limited |
| Word | .docx, .doc | Full text + headings |
| Excel | .xlsx, .xls | Tables per sheet |
| PowerPoint | .pptx, .ppt | Slide text + notes |
| HTML | .html, .htm | Rendered text |
| CSV | .csv | Table format |
| JSON / XML | .json, .xml | Structured data |
| Images | .jpg, .png, .gif, .webp | EXIF metadata + OCR description |
| Audio | .wav, .mp3, .m4a | Transcription |
| EPub | .epub | E-book content |
| Outlook MSG | .msg | Email content |
| ZIP | .zip | Extracts and converts contents |
| YouTube URL | URL | Transcript extraction |

Step 0: Ensure dependencies are installed

markitdown (Python package):

which markitdown || pip install 'markitdown[all]'

The [all] extras cover PDF, DOCX, XLSX, PPTX, audio, and image Python dependencies in one shot — worth the one-time cost to avoid "unsupported format" errors.

System binaries: pip cannot install these; they must come from the OS package manager. Install only what the file type requires:

| File type | Needs | Check | Install (Debian/Ubuntu) |
|---|---|---|---|
| Audio (.wav, .mp3, .m4a) | ffmpeg | which ffmpeg | apt-get install -y ffmpeg |
| Images (OCR text) | tesseract | which tesseract | apt-get install -y tesseract-ocr |

Check before attempting audio or image conversion — missing system binaries produce empty output with no error message, which is confusing. If on macOS, use brew install ffmpeg / brew install tesseract instead.
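
The check can be scripted so it runs before each audio or image conversion. The sketch below maps a file extension to the system binary named in the table above and reports whether that binary is on PATH. The function names `required_tool` and `check_tool_for` are illustrative and not part of markitdown.

```shell
# Map a file's extension to the system binary listed in the table above.
# Prints nothing for file types that need no extra binary.
required_tool() {
  # Lowercase the extension so .MP3 and .mp3 are treated alike.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    wav|mp3|m4a) echo ffmpeg ;;
    jpg|png|gif|webp) echo tesseract ;;
  esac
}

# Return 0 if the needed binary (if any) is on PATH; otherwise warn and return 1.
check_tool_for() {
  tool=$(required_tool "$1")
  [ -z "$tool" ] && return 0
  if command -v "$tool" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing system binary: $tool (needed for $1)" >&2
  return 1
}
```

Calling `check_tool_for "talk.mp3"` before the conversion turns a silent empty result into an explicit message.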

Step 1: Identify the file

Confirm the file path from the user's message. If the path is ambiguous or relative, resolve it (e.g., ~/Downloads/report.pdf expands to the full path). Verify it exists before running.
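
One way to do the resolution and existence check in a single step is sketched below. `resolve_input` is a hypothetical helper name; it expands a leading `~`, makes relative paths absolute against the current directory, and fails if no regular file is found.

```shell
# Expand a leading "~" and make the path absolute, then confirm the file exists.
# Prints the resolved path on success; returns 1 if no regular file is there.
resolve_input() {
  p=$1
  case "$p" in
    "~")   p=$HOME ;;
    "~/"*) p=$HOME/${p#"~/"} ;;
  esac
  case "$p" in
    /*) ;;                 # already absolute
    *)  p=$PWD/$p ;;       # relative to the current directory
  esac
  if [ ! -f "$p" ]; then
    echo "file not found: $p" >&2
    return 1
  fi
  printf '%s\n' "$p"
}
```

The printed path can then be passed, quoted, to markitdown in Step 2.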

Step 2: Convert

Read inline — output markdown to stdout, use directly in the conversation:

markitdown "path/to/file.pdf"

Save to file — useful for large documents or when the user wants a persistent .md file:

markitdown "path/to/file.pdf" -o "path/to/output.md"

Batch convert — convert all files of a type in a directory:

for f in /path/to/dir/*.docx; do
  [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
  markitdown "$f" -o "${f%.docx}.md"
done

Always quote file paths to handle spaces and special characters correctly.
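
The batch loop can be extended to several Office formats at once. The following is a minimal sketch, not a definitive script: `md_path` and `convert_dir` are illustrative names, and `CONVERTER` is a hypothetical override variable introduced here so the loop can be dry-run with a stub. It defaults to the real `markitdown` command and the same `-o` usage shown above.

```shell
# Derive the .md output path next to the input, e.g. "a/b.docx" -> "a/b.md".
# Assumes the file name has an extension, which holds for the globs below.
md_path() {
  printf '%s\n' "${1%.*}.md"
}

# Convert every .docx, .pptx and .xlsx file in a directory.
# CONVERTER is a hypothetical override; it defaults to markitdown.
convert_dir() {
  dir=$1
  for f in "$dir"/*.docx "$dir"/*.pptx "$dir"/*.xlsx; do
    [ -e "$f" ] || continue          # skip patterns that matched nothing
    "${CONVERTER:-markitdown}" "$f" -o "$(md_path "$f")"
  done
}
```

Every expansion is quoted, so file names with spaces are handled the same way as in the single-file commands.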

Step 3: Present results

After conversion:

  • Answer the user's question from the content — don't dump raw markdown unless they explicitly ask for it. If they asked "what's in this spreadsheet?", synthesize the answer.
  • For saved files, confirm the output path and approximate size.
  • For large output (200+ pages, very large spreadsheets): summarize the document structure first ("This PDF has 5 sections: Executive Summary, Financials, Appendices..."), then ask which sections to focus on. Dumping a 200-page PDF into context degrades response quality.
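
For a saved file, the size and a structural outline can be gathered without reading the whole document into context. The sketch below assumes the converted output contains Markdown headings, which may not hold for every source format (PDF output in particular can come back without any). `outline` is an illustrative helper name.

```shell
# Print the size of a converted file and its heading outline so the
# structure can be summarized before any full read.
outline() {
  md=$1
  lines=$(wc -l < "$md" | tr -d ' ')
  bytes=$(wc -c < "$md" | tr -d ' ')
  echo "$md: $lines lines, $bytes bytes"
  # Markdown headings start with one to six "#" characters and a space.
  grep -E '^#{1,6} ' "$md" || echo "(no headings found)"
}
```

If no headings are found, reading the first screenful with `head` is a reasonable fallback before asking the user which part to focus on.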

Error Handling

| Error | Fix |
|---|---|
| command not found: markitdown | Run Step 0 to install |
| File not found | Check path; ask user to confirm location |
| Unsupported file format | Check extension; try markitdown[all] if not installed with extras |
| Password-protected file | markitdown cannot decrypt; ask user to provide an unlocked copy |
| Empty output from audio file | ffmpeg not installed; run apt-get install -y ffmpeg |
| Empty output from image file | tesseract not installed; run apt-get install -y tesseract-ocr |
| Empty output from PDF | File is likely a scanned image PDF with no text layer; inform user |
| YouTube URL fails | Network issue or transcript disabled; try downloading audio first |
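
Because several of the failures above are silent, it can help to wrap the conversion and test for empty output explicitly. This is a sketch under stated assumptions: `convert_checked` is an illustrative name, `CONVERTER` is a hypothetical override used for testing (defaulting to `markitdown`), and the hints simply restate the table rather than diagnose the actual cause.

```shell
# Run a conversion to a file and turn the silent failure modes from the
# table above into explicit messages.
convert_checked() {
  src=$1
  dst=$2
  if [ ! -f "$src" ]; then
    echo "File not found: $src" >&2
    return 1
  fi
  if ! "${CONVERTER:-markitdown}" "$src" -o "$dst"; then
    echo "Conversion failed for $src" >&2
    return 1
  fi
  if [ ! -s "$dst" ]; then
    # Output is missing or empty: pick the hint that matches the file type.
    case "$src" in
      *.wav|*.mp3|*.m4a) echo "Empty output: check that ffmpeg is installed" >&2 ;;
      *.jpg|*.png|*.gif|*.webp) echo "Empty output: check that tesseract is installed" >&2 ;;
      *.pdf) echo "Empty output: PDF may be scanned with no text layer" >&2 ;;
      *) echo "Empty output from $src" >&2 ;;
    esac
    return 2
  fi
}
```

A return status of 2 distinguishes "ran but produced nothing" from an outright failure, so the caller can decide whether to install a dependency or ask the user for a different file.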

Important Notes

  • markitdown is designed for LLM consumption — the output prioritizes structure and readability over pixel-perfect fidelity.
  • Work from local files: apart from the YouTube URLs listed under Supported Formats, remote documents must be downloaded first (e.g., curl -o /tmp/file.pdf "URL"), then converted.
  • The -o flag writes directly to disk without printing to stdout — useful when you don't want to flood the context window.

First Seen
Mar 26, 2026