markitdown

Convert files and documents to Markdown using Microsoft's markitdown CLI. The output is clean, structured Markdown optimized for an LLM to reason over.

Supported Formats

Format	Extensions	Notes
PDF	.pdf	Text extraction; scanned PDFs may be limited
Word	.docx, .doc	Full text + headings
Excel	.xlsx, .xls	Tables per sheet
PowerPoint	.pptx, .ppt	Slide text + notes
HTML	.html, .htm	Rendered text
CSV	.csv	Table format
JSON / XML	.json, .xml	Structured data
Images	.jpg, .png, .gif, .webp	EXIF metadata + OCR description
Audio	.wav, .mp3, .m4a	Transcription
EPub	.epub	E-book content
Outlook MSG	.msg	Email content
ZIP	.zip	Extracts and converts contents
YouTube URL	URL	Transcript extraction

Step 0: Ensure dependencies are installed

markitdown (Python package):

which markitdown || pip install 'markitdown[all]'

The [all] extras cover PDF, DOCX, XLSX, PPTX, audio, and image Python dependencies in one shot — worth the one-time cost to avoid "unsupported format" errors.

System binaries — pip cannot install these; they must come from the OS package manager. Only install what the file type requires:

File type	Needs	Check	Install (Debian/Ubuntu)
Audio (.wav, .mp3, .m4a)	`ffmpeg`	`which ffmpeg`	`apt-get install -y ffmpeg`
Images (OCR text)	`tesseract`	`which tesseract`	`apt-get install -y tesseract-ocr`

Check before attempting audio or image conversion: a missing system binary can produce empty output with no error message, which is easy to mistake for an empty source file. On macOS, use brew install ffmpeg or brew install tesseract instead.
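The check-before-convert advice can be sketched as a small preflight helper. The function names `required_binary` and `preflight` are hypothetical, and the extension-to-binary mapping only mirrors the table above; it is an assumption of this sketch, not something read from markitdown itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a file name to the system binary the table above lists.
# Prints an empty line when no system binary is listed for that extension.
required_binary() {
  local ext
  # Take the text after the last dot and lowercase it.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    wav|mp3|m4a)      echo "ffmpeg" ;;
    jpg|png|gif|webp) echo "tesseract" ;;
    *)                echo "" ;;
  esac
}

# Hypothetical helper: fail early if the binary needed for this file is missing.
preflight() {
  local bin
  bin=$(required_binary "$1")
  if [ -n "$bin" ] && ! command -v "$bin" >/dev/null 2>&1; then
    echo "missing system binary: $bin" >&2
    return 1
  fi
  return 0
}
```

A possible use is `preflight "talk.mp3" && markitdown "talk.mp3"`, so the conversion only runs once the dependency is known to be present.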

Step 1: Identify the file

Confirm the file path from the user's message. If the path is ambiguous or relative, resolve it (e.g., ~/Downloads/report.pdf expands to the full path). Verify it exists before running.
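A minimal sketch of the resolve-and-verify step, assuming a hypothetical helper named `resolve_file`. It expands a leading `~/` that arrived inside quotes (the shell only expands an unquoted tilde), checks that the file exists, and prints an absolute path.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the absolute path of an existing file, or fail.
resolve_file() {
  local path="$1" dir
  # A quoted "~/..." is not expanded by the shell, so expand it by hand.
  case "$path" in
    "~/"*) path="$HOME/${path#"~/"}" ;;
  esac
  if [ ! -f "$path" ]; then
    echo "File not found: $path" >&2
    return 1
  fi
  # Build the absolute path from the directory part and the file name.
  dir=$(cd "$(dirname "$path")" && pwd) || return 1
  printf '%s/%s\n' "$dir" "$(basename "$path")"
}
```

For example, `file=$(resolve_file "~/Downloads/report.pdf") && markitdown "$file"` only runs the conversion when the file is really there.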

Step 2: Convert

Read inline — output markdown to stdout, use directly in the conversation:

markitdown "path/to/file.pdf"

Save to file — useful for large documents or when the user wants a persistent .md file:

markitdown "path/to/file.pdf" -o "path/to/output.md"

Batch convert — convert all files of a type in a directory:

for f in /path/to/dir/*.docx; do
  [ -e "$f" ] || continue  # if nothing matches, the pattern stays literal; skip it
  markitdown "$f" -o "${f%.docx}.md"
done

Always quote file paths to handle spaces and special characters correctly.
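The batch loop above handles one extension at a time. For a directory with mixed file types, a sketch along these lines may help. The helpers `md_path_for` and `convert_dir` are hypothetical, and the only markitdown invocation used is the `markitdown FILE -o OUT` form shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: name the output by appending .md to the full file name,
# so report.docx and report.pdf do not overwrite each other's output.
md_path_for() {
  printf '%s.md\n' "$1"
}

# Hypothetical helper: convert every .docx, .pptx, and .pdf in a directory,
# continuing past individual failures and reporting them on stderr.
convert_dir() {
  local dir="$1" f failed=0
  for f in "$dir"/*.docx "$dir"/*.pptx "$dir"/*.pdf; do
    [ -e "$f" ] || continue   # an unmatched pattern stays literal; skip it
    if ! markitdown "$f" -o "$(md_path_for "$f")"; then
      echo "failed: $f" >&2
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every conversion succeeded.
  [ "$failed" -eq 0 ]
}
```

Keeping the original extension in the output name is a design choice of this sketch; switch to `${f%.*}.md` if the directory never holds two files with the same base name.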

Step 3: Present results

After conversion:

Answer the user's question from the content — don't dump raw markdown unless they explicitly ask for it. If they asked "what's in this spreadsheet?", synthesize the answer.
For saved files, confirm the output path and approximate size.
For large output (200+ pages, very large spreadsheets): summarize the document structure first ("This PDF has 5 sections: Executive Summary, Financials, Appendices..."), then ask which sections to focus on. Dumping a 200-page PDF into context degrades response quality.
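To confirm a saved file and get a quick view of its structure before reading all of it, something like the following sketch could be used. The helper name `describe_md` is hypothetical, and the outline assumes the converted output uses `#`-style Markdown headings, which depends on the source document.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the size and heading outline of a converted .md file.
describe_md() {
  local md="$1" bytes
  [ -f "$md" ] || { echo "no such file: $md" >&2; return 1; }
  # wc pads its output on some systems, so strip the spaces.
  bytes=$(wc -c < "$md" | tr -d ' ')
  echo "Saved: $md ($bytes bytes)"
  # Headings are lines starting with one to six '#' characters and a space.
  # Note: comment lines inside code blocks can match this pattern too.
  grep -E '^#{1,6} ' "$md" || echo "(no headings found)"
}
```

Running `describe_md "path/to/output.md"` after a save gives the output path, an approximate size, and a section list to offer the user before loading any one section in full.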

Error Handling

Error	Fix
`command not found: markitdown`	Run Step 0 to install
`File not found`	Check path; ask user to confirm location
`Unsupported file format`	Check the extension; if markitdown was installed without extras, run `pip install 'markitdown[all]'`
Password-protected file	markitdown cannot decrypt; ask user to provide an unlocked copy
Empty output from audio file	`ffmpeg` not installed — run `apt-get install -y ffmpeg`
Empty output from image file	`tesseract` not installed — run `apt-get install -y tesseract-ocr`
Empty output from PDF	File is likely a scanned image PDF with no text layer; inform user
YouTube URL fails	Network issue or transcript disabled; as a fallback, download the audio and convert it as an audio file
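Because several rows in the table describe silent empty output, a wrapper that checks the result and prints the matching hint can save a round of guessing. This is a sketch only: `convert_checked` is a hypothetical name, the hints simply restate the table above, and the return code 2 for empty output is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: convert to a file, then turn silent empty output into a hint.
convert_checked() {
  local src="$1" out="$2" ext
  [ -f "$src" ] || { echo "File not found: $src" >&2; return 1; }
  markitdown "$src" -o "$out" || { echo "markitdown failed on $src" >&2; return 1; }
  # -s is true only when the file exists and is not empty.
  [ -s "$out" ] && return 0
  ext=$(printf '%s' "${src##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    wav|mp3|m4a)      echo "Empty output: is ffmpeg installed?" >&2 ;;
    jpg|png|gif|webp) echo "Empty output: is tesseract installed?" >&2 ;;
    pdf)              echo "Empty output: likely a scanned PDF with no text layer" >&2 ;;
    *)                echo "Empty output from $src" >&2 ;;
  esac
  return 2
}
```

The check only catches zero-byte output; a file containing nothing but whitespace would still pass and needs a closer look.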

Important Notes

markitdown is designed for LLM consumption — the output prioritizes structure and readability over pixel-perfect fidelity.
Work from local files: apart from YouTube URLs, remote documents should be downloaded first (e.g., curl -o /tmp/file.pdf "URL"), then converted from the local copy.
The -o flag writes directly to disk without printing to stdout — useful when you don't want to flood the context window.
