MarkItDown
Purpose
Convert a wide variety of file formats into Markdown text using Microsoft's markitdown CLI. Useful for extracting text from documents for LLM analysis, summarization, or ingestion into knowledge bases.
Supported Formats
| Category | Formats |
|---|---|
| Documents | PDF, DOCX, PPTX, XLSX, XLS |
| Web & Data | HTML, CSV, JSON, XML |
| Media | Images (EXIF + OCR), Audio (metadata + transcription) |
| eBooks | EPub |
| Archives | ZIP (iterates over contents) |
| Other | YouTube URLs, Outlook messages |
Basic Usage
# Convert a file (output to stdout)
markitdown path/to/file.pdf
# Save output to a file
markitdown path/to/file.pdf -o output.md
# Pipe from stdin
cat path/to/file.pdf | markitdown
Options
| Flag | Description |
|---|---|
| `-o <file>` | Write output to a file instead of stdout |
| `-d` | Use Azure Document Intelligence for conversion |
| `-e "<endpoint>"` | Azure Document Intelligence endpoint URL |
| `--use-plugins` | Enable third-party plugins |
| `--list-plugins` | Show installed plugins |
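The flags can be combined on a single command line. The following is a minimal sketch, not part of the skill itself: `convert_with_docintel` is a hypothetical wrapper name, and the endpoint must be supplied by the caller.

```shell
# Hypothetical helper: convert one file through Azure Document Intelligence.
# Usage: convert_with_docintel <input> <output.md> <endpoint-url>
convert_with_docintel() {
  local input="$1"
  local output="$2"
  local endpoint="$3"
  if [ -z "$endpoint" ]; then
    echo "convert_with_docintel: endpoint URL is required" >&2
    return 2
  fi
  # -d enables Document Intelligence, -e passes its endpoint,
  # and -o writes the Markdown to a file instead of stdout.
  markitdown "$input" -o "$output" -d -e "$endpoint"
}
```

Calling `convert_with_docintel report.pdf report.md "<endpoint>"` would run the same command you could type by hand; the wrapper only adds the check that `-e` always accompanies `-d`.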
Workflow
Single File Conversion
# Convert and capture the result
result=$(markitdown document.pdf)
# Convert and save
markitdown document.pdf -o document.md
Batch Conversion
# Convert all PDFs in a directory
for f in *.pdf; do
  [ -e "$f" ] || continue  # skip the literal pattern when no PDFs match
  markitdown "$f" -o "${f%.pdf}.md"
done
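For unattended batch runs it can help to keep going after a failed file and report the failures at the end. A minimal sketch under that assumption; `convert_all_pdfs` is a hypothetical name, and the function relies only on the `-o` flag described above.

```shell
# Hypothetical helper: convert every PDF in a directory, continuing past failures.
# Usage: convert_all_pdfs [directory]   (defaults to the current directory)
convert_all_pdfs() {
  local dir="${1:-.}"
  local failed=0
  local src out
  for src in "$dir"/*.pdf; do
    [ -e "$src" ] || continue          # no PDFs matched the pattern
    out="${src%.pdf}.md"
    if ! markitdown "$src" -o "$out"; then
      echo "failed: $src" >&2          # report on stderr, keep converting
      failed=$((failed + 1))
    fi
  done
  # Succeed only if every conversion succeeded.
  [ "$failed" -eq 0 ]
}
```

The function's exit status lets a calling script or agent detect partial failure, while the per-file messages on stderr identify which inputs to retry.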
Pipe into Other Tools
# Convert and count words
markitdown document.pdf | wc -w
# Convert and search for a term
markitdown document.pdf | grep -i "search term"
Agent Usage Notes
- Output goes to stdout by default. Capture it in a variable or redirect to a file.
- For large files, prefer saving to a file with `-o` rather than capturing stdout.
- Image conversion extracts EXIF metadata and OCR text. For richer image descriptions, use the Python API with an LLM client instead.
- ZIP files are automatically extracted and each contained file is converted.
- If conversion fails for a format, check that the corresponding optional dependency is installed (e.g., `markitdown[pdf]` for PDF support).
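The notes on large files and failed conversions can be combined into one wrapper. This is a sketch under stated assumptions: `convert_to_file` is a hypothetical name, and the mapping from file extension to optional extra is assumed only for the document formats listed in the case statement.

```shell
# Hypothetical helper: always write to a file, and print a dependency hint on failure.
# Usage: convert_to_file <input> [output.md]
convert_to_file() {
  local input="$1"
  local output="${2:-${input%.*}.md}"   # default: same path with a .md extension
  local ext hint
  if markitdown "$input" -o "$output"; then
    printf '%s\n' "$output"             # print the path so the caller can read the file
    return 0
  fi
  ext=$(printf '%s' "${input##*.}" | tr 'A-Z' 'a-z')
  case "$ext" in
    # Assumption: these formats have an extra named after the extension.
    pdf|docx|pptx|xlsx|xls) hint="markitdown[$ext]" ;;
    *) hint="the optional dependency for .$ext files" ;;
  esac
  echo "conversion of $input failed; check that $hint is installed" >&2
  return 1
}
```

Because only the output path is printed on stdout, the caller can decide how much of a large result to load rather than holding the whole conversion in a shell variable.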
Repository: yutakobayashidev/dotnix