MarkItDown
Purpose
Convert a wide variety of file formats into Markdown text using Microsoft's markitdown CLI. Useful for extracting text from documents for LLM analysis, summarization, or ingestion into knowledge bases.
Supported Formats
| Category | Formats |
|---|---|
| Documents | PDF, DOCX, PPTX, XLSX, XLS |
| Web & Data | HTML, CSV, JSON, XML |
| Media | Images (EXIF + OCR), Audio (metadata + transcription) |
| eBooks | EPub |
| Archives | ZIP (iterates over contents) |
| Other | YouTube URLs, Outlook messages |
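Before handing a file to the CLI, an agent may want a cheap pre-check against the table above. The following is a minimal sketch, not markitdown's own detection logic: the function name `is_supported_ext` is hypothetical, and the extension list (for example `.jpg`, `.png`, `.wav`, `.mp3`, `.msg`) is an assumption about which extensions correspond to the listed categories. markitdown may accept inputs this filter rejects.

```shell
# Return 0 if the file's extension matches a category in the format table.
# The extension list is an illustrative assumption, not markitdown's own logic.
is_supported_ext() {
  # Lowercase the name so "REPORT.PDF" matches too.
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    *.pdf|*.docx|*.pptx|*.xlsx|*.xls) return 0 ;;  # Documents
    *.html|*.htm|*.csv|*.json|*.xml)  return 0 ;;  # Web & Data
    *.jpg|*.jpeg|*.png|*.wav|*.mp3)   return 0 ;;  # Media (assumed extensions)
    *.epub|*.zip|*.msg)               return 0 ;;  # eBooks, Archives, Outlook
    *) return 1 ;;
  esac
}
```

A caller could use it as `is_supported_ext "$f" && markitdown "$f" -o "$f.md"`, treating a non-zero return as "skip or try anyway" depending on how conservative the pipeline should be.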
Basic Usage
# Convert a file (output to stdout)
markitdown path/to/file.pdf
# Save output to a file
markitdown path/to/file.pdf -o output.md
# Pipe from stdin
cat path/to/file.pdf | markitdown
Options
| Flag | Description |
|---|---|
| `-o <file>` | Write output to a file instead of stdout |
| `-d` | Use Azure Document Intelligence for conversion |
| `-e "<endpoint>"` | Azure Document Intelligence endpoint URL |
| `--use-plugins` | Enable third-party plugins |
| `--list-plugins` | Show installed plugins |
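The `-d` and `-e` flags are meant to be used together, and they combine with `-o` like any other conversion. As a rough sketch, the wrapper below refuses to run without an endpoint. The function name `convert_with_docintel` and the `DOCINTEL_ENDPOINT` variable are hypothetical names chosen for this example; markitdown itself does not read that variable.

```shell
# Hypothetical wrapper: convert one file via Azure Document Intelligence.
# DOCINTEL_ENDPOINT is an assumed variable name, not something markitdown reads.
convert_with_docintel() {
  input=$1
  output=$2
  if [ -z "${DOCINTEL_ENDPOINT:-}" ]; then
    echo "DOCINTEL_ENDPOINT is not set" >&2
    return 2
  fi
  # -d selects Document Intelligence, -e supplies the endpoint, -o the output file.
  markitdown "$input" -d -e "$DOCINTEL_ENDPOINT" -o "$output"
}
```

Usage would look like `DOCINTEL_ENDPOINT="<endpoint>" convert_with_docintel scan.pdf scan.md`, with the placeholder replaced by a real endpoint URL.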
Workflow
Single File Conversion
# Convert and capture the result
result=$(markitdown document.pdf)
# Convert and save
markitdown document.pdf -o document.md
Batch Conversion
# Convert all PDFs in a directory
for f in *.pdf; do
  markitdown "$f" -o "${f%.pdf}.md"
done
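The loop above assumes at least one PDF exists and that every conversion succeeds. A slightly more defensive variant, sketched here under the assumption that markitdown exits non-zero when a conversion fails, skips the unmatched glob and reports a per-run summary. The function name `convert_pdfs_in_dir` is hypothetical.

```shell
# Convert every PDF in a directory, writing <name>.md next to each input.
# Prints a summary and returns non-zero if any conversion failed.
convert_pdfs_in_dir() {
  dir=${1:-.}
  ok=0
  failed=0
  for f in "$dir"/*.pdf; do
    # When nothing matches, the glob pattern stays literal; skip it.
    [ -e "$f" ] || continue
    if markitdown "$f" -o "${f%.pdf}.md"; then
      ok=$((ok + 1))
    else
      failed=$((failed + 1))
      echo "failed: $f" >&2
    fi
  done
  echo "converted=$ok failed=$failed"
  [ "$failed" -eq 0 ]
}
```

Calling `convert_pdfs_in_dir ./reports` keeps going after a failed file, so one unreadable document does not block the rest of the batch.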
Pipe into Other Tools
# Convert and count words
markitdown document.pdf | wc -w
# Convert and search for a term
markitdown document.pdf | grep -i "search term"
Agent Usage Notes
- Output goes to stdout by default. Capture it in a variable or redirect to a file.
- For large files, prefer saving to a file with `-o` rather than capturing stdout.
- Image conversion extracts EXIF metadata and OCR text. For richer image descriptions, use the Python API with an LLM client instead.
- ZIP files are automatically extracted and each contained file is converted.
- If conversion fails for a format, check that the corresponding optional dependency is installed (e.g., `markitdown[pdf]` for PDF support).
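The notes above can be folded into one small helper that an agent calls instead of invoking the CLI directly. This is a sketch under assumptions: the name `convert_to_file` is hypothetical, it assumes markitdown exits non-zero on failure, and treating an empty output file as an error is a design choice of this example rather than documented CLI behavior.

```shell
# Hypothetical helper: convert to a file and give an actionable error on failure.
convert_to_file() {
  input=$1
  output=$2
  if ! command -v markitdown >/dev/null 2>&1; then
    echo "markitdown not found on PATH" >&2
    return 127
  fi
  # Write with -o instead of capturing stdout, which suits large files.
  if ! markitdown "$input" -o "$output"; then
    echo "conversion failed for $input; check that the optional dependency" \
         "for this format is installed (e.g. markitdown[pdf] for PDF)" >&2
    return 1
  fi
  # Treat an empty result as a failure so callers do not ingest nothing.
  [ -s "$output" ]
}
```

A typical call is `convert_to_file document.pdf document.md || exit 1`, after which the caller reads `document.md` rather than holding the whole conversion in a shell variable.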