markitdown
MarkItDown
Convert a document, image, audio file, or YouTube URL to Markdown using Microsoft's markitdown CLI. The skill validates the input, composes the right flags, optionally saves the result under .claude/output/markitdown/<slug>/, and reports a one-line summary.
The deterministic work — install check, validation, slug derivation, save path, command composition — happens in scripts/markitdown.sh. The skill parses $ARGUMENTS, hands them to the script, and turns the script's RESULT: lines into a human report.
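As a rough illustration of the install check listed above, a guard along these lines would return exit code 127 when the CLI is missing. This is a minimal sketch, not the contents of scripts/markitdown.sh: the function name `require_cmd` is hypothetical, and writing the message to stderr is an assumption.

```shell
# Hypothetical guard: return 127 when a required command is missing from PATH.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "ERR: $1 not installed" >&2   # stderr is an assumption
    return 127
  fi
}

# Example use: stop before any conversion work if markitdown is unavailable.
# require_cmd markitdown || exit $?
```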
Install
pip install 'markitdown[all]'
For a smaller install, pick only what you need:
| Group | Adds |
|---|---|
| `[pdf]` | PDF parsing |
| `[docx]` | Word documents |
| `[pptx]` | PowerPoint |
| `[xlsx]` `[xls]` | Excel |
| `[outlook]` | Outlook .msg |
| `[audio-transcription]` | MP3/WAV via local Whisper |
| `[youtube-transcription]` | YouTube transcripts |
| `[az-doc-intel]` | Azure Document Intelligence backend |
For Azure Document Intelligence, also export MARKITDOWN_DOCINTEL_ENDPOINT=https://<resource>.cognitiveservices.azure.com/ before invoking with -d.
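A wrapper can fail early when `-d` is requested without the endpoint. The following is a minimal sketch under stated assumptions: the function name `require_docintel_endpoint` and the exact error wording are hypothetical, and the real helper may check this differently.

```shell
# Hypothetical guard: refuse to continue when the endpoint variable is unset or empty.
require_docintel_endpoint() {
  if [ -z "${MARKITDOWN_DOCINTEL_ENDPOINT:-}" ]; then
    echo "ERR: MARKITDOWN_DOCINTEL_ENDPOINT is not set" >&2   # wording is an assumption
    return 1
  fi
}
```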
Parameters
| Flag | Default | Effect |
|---|---|---|
| `-s` | off | Save Markdown to `.claude/output/markitdown/<slug>/<stem>.md` |
| `-S` | off | Force no-save (override an ambient save mode) |
| `-d` | off | Use Azure Document Intelligence (needs `MARKITDOWN_DOCINTEL_ENDPOINT`) |
| `-p` | off | Enable installed third-party markitdown plugins |
| `-k` | off | Keep data URIs (base64 images) inline in the output |
| `-l` | n/a | List installed plugins and exit |
`<slug>` is the kebab-cased input basename, capped at 5 words. Saved output is pipeline-friendly: typically `/spec -s -f <path>` decomposes the extracted content into workstreams and `/apex -f <path>` implements from it, and any skill that accepts `-f` can consume the saved file.
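The slug rule can be sketched as a small shell function. This is an illustration only: the name `slugify` is hypothetical, and two details are assumptions rather than documented behavior, namely that the last extension is stripped before slugging and that any run of non-alphanumeric characters counts as a word break.

```shell
# Hypothetical slugify: basename -> drop last extension -> lowercase -> kebab-case -> first 5 words.
slugify() {
  local name
  name=${1##*/}      # basename (also the last path segment of a URL)
  name=${name%.*}    # drop the last extension, if any (assumption)
  printf '%s\n' "$name" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//' \
    | cut -d- -f1-5   # keep at most 5 hyphen-separated words
}

slugify 'Q3 Sales Report (final) v2 draft copy.pdf'   # prints q3-sales-report-final-v2
```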
Workflow
- Empty `$ARGUMENTS` → ask for a file path or URL and stop. Do not guess.
- Run the helper: `bash ${CLAUDE_SKILL_DIR}/scripts/markitdown.sh $ARGUMENTS`
- The script emits `RESULT: key=value` lines with keys `bytes`, `slug`, `saved`, plus `path` when saving (order is not guaranteed; parse by key). In no-save mode these are followed by a `---` separator and the converted Markdown; in save mode nothing follows, because the file is on disk.
- Parse the `RESULT:` lines and produce the report below.
- If the script exits with `ERR: markitdown not installed` (exit 127) → print the install command from `## Install` and stop. Never auto-install on the user's behalf.
- If the script exits with another `ERR:` (file not found, missing endpoint, unknown flag) → relay the message verbatim and stop.
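Parsing by key rather than by position can be sketched as below. The helper name `result_value` is hypothetical, the sketch assumes one `RESULT: key=value` pair per line, and the sample values (including `saved=no`) are made up for illustration.

```shell
# Hypothetical helper: print the value of one key from RESULT: lines read on stdin.
# Parsing stops at the --- separator so converted Markdown is never mistaken for metadata.
result_value() {
  awk -v key="$1" '
    /^---$/ { exit }
    index($0, "RESULT: " key "=") == 1 {
      print substr($0, length("RESULT: " key "=") + 1)
      exit
    }
  '
}

# Made-up sample output in no-save mode; key order is deliberately shuffled.
sample=$(printf 'RESULT: slug=report\nRESULT: bytes=1234\nRESULT: saved=no\n---\n# Title\n')
printf '%s\n' "$sample" | result_value bytes   # prints 1234
```

A missing key (for example `path` in no-save mode) yields empty output, which the caller can test with `[ -z ... ]`.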
Output
markitdown: <input> → <bytes> bytes of Markdown
saved: <path> # only when -s
When saving, just report. When not saving, also stream the converted Markdown back to the user; if it exceeds ~80 lines, show the first 80 and tell the user to re-run with -s to capture the full output.
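The 80-line cutoff is equivalent to the following filter. It is only a sketch of the rule: the function name `preview` and the wording of the trailing hint are hypothetical.

```shell
# Hypothetical preview filter: print at most $1 lines of stdin (default 80),
# then a one-line hint when more lines remained.
preview() {
  awk -v max="${1:-80}" '
    NR <= max { print }
    END {
      if (NR > max)
        printf "... (%d more lines; re-run with -s to capture the full output)\n", NR - max
    }
  '
}
```

Output of 80 lines or fewer passes through unchanged; longer output is cut at the limit and followed by the hint.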
Examples
/markitdown ~/Downloads/report.pdf # convert, print to terminal
/markitdown -s ~/Downloads/report.pdf # convert + save under .claude/output/markitdown/report/
/markitdown -s -p deck.pptx # use third-party plugins (e.g. markitdown-ocr)
/markitdown -d invoice.pdf # Azure Document Intelligence
/markitdown -k brand.html # keep base64 images inline
/markitdown https://youtu.be/dQw4w9WgXcQ # YouTube transcript
/markitdown -l # list installed plugins, then exit
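The examples above could map onto markitdown's own options roughly as follows. The long-form options shown (`--use-docintel`, `--endpoint`, `--use-plugins`, `--keep-data-uris`, `--list-plugins`) belong to the markitdown CLI; how the helper actually composes them is an assumption, as are the function name `compose_args` and the exit status 2 for unknown flags.

```shell
# Hypothetical mapping from the skill's flags to markitdown CLI options.
# Prints one argument per line; the real scripts/markitdown.sh may compose differently.
compose_args() {
  for arg in "$@"; do
    case "$arg" in
      -d) printf '%s\n' --use-docintel --endpoint "${MARKITDOWN_DOCINTEL_ENDPOINT:-}" ;;
      -p) printf '%s\n' --use-plugins ;;
      -k) printf '%s\n' --keep-data-uris ;;
      -l) printf '%s\n' --list-plugins ;;
      -s|-S) ;;   # save mode belongs to the wrapper and is not passed through here
      -*) echo "ERR: unknown flag $arg" >&2; return 2 ;;
      *)  printf '%s\n' "$arg" ;;   # input path or URL
    esac
  done
}
```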
Notes
- YouTube URLs are detected by the `https?://` prefix and passed straight to `markitdown`. The slug is derived from the URL's last path segment, so saved paths look like `.claude/output/markitdown/dqw4w9wgxcq/dQw4w9WgXcQ.md`.
- Audio transcription uses local Whisper via the `[audio-transcription]` extra. It's CPU-bound, so warn the user before kicking off a long podcast.
- Image OCR without the `markitdown-ocr` plugin only reads embedded EXIF text. For pixel-level OCR, `pip install markitdown-ocr` and pass `-p`.
- No silent overwrites: `markitdown` itself overwrites with `-o`, but the slug-namespaced save path makes collisions predictable, not surprising.
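The URL detection described in the first note amounts to a prefix test, and anything else is treated as a local file. A minimal sketch, assuming hypothetical function names `is_url` and `validate_input` and an illustrative error message:

```shell
# Hypothetical check mirroring the https?:// prefix rule.
is_url() {
  case "$1" in
    http://*|https://*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical validation: URLs pass through; anything else must be an existing file.
validate_input() {
  if is_url "$1"; then
    return 0
  fi
  if [ ! -f "$1" ]; then
    echo "ERR: file not found: $1" >&2   # wording is an assumption
    return 1
  fi
}
```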
Why the wrapper
markitdown is already a great CLI; this skill exists to (a) follow the repo's -s/-S/-f convention so other skills can chain on the output, (b) translate "extract this pdf" into the right invocation without forcing the user to remember -x, -m, -d, -e, and (c) emit a uniform one-line report so terminals don't render multi-MB Markdown by accident.