to-markdown
Convert files to Markdown using markitdown, Microsoft's utility that extracts and structures content from many file formats.
Supported Formats
| Format | Notes |
|---|---|
| PDF | Text extracted; table structure may be approximate |
| Word (.docx) | Clean conversion including tables |
| Excel (.xlsx) | Sheets as markdown tables |
| PowerPoint (.pptx) | Slide text and structure |
| HTML | Cleaned readable content |
| Images | OCR (requires LLM vision for best results) |
| Audio | Transcription via SpeechRecognition |
| CSV / JSON / XML | Structured text |
| YouTube URLs | Transcript extraction |
| EPub | Chapter text |
Usage
# Convert to stdout
markitdown input.pdf 2>/dev/null
# Save to file
markitdown input.pdf -o output.md 2>/dev/null
# Or redirect
markitdown input.docx 2>/dev/null > output.md
Always use 2>/dev/null to suppress noisy font/parser warnings that don't affect output quality.
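Discarding stderr also hides genuine failures, such as a missing input file. As a minimal sketch, assuming a hypothetical helper named `run_quiet` (it is not part of markitdown), stderr can be held in a temporary file and replayed only when the command exits with a non-zero status:

```shell
# run_quiet: hypothetical helper, not part of markitdown.
# Runs the given command, hides its stderr on success,
# and replays the captured stderr if the command fails.
run_quiet() {
  err_file=$(mktemp) || return 1
  status=0
  "$@" 2>"$err_file" || status=$?
  if [ "$status" -ne 0 ]; then
    cat "$err_file" >&2
  fi
  rm -f "$err_file"
  return "$status"
}

# Usage (same effect as 2>/dev/null when the conversion succeeds):
# run_quiet markitdown input.pdf > output.md
```

Warnings stay out of the way on success, while a failed conversion still shows why it failed.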
Workflow
1. Check if markitdown is installed:
   which markitdown || echo "not installed"
2. Install if missing (with all format support):
   pip install 'markitdown[all]'
3. Run the conversion with stderr suppressed:
   markitdown "$INPUT_FILE" 2>/dev/null
4. Handle the output based on user intent:
   - Saving to the knowledge base → write to appropriate .md file
   - Quick review → show in conversation
   - Multiple files → loop and convert each
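The workflow steps can be combined into one small wrapper. This is a sketch only: `md_path_for` and `convert_to_md` are hypothetical names introduced here, and the output path is assumed to sit next to the input with its last extension swapped for `.md`.

```shell
# md_path_for: hypothetical helper that replaces the last extension with .md.
# Assumes the file name itself contains an extension.
md_path_for() {
  printf '%s\n' "${1%.*}.md"
}

# convert_to_md: hypothetical wrapper around the workflow steps.
convert_to_md() {
  input=$1
  # Check that markitdown is on PATH before converting.
  if ! command -v markitdown >/dev/null 2>&1; then
    echo "markitdown not installed; try: pip install 'markitdown[all]'" >&2
    return 127
  fi
  output=$(md_path_for "$input")
  # Convert with stderr suppressed; print the output path on success.
  markitdown "$input" 2>/dev/null > "$output" && printf '%s\n' "$output"
}

# Multiple files: loop and convert each.
# for f in docs/*.docx; do convert_to_md "$f"; done
```

For a quick review, calling `markitdown` directly and reading stdout is simpler than going through the wrapper.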
Output Quality Notes
- PDFs: Text is extracted faithfully but table cells may land on separate lines (PDF doesn't encode table structure). If the user needs clean tables from a PDF, note this limitation.
- Word/Excel: Usually clean output with proper table formatting.
- Complex layouts: Multi-column PDFs or heavily formatted documents may have scrambled reading order.
- Scanned PDFs: Image-only PDFs produce no text without OCR/LLM vision integration.
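Because an image-only PDF yields no text, it can help to test whether the conversion produced anything before saving it. A minimal sketch, assuming a hypothetical helper `has_text` and an illustrative file name:

```shell
# has_text: hypothetical helper; succeeds when stdin contains
# at least one non-whitespace character.
has_text() {
  grep -q '[^[:space:]]'
}

# Usage (scan.pdf is an illustrative file name):
# if ! markitdown scan.pdf 2>/dev/null | has_text; then
#   echo "No text extracted; the PDF may be image-only" >&2
# fi
```

An empty result is only a hint: a failed conversion with stderr suppressed also produces no output.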
Common Use Cases
Import a document into knowledge base:
markitdown report.pdf 2>/dev/null > knowledge/competitive/raw/report.md
Convert a Word doc someone sent you:
markitdown meeting-notes.docx 2>/dev/null > notes.md
Batch convert a directory:
for f in docs/*.pdf; do
markitdown "$f" 2>/dev/null > "${f%.pdf}.md"
done
Check what a PDF contains before deciding what to do with it:
markitdown document.pdf 2>/dev/null | head -50