Mistral PDF to Markdown Converter
Convert PDF documents to Markdown format using Mistral's OCR API. Automatically extracts text, formatting, and images.
When to Use
- Converting research papers or documents to Markdown
- Extracting text from scanned PDFs (OCR capability)
- Preserving document structure with headers and formatting
- Extracting embedded images from PDFs
Quick Start
Use the conversion script from this skill's directory:
# Convert entire PDF
python scripts/convert_pdf_to_markdown.py input.pdf output.md
# Convert specific pages
python scripts/convert_pdf_to_markdown.py input.pdf output.md --pages "1-5"
python scripts/convert_pdf_to_markdown.py input.pdf output.md --pages "1,3,5"
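The two `--pages` forms shown above (a range such as "1-5" and a comma-separated list such as "1,3,5") can be expanded into explicit page numbers roughly as follows. This is a minimal sketch, not the script's actual parser: the function name `parse_pages` is hypothetical, and both the 1-based numbering and the support for mixed specs such as "2,4-6" are assumptions.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a page spec such as "1-5" or "1,3,5" into sorted page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            # A range: both ends are inclusive
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"Invalid range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            # A single page number
            pages.add(int(part))
    return sorted(pages)


print(parse_pages("1-5"))    # [1, 2, 3, 4, 5]
print(parse_pages("1,3,5"))  # [1, 3, 5]
```

Expanding the spec before calling the script makes it easy to compare the requested pages against the PDF's page count.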
Output Structure
Output/PDFConversions/
├── document.md        # Markdown with text and image references
└── images/
    ├── img-0.jpeg     # Extracted images
    ├── img-1.jpeg
    └── ...
Usage in Code
import subprocess
import sys

# Run the conversion script with the current Python interpreter
result = subprocess.run(
    [
        sys.executable,
        ".claude/skills/mistral-pdf-to-markdown/scripts/convert_pdf_to_markdown.py",
        "input.pdf",
        "Output/PDFConversions/output.md",
        "--pages", "1-10",
    ],
    capture_output=True,
    text=True,
)
print(result.stdout)
# Surface the script's error output if it failed
if result.returncode != 0:
    print(result.stderr, file=sys.stderr)
Key Features
- Markdown formatting: Preserves headers, lists, and structure
- Image extraction: Saves images to the images/ subfolder automatically
- Page selection: Extract specific pages or ranges
- Scanned PDF support: True OCR capability for image-based PDFs
- Relative paths: Image references use relative paths (images/img-X.jpeg)
Requirements
The script requires:
- Mistral API key in Notes/.env (line 2: mistral_api_key=...)
- Python packages: mistralai, python-dotenv, pypdf
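Before running a conversion, it can help to confirm that the key is actually present in Notes/.env. The sketch below uses a small standard-library parser as a stand-in for python-dotenv, so it runs without extra packages. The helper name `find_env_value` is hypothetical, and it looks for the key on any line, which is an assumption: the script itself may read line 2 specifically.

```python
from pathlib import Path
from typing import Optional


def find_env_value(env_text: str, name: str) -> Optional[str]:
    """Return the value for `name` in KEY=VALUE text, or None if absent."""
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == name:
            # Drop surrounding whitespace and optional quotes
            return value.strip().strip("\"'")
    return None


env_file = Path("Notes/.env")  # location stated in the requirements above
if env_file.is_file():
    key = find_env_value(env_file.read_text(encoding="utf-8"), "mistral_api_key")
    print("mistral_api_key found" if key else "mistral_api_key missing or empty")
else:
    print(f"{env_file} not found")
```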
Common Use Cases
Convert Research Paper
python scripts/convert_pdf_to_markdown.py \
"Data/papers/research.pdf" \
"Notes/Paper Markdown/research.md"
Extract Specific Sections
# Extract pages 10-20 (introduction and methods)
python scripts/convert_pdf_to_markdown.py \
"paper.pdf" \
"Notes/Paper Markdown/intro_methods.md" \
--pages "10-20"
Extract Figures Only
# Extract pages with figures
python scripts/convert_pdf_to_markdown.py \
"paper.pdf" \
"Notes/Paper Markdown/figures.md" \
--pages "25,27,30,35"
Error Handling
API Key Not Found:
Error: Mistral API key not found in Notes/.env
→ Add mistral_api_key=YOUR_KEY to line 2 of Notes/.env
Page Out of Range:
Warning: Page 100 out of range, skipping
→ Check PDF page count and adjust page selection
API Rate Limit:
→ Wait a moment and retry, or reduce page count per request
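The "wait a moment and retry" advice for rate limits can be automated with a small wrapper. The sketch below retries with exponential backoff and rests on two assumptions: that the script exits with a non-zero status when the API rejects a request (the source does not confirm this), and that retrying is harmless. The name `run_with_retry` and its parameters are hypothetical. Because it retries on any non-zero exit, it will also retry failures that waiting cannot fix, such as a missing API key.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_with_retry(
    command: Sequence[str],
    attempts: int = 3,
    base_delay: float = 2.0,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> subprocess.CompletedProcess:
    """Run `command`, retrying with exponential backoff while it exits non-zero."""
    for attempt in range(1, attempts + 1):
        result = runner(list(command), capture_output=True, text=True)
        # Return on success, or give up after the final attempt
        if result.returncode == 0 or attempt == attempts:
            return result
        # Delay doubles each time: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")


# Example call (not executed here):
# result = run_with_retry(
#     ["python", "scripts/convert_pdf_to_markdown.py", "input.pdf", "output.md"]
# )
# print(result.stdout or result.stderr)
```

The `runner` and `sleep` parameters exist only so the retry logic can be exercised without calling the real script or actually waiting.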
Notes
- Images are saved as JPEG files in the images/ subfolder
- Markdown image references are automatically updated to images/img-X.jpeg
- Large PDFs may take longer to process due to API limits
- For simple text extraction without OCR, consider using the pdf skill instead
- Scanned PDFs benefit most from this skill's OCR capability
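Since the Markdown refers to images by relative paths such as images/img-X.jpeg, a quick check after conversion can confirm that every referenced file exists. The following is a minimal sketch: the helper names `image_targets` and `missing_images` are hypothetical, and the regular expression only handles the plain `![alt](path)` form without titles, which is assumed to be what the script emits.

```python
import re
from pathlib import Path

# Matches Markdown image syntax ![alt](target) and captures the target path
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def image_targets(markdown_text: str) -> list[str]:
    """Return the paths of all images referenced in the Markdown text."""
    return IMAGE_REF.findall(markdown_text)


def missing_images(markdown_text: str, base_dir: Path) -> list[str]:
    """Return referenced image paths that do not exist relative to base_dir."""
    return [t for t in image_targets(markdown_text) if not (base_dir / t).is_file()]


md_path = Path("Output/PDFConversions/document.md")  # path from the output layout
if md_path.is_file():
    text = md_path.read_text(encoding="utf-8")
    for target in missing_images(text, md_path.parent):
        print(f"Missing image: {target}")
```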
See Also
- pdf skill - For local PDF manipulation without API calls
- reference.md - Additional details about the Mistral OCR API