pdf-process-mineru
SKILL.md
Tool List
1. pdf_to_markdown
Convert PDF documents to Markdown format, preserving document structure, formulas, tables, and images.
Description: Uses MinerU to parse a PDF document and write the result as Markdown, with support for OCR, formula recognition, and table extraction.
Parameters:
- file_path (string, required): Absolute path to the PDF file
- output_dir (string, required): Absolute path to the output directory
- backend (string, optional): Parsing backend, options: hybrid-auto-engine (default), pipeline, vlm-auto-engine
- language (string, optional): OCR language code, such as en (English), ch (Chinese), ja (Japanese), etc., defaults to auto-detection
- enable_formula (boolean, optional): Whether to enable formula recognition, defaults to true
- enable_table (boolean, optional): Whether to enable table extraction, defaults to true
- start_page (integer, optional): Start page number (starting from 0), defaults to 0
- end_page (integer, optional): End page number (starting from 0), defaults to -1 meaning parse all pages
Return Value:
{
"success": true,
"output_path": "/path/to/output",
"markdown_content": "Converted Markdown content...",
"images": ["List of image paths"],
"tables": ["List of table information"],
"formula_count": 10
}
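A minimal sketch of how a caller might consume this return value. It assumes the result reaches the caller as a JSON string with the fields shown above; the helper name `summarize_markdown_result` is hypothetical, and the shape of a failed result is not documented here, so the sketch simply raises.

```python
import json


def summarize_markdown_result(raw):
    """Summarize a pdf_to_markdown return value given as a JSON string (hypothetical helper)."""
    result = json.loads(raw)
    if not result.get("success"):
        # The failure shape is not documented above, so fail loudly.
        raise RuntimeError("pdf_to_markdown reported failure")
    return {
        "output_path": result["output_path"],
        "characters": len(result.get("markdown_content", "")),
        "image_count": len(result.get("images", [])),
        "table_count": len(result.get("tables", [])),
        "formula_count": result.get("formula_count", 0),
    }


# Sample data mirroring the documented return value.
sample = json.dumps({
    "success": True,
    "output_path": "/path/to/output",
    "markdown_content": "# Title",
    "images": ["/path/to/output/images/0.png"],
    "tables": [],
    "formula_count": 10,
})
summary = summarize_markdown_result(sample)
```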
Examples:
python .claude/skills/pdf-process/script/pdf_parser.py \
'{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output"}}'
# Use specific backend
python .claude/skills/pdf-process/script/pdf_parser.py \
'{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "backend": "pipeline"}}'
# Parse specific pages
python .claude/skills/pdf-process/script/pdf_parser.py \
'{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "start_page": 0, "end_page": 5}}'
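The same calls can be assembled from Python. The sketch below only builds the argument list; it assumes the script takes a single JSON argument exactly as in the examples above, and `build_command` is a hypothetical helper, not part of the skill.

```python
import json
import sys

# Path taken from the examples above; adjust if the skill lives elsewhere.
SCRIPT_PATH = ".claude/skills/pdf-process/script/pdf_parser.py"


def build_command(tool_name, arguments, script_path=SCRIPT_PATH):
    """Return the argv list for one tool call (hypothetical helper)."""
    payload = json.dumps({"name": tool_name, "arguments": arguments})
    return [sys.executable, script_path, payload]


command = build_command(
    "pdf_to_markdown",
    {
        "file_path": "/path/to/document.pdf",
        "output_dir": "/path/to/output",
        "start_page": 0,
        "end_page": 5,
    },
)
# To run it: subprocess.run(command, capture_output=True, text=True, check=True)
```

Passing the command as a list avoids shell quoting problems with the JSON payload.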
2. pdf_to_json
Convert PDF documents to JSON format, including detailed layout and structural information.
Description: Uses MinerU to parse a PDF document and write the result as JSON, containing structured information such as text blocks, images, tables, and formulas.
Parameters:
- file_path (string, required): Absolute path to the PDF file
- output_dir (string, required): Absolute path to the output directory
- backend (string, optional): Parsing backend, options: hybrid-auto-engine (default), pipeline, vlm-auto-engine
- language (string, optional): OCR language code, such as en (English), ch (Chinese), ja (Japanese), etc., defaults to auto-detection
- enable_formula (boolean, optional): Whether to enable formula recognition, defaults to true
- enable_table (boolean, optional): Whether to enable table extraction, defaults to true
- start_page (integer, optional): Start page number (starting from 0), defaults to 0
- end_page (integer, optional): End page number (starting from 0), defaults to -1 meaning parse all pages
Return Value:
{
"success": true,
"output_path": "/path/to/output.json",
"pages": [
{
"page_no": 0,
"page_size": [595, 842],
"blocks": [
{
"type": "text",
"text": "Text content",
"bbox": [x, y, x, y]
}
],
"images": [],
"tables": [],
"formulas": []
}
],
"metadata": {
"total_pages": 10,
"author": "Author",
"title": "Title"
}
}
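As an illustration of walking this structure, the sketch below collects the text blocks page by page. It assumes the result has already been parsed into a Python dict with the fields shown above; `collect_text` is a hypothetical helper and the bbox numbers in the sample are made up.

```python
def collect_text(result):
    """Return (page_no, text) pairs for every block of type "text"."""
    lines = []
    for page in result.get("pages", []):
        for block in page.get("blocks", []):
            if block.get("type") == "text":
                lines.append((page["page_no"], block["text"]))
    return lines


# Sample data mirroring the documented return value (bbox values are illustrative).
sample = {
    "success": True,
    "output_path": "/path/to/output.json",
    "pages": [
        {
            "page_no": 0,
            "page_size": [595, 842],
            "blocks": [
                {"type": "text", "text": "Text content", "bbox": [50, 60, 500, 80]},
            ],
            "images": [],
            "tables": [],
            "formulas": [],
        }
    ],
    "metadata": {"total_pages": 10, "author": "Author", "title": "Title"},
}
text_lines = collect_text(sample)
```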
Examples:
python .claude/skills/pdf-process/script/pdf_parser.py \
'{"name": "pdf_to_json", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output"}}'
# Use specific backend and language
python .claude/skills/pdf-process/script/pdf_parser.py \
'{"name": "pdf_to_json", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "backend": "hybrid-auto-engine", "language": "ch"}}'
Installation Instructions
1. Install MinerU
# Update pip and install uv
pip install --upgrade pip
pip install uv
# Install MinerU (including all features)
uv pip install -U "mineru[all]"
2. Verify Installation
# Check if MinerU is installed successfully
mineru --version
# Test basic functionality
mineru --help
3. System Requirements
- Python Version: 3.10-3.13
- Operating System: Linux / Windows / macOS 14.0+
- Memory:
  - Using pipeline backend: minimum 16GB, recommended 32GB+
  - Using hybrid/vlm backend: minimum 16GB, recommended 32GB+
- Disk Space: minimum 20GB (SSD recommended)
- GPU (optional):
  - pipeline backend: supports CPU-only
  - hybrid/vlm backend: requires NVIDIA GPU (Volta architecture and above) or Apple Silicon
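The Python version requirement can be checked before installing. The sketch below encodes the 3.10-3.13 range, plus the Windows limit of 3.12 mentioned under Troubleshooting; `python_supported` is a hypothetical helper.

```python
import platform
import sys


def python_supported(version=None, system=None):
    """Check a (major, minor) version against the documented range."""
    version = tuple(sys.version_info[:2]) if version is None else version
    system = platform.system() if system is None else system
    # Windows is limited to 3.12 because ray does not support 3.13 there.
    upper = (3, 12) if system == "Windows" else (3, 13)
    return (3, 10) <= version <= upper


current_ok = python_supported()  # checks the running interpreter
```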
Use Cases
- Academic Paper Parsing: Extract structured content such as formulas, tables, and images
- Technical Document Conversion: Convert PDF documents to Markdown for version control and online publishing
- OCR Processing: Process scanned PDFs and garbled PDFs
- Multilingual Documents: OCR support for 109 languages
- Batch Processing: Batch convert multiple PDF documents
Backend Selection Recommendations
- hybrid-auto-engine (default): Balanced accuracy and speed, suitable for most scenarios
- pipeline: Suitable for CPU-only environments, best compatibility
- vlm-auto-engine: Highest accuracy, requires GPU acceleration
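These recommendations can be written as a small decision function. This is only one reading of the list above; `choose_backend` and its parameters are hypothetical, and `has_gpu` is meant as "a supported NVIDIA GPU or Apple Silicon is available".

```python
def choose_backend(has_gpu, prefer_accuracy=False):
    """Pick a backend name following the recommendations above."""
    if not has_gpu:
        # CPU-only environments: pipeline has the best compatibility.
        return "pipeline"
    if prefer_accuracy:
        # Highest accuracy, at the cost of requiring GPU acceleration.
        return "vlm-auto-engine"
    # Default: balanced accuracy and speed.
    return "hybrid-auto-engine"


backend = choose_backend(has_gpu=False)
```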
Notes
- File Paths: All paths must be absolute paths
- Output Directory: Non-existent directories will be created automatically
- Performance: Using a GPU can significantly improve parsing speed
- Page Numbers: Page numbers start counting from 0
- Memory: Processing large documents may consume more memory
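The path and page-number rules above can be checked before calling a tool. The sketch below is a hypothetical pre-flight check, not something the skill provides; the rule that end_page must not be smaller than start_page is an assumption.

```python
import os


def validate_arguments(arguments):
    """Return a list of problems found in a tool's arguments dict."""
    problems = []
    for key in ("file_path", "output_dir"):
        value = arguments.get(key)
        if not value:
            problems.append(f"{key} is required")
        elif not os.path.isabs(value):
            problems.append(f"{key} must be an absolute path")
    start = arguments.get("start_page", 0)
    end = arguments.get("end_page", -1)
    if start < 0:
        problems.append("start_page must be 0 or greater")
    # -1 means "parse all pages"; otherwise assume end_page >= start_page.
    if end != -1 and end < start:
        problems.append("end_page must be -1 or not less than start_page")
    return problems


problems = validate_arguments({
    "file_path": "docs/report.pdf",          # relative, so it is rejected
    "output_dir": os.path.abspath("out"),
    "start_page": 4,
    "end_page": 2,
})
```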
Troubleshooting
Common Issues
- Installation Failure:
  - Make sure you are using Python 3.10-3.13
  - On Windows, only Python 3.10-3.12 is supported (ray does not support 3.13)
  - Using uv pip install can resolve most dependency conflicts
- Insufficient Memory:
  - Use pipeline backend
  - Limit parsing pages: start_page and end_page
  - Reduce virtual memory allocation
- Slow Parsing Speed:
  - Enable GPU acceleration
  - Use hybrid-auto-engine backend
  - Disable unnecessary features (formulas, tables)
- Low OCR Accuracy:
  - Specify the correct document language
  - Ensure the backend supports OCR (use pipeline or hybrid-*)
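For the memory advice about limiting pages, one option is to split a long document into several smaller runs. The sketch below computes zero-based start_page and end_page pairs; it assumes end_page is inclusive, which is not stated here, and `page_ranges` is a hypothetical helper.

```python
def page_ranges(total_pages, pages_per_run):
    """Split a document into (start_page, end_page) pairs, zero-based."""
    if pages_per_run < 1:
        raise ValueError("pages_per_run must be at least 1")
    ranges = []
    for start in range(0, total_pages, pages_per_run):
        # Assumption: end_page is inclusive, so subtract 1 from the exclusive bound.
        end = min(start + pages_per_run, total_pages) - 1
        ranges.append((start, end))
    return ranges


chunks = page_ranges(total_pages=10, pages_per_run=4)
# chunks == [(0, 3), (4, 7), (8, 9)]
```

Each pair can then be passed as start_page and end_page in a separate tool call.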
Related Resources
- MinerU Official Documentation: https://opendatalab.github.io/MinerU/
- MinerU GitHub: https://github.com/opendatalab/MinerU
- Online Demo: https://mineru.net/