Sh_Sci_Fig — Scientific Figure Extractor
Precisely extract figures and sub-figures from academic PDF papers.
Script Directory
Scripts live in the `scripts/` subdirectory. Replace `${SKILL_DIR}` with the path of the directory that contains this SKILL.md.
| Script | Purpose |
|---|---|
| `scripts/extract_figure.py` | Main CLI for figure extraction |
Preferences (EXTEND.md)
Use Bash to check whether EXTEND.md exists, in priority order (project-level first, then user-level):
```bash
# Check project-level first
test -f .baoyu-skills/Sh_Sci_Fig/EXTEND.md && echo "project"
# Then user-level (cross-platform: $HOME works on macOS/Linux/WSL)
test -f "$HOME/.baoyu-skills/Sh_Sci_Fig/EXTEND.md" && echo "user"
```
EXTEND.md supports three settings: default DPI, default output format, and Tesseract path.
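The same priority lookup can be expressed in Python, for example when a wrapper needs to know which EXTEND.md applies. This is a minimal sketch that mirrors the two `test -f` checks; `find_extend_md` is a hypothetical helper, not part of `extract_figure.py`, and it only resolves the path because the file's internal format is not specified here.

```python
from pathlib import Path
from typing import Optional

SKILL_NAME = "Sh_Sci_Fig"


def find_extend_md(project_root: Path, home: Optional[Path] = None) -> Optional[Path]:
    """Return the EXTEND.md that applies, checking project level before user level.

    Hypothetical helper that mirrors the shell checks; not part of the skill's scripts.
    """
    home = home if home is not None else Path.home()
    candidates = [
        project_root / ".baoyu-skills" / SKILL_NAME / "EXTEND.md",  # project-level (highest priority)
        home / ".baoyu-skills" / SKILL_NAME / "EXTEND.md",          # user-level fallback
    ]
    for path in candidates:
        if path.is_file():
            return path
    return None  # neither file exists; the caller keeps its own defaults
```

Passing `Path.cwd()` as `project_root` reproduces the shell behavior, since the project-level check above is relative to the current directory.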
Usage
```bash
python ${SKILL_DIR}/scripts/extract_figure.py <input.pdf> [options]
```
Options
| Option | Short | Description | Default |
|---|---|---|---|
| `<input>` | | PDF file path | Required |
| `--figure` | `-f` | Figure number (1, 2, 3...) | Required (except with `--list`/`--all`) |
| `--subfigure` | `-s` | Sub-figure label (a, b, c...) | None (returns whole figure) |
| `--output` | `-o` | Output directory | Current directory |
| `--dpi` | `-d` | Output resolution | 600 |
| `--list` | `-l` | List all available figure numbers | false |
| `--all` | | Extract all figures | false |
| `--format` | | Output format | png |
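One way the options table could map onto Python's `argparse` is sketched below. It assumes the script uses argparse (not confirmed here). `build_parser` and `parse_args` are hypothetical names, and `--figure` is parsed as an integer based on the table's "1, 2, 3..." description. The one cross-option rule in the table, that `--figure` is required unless `--list` or `--all` is given, cannot be expressed with `required=True`, so it is checked after parsing.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented interface, not the real script.
    parser = argparse.ArgumentParser(
        prog="extract_figure.py",
        description="Extract figures and sub-figures from academic PDF papers.",
    )
    parser.add_argument("input", help="PDF file path")
    parser.add_argument("-f", "--figure", type=int, help="Figure number (1, 2, 3...)")
    parser.add_argument("-s", "--subfigure", help="Sub-figure label (a, b, c...); omit for the whole figure")
    parser.add_argument("-o", "--output", default=".", help="Output directory")
    parser.add_argument("-d", "--dpi", type=int, default=600, help="Output resolution")
    parser.add_argument("-l", "--list", action="store_true", help="List all available figure numbers")
    parser.add_argument("--all", action="store_true", help="Extract all figures")
    parser.add_argument("--format", default="png", help="Output format")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Conditional requirement: --figure may be omitted only with --list or --all.
    if args.figure is None and not (args.list or args.all):
        parser.error("--figure is required unless --list or --all is given")  # exits with status 2
    return args
```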
Examples
```bash
# Extract Figure 2, sub-figure c
python ${SKILL_DIR}/scripts/extract_figure.py paper.pdf -f 2 -s c
# Extract entire Figure 3
python ${SKILL_DIR}/scripts/extract_figure.py paper.pdf -f 3
# List all available figures in a PDF
python ${SKILL_DIR}/scripts/extract_figure.py paper.pdf --list
# Extract all figures
python ${SKILL_DIR}/scripts/extract_figure.py paper.pdf --all
# Custom output directory and DPI
python ${SKILL_DIR}/scripts/extract_figure.py paper.pdf -f 2 -s c -o ./output/ -d 300
```
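To drive the CLI from Python, for instance to batch several extractions, you can assemble the command as an argument list and pass it to `subprocess.run`. This is a minimal sketch: `build_command` and `run_extraction` are hypothetical helpers, only the options from the Options table are used, and it assumes the script signals failure with a non-zero exit status.

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def build_command(skill_dir: str, pdf: str, figure: Optional[int] = None,
                  subfigure: Optional[str] = None, output: Optional[str] = None,
                  dpi: Optional[int] = None) -> List[str]:
    """Assemble argv for extract_figure.py (hypothetical helper)."""
    script = Path(skill_dir) / "scripts" / "extract_figure.py"  # ${SKILL_DIR} substituted
    cmd = [sys.executable, str(script), pdf]  # current interpreter instead of bare "python"
    if figure is not None:
        cmd += ["-f", str(figure)]
    if subfigure is not None:
        cmd += ["-s", subfigure]
    if output is not None:
        cmd += ["-o", output]
    if dpi is not None:
        cmd += ["-d", str(dpi)]
    return cmd


def run_extraction(cmd: List[str]) -> str:
    # Run the CLI and return its stdout; raises CalledProcessError on a non-zero exit.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, looping `run_extraction(build_command(skill_dir, "paper.pdf", figure=n))` over `n` in `(1, 2, 3)` extracts three whole figures one after another. Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.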
Sample output:

```text
Extracted: figure_2c.png (1920x1080, 600 DPI)
```
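The pixel dimensions follow directly from the DPI. PDF coordinates, as reported by pdfplumber and PyMuPDF, are in points (72 per inch), so a region `w` points wide renders to `w * dpi / 72` pixels. At 600 DPI, a 1920x1080 image would therefore correspond to a 230.4 x 129.6 pt region on the page. The sketch below shows only that arithmetic, and the function names are hypothetical. In PyMuPDF the same scale factor is what `fitz.Matrix(dpi / 72, dpi / 72)` applies when passed as `matrix=` to `Page.get_pixmap`.

```python
from typing import Tuple

POINTS_PER_INCH = 72  # PDF user-space unit: 1 pt = 1/72 inch


def rendered_size(width_pt: float, height_pt: float, dpi: int = 600) -> Tuple[int, int]:
    """Pixel size of a PDF region rendered at the given DPI."""
    scale = dpi / POINTS_PER_INCH
    return round(width_pt * scale), round(height_pt * scale)


def region_size_pt(width_px: int, height_px: int, dpi: int = 600) -> Tuple[float, float]:
    """Inverse: the PDF region (in points) that yields a given pixel size."""
    scale = dpi / POINTS_PER_INCH
    return width_px / scale, height_px / scale
```

Halving the DPI (for example `-d 300`) halves each pixel dimension of the same region, which cuts the file to roughly a quarter of the pixels.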
Error Handling
| Scenario | Behavior |
|---|---|
| Figure number not found | Error + list all available figure numbers |
| OCR recognition failed | Return entire figure region |
| Sub-figure split failed | Return entire figure region |
| No sub-figure labels found | Return entire figure region |
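All three sub-figure failure modes share one rule: fall back to the entire figure region. The sketch below illustrates that rule with a deliberately simple splitter. It is not the skill's method, which uses OpenCV contour analysis and Tesseract OCR. Instead it uses a column projection profile that looks for blank vertical gutters in a binarized figure. The names `split_columns` and `select_region` are hypothetical. The sketch also assumes panels sit side by side and are labeled left to right as a, b, c.

```python
from typing import List, Optional, Tuple

Region = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels; x1 and y1 are exclusive


def split_columns(mask: List[List[int]], min_gap: int = 2) -> List[Region]:
    """Split a binary mask (1 = ink) into panels separated by blank column runs.

    Projection-profile heuristic standing in for contour analysis. Real 600 DPI
    renders would need a much larger min_gap than this toy default.
    """
    height, width = len(mask), len(mask[0])
    ink = [any(row[x] for row in mask) for x in range(width)]  # does column x contain ink?
    panels: List[Region] = []
    start: Optional[int] = None
    gap = 0
    for x, has_ink in enumerate(ink):
        if has_ink:
            if start is None:
                start = x  # first inked column of a new panel
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:  # gutter is wide enough: close the current panel
                panels.append((start, 0, x - gap + 1, height))
                start, gap = None, 0
    if start is not None:  # panel running to the right edge (minus trailing blanks)
        panels.append((start, 0, width - gap, height))
    return panels


def select_region(mask: List[List[int]], label: Optional[str]) -> Region:
    """Return the requested sub-figure, or the whole figure if isolation fails."""
    whole = (0, 0, len(mask[0]), len(mask))
    if label is None:
        return whole  # no --subfigure given: whole figure
    if len(label) != 1 or not label.isalpha():
        return whole  # label form not handled by this sketch: fall back
    panels = split_columns(mask)
    if len(panels) < 2:
        return whole  # split failed: fall back to the entire region
    index = ord(label.lower()) - ord("a")  # assumes left-to-right a, b, c ordering
    if not 0 <= index < len(panels):
        return whole  # label not found among panels: fall back
    return panels[index]
```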
Tech Stack
| Library | Role |
|---|---|
| pdfplumber | Text + coordinate extraction (locate "Figure X" labels) |
| PyMuPDF (fitz) | PDF → high-quality image rendering (600 DPI) |
| opencv-python | Boundary detection, contour analysis |
| Pillow | Final cropping, format conversion |
| pytesseract | OCR for sub-figure label recognition |
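In this stack, pdfplumber's job is to find where a "Figure X" label sits on the page. The sketch below shows that lookup. It assumes the input is the word list from pdfplumber's `page.extract_words()`, where each word is a dict with `text`, `x0`, `top`, `x1`, and `bottom` keys, and the sample words are inlined so no PDF is needed. `find_figure_label`, `list_figure_numbers`, and the regexes are hypothetical. They only handle "Figure N" and "Fig. N" split across two words. They also match in-text references such as "see Figure 2", so a real implementation would need an extra check (for example, that the label starts a caption line).

```python
import re
from typing import Dict, List, Optional

LABEL_WORD = re.compile(r"^(?:Figure|Fig\.?)$", re.IGNORECASE)  # "Figure", "Fig", "Fig."
NUMBER_WORD = re.compile(r"^(\d+)[.:]?$")                       # "2", "2.", "2:"


def find_figure_label(words: List[Dict], figure: int) -> Optional[Dict]:
    """Bounding box of the first 'Figure <figure>' label, or None if absent."""
    for first, second in zip(words, words[1:]):
        m = NUMBER_WORD.match(second["text"])
        if LABEL_WORD.match(first["text"]) and m and int(m.group(1)) == figure:
            return {"x0": first["x0"], "top": min(first["top"], second["top"]),
                    "x1": second["x1"], "bottom": max(first["bottom"], second["bottom"])}
    return None


def list_figure_numbers(words: List[Dict]) -> List[int]:
    """All labeled figure numbers, de-duplicated and sorted (compare --list)."""
    found = set()
    for first, second in zip(words, words[1:]):
        m = NUMBER_WORD.match(second["text"])
        if LABEL_WORD.match(first["text"]) and m:
            found.add(int(m.group(1)))
    return sorted(found)
```

The returned box is in PDF points. Scaling it by `dpi / 72` locates the label in the rendered page image, where the figure region above or below it can then be cropped.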
Extension Support
Custom configuration is supported through EXTEND.md. See the Preferences section for file locations and supported options.