PDF Skill — Action Routing
CRITICAL: Decide Your Action FIRST
Before doing anything, classify the user's request and follow the MANDATORY action:
| User wants to... | MANDATORY action |
|---|---|
| Convert .md to .pdf / Generate PDF from markdown / 把 md 转成 pdf | MUST use bash to run md_to_pdf.py script (see below) |
| Create a new PDF from scratch | Use bash to run a Python script with reportlab |
| Read/extract text from a PDF | Use read_file or bash with pdfplumber |
| Merge/split/rotate/encrypt PDFs | Use bash with pypdf |
| Extract tables from a PDF | Use bash with pdfplumber |
| Fill a PDF form | Read FORMS.md first |
| OCR / read text from scanned PDF | Convert to images → python_repl with vision LLM (see OCR below) |
Markdown → PDF Conversion (MOST COMMON)
ALWAYS use the bash tool to run the built-in script. NEVER use read_file for this task.
bash: python <absolute_path_to_skill>/scripts/md_to_pdf.py <input.md> <output.pdf>
The script path will be listed under "Available Scripts" in the prompt. Use that absolute path directly.
Example:
bash: python /path/to/skills/pdf/scripts/md_to_pdf.py /workspace/report.md /workspace/report.pdf
Features: CJK support, headings, lists, code blocks, tables, bold/italic, blockquotes, horizontal rules.
If reportlab is not installed: bash: pip install reportlab
Reading/Extracting from Existing PDFs
Extract text
import pdfplumber
with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        print(page.extract_text())
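Before falling back to OCR, it can help to check whether text extraction produced anything usable. The sketch below is one possible heuristic, not part of the skill: `looks_scanned` and its `min_chars_per_page` threshold are hypothetical, and the check only catches empty or near-empty output, not garbled text.

```python
def looks_scanned(page_texts, min_chars_per_page=20):
    """Guess whether a PDF is image-based from its per-page extract_text() results.

    page_texts: list of str or None, one entry per page.
    min_chars_per_page: hypothetical threshold; tune it for your documents.
    """
    if not page_texts:
        # No pages or no results at all: treat as needing OCR.
        return True
    # Count non-whitespace-trimmed characters across all pages, treating None as "".
    total = sum(len((text or "").strip()) for text in page_texts)
    return total < min_chars_per_page * len(page_texts)
```

Typical use would be `looks_scanned([page.extract_text() for page in pdf.pages])` inside the `with pdfplumber.open(...)` block.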
Extract tables
import pdfplumber
with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        for table in page.extract_tables():
            for row in table:
                print(row)
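Printed rows are hard to reuse, so extracted tables are often saved as CSV. A minimal sketch using only the standard library, assuming each table is a list of rows whose cells are strings or `None` for empty cells; `table_to_csv` is a hypothetical helper name.

```python
import csv
import io


def table_to_csv(table):
    """Serialize one table (a list of rows, each a list of str or None) to CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in table:
        # Write None cells as empty strings so the column count stays stable.
        writer.writerow(["" if cell is None else cell for cell in row])
    return buffer.getvalue()
```

Call it once per table returned by `page.extract_tables()` and write the result to a `.csv` file.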
Merge / Split / Rotate
Merge
from pypdf import PdfWriter, PdfReader
writer = PdfWriter()
for f in ["a.pdf", "b.pdf"]:
    for page in PdfReader(f).pages:
        writer.add_page(page)
with open("merged.pdf", "wb") as out:
    writer.write(out)
Split
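from pypdf import PdfReader, PdfWriter
reader = PdfReader("input.pdf")
# Write each page to its own file: page_1.pdf, page_2.pdf, ...
for i, page in enumerate(reader.pages):
    w = PdfWriter()
    w.add_page(page)
    with open(f"page_{i+1}.pdf", "wb") as out:
        w.write(out)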
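The loop above writes one file per page. To extract a subset instead, a page-range string can be turned into zero-based indices for `reader.pages`. This is a sketch only: `parse_page_ranges` and the `"1-3,5"` syntax are hypothetical and not something pypdf provides.

```python
def parse_page_ranges(spec, page_count):
    """Convert a 1-based range string such as "1-3,5" into sorted zero-based indices."""
    indices = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start_text, sep, end_text = part.partition("-")
        start = int(start_text)
        # A bare number such as "5" is a one-page range.
        end = int(end_text) if sep else start
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page range {part!r} for a {page_count}-page document")
        indices.update(range(start - 1, end))
    return sorted(indices)
```

The result can drive the same writer pattern: add `reader.pages[i]` for each index `i` returned by `parse_page_ranges(spec, len(reader.pages))`.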
Rotate
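from pypdf import PdfReader, PdfWriter
reader = PdfReader("input.pdf")
writer = PdfWriter()
page = reader.pages[0]
page.rotate(90)  # clockwise; the angle must be a multiple of 90
writer.add_page(page)  # only the rotated first page is written to the output
with open("rotated.pdf", "wb") as out:
    writer.write(out)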
Password Protection
from pypdf import PdfReader, PdfWriter
reader = PdfReader("input.pdf")
writer = PdfWriter()
for page in reader.pages:
    writer.add_page(page)
writer.encrypt("userpassword", "ownerpassword")
with open("encrypted.pdf", "wb") as out:
    writer.write(out)
Create PDF from Scratch (reportlab)
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer
from reportlab.lib.styles import getSampleStyleSheet
doc = SimpleDocTemplate("output.pdf", pagesize=letter)
styles = getSampleStyleSheet()
story = [Paragraph("Title", styles['Title']), Spacer(1, 12), Paragraph("Body text", styles['Normal'])]
doc.build(story)
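IMPORTANT: Never use Unicode subscript/superscript characters (such as ₂ or ²) in ReportLab. The built-in fonts do not include these glyphs, so they render as solid black boxes. Use <sub> and <super> tags inside Paragraph text instead.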
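When source text already contains Unicode subscript or superscript digits, it can be converted to tag form before it is passed to `Paragraph`. A minimal sketch, assuming only the digits 0 to 9 need handling; `to_reportlab_markup` is a hypothetical helper, and it does not escape other markup-sensitive characters such as `&` or `<`.

```python
import re

# Unicode subscript/superscript digits mapped to their plain equivalents.
_SUB = dict(zip("₀₁₂₃₄₅₆₇₈₉", "0123456789"))
_SUPER = dict(zip("⁰¹²³⁴⁵⁶⁷⁸⁹", "0123456789"))


def _convert(text, table, tag):
    """Wrap each run of characters found in `table` in <tag>...</tag>."""
    pattern = re.compile("[" + "".join(table) + "]+")

    def repl(match):
        digits = "".join(table[ch] for ch in match.group())
        return f"<{tag}>{digits}</{tag}>"

    return pattern.sub(repl, text)


def to_reportlab_markup(text):
    """Replace Unicode sub/superscript digits with <sub>/<super> tags."""
    return _convert(_convert(text, _SUB, "sub"), _SUPER, "super")
```

For example, `Paragraph(to_reportlab_markup("H₂O"), styles['Normal'])` passes `H<sub>2</sub>O` to ReportLab.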
OCR Scanned PDFs (via Vision LLM)
When pdfplumber's page.extract_text() returns empty or garbled text, the PDF is likely scanned/image-based. Use python_repl with a vision-capable LLM to analyze the converted images:
Step 1: Convert all PDF pages to images:
bash: python <skill_path>/scripts/convert_pdf_to_images.py <input.pdf> <output_dir>
Step 2: Use python_repl to call the vision LLM for OCR on the images:
import base64, os, json
image_dir = "<output_dir>"
images = sorted([f for f in os.listdir(image_dir) if f.lower().endswith(('.png','.jpg','.jpeg'))])
results = []
for img_file in images:
    path = os.path.join(image_dir, img_file)
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    # Call your vision-capable LLM here (e.g. via API), passing b64 as the image payload.
    # Placeholder: replace the empty string with the text the LLM returns for this image.
    extracted_text = ""
    results.append({"file": img_file, "text": extracted_text})
print(json.dumps(results, ensure_ascii=False, indent=2))
Images are batched into minimal LLM calls automatically (up to 8 per call).
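Batching is handled for you, so the following is only an illustration of what page ordering and "up to 8 per call" imply. Plain `sorted()` is lexicographic, so `page_10.png` would sort before `page_2.png` unless the filenames are zero-padded. The naming pattern of the converted images is an assumption here, and `natural_key` and `batch_images` are hypothetical helpers.

```python
import re


def natural_key(filename):
    """Sort key that compares digit runs numerically, so page_2 sorts before page_10."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", filename)]


def batch_images(image_files, batch_size=8):
    """Split an ordered list of image filenames into groups of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [image_files[i:i + batch_size]
            for i in range(0, len(image_files), batch_size)]
```

With 19 page images, `batch_images(sorted(images, key=natural_key))` yields three groups of 8, 8, and 3 files.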
Quick Reference
| Task | Tool | Method |
|---|---|---|
| Markdown → PDF | bash + md_to_pdf.py | python scripts/md_to_pdf.py in.md out.pdf |
| Extract text | pdfplumber | page.extract_text() |
| Extract tables | pdfplumber | page.extract_tables() |
| Merge/split/rotate | pypdf | PdfReader + PdfWriter |
| Create from scratch | reportlab | SimpleDocTemplate |
| Fill forms | see FORMS.md | — |
| OCR scanned | python_repl + vision LLM | Convert pages to images → call vision LLM via python_repl |
References
- FORMS.md — PDF form filling
- REFERENCE.md — Advanced features, JS libraries, troubleshooting