PDF Processing Guide

Reference Files

Detailed guides for specific tasks and libraries:

  • python-libraries.md - Comprehensive Python library examples (pypdf, pdfplumber, reportlab)
  • cli-tools.md - Command-line tools reference (pdftotext, qpdf, pdftk)
  • reference.md - Advanced features (pypdfium2, pdf-lib JavaScript, OCR)
  • forms.md - Complete workflow for filling PDF forms

Quick Start

Extract Text

import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        print(page.extract_text())

Extract Tables

import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    for i, page in enumerate(pdf.pages, 1):
        # Each table is a list of rows, and each row is a list of cell values
        for table in page.extract_tables():
            print(f"Page {i}:", table)

Merge PDFs

from pypdf import PdfWriter, PdfReader

writer = PdfWriter()
for pdf_file in ["doc1.pdf", "doc2.pdf"]:
    reader = PdfReader(pdf_file)
    for page in reader.pages:
        writer.add_page(page)

with open("merged.pdf", "wb") as output:
    writer.write(output)

Split PDF into Pages

from pypdf import PdfWriter, PdfReader

reader = PdfReader("input.pdf")
for i, page in enumerate(reader.pages):
    writer = PdfWriter()
    writer.add_page(page)
    with open(f"page_{i+1}.pdf", "wb") as output:
        writer.write(output)

Create PDF

from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

c = canvas.Canvas("hello.pdf", pagesize=letter)
c.drawString(100, 750, "Hello World!")  # (x, y) in points, measured from the bottom-left corner
c.save()

Common Workflows

Fill Out a Form

PDF forms can be fillable (with form fields) or non-fillable (requiring manual positioning). For complete step-by-step instructions, see forms.md.

Extract and Analyze Data

Combine text extraction with JSON export for downstream processing:

import json

import pdfplumber

text_parts = []
tables = []
with pdfplumber.open("document.pdf") as pdf:
    for page_num, page in enumerate(pdf.pages, 1):
        # extract_text() may return None or "" for pages without a text layer
        text_parts.append(page.extract_text() or "")
        # Keep each table together with the page it came from
        for table in page.extract_tables():
            tables.append({"page": page_num, "data": table})

# Export the full text and the tables together as one JSON document
with open("output.json", "w", encoding="utf-8") as f:
    json.dump({"text": "\n".join(text_parts), "tables": tables}, f, indent=2)

Process Scanned PDFs (OCR)

Extract text from image-based PDFs using OCR. See reference.md for detailed OCR examples.

Add Password Protection

from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()

for page in reader.pages:
    writer.add_page(page)

writer.encrypt("password")

with open("encrypted.pdf", "wb") as output:
    writer.write(output)

Tool Selection Guide

Task                      Recommended Tool    See Also
Extract text/tables       pdfplumber          python-libraries.md
Merge/split/rotate        pypdf               python-libraries.md
Create PDFs               reportlab           python-libraries.md
Command-line operations   qpdf/pdftotext      cli-tools.md
Fill forms                pypdf/pdf-lib       forms.md
Scanned PDFs/OCR          pytesseract         reference.md
Advanced rendering        pypdfium2           reference.md
JavaScript context        pdf-lib             reference.md

Next Steps
