# PDF Processing
This skill provides tools and guidance for extracting content from PDF documents.

## Quick Start

Use pdfplumber to extract the text of the first page:

```python
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    text = pdf.pages[0].extract_text()
```

## Installation

Install the required dependency:

```bash
pip install pdfplumber
```
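
## Basic Text Extraction

To extract the text of every page and join it into a single string:

```python
import pdfplumber


def extract_text(pdf_path):
    """Extract all text from a PDF file."""
    text = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            page_text = page.extract_text()
            # Skip pages with no extractable text (e.g. scanned images)
            if page_text:
                text.append(page_text)
    # Separate the text of consecutive pages with a blank line
    return "\n\n".join(text)
```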
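
## Table Extraction

To collect the tables from every page into a single list:

```python
import pdfplumber


def extract_tables(pdf_path):
    """Extract all tables from a PDF file."""
    tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # Each table is a list of rows; each row is a list of cell values
            # (strings, or None where no cell was detected)
            page_tables = page.extract_tables()
            tables.extend(page_tables)
    return tables
```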
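
Each table returned by pdfplumber is a plain list of rows, so some post-processing is usually needed before the data is convenient to work with. The sketch below shows one possible approach. It is not part of pdfplumber: the helper name `table_to_records` is hypothetical, and it assumes the first row of each table holds the column headers.

```python
from typing import Dict, List, Optional

# Assumed shape of one table from extract_tables(): a list of rows,
# each row a list of cell strings or None.
Table = List[List[Optional[str]]]


def table_to_records(table: Table) -> List[Dict[str, str]]:
    """Turn one extracted table into a list of dicts keyed by the header row.

    Assumes row 0 is the header. None cells become empty strings.
    """
    if not table:
        return []
    header = [(cell or "").strip() for cell in table[0]]
    records = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        # Pad short rows so every header gets a value; zip() drops extras
        cells += [""] * (len(header) - len(cells))
        records.append(dict(zip(header, cells)))
    return records


# Example with hand-written data shaped like pdfplumber output
rows = [["Name", "Qty"], ["Apple", "3"], ["Pear", None]]
print(table_to_records(rows))
# → [{'Name': 'Apple', 'Qty': '3'}, {'Name': 'Pear', 'Qty': ''}]
```

To use it on a real file, you could call `table_to_records(table)` on each table returned by `extract_tables` above. If two header cells have the same text, the later column overwrites the earlier one in each dict. Cells beyond the header width are dropped.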

## Form Filling

For filling PDF forms, see `references/FORMS.md`.

## Advanced Table Extraction

For complex tables with merged cells, see `references/TABLES.md` and run `scripts/extract.py`.
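
The contents of `scripts/extract.py` are not shown here. The following is therefore an independent sketch of one common cleanup step and does not describe what that script does. In pdfplumber's table output, positions covered by a merged cell typically come back as `None`, while a detected but empty cell usually comes back as an empty string. Assuming that holds for your PDF, the hypothetical helper `fill_merged_cells` copies the merged value into those `None` slots. It takes the value either from the left neighbour (for horizontal merges) or from the cell above (for vertical merges).

```python
from typing import List, Optional

Row = List[Optional[str]]


def fill_merged_cells(table: List[Row], direction: str = "right") -> List[Row]:
    """Return a copy of `table` with None cells filled from a neighbour.

    direction="right": copy from the cell to the left (horizontal merges).
    direction="down":  copy from the cell above (vertical merges).
    Empty strings are left alone; only None is treated as a merged slot.
    """
    if direction not in ("right", "down"):
        raise ValueError(f"unknown direction: {direction!r}")
    filled = [list(row) for row in table]  # don't mutate the caller's table
    for r, row in enumerate(filled):
        for c, cell in enumerate(row):
            if cell is not None:
                continue
            if direction == "right" and c > 0:
                # Left neighbour is already filled, so wide merges propagate
                row[c] = row[c - 1]
            elif direction == "down" and r > 0 and c < len(filled[r - 1]):
                # Row above is already filled, so tall merges propagate
                row[c] = filled[r - 1][c]
    return filled


# Example: "Sales" spans two columns and "Region" spans two rows
raw = [
    ["Region", "Sales", None],
    [None, "Q1", "Q2"],
    ["North", "10", "12"],
]
print(fill_merged_cells(fill_merged_cells(raw, "right"), "down"))
# → [['Region', 'Sales', 'Sales'], ['Region', 'Q1', 'Q2'], ['North', '10', '12']]
```

A `None` in the first column cannot be filled by the "right" pass, and a `None` in the first row cannot be filled by the "down" pass. Running "right" before "down" is a judgment call. Spot-check the result against the original PDF, because a genuinely empty cell reported as `None` would also be filled.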

Repository: fredkschott/astro-skills