SKILL.md
PDF Processing Skill
When working with PDF files, follow these guidelines:
1. Reading & Extracting from PDFs
For text extraction:
# Extract all text
pdftotext input.pdf output.txt
# Extract specific pages
pdftotext -f 1 -l 10 input.pdf output.txt
# Preserve layout
pdftotext -layout input.pdf output.txt
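To get one text file per page, the commands above can be combined in a loop driven by the page count from pdfinfo. This is a minimal sketch for a POSIX shell. The function name pdf_pages_to_text and the page_NNN.txt naming are illustrative only and are not part of any tool.

```shell
# Hypothetical helper: write each page of PDF to its own text file,
# OUTDIR/page_001.txt, OUTDIR/page_002.txt, ...
pdf_pages_to_text() {
  local pdf="$1" outdir="$2" pages p
  # Read the page count from pdfinfo's "Pages:" line.
  pages=$(pdfinfo "$pdf" | awk '/^Pages:/ {print $2}')
  [ -n "$pages" ] || { echo "cannot read page count: $pdf" >&2; return 1; }
  mkdir -p -- "$outdir" || return 1
  p=1
  while [ "$p" -le "$pages" ]; do
    # -f/-l restrict pdftotext to a single page; -layout keeps columns aligned.
    pdftotext -layout -f "$p" -l "$p" "$pdf" "$(printf '%s/page_%03d.txt' "$outdir" "$p")" || return 1
    p=$((p + 1))
  done
}
```

Usage: pdf_pages_to_text input.pdf pages/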
For extracting images:
# Extract all images
pdfimages -all input.pdf output_prefix
# Extract as PNG (the images/ directory must already exist)
pdfimages -png input.pdf images/page
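pdfimages does not create the output directory itself, and pdfimages -list input.pdf prints a table of the embedded images without extracting anything. A small sketch of a wrapper that creates the directory first; extract_images and the img prefix are hypothetical names.

```shell
# Hypothetical wrapper: create OUTDIR, then extract every image as PNG;
# pdfimages numbers the files after the OUTDIR/img prefix.
extract_images() {
  local pdf="$1" outdir="$2"
  mkdir -p -- "$outdir" || return 1
  pdfimages -png "$pdf" "$outdir/img"
}
```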
For metadata:
# Get PDF info
pdfinfo document.pdf
# Get detailed metadata
exiftool document.pdf
2. Creating PDFs
From text/markdown:
# From markdown using pandoc (PDF output needs a LaTeX engine, pdflatex by default)
pandoc input.md -o output.pdf
# From plain text: enscript renders it to PostScript, ps2pdf converts that to PDF
enscript input.txt -o - | ps2pdf - output.pdf
From HTML:
# Using wkhtmltopdf
wkhtmltopdf input.html output.pdf
# With options
wkhtmltopdf --page-size A4 --margin-top 10mm input.html output.pdf
From images:
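# With ImageMagick (the command is magick in ImageMagick 7; some distributions' security policy blocks PDF output)
convert image1.png image2.png output.pdf
# With img2pdf (lossless: JPEG data is embedded without re-encoding)
img2pdf img1.jpg img2.jpg -o output.pdf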
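A shell glob such as *.png sorts lexically, so img10.png lands before img2.png. The sketch below passes a directory's images to img2pdf in natural order instead. images_to_pdf is a hypothetical helper; it assumes GNU sort (for -V) and images sitting directly inside one directory.

```shell
# Hypothetical helper: combine DIR/*.jpg|*.jpeg|*.png into OUT in natural
# order (img1, img2, ..., img10), using the positional parameters as a list.
images_to_pdf() {
  local dir="$1" out="$2" f
  set --
  while IFS= read -r f; do
    [ -n "$f" ] && set -- "$@" "$f"   # skip the blank line from an empty listing
  done <<EOF
$(find "$dir" -maxdepth 1 -type f \( -name '*.jpg' -o -name '*.jpeg' -o -name '*.png' \) | sort -V)
EOF
  [ "$#" -gt 0 ] || { echo "no images in $dir" >&2; return 1; }
  img2pdf "$@" -o "$out"
}
```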
3. Merging PDFs
# Merge multiple PDFs (using pdftk)
pdftk file1.pdf file2.pdf file3.pdf cat output merged.pdf
# Using ghostscript
gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=merged.pdf file1.pdf file2.pdf
# Using pdfunite
pdfunite file1.pdf file2.pdf output.pdf
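Running pdfunite *.pdf merged.pdf a second time merges the previous merged.pdf into the new one. A sketch of a hypothetical merge_dir helper that sorts inputs naturally (GNU sort -V assumed) and skips the output file. It compares paths literally, so pass the output as DIR/name.pdf.

```shell
# Hypothetical helper: merge every PDF in DIR into OUT with pdfunite,
# in natural order, never treating OUT itself as an input.
merge_dir() {
  local dir="$1" out="$2" f
  set --
  while IFS= read -r f; do
    # Skip blank lines and the output file itself.
    [ -n "$f" ] && [ "$f" != "$out" ] && set -- "$@" "$f"
  done <<EOF
$(find "$dir" -maxdepth 1 -type f -name '*.pdf' | sort -V)
EOF
  [ "$#" -ge 2 ] || { echo "need at least two PDFs in $dir" >&2; return 1; }
  pdfunite "$@" "$out"   # pdfunite takes the output as its last argument
}
```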
4. Splitting PDFs
# Split into individual pages
pdftk input.pdf burst output page_%02d.pdf
# Extract specific pages
pdftk input.pdf cat 1-5 output first-5-pages.pdf
# Extract page ranges
pdftk input.pdf cat 1-10 25-30 output selected.pdf
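To split into fixed-size chunks rather than single pages, the ranges can be computed from the page count and pdftk called once per chunk. chunk_ranges and split_every are hypothetical names; output goes to part_01.pdf, part_02.pdf, ... in the current directory, like the burst example above.

```shell
# Hypothetical helper: print pdftk page ranges that cut TOTAL pages into
# chunks of SIZE, e.g. 25 10 -> 1-10, 11-20, 21-25 (a lone page prints as "21").
chunk_ranges() {
  local total="$1" size="$2" start=1 end
  while [ "$start" -le "$total" ]; do
    end=$((start + size - 1))
    if [ "$end" -gt "$total" ]; then end=$total; fi
    if [ "$start" -eq "$end" ]; then
      echo "$start"
    else
      echo "$start-$end"
    fi
    start=$((end + 1))
  done
}

# Hypothetical helper: split PDF into SIZE-page parts with pdftk.
split_every() {
  local pdf="$1" size="$2" total range n=0
  total=$(pdfinfo "$pdf" | awk '/^Pages:/ {print $2}')
  [ -n "$total" ] || { echo "cannot read page count: $pdf" >&2; return 1; }
  for range in $(chunk_ranges "$total" "$size"); do
    n=$((n + 1))
    pdftk "$pdf" cat "$range" output "$(printf 'part_%02d.pdf' "$n")" || return 1
  done
}
```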
5. Converting PDFs
PDF to Images:
# To PNG (high quality)
pdftoppm -png -r 300 input.pdf output
# To JPG
pdftoppm -jpeg -r 150 input.pdf output
# Specific pages
pdftoppm -png -f 1 -l 5 input.pdf output
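Without -singlefile, pdftoppm appends a page number to the output prefix (and may zero-pad it depending on the page count), so a script cannot always predict the file name. A sketch of a hypothetical render_page helper that writes exactly PREFIX.png:

```shell
# Hypothetical helper: render page PAGE of PDF at DPI into PREFIX.png.
# -f/-l select the page; -singlefile suppresses the page-number suffix.
render_page() {
  local pdf="$1" page="$2" dpi="$3" prefix="$4"
  pdftoppm -png -r "$dpi" -f "$page" -l "$page" -singlefile "$pdf" "$prefix"
}
```

Usage: render_page input.pdf 3 150 page3 should produce page3.png.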
PDF to DOCX:
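# Using libreoffice (the writer_pdf_import filter opens the PDF in Writer; complex layouts may not survive)
libreoffice --headless --infilter="writer_pdf_import" --convert-to docx input.pdf
# pandoc cannot read PDF input, so extract the text first (layout and formatting are lost)
pdftotext input.pdf input.txt
pandoc input.txt -f markdown -o output.docx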
PDF to Text:
# Simple conversion
pdftotext input.pdf output.txt
# Maintain layout
pdftotext -layout input.pdf output.txt
6. PDF Analysis & Information
Get page count:
pdfinfo document.pdf | grep "Pages:" | awk '{print $2}'
Check PDF version:
pdfinfo document.pdf | grep "PDF version"
Analyze structure:
# Show the document outline (bookmarks)
mutool show input.pdf outline
# List fonts and whether each is embedded
pdffonts input.pdf
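When a script needs a single value from pdfinfo, matching the key at the start of the line is more robust than grepping anywhere in the output (a title containing "Pages:" would fool a plain grep). A sketch; pdf_field is a hypothetical name, and it works for any key pdfinfo prints, such as Pages, PDF version or Encrypted.

```shell
# Hypothetical helper: print the value of one "Key: value" line from pdfinfo.
pdf_field() {
  local pdf="$1" key="$2"
  pdfinfo "$pdf" | awk -v k="$key" '
    index($0, k ":") == 1 {
      sub(/^[^:]*:[ \t]*/, "")   # drop "Key:" and the padding after it
      print
      exit
    }'
}
```

Usage: pdf_field document.pdf Pages, or pdf_field document.pdf "PDF version".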
7. PDF Optimization
Compress PDF:
# Using ghostscript (screen quality - smallest)
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen \
-dNOPAUSE -dQUIET -dBATCH -sOutputFile=compressed.pdf input.pdf
# Using ghostscript (ebook quality - medium)
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/ebook \
-dNOPAUSE -dQUIET -dBATCH -sOutputFile=compressed.pdf input.pdf
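Ghostscript can occasionally produce a file larger than its input, for example when the source is already well compressed. A sketch of a hypothetical compress_pdf wrapper that keeps whichever file is smaller; the ebook default and the DST.tmp scratch name are assumptions.

```shell
# Hypothetical wrapper: compress SRC into DST with Ghostscript and keep the
# smaller of the two. PRESET is a -dPDFSETTINGS name (screen, ebook, printer, prepress).
compress_pdf() {
  local src="$1" dst="$2" preset="${3:-ebook}" scratch="$2.tmp"
  gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/"$preset" \
     -dNOPAUSE -dQUIET -dBATCH -sOutputFile="$scratch" "$src" \
    || { rm -f -- "$scratch"; return 1; }
  if [ "$(wc -c < "$scratch")" -lt "$(wc -c < "$src")" ]; then
    mv -- "$scratch" "$dst"      # compressed copy is smaller: keep it
  else
    rm -f -- "$scratch"
    cp -- "$src" "$dst"          # no gain: keep the original bytes
  fi
}
```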
Remove password:
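# If you know the password
pdftk secured.pdf input_pw PASSWORD output unsecured.pdf
# Or let pdftk prompt for it, keeping it out of shell history and the process list
pdftk secured.pdf input_pw PROMPT output unsecured.pdf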
8. Common Workflows
Extract tables from PDF
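# Using tabula-py (a Python library, not a command; it needs Java)
python3 -c 'import tabula; tabula.convert_into("input.pdf", "output.csv", output_format="csv", pages="all")'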
# Or use pdfplumber for complex tables
Add watermark
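# Overlay the first page of watermark.pdf on every page (use background instead of stamp to put it underneath)
pdftk input.pdf stamp watermark.pdf output watermarked.pdf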
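A sketch of a hypothetical watermark wrapper that switches between pdftk's stamp operation (watermark on top) and its background operation (watermark underneath), rejecting anything else:

```shell
# Hypothetical wrapper: apply MARK to SRC, writing DST.
# MODE is "stamp" (default, on top) or "background" (underneath).
watermark() {
  local src="$1" mark="$2" dst="$3" mode="${4:-stamp}"
  case "$mode" in
    stamp|background) ;;
    *) echo "mode must be stamp or background" >&2; return 1 ;;
  esac
  pdftk "$src" "$mode" "$mark" output "$dst"
}
```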
Rotate pages
# Rotate all pages 90 degrees clockwise
pdftk input.pdf cat 1-endright output rotated.pdf
# Rotate specific pages
pdftk input.pdf cat 1-5 6right 7-end output rotated.pdf
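Rotating a single page means listing every other page around it, as in the command above. A sketch of a hypothetical rotate_spec helper that builds that page list for page N of a TOTAL-page document (right rotates 90 degrees clockwise relative to the current orientation):

```shell
# Hypothetical helper: print the pdftk "cat" spec that rotates only page N
# of a TOTAL-page document and keeps every other page unchanged.
rotate_spec() {
  local n="$1" total="$2" spec=""
  # Pages before N.
  if [ "$n" -eq 2 ]; then spec="1 "
  elif [ "$n" -gt 2 ]; then spec="1-$((n - 1)) "
  fi
  # Page N itself, rotated.
  spec="${spec}${n}right"
  # Pages after N.
  if [ "$n" -eq $((total - 1)) ]; then spec="$spec $total"
  elif [ "$n" -lt $((total - 1)) ]; then spec="$spec $((n + 1))-end"
  fi
  echo "$spec"
}
```

Usage: pdftk input.pdf cat $(rotate_spec 6 10) output rotated.pdf, with the substitution left unquoted so it splits into separate page arguments.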
Tools Required
Make sure these tools are installed:
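- poppler-utils (pdftotext, pdfinfo, pdftoppm, pdfimages, pdffonts, pdfunite)
- pdftk or pdftk-java
- ghostscript (gs)
- imagemagick (convert)
- pandoc (for conversions; PDF output also needs a LaTeX engine)
- img2pdf (for image to PDF)
- exiftool (for metadata)
- Optional, for the commands that use them: mupdf-tools (mutool), wkhtmltopdf, enscript, libreoffice, tabula-py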
Install on Ubuntu/Debian:
sudo apt-get install poppler-utils pdftk ghostscript imagemagick pandoc img2pdf libimage-exiftool-perl mupdf-tools
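Before running any of the commands, a script can check which tools are missing with the POSIX command -v builtin. missing_tools is a hypothetical name; it prints each missing command and returns failure if any are absent.

```shell
# Hypothetical helper: print every command in the argument list that is not
# on PATH; return 1 if at least one is missing.
missing_tools() {
  local t missing=0
  for t in "$@"; do
    if ! command -v "$t" >/dev/null 2>&1; then
      echo "$t"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: missing_tools pdftotext pdfinfo pdftoppm pdfunite pdftk gs img2pdf exiftool pandoc || echo "install the tools listed above"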
Security Notes
- ✅ Always validate PDF file paths before processing
- ✅ Check file sizes to prevent resource exhaustion
- ✅ Sanitize output filenames
- ✅ Avoid putting PDF passwords on the command line (pdftk accepts input_pw PROMPT) so they stay out of shell history and the process list
- ✅ Scan PDFs for malicious content if from untrusted sources
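The path, size and filename checks above can be sketched as two shell functions. Both are hypothetical helpers: the 50 MB default limit is an arbitrary assumption, and requiring %PDF- at byte 0 is stricter than many PDF readers, which tolerate leading junk.

```shell
# Hypothetical check: succeed only for a readable regular file that starts
# with the %PDF- magic bytes and is at most MAX_BYTES (default 50 MB).
check_pdf() {
  local f="$1" max="${2:-52428800}"
  [ -f "$f" ] && [ -r "$f" ] || { echo "not a readable file: $f" >&2; return 1; }
  [ "$(head -c 5 -- "$f")" = "%PDF-" ] || { echo "no PDF header: $f" >&2; return 1; }
  [ "$(wc -c < "$f")" -le "$max" ] || { echo "too large: $f" >&2; return 1; }
}

# Hypothetical sanitizer: keep only the basename and replace every character
# other than letters, digits, dot, underscore and dash with "_".
safe_name() {
  basename -- "$1" | tr -c 'A-Za-z0-9._\n-' '_'
}
```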
When to Use This Skill
Use /pdf when the user:
- Wants to read or extract text from a PDF
- Needs to create a PDF from other formats
- Wants to merge or split PDFs
- Needs to convert PDFs to images or other formats
- Asks to analyze PDF structure or metadata
- Wants to compress or optimize PDFs
Always confirm destructive operations before executing.
Repository
thechandanbhaga…e-skills
First Seen
Mar 1, 2026