PDF Processing Pro
Production-ready PDF processing toolkit with pre-built scripts, comprehensive error handling, and support for complex workflows.
Quick start
Extract text from PDF
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    text = pdf.pages[0].extract_text()
    print(text)
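The snippet above reads only the first page. A minimal sketch of extending it to every page follows; the helper names `join_page_text` and `extract_all_text` are illustrative assumptions and are not part of the toolkit. Pages without a text layer (for example, scanned pages) may yield `None` or an empty string from `extract_text()`, so the sketch normalises both.

```python
from typing import Any, Iterable


def join_page_text(pages: Iterable[Any], separator: str = "\n\n") -> str:
    """Join the text of every page, treating empty pages as empty strings.

    `pages` is any iterable of objects with a pdfplumber-style
    extract_text() method. Hypothetical helper, not part of the toolkit.
    """
    # extract_text() may return None or "" when a page has no text layer.
    return separator.join((page.extract_text() or "") for page in pages)


def extract_all_text(path: str) -> str:
    """Open a PDF with pdfplumber and return the text of all pages."""
    import pdfplumber  # imported lazily so join_page_text has no dependency

    with pdfplumber.open(path) as pdf:
        return join_page_text(pdf.pages)
```

Calling `extract_all_text("document.pdf")` would return one string with pages separated by blank lines.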
Analyse PDF form (using included script)
python scripts/analyze_form.py input.pdf --output fields.json
# Returns: JSON with all form fields, types, and positions
Fill PDF form with validation
python scripts/fill_form.py input.pdf data.json output.pdf
# Validates all fields before filling, includes error reporting
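The internals of fill_form.py are not shown here. As a rough illustration of what validating form data before filling could look like, the sketch below checks a data dictionary against a schema. The schema format, the type names, and the function `validate_form_data` are all assumptions made for this example, not the toolkit's actual format or API.

```python
from typing import Any

# Hypothetical schema format (an assumption, not the toolkit's real one):
# {"field_name": {"type": "string" | "boolean" | "number", "required": bool}}
_TYPE_MAP = {"string": str, "boolean": bool, "number": (int, float)}


def validate_form_data(data: dict[str, Any], schema: dict[str, dict[str, Any]]) -> list[str]:
    """Return human-readable errors; an empty list means the data is valid."""
    errors: list[str] = []
    for name, rules in schema.items():
        if name not in data:
            if rules.get("required", False):
                errors.append(f"missing required field: {name}")
            continue
        kind = rules.get("type", "string")
        expected = _TYPE_MAP.get(kind)
        if expected is None:
            errors.append(f"{name}: unknown type in schema: {kind}")
            continue
        value = data[name]
        # bool is a subclass of int, so a number field must reject True/False.
        is_bool_as_number = kind == "number" and isinstance(value, bool)
        if is_bool_as_number or not isinstance(value, expected):
            errors.append(f"{name}: expected {kind}, got {type(value).__name__}")
    # Fields that the schema does not know about are reported as well.
    for name in data:
        if name not in schema:
            errors.append(f"unknown field: {name}")
    return errors
```

Collecting every error instead of stopping at the first one lets a caller report all problems in a single run.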
Extract tables from PDF
python scripts/extract_tables.py report.pdf --output tables.csv
# Extracts all tables with automatic column detection
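For comparison, table extraction can also be done directly with pdfplumber. The sketch below is not the implementation of extract_tables.py; the function names `tables_to_csv` and `extract_tables` are illustrative assumptions. It relies on `page.extract_tables()`, which returns each table as a list of rows, with empty cells reported as `None`.

```python
import csv
import io
from typing import Optional, Sequence

Table = Sequence[Sequence[Optional[str]]]


def tables_to_csv(tables: Sequence[Table]) -> str:
    """Serialise tables into one CSV string, separated by blank lines."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for index, table in enumerate(tables):
        if index > 0:
            writer.writerow([])  # blank line between consecutive tables
        for row in table:
            # Empty cells arrive as None; write them as empty strings.
            writer.writerow(["" if cell is None else cell for cell in row])
    return buffer.getvalue()


def extract_tables(path: str) -> list:
    """Collect the tables from every page of a PDF using pdfplumber."""
    import pdfplumber  # imported lazily so tables_to_csv has no dependency

    tables = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            tables.extend(page.extract_tables())
    return tables
```

Writing `tables_to_csv(extract_tables("report.pdf"))` to a file would give a single CSV containing every detected table.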
Features
Production-ready scripts
- Error handling with detailed messages and proper exit codes
- Input validation, type checking, and configurable logging
- Full type annotations and CLI interface (--help on all scripts)
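The properties listed above (input validation, configurable logging, exit codes, a --help flag) can be sketched with the standard library alone. The skeleton below is an assumption about how such a script might be organised, not the code of any included script; the exit code values and the logger name are hypothetical.

```python
import argparse
import logging
from pathlib import Path
from typing import Optional, Sequence

EXIT_OK = 0
EXIT_BAD_INPUT = 2  # hypothetical code for invalid input

log = logging.getLogger("pdf_tool")  # hypothetical logger name


def build_parser() -> argparse.ArgumentParser:
    """argparse adds --help automatically."""
    parser = argparse.ArgumentParser(description="Example PDF tool skeleton")
    parser.add_argument("input", type=Path, help="path to the input PDF")
    parser.add_argument("--verbose", action="store_true", help="enable debug logging")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Validate the input path and return a process exit code."""
    args = build_parser().parse_args(argv)
    logging.basicConfig(level=logging.DEBUG if args.verbose else logging.INFO)
    if args.input.suffix.lower() != ".pdf":
        log.error("expected a .pdf file, got %s", args.input)
        return EXIT_BAD_INPUT
    if not args.input.is_file():
        log.error("input file not found: %s", args.input)
        return EXIT_BAD_INPUT
    log.info("would process %s", args.input)  # real work would go here
    return EXIT_OK


# In a script, the entry point would be: sys.exit(main())
```

Returning the exit code from `main` instead of calling `sys.exit` inside it keeps the function easy to call from tests and from other scripts.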
Comprehensive workflows
- PDF forms, table extraction, OCR processing
- Batch operations, pre/post-processing validation
Advanced topics
PDF form processing
Complete form workflows including field analysis, dynamic filling, validation rules, multi-page forms, and checkbox/radio handling. See references/forms.md.
Table extraction
Complex table extraction including multi-page tables, merged cells, nested tables, custom detection, and CSV/Excel export. See references/tables.md.
OCR processing
Scanned PDFs and image-based documents including Tesseract integration, language support, image preprocessing, and confidence scoring. See references/ocr.md.
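As a rough sketch of OCR with confidence scoring, the example below renders one page with pdfplumber and passes it to Tesseract through pytesseract. It is not taken from references/ocr.md; the function names `mean_confidence` and `ocr_page`, the 300 DPI resolution, and the averaging scheme are assumptions. It requires the Tesseract binary to be installed.

```python
from typing import Any, Mapping


def mean_confidence(ocr_data: Mapping[str, Any]) -> float:
    """Average word confidence (0-100) from Tesseract word-level output.

    Expects the dictionary shape returned by pytesseract.image_to_data
    with output_type=Output.DICT, i.e. parallel "text" and "conf" lists.
    """
    scores = []
    for text, conf in zip(ocr_data["text"], ocr_data["conf"]):
        score = float(conf)
        # Tesseract reports -1 for non-word entries such as blocks and lines.
        if score >= 0 and str(text).strip():
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0


def ocr_page(path: str, page_number: int = 0, lang: str = "eng") -> tuple[str, float]:
    """Return (recognised text, mean confidence) for one page of a PDF."""
    import pdfplumber  # imported lazily so mean_confidence has no dependency
    import pytesseract

    with pdfplumber.open(path) as pdf:
        # .original is the rendered page as a PIL image.
        image = pdf.pages[page_number].to_image(resolution=300).original
        data = pytesseract.image_to_data(
            image, lang=lang, output_type=pytesseract.Output.DICT
        )
    words = [word for word in data["text"] if word.strip()]
    return " ".join(words), mean_confidence(data)
```

A low mean confidence can be used as a signal that a page needs image preprocessing or manual review.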
Included scripts
| Script | Purpose | Usage |
|---|---|---|
| analyze_form.py | Extract form field info | python scripts/analyze_form.py input.pdf [--output fields.json] [--verbose] |
| fill_form.py | Fill PDF forms with data | python scripts/fill_form.py input.pdf data.json output.pdf [--validate] |
| validate_form.py | Validate form data before filling | python scripts/validate_form.py data.json schema.json |
| extract_tables.py | Extract tables to CSV/Excel | python scripts/extract_tables.py input.pdf [--output tables.csv] [--format csv\|excel] |
| extract_text.py | Extract text with formatting | python scripts/extract_text.py input.pdf [--output text.txt] [--preserve-formatting] |
| merge_pdfs.py | Merge multiple PDFs | python scripts/merge_pdfs.py file1.pdf file2.pdf --output merged.pdf |
| split_pdf.py | Split PDF into pages | python scripts/split_pdf.py input.pdf --output-dir pages/ |
| validate_pdf.py | Validate PDF integrity | python scripts/validate_pdf.py input.pdf |
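What validate_pdf.py checks is not described here. As a cheap first-pass illustration, the sketch below tests only that a file starts with the `%PDF-` header and has an `%%EOF` marker near its end. The function name `looks_like_pdf` and the 1024-byte tail window are assumptions, and passing this check does not prove that a file is a valid PDF.

```python
import os


def looks_like_pdf(path: str, tail_bytes: int = 1024) -> bool:
    """Return True if the file has a PDF header and an end-of-file marker.

    Hypothetical helper: a quick sanity check, not a full integrity test.
    """
    with open(path, "rb") as handle:
        header = handle.read(5)
        # Read only the last tail_bytes of the file to look for %%EOF.
        handle.seek(0, os.SEEK_END)
        size = handle.tell()
        handle.seek(max(0, size - tail_bytes))
        tail = handle.read()
    return header == b"%PDF-" and b"%%EOF" in tail
```

A check like this is useful for rejecting truncated downloads or mislabelled files before running a slower parser on them.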
Dependencies
All scripts require:
pip install pdfplumber pypdf pillow pytesseract pandas
Optional for OCR (pytesseract needs the Tesseract binary installed):
# macOS: brew install tesseract
# Ubuntu: apt-get install tesseract-ocr
# Windows: Download from GitHub releases
References
| File | Contents |
|---|---|
| references/forms.md | Complete form processing guide |
| references/tables.md | Advanced table extraction |
| references/ocr.md | Scanned PDF processing |
| references/workflows.md | Common workflows, error handling, performance tips, best practices |
| references/troubleshooting.md | Troubleshooting common issues and getting help |