PDF Document Processor Skill
🎯 PURPOSE
Official Anthropic skill for PDF document analysis and extraction. Supports:
- Text Extraction: Pull all text from PDFs (including scanned documents with OCR)
- Table Detection: Parse structured tables and convert to JSON/CSV
- Invoice Processing: Extract invoice data (amounts, dates, line items)
- Form Parsing: Extract form field values
- Image Extraction: Extract embedded images from PDFs
🔍 WHEN TO USE THIS SKILL
Trigger when users mention:
- "extract PDF", "read PDF", "parse document"
- "invoice processing", "extract invoice data"
- "OCR", "scan document", "digitize"
- "table extraction", "form data"
📋 CAPABILITIES
1. Text Extraction
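```javascript
// extractPDF is this skill's entry point; each option toggles one extraction stage.
const result = extractPDF({
  file: "invoice.pdf",
  options: {
    ocr: true,            // Enable OCR for scanned docs
    preserveLayout: true, // Keep the page layout in the extracted text
    extractTables: true,  // Populate result.tables
    extractImages: false  // Skip embedded images
  }
});

// Example shape of `result`:
// {
//   text: "Invoice #12345...",
//   tables: [...],
//   metadata: { pages: 3, author: "...", created: "..." }
// }
```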
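The skill says detected tables can be converted to JSON or CSV, but it does not document the structure of `tables`. The sketch below assumes each table is an array of rows and each row is an array of cell values, and serializes one table to CSV with RFC 4180-style quoting. `tableToCsv` and `escapeCsvCell` are hypothetical helpers, not part of the skill's interface.

```javascript
// Hypothetical helper: serialize one table (array of rows of cell values) to CSV.
// Assumes a table shape like [["Description", "Amount"], ["Office Supplies", "450000"], ...].
function escapeCsvCell(value) {
  const text = String(value ?? "");
  // Quote cells containing commas, quotes, or line breaks; double embedded quotes.
  if (/[",\r\n]/.test(text)) {
    return '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}

function tableToCsv(rows) {
  // RFC 4180 uses CRLF as the record separator.
  return rows.map((row) => row.map(escapeCsvCell).join(",")).join("\r\n");
}

const table = [
  ["Description", "Amount (XAF)"],
  ["Office Supplies", "450,000"],
  ["IT Equipment", "1,200,000"],
];
console.log(tableToCsv(table));
```

Amounts formatted with thousands separators contain commas, so those cells come out quoted, which is why the escaping step matters for invoice tables.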
2. Invoice Processing
Automatically extract:
- Invoice number
- Invoice date
- Due date
- Vendor/supplier info
- Line items (description, quantity, price)
- Subtotal, tax, total
- Payment terms
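The skill does not describe how these fields are located in the extracted text. One simple approach, assuming the text contains `Label: value` lines like those in this skill's integration example, is a label-to-field lookup. The `FIELD_LABELS` table and `extractInvoiceFields` function are hypothetical; real invoices vary widely in layout and usually need more robust matching.

```javascript
// Hypothetical label-based field extractor for text already pulled from a PDF.
// Assumes each field appears on its own line as "Label: value".
const FIELD_LABELS = {
  "invoice #": "invoiceNumber",
  "date": "invoiceDate",
  "due date": "dueDate",
  "payment terms": "paymentTerms",
  "tax id": "taxId",
};

function extractInvoiceFields(text) {
  const fields = {};
  for (const line of text.split(/\r?\n/)) {
    // Split on the first colon only, so values containing ":" stay intact.
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const label = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    const field = FIELD_LABELS[label];
    // Keep the first occurrence of each field and skip empty values.
    if (field && value && !(field in fields)) {
      fields[field] = value;
    }
  }
  return fields;
}

const sample = [
  "Invoice #: INV-2025-1024",
  "Date: October 24, 2025",
  "Due Date: November 24, 2025",
  "Tax ID: CM123456789",
  "Payment Terms: Net 30",
].join("\n");
console.log(extractInvoiceFields(sample));
```

Lines whose label is not in the table (for example `SUBTOTAL:`) are ignored, so amounts would need a separate, number-aware parser.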
3. OCR Support
- Handles scanned documents
- Multiple languages (e.g., English and French)
- Handwriting recognition (limited)
- Image quality enhancement
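Tesseract selects recognition languages by trained-data code and accepts several at once when they are joined with `+` (for example `-l eng+fra` on the command line). A minimal sketch that maps the language names used in this skill to those codes, assuming the matching traineddata files are installed; `toTesseractLangs` is a hypothetical helper.

```javascript
// Hypothetical helper: map language names to Tesseract traineddata codes.
// Assumes eng/fra/deu/spa traineddata files are installed alongside Tesseract.
const TESSERACT_CODES = {
  english: "eng",
  french: "fra",
  german: "deu",
  spanish: "spa",
};

function toTesseractLangs(names) {
  const codes = names.map((name) => {
    const code = TESSERACT_CODES[name.trim().toLowerCase()];
    if (!code) throw new Error(`Unsupported OCR language: ${name}`);
    return code;
  });
  // Remove duplicates while keeping order; Tesseract joins languages with "+".
  return [...new Set(codes)].join("+");
}

console.log(toTesseractLangs(["English", "French"])); // eng+fra
```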
💡 INTEGRATION EXAMPLE
User: "Extract data from this invoice PDF"
AI Response:
I've analyzed the invoice PDF. Here's what I found:
INVOICE DETAILS:
Invoice #: INV-2025-1024
Date: October 24, 2025
Due Date: November 24, 2025
VENDOR:
ABC Supplies Ltd
Douala, Cameroon
Tax ID: CM123456789
LINE ITEMS:
1. Office Supplies - 450,000 XAF
2. IT Equipment - 1,200,000 XAF
3. Furniture - 800,000 XAF
SUBTOTAL: 2,450,000 XAF
VAT (18%): 441,000 XAF
TOTAL: 2,891,000 XAF
Payment Terms: Net 30
Bank Details: [Extracted]
Would you like me to:
1. Save this to the accounting system?
2. Create a payment transaction?
3. Export to Excel?
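Before offering to save or pay an invoice, it helps to confirm that the extracted figures are internally consistent. A minimal sketch using the sample's amounts, assuming whole-franc XAF amounts (as in the sample) and a single VAT rate applied to the subtotal; `checkInvoiceTotals` and its input shape are hypothetical.

```javascript
// Hypothetical consistency check for extracted invoice figures.
// Assumes integer amounts in XAF and one VAT rate (in percent) applied to the subtotal.
function checkInvoiceTotals({ lineItems, vatRatePercent, subtotal, vat, total }) {
  const expectedSubtotal = lineItems.reduce((sum, item) => sum + item.amount, 0);
  // Integer arithmetic first, rounding only once at the end.
  const expectedVat = Math.round((expectedSubtotal * vatRatePercent) / 100);
  const expectedTotal = expectedSubtotal + expectedVat;
  const problems = [];
  if (subtotal !== expectedSubtotal) problems.push(`subtotal: expected ${expectedSubtotal}, got ${subtotal}`);
  if (vat !== expectedVat) problems.push(`VAT: expected ${expectedVat}, got ${vat}`);
  if (total !== expectedTotal) problems.push(`total: expected ${expectedTotal}, got ${total}`);
  return { ok: problems.length === 0, problems };
}

const extracted = {
  lineItems: [
    { description: "Office Supplies", amount: 450000 },
    { description: "IT Equipment", amount: 1200000 },
    { description: "Furniture", amount: 800000 },
  ],
  vatRatePercent: 18,
  subtotal: 2450000,
  vat: 441000,
  total: 2891000,
};
console.log(checkInvoiceTotals(extracted)); // { ok: true, problems: [] }
```

For the sample, 450,000 + 1,200,000 + 800,000 = 2,450,000 XAF, 18% of that is 441,000 XAF, and the sum is 2,891,000 XAF, so every check passes.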
🔧 TECHNICAL DETAILS
- Libraries: pdf-parse (Node.js), pdfplumber (Python)
- OCR Engine: Tesseract OCR
- Supported Formats: .pdf (including PDF/A)
- Max File Size: 50 MB
- Languages: English, French, German, Spanish
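The 50 MB limit and PDF-only input can be enforced before any parsing work starts. A minimal Node.js sketch, assuming "50 MB" means 50 × 1024 × 1024 bytes and treating a file as a PDF when its first bytes are the `%PDF-` header; `validatePdfFile` is a hypothetical helper, and passing the header check does not guarantee the file is well-formed.

```javascript
// Hypothetical pre-flight check run before handing a file to the PDF parser.
const fs = require("fs");

const MAX_BYTES = 50 * 1024 * 1024; // Assumes "50 MB" means 50 MiB.
const PDF_HEADER = Buffer.from("%PDF-", "latin1");

function validatePdfFile(path) {
  const { size } = fs.statSync(path);
  if (size > MAX_BYTES) {
    return { ok: false, reason: `file is ${size} bytes; limit is ${MAX_BYTES}` };
  }
  // Read only the first few bytes to check the PDF signature.
  const fd = fs.openSync(path, "r");
  try {
    const head = Buffer.alloc(PDF_HEADER.length);
    const bytesRead = fs.readSync(fd, head, 0, head.length, 0);
    if (bytesRead < head.length || !head.equals(PDF_HEADER)) {
      return { ok: false, reason: "missing %PDF- header" };
    }
  } finally {
    fs.closeSync(fd);
  }
  return { ok: true };
}
```

A caller would reject the upload whenever `ok` is false and surface `reason` to the user.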
📚 DOCUMENTATION
Official Anthropic Documentation: https://docs.anthropic.com/en/docs/build-with-claude/agent-skills
Version: 1.0.0
Source: Anthropic Official Skills Library
Last Updated: October 24, 2025