document-processor-pdf

PDF Document Processor Skill

🎯 PURPOSE

Official Anthropic skill for PDF document analysis and extraction. Supports:

  • Text Extraction: Pull all text from PDFs (including scanned documents with OCR)
  • Table Detection: Parse structured tables and convert to JSON/CSV
  • Invoice Processing: Extract invoice data (amounts, dates, line items)
  • Form Parsing: Extract form field values
  • Image Extraction: Extract embedded images from PDFs

🔍 WHEN TO USE THIS SKILL

Trigger when users mention:

  • "extract PDF", "read PDF", "parse document"
  • "invoice processing", "extract invoice data"
  • "OCR", "scan document", "digitize"
  • "table extraction", "form data"
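
The trigger list above could be checked with a small matcher. This is only a sketch: the function name `shouldTriggerPdfSkill` is hypothetical, and matching on whole words, case-insensitively, is an assumption rather than documented behavior of the skill.

```javascript
// Phrases copied from the trigger list in this document.
const TRIGGER_PHRASES = [
  "extract PDF", "read PDF", "parse document",
  "invoice processing", "extract invoice data",
  "OCR", "scan document", "digitize",
  "table extraction", "form data",
];

// Escape regex metacharacters so each phrase is matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// One case-insensitive pattern; \b stops "OCR" from matching inside "mediocre".
const TRIGGER_PATTERN = new RegExp(
  `\\b(?:${TRIGGER_PHRASES.map(escapeRegExp).join("|")})\\b`,
  "i"
);

// Hypothetical helper: true when the message contains any trigger phrase.
function shouldTriggerPdfSkill(message) {
  return TRIGGER_PATTERN.test(message);
}

console.log(shouldTriggerPdfSkill("Can you read PDF files for me?"));
console.log(shouldTriggerPdfSkill("That film was mediocre"));
```

Whole-word matching is deliberately strict, so inflected forms such as "digitized" would need their own entries.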

📋 CAPABILITIES

1. Text Extraction

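// extractPDF is the skill's entry point; its implementation is not part of this file.
const result = extractPDF({
  file: "invoice.pdf",
  options: {
    ocr: true, // Enable OCR for scanned docs
    preserveLayout: true, // Keep the original text layout
    extractTables: true, // Include detected tables in the result
    extractImages: false // Skip embedded images
  }
});

// Returns an object of this shape:
// {
//   text: "Invoice #12345...",
//   tables: [...],
//   metadata: { pages: 3, author: "...", created: "..." }
// }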
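
The `tables` field in the result can feed the CSV conversion mentioned under Table Detection. A minimal sketch follows, assuming each table arrives as an array of rows and each row as an array of cell values; that shape is not specified in this document. The helper names `escapeCsvField` and `tableToCsv` are hypothetical, and the quoting follows RFC 4180.

```javascript
// Quote a field only when it contains a comma, a double quote, or a line break,
// and double any embedded quotes (RFC 4180 style).
function escapeCsvField(value) {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Hypothetical helper: turns one table (array of rows) into a CSV string.
function tableToCsv(rows) {
  return rows.map((row) => row.map(escapeCsvField).join(",")).join("\r\n");
}

// Assumed table shape, for illustration only.
const sampleTable = [
  ["Description", "Amount (XAF)"],
  ["Office Supplies", "450,000"],
  ["IT Equipment", "1,200,000"],
];

console.log(tableToCsv(sampleTable));
```

Amounts with thousands separators contain commas, so they are the fields most likely to need quoting.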

2. Invoice Processing

Automatically extract:

  • Invoice number
  • Invoice date
  • Due date
  • Vendor/supplier info
  • Line items (description, quantity, price)
  • Subtotal, tax, total
  • Payment terms
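
Some of the fields listed above can be pulled from extracted text with labeled-line patterns. The sketch below is illustrative only: `parseInvoiceText` is a hypothetical helper, and the label formats it expects ("Invoice #:", "Date:", "Due Date:", "TOTAL:") are assumptions that real invoices will often not follow.

```javascript
// Hypothetical helper: reads a few labeled fields out of extracted invoice text.
function parseInvoiceText(text) {
  // Return the first capture group of a pattern, trimmed, or null if absent.
  const firstMatch = (pattern) => {
    const match = text.match(pattern);
    return match ? match[1].trim() : null;
  };

  // Anchoring at line start keeps "SUBTOTAL:" from matching the total pattern.
  const totalMatch = text.match(/^TOTAL:[ \t]*([\d,]+)[ \t]*([A-Z]{3})/m);

  return {
    invoiceNumber: firstMatch(/^Invoice #:[ \t]*(\S+)/im),
    invoiceDate: firstMatch(/^Date:[ \t]*(.+)$/im),
    dueDate: firstMatch(/^Due Date:[ \t]*(.+)$/im),
    total: totalMatch
      ? { amount: Number(totalMatch[1].replace(/,/g, "")), currency: totalMatch[2] }
      : null,
  };
}

// Made-up sample text, for illustration only.
const sampleText = [
  "Invoice #: INV-0001",
  "Date: January 5, 2025",
  "Due Date: February 4, 2025",
  "SUBTOTAL: 1,000 XAF",
  "TOTAL: 1,180 XAF",
].join("\n");

console.log(parseInvoiceText(sampleText));
```

Fields that are missing come back as `null`, which lets a caller distinguish "not found" from an empty value.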

3. OCR Support

  • Handles scanned documents
  • Multiple languages (English, French, German, Spanish)
  • Handwriting recognition (limited)
  • Image quality enhancement
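
Tesseract identifies languages by three-letter codes, and its `-l` option accepts several codes joined with `+`. The sketch below maps the language names used in this document to those codes. The helper name `tesseractLanguageArg` is hypothetical, and how the skill actually passes languages to the OCR engine is not documented here.

```javascript
// Tesseract language codes for the languages named in this document.
const TESSERACT_LANGUAGE_CODES = new Map([
  ["english", "eng"],
  ["french", "fra"],
  ["german", "deu"],
  ["spanish", "spa"],
]);

// Hypothetical helper: builds the value for Tesseract's -l option, e.g. "eng+fra".
function tesseractLanguageArg(languages) {
  const codes = languages.map((name) => {
    const code = TESSERACT_LANGUAGE_CODES.get(name.toLowerCase());
    if (!code) {
      throw new Error(`Unsupported OCR language: ${name}`);
    }
    return code;
  });
  return codes.join("+");
}

console.log(tesseractLanguageArg(["English", "French"]));
```

Failing loudly on an unknown language is a design choice; falling back silently to English would hide poor OCR results.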

💡 INTEGRATION EXAMPLE

User: "Extract data from this invoice PDF"

AI Response:

I've analyzed the invoice PDF. Here's what I found:

INVOICE DETAILS:
Invoice #: INV-2025-1024
Date: October 24, 2025
Due Date: November 23, 2025

VENDOR:
ABC Supplies Ltd
Douala, Cameroon
Tax ID: CM123456789

LINE ITEMS:
1. Office Supplies - 450,000 XAF
2. IT Equipment - 1,200,000 XAF
3. Furniture - 800,000 XAF

SUBTOTAL: 2,450,000 XAF
VAT (18%): 441,000 XAF
TOTAL: 2,891,000 XAF

Payment Terms: Net 30
Bank Details: [Extracted]

Would you like me to:
1. Save this to the accounting system?
2. Create a payment transaction?
3. Export it to Excel?
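
Before acting on extracted amounts, it helps to confirm that they agree with each other. The sketch below checks the figures from the example response. The function `checkInvoiceTotals` and the object shape it takes are hypothetical, not part of the skill's documented output.

```javascript
// Hypothetical check: returns a list of inconsistencies, empty when all figures agree.
// Amounts are whole XAF (the CFA franc has no minor unit), so integer math is exact.
function checkInvoiceTotals({ lineItems, subtotal, taxRatePercent, tax, total }) {
  const problems = [];

  const itemSum = lineItems.reduce((sum, item) => sum + item.amount, 0);
  if (itemSum !== subtotal) {
    problems.push(`line items sum to ${itemSum}, but subtotal is ${subtotal}`);
  }

  const expectedTax = Math.round((subtotal * taxRatePercent) / 100);
  if (expectedTax !== tax) {
    problems.push(`expected tax ${expectedTax}, but tax is ${tax}`);
  }

  if (subtotal + tax !== total) {
    problems.push(`subtotal plus tax is ${subtotal + tax}, but total is ${total}`);
  }

  return problems;
}

// Figures taken from the example response in this document.
const sampleInvoice = {
  lineItems: [
    { description: "Office Supplies", amount: 450000 },
    { description: "IT Equipment", amount: 1200000 },
    { description: "Furniture", amount: 800000 },
  ],
  subtotal: 2450000,
  taxRatePercent: 18,
  tax: 441000,
  total: 2891000,
};

console.log(checkInvoiceTotals(sampleInvoice)); // prints [] because the sample figures agree
```

A non-empty result is a signal to ask the user to review the document rather than to save or pay anything automatically.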

🔧 TECHNICAL DETAILS

  • Libraries: pdf-parse (Node.js), pdfplumber (Python)
  • OCR Engine: Tesseract OCR
  • Supported Formats: .pdf (including PDF/A)
  • Max File Size: 50 MB
  • Languages: English, French, German, Spanish
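
The format and size limits above can be checked before any parsing starts. This is a minimal sketch under stated assumptions: `validatePdfUpload` is a hypothetical helper, "50 MB" is read as 50 × 1024 × 1024 bytes, and the `%PDF-` header is required at byte 0 even though some PDF readers tolerate leading bytes.

```javascript
// Assumption: the 50 MB limit is interpreted as 50 * 1024 * 1024 bytes.
const MAX_FILE_SIZE_BYTES = 50 * 1024 * 1024;
// Every PDF file begins with this marker, followed by the version number.
const PDF_MAGIC = "%PDF-";

// Hypothetical helper: returns a list of problems, empty when the file looks acceptable.
// `bytes` is a Node.js Buffer holding the file contents.
function validatePdfUpload(fileName, bytes) {
  const problems = [];

  if (!fileName.toLowerCase().endsWith(".pdf")) {
    problems.push("file name does not end in .pdf");
  }
  if (bytes.length > MAX_FILE_SIZE_BYTES) {
    problems.push("file is larger than 50 MB");
  }
  if (bytes.subarray(0, PDF_MAGIC.length).toString("latin1") !== PDF_MAGIC) {
    problems.push("file does not start with the %PDF- header");
  }

  return problems;
}

console.log(validatePdfUpload("invoice.pdf", Buffer.from("%PDF-1.7\n")));
console.log(validatePdfUpload("notes.txt", Buffer.from("hello")));
```

Checking the header as well as the extension catches renamed files that a parser would otherwise reject later with a less helpful error.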

📚 DOCUMENTATION

Official Anthropic Documentation: https://docs.anthropic.com/en/docs/build-with-claude/agent-skills


Version: 1.0.0
Source: Anthropic Official Skills Library
Last Updated: October 24, 2025
