pdf-extractor
PDF Extractor
Extract text, metadata, and content from PDF documents.
Features
- text: Extract all text content from a PDF
- metadata: Extract document properties (title, author, creation date, etc.)
- pages: Get the total page count
- selective: Extract specific page ranges
- info: Get the PDF version and encryption status
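The metadata feature reports a creation date. Inside a PDF, the CreationDate and ModDate entries of the document information dictionary are stored as strings such as D:20230415103000+02'00'. The sketch below shows one way to turn such a string into a JavaScript Date. parsePdfDate is a hypothetical helper, not part of pdf-extract.js, and treating a date with no timezone as UTC is an assumption.

```javascript
// Minimal sketch (hypothetical helper, not part of pdf-extract.js):
// convert a PDF date string such as "D:20230415103000+02'00'" into a Date.
function parsePdfDate(raw) {
  // Layout: D:YYYYMMDDHHmmSS, then an optional timezone of Z, +HH'mm' or -HH'mm'.
  // Every field after the four-digit year is optional.
  const m = /^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?(?:([Zz+-])(?:(\d{2})'?(\d{2})?'?)?)?$/.exec(
    raw.trim()
  );
  if (!m) return null; // not a recognizable PDF date string

  // Missing fields fall back to the earliest valid value.
  const [, y, mo = "01", d = "01", h = "00", mi = "00", s = "00", sign, tzh = "00", tzm = "00"] = m;

  // Read the wall-clock time as if it were UTC (assumption when no zone is given)...
  let ms = Date.UTC(Number(y), Number(mo) - 1, Number(d), Number(h), Number(mi), Number(s));

  // ...then undo the stated offset: local = UTC + offset, so UTC = local - offset.
  if (sign === "+" || sign === "-") {
    const offsetMs = (Number(tzh) * 60 + Number(tzm)) * 60 * 1000;
    ms += sign === "+" ? -offsetMs : offsetMs;
  }
  return new Date(ms);
}
```

For example, parsePdfDate("D:20230415103000+02'00'").toISOString() returns "2023-04-15T08:30:00.000Z", and a string that does not match the layout returns null.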
Usage
# Extract all text
./scripts/pdf-extract.js --file ./document.pdf
# Extract specific pages
./scripts/pdf-extract.js --file ./document.pdf --pages "1-5"
# Extract metadata only
./scripts/pdf-extract.js --file ./document.pdf --metadata
# Get page count and info
./scripts/pdf-extract.js --file ./document.pdf --info
# Output as JSON
./scripts/pdf-extract.js --file ./document.pdf --format json
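The --pages option takes a range such as "1-5". A minimal sketch of how such a value could be validated and expanded into 1-based page numbers follows. parsePageRange is hypothetical, and accepting a bare single page such as "3" is an assumption, since the examples above only show N-M ranges.

```javascript
// Minimal sketch (hypothetical helper, not part of pdf-extract.js):
// expand a page-range spec such as "1-5" into [1, 2, 3, 4, 5].
// Accepting a single page like "3" is an assumption.
function parsePageRange(spec, pageCount) {
  // Either "N" or "N-M", with optional surrounding whitespace.
  const m = /^\s*(\d+)\s*(?:-\s*(\d+)\s*)?$/.exec(spec);
  if (!m) throw new Error(`Invalid page range: "${spec}"`);

  const start = Number(m[1]);
  const end = m[2] === undefined ? start : Number(m[2]);

  // Pages are 1-based; reject reversed ranges and pages past the end.
  if (start < 1 || end < start || end > pageCount) {
    throw new RangeError(`Page range "${spec.trim()}" is outside 1-${pageCount}`);
  }

  // Build the inclusive list start..end.
  return Array.from({ length: end - start + 1 }, (_, i) => start + i);
}
```

With a 10-page document, parsePageRange("1-5", 10) returns [1, 2, 3, 4, 5], while "4-2" or "1-20" throws a RangeError. Failing early on a bad range avoids silently extracting the wrong pages.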
Examples
| Task | Command | Output |
|---|---|---|
| Extract text | pdf-extract.js --file doc.pdf | Full text content |
| Pages 1-3 | pdf-extract.js --file doc.pdf --pages 1-3 | Text from pages 1-3 |
| Metadata | pdf-extract.js --file doc.pdf --metadata | Document properties |
| Info | pdf-extract.js --file doc.pdf --info | Page count, PDF version, encryption status |
| JSON output | pdf-extract.js --file doc.pdf --format json | Structured JSON |
Notes
- Supports PDF versions 1.0 through 1.7
- Handles encrypted PDFs (prompts for a password)
- Memory-efficient for large documents
- Preserves text layout where possible
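Every PDF file begins with a header such as %PDF-1.7, which is where a version like the 1.0 to 1.7 range above comes from. The sketch below reads that header from a Buffer and makes a rough encryption check by looking for an /Encrypt key, which appears in the trailer of encrypted files. inspectPdfHeader is hypothetical and is not necessarily how pdf-extract.js works; the /Encrypt search is only a heuristic and can give a false positive if that text occurs elsewhere in an uncompressed file.

```javascript
// Minimal sketch (hypothetical helper, not part of pdf-extract.js):
// report the header version, whether it falls in the 1.0-1.7 range,
// and a heuristic encryption flag.
function inspectPdfHeader(buffer) {
  // The header sits at the start of the file; scan the first 1 KB for it.
  const header = buffer.subarray(0, 1024).toString("latin1");
  const m = /%PDF-(\d)\.(\d)/.exec(header);
  if (!m) return { isPdf: false };

  const major = Number(m[1]);
  const minor = Number(m[2]);
  const supported = major === 1 && minor <= 7;

  // Heuristic: encrypted files reference an /Encrypt dictionary in the trailer.
  const encrypted = buffer.toString("latin1").includes("/Encrypt");

  return { isPdf: true, version: `${major}.${minor}`, supported, encrypted };
}
```

In Node this could be called as inspectPdfHeader(fs.readFileSync("./document.pdf")). Reading the whole file keeps the sketch short but is not memory-efficient for very large documents.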