pdf-reader
PDF Reader
Extract text, tables, and images from PDFs using pdfjs-dist scripts. All extraction is deterministic: the scripts run no AI inference.
Workflow for Large Documents
For documents over ~20 pages, follow this sequence. For short documents, skip to step 3.
1. Understand the document
node scripts/get-info.mjs <pdf>
Returns page count, metadata (title, author, dates), and table of contents.
2. Find relevant pages
node scripts/search-text.mjs <pdf> --query "search term"
Searches all pages. Returns JSON array of matches with page numbers and context.
Use --max-results 10 and --offset N for pagination.
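The paging flags can be read as a sliding window over the full match list. The sketch below is only an illustration of that reading: it assumes --offset skips that many matches and --max-results caps the window size. `pageOfMatches` and the sample match objects are hypothetical and are not part of search-text.mjs, whose real output fields may differ.

```javascript
// Hypothetical helper: mimics the ASSUMED meaning of --offset and --max-results
// on an already-parsed array of matches.
function pageOfMatches(matches, offset = 0, maxResults = 10) {
  return matches.slice(offset, offset + maxResults);
}

// Illustrative data only; the real JSON fields may be named differently.
const matches = Array.from({ length: 25 }, (_, i) => ({
  page: i + 1,
  context: `hit ${i + 1}`,
}));

const first = pageOfMatches(matches, 0, 10); // like --max-results 10 --offset 0
const last = pageOfMatches(matches, 20, 10); // like --max-results 10 --offset 20
console.log(first.length, last.length);
```

Under that assumption, stepping --offset by the --max-results value visits every match exactly once, and a window shorter than --max-results signals the end of the list.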
3. Extract text from targeted pages
node scripts/extract-text.mjs <pdf> --pages 1-5,10,20
For large documents, --pages is required (unless --all is passed).
Use --output file.txt to write to a file instead of stdout.
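A page spec such as 1-5,10,20 mixes ranges and single pages. As a rough sketch of how such a spec could be expanded into page numbers, the hypothetical `parsePageSpec` below handles both forms. It is not taken from extract-text.mjs, and the script's own validation rules may differ.

```javascript
// Hypothetical parser for a page spec such as "1-5,10,20".
// Returns a sorted list of unique 1-based page numbers.
function parsePageSpec(spec) {
  const pages = new Set();
  for (const part of spec.split(",")) {
    const piece = part.trim();
    if (piece === "") continue; // tolerate stray commas
    const m = /^(\d+)(?:-(\d+))?$/.exec(piece);
    if (!m) throw new Error(`Invalid page spec: "${piece}"`);
    const start = Number(m[1]);
    const end = m[2] === undefined ? start : Number(m[2]);
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range: "${piece}"`);
    }
    for (let p = start; p <= end; p++) pages.add(p);
  }
  return [...pages].sort((a, b) => a - b);
}

console.log(parsePageSpec("1-5,10,20")); // [ 1, 2, 3, 4, 5, 10, 20 ]
```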
4. Extract tables
node scripts/extract-tables.mjs <pdf> --page 20
Detects tables by analyzing text positions. Supports --format markdown (default), csv, or json.
Use --output file.json for large results.
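To give a feel for what "analyzing text positions" can mean, here is a minimal sketch of one common approach: group text items that share a baseline into rows, then order each row left to right. It is not the algorithm in extract-tables.mjs. The item shape `{ str, x, y }`, the tolerance value, and the assumption that larger y means higher on the page are all illustrative.

```javascript
// Hypothetical item shape: { str, x, y }. Real script output may differ.
function groupIntoRows(items, yTolerance = 2) {
  // Assumes PDF-style coordinates (origin at bottom-left), so sort by
  // descending y to read the page top to bottom.
  const sorted = [...items].sort((a, b) => b.y - a.y);
  const rows = [];
  for (const item of sorted) {
    const row = rows[rows.length - 1];
    if (row && Math.abs(row.y - item.y) <= yTolerance) {
      row.cells.push(item); // same baseline, same row
    } else {
      rows.push({ y: item.y, cells: [item] });
    }
  }
  // Within each row, order cells left to right and keep only the text.
  return rows.map((r) => r.cells.sort((a, b) => a.x - b.x).map((c) => c.str));
}

// Render rows as a markdown table, treating the first row as the header.
function toMarkdown(rows) {
  const [header, ...body] = rows;
  const line = (cells) => `| ${cells.join(" | ")} |`;
  const rule = `|${header.map(() => "---").join("|")}|`;
  return [line(header), rule, ...body.map(line)].join("\n");
}

// Made-up sample items for illustration.
const items = [
  { str: "Qty", x: 120, y: 700 },
  { str: "Item", x: 40, y: 700.5 },
  { str: "Bolt", x: 40, y: 680 },
  { str: "12", x: 120, y: 680.3 },
];
const tableRows = groupIntoRows(items);
console.log(toMarkdown(tableRows));
```

A heuristic like this breaks down on merged cells and wrapped text, which is the kind of layout where raw coordinates become more useful than the detected table.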
5. Extract images
node scripts/extract-images.mjs <pdf> --pages 1-30 --list-only
node scripts/extract-images.mjs <pdf> --page 5 --output-dir ./images
Use --list-only first to scan, then extract specific pages. Saves images as PNG.
6. Raw structural analysis (advanced)
node scripts/extract-structure.mjs <pdf> --page 20
Returns every text item with exact x/y coordinates, font info, and dimensions. Use when table extraction doesn't capture a complex layout correctly.
Quick Reference
| Task | Script | Key flags |
|---|---|---|
| Document info | get-info.mjs | |
| Search text | search-text.mjs | --query, --max-results, --offset |
| Extract text | extract-text.mjs | --pages, --all, --output |
| Extract tables | extract-tables.mjs | --page, --format, --output |
| Extract images | extract-images.mjs | --page, --output-dir, --list-only |
| Raw positions | extract-structure.mjs | --page, --output |
All scripts support --help for full usage details.
If text extraction returns empty results
The PDF may be scanned (image-only). Use extract-images.mjs to extract page images instead.
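A quick check for this case is to look for pages whose extracted text contains nothing but whitespace. The sketch below assumes you already hold one extracted string per page. `pagesWithoutText` is a hypothetical helper, not something the scripts provide.

```javascript
// Hypothetical helper: given one extracted string per page (index 0 = page 1),
// return the 1-based numbers of pages that yielded no visible text.
function pagesWithoutText(pageTexts) {
  const emptyPages = [];
  pageTexts.forEach((text, i) => {
    if (text.trim().length === 0) emptyPages.push(i + 1);
  });
  return emptyPages;
}

// Made-up sample: page 1 has text, pages 2 and 3 do not.
const emptyPages = pagesWithoutText(["Introduction ...", "  \n", ""]);
console.log(emptyPages); // [ 2, 3 ]
```

If every page lands in this list, the document is likely image-only. If only a few do, those pages may be full-page figures or scans inserted into an otherwise textual PDF.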
If table extraction misses a table
Use extract-structure.mjs to get raw positional data and analyze the layout from coordinates.
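One way to analyze a layout from coordinates is to cluster the x positions at which text starts and treat each cluster as a column. The sketch below illustrates that idea on made-up data. The item shape `{ str, x }`, the `gap` threshold, and both function names are assumptions, not the output format or logic of extract-structure.mjs.

```javascript
// Hypothetical: infer column start positions from the x coordinates of
// text items. A new column begins when an x position lies more than
// `gap` units to the right of the current column's start.
function inferColumnStarts(items, gap = 15) {
  const xs = [...new Set(items.map((i) => Math.round(i.x)))].sort((a, b) => a - b);
  const starts = [];
  for (const x of xs) {
    if (starts.length === 0 || x - starts[starts.length - 1] > gap) {
      starts.push(x);
    }
  }
  return starts;
}

// Assign an x coordinate to the nearest inferred column start.
function columnIndex(x, starts) {
  let best = 0;
  for (let i = 1; i < starts.length; i++) {
    if (Math.abs(x - starts[i]) < Math.abs(x - starts[best])) best = i;
  }
  return best;
}

// Made-up sample items for illustration.
const sample = [
  { str: "Name", x: 40 },
  { str: "Alice", x: 42.2 },
  { str: "Role", x: 120 },
  { str: "Editor", x: 121.4 },
  { str: "Since", x: 200 },
];
const starts = inferColumnStarts(sample);
console.log(starts, columnIndex(121.4, starts));
```

The `gap` value needs tuning per document: too small and slightly misaligned cells split into separate columns, too large and narrow adjacent columns merge.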