local-ocr
Local OCR Pipeline Skill
Robust Optical Character Recognition (OCR) pipeline driven by ocrmypdf and tesseract.
Handles scanned PDFs, rotated image inputs, and raw text extraction securely and locally without external APIs.
Why not GPU via PyTorch/EasyOCR? The ocrmypdf tool is the industry standard for producing searchable PDFs. It leverages tesseract for pixel-accurate text placement. A pure-CPU pipeline is leaner (avoids a 1.5 GB PyTorch payload) and reliably embeds text exactly where it appears in the scanned image.
Capabilities
- Searchable PDF Generation: Converts rasterized/scanned PDFs or raw images (.jpg, .png, etc.) into PDFs with a selectable, searchable text layer.
- Auto-Rotation & Deskew: Automatically detects incorrectly rotated text and straightens crooked scans.
- Idempotent In-Place Processing: Safely processes files in-place using --skip-text, preventing double-processing of a PDF that already has embedded text.
- Structured JSON Output: All commands output structured JSON, making failure states (like missing dependencies) parseable by agents.
- Raw Text Extraction: Raw string extraction fallback for when agents need text directly in-memory instead of a PDF file.
Setup
# Installs system dependencies (tesseract, ocrmypdf, ghostscript) and sets up isolated venv
bash skills/ocr/scripts/setup.sh
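To confirm that setup left the expected tools reachable, a small check along these lines may help. It is only a sketch: the helper name missing_tools is hypothetical, gs is assumed to be the Ghostscript executable name (as on most Unix systems), and ocrmypdf may live inside the isolated venv rather than on PATH, in which case it will be reported as missing here.

```python
import shutil
from typing import Callable, List, Optional, Sequence

# Executables the setup step is described as installing.
# "gs" is the usual Ghostscript binary name on Unix-like systems (assumption).
REQUIRED_TOOLS = ("tesseract", "ocrmypdf", "gs")


def missing_tools(
    tools: Sequence[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names of tools that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Missing:", ", ".join(absent))
    else:
        print("All OCR dependencies found on PATH")
```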
Usage
uv run --project ~/.local-ocr scripts/ocr.py <command>
1. Generate a Searchable PDF (pdf)
Produces a standard, layered PDF. If you give it an image, it wraps it in a PDF. If you give it a scanned PDF, it adds the invisible text layer.
# Overwrites the file in-place, skipping it safely if it already contains text
uv run --project ~/.local-ocr scripts/ocr.py pdf ./scanned_invoice.pdf
# Output to a different file
uv run --project ~/.local-ocr scripts/ocr.py pdf ./scan_001.png -o ./contract.pdf
# Force reprocessing (ignore existing text layer)
uv run --project ~/.local-ocr scripts/ocr.py pdf ./scanned_invoice.pdf --force
Note: By default, auto-rotate and deskew are enabled. Disable with --no-rotate or --no-deskew.
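The options above map fairly directly onto ocrmypdf flags. The following is a minimal sketch of how a wrapper might assemble the command line; it is not the actual ocr.py. The function name and parameters are hypothetical, and the choice to write image inputs to a sibling .pdf file when no output is given is an assumption. The flags themselves (--skip-text, --force-ocr, --rotate-pages, --deskew) are standard ocrmypdf options.

```python
from pathlib import Path
from typing import List, Optional


def build_ocrmypdf_args(
    source: str,
    output: Optional[str] = None,
    force: bool = False,
    rotate: bool = True,
    deskew: bool = True,
) -> List[str]:
    """Assemble an ocrmypdf command line (hypothetical helper)."""
    args = ["ocrmypdf"]
    # --force-ocr and --skip-text are mutually exclusive, so pick one.
    args.append("--force-ocr" if force else "--skip-text")
    if rotate:
        args.append("--rotate-pages")  # fix pages rotated by 90/180/270 degrees
    if deskew:
        args.append("--deskew")  # straighten slightly crooked scans
    if output is not None:
        target = output
    elif Path(source).suffix.lower() == ".pdf":
        # In-place: ocrmypdf accepts the same path as input and output.
        target = source
    else:
        # Assumption: an image input is written next to itself as a PDF.
        target = str(Path(source).with_suffix(".pdf"))
    return args + [source, target]
```

The resulting list can be passed to subprocess.run without a shell, which avoids quoting problems with file names that contain spaces.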
2. Batch Process a Directory (batch)
Recursively scans a directory for images and PDFs, applying OCR.
# Process all files. Skips already-OCRed PDFs.
uv run --project ~/.local-ocr scripts/ocr.py batch ./archives/
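The recursive scan can be pictured roughly as follows. This is an illustrative sketch only: the function name is hypothetical and the set of accepted extensions is an assumption, since the list the real script uses is not stated here.

```python
from pathlib import Path
from typing import Iterator

# Assumed extension list; the real script may accept more or fewer types.
OCR_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def find_ocr_candidates(root: str) -> Iterator[Path]:
    """Yield every image or PDF below root, in a stable sorted order."""
    for path in sorted(Path(root).rglob("*")):
        # Compare suffixes case-insensitively so SCAN.PDF is also picked up.
        if path.is_file() and path.suffix.lower() in OCR_SUFFIXES:
            yield path


if __name__ == "__main__":
    for candidate in find_ocr_candidates("./archives/"):
        print(candidate)
```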
3. Extract Raw Text (text)
Does not create a PDF. Just reads the words off the page and returns them as a JSON string. Good for agents reading documents on the fly.
uv run --project ~/.local-ocr scripts/ocr.py text ./han_solo_invoice.png
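One plausible way to implement this command is to call the tesseract CLI with stdout as the output base, which prints the recognized text instead of writing a file. The sketch below assumes that approach; the function name, the injectable run parameter, and the "text" and "message" keys are hypothetical, and only the "status" key is taken from this document.

```python
import json
import subprocess
from typing import Callable


def extract_text(
    image_path: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Run tesseract on one image and return a JSON string (hypothetical shape)."""
    # "tesseract <image> stdout" sends recognized text to standard output.
    cmd = ["tesseract", image_path, "stdout"]
    try:
        proc = run(cmd, capture_output=True, text=True)
    except FileNotFoundError:
        # Raised when the tesseract binary is not installed or not on PATH.
        return json.dumps({"status": "error", "message": "tesseract not found on PATH"})
    if proc.returncode != 0:
        return json.dumps({"status": "error", "message": proc.stderr.strip()})
    return json.dumps({"status": "success", "text": proc.stdout.strip()})
```

Passing the runner as a parameter keeps the function easy to exercise in tests without tesseract installed.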
Franchise Examples (Star Wars)
- Process the Death Star blueprints:
uv run --project ~/.local-ocr scripts/ocr.py pdf ./ds-1_schematics.pdf
- Extract raw orders:
uv run --project ~/.local-ocr scripts/ocr.py text ./order_66_memo.jpg
- Archive run:
uv run --project ~/.local-ocr scripts/ocr.py batch /archives/jedi_temple
Troubleshooting
- File already contains text: This is the most common "error", but it isn't really an error. ocrmypdf returns exit code 6 when it finds that a file already has a text layer and declines to OCR it. The wrapper script catches this and reports a JSON "status": "success" with a message noting the skip.
- Dependencies Missing: Run the setup.sh script again if the agent complains about a missing tesseract binary or missing Python modules.
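The exit-code handling described above might look something like this. It is a sketch under assumptions, not the wrapper's real code: the function name and the "file" and "message" keys are hypothetical, while exit code 6 and the "status": "success" result for already-OCRed files come from the description above.

```python
import json

# ocrmypdf's exit code for a file that already contains a text layer.
ALREADY_HAS_TEXT = 6


def result_from_exit_code(code: int, path: str) -> str:
    """Translate an ocrmypdf exit code into a JSON result string."""
    if code == 0:
        payload = {"status": "success", "file": path, "message": "OCR complete"}
    elif code == ALREADY_HAS_TEXT:
        # Not a failure: the file was already searchable, so nothing was done.
        payload = {
            "status": "success",
            "file": path,
            "message": "skipped: file already contains text",
        }
    else:
        payload = {
            "status": "error",
            "file": path,
            "message": f"ocrmypdf exited with code {code}",
        }
    return json.dumps(payload)
```

An agent consuming the output then only needs to check the "status" field, rather than knowing which non-zero exit codes are harmless.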