ocr-recognition
OCR Recognition
Quick Start
1. Check System Tesseract
which tesseract
tesseract --version
2. Docker Alternative (No Install Required)
# Pull image (one-time)
docker pull minidocks/tesseract:latest
# Download language pack (wget follows redirects by default; -O sets the output file)
wget -O /tmp/eng.traineddata https://github.com/tesseract-ocr/tessdata/raw/main/eng.traineddata
# Recommended: mount the whole tessdata directory (more reliable)
mkdir -p /tmp/tessdata
mv /tmp/eng.traineddata /tmp/tessdata/
docker run --rm \
-v /path/to/image.png:/image.png:ro \
-v /tmp/tessdata:/usr/share/tessdata:ro \
minidocks/tesseract:latest \
tesseract /image.png stdout
3. Digits Only (for captcha)
docker run --rm \
-v /path/to/captcha.png:/captcha.png:ro \
-v /tmp/tessdata:/usr/share/tessdata:ro \
minidocks/tesseract:latest \
tesseract /captcha.png stdout --psm 6 -c tessedit_char_whitelist=0123456789
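Even with a digit whitelist, the text printed to stdout usually ends with a newline and may contain stray whitespace. A minimal sketch of a cleanup step follows; `clean_digits` and `has_length` are hypothetical helper names, not part of this skill.

```shell
# Hypothetical helper: keep only the characters 0-9 from stdin.
clean_digits() {
    tr -cd '0-9'
}

# Hypothetical helper: succeed if string $1 has exactly $2 characters.
has_length() {
    [ "${#1}" -eq "$2" ]
}
```

A typical use would be to pipe the `docker run ... tesseract` output through `clean_digits`, then reject the result with `has_length "$code" 4` when the captcha is known to have four digits.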
Decision Flow
See references/decision-flow.md for the complete decision tree.
TL;DR
- System has tesseract? → Use it
- No sudo? → Use Docker
- Poor results? → Try preprocessing (see scripts/)
- Still poor (especially captcha)? → Use commercial solution (see references/commercial-solutions.md)
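The first two branches of the TL;DR can be sketched as a small shell check. This is an illustration only: `choose_ocr_backend` is a hypothetical name, and the full decision tree lives in references/decision-flow.md.

```shell
# Hypothetical helper: prefer system tesseract, fall back to Docker.
# Prints "system", "docker", or "none".
choose_ocr_backend() {
    if command -v tesseract >/dev/null 2>&1; then
        echo "system"
    elif command -v docker >/dev/null 2>&1; then
        echo "docker"
    else
        echo "none"
    fi
}
```

A caller could branch on the printed value, for example `case "$(choose_ocr_backend)" in system) ... ;; docker) ... ;; esac`.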
Common Tasks
Debug Steps (Always Run First)
# 1. Verify image is valid
file your_image.png
# 2. Verify Docker is available
docker --version
# 3. Verify traineddata exists and is valid
ls -la /tmp/tessdata/
file /tmp/tessdata/eng.traineddata # Should show "data" type
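Step 3 can also be scripted. A common failure mode is a download that saved an HTML error page instead of model data, so this sketch checks for an empty file and for an HTML marker near the start. `check_traineddata` is a hypothetical helper, and the 512-byte window is an arbitrary assumption.

```shell
# Hypothetical helper: return 0 if the file looks like usable traineddata.
check_traineddata() {
    f=$1
    # -s is true only for an existing, non-empty file
    if [ ! -s "$f" ]; then
        echo "missing or empty: $f" >&2
        return 1
    fi
    # A failed download often stores an HTML page under the target name
    if head -c 512 "$f" | LC_ALL=C grep -qi '<html'; then
        echo "looks like HTML, not traineddata: $f" >&2
        return 1
    fi
    return 0
}
```

For example, `check_traineddata /tmp/tessdata/eng.traineddata || echo "re-download the language pack"`.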
Extract text from screenshot
# Bind mounts need an absolute host path; a bare file name is treated as a named volume
docker run --rm \
-v "$(pwd)/screenshot.png":/image.png:ro \
-v /tmp/tessdata:/usr/share/tessdata:ro \
minidocks/tesseract:latest tesseract /image.png stdout
Recognize captcha (digits only)
See references/captcha-guide.md for detailed captcha strategies.
Chinese text recognition
mkdir -p /tmp/tessdata
wget -O /tmp/tessdata/chi_sim.traineddata https://github.com/tesseract-ocr/tessdata/raw/main/chi_sim.traineddata
docker run --rm \
-v "$(pwd)/image.png":/image.png:ro \
-v /tmp/tessdata:/usr/share/tessdata:ro \
minidocks/tesseract:latest tesseract /image.png stdout -l chi_sim
Scripts
Setup Tessdata Directory (One-time)
mkdir -p /tmp/tessdata
wget -O /tmp/tessdata/eng.traineddata https://github.com/tesseract-ocr/tessdata/raw/main/eng.traineddata
# Add more languages as needed:
# wget -O /tmp/tessdata/chi_sim.traineddata https://github.com/tesseract-ocr/tessdata/raw/main/chi_sim.traineddata
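When several languages are needed, the setup can be wrapped in a loop. The sketch below assumes every language pack follows the same URL pattern as the two shown above; `fetch_langs` is a hypothetical helper, and it skips any file that is already present and non-empty.

```shell
# Hypothetical helper: fetch_langs <tessdata-dir> <lang>...
fetch_langs() {
    dir=$1
    shift
    mkdir -p "$dir" || return 1
    for lang in "$@"; do
        target="$dir/$lang.traineddata"
        # Skip languages that were already downloaded
        if [ -s "$target" ]; then
            continue
        fi
        wget -O "$target" \
            "https://github.com/tesseract-ocr/tessdata/raw/main/$lang.traineddata" || return 1
    done
}
```

For example, `fetch_langs /tmp/tessdata eng chi_sim` would leave both files in the directory that the Docker commands mount.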
Preprocessing Pipeline
When OCR results are poor, use preprocessing:
# See scripts/preprocess.py for full preprocessing options
python3 scripts/preprocess.py --input captcha.png --output processed.png --method otsu
Available methods: threshold, otsu, adaptive, morphology
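Which method works best depends on the image, so one reasonable approach is to generate one output per method and run OCR on each. The sketch below uses only the flags shown above; `preprocess_all` and the `<name>_<method>.png` naming are assumptions for illustration.

```shell
# Hypothetical helper: run every listed method on one image.
# Prints the path of each output that was produced successfully.
preprocess_all() {
    input=$1
    base=${input%.*}   # strip the file extension
    for method in threshold otsu adaptive morphology; do
        out="${base}_${method}.png"
        if python3 scripts/preprocess.py --input "$input" --output "$out" --method "$method"; then
            echo "$out"
        fi
    done
}
```

Running `preprocess_all captcha.png` from the skill directory would be expected to print up to four file names, which can then be fed to the OCR command one by one.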
Quick OCR Command
# Make sure tessdata is set up first (see above)
# Then use the wrapper script:
./scripts/ocr.sh image.png
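The contents of scripts/ocr.sh are not shown here. As a rough idea of what such a wrapper might do, the sketch below resolves the image to an absolute path, checks that the language pack exists, and then runs the same Docker command used throughout this document. The names `abs_path`, `ocr_image`, `TESSDATA_DIR`, and `OCR_IMAGE` are all hypothetical, and the real script may differ.

```shell
# Hypothetical sketch; not the actual scripts/ocr.sh.
TESSDATA_DIR=${TESSDATA_DIR:-/tmp/tessdata}
OCR_IMAGE=${OCR_IMAGE:-minidocks/tesseract:latest}

# Resolve a path to absolute form, since docker -v needs an
# absolute host path for a bind mount.
abs_path() {
    dir=$(cd "$(dirname "$1")" >/dev/null && pwd) || return 1
    printf '%s/%s\n' "$dir" "$(basename "$1")"
}

# ocr_image <image> [lang]  (lang defaults to eng)
ocr_image() {
    img=$(abs_path "$1") || return 1
    lang=${2:-eng}
    if [ ! -f "$img" ]; then
        echo "no such image: $1" >&2
        return 1
    fi
    if [ ! -s "$TESSDATA_DIR/$lang.traineddata" ]; then
        echo "missing $lang.traineddata in $TESSDATA_DIR" >&2
        return 1
    fi
    docker run --rm \
        -v "$img":/image.png:ro \
        -v "$TESSDATA_DIR":/usr/share/tessdata:ro \
        "$OCR_IMAGE" tesseract /image.png stdout -l "$lang"
}
```

With these definitions sourced, `ocr_image screenshot.png` or `ocr_image scan.png chi_sim` would run from any working directory.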
Parameters Reference
PSM Modes
| PSM | Description | Best For |
|---|---|---|
| 3 | Fully automatic page segmentation, no OSD | Default |
| 6 | Assume a single uniform block of text | Captcha |
| 7 | Treat the image as a single text line | Single row |
| 8 | Treat the image as a single word | Spaced chars |
| 10 | Treat the image as a single character | Single char |
Whitelist
# Digits only
-c tessedit_char_whitelist=0123456789
# Letters only
-c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
# Alphanumeric
-c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
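The PSM table and the whitelists above can be combined into one option string. A minimal sketch follows; `tess_opts` and its layout and charset keywords (`block`, `digits`, and so on) are hypothetical names chosen for this example.

```shell
# Hypothetical helper: tess_opts <layout> [charset]
# Prints tesseract options for the given layout and character set.
tess_opts() {
    case $1 in
        auto)  psm=3 ;;
        block) psm=6 ;;
        line)  psm=7 ;;
        word)  psm=8 ;;
        char)  psm=10 ;;
        *) echo "unknown layout: $1" >&2; return 1 ;;
    esac
    case ${2:-any} in
        digits)  wl=0123456789 ;;
        letters) wl=ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ;;
        alnum)   wl=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz ;;
        any)     wl= ;;
        *) echo "unknown charset: $2" >&2; return 1 ;;
    esac
    if [ -n "$wl" ]; then
        printf '%s\n' "--psm $psm -c tessedit_char_whitelist=$wl"
    else
        printf '%s\n' "--psm $psm"
    fi
}
```

Because none of the generated options contain spaces inside a value, the output can be expanded unquoted, for example `tesseract captcha.png stdout $(tess_opts line digits)`.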
When to Use This Skill
- Extract text from screenshots
- Read captcha/verification codes
- Convert images to text
- Batch OCR processing
- Any image-to-text task
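For the batch case, a simple loop is often enough. The sketch below assumes a system tesseract install and PNG inputs; `batch_ocr` is a hypothetical helper. It relies on tesseract appending `.txt` to the output base name given as its second argument.

```shell
# Hypothetical helper: batch_ocr <dir> [lang]
# Writes <name>.txt next to each <name>.png and prints how many succeeded.
batch_ocr() {
    dir=$1
    lang=${2:-eng}
    count=0
    for img in "$dir"/*.png; do
        # When the glob matches nothing, the pattern itself is returned
        [ -f "$img" ] || continue
        if tesseract "$img" "${img%.png}" -l "$lang" >/dev/null 2>&1; then
            count=$((count + 1))
        else
            echo "failed: $img" >&2
        fi
    done
    echo "$count"
}
```

For example, `batch_ocr ./screenshots` would process every PNG in that directory and report the number of images converted.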