OCR Recognition

Quick Start

1. Check System Tesseract

which tesseract
tesseract --version

2. Docker Alternative (No Install Required)

# Pull image (one-time)
docker pull minidocks/tesseract:latest

# Download language pack (wget follows redirects by default; -O sets the output file)
wget -O /tmp/eng.traineddata https://github.com/tesseract-ocr/tessdata/raw/main/eng.traineddata

# Recommended: mount the whole tessdata directory (more reliable)
mkdir -p /tmp/tessdata
mv /tmp/eng.traineddata /tmp/tessdata/

docker run --rm \
  -v /path/to/image.png:/image.png:ro \
  -v /tmp/tessdata:/usr/share/tessdata:ro \
  minidocks/tesseract:latest \
  tesseract /image.png stdout

3. Digits Only (for captcha)

docker run --rm \
  -v /path/to/captcha.png:/captcha.png:ro \
  -v /tmp/tessdata:/usr/share/tessdata:ro \
  minidocks/tesseract:latest \
  tesseract /captcha.png stdout --psm 6 -c tessedit_char_whitelist=0123456789

Decision Flow

See references/decision-flow.md for complete decision tree.

TL;DR

System has tesseract? → Use it
No system tesseract and no sudo to install it? → Use Docker
Poor results? → Try preprocessing (see scripts/)
Still poor (especially captcha)? → Use commercial solution (see references/commercial-solutions.md)
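
The first two steps of the TL;DR above can be sketched as a small shell check. This is a minimal sketch, not part of this skill's scripts: `choose_backend` is a hypothetical helper name, and it only tests whether the `tesseract` and `docker` commands are on `PATH`. It does not judge result quality.

```shell
# Hypothetical helper: report which OCR backend is available.
# Prints "system", "docker", or "none".
choose_backend() {
  if command -v tesseract >/dev/null 2>&1; then
    echo "system"      # system tesseract found: use it directly
  elif command -v docker >/dev/null 2>&1; then
    echo "docker"      # fall back to the container image
  else
    echo "none"        # neither command is installed
  fi
}

echo "OCR backend: $(choose_backend)"
```

A caller could branch on the printed value to decide between the plain `tesseract` command and the `docker run` form shown in Quick Start.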

Common Tasks

Debug Steps (Always Run First)

# 1. Verify image is valid
file your_image.png

# 2. Verify Docker is available
docker --version

# 3. Verify traineddata exists and is valid
ls -la /tmp/tessdata/
file /tmp/tessdata/eng.traineddata  # Should report "data"; "HTML document" means the download failed
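
The traineddata check in step 3 can also be scripted. The sketch below assumes that a broken download shows up as a missing file, an empty file, or a saved HTML page; `check_traineddata` is a hypothetical helper, and passing this check does not prove the file is a usable model.

```shell
# Hypothetical helper: returns 0 if the file plausibly holds traineddata,
# 1 if it is missing, empty, or looks like a saved HTML page.
check_traineddata() {
  f="$1"
  # -s is true only for an existing, non-empty file.
  [ -s "$f" ] || { echo "missing or empty: $f" >&2; return 1; }
  # Look for HTML markers in the first bytes of the file.
  if head -c 512 "$f" | grep -qiE '<html|<!doctype'; then
    echo "looks like HTML, not traineddata: $f" >&2
    return 1
  fi
  return 0
}
```

Usage: `check_traineddata /tmp/tessdata/eng.traineddata && echo "traineddata looks OK"`.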

Extract text from screenshot

# Docker bind mounts need an absolute host path
docker run --rm \
  -v "$(pwd)/screenshot.png":/image.png:ro \
  -v /tmp/tessdata:/usr/share/tessdata:ro \
  minidocks/tesseract:latest tesseract /image.png stdout
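
Docker treats a `-v` source that is not an absolute path as a named volume, so it can help to resolve the image path before mounting it. The following is a sketch only: `abs_path` and `docker_ocr` are hypothetical helpers (not the contents of scripts/ocr.sh), and `TESSDATA_DIR` is an assumed override variable that defaults to the /tmp/tessdata directory used in this document.

```shell
# Hypothetical helper: print the absolute path of a file
# without depending on realpath being installed.
abs_path() {
  ( cd -- "$(dirname "$1")" >/dev/null && \
    printf '%s/%s\n' "$(pwd)" "$(basename "$1")" )
}

# Hypothetical wrapper around the container invocation shown above.
# Extra arguments (e.g. -l chi_sim, --psm 6) are passed to tesseract.
docker_ocr() {
  img="$(abs_path "$1")" || return 1
  shift
  docker run --rm \
    -v "${img}:/image.png:ro" \
    -v "${TESSDATA_DIR:-/tmp/tessdata}:/usr/share/tessdata:ro" \
    minidocks/tesseract:latest \
    tesseract /image.png stdout "$@"
}
```

Usage: `docker_ocr screenshot.png` or `docker_ocr image.png -l chi_sim`.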

Recognize captcha (digits only)

See references/captcha-guide.md for detailed captcha strategies.

Chinese text recognition

mkdir -p /tmp/tessdata
wget -O /tmp/tessdata/chi_sim.traineddata https://github.com/tesseract-ocr/tessdata/raw/main/chi_sim.traineddata

# Docker bind mounts need an absolute host path
docker run --rm \
  -v "$(pwd)/image.png":/image.png:ro \
  -v /tmp/tessdata:/usr/share/tessdata:ro \
  minidocks/tesseract:latest tesseract /image.png stdout -l chi_sim

Scripts

Setup Tessdata Directory (One-time)

mkdir -p /tmp/tessdata
wget -O /tmp/tessdata/eng.traineddata https://github.com/tesseract-ocr/tessdata/raw/main/eng.traineddata
# Add more languages as needed:
# wget -O /tmp/tessdata/chi_sim.traineddata https://github.com/tesseract-ocr/tessdata/raw/main/chi_sim.traineddata

Preprocessing Pipeline

When OCR results are poor, use preprocessing:

# See scripts/preprocess.py for full preprocessing options
python3 scripts/preprocess.py --input captcha.png --output processed.png --method otsu

Available methods: threshold, otsu, adaptive, morphology
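
When it is unclear which method suits an image, one option is to try all four and compare the results. The sketch below only prints the commands, using the flags shown above; `preprocess_commands` is a hypothetical helper and the `processed_<method>.png` output naming is an assumption for illustration.

```shell
# Hypothetical helper: print (do not run) one preprocess command
# per method listed in this section.
preprocess_commands() {
  input="$1"
  for method in threshold otsu adaptive morphology; do
    printf 'python3 scripts/preprocess.py --input %s --output processed_%s.png --method %s\n' \
      "$input" "$method" "$method"
  done
}
```

Running `preprocess_commands captcha.png` lists the four commands; piping the output to `sh` would execute them, assuming scripts/preprocess.py is present and the file name contains no spaces.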

Quick OCR Command

# Make sure tessdata is set up first (see above)
# Then use the wrapper script:
./scripts/ocr.sh image.png

Parameters Reference

PSM Modes

PSM	Description	Best For
3	Fully automatic page segmentation, no OSD	Default
6	Assume a single uniform block of text	Captcha
7	Treat the image as a single text line	Single row
8	Treat the image as a single word	Spaced chars
10	Treat the image as a single character	Single char

Whitelist

# Digits only
-c tessedit_char_whitelist=0123456789

# Letters only
-c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

# Alphanumeric
-c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
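
To avoid retyping these strings, the three presets above can be wrapped in a small function. This is a sketch: `whitelist_arg` and the preset names `digits`, `letters`, and `alnum` are hypothetical, chosen here for illustration.

```shell
# Hypothetical helper: print the -c whitelist option for a named preset.
whitelist_arg() {
  case "$1" in
    digits)  chars="0123456789" ;;
    letters) chars="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" ;;
    alnum)   chars="0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" ;;
    *) echo "unknown preset: $1" >&2; return 1 ;;
  esac
  # printf avoids echo treating a leading dash as an option.
  printf '%s\n' "-c tessedit_char_whitelist=${chars}"
}
```

Usage: `tesseract captcha.png stdout --psm 6 $(whitelist_arg digits)`. The command substitution is left unquoted on purpose so that `-c` and its value are passed as two arguments.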

When to Use This Skill

Extract text from screenshots
Read captcha/verification codes
Convert images to text
Batch OCR processing
Any image-to-text task
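
For the batch case, a loop over a directory is usually enough. The sketch below assumes PNG inputs and writes one `.txt` file next to each image; `batch_ocr` is a hypothetical helper, and `OCR_CMD` is an assumed override variable so that the loop can call a Docker wrapper (or a stub) instead of the system `tesseract`.

```shell
# Hypothetical helper: OCR every PNG in a directory.
# Uses $OCR_CMD if set, otherwise the system tesseract.
batch_ocr() {
  dir="$1"
  for img in "$dir"/*.png; do
    # If nothing matches, the pattern stays literal; skip it.
    [ -e "$img" ] || continue
    out="${img%.png}.txt"
    # Invoked as: <command> <image> stdout, as elsewhere in this document.
    ${OCR_CMD:-tesseract} "$img" stdout > "$out" || echo "failed: $img" >&2
  done
}
```

Usage: `batch_ocr ./screenshots` writes `./screenshots/<name>.txt` for each `<name>.png`.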
