PP-OCRv5 API Skill

When to Use This Skill

Invoke this skill in the following situations:

Extract text from images (screenshots, photos, scans, charts)
Read text from PDF or document images
Perform OCR on any visual content containing text
Parse structured documents (invoices, receipts, forms, tables)
Recognize text in photos taken by mobile phones
Extract text from URLs pointing to images or PDFs

Do not use this skill in the following situations:

Plain text files that can be read directly with the Read tool
Code files or markdown documents
Tasks that do not involve image-to-text conversion

How to Use This Skill

⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔

ONLY use PP-OCRv5 API - Execute the script python scripts/ppocrv5/ocr_caller.py
NEVER use Claude's built-in vision - Do NOT read images yourself
NEVER offer alternatives - Do NOT suggest "I can try to read it" or similar
IF API fails - Display the error message and STOP immediately
NO fallback methods - Do NOT attempt OCR any other way

If the script execution fails (API not configured, network error, etc.):

Show the error message to the user
Do NOT offer to help using your vision capabilities
Do NOT ask "Would you like me to try reading it?"
Simply stop and wait for the user to fix the configuration

Basic Workflow

Identify the input source:
- User provides URL: Use the --file-url parameter
- User provides local file path: Use the --file-path parameter
- User uploads image: Save it first, then use --file-path

Execute OCR:

python scripts/ppocrv5/ocr_caller.py --file-url "URL provided by user" --pretty

Or for local files:

python scripts/ppocrv5/ocr_caller.py --file-path "file path" --pretty
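The choice between the two parameters can be made mechanically. The following is a minimal sketch, assuming a hypothetical helper named `build_ocr_command` (it is not part of the skill's scripts) and assuming that any input with an http or https scheme is a URL and everything else is a local path:

```python
from typing import List
from urllib.parse import urlparse

SCRIPT = "scripts/ppocrv5/ocr_caller.py"  # script path as given in this document


def build_ocr_command(source: str) -> List[str]:
    """Build the argument list for ocr_caller.py (hypothetical helper)."""
    # Assumption: only http(s) inputs are URLs; anything else is a local path.
    is_url = urlparse(source).scheme in ("http", "https")
    flag = "--file-url" if is_url else "--file-path"
    return ["python", SCRIPT, flag, source, "--pretty"]


print(build_ocr_command("https://example.com/invoice.jpg"))
print(build_ocr_command("./document.pdf"))
```

Passing the result to `subprocess.run` as a list avoids shell quoting problems with paths that contain spaces.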

Save result to file (recommended):

python scripts/ppocrv5/ocr_caller.py --file-url "URL" --output result.json --pretty

The script will display: Result saved to: /absolute/path/to/result.json
This message appears on stderr; the JSON itself is written to the file
Tell the user the file path shown in the message
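Because the confirmation goes to stderr, the path has to be read from there rather than from stdout. A minimal sketch, assuming the message format quoted above and a hypothetical helper named `extract_saved_path`; in practice the text would come from something like `subprocess.run(..., capture_output=True, text=True).stderr`:

```python
import re
from typing import Optional

# Message format as quoted in this document:
# "Result saved to: /absolute/path/to/result.json"
SAVED_PATTERN = re.compile(r"^Result saved to:[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)


def extract_saved_path(stderr_text: str) -> Optional[str]:
    """Return the path reported on stderr, or None if the message is absent."""
    match = SAVED_PATTERN.search(stderr_text)
    return match.group(1).strip() if match else None


sample_stderr = "Result saved to: /absolute/path/to/result.json\n"
print(extract_saved_path(sample_stderr))
```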

Parse JSON response:
- Check the ok field: true means success, false means error
- Extract text: result.full_text contains all recognized text
- Get quality: quality.quality_score indicates recognition confidence (0.0-1.0)
- Handle errors: If ok is false, display error.message

Present results to user:
- Display extracted text in a readable format
- If quality score is low (<0.5), alert the user
- If structured output is needed, use result.pages[].items[] to get line-by-line data
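The parsing and presentation steps can be sketched as follows. This is an illustration only: `render_response` is a hypothetical helper, and the field names are the ones listed in this document (`ok`, `result.full_text`, `quality.quality_score`, `error.message`).

```python
from typing import Any, Dict

LOW_QUALITY_THRESHOLD = 0.5  # threshold stated in this document


def render_response(payload: Dict[str, Any]) -> str:
    """Turn the script's JSON output into the text shown to the user."""
    if not payload.get("ok"):
        # On failure, show the error message and do nothing else.
        return payload.get("error", {}).get("message", "Unknown error")
    text = payload["result"]["full_text"]  # always the complete text, never a preview
    score = payload["quality"]["quality_score"]
    lines = [text, "", f"Quality Score: {score:.2f} / 1.00"]
    if score < LOW_QUALITY_THRESHOLD:
        lines.append("Warning: low recognition quality; the text may contain errors.")
    return "\n".join(lines)


sample = {
    "ok": True,
    "result": {"full_text": "Hello", "pages": []},
    "quality": {"quality_score": 0.85, "text_items": 1},
}
print(render_response(sample))
```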

IMPORTANT: Complete Output Display

CRITICAL: Always display the COMPLETE recognized text to the user. Do NOT truncate or summarize the OCR results.

The script returns the full JSON with complete text content in result.full_text
You MUST display the entire full_text content to the user, no matter how long it is
Do NOT use phrases like "Here's a summary" or "The text begins with..."
Do NOT truncate with "..." unless the text truly exceeds reasonable display limits
The user expects to see ALL the recognized text, not a preview or excerpt

Correct approach:

I've extracted the text from the image. Here's the complete content:

[Display the entire result.full_text here]

Quality Score: 0.85 / 1.00 (Good quality recognition)

Incorrect approach ❌:

I found some text in the image. Here's a preview:
"The quick brown fox..." (truncated)

Mode Selection

Always use --mode auto (default) unless the user explicitly requests otherwise:

User Request	Use Mode	Command Flag
Default/unspecified	Auto (adaptive)	`--mode auto` (or omit)
"Quick recognition" / "fast"	Fast	`--mode fast`
"High precision" / "accurate"	Quality	`--mode quality`

Auto mode (recommended): Makes one to three attempts, raising the correction level on each retry, and returns the best result.
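The mapping in the table can be written down as a small lookup. This is a rough sketch: `mode_flag` is a hypothetical helper, and the keyword matching is an assumption that only approximates "the user explicitly requests otherwise".

```python
import re
from typing import List


def mode_flag(user_request: str) -> List[str]:
    """Map a user's wording to a --mode flag, defaulting to auto."""
    text = user_request.lower()
    # Keywords mirror the example phrases in the table above.
    if re.search(r"\b(quick|fast)\b", text):
        return ["--mode", "fast"]
    if re.search(r"\b(high[ -]precision|accurate)\b", text):
        return ["--mode", "quality"]
    return ["--mode", "auto"]


print(mode_flag("Please do a quick recognition"))
print(mode_flag("Read the text in this image"))
```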

Usage Examples

Example 1: Simple URL OCR

python scripts/ppocrv5/ocr_caller.py --file-url "https://example.com/invoice.jpg" --pretty

Example 2: Local File OCR

python scripts/ppocrv5/ocr_caller.py --file-path "./document.pdf" --pretty

Example 3: Fast Mode for Clear Images

python scripts/ppocrv5/ocr_caller.py --file-url "URL" --mode fast --pretty

Understanding the Output

The script outputs JSON with the following structure:

{
  "ok": true,
  "result": {
    "full_text": "All recognized text here...",
    "pages": [...]
  },
  "quality": {
    "quality_score": 0.85,
    "text_items": 42
  }
}

Key fields to extract:

result.full_text: Complete text for the user
quality.quality_score: 0.72+ is good, <0.5 is poor
error.message: If ok is false, provides error description

First-Time Configuration

When the API is not configured:

The script reports this error:

Configuration error: API not configured. Get your API at: https://aistudio.baidu.com/paddleocr/task

Auto-configuration workflow:

Show the exact error message to user (including the URL)

Tell user to provide credentials:

Please visit the URL above to get your API_URL and TOKEN.
Once you have them, send them to me and I'll configure it automatically.

When user provides credentials (accept any format):
- API_URL=https://xxx.aistudio-app.com/ocr, TOKEN=abc123...
- Here's my API: https://xxx and token: abc123
- Copy-pasted code format
- Any other reasonable format

Parse credentials from user's message:
- Extract API_URL value (look for URLs with aistudio-app.com or similar)
- Extract TOKEN value (long alphanumeric string, usually 40+ chars)
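One way to pull both values out of a free-form message is with regular expressions. The sketch below is an assumption-laden illustration, not the skill's own parser: `parse_credentials` is a hypothetical helper, a token is taken to be either labelled with "token" followed by "=" or ":", or a bare alphanumeric run of 40 or more characters, and the first URL found is taken as the API_URL.

```python
import re
from typing import Optional, Tuple

URL_PATTERN = re.compile(r"https?://[^\s,;\"'<>]+")
# Assumption: labelled form such as "TOKEN=abc" or "token: abc".
LABELLED_TOKEN = re.compile(r"token\s*[=:]\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE)
# Assumption: unlabelled tokens are long alphanumeric runs (40+ characters).
LONG_TOKEN = re.compile(r"\b[A-Za-z0-9]{40,}\b")


def parse_credentials(message: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (api_url, token); either may be None if not found."""
    url_match = URL_PATTERN.search(message)
    api_url = url_match.group(0).rstrip(".)") if url_match else None
    # Remove URLs first so a long path segment is not mistaken for the token.
    remainder = URL_PATTERN.sub(" ", message)
    token_match = LABELLED_TOKEN.search(remainder) or LONG_TOKEN.search(remainder)
    token = token_match.group(token_match.lastindex or 0) if token_match else None
    return api_url, token


print(parse_credentials("API_URL=https://xxx.aistudio-app.com/ocr, TOKEN=abc123"))
print(parse_credentials("Here's my API: https://xxx and token: abc123"))
```

If either value comes back as None, asking the user to resend the credentials is safer than guessing.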

Configure automatically:

python scripts/ppocrv5/configure.py --api-url "PARSED_URL" --token "PARSED_TOKEN"

If configuration succeeds:
- Inform user: "Configuration complete! Running OCR now..."
- Retry the original OCR task

If configuration fails:
- Show the error
- Ask user to verify the credentials

IMPORTANT: The error message format is STRICT and must be shown exactly as provided by the script. Do not modify or paraphrase it.

Authentication failed (403):

error_code: PROVIDER_AUTH_ERROR

→ Token is invalid; reconfigure with correct credentials

Quota exceeded (429):

error_code: PROVIDER_QUOTA_EXCEEDED

→ Daily API quota exhausted; tell the user to wait or upgrade

No text detected:

quality_score: 0.0, text_items: 0

→ Image may be blank, corrupted, or contain no text
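The three cases above can be collected into one lookup. A minimal sketch, assuming a hypothetical helper named `guidance_for`; it takes the error code as a plain string because this document does not say where `error_code` sits in the JSON.

```python
from typing import Optional

# Error codes and remedies as listed in this document.
GUIDANCE = {
    "PROVIDER_AUTH_ERROR": "Token is invalid; reconfigure with correct credentials.",
    "PROVIDER_QUOTA_EXCEEDED": "Daily API quota exhausted; wait or upgrade.",
}


def guidance_for(
    error_code: Optional[str] = None,
    quality_score: Optional[float] = None,
    text_items: Optional[int] = None,
) -> Optional[str]:
    """Return the follow-up advice for a known condition, or None."""
    if error_code in GUIDANCE:
        return GUIDANCE[error_code]
    if quality_score == 0.0 and text_items == 0:
        return "No text detected; the image may be blank, corrupted, or contain no text."
    return None


print(guidance_for("PROVIDER_AUTH_ERROR"))
print(guidance_for(quality_score=0.0, text_items=0))
```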

Quality Interpretation

When presenting results to users, consider the quality score:

Quality Score	Explanation to User
0.90 - 1.00	Excellent recognition quality
0.72 - 0.89	Good recognition quality (default target)
0.50 - 0.71	Fair recognition quality, may have some errors
0.00 - 0.49	Poor recognition quality or no text detected

If quality is below 0.5, mention this to the user and suggest:

Try using --mode quality for better accuracy
Check if the image is clear and contains text
Provide a higher resolution image if possible
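The bands in the table translate directly into a threshold check. A minimal sketch, assuming a hypothetical helper named `describe_quality` and treating each band's lower bound as inclusive:

```python
def describe_quality(score: float) -> str:
    """Map a quality score in [0.0, 1.0] to the explanation from the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"quality score out of range: {score}")
    if score >= 0.90:
        return "Excellent recognition quality"
    if score >= 0.72:
        return "Good recognition quality"
    if score >= 0.50:
        return "Fair recognition quality, may have some errors"
    return "Poor recognition quality or no text detected"


for value in (0.95, 0.85, 0.60, 0.10):
    print(value, describe_quality(value))
```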

Advanced Options

Use only when explicitly requested by the user:

Include raw provider response (for debugging):

python scripts/ppocrv5/ocr_caller.py --file-url "URL" --return-raw-provider

Request visualization (show detection regions):

python scripts/ppocrv5/ocr_caller.py --file-url "URL" --visualize

Adjust auto mode parameters:

python scripts/ppocrv5/ocr_caller.py --file-url "URL" \
  --max-attempts 2 \
  --quality-target 0.80 \
  --budget-ms 20000
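To make the three parameters easier to reason about, here is one plausible reading of how they could interact. This is an illustrative sketch only and not the script's actual implementation (see references/ppocrv5/agent_policy.md for the real strategy): `run_auto` and the `attempt(level)` callback are hypothetical, and the default values are assumptions drawn from the figures mentioned in this document.

```python
import time
from typing import Any, Callable, Dict, List, Optional


def run_auto(
    attempt: Callable[[int], Dict[str, Any]],
    max_attempts: int = 3,
    quality_target: float = 0.72,
    budget_ms: int = 20000,
) -> Optional[Dict[str, Any]]:
    """Retry with rising correction levels and keep the best-scoring result."""
    deadline = time.monotonic() + budget_ms / 1000.0
    best: Optional[Dict[str, Any]] = None
    best_score = -1.0
    for level in range(max_attempts):
        # Always make the first attempt; later ones only while budget remains.
        if level > 0 and time.monotonic() >= deadline:
            break
        result = attempt(level)  # hypothetical: higher level means more correction
        score = result.get("quality", {}).get("quality_score", 0.0)
        if score > best_score:
            best, best_score = result, score
        if best_score >= quality_target:
            break  # target reached, stop early
    return best


# Fake attempts with rising scores stand in for real OCR calls.
scores = [0.40, 0.80, 0.95]
calls: List[int] = []


def fake_attempt(level: int) -> Dict[str, Any]:
    calls.append(level)
    return {"ok": True, "quality": {"quality_score": scores[level]}}


best = run_auto(fake_attempt, max_attempts=3, quality_target=0.72, budget_ms=20000)
print(best["quality"]["quality_score"], calls)
```

Under this reading, lowering `--max-attempts` or `--budget-ms` trades accuracy for speed, while raising `--quality-target` makes early stopping less likely.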

Reference Documentation

For in-depth understanding of the OCR system, refer to:

references/ppocrv5/agent_policy.md - Auto mode strategy and quality scoring
references/ppocrv5/normalized_schema.md - Complete output schema specification
references/ppocrv5/provider_api.md - Provider API contract details

Load these reference documents into context when:

Debugging complex issues
The user asks about the quality scoring algorithm
You need to understand the adaptive retry mechanism
Customizing auto mode parameters

Testing the Skill

To verify the skill is working properly:

python scripts/ppocrv5/smoke_test.py

This tests configuration and API connectivity.