ppocrv5
PP-OCRv5 API Skill
When to Use This Skill
Invoke this skill in the following situations:
- Extract text from images (screenshots, photos, scans, charts)
- Read text from PDF or document images
- Perform OCR on any visual content containing text
- Parse structured documents (invoices, receipts, forms, tables)
- Recognize text in photos taken by mobile phones
- Extract text from URLs pointing to images or PDFs
Do not use this skill in the following situations:
- Plain text files that can be read directly with the Read tool
- Code files or markdown documents
- Tasks that do not involve image-to-text conversion
How to Use This Skill
Basic Workflow
- Identify the input source:
  - User provides URL: Use the --file-url parameter
  - User provides local file path: Use the --file-path parameter
  - User uploads image: Save it first, then use --file-path
- Execute OCR:
  python scripts/ocr_caller.py --file-url "URL provided by user" --pretty
  Or for local files:
  python scripts/ocr_caller.py --file-path "file path" --pretty
- Parse JSON response:
  - Check the ok field: true means success, false means error
  - Extract text: result.full_text contains all recognized text
  - Get quality: quality.quality_score indicates recognition confidence (0.0-1.0)
  - Handle errors: If ok is false, display error.message
- Present results to user:
  - Display extracted text in a readable format
  - If quality score is low (<0.5), alert the user
  - If structured output is needed, use result.pages[].items[] to get line-by-line data
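The parse and present steps above could be sketched roughly as follows. This is a minimal illustration, assuming the response carries only the fields named in this document (ok, result.full_text, quality.quality_score, error.message). The helper name summarize_response and the wording of its messages are hypothetical and not part of the skill.

```python
import json

LOW_QUALITY_THRESHOLD = 0.5  # this document alerts the user below 0.5


def summarize_response(raw_json: str) -> str:
    """Turn the JSON printed by ocr_caller.py into a message for the user.

    Hypothetical helper: only the field names come from this document.
    """
    response = json.loads(raw_json)

    # Check the ok field before reading anything else.
    if not response.get("ok"):
        error = response.get("error") or {}
        return f"OCR failed: {error.get('message', 'unknown error')}"

    # Pull out the recognized text and the quality score.
    text = (response.get("result") or {}).get("full_text", "")
    score = (response.get("quality") or {}).get("quality_score", 0.0)

    # Append a warning when the score falls below the threshold.
    if score < LOW_QUALITY_THRESHOLD:
        return f"{text}\n\nWarning: low recognition quality ({score:.2f})."
    return text
```

The function takes the raw JSON string so it can be fed directly from the captured standard output of the script.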
Mode Selection
Always use --mode auto (default) unless the user explicitly requests otherwise:
| User Request | Use Mode | Command Flag |
|---|---|---|
| Default/unspecified | Auto (adaptive) | --mode auto (or omit) |
| "Quick recognition" / "fast" | Fast | --mode fast |
| "High precision" / "accurate" | Quality | --mode quality |
Auto mode (recommended): Makes one to three attempts, raising the correction level on each retry, and returns the best result.
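To make the auto mode description concrete, here is a rough sketch of a "retry with escalation, keep the best" loop. It is not the skill's actual implementation (see references/agent_policy.md for that). The callable run_attempt is a hypothetical stand-in for one OCR call at a given correction level, and stopping early once a quality target is reached is an assumption suggested by the --quality-target flag rather than something this document states.

```python
from typing import Callable, Optional, Tuple

Attempt = Tuple[str, float]  # (recognized text, quality score)


def best_of_attempts(
    run_attempt: Callable[[int], Attempt],
    max_attempts: int = 3,
    quality_target: float = 0.72,
) -> Attempt:
    """Try up to max_attempts times with rising correction levels; keep the best.

    run_attempt is hypothetical: it receives the correction level (0, 1, 2, ...)
    and returns the text and quality score for that attempt.
    """
    best: Optional[Attempt] = None
    for level in range(max_attempts):
        text, score = run_attempt(level)
        if best is None or score > best[1]:
            best = (text, score)
        # Assumed early exit: stop once the best score reaches the target.
        if best[1] >= quality_target:
            break
    return best if best is not None else ("", 0.0)
```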
Usage Examples
Example 1: Simple URL OCR
python scripts/ocr_caller.py --file-url "https://example.com/invoice.jpg" --pretty
Example 2: Local File OCR
python scripts/ocr_caller.py --file-path "./document.pdf" --pretty
Example 3: Fast Mode for Clear Images
python scripts/ocr_caller.py --file-url "URL" --mode fast --pretty
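When the commands above are assembled programmatically, a small builder can keep the flag choices in one place. The sketch below uses only the flags shown in this document (--file-url, --file-path, --mode, --pretty). The function name build_ocr_command is hypothetical, and the "exactly one input source" check is an assumption about how the script expects to be called.

```python
from typing import List, Optional

VALID_MODES = ("auto", "fast", "quality")


def build_ocr_command(
    file_url: Optional[str] = None,
    file_path: Optional[str] = None,
    mode: str = "auto",
) -> List[str]:
    """Assemble the argument list for scripts/ocr_caller.py (hypothetical helper)."""
    # Assumption: the script takes exactly one input source per call.
    if (file_url is None) == (file_path is None):
        raise ValueError("provide exactly one of file_url or file_path")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")

    command = ["python", "scripts/ocr_caller.py"]
    if file_url is not None:
        command += ["--file-url", file_url]
    else:
        command += ["--file-path", file_path]
    if mode != "auto":  # auto is the default, so the flag can be omitted
        command += ["--mode", mode]
    command.append("--pretty")
    return command
```

The returned list can be passed to subprocess.run(command, capture_output=True, text=True), which avoids shell quoting problems with URLs and paths that contain spaces.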
Understanding the Output
The script outputs JSON with the following structure:
{
  "ok": true,
  "result": {
    "full_text": "All recognized text here...",
    "pages": [...]
  },
  "quality": {
    "quality_score": 0.85,
    "text_items": 42
  }
}
Key fields to extract:
- result.full_text: Complete text for the user
- quality.quality_score: 0.72+ is good, <0.5 is poor
- error.message: If ok is false, provides error description
First-Time Configuration
If the user has not configured API credentials, run:
python scripts/configure.py
This will prompt for:
- API_URL: Paddle AI Studio endpoint
- PADDLE_OCR_TOKEN: User's access token
Configuration is saved to the .env file and only needs to be done once.
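Before calling the OCR script, it can help to check whether both settings are present. The sketch below is only an illustration: it assumes the .env file holds simple KEY=VALUE lines, which this document does not specify, and the helper name missing_config_keys is hypothetical.

```python
from typing import Dict, List

REQUIRED_KEYS = ("API_URL", "PADDLE_OCR_TOKEN")  # names taken from this document


def missing_config_keys(env_text: str) -> List[str]:
    """Return the required keys that are absent or empty in .env-style text.

    Assumes plain KEY=VALUE lines; the exact format written by
    configure.py is not described here.
    """
    values: Dict[str, str] = {}
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

If the returned list is not empty, running python scripts/configure.py is the next step.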
Error Handling
Configuration missing:
Error: API_URL not configured
→ Run python scripts/configure.py
Authentication failed (403):
error_code: PROVIDER_AUTH_ERROR
→ Token is invalid, reconfigure with correct credentials
Quota exceeded (429):
error_code: PROVIDER_QUOTA_EXCEEDED
→ Daily API quota exhausted, inform user to wait or upgrade
No text detected:
quality_score: 0.0, text_items: 0
→ Image may be blank, corrupted, or contain no text
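The error cases above can be gathered into a small lookup. This is a sketch under assumptions: the two error codes and their remedies come from this document, but where error_code sits in the JSON response is not stated here, so the helper takes it as a plain argument. The name advice_for is hypothetical.

```python
from typing import Optional

ERROR_ADVICE = {
    "PROVIDER_AUTH_ERROR": "Token is invalid; reconfigure with correct credentials.",
    "PROVIDER_QUOTA_EXCEEDED": "Daily API quota exhausted; wait or upgrade.",
}


def advice_for(error_code: Optional[str] = None, text_items: Optional[int] = None) -> str:
    """Map an error code or an empty result to a message for the user."""
    if error_code in ERROR_ADVICE:
        return ERROR_ADVICE[error_code]
    if error_code is not None:
        # Codes not listed in this document are reported as-is.
        return f"Unrecognized error code: {error_code}"
    if text_items == 0:
        return "No text detected: the image may be blank, corrupted, or contain no text."
    return ""
```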
Quality Interpretation
When presenting results to users, consider the quality score:
| Quality Score | Explanation to User |
|---|---|
| 0.90 - 1.00 | Excellent recognition quality |
| 0.72 - 0.89 | Good recognition quality (default target) |
| 0.50 - 0.71 | Fair recognition quality, may have some errors |
| 0.00 - 0.49 | Poor recognition quality or no text detected |
If the quality score is below 0.5, tell the user and suggest:
- Try using --mode quality for better accuracy
- Check if the image is clear and contains text
- Provide a higher resolution image if possible
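The quality table above translates directly into a threshold lookup. In this sketch the lower bound of each row is treated as the cutoff, which is an assumption for scores that fall between rows (for example 0.895). The function name describe_quality is hypothetical.

```python
def describe_quality(score: float) -> str:
    """Map a quality score in [0.0, 1.0] to the explanation from the table."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"quality score out of range: {score}")
    if score >= 0.90:
        return "Excellent recognition quality"
    if score >= 0.72:
        return "Good recognition quality"
    if score >= 0.50:
        return "Fair recognition quality, may have some errors"
    return "Poor recognition quality or no text detected"
```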
Advanced Options
Use only when explicitly requested by the user:
Include raw provider response (for debugging):
python scripts/ocr_caller.py --file-url "URL" --return-raw-provider
Request visualization (show detection regions):
python scripts/ocr_caller.py --file-url "URL" --visualize
Adjust auto mode parameters:
python scripts/ocr_caller.py --file-url "URL" \
--max-attempts 2 \
--quality-target 0.80 \
--budget-ms 20000
Reference Documentation
For in-depth understanding of the OCR system, refer to:
- references/agent_policy.md - Auto mode strategy and quality scoring
- references/normalized_schema.md - Complete output schema specification
- references/provider_api.md - Provider API contract details
Load these reference documents into context when:
- Debugging complex issues
- The user asks about the quality scoring algorithm
- You need to understand the adaptive retry mechanism
- Customizing auto mode parameters
Testing the Skill
To verify the skill is working properly:
python scripts/smoke_test.py
This tests configuration and API connectivity.