feishu-doc-reader
Feishu Document Reader
This skill enables reading and extracting content from Feishu (Lark) documents using the official Feishu Open API.
Configuration
Set Up the Skill
- Create the configuration file at ./reference/feishu_config.json with your Feishu app credentials:
{
"app_id": "your_feishu_app_id_here",
"app_secret": "your_feishu_app_secret_here"
}
- Make sure the scripts are executable:
chmod +x scripts/read_doc.sh
chmod +x scripts/get_blocks.sh
Security Note: The configuration file should be kept secure and not committed to version control. Consider using proper file permissions (chmod 600 ./reference/feishu_config.json).
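As a rough illustration of the setup above, the following sketch loads the config file, checks that both keys are present, and warns when the file is readable by other users. The function name load_feishu_config is hypothetical and is not part of the skill's scripts; only the file path and the two key names come from this document.

```python
import json
import os
import stat
from pathlib import Path

REQUIRED_KEYS = ("app_id", "app_secret")


def load_feishu_config(path="./reference/feishu_config.json"):
    """Load credentials from the JSON config file and sanity-check them.

    Hypothetical helper for illustration; the skill's own scripts may differ.
    """
    config_path = Path(path)
    with config_path.open(encoding="utf-8") as fh:
        config = json.load(fh)

    # Both keys must be present and non-empty.
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise ValueError(f"Missing config keys: {', '.join(missing)}")

    # On POSIX systems, flag files accessible by group or others (not 600).
    if os.name == "posix":
        mode = stat.S_IMODE(config_path.stat().st_mode)
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            print(f"Warning: {config_path} has mode {oct(mode)}; consider chmod 600")

    return config
```

Raising on a missing key keeps a half-filled config from reaching the API call, where the failure would be harder to diagnose.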
Usage
Basic Document Reading
To read a Feishu document, you need the document token (found in the URL: https://example.feishu.cn/docx/DOC_TOKEN).
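If you only have the URL, a small helper can pull the token out of it. This is a minimal sketch that assumes the URL follows the https://<host>/<type>/<token> shape shown above; extract_doc_token is a hypothetical name, not one of the skill's scripts.

```python
from urllib.parse import urlparse


def extract_doc_token(url):
    """Return (doc_type, token) from a URL shaped like https://<host>/<type>/<token>."""
    # urlparse drops the query string and fragment, so only path segments remain.
    parts = [segment for segment in urlparse(url).path.split("/") if segment]
    if len(parts) < 2:
        raise ValueError(f"No document token found in URL: {url}")
    # Assumption: the last two path segments are the type and the token.
    doc_type, token = parts[-2], parts[-1]
    return doc_type, token


print(extract_doc_token("https://example.feishu.cn/docx/DOC_TOKEN"))
# ('docx', 'DOC_TOKEN')
```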
Using the shell script (recommended):
# Make sure credentials are configured first (config file or environment variables)
./scripts/read_doc.sh "your_doc_token_here"
# Or specify document type explicitly
./scripts/read_doc.sh "docx_token" "doc"
./scripts/read_doc.sh "sheet_token" "sheet"
Get Detailed Document Blocks (NEW)
For complete document structure with all blocks, use the dedicated blocks script:
# Get full document blocks structure
./scripts/get_blocks.sh "docx_AbCdEfGhIjKlMnOpQrStUv"
# Get specific block by ID
./scripts/get_blocks.sh "docx_token" "block_id"
# Get blocks with children
./scripts/get_blocks.sh "docx_token" "" "true"
Using Python directly for blocks:
python scripts/get_feishu_doc_blocks.py --doc-token "your_doc_token_here"
python scripts/get_feishu_doc_blocks.py --doc-token "docx_token" --block-id "block_id"
python scripts/get_feishu_doc_blocks.py --doc-token "docx_token" --include-children
Supported Document Types
- Docx documents (new Feishu docs): Full content extraction with blocks, metadata, and structure
- Doc documents (legacy): Basic metadata and limited content
- Sheets: Full spreadsheet data extraction with sheet navigation
- Slides: Basic metadata (content extraction requires additional permissions)
Features
Enhanced Content Extraction
- Structured output: Clean JSON with document metadata, content blocks, and hierarchy
- Complete blocks access: Full access to all document blocks including text, tables, images, headings, lists, etc.
- Block hierarchy: Proper parent-child relationships between blocks
- Text extraction: Automatic text extraction from complex block structures
- Table support: Proper table parsing with row/column structure
- Image handling: Image URLs and metadata extraction
- Link resolution: Internal and external link extraction
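To make the block hierarchy and text extraction features concrete, here is a simplified sketch that rebuilds parent-child relationships from a flat block list and collects text depth-first. The dictionary shape (block_id, parent_id, and a plain text field) is an assumption made for illustration; the JSON the scripts actually emit may be structured differently.

```python
def build_block_tree(blocks):
    """Group a flat block list into {parent_id: [child blocks]}, preserving order."""
    children = {}
    for block in blocks:
        children.setdefault(block.get("parent_id"), []).append(block)
    return children


def extract_text(blocks, root_id):
    """Collect text depth-first, starting from the children of root_id.

    Assumes the parent links form a tree (no cycles).
    """
    children = build_block_tree(blocks)
    lines = []

    def walk(parent_id, depth):
        for block in children.get(parent_id, []):
            text = block.get("text", "")
            if text:
                # Indent by nesting depth so the hierarchy stays visible.
                lines.append("  " * depth + text)
            walk(block["block_id"], depth + 1)

    walk(root_id, 0)
    return "\n".join(lines)


# Hypothetical sample data, not real API output.
sample = [
    {"block_id": "b1", "parent_id": "root", "text": "Title"},
    {"block_id": "b2", "parent_id": "b1", "text": "First item"},
    {"block_id": "b3", "parent_id": "root", "text": ""},
    {"block_id": "b4", "parent_id": "b3", "text": "Nested under an empty block"},
]
print(extract_text(sample, "root"))
```

Blocks without text (such as containers) are skipped in the output, but their children are still visited.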
Block Types Supported
- text: Plain text and rich text content
- heading1/2/3: Document headings with proper hierarchy
- bullet/ordered: List items with nesting support
- table: Complete table structures with cells and formatting
- image: Image blocks with tokens and metadata
- quote: Block quotes
- code: Code blocks with language detection
- equation: Mathematical equations
- divider: Horizontal dividers
- page: Page breaks (in multi-page documents)
Error Handling & Diagnostics
- Detailed error messages: Clear explanations for common issues
- Permission validation: Checks required permissions before making requests
- Token validation: Validates document tokens before processing
- Retry logic: Automatic retries for transient network errors
- Rate limiting: Handles API rate limits gracefully
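The retry behaviour listed above can be pictured roughly as follows. This is a generic exponential-backoff sketch, not the skill's actual implementation; TransientError and call_with_retry are hypothetical names, and the attempt count and delays are illustrative defaults.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (timeouts, rate limits)."""


def call_with_retry(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # Wait 1x, 2x, 4x ... the base delay between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing sleep as a parameter keeps the helper easy to test without real waiting.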
Security Features
- Secure credential storage: Supports both environment variables and secure file storage
- No credential logging: Credentials never appear in logs or output
- Minimal permissions: Uses only required API permissions
- Access token caching: Efficient token reuse to minimize API calls
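Access token caching might look something like the sketch below. It assumes a caller-supplied fetch_token callable that returns a (token, ttl_seconds) pair; that callable, the TokenCache class, and the 60-second safety margin are all hypothetical and are not taken from the skill's code.

```python
import time


class TokenCache:
    """Cache an access token until shortly before it expires."""

    def __init__(self, fetch_token, clock=time.monotonic, safety_margin=60):
        self._fetch_token = fetch_token  # callable returning (token, ttl_seconds)
        self._clock = clock
        self._safety_margin = safety_margin
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return the cached token, fetching a new one only when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            token, ttl_seconds = self._fetch_token()
            self._token = token
            # Refresh a little early so a token never expires mid-request.
            self._expires_at = now + ttl_seconds - self._safety_margin
        return self._token
```

With a 2-hour token lifetime (ttl_seconds of 7200), this cache would request a new token roughly once every 119 minutes instead of once per call.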
Command Line Options
Main Document Reader
# Python script options
python scripts/read_feishu_doc.py --help
# Shell script usage
./scripts/read_doc.sh <doc_token> [doc|sheet|slide]
Blocks Reader (NEW)
# Get full document blocks
./scripts/get_blocks.sh <doc_token>
# Get specific block
./scripts/get_blocks.sh <doc_token> <block_id>
# Include children blocks
./scripts/get_blocks.sh <doc_token> "" true
# Python options
python scripts/get_feishu_doc_blocks.py --help
API Permissions Required
Your Feishu app needs the following permissions:
- docx:document:readonly - Read document content
- doc:document:readonly - Read legacy document content
- sheets:spreadsheet:readonly - Read spreadsheet content
Error Handling
Common errors and solutions:
- 403 Forbidden: Check app permissions and document sharing settings
- 404 Not Found: Verify document token is correct and document exists
- Token expired: Access tokens are valid for 2 hours; refresh them as needed
- App ID/Secret invalid: Double-check your credentials in Feishu Open Platform
- Insufficient permissions: Ensure your app has the required API permissions
- 99991663: Application doesn't have permission to access the document
- 99991664: Document doesn't exist or has been deleted
- 99991668: Token expired, need to refresh
Examples
Extract document with full structure
# Read document
./scripts/read_doc.sh "docx_AbCdEfGhIjKlMnOpQrStUv"
Get complete document blocks (NEW)
# Get all blocks with full structure
./scripts/get_blocks.sh "docx_AbCdEfGhIjKlMnOpQrStUv"
# Get specific block details
./scripts/get_blocks.sh "docx_AbCdEfGhIjKlMnOpQrStUv" "blk_xxxxxxxxxxxxxx"
Process spreadsheet data
./scripts/read_doc.sh "sheet_XyZ123AbCdEfGhIj" "sheet"
Extract only text content (Python script)
python scripts/read_feishu_doc.py --doc-token "docx_token" --extract-text-only
Security Notes
- Never commit credentials: Keep app secrets out of version control
- Use minimal permissions: Only request permissions your use case requires
- Secure file permissions: Set proper file permissions on secret files (chmod 600)
- Environment isolation: Use separate apps for development and production
- Audit access: Regularly review which documents your app can access
Troubleshooting
Authentication Issues
- Verify your App ID and App Secret in Feishu Open Platform
- Ensure the app has been published with required permissions
- Check that environment variables or config files are properly set
- Test with the test_auth.py script to verify credentials
Document Access Issues
- Ensure the document is shared with your app or in an accessible space
- Verify the document token format (should start with docx_, doc_, or sheet_)
- Check if the document requires additional sharing permissions
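A quick pre-flight check on the token prefix can catch copy-paste mistakes before any request is made. The sketch below relies only on the three prefixes listed above; guess_doc_type is a hypothetical helper, and real tokens may not always follow this convention.

```python
# Prefixes as listed in this section, mapped to the type names used by read_doc.sh.
KNOWN_PREFIXES = {"docx_": "docx", "doc_": "doc", "sheet_": "sheet"}


def guess_doc_type(token):
    """Return the document type implied by the token prefix, or None if unknown."""
    for prefix, doc_type in KNOWN_PREFIXES.items():
        if token.startswith(prefix):
            return doc_type
    return None


print(guess_doc_type("docx_AbCdEfGhIjKlMnOpQrStUv"))  # docx
print(guess_doc_type("not-a-token"))                  # None
```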
Network Issues
- Ensure your server can reach open.feishu.cn
- Check firewall rules if running in restricted environments
- The script includes retry logic for transient network failures
Blocks-Specific Issues
- Empty blocks response: Document might be empty or have no accessible blocks
- Missing block types: Some block types require additional permissions
- Incomplete hierarchy: Use the --include-children flag for a complete block tree