byted-tos-doc-process
Bytedance TOS Document Process Skill
This skill provides document processing functions for files in Bytedance's TOS via the doc-preview feature, implemented by generating pre-signed URLs with the Volcengine TOS SDK.
Note: This approach is necessary because the SDK's get_object method does not directly support doc_* keyword arguments. All document processing parameters must be passed as query parameters in a pre-signed URL.
Quick Start
1. Client Initialization
import os
import tos
from tos.enum import HttpMethodType
from urllib.request import urlopen

def create_client() -> tos.TosClientV2:
    """Initializes a TosClientV2 from environment variables."""
    try:
        # ... (full implementation in scripts)
        return tos.TosClientV2(
            ak=os.getenv('TOS_ACCESS_KEY'),
            sk=os.getenv('TOS_SECRET_KEY'),
            endpoint=os.getenv('TOS_ENDPOINT'),
            region=os.getenv('TOS_REGION'),
            security_token=os.getenv('TOS_SECURITY_TOKEN'),
        )
    except Exception as e:
        print(f"Error initializing client: {e}")
        return None

client = create_client()
2. Basic Workflow (Pre-signed URL)
# (Assumes 'client' is initialized and 'bucket_name', 'object_key' are set)

# 1. Preview document as a PDF and save locally
try:
    # Build query params for doc-preview
    pdf_params = {
        "x-tos-process": "doc-preview",
        "x-tos-doc-dst-type": "pdf"
    }
    presigned_pdf = client.pre_signed_url(
        HttpMethodType.Http_Method_Get,
        bucket_name,
        object_key,
        query=pdf_params
    )
    # Download the content from the pre-signed URL
    with urlopen(presigned_pdf.signed_url) as response, open("local_preview.pdf", "wb") as f_out:
        f_out.write(response.read())
    print("PDF preview saved to local_preview.pdf")
except Exception as e:
    print(f"Error converting to PDF: {e}")

# 2. Preview page 3 as a PNG image
try:
    png_params = {
        "x-tos-process": "doc-preview",
        "x-tos-doc-dst-type": "png",
        "x-tos-doc-page": "3",
        "x-tos-doc-image-dpi": "150"
    }
    presigned_png = client.pre_signed_url(
        HttpMethodType.Http_Method_Get,
        bucket_name,
        object_key,
        query=png_params
    )
    with urlopen(presigned_png.signed_url) as response, open("page_3.png", "wb") as f_out:
        f_out.write(response.read())
    print("Page 3 saved as page_3.png")
except Exception as e:
    print(f"Error converting to PNG: {e}")

# 3. Get total page count from response headers
try:
    presigned_head = client.pre_signed_url(
        HttpMethodType.Http_Method_Get,
        bucket_name,
        object_key,
        query={"x-tos-process": "doc-preview", "x-tos-doc-dst-type": "pdf"}
    )
    with urlopen(presigned_head.signed_url) as response:
        total_pages = response.headers.get("x-tos-total-page")
    print(f"Document has {total_pages} pages.")
except Exception as e:
    print(f"Error getting page count: {e}")
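The page-count step above prints whatever `headers.get` returns, which is a string, or `None` when the header is absent. A minimal sketch of a helper that turns it into an integer follows. The name `parse_total_pages` is hypothetical and not part of the skill's scripts, and the sketch assumes the header holds a plain decimal number when present.

```python
from typing import Mapping, Optional


def parse_total_pages(headers: Mapping[str, str]) -> Optional[int]:
    """Return the page count from response headers, or None if unavailable.

    Hypothetical helper: assumes x-tos-total-page, when present,
    holds a decimal integer.
    """
    raw = headers.get("x-tos-total-page")
    if raw is None:
        return None
    try:
        return int(raw.strip())
    except ValueError:
        # Header present but not numeric; treat the count as unknown.
        return None
```

Passing `response.headers` from `urlopen` should work directly, since that object supports `.get` with case-insensitive header names; a plain `dict` is matched case-sensitively.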
Core Operations
All document processing is achieved by generating a pre-signed URL with x-tos-process=doc-preview and other x-tos-doc-* parameters in the query string.
1. Convert to PDF (x-tos-doc-dst-type='pdf')
Converts an entire document into a single PDF file.
# See Quick Start example
2. Convert to Image (x-tos-doc-dst-type='png' or 'jpg')
Converts a specific page of a document into an image.
# See Quick Start example
# Use query params like "x-tos-doc-page", "x-tos-doc-image-dpi", etc.
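Because every parameter value is sent as a query string, it can help to build the dictionary in one place. The following is a sketch only: `build_image_params` is a hypothetical helper, and it uses just the query keys that appear in the Quick Start example.

```python
from typing import Dict, Optional


def build_image_params(page: int, dst_type: str = "png",
                       dpi: Optional[int] = None) -> Dict[str, str]:
    """Build doc-preview query parameters for rendering one page as an image."""
    if dst_type not in ("png", "jpg"):
        raise ValueError(f"unsupported image type: {dst_type!r}")
    params = {
        "x-tos-process": "doc-preview",
        "x-tos-doc-dst-type": dst_type,
        "x-tos-doc-page": str(page),  # values are passed as strings
    }
    if dpi is not None:
        params["x-tos-doc-image-dpi"] = str(dpi)
    return params
```

The result can be passed as the `query` argument of `client.pre_signed_url`, for example `query=build_image_params(3, dpi=150)` to reproduce the page 3 request from the Quick Start.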
3. Convert to HTML (x-tos-doc-dst-type='html')
Fetches a temporary HTML page containing a token for the final preview URL. This requires a second step to parse the HTML and decode the token.
# Step 1: Get the HTML content via a pre-signed URL
html_params = {"x-tos-process": "doc-preview", "x-tos-doc-dst-type": "html"}
presigned_html = client.pre_signed_url(HttpMethodType.Http_Method_Get, bucket_name, object_key, query=html_params)
with urlopen(presigned_html.signed_url) as response:
    html_content = response.read().decode('utf-8')

# Step 2: Parse and decode (see scripts/doc_preview_html_url.py for full logic)
# ... logic to extract and base64-decode the token ...
# final_url = decode_preview_url(token)
4. Batch Export Pages (image-mode=1)
Exports a range of pages as images directly to a TOS bucket.
# Use query params: "image-mode", "start-page", "end-page", "x-tos-save-bucket", "x-tos-save-object"
batch_params = {
    "x-tos-process": "doc-preview",
    "x-tos-doc-dst-type": "jpg",
    "image-mode": "1",
    "start-page": "2",
    "end-page": "5",
    "x-tos-save-bucket": "output-bucket",
    "x-tos-save-object": "exported/page_{Page}.jpg"  # {Page} is a placeholder
}
presigned_batch = client.pre_signed_url(HttpMethodType.Http_Method_Get, bucket_name, object_key, query=batch_params)
# The response body (from urlopen) contains JSON metadata about the batch job
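The snippet above stops at generating the URL. A minimal sketch of the fetch step is shown here, assuming only that the body is UTF-8 JSON as the comment states. The helper name `fetch_json` is hypothetical, and the shape of the returned metadata is not described in this document, so the sketch returns the parsed value as-is.

```python
import json
from typing import Any
from urllib.request import urlopen


def fetch_json(signed_url: str, timeout: float = 60.0) -> Any:
    """GET a pre-signed URL and parse the response body as JSON."""
    with urlopen(signed_url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the variables above, this would be called as `fetch_json(presigned_batch.signed_url)`.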
Authorization
Authentication is handled by tos.TosClientV2. Provide credentials via environment variables.
Required Environment Variables
- TOS_ACCESS_KEY
- TOS_SECRET_KEY
- TOS_ENDPOINT
- TOS_REGION
Optional for STS
TOS_SECURITY_TOKEN
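A missing variable is easier to diagnose before the client is constructed than from a later request failure. The following is a sketch of such a check; `REQUIRED_TOS_ENV` and `missing_tos_env` are hypothetical names, not part of the skill's scripts.

```python
import os
from typing import List, Mapping

REQUIRED_TOS_ENV = ("TOS_ACCESS_KEY", "TOS_SECRET_KEY", "TOS_ENDPOINT", "TOS_REGION")


def missing_tos_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_TOS_ENV if not env.get(name)]
```

Calling `missing_tos_env()` before `create_client()` and stopping when the list is non-empty gives a clearer message than a failed request. `TOS_SECURITY_TOKEN` is left out of the check because it is optional.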
Best Practices
- Error Handling: Always wrap HTTP requests in try...except blocks for HTTPError and URLError.
- Parameter Reference: Refer to REFERENCE.md for a mapping of doc_preview_params.py arguments to x-tos-* query keys, and to the official TOS documentation for authoritative details.
- HTML Preview: Be aware of the two-step process and the custom domain requirement for recent buckets.
- Total Pages Header: The x-tos-total-page header is a convenient way to get the page count.
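The error-handling advice can be sketched as follows. `download` is a hypothetical helper that uses only the standard library. `HTTPError` is a subclass of `URLError`, so it has to be caught first.

```python
from typing import Optional
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def download(signed_url: str, dest_path: str, timeout: float = 60.0) -> Optional[str]:
    """Save the body of a pre-signed URL to dest_path.

    Returns None on success, or a short error description on failure.
    """
    try:
        with urlopen(signed_url, timeout=timeout) as response:
            body = response.read()
    except HTTPError as e:
        # The server answered, but with an error status.
        return f"HTTP {e.code}: {e.reason}"
    except URLError as e:
        # The request failed before a response arrived (DNS, refused connection, ...).
        return f"connection failed: {e.reason}"
    # Write only after a successful read, so a failure leaves no partial file.
    with open(dest_path, "wb") as f_out:
        f_out.write(body)
    return None
```

Reading the whole body into memory keeps the sketch short; for large documents, copying in chunks would be the safer choice.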
Additional Resources
- For detailed parameters, see REFERENCE.md.
- For end-to-end examples, see WORKFLOWS.md.
- For executable Python examples, see the scripts/ directory.
- For the definitive list of all processing parameters, always consult the official Volcengine TOS Document Preview documentation.