# byted-byteplus-vod-video-enhancement

VOD video enhancement
Uploads video/audio to a BytePlus VOD space (from a local file or a public URL) and returns a vid://vxxxx reference. Additionally provides AI-based comprehensive quality restoration that removes compression artifacts, noise, and scratches from ingested videos, improving overall clarity and color rendition.
## Prerequisites
- Environment variables (required; can be configured via a `.env` file in the working directory, which the scripts load automatically):
  - `BYTEPLUS_ACCESSKEY`: BytePlus Access Key
  - `BYTEPLUS_SECRETKEY`: BytePlus Secret Key
  - `VOD_SPACE_NAME`: VOD space name
- Execution: the examples use `uv run python ...`; if the host environment can run Python directly, `python scripts/...` also works.
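The scripts' actual `.env` loading is not shown in this document; as a minimal sketch of what "load it automatically" amounts to (a real implementation would more likely use the `python-dotenv` package), the hypothetical helper below parses `KEY=VALUE` lines without overriding variables that are already set:

```python
import os

def load_dotenv_minimal(path: str = ".env") -> None:
    """Parse KEY=VALUE lines from a .env file into os.environ.

    Blank lines and #-comments are skipped; quotes around values are
    stripped; existing environment variables are never overwritten.
    """
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip('"').strip("'")
            os.environ.setdefault(key, value)
```

Because values already present in the environment win, exporting `VOD_SPACE_NAME` in the shell takes precedence over the `.env` file.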
## Workflow Overview
Upload pipeline (local file):

```
[S1_APPLY]  ApplyUploadInfo → returns TOS upload address + SessionKey
[S2_TOS]    PUT file to TOS (direct or chunked)
[S3_COMMIT] CommitUploadInfo → returns Vid
Output: { Vid, Source, PlayURL, FileName, SpaceName, SourceUrl }
```
Upload pipeline (URL):

```
[S1_UPLOAD] Submit URL upload job (UploadMediaByUrl) → returns JobId
[S2_POLL]   Poll QueryUploadTaskInfo → returns Vid
Output: { Vid, Source, PlayURL, FileName, SpaceName, SourceUrl, JobId }
```
Quality restoration pipeline:

```
[S3_ENHANCE] Submit restoration job (StartExecution/enhanceVideo) → returns RunId
[S4_POLL]    Poll GetExecution → returns the restored file
Output: { Status, SpaceName, VideoUrls[{ FileId, DirectUrl, Source }] }
```
## Quick Self-Check (recommended)
Before running any script, confirm the following (skip unrelated Python/uv version checks):

- `.env` or the environment contains `BYTEPLUS_ACCESSKEY` + `BYTEPLUS_SECRETKEY`
- `VOD_SPACE_NAME` is set
Once verified, pick the corresponding pipeline based on user intent:
| User intent | Pipeline | Entry script |
|---|---|---|
| Upload video to VOD | Upload pipeline | `scripts/upload.py` |
| Quality restoration / denoise / remove compression artifacts | Quality restoration pipeline | `scripts/quality_enhance.py` |
## S1_UPLOAD & S2_POLL: Upload and Obtain Vid
### Calling Convention
Run from the Skill root directory (`byted-byteplus-vod-video-enhancement/`):
```shell
# Local file upload (synchronous; returns Vid when complete)
uv run python scripts/upload.py "/path/to/video.mp4" [space_name]

# URL upload (automatically polls until a Vid is returned)
uv run python scripts/upload.py "https://example.com/video.mp4" [space_name]

# Example: specifying the space
uv run python scripts/upload.py "https://example.com/sample.mp4" my_space
```
- First argument: either a local file path or a public `http://`/`https://` link. The script auto-detects which mode to use.
- Second argument (optional): the VOD space name; when omitted, it is read from the environment variable `VOD_SPACE_NAME`.
- The file/URL must carry a file extension (such as `.mp4`, `.mov`, `.mp3`); otherwise an error is raised.
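The detection and validation rules above can be sketched as a small pure function. This is an illustration of the documented behaviour, not the script's actual code:

```python
import os
from urllib.parse import urlparse

def detect_input(arg: str) -> tuple:
    """Classify the first CLI argument as a URL or a local file path,
    and require a file extension, per the rules described above.

    Returns ("url" | "file", extension); raises ValueError when the
    extension is missing.
    """
    parsed = urlparse(arg)
    if parsed.scheme in ("http", "https"):
        kind, name = "url", parsed.path  # check the extension on the URL path
    else:
        kind, name = "file", arg
    ext = os.path.splitext(name)[1].lower()
    if not ext:
        raise ValueError("input must carry a file extension such as .mp4/.mov/.mp3")
    return kind, ext
```

For example, `detect_input("https://example.com/video.mp4")` yields `("url", ".mp4")`, while a bare `https://example.com/video` raises.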
### Upload Flow
Local file upload (synchronous, three-step):

1. Call `ApplyUploadInfo` (API Version: 2023-01-01) to obtain the TOS upload address, authentication token, and SessionKey.
2. PUT the file to TOS (direct upload for files < 20 MiB, chunked upload otherwise).
3. Call `CommitUploadInfo` (API Version: 2023-01-01) with the SessionKey; returns the `Vid`.
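The direct-vs-chunked decision in step 2 follows a simple size threshold. As a sketch of that rule (the script's internal helper names are not documented here):

```python
DIRECT_UPLOAD_LIMIT = 20 * 1024 * 1024  # 20 MiB, per the rule in step 2

def choose_upload_mode(size_bytes: int) -> str:
    """Pick the TOS upload strategy from the file size:
    a single direct PUT below 20 MiB, chunked (multipart) otherwise."""
    return "direct" if size_bytes < DIRECT_UPLOAD_LIMIT else "chunked"
```

Note that the boundary is strict: a file of exactly 20 MiB is uploaded chunked.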
URL upload (two-phase asynchronous):

1. Call `UploadMediaByUrl` (API Version: 2023-01-01) to submit the pull job; returns a `JobId`.
2. Poll `QueryUploadTaskInfo` until the job completes, with a maximum wait of 30 minutes (360 × 5 s).
3. Once the job is complete, return the `Vid`.
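The polling budget (360 attempts × 5 s = 30 minutes) used by both pipelines can be expressed as a generic loop. This is a sketch of the documented behaviour; `check` stands in for whatever call (e.g. `QueryUploadTaskInfo` or `GetExecution`) reports job state:

```python
import time

def poll(check, interval_s: float = 5, max_attempts: int = 360):
    """Poll until `check()` returns a non-None result or the attempt
    budget is exhausted (default: 360 attempts x 5 s = 30 minutes).

    Returns the result on success, or None on timeout, at which point
    the scripts emit the resume_hint JSON shown under "Timeout Handling".
    """
    for _ in range(max_attempts):
        result = check()
        if result is not None:
            return result
        time.sleep(interval_s)
    return None
```

The interval and attempt count correspond to the `VOD_POLL_INTERVAL` and `VOD_POLL_MAX` environment variables listed later in this document.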
### Output Format
On success, a JSON line is printed to stdout:
```json
{
  "Vid": "v0d123abc",
  "Source": "vid://v0d123abc",
  "PlayURL": "https://example.cdn.com/xxx.m3u8",
  "PosterUri": "",
  "FileName": "uuid-filename.mp4",
  "SpaceName": "my_space",
  "SourceUrl": "https://example.com/video.mp4",
  "JobId": "job-xxx"
}
```
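A host agent consuming this output can pull the `Source` reference out of the script's stdout. A minimal sketch, assuming the JSON result is the last non-empty line (earlier lines, if any, are treated as log output):

```python
import json

def extract_source(stdout_text: str) -> str:
    """Return the `Source` field from the JSON line printed by
    upload.py, taking the last non-empty line of stdout."""
    line = [ln for ln in stdout_text.splitlines() if ln.strip()][-1]
    return json.loads(line)["Source"]
```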
- `Source`: a `vid://`-formatted reference that can be passed directly to follow-up skills such as `byted-mediakit`.
- The host agent should save the `Source` field for use in subsequent processing steps.
### Timeout Handling
If polling times out (30 minutes), the output is:
```json
{
  "error": "Polling timed out (360 attempts × 5s); the URL pull upload is still processing",
  "resume_hint": {
    "description": "The URL upload has not finished yet; retry with the command below",
    "command": "uv run python scripts/upload.py \"<original URL>\" [space_name]"
  },
  "JobIds": "job-xxx",
  "State": "running"
}
```
## S3_ENHANCE & S4_POLL: AI Comprehensive Quality Restoration
### Calling Convention
Run from the Skill root directory (`byted-byteplus-vod-video-enhancement/`):
```shell
# Submit after the user has explicitly selected both config and repair_style
uv run python scripts/quality_enhance.py '{"type":"Vid","video":"v0310abc","config":"common","repair_style":1}'

# Example: a vid:// prefix is also accepted (the script strips it automatically)
uv run python scripts/quality_enhance.py '{"type":"Vid","video":"vid://v0d225gxxx","config":"common","repair_style":1}' production_space

# Pass parameters via @file.json (recommended; avoids shell escaping issues)
uv run python scripts/quality_enhance.py @params.json

# Resume polling after a timeout
uv run python scripts/poll_execution.py '<RunId>' [space_name]
```
### Parameter Reference
| Parameter | Type | Required | Description |
|---|---|---|---|
| `type` | string | ✅ | `Vid` (video ID) or `DirectUrl` (VOD storage FileName) |
| `video` | string | ✅ | The video Vid or FileName (a `vid://` prefix is accepted and automatically stripped) |
| `config` | string | ✅ | VolcMoeEnhanceParam Config; one of `common`, `ugc`, `short_series`, `aigc`, `old_film`. If the user explicitly asks for defaults, use `common`. |
| `repair_style` | integer | ✅ | VolcMoeEnhanceParam VideoStrategy.RepairStyle; `1` = Standard, `2` = Pro. If the user explicitly asks for defaults, use `1`. |
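Assembling the JSON argument for `quality_enhance.py`, including the documented `vid://` stripping and value validation, can be sketched as follows (the script performs its own validation; this helper is illustrative):

```python
import json

VALID_CONFIGS = ("common", "ugc", "short_series", "aigc", "old_film")

def build_enhance_params(video: str, config: str, repair_style: int) -> str:
    """Build the JSON string passed to quality_enhance.py.

    A vid:// prefix on `video` is stripped, mirroring the script's
    documented behaviour; config and repair_style are range-checked.
    """
    if config not in VALID_CONFIGS:
        raise ValueError(f"unknown config: {config}")
    if repair_style not in (1, 2):
        raise ValueError(f"repair_style must be 1 or 2, got {repair_style}")
    vid = video[len("vid://"):] if video.startswith("vid://") else video
    return json.dumps({"type": "Vid", "video": vid,
                       "config": config, "repair_style": repair_style})
```

Writing the returned string to `params.json` and invoking the script with `@params.json` sidesteps shell quoting entirely.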
Before quality restoration, you MUST ask the user to choose both required enhancement parameters if either `config` or `repair_style` is missing. Do not silently use defaults. Only use `config=common` and `repair_style=1` when the user explicitly asks for default/recommended settings. When asking the user, use plain product language only; do not show internal parameter names or values such as `config=...`, `repair_style=...`, `common`, or `short_series` in the question text or option labels.
Suggested prompt:

```
Video enhancement may take some time. Choosing the right template usually gives better results.

What type of video is it?
- General video
- Short video / UGC
- Short drama / short series
- AI-generated content
- Old film / classic footage that needs restoration

Which video enhancement tier would you like to use?
- Standard: balanced visual improvement and processing speed
- Pro: cinematic-grade restoration with longer processing time; allowlist access may be required
```
If the user asks for a default recommendation, use `config=common` and `repair_style=1`. Otherwise, wait for the user's selections before running `scripts/quality_enhance.py`.
Internal mapping: General video -> `config=common`; Short video / UGC -> `config=ugc`; Short drama / short series -> `config=short_series`; AI-generated content -> `config=aigc`; Old film / classic footage -> `config=old_film`; Standard -> `repair_style=1`; Pro -> `repair_style=2`. Do not expose these parameter names or values in the question unless the user asks for implementation details.
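The mapping above is a straightforward lookup from the product-language option labels to the API values; one way to encode it is:

```python
# Product-language answer -> VolcMoeEnhanceParam Config value
CONFIG_BY_ANSWER = {
    "General video": "common",
    "Short video / UGC": "ugc",
    "Short drama / short series": "short_series",
    "AI-generated content": "aigc",
    "Old film / classic footage": "old_film",
}

# Tier label -> VideoStrategy.RepairStyle value
REPAIR_STYLE_BY_TIER = {"Standard": 1, "Pro": 2}
```

Keeping the mapping in one place makes it easy to keep the user-facing labels and the internal values from leaking into each other.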
Special handling for Pro: if the user chooses `repair_style=2` and the `StartExecution`/`GetExecution` response returns HTTP status 403, or any error message contains `Permission denied`, explain that Pro is only available to users on the allowlist. Ask the user to submit a ticket to apply: https://console.byteplus.com/workorder/create
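The allowlist check described above reduces to a small predicate. A sketch (the status code and message would come from the actual API response):

```python
ALLOWLIST_TICKET_URL = "https://console.byteplus.com/workorder/create"

def is_pro_allowlist_error(status_code: int, message: str) -> bool:
    """Detect the Pro-tier allowlist failure: HTTP 403 from
    StartExecution/GetExecution, or an error message containing
    "Permission denied"."""
    return status_code == 403 or "Permission denied" in (message or "")
```

When this returns True, the agent should explain the allowlist restriction and point the user at `ALLOWLIST_TICKET_URL` rather than retrying.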
### Output Format
On success, a JSON line is printed to stdout:
```json
{
  "Status": "Success",
  "SpaceName": "my_space",
  "VideoUrls": [
    {
      "FileId": "xxx",
      "DirectUrl": "path/to/output.mp4",
      "Source": "directurl://path/to/output.mp4",
      "Url": "https://example.cdn.com/path/to/output.mp4?auth_key=..."
    }
  ],
  "AudioUrls": [],
  "Texts": []
}
```
- `VideoUrls[0].Url`: a directly accessible/downloadable URL (the script signs it based on the space's domain/auth rules).
- `VideoUrls[0].Source` (`directurl://...`) can be passed directly to downstream skills.
### Timeout Handling
If polling times out (30 minutes), the output is:
```json
{
  "error": "Polling timed out (360 attempts × 5s); the job is still processing",
  "resume_hint": {
    "description": "The job has not finished yet; resume polling with the command below",
    "command": "uv run python scripts/poll_execution.py '<RunId>' [space_name]"
  }
}
```
## Environment Variables
| Name | Description | Required |
|---|---|---|
| `BYTEPLUS_ACCESSKEY` | BytePlus Access Key | Yes |
| `BYTEPLUS_SECRETKEY` | BytePlus Secret Key | Yes |
| `VOD_SPACE_NAME` | VOD space name | Yes (or via CLI argument) |
| `VOD_POLL_INTERVAL` | Polling interval (seconds, default 5) | No |
| `VOD_POLL_MAX` | Maximum polling attempts (default 360) | No |
| `VOD_URL_EXPIRE_MINUTES` | Signed URL expiration (minutes, default 60) | No |
| `VOD_PLAY_DOMAIN` | Force the use of a specific play domain (optional, highest priority) | No |
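Reading the optional knobs with their documented defaults amounts to a few `os.environ.get` calls; a minimal sketch:

```python
import os

def poll_settings() -> tuple:
    """Return (poll interval seconds, max poll attempts, signed-URL
    expiry minutes), applying the documented defaults of 5 / 360 / 60
    when the corresponding environment variables are unset."""
    interval = int(os.environ.get("VOD_POLL_INTERVAL", "5"))
    max_attempts = int(os.environ.get("VOD_POLL_MAX", "360"))
    expire_minutes = int(os.environ.get("VOD_URL_EXPIRE_MINUTES", "60"))
    return interval, max_attempts, expire_minutes
```

With the defaults, the total polling budget is 360 × 5 s = 30 minutes, matching the timeout messages shown earlier.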
## Error Output Format
All errors share the same format:
```json
{"error": "error description"}
```
## References
- BytePlus VOD Python SDK
- Quality restoration parameter reference
- API: `ApplyUploadInfo` (Version: 2023-01-01)
- API: `CommitUploadInfo` (Version: 2023-01-01)
- API: `UploadMediaByUrl` (Version: 2023-01-01)
- API: `QueryUploadTaskInfo` (Version: 2023-01-01)
- API: `StartExecution` (Version: 2025-07-01)
- API: `GetExecution` (Version: 2025-07-01)