# Video Generate Skill
This skill generates videos using Doubao Seedance 1.0/1.5 models.
## Trigger Conditions
- User wants to generate videos from text descriptions
- User wants to create videos based on images (first/last frame)
- User wants to create videos with reference materials (images, videos, audio)
- User asks for video generation capabilities
## Usage

### Environment Variables
Before using this skill, ensure the following environment variables are set:
- `ARK_API_KEY` or `MODEL_VIDEO_API_KEY` or `MODEL_AGENT_API_KEY`: API key for the video generation service
- `MODEL_VIDEO_API_BASE`: API base URL (optional, has a default)
- `MODEL_VIDEO_NAME`: Model name (optional, has a default)
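The key lookup can be sketched in Python. The variable names come from the list above; the precedence order shown is an assumption (mirroring the order in which the script's error message lists them), not confirmed behavior of `video_generate.py`:

```python
import os


def resolve_api_key(env=None) -> str:
    """Return the first available API key; raise if none is set.

    Checks ARK_API_KEY, MODEL_VIDEO_API_KEY, and MODEL_AGENT_API_KEY in
    that order (assumed precedence).
    """
    env = os.environ if env is None else env
    for name in ("ARK_API_KEY", "MODEL_VIDEO_API_KEY", "MODEL_AGENT_API_KEY"):
        if env.get(name):
            return env[name]
    raise PermissionError(
        "ARK_API_KEY or MODEL_VIDEO_API_KEY or MODEL_AGENT_API_KEY "
        "not found in environment variables"
    )
```

This is the same check that produces the `PermissionError` described under Error Handling below.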
### Function Signature

```python
async def video_generate(
    params: list,
    batch_size: int = 10,
    max_wait_seconds: int = 1200,
    model_name: str = None,
) -> Dict:
```
### Parameters

#### `params` (list[dict])

A list of video generation requests. Each item is a dict with the following fields.

Required per item:

- `video_name` (str): Name/identifier of the output video file
- `prompt` (str): Text describing the video to generate. Supports Chinese and English.
Optional per item — input materials:

- `first_frame` (str): URL for the first frame image
- `last_frame` (str): URL for the last frame image
- `reference_images` (list[str]): 1-4 reference image URLs for style/content guidance
- `reference_videos` (list[str]): 0-3 reference video URLs (mp4/mov, 2-15s each, total ≤15s)
- `reference_audios` (list[str]): 0-3 reference audio URLs (mp3/wav, 2-15s each, total ≤15s)
Optional per item — video output parameters:

- `ratio` (str): Aspect ratio. Options: "16:9" (default), "9:16", "4:3", "3:4", "1:1", "2:1", "21:9", "adaptive"
- `duration` (int): Video length in seconds. Range: 2-12s depending on model
- `resolution` (str): Video resolution. Options: "480p", "720p", "1080p"
- `frames` (int): Total frame count. Must be in [29, 289] and follow the format 25 + 4n
- `camera_fixed` (bool): Lock camera movement. Default: false
- `seed` (int): Random seed for reproducibility. Range: [-1, 2^32-1]
- `watermark` (bool): Whether to add a watermark. Default: false
- `generate_audio` (bool): Whether to generate audio. Only Seedance 1.5 supports this
- `tools` (list[dict]): Tool configuration, e.g., `[{"type": "web_search"}]`
### Input Modes

- Text-to-Video: provide only `prompt`, no images/videos
- First Frame Guidance: provide `first_frame` as the starting image
- First + Last Frame Guidance: provide both `first_frame` and `last_frame` for a transition video
- Reference Images: provide `reference_images` for style/content guidance
- Multimodal Reference: combine `reference_images`, `reference_videos`, and `reference_audios`
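Since the mode is implied by which fields are present, it can be classified mechanically. A sketch, assuming the field names above and checking the more specific modes first:

```python
def input_mode(item: dict) -> str:
    """Classify a request dict into one of the input modes above.

    Illustrative helper; mode names here are labels chosen for this
    sketch, not identifiers used by the script.
    """
    if item.get("reference_videos") or item.get("reference_audios"):
        return "multimodal_reference"
    if item.get("reference_images"):
        return "reference_images"
    if item.get("first_frame") and item.get("last_frame"):
        return "first_last_frame"
    if item.get("first_frame"):
        return "first_frame"
    return "text_to_video"
```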
## Return Value

### Script Return Info

The `video_generate.py` script returns a dict of this shape:

```json
{
  "status": "success" | "partial_success" | "error",
  "success_list": [{"video_name": "video_url"}],
  "error_list": ["video_name"],
  "error_details": [{"video_name": "...", "error": {...}}],
  "pending_list": [{"video_name": "...", "task_id": "cgt-xxx", ...}]
}
```
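A sketch of consuming this return shape. The sample dict is illustrative, and the code assumes each `success_list` entry maps a video name to its URL, as the schema above suggests:

```python
def summarize_result(result: dict) -> str:
    """Build a short user-facing summary from the script's return dict."""
    lines = [f"status: {result['status']}"]
    # Assumption: each success_list entry is a {video_name: url} mapping.
    for item in result.get("success_list", []):
        for name, url in item.items():
            lines.append(f"{name}: {url}")
    for name in result.get("error_list", []):
        lines.append(f"{name}: generation failed")
    for item in result.get("pending_list", []):
        lines.append(f"{item['video_name']}: still pending, task_id={item['task_id']}")
    return "\n".join(lines)


# Illustrative sample return value.
sample = {
    "status": "partial_success",
    "success_list": [{"cat_park": "https://example.com/cat_park.mp4"}],
    "error_list": ["cat_jump"],
    "error_details": [{"video_name": "cat_jump", "error": {"code": "Timeout"}}],
    "pending_list": [],
}
```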
Based on the script's return info, the final response to the user consists of a description of the video generation task and the video URL(s). You may download the video from a URL, but the URL must still be provided to the user for viewing and downloading. Note: the URL is the `url` value in the `success_list` of the script's return info.

### Final Return Info

Return the generated videos in two forms:

1. File form: return both the file itself (if you have some other method to send the video file) and its local path, for example: `/root/.openclaw/workspace/skills/video-generate/xxx.mp4`
2. Markdown form: after generation, present the list of video URLs in Markdown, for example:

```html
<video src="https://example.com/video1.mp4" width="640" controls>video-1</video>
<video src="https://example.com/video2.mp4" width="640" controls>video-2</video>
```
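The Markdown video list above can be generated directly from `success_list`. A sketch, again assuming each entry maps a video name to its URL:

```python
def to_video_markdown(success_list: list) -> str:
    """Render a <video> embed per successful video.

    Assumes each success_list entry is a {video_name: url} mapping.
    """
    tags = []
    for item in success_list:
        for name, url in item.items():
            tags.append(f'<video src="{url}" width="640" controls>{name}</video>')
    return "\n".join(tags)
```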
## Code Implementation

See `scripts/video_generate.py` for the full implementation.
## Example Usage

```bash
# Text-to-Video
python scripts/video_generate.py -p "小猫骑着滑板穿过公园" -n cat_park -r 16:9 -d 5 --resolution 720p

# First Frame Guidance
python scripts/video_generate.py -p "小猫跳起来" -n cat_jump -f "https://example.com/cat.png" -r adaptive -d 5

# First + Last Frame Guidance
python scripts/video_generate.py -p "平滑过渡动画" -n transition \
  -f "https://example.com/start.png" \
  -l "https://example.com/end.png" \
  -d 6

# Reference Images (style/content guidance)
python scripts/video_generate.py -p "[图1]戴着眼镜的男生和[图2]柯基小狗坐在草坪上" -n styled \
  --ref-images "https://example.com/boy.png" "https://example.com/dog.png" \
  -r 16:9 -d 5

# Multimodal Reference (video + audio)
python scripts/video_generate.py -p "将视频中的人物换成[图1]中的男孩" -n multimodal \
  --ref-images "https://example.com/boy.png" \
  --ref-videos "https://example.com/source.mp4" \
  --ref-audios "https://example.com/voice.wav" \
  -d 5

# With Audio Generation (Seedance 1.5 only)
python scripts/video_generate.py -p "女孩抱着狐狸,可以听到风声和树叶沙沙声" -n with_audio \
  -f "https://example.com/girl_fox.png" \
  --generate-audio \
  -m doubao-seedance-1-5-pro-251215 \
  -d 6 --resolution 1080p

# Query task status
python scripts/video_generate.py -q "cgt-20260222165751-wsnw8"

# Use specific model
python scripts/video_generate.py -p "A futuristic city" -m doubao-seedance-1-5-pro-251215

# No watermark
python scripts/video_generate.py -p "A beautiful landscape" --no-watermark
```
Command Line Options
| Option | Short | Description |
|---|---|---|
--prompt |
-p |
Text description of the video (required) |
--name |
-n |
Video name identifier (default: video) |
--model |
-m |
Model name (default: doubao-seedance-1-0-pro-250528) |
--ratio |
-r |
Aspect ratio (default: 16:9) |
--duration |
-d |
Video duration in seconds (2-12) |
--resolution |
Video resolution: 480p, 720p, 1080p | |
--first-frame |
-f |
First frame image URL |
--last-frame |
-l |
Last frame image URL |
--ref-images |
Reference image URLs (space-separated, 1-4 images) | |
--ref-videos |
Reference video URLs (space-separated, 0-3 videos) | |
--ref-audios |
Reference audio URLs (space-separated, 0-3 audios) | |
--generate-audio |
Generate audio (Seedance 1.5 only) | |
--seed |
Random seed for reproducibility | |
--no-watermark |
Disable watermark | |
--timeout |
-t |
Max wait time in seconds (default: 1200) |
--query-task |
-q |
Query task status by task_id |
## Model Fallback

If you encounter a model-related error (like `ModelNotOpen`), fall back to one of these models:

- `doubao-seedance-1-5-pro-251215`
- `doubao-seedance-1-0-pro-250528`
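A retry-with-fallback loop might look like the sketch below. The `generate` callable and its error behavior are assumptions; only the model names come from the list above:

```python
FALLBACK_MODELS = [
    "doubao-seedance-1-5-pro-251215",
    "doubao-seedance-1-0-pro-250528",
]


def generate_with_fallback(generate, prompt: str, models=FALLBACK_MODELS):
    """Try each model in order until one succeeds.

    `generate` is any callable(prompt, model) assumed to raise
    RuntimeError on model-related failures (e.g. a ModelNotOpen error).
    """
    last_err = None
    for model in models:
        try:
            return generate(prompt, model)
        except RuntimeError as err:
            last_err = err  # remember the failure and try the next model
    raise last_err
```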
## Error Handling

- If the script raises "PermissionError: ARK_API_KEY or MODEL_VIDEO_API_KEY or MODEL_AGENT_API_KEY not found in environment variables", inform the user that they need to provide the `ARK_API_KEY`, `MODEL_VIDEO_API_KEY`, or `MODEL_AGENT_API_KEY` environment variable. Write it to the environment variable file in the workspace; if the file already exists, append it to the end. Ensure the environment variable format is correct, make the variable take effect, and retry the video generation task that just failed.
## Notes

- Keep the prompt concise (recommended ≤ 500 characters)
- For first/last frame, ensure the image aspect ratios match your chosen ratio
- Reference images: 1-4 images; formats: jpeg/png/webp/bmp/tiff/gif
- Reference videos: 0-3 videos; formats: mp4/mov; total duration ≤ 15s
- Reference audios: 0-3 audios; formats: mp3/wav; total duration ≤ 15s
- Multimodal reference requires at least one image or video (audio-only is not supported)
- Audio generation is only supported by Seedance 1.5 pro
- If polling times out, use `--query-task` with the returned task_id
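The count limits and the image-or-video requirement above can be validated before submitting a request. A sketch; durations and formats can't be checked from URLs alone, so only counts are enforced here:

```python
def check_references(images=(), videos=(), audios=()):
    """Return a list of violations of the reference-material limits above.

    Illustrative pre-flight check, not part of the script.
    """
    errors = []
    if images and not 1 <= len(images) <= 4:
        errors.append("reference_images must contain 1-4 URLs")
    if len(videos) > 3:
        errors.append("reference_videos must contain at most 3 URLs")
    if len(audios) > 3:
        errors.append("reference_audios must contain at most 3 URLs")
    if audios and not (images or videos):
        errors.append("audio-only reference is not supported; add an image or video")
    return errors
```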