image-video-gen
Image Video Tool Workflow
Description
This is an agent workflow for generating images and videos. It coordinates the byted-web-search, image-generate, and video-generate tools to complete the task.
Dependent skills
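Workflow
- Understand the user's intent:
  - Receive the text description provided by the user.
  - If the input is a story or plot, call the byted-web-search tool directly to gather background information.
  - If the input is of another type (such as a question or a request), first call the byted-web-search tool (at most 2 times) to find suitable information.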
- Generate images:
  - Based on the prepared background information, call the image-generate tool to generate storyboard images.
  - After generation, return the images as a Markdown image list.
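- Generate video (optional):
  - Based on the user's input, decide whether to call the video-generate tool to generate a video.
  - When returning video URLs, use a list of Markdown video links, for example:
    <video src="https://example.com/video1.mp4" width="640" controls>Storyboard video 1</video>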
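The three steps above can be sketched in Python as a small orchestration function. This is only an illustration under stated assumptions: the skill ships no script of its own, so `run_workflow`, the `is_story` and `wants_video` flags, the `is_sufficient` check, and the tool call signatures are all hypothetical stand-ins for decisions the agent makes and for the real byted-web-search, image-generate, and video-generate tools.

```python
from typing import Callable, Dict, List

# The workflow allows at most two byted-web-search calls for non-story input.
MAX_SEARCH_CALLS = 2


def run_workflow(
    user_input: str,
    is_story: bool,
    wants_video: bool,
    web_search: Callable[[str], str],
    image_generate: Callable[[str], List[str]],
    video_generate: Callable[[str], List[str]],
    is_sufficient: Callable[[str], bool] = lambda info: bool(info.strip()),
) -> Dict[str, object]:
    """Hypothetical orchestration; the tools are injected as plain callables."""
    # Step 1: gather background information.
    search_calls = 0
    background = ""
    if is_story:
        # Story or plot input: a single direct search.
        background = web_search(user_input)
        search_calls = 1
    else:
        # Other input: search until the result looks usable, capped at two calls.
        while search_calls < MAX_SEARCH_CALLS:
            background = web_search(user_input)
            search_calls += 1
            if is_sufficient(background):
                break

    # Step 2: generate storyboard images from the background information.
    image_urls = image_generate(background)

    # Step 3 (optional): generate videos only when the user asked for them.
    # Passing the background text as the prompt is an assumption of this sketch.
    video_urls = video_generate(background) if wants_video else []

    return {
        "background": background,
        "search_calls": search_calls,
        "image_urls": image_urls,
        "video_urls": video_urls,
    }
```

Injecting the tools as callables keeps the sketch independent of any particular tool API and makes the two-call search cap easy to check with stub functions.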
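Notes
- This skill has no Python execution scripts of its own (the scripts/ directory contains no scripts).
- It works by coordinating other atomic skills.
- Any image or video URL appearing in input or output must never be modified, truncated, concatenated, or replaced in any way; the original content must be preserved completely and exactly.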
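One way to respect the URL rule when building the response is to place each URL into the output template verbatim and then verify that it survived unchanged. The helpers below are a minimal sketch, not part of the skill: `to_markdown_images`, `to_video_tags`, `urls_preserved`, and the label text are hypothetical names, and the video template simply mirrors the `<video>` example in the workflow.

```python
from typing import List


def to_markdown_images(urls: List[str], label: str = "Storyboard image") -> str:
    """Render image URLs as Markdown images, one per line, without altering them."""
    return "\n".join(
        f"![{label} {i}]({url})" for i, url in enumerate(urls, start=1)
    )


def to_video_tags(urls: List[str], label: str = "Storyboard video", width: int = 640) -> str:
    """Render video URLs as <video> tags, one per line, without altering them."""
    return "\n".join(
        f'<video src="{url}" width="{width}" controls>{label} {i}</video>'
        for i, url in enumerate(urls, start=1)
    )


def urls_preserved(original_urls: List[str], rendered: str) -> bool:
    """Return True only if every original URL appears character for character."""
    return all(url in rendered for url in original_urls)


if __name__ == "__main__":
    videos = ["https://example.com/video1.mp4"]
    output = to_video_tags(videos)
    print(output)
    print(urls_preserved(videos, output))
```

The helpers deliberately avoid URL encoding, normalization, or trimming, since any of those would change the original string, including query parameters such as signatures.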