Vidu Video and Image Generation Skill

Generate AI videos and images with Vidu via vidu-cli — text-to-image, text-to-video, image-to-video, start-end frame, reference-based generation, and material elements, up to 1080p/2K/4K.

Execution model: use vidu CLI

All execution goes through the vidu-cli command-line tool. Parameters are passed as CLI flags, not as raw JSON bodies.

Environment variables

VIDU_TOKEN (required) — Vidu API token
VIDU_BASE_URL (optional) — Default https://service.vidu.cn (mainland China); use https://service.vidu.com for overseas
VIDU_DEBUG (optional) — Set to 1 to print full response body to stderr for debugging
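The variables above can be assembled in code before launching the CLI as a subprocess. The following is a minimal Python sketch, not part of vidu-cli: the helper name `build_vidu_env` and its `overseas`, `debug`, and `base` parameters are hypothetical, and only the variable names and URLs come from this document.

```python
import os

# Overseas endpoint listed in this document. The mainland China endpoint
# (https://service.vidu.cn) is the documented default, so it is never set here.
OVERSEAS_BASE_URL = "https://service.vidu.com"


def build_vidu_env(token, overseas=False, debug=False, base=None):
    """Return an environment mapping suitable for running vidu-cli.

    `base` lets callers pass a starting mapping; by default the current
    process environment is copied.
    """
    if not token:
        raise ValueError("VIDU_TOKEN is required")
    env = dict(os.environ if base is None else base)
    env["VIDU_TOKEN"] = token
    if overseas:
        env["VIDU_BASE_URL"] = OVERSEAS_BASE_URL
    if debug:
        # VIDU_DEBUG=1 makes the CLI print the full response body to stderr.
        env["VIDU_DEBUG"] = "1"
    return env
```

The returned mapping can be passed as the `env` argument of `subprocess.run`.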

Stdout contract

Every command prints one line of JSON to stdout.
Success: {"ok": true, "trace_id": "...", ...} — exit code 0
Failure: {"ok": false, "error": {"type": "...", "http_status": ..., "code": "...", "message": "..."}} — exit code 1
trace_id appears on API-backed responses for support/debugging.
CRITICAL: Never guess why an error happened. Copy fields from error exactly. Full shapes and edge cases: references/parameters.md.
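A caller can check the contract above mechanically. The sketch below is one possible Python reading of it. The function name `parse_cli_result` is hypothetical, and rejecting output where the exit code and `ok` disagree is an assumption of this sketch rather than documented CLI behavior.

```python
import json


def parse_cli_result(stdout_text, exit_code):
    """Parse the single JSON line that vidu-cli prints to stdout.

    Returns the decoded object unchanged so that error fields can be
    reported verbatim. Raises ValueError when the output does not match
    the documented contract.
    """
    lines = [ln for ln in stdout_text.splitlines() if ln.strip()]
    if len(lines) != 1:
        raise ValueError(f"expected exactly one JSON line, got {len(lines)}")
    result = json.loads(lines[0])
    if not isinstance(result, dict):
        raise ValueError("expected a JSON object")
    ok = result.get("ok")
    # Documented pairing: ok=true with exit code 0, ok=false with exit code 1.
    if (ok is True and exit_code == 0) or (ok is False and exit_code == 1):
        return result
    raise ValueError("exit code and 'ok' field do not match the documented contract")
```

Returning the raw object, rather than a summary, keeps `error.type`, `http_status`, `code`, and `message` available exactly as the CLI produced them.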

Error type values

http_error — API 4xx/5xx (http_status, code, message)
network_error — Connection failure or timeout
parse_error — Response is not valid JSON
client_error — Local issues (missing token, bad path, validation)

Main commands

Command	Purpose
`vidu-cli upload <image_path>`	Upload image → `upload_id`, `ssupload_uri`
`vidu-cli task submit --type ... --prompt ... [options]`	Submit task → `task_id`. `--image`: local path, URL, or `ssupload:?id=...` (auto-upload). `--video`: local path or `ssupload:?id=...` (character2video + 3.2_a only, auto-upload; URLs not supported). `--audio`: local path or `ssupload:?id=...` (character2video + 3.2_a only; URLs not supported).
`vidu-cli task get <task_id> [--output/-o <dir>]`	Query task → `state`, `type`, `model`; use `--output` to download media on success
`vidu-cli task compose --timeline <json> [--width N --height N] [--schedule-mode <mode>]`	Compose video from timeline → `task_id`. Query with `task get`. Supports `--schedule-mode` (auto-detected if omitted). MUST read references/compose.md before building the timeline JSON — do not guess the schema.
`vidu-cli task lip-sync --video <path> --text <text> [options]`	Lip-sync with text-to-speech → `task_id`. Supports `--schedule-mode` (auto-detected if omitted).
`vidu-cli task lip-sync --video <path> --audio <path>`	Lip-sync with audio file → `task_id`. Supports `--schedule-mode` (auto-detected if omitted).
`vidu-cli task lip-sync-voices`	List available lip-sync voices (90+, Chinese/English/Cantonese/Cartoon etc.)
`vidu-cli task tts --prompt ... --voice-id ...`	Text-to-speech → `task_id`. Supports `--schedule-mode` (auto-detected if omitted).
`vidu-cli task tts-voices`	List available TTS voices (300+, 20+ languages)
`vidu-cli task cost --type ... --model-version ... --duration ...`	Query credit cost for video/image tasks (estimate before submitting)
`vidu-cli task tts-cost --text ... --voice-id ...`	Query credit cost for TTS tasks (priced by character count; `--text` required)
`vidu-cli task lip-sync-cost --duration ... --voice-id ...`	Query credit cost for lip-sync tasks (voice defaults to `English_Aussie_Bloke` if `--voice-id` is omitted)
`vidu-cli quota pass`	Query claw-pass daily quota status
`vidu-cli quota credit`	Query user credit balance
`vidu-cli element create --name ... --image ... [--description ...] [--style ...]`	Create reference element (check → preprocess → create). Returns `id`, `version`.
`vidu-cli element check --name ...`	Check name availability
`vidu-cli element list [--keyword kw]`	List personal elements
`vidu-cli element search --keyword kw`	Search community elements

Smart input handling

--image (task submit, element create):

Local path → auto-upload (auto-compress when file is larger than 10MB)
http(s) URL → download, then upload
ssupload:?id=... → use as-is

--video and --audio (task submit, character2video + 3.2_a only):

Local path → auto-upload
ssupload:?id=... → use as-is
http(s) URL → not supported (rejected with an error)
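The input rules above can be mirrored in a small pre-check before calling the CLI. This Python sketch is an illustration only: the helper names are hypothetical, and classifying by prefix and URL scheme is an assumption about how the three input forms can be told apart, not a description of the CLI's internals.

```python
from urllib.parse import urlparse


def classify_media_input(value):
    """Return 'ssupload', 'url', or 'local' for a media flag value."""
    if value.startswith("ssupload:"):
        return "ssupload"  # already uploaded; used as-is
    if urlparse(value).scheme in ("http", "https"):
        return "url"
    return "local"  # anything else is treated as a local path


def check_media_input(flag, value):
    """Reject URL inputs for --video and --audio, which do not accept them."""
    kind = classify_media_input(value)
    if flag in ("--video", "--audio") and kind == "url":
        raise ValueError(f"{flag} does not accept URLs; use a local path or ssupload:?id=...")
    return kind
```

Checking locally avoids a round trip for an input the CLI would reject anyway, but the CLI remains the authority on what is accepted.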

Key Capabilities

text-to-image — Text-only image generation
text-to-video — Text-only video generation
image-to-video — One image + text → video
head-tail-image-to-video — Start + end frames + text
reference-to-image — Images + materials: 1–7 total; text prompt required; can be images-only, materials-only, or mixed; images-only needs no element create
reference-to-video — Same rule: 1–7 total; text prompt required; with 3.2_a model, also supports --video input (max 3, local files validated for size/dimensions/duration)
lip-sync — Drive video mouth movement with text-to-speech or audio file
text-to-speech — Convert text to speech audio via task tts
video-compose — Compose multi-track timeline (video/audio/subtitle/effect) into a single exported video via task compose
create-references — element create (single command)
search-community-references — element search
query-task — task get [--output <dir>]
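The count rules for the reference-based capabilities can be checked before submitting. The Python sketch below is a hypothetical helper, not part of vidu-cli. It assumes that videos are counted separately from the 1–7 image and material total, and that the 1–7 minimum still applies when videos are supplied; both points should be confirmed against references/parameters.md. Video inputs apply to reference-to-video only.

```python
def check_reference_inputs(prompt, images=(), materials=(), videos=(), model_version=None):
    """Return a list of rule violations; an empty list means no problem was found."""
    problems = []
    if not prompt or not prompt.strip():
        problems.append("a text prompt is required")
    # Images and materials share one budget of 1-7 items.
    total = len(images) + len(materials)
    if not 1 <= total <= 7:
        problems.append(f"images + materials must total 1-7, got {total}")
    if videos:
        if model_version != "3.2_a":
            problems.append("--video is only supported with model 3.2_a")
        if len(videos) > 3:
            problems.append(f"at most 3 videos are allowed, got {len(videos)}")
    return problems
```

Returning every violation at once, instead of raising on the first, lets the caller report all problems in a single message.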

Setup

npm install -g vidu-cli@latest (requires Node.js >=14; postinstall auto-downloads the platform binary)
Obtain a VIDU_TOKEN (e.g. from the Vidu console).
Set VIDU_TOKEN environment variable (required); set VIDU_BASE_URL if not using default region.
Verify: vidu-cli task submit --help

Data usage and privacy (summary)

Content you send (prompts, images, task settings) goes to Vidu’s API. Confirm this meets your privacy and IP needs. Prefer least-privilege tokens for testing. Terms: https://www.vidu.com/terms (overseas), https://www.vidu.cn/terms (mainland China).

Async workflow (short)

Vidu generation is asynchronous: task submit → task_id → poll task get <task_id> until terminal state.
Model nicknames map to version strings: Q1 → 3.0, Q2 → 3.1, Q2 Pro → 3.1_pro, Q3 → 3.2; Omni Video Pro, 全能Video Pro, and Q3-A → 3.2_a (with 3.2_a, character2video supports --audio and --video, and duration is -1 or 4–15s). Further variants: 全能Image 2 (GPT-Image 2) → 3.2_image_2 for multimodal visual generation with strong text-rendering accuracy; 全能Q3 Fast → 3.2_fast_m; 全能Q3 Pro → 3.2_pro_m. The complete per-task model version list is in references/parameters.md.
Task-type summaries, task support matrix, copy-paste CLI examples, prompt tips, and element create/list/search details are in references/parameters.md.
Task lifecycle, retries, and polling guidance: references/errors_and_retry.md.
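The submit-then-poll flow can be sketched as a loop around `task get`. The Python below is an illustration under stated assumptions: the interval and attempt limit are placeholders, not recommended values (references/errors_and_retry.md has the actual polling guidance), `state` is assumed to be a top-level field of the response, and any state other than success or failed is treated as still in progress.

```python
import json
import subprocess
import time

# The two terminal states named in this document.
TERMINAL_STATES = {"success", "failed"}


def fetch_with_cli(task_id):
    """Run `vidu-cli task get <task_id>` and decode its one-line JSON output."""
    proc = subprocess.run(
        ["vidu-cli", "task", "get", task_id], capture_output=True, text=True
    )
    return json.loads(proc.stdout)


def poll_task(task_id, fetch=fetch_with_cli, interval_s=5.0, max_attempts=120, sleep=time.sleep):
    """Call fetch(task_id) until the task is terminal or the CLI reports an error."""
    for attempt in range(max_attempts):
        result = fetch(task_id)
        if not result.get("ok"):
            return result  # CLI-level failure: return the error untouched
        if result.get("state") in TERMINAL_STATES:
            return result
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"task {task_id} not terminal after {max_attempts} attempts")
```

Passing `fetch` and `sleep` as parameters keeps the loop testable without the CLI installed.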

Implementation guide

For task submit (generation tasks)

Pick capability → map to --type and options using references/parameters.md (matrix + validation).
Always pass --resolution; default to 1080p unless the user explicitly requests a different supported value.
Prepare inputs: for reference2image / character2video, pass --image and/or --material so that the combined count is 1–7; character2video with 3.2_a also accepts --video (max 3). Optionally use [@name] in the prompt, as described in references/parameters.md.
(Optional) Query cost before submitting: use task cost, task tts-cost, or task lip-sync-cost to estimate credit usage and check eligibility.
vidu-cli task submit ... → store task_id and trace_id.
- schedule-mode auto-detection: if --schedule-mode is omitted, CLI queries claw-pass status and uses claw_pass when user has an active pass, otherwise normal. If submit fails with ClawPassExplicitModeRequired, tell the user their daily claw-pass quota is exhausted. Do not retry automatically — suggest re-submitting with --schedule-mode normal to use credits instead, or waiting for the next quota refresh.
Poll vidu-cli task get <task_id> until state is success or failed; use --output <dir> to download media on success.
On success return downloaded_files (if --output used) or prompt user to re-run with --output; on task failure return err_code / err_msg; on CLI ok: false return error fields verbatim.
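The submit step and the claw-pass rule can be sketched as follows. This Python is a hypothetical wrapper, not the CLI's own logic. It uses only flags named in this document, accepts a single optional image for simplicity, and assumes that ClawPassExplicitModeRequired appears in `error.code`; that placement is not confirmed here and should be checked against references/parameters.md.

```python
# Error identifier named in this document; its location in error.code is assumed.
CLAW_PASS_EXHAUSTED = "ClawPassExplicitModeRequired"


def build_submit_argv(task_type, prompt, resolution="1080p", image=None, schedule_mode=None):
    """Build the argument list for `vidu-cli task submit`.

    --resolution is always passed and defaults to 1080p. --schedule-mode is
    left out unless given, so the CLI can auto-detect it.
    """
    argv = [
        "vidu-cli", "task", "submit",
        "--type", task_type,
        "--prompt", prompt,
        "--resolution", resolution,
    ]
    if image is not None:
        argv += ["--image", image]
    if schedule_mode is not None:
        argv += ["--schedule-mode", schedule_mode]
    return argv


def plan_after_submit(result):
    """Decide the next step from a parsed submit result, without retrying."""
    if result.get("ok"):
        return {"action": "poll", "task_id": result.get("task_id"), "trace_id": result.get("trace_id")}
    error = result.get("error", {})
    if error.get("code") == CLAW_PASS_EXHAUSTED:
        # Quota exhausted: ask the user instead of re-submitting automatically.
        return {"action": "ask_user", "suggestion": "--schedule-mode normal", "error": error}
    return {"action": "report_error", "error": error}
```

Building the command as a list rather than a shell string avoids quoting problems when the prompt contains spaces or quotes.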

For task compose (video composition)

CRITICAL: Before constructing the --timeline JSON, you MUST read references/compose.md first. The timeline has a specific JSON schema with exact field names, nesting structure, and media_url rules. Do NOT guess the structure — always refer to compose.md for the complete schema, supported fields, and examples.

Read references/compose.md to understand the timeline JSON schema, media_url rules, and limits.
Build the timeline JSON following the exact structure: video_tracks[].video_track_clips[], audio_tracks[].audio_track_clips[], subtitle_tracks[].subtitle_track_clips[], effect_tracks[].effect_track_items[]. Every clip must include timeline_in and timeline_out (the CLI validates this and rejects timelines with missing values).
For media_url: use ssupload:?id=xxx, http URL, or local file path (auto-uploaded by CLI).
For file_url (subtitles): use ssupload:?id=xxx, http URL, or local .srt file path.
vidu-cli task compose --timeline <file_or_json> [--width N --height N] [--schedule-mode <mode>] → returns task_id.
- schedule-mode auto-detection: same as task submit — if omitted, CLI auto-detects from claw-pass status. If compose fails with ClawPassExplicitModeRequired, suggest --schedule-mode normal to use credits instead.
vidu-cli task get <task_id> to poll status, same as other tasks.
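Before calling task compose, a timeline can be pre-checked for the one rule stated in the steps above: every clip needs timeline_in and timeline_out. The Python sketch below uses only the track and clip field names listed in this section and does not replace references/compose.md, which remains the authority on the schema. Treating effect_track_items like clips is an assumption, and the helper name is hypothetical.

```python
# Track key -> clip list key, as named in this section.
TRACK_LAYOUT = {
    "video_tracks": "video_track_clips",
    "audio_tracks": "audio_track_clips",
    "subtitle_tracks": "subtitle_track_clips",
    "effect_tracks": "effect_track_items",
}


def find_clips_missing_bounds(timeline):
    """Return the paths of clips that lack timeline_in or timeline_out."""
    problems = []
    for track_key, clip_key in TRACK_LAYOUT.items():
        for t_idx, track in enumerate(timeline.get(track_key, [])):
            for c_idx, clip in enumerate(track.get(clip_key, [])):
                for field in ("timeline_in", "timeline_out"):
                    if clip.get(field) is None:
                        problems.append(f"{track_key}[{t_idx}].{clip_key}[{c_idx}].{field}")
    return problems
```

An empty result means only that this single rule passed; the CLI still performs its own validation.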

Output to the user

After submit: return task_id and trace_id; state that processing is in progress.
After query: if state is success, return downloaded_files (if --output was used) or the task_id with a note to re-run with --output <dir> to download; if failed, return err_code and err_msg exactly (note: response may still have ok: true while state is failed).
On CLI failure (ok: false): report error.type, http_status, code, message exactly — do not infer causes.
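The reporting rules above can be expressed as one function that turns a parsed `task get` result into a user-facing message. This Python sketch is hypothetical: the function name is invented, the message wording is illustrative, and `state`, `err_code`, `err_msg`, and `downloaded_files` are assumed to be top-level fields, with `downloaded_files` a list of paths.

```python
def summarize_task_result(task_id, result):
    """Build a message from a parsed result, copying error fields as-is."""
    if result.get("ok") is False:
        err = result.get("error", {})
        return "CLI error: type={} http_status={} code={} message={}".format(
            err.get("type"), err.get("http_status"), err.get("code"), err.get("message")
        )
    state = result.get("state")
    # A failed task can still arrive with ok: true, so state is checked separately.
    if state == "failed":
        return f"Task {task_id} failed: err_code={result.get('err_code')} err_msg={result.get('err_msg')}"
    if state == "success":
        files = result.get("downloaded_files")
        if files:
            return f"Task {task_id} succeeded. Downloaded: {', '.join(files)}"
        return f"Task {task_id} succeeded. Re-run with --output <dir> to download the media."
    return f"Task {task_id} is in progress (state={state})."
```

The function only copies fields into the message and never adds an explanation of why an error occurred.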

References (bundled)

File	Contents
references/parameters.md	Task matrix, CLI flags, examples, prompt tips, validation
references/errors_and_retry.md	States, retries, polling
references/compose.md	Timeline schema, media_url rules, clip compose examples

Fallback (no Node.js / npm)

If Node.js, npm, or vidu-cli cannot be installed, this skill cannot run. Ask the user to install the latest vidu-cli (npm install -g vidu-cli@latest, Node.js >=14), and point them to references/parameters.md for parameter details.
