
image-with-comfyui


Image with ComfyUI

Calls a local ComfyUI server to generate or edit images and videos. Supported modes:

T2I (Text → Image) → Z-Image or SD3.5 Medium model
I2I (Image → Image / Edit) → Qwen Image Edit model
I2V (Image → Video) → Wan2.2 model
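The following is a minimal Python sketch of how a skill like this could talk to the server. It assumes ComfyUI's HTTP API on its default local address (http://127.0.0.1:8188): a workflow exported in API format is queued with POST /prompt, and GET /history/{prompt_id} is polled until the job finishes. The mode-to-model table mirrors the list above. The helper names (build_prompt_payload, queue_prompt, wait_for_history, output_images) are hypothetical, and the sketch does not show how the skill actually builds its per-mode workflows.

```python
import json
import time
import urllib.request
import uuid
from typing import Dict, List, Optional

# Mode -> model family, taken from the mode list above.
MODE_MODELS: Dict[str, List[str]] = {
    "T2I": ["Z-Image", "SD3.5 Medium"],
    "I2I": ["Qwen Image Edit"],
    "I2V": ["Wan2.2"],
}

# Assumption: ComfyUI is listening on its default local address and port.
DEFAULT_SERVER = "http://127.0.0.1:8188"

# Hypothetical helpers below; not the skill's actual code.


def build_prompt_payload(workflow: dict, client_id: Optional[str] = None) -> dict:
    """Wrap an API-format workflow in the JSON body that POST /prompt expects."""
    return {"prompt": workflow, "client_id": client_id or str(uuid.uuid4())}


def queue_prompt(workflow: dict, server: str = DEFAULT_SERVER) -> str:
    """Submit a workflow to the ComfyUI queue and return its prompt_id."""
    body = json.dumps(build_prompt_payload(workflow)).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["prompt_id"]


def wait_for_history(prompt_id: str, server: str = DEFAULT_SERVER,
                     interval: float = 1.0, timeout: float = 600.0) -> dict:
    """Poll GET /history/{prompt_id} until the finished job appears, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        with urllib.request.urlopen(f"{server}/history/{prompt_id}") as response:
            history = json.loads(response.read())
        if prompt_id in history:
            return history[prompt_id]
        time.sleep(interval)
    raise TimeoutError(f"ComfyUI job {prompt_id} did not finish within {timeout}s")


def output_images(entry: dict) -> List[dict]:
    """Collect image records ({filename, subfolder, type}) from one history entry."""
    files: List[dict] = []
    for node_output in entry.get("outputs", {}).values():
        files.extend(node_output.get("images", []))
    return files
```

A typical call sequence would be prompt_id = queue_prompt(workflow) followed by output_images(wait_for_history(prompt_id)). The resulting records can then be downloaded through GET /view with the same filename, subfolder, and type. Video-producing nodes (such as those used for Wan2.2) may report their files under a key other than "images", so inspect the history JSON of the workflow actually in use.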

When to Use

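User asks to generate images from text (Chinese triggers: 绘图/生图/画图/生成图片, i.e. "draw", "generate an image", "draw a picture", "generate images")
User asks to edit an image (Chinese triggers: 修图/改图/编辑图片/换装/换背景, i.e. "retouch", "modify an image", "edit an image", "change outfit", "change background")
User asks to generate a video from an image + text (Chinese triggers: 图生视频/动画化/生成视频, i.e. "image-to-video", "animate", "generate a video")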
User provides a description and wants visual output
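The trigger phrases in this list could be matched with a simple keyword router. This is a hypothetical sketch: detect_mode and the keyword tuples are illustrative and not part of the skill. It checks video, then edit, then text-to-image keywords. It returns None when the requested mode needs a source image that was not supplied, or when no keyword matches, which leaves the catch-all "wants visual output" case to the agent's judgement.

```python
from typing import Optional

# Hypothetical keyword tables built from the Chinese trigger phrases listed above.
T2I_KEYWORDS = ("绘图", "生图", "画图", "生成图片")
I2I_KEYWORDS = ("修图", "改图", "编辑图片", "换装", "换背景")
I2V_KEYWORDS = ("图生视频", "动画化", "生成视频")


def detect_mode(text: str, has_image: bool) -> Optional[str]:
    """Map a user message to "T2I", "I2I" or "I2V", or None if no mode applies."""
    # Check video requests first, since they are the most specific.
    if any(k in text for k in I2V_KEYWORDS):
        return "I2V" if has_image else None  # I2V needs a source image
    if any(k in text for k in I2I_KEYWORDS):
        return "I2I" if has_image else None  # editing needs an input image
    if any(k in text for k in T2I_KEYWORDS):
        return "T2I"
    return None  # no trigger phrase matched; intent must be judged otherwise
```

Substring matching is deliberately crude. English phrasings, and requests that describe a scene without any trigger word, fall through to None and would need separate handling.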

Image-First Conversational Pattern (Image-First Mode)

Detection rules:

User sends only an image (no text, no other message in the same turn)
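The detection rule can be expressed as a predicate over the parts of a single user turn. The Part message model and is_image_first are hypothetical. The sketch assumes a turn is a list of typed parts and treats one or more images as sufficient. It also treats whitespace-only text as "no text", which the rule itself does not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Part:
    """Hypothetical message part: kind is "image", "text", or another type."""
    kind: str
    content: str = ""


def is_image_first(turn: List[Part]) -> bool:
    """True when the turn contains at least one image and nothing else."""
    has_image = any(p.kind == "image" for p in turn)
    # Assumption: whitespace-only text parts count as "no text".
    extras = [
        p for p in turn
        if p.kind != "image" and not (p.kind == "text" and not p.content.strip())
    ]
    return has_image and not extras
```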


Source: skills.volces.c…hinejnjn

First seen: Apr 25, 2026