Image with ComfyUI

Call a local ComfyUI server to generate or edit images and to generate videos. Supported modes:

  • T2I (Text → Image) → Z-Image or SD3.5 Medium model
  • I2I (Image → Image / Edit) → Qwen Image Edit model
  • I2V (Image → Video) → Wan2.2 model
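
The mode-to-model pairing above can be written down as a small lookup table. This is only an illustrative sketch: the names `MODE_MODELS` and `models_for` are hypothetical and not taken from the skill itself, and the strings are the model names as listed above, not actual ComfyUI checkpoint filenames.

```python
# Hypothetical lookup table mirroring the mode list above.
# Values are display names from this document, not checkpoint filenames.
MODE_MODELS = {
    "T2I": ("Z-Image", "SD3.5 Medium"),  # Text -> Image
    "I2I": ("Qwen Image Edit",),         # Image -> Image / Edit
    "I2V": ("Wan2.2",),                  # Image -> Video
}


def models_for(mode: str) -> tuple:
    """Return the candidate model names for a mode (case-insensitive)."""
    try:
        return MODE_MODELS[mode.upper()]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
```

For example, `models_for("t2i")` returns both T2I candidates, and an unrecognized mode raises `ValueError` rather than silently falling back to a default.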

When to Use

  • User asks to generate images from text (Chinese: 绘图/生图/画图/生成图片)
  • User asks to edit an image (Chinese: 修图/改图/编辑图片/换装/换背景)
  • User asks to generate a video from an image + text (Chinese: 图生视频/动画化/生成视频)
  • User provides a description and wants visual output
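
One way to picture the triggers above is a simple keyword router. The sketch below is an assumption about how such matching could work, not the skill's actual logic: the keyword lists are copied from the bullets, while the function name `detect_mode`, the substring matching, and the check order (video, then edit, then generate) are all hypothetical.

```python
from typing import Optional

# Trigger keywords copied from the bullets above.
# The check order (video, then edit, then generate) is an assumption.
TRIGGER_KEYWORDS = {
    "I2V": ("图生视频", "动画化", "生成视频"),
    "I2I": ("修图", "改图", "编辑图片", "换装", "换背景"),
    "T2I": ("绘图", "生图", "画图", "生成图片"),
}


def detect_mode(text: str) -> Optional[str]:
    """Return the first mode whose keyword occurs in text, else None."""
    for mode, keywords in TRIGGER_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return mode
    return None
```

A request such as `detect_mode("帮我画图:一只猫")` would route to `"T2I"`, while text containing none of the keywords returns `None`, leaving the free-form "wants visual output" case to the model's own judgment.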

Image-First Conversational Pattern (Image-First Mode)

Detection rules:

  1. User sends only an image (no text, no other message in the same turn)
First Seen
Apr 25, 2026