baoyu-image-gen
Image Generation (AI SDK)
Official API-based image generation. Supports OpenAI, Google and DashScope (阿里通义万象) providers.
Script Directory
Agent Execution:
SKILL_DIR= this SKILL.md file's directory- Script path =
${SKILL_DIR}/scripts/main.ts
Preferences (EXTEND.md)
Use Bash to check EXTEND.md existence (priority order):
# Check project-level first
test -f .baoyu-skills/baoyu-image-gen/EXTEND.md && echo "project"
# Then user-level (cross-platform: $HOME works on macOS/Linux/WSL)
test -f "$HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md" && echo "user"
┌──────────────────────────────────────────────────┬───────────────────┐ │ Path │ Location │ ├──────────────────────────────────────────────────┼───────────────────┤ │ .baoyu-skills/baoyu-image-gen/EXTEND.md │ Project directory │ ├──────────────────────────────────────────────────┼───────────────────┤ │ $HOME/.baoyu-skills/baoyu-image-gen/EXTEND.md │ User home │ └──────────────────────────────────────────────────┴───────────────────┘
┌───────────┬───────────────────────────────────────────────────────────────────────────┐ │ Result │ Action │ ├───────────┼───────────────────────────────────────────────────────────────────────────┤ │ Found │ Read, parse, apply settings │ ├───────────┼───────────────────────────────────────────────────────────────────────────┤ │ Not found │ Use defaults │ └───────────┴───────────────────────────────────────────────────────────────────────────┘
EXTEND.md Supports: Default provider | Default quality | Default aspect ratio | Default image size | Default models
Schema: references/config/preferences-schema.md
Usage
# Basic
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png
# With aspect ratio
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9
# High quality
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --quality 2k
# From prompt files
npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image out.png
# With reference images (Google multimodal or OpenAI edits)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png
# With reference images (explicit provider/model)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make blue" --image out.png --provider google --model gemini-3-pro-image-preview --ref source.png
# Specific provider
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider openai
# DashScope (阿里通义万象)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "一只可爱的猫" --image out.png --provider dashscope
Options
| Option | Description |
|---|---|
--prompt <text>, -p |
Prompt text |
--promptfiles <files...> |
Read prompt from files (concatenated) |
--image <path> |
Output image path (required) |
--provider google|openai|dashscope |
Force provider (default: google) |
--model <id>, -m |
Model ID (--ref with OpenAI requires GPT Image model, e.g. gpt-image-1.5) |
--ar <ratio> |
Aspect ratio (e.g., 16:9, 1:1, 4:3) |
--size <WxH> |
Size (e.g., 1024x1024) |
--quality normal|2k |
Quality preset (default: 2k) |
--imageSize 1K|2K|4K |
Image size for Google (default: from quality) |
--ref <files...> |
Reference images. Supported by Google multimodal and OpenAI edits (GPT Image models). If provider omitted: Google first, then OpenAI |
--n <count> |
Number of images |
--json |
JSON output |
Environment Variables
| Variable | Description |
|---|---|
OPENAI_API_KEY |
OpenAI API key |
GOOGLE_API_KEY |
Google API key |
DASHSCOPE_API_KEY |
DashScope API key (阿里云) |
OPENAI_IMAGE_MODEL |
OpenAI model override |
GOOGLE_IMAGE_MODEL |
Google model override |
DASHSCOPE_IMAGE_MODEL |
DashScope model override (default: z-image-turbo) |
OPENAI_BASE_URL |
Custom OpenAI endpoint |
GOOGLE_BASE_URL |
Custom Google endpoint |
DASHSCOPE_BASE_URL |
Custom DashScope endpoint |
Load Priority: CLI args > EXTEND.md > env vars > <cwd>/.baoyu-skills/.env > ~/.baoyu-skills/.env
Provider Selection
--refprovided + no--provider→ auto-select Google first, then OpenAI--providerspecified → use it (if--ref, must begoogleoropenai)- Only one API key available → use that provider
- Multiple available → default to Google
Quality Presets
| Preset | Google imageSize | OpenAI Size | Use Case |
|---|---|---|---|
normal |
1K | 1024px | Quick previews |
2k (default) |
2K | 2048px | Covers, illustrations, infographics |
Google imageSize: Can be overridden with --imageSize 1K|2K|4K
Aspect Ratios
Supported: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1
- Google multimodal: uses
imageConfig.aspectRatio - Google Imagen: uses
aspectRatioparameter - OpenAI: maps to closest supported size
Generation Mode
Default: Sequential generation (one image at a time). This ensures stable output and easier debugging.
Parallel Generation: Only use when user explicitly requests parallel/concurrent generation.
| Mode | When to Use |
|---|---|
| Sequential (default) | Normal usage, single images, small batches |
| Parallel | User explicitly requests, large batches (10+) |
Parallel Settings (when requested):
| Setting | Value |
|---|---|
| Recommended concurrency | 4 subagents |
| Max concurrency | 8 subagents |
| Use case | Large batch generation when user requests parallel |
Agent Implementation (parallel mode only):
# Launch multiple generations in parallel using Task tool
# Each Task runs as background subagent with run_in_background=true
# Collect results via TaskOutput when all complete
Error Handling
- Missing API key → error with setup instructions
- Generation failure → auto-retry once
- Invalid aspect ratio → warning, proceed with default
- Reference images with unsupported provider/model → error with fix hint (switch to Google multimodal or OpenAI GPT Image edits)
Extension Support
Custom configurations via EXTEND.md. See Preferences section for paths and supported options.
More from azure12355/weilan-skills
browser-agent
AI 驱动的浏览器自动化工具集,包含 agent-browser(无障碍树提取)、actionbook(50+ 网站自动化食谱)、browser-use(Python 自动化库)。使用场景:(1) 抓取需要 JS 渲染的网页内容 (2) 从 X/Twitter、GitHub、Reddit 等平台获取数据 (3) 截图网页 (4) 自动化浏览器操作 (5) 获取网页的无障碍树结构。当用户需要访问动态网页、绕过反爬虫、或执行浏览器自动化时使用此技能。
25drawio-diagrams
专业的 DrawIO 图表生成工具,使用 Material Design 配色和圆角矩形风格。支持 (1) 算法/数据结构图 - DP 状态转移、递归树、排序过程、双指针/滑动窗口 (2) 架构图 - 系统架构、微服务、网络拓扑、组件依赖 (3) 流程图/时序图 - 业务流程、决策流程、审批流程 (4) UML/ER 图 - 类图、实体关系、用例图。当用户提到 "drawio"、"draw.io"、需要绘制流程图、架构图、UML 图、ER 图、DP 状态图、算法可视化时使用此技能。
6github-researcher
GitHub 开源项目深度调研工具。在 GitHub 上搜索、分析特定领域的开源项目,汇总生成结构化调研报告。触发场景:用户要求"调研 GitHub 上的 XXX 工具"、"搜索 XXX 开源项目"、"汇总 GitHub 仓库"、"找 XXX 的开源替代方案"、"对比 GitHub 上的 XXX 项目"、或需要批量分析开源项目并输出报告时使用此 skill。
5diagram-prompter
分析代码库结构并生成各种架构图、流程图、时序图等的 AI 绘图提示词。使用场景:当用户需要为任何代码项目生成可视化图表时,包括系统架构图、模块依赖关系、数据流图、时序图、状态机图、部署架构图等。支持多种图表类型如 Mermaid、PlantUML、C4 模型、UML 类图、ER 图等。适用于技术文档编写、架构设计、代码评审、学习理解新项目等场景。
4yt-dlp-downloader
下载视频和音频的通用工具。支持 YouTube、Bilibili、Twitter/X、抖音、快手等数千个网站。当用户提供视频链接时自动下载到 ~/Downloads 文件夹。
4technical-writer
|
4