baoyu-image-gen
Image Generation (AI SDK)
Official API-based image generation via AI SDK. Supports OpenAI (DALL-E, GPT Image) and Google (Imagen, Gemini multimodal).
Script Directory
Important: All scripts are located in the scripts/ subdirectory of this skill.
Agent Execution Instructions:
- Determine this SKILL.md file's directory path as SKILL_DIR
- Script path = ${SKILL_DIR}/scripts/<script-name>.ts
- Replace all ${SKILL_DIR} in this document with the actual path
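The substitution described above can be sketched as two small string helpers. This is only an illustration: the function names scriptPath and expandSkillDir are hypothetical and are not part of the skill's scripts.

```typescript
// Hypothetical helpers (not part of the skill) showing the SKILL_DIR rules.

// Build the path of a script inside the skill's scripts/ subdirectory.
function scriptPath(skillDir: string, scriptName: string): string {
  // Drop trailing slashes so the joined path has exactly one separator.
  const base = skillDir.replace(/\/+$/, "");
  return `${base}/scripts/${scriptName}.ts`;
}

// Replace every literal "${SKILL_DIR}" placeholder in a command or document.
function expandSkillDir(text: string, skillDir: string): string {
  const base = skillDir.replace(/\/+$/, "");
  // A plain string (not a template literal) keeps "${SKILL_DIR}" literal.
  return text.split("${SKILL_DIR}").join(base);
}
```

For example, expandSkillDir applied to a Quick Start command yields a line that can be run directly once SKILL_DIR is known.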
Script Reference:
| Script | Purpose |
|---|---|
| scripts/main.ts | CLI entry point for image generation |
Quick Start
# Basic generation (auto-detect provider)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png
# With aspect ratio
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A landscape" --image landscape.png --ar 16:9
# High quality (2k)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --quality 2k
# Specific provider
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --provider openai
# From prompt files
npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image out.png
# With reference images (Google multimodal only)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png
Commands
Basic Image Generation
# Generate with prompt
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A sunset over mountains" --image sunset.png
# Shorthand
npx -y bun ${SKILL_DIR}/scripts/main.ts -p "A cute robot" --image robot.png
Aspect Ratios
# Common ratios: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A portrait" --image portrait.png --ar 3:4
# Or specify exact size
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Banner" --image banner.png --size 1792x1024
Reference Images (Google Multimodal)
# Image editing with reference
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make it blue" --image blue.png --ref original.png
# Multiple references
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Combine these styles" --image out.png --ref a.png b.png
Quality Presets
# Normal quality (default)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --quality normal
# High quality (2k resolution)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --quality 2k
Output Formats
# Plain output (prints saved path)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png
# JSON output
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png --json
Options
| Option | Description |
|---|---|
| --prompt <text>, -p | Prompt text |
| --promptfiles <files...> | Read prompt from files (concatenated) |
| --image <path> | Output image path (required) |
| --provider google\|openai | Force provider (default: google) |
| --model <id>, -m | Model ID |
| --ar <ratio> | Aspect ratio (e.g., 16:9, 1:1, 4:3) |
| --size <WxH> | Size (e.g., 1024x1024) |
| --quality normal\|2k | Quality preset (default: normal) |
| --ref <files...> | Reference images (Google multimodal only) |
| --n <count> | Number of images |
| --json | JSON output |
| --help, -h | Show help |
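To illustrate how variadic options such as --ref <files...> and --promptfiles <files...> differ from single-value options, here is a minimal parsing sketch covering a subset of the flags above. It is not the parser used by scripts/main.ts; the function name parseArgs, the CliOptions shape, and the error messages are assumptions made for this example.

```typescript
// Illustrative parser for a subset of the documented flags (an assumption,
// not the actual implementation in scripts/main.ts).
interface CliOptions {
  prompt?: string;
  promptFiles: string[];
  image?: string;
  refs: string[];
  json: boolean;
}

function parseArgs(argv: string[]): CliOptions {
  const opts: CliOptions = { promptFiles: [], refs: [], json: false };
  let i = 0;

  // Variadic options consume values until the next token that looks like a flag.
  const takeMany = (): string[] => {
    const values: string[] = [];
    while (i < argv.length && !argv[i].startsWith("-")) values.push(argv[i++]);
    return values;
  };

  while (i < argv.length) {
    const flag = argv[i++];
    switch (flag) {
      case "--prompt":
      case "-p":
        opts.prompt = argv[i++];
        break;
      case "--promptfiles":
        opts.promptFiles = takeMany();
        break;
      case "--image":
        opts.image = argv[i++];
        break;
      case "--ref":
        opts.refs = takeMany();
        break;
      case "--json":
        opts.json = true;
        break;
      default:
        throw new Error(`Unknown option: ${flag}`);
    }
  }

  // --image is the only option the table marks as required.
  if (!opts.image) throw new Error("--image <path> is required");
  return opts;
}
```

With this sketch, --ref a.png b.png produces a two-element refs array, which matches the "Multiple references" command shown earlier.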
Environment Variables
| Variable | Description | Default |
|---|---|---|
| OPENAI_API_KEY | OpenAI API key | - |
| GOOGLE_API_KEY | Google API key | - |
| OPENAI_IMAGE_MODEL | OpenAI model | gpt-image-1.5 |
| GOOGLE_IMAGE_MODEL | Google model | gemini-3-pro-image-preview |
| OPENAI_BASE_URL | Custom OpenAI endpoint | - |
| GOOGLE_BASE_URL | Custom Google endpoint | - |
Load Priority: CLI args > process.env > <cwd>/.baoyu-skills/.env > ~/.baoyu-skills/.env
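The load priority can be read as "first source that defines the key wins". The sketch below shows that lookup over plain key/value maps, plus a very small .env parser. Both functions are hypothetical, and several details are assumptions: that CLI flags are mapped to the same keys as the environment variables, that an empty value counts as unset, and that .env files use simple KEY=VALUE lines.

```typescript
type Env = Record<string, string | undefined>;

// Return the value from the first source that defines the key.
// Pass sources from highest to lowest priority:
// CLI args, process.env, <cwd>/.baoyu-skills/.env, ~/.baoyu-skills/.env
function resolveSetting(key: string, sources: Env[]): string | undefined {
  for (const source of sources) {
    const value = source[key];
    // Assumption: an empty string is treated the same as "not set".
    if (value !== undefined && value !== "") return value;
  }
  return undefined;
}

// Minimal .env parser: KEY=VALUE lines, "#" comments, optional quotes.
function parseDotEnv(text: string): Env {
  const out: Env = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no key before "=", skip the line
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes.
    const first = value.charAt(0);
    if (value.length >= 2 && (first === '"' || first === "'") && value.endsWith(first)) {
      value = value.slice(1, -1);
    }
    out[key] = value;
  }
  return out;
}
```

Under these assumptions, a key exported in the shell overrides the same key in either .env file, and the project-level file overrides the user-level one.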
Provider & Model Strategy
Auto-Selection
- If --provider specified → use it
- If only one API key available → use that provider
- If both available → default to Google (multimodal LLMs more versatile)
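The three auto-selection rules above translate into a short decision function. This is a sketch under stated assumptions: the function name selectProvider, the input shape, and the error message are illustrative, and the behavior when no key is present is inferred from the "Missing API key" entry under Error Handling.

```typescript
type Provider = "google" | "openai";

interface SelectionInput {
  provider?: Provider;   // value of --provider, if given
  googleApiKey?: string; // GOOGLE_API_KEY
  openaiApiKey?: string; // OPENAI_API_KEY
}

function selectProvider(input: SelectionInput): Provider {
  // Rule 1: an explicit --provider always wins.
  if (input.provider) return input.provider;

  const hasGoogle = Boolean(input.googleApiKey);
  const hasOpenai = Boolean(input.openaiApiKey);

  // Rules 2 and 3: Google when it is the only key or when both are set.
  if (hasGoogle) return "google";
  // Rule 2: OpenAI when it is the only key available.
  if (hasOpenai) return "openai";

  // Assumed wording; the document only promises a clear error.
  throw new Error("No API key found: set GOOGLE_API_KEY or OPENAI_API_KEY");
}
```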
API Selection by Model Type
| Model Category | API Function | Example Models |
|---|---|---|
| Google Multimodal | generateText | gemini-2.0-flash-exp-image-generation |
| Google Imagen | experimental_generateImage | imagen-3.0-generate-002 |
| OpenAI | experimental_generateImage | gpt-image-1, dall-e-3 |
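The table above maps each model category to one of two AI SDK functions. A routing helper for that mapping might look like the following. The function name apiFunctionFor is hypothetical, and classifying Google models by the "gemini" prefix is an assumption based on the example model IDs, not a documented rule.

```typescript
type ApiFunction = "generateText" | "experimental_generateImage";

function apiFunctionFor(provider: "google" | "openai", modelId: string): ApiFunction {
  // Assumption: Google multimodal model IDs start with "gemini";
  // everything else (Imagen, OpenAI models) is an image-only model.
  if (provider === "google" && modelId.toLowerCase().startsWith("gemini")) {
    return "generateText";
  }
  return "experimental_generateImage";
}
```

The same check can also decide whether reference images and prompt-embedded aspect ratios apply, since both are tied to the multimodal path.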
Available Models
Google:
- gemini-3-pro-image-preview - Default, multimodal generation
- gemini-2.0-flash-exp-image-generation - Gemini 2.0 Flash
- imagen-3.0-generate-002 - Imagen 3
OpenAI:
- gpt-image-1.5 - Default, GPT Image 1.5
- gpt-image-1 - GPT Image 1
- dall-e-3 - DALL-E 3
Quality Presets
| Preset | OpenAI | Google | Use Case |
|---|---|---|---|
| normal | 1024x1024 | Default | Covers, illustrations |
| 2k | 2048x2048 | "2048px" in prompt | Infographics, slides |
Aspect Ratio Handling
- Multimodal LLMs: Embedded in prompt (e.g., "... aspect ratio 16:9")
- Image-only models: Uses aspectRatio or size parameter
- Common ratios: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1
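The two handling modes above can be sketched as one function that either rewrites the prompt or sets a request parameter. The names buildParams and GenerationParams are hypothetical. The exact wording appended to the prompt, and the choice to let an explicit size take precedence over the aspect ratio, are assumptions rather than documented behavior.

```typescript
interface GenerationParams {
  prompt: string;
  aspectRatio?: string; // set only for image-only models
  size?: string;        // set only for image-only models
}

function buildParams(
  prompt: string,
  opts: { ar?: string; size?: string },
  multimodal: boolean,
): GenerationParams {
  if (multimodal) {
    // Multimodal LLMs take no ratio parameter here, so the ratio is
    // embedded in the prompt text (wording is an assumption).
    return { prompt: opts.ar ? `${prompt}, aspect ratio ${opts.ar}` : prompt };
  }

  const params: GenerationParams = { prompt };
  if (opts.size) {
    params.size = opts.size; // assumed: an exact size wins over a ratio
  } else if (opts.ar) {
    params.aspectRatio = opts.ar;
  }
  return params;
}
```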
Examples
Generate Cover Image
npx -y bun ${SKILL_DIR}/scripts/main.ts \
--prompt "A minimalist tech illustration with blue gradients" \
--image cover.png --ar 2.35:1 --quality 2k
Generate Social Media Post
npx -y bun ${SKILL_DIR}/scripts/main.ts \
--prompt "Instagram post about coffee" \
--image post.png --ar 1:1
Edit Image with Reference
npx -y bun ${SKILL_DIR}/scripts/main.ts \
--prompt "Change the background to sunset" \
--image edited.png --ref original.png --provider google
Batch Generation from Prompt File
# Create prompt file with detailed instructions
npx -y bun ${SKILL_DIR}/scripts/main.ts \
--promptfiles style-guide.md scene-description.md \
--image scene.png
Error Handling
- Missing API key: Clear error with setup instructions
- Generation failure: Auto-retry once, then error
- Invalid aspect ratio: Warning, proceed with default
- Reference images with image-only model: Warning, ignore refs
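A rough sketch of the recoverable cases above: one retry on generation failure, and warnings instead of errors for an invalid aspect ratio or for reference images passed to an image-only model. All names here (withOneRetry, checkOptions, the warning texts) are hypothetical, and the ratio pattern only checks the W:H shape rather than any list of supported ratios.

```typescript
// Run an async attempt and retry exactly once if it throws.
async function withOneRetry<T>(attempt: () => Promise<T>): Promise<T> {
  try {
    return await attempt();
  } catch {
    // A second failure propagates to the caller as the final error.
    return await attempt();
  }
}

interface Options {
  ar?: string;
  refs: string[];
}

// Accepts shapes such as "16:9" or "2.35:1".
const RATIO_SHAPE = /^\d+(\.\d+)?:\d+(\.\d+)?$/;

// Return cleaned options plus the warnings to print; never throws.
function checkOptions(options: Options, multimodal: boolean): { options: Options; warnings: string[] } {
  const warnings: string[] = [];
  const checked: Options = { ...options, refs: [...options.refs] };

  if (checked.ar !== undefined && !RATIO_SHAPE.test(checked.ar)) {
    warnings.push(`Invalid aspect ratio "${checked.ar}", using the default`);
    delete checked.ar;
  }
  if (!multimodal && checked.refs.length > 0) {
    warnings.push("Reference images are ignored by image-only models");
    checked.refs = [];
  }
  return { options: checked, warnings };
}
```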
Extension Support
Custom configurations via EXTEND.md.
Check paths (priority order):
- .baoyu-skills/baoyu-image-gen/EXTEND.md (project)
- ~/.baoyu-skills/baoyu-image-gen/EXTEND.md (user)
If found, load before workflow. Extension content overrides defaults.
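The lookup order can be sketched as a list of candidate paths checked in sequence. The helper names are hypothetical, and the existence check is passed in as a function so the sketch stays independent of any particular file system API.

```typescript
// Candidate EXTEND.md locations, project-level first, then user-level.
function extendCandidates(cwd: string, home: string, skill: string = "baoyu-image-gen"): string[] {
  const relative = `.baoyu-skills/${skill}/EXTEND.md`;
  return [`${cwd}/${relative}`, `${home}/${relative}`];
}

// Return the first candidate for which the supplied check reports a file.
function firstExisting(paths: string[], exists: (path: string) => boolean): string | undefined {
  return paths.find(exists);
}
```

In a real script the exists argument would typically wrap a file system call; when both files are present, only the project-level one is selected by this sketch.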