Image Generation (AI SDK)

Generates images through official provider APIs. Supports the OpenAI, Google, DashScope (Alibaba Tongyi Wanxiang), and Replicate providers.

Script Directory

Agent Execution:

  1. SKILL_DIR = this SKILL.md file's directory
  2. Script path = ${SKILL_DIR}/scripts/main.ts

Step 0: Load Preferences ⛔ BLOCKING

CRITICAL: This step MUST complete BEFORE any image generation. Do NOT skip or defer.

Check EXTEND.md existence (priority: project → user):

test -f .beaver-skill/beaver-image-gen/EXTEND.md && echo "project"
test -f "$HOME/.beaver-skill/beaver-image-gen/EXTEND.md" && echo "user"
| Result | Action |
| --- | --- |
| Found | Load, parse, apply settings. If default_model.[provider] is null → ask model only (Flow 2) |
| Not found | ⛔ Run first-time setup (below) → Save EXTEND.md → Then continue |

CRITICAL: If not found, complete the full setup using AskUserQuestion BEFORE generating any images. Generation is BLOCKED until EXTEND.md is created.

| Path | Location |
| --- | --- |
| .beaver-skill/beaver-image-gen/EXTEND.md | Project directory |
| $HOME/.beaver-skill/beaver-image-gen/EXTEND.md | User home |
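The project → user lookup order can be sketched as a small pure function. This is only an illustration: `locateExtendMd`, `ExtendLocation`, and the injected `exists` predicate are hypothetical names, not part of `main.ts`.

```typescript
// Hypothetical helper; names are illustrative and not taken from main.ts.
type ExtendLocation = { scope: "project" | "user"; path: string };

const RELATIVE_PATH = ".beaver-skill/beaver-image-gen/EXTEND.md";

// Returns the first EXTEND.md that exists, checking the project directory
// before the user home. `exists` is injected so the lookup order can be
// exercised without touching the filesystem.
function locateExtendMd(
  projectDir: string,
  homeDir: string,
  exists: (path: string) => boolean,
): ExtendLocation | null {
  const candidates: ExtendLocation[] = [
    { scope: "project", path: `${projectDir}/${RELATIVE_PATH}` },
    { scope: "user", path: `${homeDir}/${RELATIVE_PATH}` },
  ];
  return candidates.find((candidate) => exists(candidate.path)) ?? null;
}
```

In a real script the predicate would presumably be something like `existsSync` from `node:fs`; a `null` result corresponds to the "Not found" row, which triggers first-time setup.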

EXTEND.md Supports: Default provider | Default quality | Default aspect ratio | Default image size | Default models

Schema: references/config/preferences-schema.md

First-Time Setup (when no EXTEND.md found)

Use AskUserQuestion with ALL questions in ONE call:

  1. Default Provider: Google (Recommended) | OpenAI | DashScope | Replicate
  2. Default Model (per selected provider):
    • Google: gemini-3-pro-image-preview (Recommended) | gemini-3.1-flash-image-preview | gemini-3-flash-preview
    • OpenAI: gpt-image-1.5 (Recommended) | gpt-image-1
    • DashScope: z-image-turbo (Recommended) | z-image-ultra
    • Replicate: google/nano-banana-pro (Recommended) | google/nano-banana
  3. Default Quality: 2k (Recommended) | normal
  4. Save Location: Project .beaver-skill/ (Recommended) | User ~/.beaver-skill/

EXTEND.md template:

---
version: 1
default_provider: [selected or null]
default_quality: [selected or null]
default_aspect_ratio: null
default_image_size: null
default_model:
  google: [selected or null]
  openai: null
  dashscope: null
  replicate: null
---

If EXTEND.md exists but default_model.[current_provider] is null → ask ONLY the model question for that provider, then update EXTEND.md.
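As a rough sketch of how the setup answers might be turned into the template above, the function below renders the front matter as a string. `SetupAnswers`, `yamlScalar`, and `renderExtendMd` are hypothetical names, and quoting string values is an assumption (the actual schema is in references/config/preferences-schema.md).

```typescript
// Illustrative only; the real setup flow is described in the skill's references.
type Provider = "google" | "openai" | "dashscope" | "replicate";

interface SetupAnswers {
  provider: Provider | null;
  model: string | null; // model picked for `provider`
  quality: "normal" | "2k" | null;
}

const PROVIDERS: readonly Provider[] = ["google", "openai", "dashscope", "replicate"];

// Quote strings so values such as "2k" stay YAML strings; emit a bare null otherwise.
function yamlScalar(value: string | null): string {
  return value === null ? "null" : JSON.stringify(value);
}

// Only the selected provider receives a model; the others stay null,
// which later triggers the model-only question for that provider.
function renderExtendMd(answers: SetupAnswers): string {
  const modelLines = PROVIDERS.map((provider) => {
    const model = provider === answers.provider ? answers.model : null;
    return `  ${provider}: ${yamlScalar(model)}`;
  });
  return [
    "---",
    "version: 1",
    `default_provider: ${yamlScalar(answers.provider)}`,
    `default_quality: ${yamlScalar(answers.quality)}`,
    "default_aspect_ratio: null",
    "default_image_size: null",
    "default_model:",
    ...modelLines,
    "---",
    "",
  ].join("\n");
}
```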

See references/config/first-time-setup.md for detailed flow diagrams.

Usage

# Basic
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image cat.png

# With aspect ratio
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9

# High quality
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --quality 2k

# From prompt files
npx -y bun ${SKILL_DIR}/scripts/main.ts --promptfiles system.md content.md --image out.png

# With reference images (Google multimodal or OpenAI edits)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png

# With reference images (explicit provider/model)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "Make blue" --image out.png --provider google --model gemini-3-pro-image-preview --ref source.png

# Specific provider
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider openai

# DashScope (Alibaba Tongyi Wanxiang); the prompt below is Chinese for "a cute cat"
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "一只可爱的猫" --image out.png --provider dashscope

# Replicate (google/nano-banana-pro)
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate

# Replicate with specific model
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana

Options

| Option | Description |
| --- | --- |
| --prompt <text>, -p | Prompt text |
| --promptfiles <files...> | Read prompt from files (concatenated) |
| --image <path> | Output image path (required) |
| --provider <name> | Force provider: google, openai, dashscope, or replicate (default: google) |
| --model <id>, -m | Model ID (Google: gemini-3-pro-image-preview, gemini-3.1-flash-image-preview; OpenAI: gpt-image-1.5) |
| --ar <ratio> | Aspect ratio (e.g., 16:9, 1:1, 4:3) |
| --size <WxH> | Size (e.g., 1024x1024) |
| --quality <preset> | Quality preset: normal or 2k (default: 2k) |
| --imageSize <size> | Image size for Google: 1K, 2K, or 4K (default: from quality) |
| --ref <files...> | Reference images. Supported by Google multimodal (gemini-3-pro-image-preview, gemini-3-flash-preview, gemini-3.1-flash-image-preview) and OpenAI edits (GPT Image models). If provider omitted: Google first, then OpenAI |
| --n <count> | Number of images. Supported: OpenAI (except dall-e-3), Imagen, Replicate. Gemini multimodal and chat completions mode: warns and generates 1 |
| --json | JSON output |

Environment Variables

| Variable | Description |
| --- | --- |
| OPENAI_API_KEY | OpenAI API key |
| GOOGLE_API_KEY | Google API key |
| DASHSCOPE_API_KEY | DashScope API key (Alibaba Cloud) |
| REPLICATE_API_TOKEN | Replicate API token |
| OPENAI_IMAGE_MODEL | OpenAI model override |
| GOOGLE_IMAGE_MODEL | Google model override |
| DASHSCOPE_IMAGE_MODEL | DashScope model override (default: z-image-turbo) |
| REPLICATE_IMAGE_MODEL | Replicate model override (default: google/nano-banana-pro) |
| OPENAI_BASE_URL | Custom OpenAI endpoint |
| GOOGLE_BASE_URL | Custom Google endpoint |
| DASHSCOPE_BASE_URL | Custom DashScope endpoint |
| REPLICATE_BASE_URL | Custom Replicate endpoint |

Load Priority: CLI args > EXTEND.md > env vars > <cwd>/.beaver-skill/.env > ~/.beaver-skill/.env

Model Resolution

Model priority (highest → lowest), applies to all providers:

  1. CLI flag: --model <id>
  2. EXTEND.md: default_model.[provider]
  3. Env var: <PROVIDER>_IMAGE_MODEL (e.g., GOOGLE_IMAGE_MODEL)
  4. Built-in default

EXTEND.md overrides env vars. If both EXTEND.md default_model.google: "gemini-3-pro-image-preview" and env var GOOGLE_IMAGE_MODEL=gemini-3.1-flash-image-preview exist, EXTEND.md wins.
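The four-level priority amounts to "first non-empty value wins". A minimal sketch, assuming the candidate values have already been read from the CLI, EXTEND.md, and the environment (`ModelCandidates` and `resolveModel` are hypothetical names):

```typescript
type ModelSource = "cli" | "extend" | "env" | "builtin";

interface ModelCandidates {
  cli?: string; // --model <id>
  extend?: string | null; // EXTEND.md default_model.[provider]
  env?: string; // <PROVIDER>_IMAGE_MODEL
  builtin: string; // built-in default for the provider
}

// Walks the sources from highest to lowest priority and returns the first
// non-empty model together with the source it came from.
function resolveModel(c: ModelCandidates): { model: string; source: ModelSource } {
  const ordered: Array<[ModelSource, string | null | undefined]> = [
    ["cli", c.cli],
    ["extend", c.extend],
    ["env", c.env],
  ];
  for (const [source, model] of ordered) {
    if (model) return { model, source };
  }
  return { model: c.builtin, source: "builtin" };
}
```

Returning the source alongside the model makes it easy to print the "Using [provider] / [model]" line and explain where the choice came from.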

Agent MUST display model info before each generation:

  • Show: Using [provider] / [model]
  • Show switch hint: Switch model: --model <id> | EXTEND.md default_model.[provider] | env <PROVIDER>_IMAGE_MODEL

Replicate Models

Supported model formats:

  • owner/name (recommended for official models), e.g. google/nano-banana-pro
  • owner/name:version (community models by version), e.g. stability-ai/sdxl:<version>

Examples:

# Use Replicate default model
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate

# Override model explicitly
npx -y bun ${SKILL_DIR}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana
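The two model formats listed above can be told apart with a small parser. This is a sketch only: `parseReplicateModel` is a hypothetical helper, and the allowed character set is an assumption rather than Replicate's official grammar.

```typescript
interface ReplicateModelRef {
  owner: string;
  name: string;
  version?: string; // present only for the owner/name:version form
}

// Accepts "owner/name" or "owner/name:version"; returns null for anything else.
function parseReplicateModel(id: string): ReplicateModelRef | null {
  const match = /^([^/:\s]+)\/([^/:\s]+)(?::([^/:\s]+))?$/.exec(id);
  if (!match) return null;
  const [, owner, name, version] = match;
  return version ? { owner, name, version } : { owner, name };
}
```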

Provider Selection

  1. --ref provided + no --provider → auto-select Google first, then OpenAI, then Replicate
  2. --provider specified → use it (if --ref, must be google, openai, or replicate)
  3. Only one API key available → use that provider
  4. Multiple available → default to Google
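The four rules can be read as the decision function below. It is a sketch under stated assumptions: `available` stands for the providers whose API key is set, and the fallback used when several keys exist but Google's is not among them is a guess, since the rules above do not cover that case.

```typescript
type Provider = "google" | "openai" | "dashscope" | "replicate";

interface SelectionInput {
  requested?: Provider; // --provider
  hasRef: boolean; // true when --ref was passed
  available: Provider[]; // providers with an API key configured
}

// Providers that accept reference images, in auto-selection order.
const REF_ORDER: readonly Provider[] = ["google", "openai", "replicate"];

function selectProvider({ requested, hasRef, available }: SelectionInput): Provider {
  if (requested) {
    if (hasRef && !REF_ORDER.includes(requested)) {
      throw new Error(`--ref is not supported by provider "${requested}"`);
    }
    return requested;
  }
  if (hasRef) {
    const match = REF_ORDER.find((provider) => available.includes(provider));
    if (!match) throw new Error("No reference-capable provider has an API key");
    return match;
  }
  const first = available[0];
  if (first === undefined) throw new Error("No API key configured");
  if (available.length === 1) return first;
  // Assumption: with several keys but no Google key, use the first available.
  return available.includes("google") ? "google" : first;
}
```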

Quality Presets

| Preset | Google imageSize | OpenAI Size | Use Case |
| --- | --- | --- | --- |
| normal | 1K | 1024px | Quick previews |
| 2k (default) | 2K | 2048px | Covers, illustrations, infographics |

Google imageSize: Can be overridden with --imageSize 1K|2K|4K
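The preset table and the `--imageSize` override can be expressed as a lookup with an optional override. The names `QUALITY_PRESETS` and `resolveGoogleImageSize` are hypothetical; only the values come from the table above.

```typescript
type Quality = "normal" | "2k";
type GoogleImageSize = "1K" | "2K" | "4K";

interface QualityPreset {
  googleImageSize: GoogleImageSize;
  openaiSizePx: number;
}

const QUALITY_PRESETS: Record<Quality, QualityPreset> = {
  normal: { googleImageSize: "1K", openaiSizePx: 1024 },
  "2k": { googleImageSize: "2K", openaiSizePx: 2048 },
};

// An explicit --imageSize wins; otherwise the size follows the quality preset.
function resolveGoogleImageSize(
  quality: Quality = "2k",
  imageSizeFlag?: GoogleImageSize,
): GoogleImageSize {
  return imageSizeFlag ?? QUALITY_PRESETS[quality].googleImageSize;
}
```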

Aspect Ratios

Supported: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1

  • Google multimodal: uses imageConfig.aspectRatio
  • Google Imagen: uses aspectRatio parameter
  • OpenAI: maps to closest supported size
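One plausible reading of "maps to closest supported size" is a nearest-ratio search. The sketch below makes two assumptions: the candidate size list is supplied by the caller (the three sizes in the usage note are illustrative, not confirmed by this document), and closeness is measured on a log scale so that wide and tall ratios are treated symmetrically.

```typescript
interface Size {
  width: number;
  height: number;
}

// Parses "16:9" or "2.35:1" into a width/height ratio; null if malformed.
function parseAspectRatio(ratio: string): number | null {
  const match = /^(\d+(?:\.\d+)?):(\d+(?:\.\d+)?)$/.exec(ratio.trim());
  if (!match) return null;
  const width = Number(match[1]);
  const height = Number(match[2]);
  return width > 0 && height > 0 ? width / height : null;
}

// Picks the candidate whose ratio is nearest to the requested one.
// Returns null for an invalid ratio so the caller can warn and fall back.
function closestSize(ratio: string, candidates: readonly Size[]): Size | null {
  const target = parseAspectRatio(ratio);
  if (target === null) return null;
  let best: Size | null = null;
  let bestDistance = Infinity;
  for (const size of candidates) {
    const distance = Math.abs(Math.log(size.width / size.height) - Math.log(target));
    if (distance < bestDistance) {
      best = size;
      bestDistance = distance;
    }
  }
  return best;
}
```

For example, with candidates 1024x1024, 1536x1024, and 1024x1536, a request for 16:9 selects 1536x1024 and 9:16 selects 1024x1536.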

Generation Mode

Default: Sequential generation (one image at a time). This keeps output stable and makes failures easier to debug.

Parallel Generation: Only use when user explicitly requests parallel/concurrent generation.

| Mode | When to Use |
| --- | --- |
| Sequential (default) | Normal usage, single images, small batches |
| Parallel | User explicitly requests, large batches (10+) |

Parallel Settings (when requested):

| Setting | Value |
| --- | --- |
| Recommended concurrency | 4 subagents |
| Max concurrency | 8 subagents |
| Use case | Large batch generation when user requests parallel |

Agent Implementation (parallel mode only):

# Launch multiple generations in parallel using Task tool
# Each Task runs as background subagent with run_in_background=true
# Collect results via TaskOutput when all complete
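The agent-side flow above relies on the Task tool, which is outside the script. For readers who want the same bounded-concurrency idea in plain TypeScript, here is a generic sketch: `runWithConcurrency` is a hypothetical helper, and `worker` stands in for whatever launches a single generation.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight,
// preserving input order in the returned results.
async function runWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index] as T, index);
    }
  }

  const runnerCount = Math.min(Math.max(1, limit), items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

With the settings above, `limit` would be 4 by default and never more than 8. If any worker rejects, the returned promise rejects as well, so callers that want partial results should catch errors inside `worker`.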

Error Handling

  • Missing API key → error with setup instructions
  • Generation failure → auto-retry once
  • Invalid aspect ratio → warning, proceed with default
  • Reference images with unsupported provider/model → error with fix hint (switch to Google multimodal: gemini-3-pro-image-preview, gemini-3.1-flash-image-preview; or OpenAI GPT Image edits)
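The "auto-retry once" rule can be sketched as a thin wrapper around a single attempt. `withSingleRetry` is a hypothetical name, and the absence of any delay between attempts is an assumption, since the document does not specify one.

```typescript
// Runs `attempt`; on failure, runs it exactly one more time.
// A second failure propagates to the caller unchanged.
async function withSingleRetry<T>(attempt: () => Promise<T>): Promise<T> {
  try {
    return await attempt();
  } catch {
    return await attempt();
  }
}
```

Errors that cannot succeed on a second try, such as a missing API key or an unsupported `--ref` combination, are better raised before entering the wrapper so they are reported immediately.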

Extension Support

Custom configurations via EXTEND.md. See "Step 0: Load Preferences" for paths and supported options.
