/image-gen — AI Image Generation

Generate images from text prompts using Gemini Nano Banana Pro or Ideogram V3, directly from Claude Code.

Environment Variables Required

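Set these in ~/.zshrc or ~/.zshenv. Use export so child processes, including the shell commands below, inherit them. Each provider needs its own key. If the chosen provider's key is missing, the skill falls back to the other provider (see Step 3).

export GEMINI_API_KEY=your-gemini-api-key
export IDEOGRAM_API_KEY=your-ideogram-api-key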

Providers

Provider  Model                                         Best for
gemini    Nano Banana Pro (gemini-3-pro-image-preview)  Highest quality, text rendering, complex scenes
ideogram  Ideogram V3                                   Typography, stylized art, preset styles, color palettes, character consistency

Output Organization

All generated images are saved to ~/image-gen/ organized by category:

~/image-gen/
  logos/           — logos, icons, brand marks
  mockups/         — app screens, website mockups, UI designs
  social-media/    — social media posts, ads, banners
  photos/          — photorealistic images, portraits, scenes
  illustrations/   — artwork, illustrations, artistic pieces
  marketing/       — flyers, posters, brochures, print materials
  misc/            — anything that doesn't fit above

Rules:

  • Determine the appropriate category from the prompt. If none fits, use misc/ or create a new, descriptively named folder.
  • If the folder doesn't exist yet, create it with mkdir -p.
  • The user can override with --output <path>.
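A simple keyword heuristic is one way to make the category choice concrete. This is a minimal sketch under stated assumptions. The keyword lists and the `pick_category` name are hypothetical and not part of the skill, which leaves categorization to judgment.

```python
import re

# Hypothetical keyword lists; earlier categories win when a prompt matches several.
CATEGORY_KEYWORDS = {
    "logos": ["logo", "icon", "brand mark", "emblem"],
    "mockups": ["mockup", "app screen", "website", "ui", "dashboard"],
    "social-media": ["instagram", "social media", "post", "banner", "ad"],
    "marketing": ["flyer", "poster", "brochure", "print"],
    "photos": ["photo", "photorealistic", "portrait"],
    "illustrations": ["illustration", "painting", "artwork", "watercolor"],
}


def pick_category(prompt: str) -> str:
    """Return the first category with a keyword that appears as a whole word in the prompt."""
    text = prompt.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        for kw in keywords:
            # \b boundaries keep "post" from matching inside "poster"
            if re.search(r"\b" + re.escape(kw) + r"\b", text):
                return category
    return "misc"  # fallback folder for prompts that match nothing
```

Whole-word matching matters here. For example, "poster for a jazz night" lands in marketing/ rather than social-media/.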

File Naming

Name files descriptively based on the prompt content, using kebab-case. Append a short timestamp suffix to avoid collisions.

Examples:

  • "logo for OT With Miss Lexi" → ot-with-miss-lexi-logo-1403.png
  • "sunset over mountains" → sunset-over-mountains-1403.png
  • 3 variations → ot-with-miss-lexi-logo-1403-1.png, ...-2.png, ...-3.png

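The timestamp suffix is the current time as HHMM (date +%H%M). It reduces collisions but does not rule them out. The same name generated twice within one minute, or at the same clock time on another day, overwrites the earlier file. Keep filenames readable.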
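The naming rule can be sketched in Python. `make_filenames` is a hypothetical helper. It assumes the caller passes a short description (e.g. "OT With Miss Lexi logo") rather than the full prompt. It also assumes a .png extension, which is what Ideogram downloads use; Gemini may return JPEG.

```python
import re
from datetime import datetime
from typing import List


def make_filenames(description: str, when: datetime, count: int = 1, ext: str = "png") -> List[str]:
    """Kebab-case the description, append HHMM, and number variations -1..-N."""
    # Lowercase, collapse each run of non-alphanumeric characters into one hyphen, trim the ends
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    base = f"{slug}-{when:%H%M}"
    if count <= 1:
        return [f"{base}.{ext}"]
    return [f"{base}-{i}.{ext}" for i in range(1, count + 1)]
```

For example, `make_filenames("OT With Miss Lexi logo", datetime(2025, 1, 1, 14, 3))` returns `["ot-with-miss-lexi-logo-1403.png"]`. In practice you would pass `datetime.now()`.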

Process

Step 1: Parse Input

Extract from $ARGUMENTS:

Explicit flags (user-specified):

  • Prompt: The image description (everything that isn't a flag)
  • Provider: --gemini or --ideogram
  • Aspect ratio: --square, --landscape, --portrait, --wide, or --aspect W:H (e.g. --aspect 3:4)
  • Output path: --output <path> (overrides default category folder)

Ideogram-only flags (see references/ideogram-options.md for complete lists):

  • Count: --count N — generate 1-4 images at once
  • Style type: --style-type <REALISTIC|DESIGN|FICTION|GENERAL|AUTO>
  • Style preset: --style <PRESET_NAME> (e.g. OIL_PAINTING, WATERCOLOR, FLAT_VECTOR)
  • Color palette preset: --palette <PRESET> (e.g. EMBER, FRESH, JUNGLE)
  • Custom colors: --colors "#hex1,#hex2,#hex3" — custom color palette
  • Negative prompt: --no "things to exclude"
  • Speed: --speed <FLASH|TURBO|DEFAULT|QUALITY>
  • Magic prompt: --magic <ON|OFF|AUTO> — auto-enhance the prompt
  • Character reference: --char-ref <path/to/image> — maintain character consistency
  • Style reference: --style-ref <path/to/image> — use an image as style guide
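The flags above map naturally onto Python's argparse. This is one possible parser, sketched under assumptions: $ARGUMENTS arrives as a single shell-style string, and `parse_image_gen_args` is a hypothetical helper. Values such as preset names and speeds are passed through unvalidated and left for the API to check. `parse_intermixed_args` lets prompt words and flags appear in any order.

```python
import argparse
import shlex


def parse_image_gen_args(arguments: str) -> argparse.Namespace:
    """Split $ARGUMENTS like a shell would, then separate the prompt words from the flags."""
    p = argparse.ArgumentParser(prog="/image-gen", add_help=False, allow_abbrev=False)
    p.add_argument("prompt", nargs="*")  # every word that is not a flag or a flag value

    provider = p.add_mutually_exclusive_group()
    provider.add_argument("--gemini", dest="provider", action="store_const", const="gemini")
    provider.add_argument("--ideogram", dest="provider", action="store_const", const="ideogram")

    shape = p.add_mutually_exclusive_group()
    for name in ("square", "landscape", "portrait", "wide"):
        shape.add_argument(f"--{name}", dest="aspect", action="store_const", const=name)
    shape.add_argument("--aspect", dest="aspect")  # literal W:H, e.g. 3:4
    p.add_argument("--output")

    # Ideogram-only options; dashes become underscores (--style-type -> style_type)
    p.add_argument("--count", type=int, choices=range(1, 5))
    for flag in ("--style-type", "--style", "--palette", "--colors", "--speed",
                 "--magic", "--char-ref", "--style-ref"):
        p.add_argument(flag)
    p.add_argument("--no", dest="negative_prompt")

    ns = p.parse_intermixed_args(shlex.split(arguments))
    ns.prompt = " ".join(ns.prompt)
    return ns
```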

Step 2: Clarify Before Generating

IMPORTANT: Never assume provider, style, palette, or count. Always ask when not explicitly specified.

If $ARGUMENTS is empty, ask what image the user wants.

If the user provided a prompt but left the provider (--gemini or --ideogram) or other relevant options unspecified, use AskUserQuestion to clarify. Ask at most 2-3 focused questions, covering only what is missing and tailored to the type of image:

Example questions for a logo request:

question: "Which provider should I use?"
options:
  - Ideogram (better for logos, has style presets like FLAT_VECTOR)
  - Gemini Pro (highest overall quality)

question: "Any style or color preferences?"
options:
  - Clean flat vector (FLAT_VECTOR preset)
  - Minimal illustration (MINIMAL_ILLUSTRATION preset)
  - Let me specify colors (will ask for hex values)
  - No preference, surprise me

Example questions for a photo request:

question: "Which provider should I use?"
options:
  - Gemini Pro (best for photorealism)
  - Ideogram with REALISTIC style

question: "What aspect ratio?"
options:
  - Square (1:1)
  - Landscape (16:9)
  - Portrait (9:16)

Skip questions when the user has been explicit. If they said --ideogram --style WATERCOLOR --palette PASTEL, just run it. Only ask about things they left open.
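The rule of asking only about what was left open can be expressed as a small filter. This is a sketch only. `open_questions`, the `kind` labels, and the option keys are hypothetical, and the skill itself tailors questions with judgment rather than fixed rules.

```python
from typing import Dict, List, Optional


def open_questions(opts: Dict[str, Optional[str]], kind: str, limit: int = 3) -> List[str]:
    """Return clarifying questions only for the options the user left unspecified."""
    questions = []
    if not opts.get("provider"):
        questions.append("Which provider should I use?")
    if kind == "logo" and not any(opts.get(k) for k in ("style", "palette", "colors")):
        questions.append("Any style or color preferences?")
    if kind == "photo" and not opts.get("aspect"):
        questions.append("What aspect ratio?")
    return questions[:limit]  # cap the number of questions
```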

Step 3: Verify API Keys

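# Stop early if neither key is available
if [ -z "$GEMINI_API_KEY" ] && [ -z "$IDEOGRAM_API_KEY" ]; then
  echo "Neither GEMINI_API_KEY nor IDEOGRAM_API_KEY is set. Export one in ~/.zshrc or ~/.zshenv." >&2
  exit 1
fi
# Fall back to the other provider only when the chosen provider's key is missing
if [ "$PROVIDER" = "gemini" ] && [ -z "$GEMINI_API_KEY" ]; then
  echo "GEMINI_API_KEY not set. Falling back to Ideogram."
  PROVIDER="ideogram"
elif [ "$PROVIDER" = "ideogram" ] && [ -z "$IDEOGRAM_API_KEY" ]; then
  echo "IDEOGRAM_API_KEY not set. Falling back to Gemini."
  PROVIDER="gemini"
fi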

Step 4: Setup Output

# Determine category folder from prompt context
OUTPUT_DIR="${OUTPUT_PATH:-$HOME/image-gen/${CATEGORY}}"
mkdir -p "$OUTPUT_DIR"

# Generate descriptive filename: kebab-case description + HHMM suffix
TIMESTAMP=$(date +%H%M)
FILENAME="${SLUG}-${TIMESTAMP}"  # SLUG (illustrative name): kebab-case description of the prompt, e.g. ot-with-miss-lexi-logo

Step 5a: Generate — Gemini Nano Banana Pro

Model: gemini-3-pro-image-preview (always Pro, best quality)

Aspect ratio mapping:

  • square → omit (default)
  • landscape → "aspectRatio": "16:9"
  • portrait → "aspectRatio": "9:16"
  • wide → "aspectRatio": "21:9"
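The mapping can be sketched as a helper that returns the fragment to merge into `generationConfig`. This is an assumption-laden sketch. `gemini_image_config` is a hypothetical name, and any value that is not one of the named flags is passed through as a literal W:H ratio from --aspect.

```python
from typing import Dict, Optional

# Named flags from Step 1; "square" maps to None so imageConfig is omitted
GEMINI_ASPECTS = {"square": None, "landscape": "16:9", "portrait": "9:16", "wide": "21:9"}


def gemini_image_config(aspect: Optional[str]) -> Dict[str, Dict[str, str]]:
    """Return the imageConfig entry to merge into generationConfig, or {} for the default."""
    if not aspect:
        return {}
    # Unknown names are treated as a literal W:H value from --aspect
    ratio = GEMINI_ASPECTS.get(aspect, aspect)
    return {"imageConfig": {"aspectRatio": ratio}} if ratio else {}
```

Usage: `{"responseModalities": ["TEXT", "IMAGE"], **gemini_image_config("landscape")}`.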

Build and send:

# Build the JSON body with python3 so quotes, backslashes, and newlines in the prompt are escaped correctly
REQUEST_BODY=$(PROMPT="$PROMPT" ASPECT_RATIO="$ASPECT_RATIO" python3 -c '
import json, os
generation_config = {"responseModalities": ["TEXT", "IMAGE"]}
if os.environ.get("ASPECT_RATIO"):
    generation_config["imageConfig"] = {"aspectRatio": os.environ["ASPECT_RATIO"]}
print(json.dumps({
    "contents": [{"parts": [{"text": "Generate an image: " + os.environ["PROMPT"]}]}],
    "generationConfig": generation_config,
}))
')

RESPONSE=$(curl -s -X POST \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
  -H "x-goog-api-key: ${GEMINI_API_KEY}" \
  -H "Content-Type: application/json" \
  -d "$REQUEST_BODY")

Extract and save:

echo "$RESPONSE" | python3 -c "
import sys, json, base64
resp = json.load(sys.stdin)
if 'error' in resp:
    print('ERROR:', resp['error'].get('message', resp['error']))
    sys.exit(1)
# candidates may be missing or empty (e.g. a blocked prompt), so fall back to one empty candidate
candidates = resp.get('candidates') or [{}]
parts = candidates[0].get('content', {}).get('parts', [])
for part in parts:
    # save the first image part, skipping interim thought parts if any are returned
    if 'inlineData' in part and not part.get('thought'):
        img_data = base64.b64decode(part['inlineData']['data'])
        mime = part['inlineData']['mimeType']
        ext = 'png' if 'png' in mime else 'jpg'
        path = sys.argv[1] + '.' + ext
        with open(path, 'wb') as f:
            f.write(img_data)
        print(path)
        break
else:
    for part in parts:
        if 'text' in part:
            print('API text:', part['text'])
    print('ERROR: No image in response. Raw response:')
    print(json.dumps(resp, indent=2))
    sys.exit(1)
" "${OUTPUT_DIR}/${FILENAME}"

Step 5b: Generate — Ideogram V3

Aspect ratio mapping:

  • square → 1x1
  • landscape → 16x9
  • portrait → 9x16
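  • wide → 2x1 (21x9 is not in Ideogram's supported list; 2x1 is the closest supported ratio)
  • --aspect W:H → WxH, e.g. 3:4 → 3x4
  • All supported values: 1x3, 3x1, 1x2, 2x1, 9x16, 16x9, 10x16, 16x10, 2x3, 3x2, 3x4, 4x3, 4x5, 5x4, 1x1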
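Ideogram expects WxH values from the supported list. So `--aspect 3:4` has to become `3x4`, and an unsupported ratio such as 21x9 should be caught before the request is sent. A minimal sketch, assuming a hypothetical `ideogram_aspect` helper. It maps `wide` to 2x1, the supported ratio closest to 21:9; that choice is an assumption.

```python
from typing import Optional

IDEOGRAM_SUPPORTED = {"1x3", "3x1", "1x2", "2x1", "9x16", "16x9", "10x16", "16x10",
                      "2x3", "3x2", "3x4", "4x3", "4x5", "5x4", "1x1"}
# Assumption: "wide" maps to 2x1, the supported ratio closest to 21:9
IDEOGRAM_NAMED = {"square": "1x1", "landscape": "16x9", "portrait": "9x16", "wide": "2x1"}


def ideogram_aspect(aspect: Optional[str]) -> Optional[str]:
    """Translate a named flag or a W:H value into Ideogram's WxH form; None means omit the field."""
    if not aspect:
        return None
    value = IDEOGRAM_NAMED.get(aspect, aspect.replace(":", "x"))
    if value not in IDEOGRAM_SUPPORTED:
        raise ValueError(f"Unsupported Ideogram aspect ratio: {aspect}")
    return value
```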

Two request modes:

Mode A: JSON request (no reference images)

When NO --char-ref or --style-ref flags are provided, use a standard JSON request.

Build the request body with python3, including only the parameters the user specified:

# Export the parsed options so Python can read them (the quoted heredoc disables shell expansion)
export PROMPT SPEED STYLE_TYPE ASPECT_RATIO COUNT STYLE_PRESET NEGATIVE_PROMPT MAGIC_PROMPT PALETTE_PRESET CUSTOM_COLORS

REQUEST_BODY=$(python3 << 'PYEOF'
import json
import os

env = os.environ.get
body = {"prompt": env("PROMPT", "")}
body["rendering_speed"] = env("SPEED") or "DEFAULT"
body["style_type"] = env("STYLE_TYPE") or "AUTO"

# Only include optional fields when specified
if env("ASPECT_RATIO"):
    body["aspect_ratio"] = env("ASPECT_RATIO")
if env("COUNT") and int(env("COUNT")) > 1:
    body["num_images"] = int(env("COUNT"))
if env("STYLE_PRESET"):
    body["style_preset"] = env("STYLE_PRESET")
if env("NEGATIVE_PROMPT"):
    body["negative_prompt"] = env("NEGATIVE_PROMPT")
if env("MAGIC_PROMPT"):
    body["magic_prompt"] = env("MAGIC_PROMPT")
if env("PALETTE_PRESET"):
    body["color_palette"] = {"name": env("PALETTE_PRESET")}
if env("CUSTOM_COLORS"):  # --colors overrides --palette
    members = [{"color_hex": c.strip()} for c in env("CUSTOM_COLORS").split(",")]
    body["color_palette"] = {"members": members}

print(json.dumps(body))
PYEOF
)

RESPONSE=$(curl -s -X POST \
  "https://api.ideogram.ai/v1/ideogram-v3/generate" \
  -H "Api-Key: ${IDEOGRAM_API_KEY}" \
  -H "Content-Type: application/json" \
  -d "$REQUEST_BODY")

Mode B: Multipart request (with reference images)

When --char-ref or --style-ref is provided, use multipart/form-data.

Character reference (--char-ref <path>): Maintains consistent facial features, hairstyles, and traits across images. Max 1 image, 10MB, JPEG/PNG/WebP.

Style reference (--style-ref <path>): Applies the artistic style from a reference image. Accepts multiple images (10MB total), JPEG/PNG/WebP. Cannot be combined with --style-type.

# Build the curl arguments as an array (no eval), so quotes or apostrophes in the prompt are passed safely.
# --form-string sends text values literally; -F is used only for @file uploads.
CURL_ARGS=(-s -X POST "https://api.ideogram.ai/v1/ideogram-v3/generate"
  -H "Api-Key: ${IDEOGRAM_API_KEY}"
  --form-string "prompt=${PROMPT}"
  --form-string "rendering_speed=${SPEED:-DEFAULT}")

[ -n "$ASPECT_RATIO" ]    && CURL_ARGS+=(--form-string "aspect_ratio=${ASPECT_RATIO}")
[ -n "$STYLE_TYPE" ]      && CURL_ARGS+=(--form-string "style_type=${STYLE_TYPE}")
[ -n "$STYLE_PRESET" ]    && CURL_ARGS+=(--form-string "style_preset=${STYLE_PRESET}")
[ -n "$COUNT" ]           && CURL_ARGS+=(--form-string "num_images=${COUNT}")
[ -n "$NEGATIVE_PROMPT" ] && CURL_ARGS+=(--form-string "negative_prompt=${NEGATIVE_PROMPT}")
[ -n "$MAGIC_PROMPT" ]    && CURL_ARGS+=(--form-string "magic_prompt=${MAGIC_PROMPT}")
[ -n "$CHAR_REF_PATH" ]   && CURL_ARGS+=(-F "character_reference_images=@${CHAR_REF_PATH}")
[ -n "$STYLE_REF_PATH" ]  && CURL_ARGS+=(-F "style_reference_images=@${STYLE_REF_PATH}")

RESPONSE=$(curl "${CURL_ARGS[@]}")
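The reference-image limits can be checked locally before uploading. This is a sketch under assumptions. `check_references` is a hypothetical helper, and the file type is judged by extension only, not by inspecting file contents. "10MB" is read as 10 × 1024 × 1024 bytes, which is also an assumption.

```python
import os
from typing import List, Optional

ALLOWED_EXTS = {".jpg", ".jpeg", ".png", ".webp"}
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" read as 10 MiB


def check_references(char_ref: Optional[str], style_refs: List[str],
                     style_type: Optional[str]) -> List[str]:
    """Return human-readable problems; an empty list means the references look usable."""
    problems = []
    if style_refs and style_type:
        problems.append("--style-ref cannot be combined with --style-type")

    char_ref = os.path.expanduser(char_ref) if char_ref else None
    style_refs = [os.path.expanduser(p) for p in style_refs]

    usable = []
    for path in ([char_ref] if char_ref else []) + style_refs:
        if not os.path.isfile(path):
            problems.append(f"{path}: file not found")
        elif os.path.splitext(path)[1].lower() not in ALLOWED_EXTS:
            problems.append(f"{path}: must be JPEG, PNG, or WebP")  # extension check only
        else:
            usable.append(path)

    if char_ref in usable and os.path.getsize(char_ref) > MAX_BYTES:
        problems.append(f"{char_ref}: character reference must be under 10MB")
    if sum(os.path.getsize(p) for p in style_refs if p in usable) > MAX_BYTES:
        problems.append("style references must total under 10MB")
    return problems
```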

Download the generated images:

python3 -c "
import sys, json, urllib.request

resp = json.load(sys.stdin)
if 'error' in resp or 'data' not in resp:
    print('ERROR:', json.dumps(resp, indent=2))
    sys.exit(1)

base = sys.argv[1]  # output dir + filename, passed as an argument so special characters in paths are safe
paths = []
for i, item in enumerate(resp['data']):
    url = item.get('url')
    if not url:
        print(f'Image {i+1}: flagged as unsafe, skipped')
        continue
    suffix = f'-{i+1}' if len(resp['data']) > 1 else ''
    path = f'{base}{suffix}.png'
    urllib.request.urlretrieve(url, path)
    paths.append(path)
    print(path)

if not paths:
    print('ERROR: No images generated')
    sys.exit(1)
" "${OUTPUT_DIR}/${FILENAME}" <<< "$RESPONSE"

Step 6: Open the Image(s)

# open is macOS-only; on Linux use xdg-open instead
for f in "${OUTPUT_DIR}/${FILENAME}"*.{png,jpg}; do
  [ -f "$f" ] && open "$f"
done
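`open` is the macOS command; on Linux the usual equivalent is `xdg-open`. Below is a sketch that picks the opener by platform. `opener_command` and `open_images` are hypothetical helpers, and the Windows branch uses the standard `cmd /c start "" <file>` idiom.

```python
import subprocess
import sys
from typing import List


def opener_command(platform: str = sys.platform) -> List[str]:
    """Return the command that opens a file in the desktop's default viewer."""
    if platform == "darwin":
        return ["open"]  # macOS
    if platform.startswith("win"):
        return ["cmd", "/c", "start", ""]  # the empty string is start's window-title argument
    return ["xdg-open"]  # Linux and other freedesktop.org desktops


def open_images(paths: List[str]) -> None:
    """Open each saved image; a missing viewer command only prints the saved path."""
    for path in paths:
        try:
            subprocess.run(opener_command() + [path], check=False)
        except FileNotFoundError:
            print(f"No viewer command found; image saved at {path}")
```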

Step 7: Report Results

Tell the user:

  • Provider and model used
  • File path(s) where images were saved
  • Any options applied (style, palette, character ref, etc.)
  • Offer to: regenerate, try the other provider, adjust prompt, change style/palette

Usage Examples

/image-gen a cozy cabin in the mountains at sunset
/image-gen minimalist logo for a coffee shop --ideogram --style FLAT_VECTOR
/image-gen photorealistic golden retriever puppy --gemini --portrait
/image-gen retro spaceship --ideogram --style 80S_ILLUSTRATION --palette EMBER
/image-gen product photo of headphones on marble --gemini --wide
/image-gen dreamy forest --ideogram --style WATERCOLOR --colors "#2d5a27,#8fbc8f,#f5f5dc"
/image-gen 3 logo variations --ideogram --count 3 --style FLAT_VECTOR
/image-gen noir detective --ideogram --style DRAMATIC_CINEMA --no "bright colors, daylight"
/image-gen same character in a new scene --ideogram --char-ref ~/photos/character.png
/image-gen painting in this style --ideogram --style-ref ~/photos/reference-art.jpg

Error Handling

  • Rate limited: Wait 5 seconds and retry once
  • Missing API key: Tell user which env var to set
  • Empty response / no image: Show raw API response for debugging
  • Ideogram safety filter: Inform user, suggest rephrasing
  • Character ref too large: Must be under 10MB, JPEG/PNG/WebP only
  • Incompatible options: --style-ref cannot combine with --style-type; --colors overrides --palette

Always show the raw API error if generation fails.
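The "wait 5 seconds and retry once" rule can be sketched with Python's standard library. Assumptions: rate limiting is signaled with HTTP 429, and `post_json_with_retry` is a hypothetical helper. `sleep` and `opener` are injectable so the logic can be tested offline. The sketch covers JSON requests only (Gemini and Ideogram Mode A), not the multipart Mode B.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


def post_json_with_retry(url: str, body: dict, headers: dict,
                         sleep: Callable[[float], None] = time.sleep,
                         opener=urllib.request.urlopen) -> dict:
    """POST a JSON body; on HTTP 429, wait 5 seconds and retry exactly once."""
    data = json.dumps(body).encode("utf-8")
    for attempt in range(2):
        req = urllib.request.Request(url, data=data, method="POST",
                                     headers={"Content-Type": "application/json", **headers})
        try:
            with opener(req) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            if err.code == 429 and attempt == 0:
                sleep(5)  # first rate-limit response: back off, then try once more
                continue
            raise  # second 429 or any other HTTP error goes back to the caller
```

For example, `post_json_with_retry(gemini_url, body, {"x-goog-api-key": key})` returns the parsed response. Any other error still propagates, so it can be shown raw as required above.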
