gemini-imagegen
Gemini Image Generation
Generate and edit images via Google Gemini's native multimodal image generation.
Model Selection
| Model ID | Codename | Best for | Max resolution |
|---|---|---|---|
| `gemini-2.5-flash-image` | Nano Banana | Fast drafts, high-volume, low-latency | 1K |
| `gemini-3-pro-image-preview` | Nano Banana Pro | Studio-quality, text rendering, complex prompts | 4K |
Default: gemini-3-pro-image-preview (Pro) unless speed/cost is a concern.
Setup
```python
# Install (once):
#   pip install google-genai
import os

from google import genai

# Read the key from the environment; never hard-code it.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
```
If GEMINI_API_KEY is missing, instruct the user to set it as an environment variable.
Never ask the user to paste the key in chat.
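One way to apply this rule in code is a small guard that fails with setup instructions instead of a bare `KeyError`. The helper name `require_api_key` is hypothetical; it only reads the environment and never prompts for the key.

```python
import os
from typing import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return GEMINI_API_KEY from the environment, or fail with setup instructions.

    The key is never requested interactively, so it cannot end up in a chat log.
    """
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set. Set it as an environment variable "
            "(e.g. `export GEMINI_API_KEY=...` in your shell), then re-run."
        )
    return key
```

Usage: `client = genai.Client(api_key=require_api_key())`.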
Text-to-Image
```python
import os

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents="A photorealistic cat on a rainbow sofa",
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

# Extract the first image part and save it (inline_data.data is raw bytes)
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("output.png", "wb") as f:
            f.write(part.inline_data.data)
        break
```
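The loop above assumes at least one candidate with parts and always writes `.png`. A more defensive sketch collects every inline image part, returns an empty list when the response has no candidates or only text (so the caller can retry with a more concrete prompt), and picks the file extension from each part's `mime_type`. The helper names and the MIME-to-extension table are illustrative assumptions, not part of the SDK.

```python
from pathlib import Path

# Hypothetical mapping for common image MIME types; unknown types fall back to .bin.
_EXTENSIONS = {"image/png": ".png", "image/jpeg": ".jpg", "image/webp": ".webp"}


def extract_images(response) -> list[tuple[str, bytes]]:
    """Return (mime_type, data) for every inline image part in the first candidate.

    Returns [] instead of raising when the response has no candidates,
    no content, or only text parts (e.g. a refusal or a vague prompt).
    """
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return []
    content = getattr(candidates[0], "content", None)
    parts = getattr(content, "parts", None) or []
    images = []
    for part in parts:
        blob = getattr(part, "inline_data", None)
        if blob is not None and blob.data:
            images.append((blob.mime_type, blob.data))
    return images


def save_images(images: list[tuple[str, bytes]], stem: str) -> list[Path]:
    """Write each image as <stem><ext>, or <stem>_<n><ext> for additional images."""
    paths = []
    for n, (mime_type, data) in enumerate(images):
        ext = _EXTENSIONS.get(mime_type, ".bin")
        path = Path(f"{stem}{ext}" if n == 0 else f"{stem}_{n}{ext}")
        path.write_bytes(data)
        paths.append(path)
    return paths
```

Usage: `images = extract_images(response)`; if it is empty, add concrete scene details and retry, otherwise call `save_images(images, "output")`.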
Aspect Ratio
Set via image_config:
```python
config=types.GenerateContentConfig(
    response_modalities=["TEXT", "IMAGE"],
    image_config=types.ImageConfig(
        aspect_ratio="16:9",  # for slides / widescreen
    ),
)
```
Supported ratios: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
Common choices:
- Slides / presentations → `16:9`
- Social media / portraits → `9:16` or `4:5`
- Square thumbnails → `1:1`
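When the target is a concrete pixel size rather than a use case, the nearest supported ratio can be chosen programmatically. This is a small sketch, not an SDK feature: it compares ratios on a log scale so that wide and tall shapes are treated symmetrically around `1:1`. The function name is hypothetical.

```python
import math

# Ratios listed above as supported by image_config.aspect_ratio.
SUPPORTED_RATIOS = ["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"]


def closest_aspect_ratio(width: int, height: int) -> str:
    """Pick the supported ratio nearest to width:height (log-scale distance)."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = math.log(width / height)

    def distance(ratio: str) -> float:
        w, h = (int(x) for x in ratio.split(":"))
        return abs(math.log(w / h) - target)

    return min(SUPPORTED_RATIOS, key=distance)
```

Usage: `types.ImageConfig(aspect_ratio=closest_aspect_ratio(1920, 1080))` requests `16:9`; a 2560×1080 banner maps to `21:9`.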
Image Editing (with reference image)
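```python
from pathlib import Path

from google.genai import types

# Assumes `client` was created as in Setup.
ref_bytes = Path("input.jpg").read_bytes()

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[
        # Pass the raw bytes; the SDK base64-encodes them for the request.
        types.Part.from_bytes(data=ref_bytes, mime_type="image/jpeg"),
        types.Part(text="Remove the background and replace with a sunset gradient"),
    ],
    config=types.GenerateContentConfig(
        response_modalities=["TEXT", "IMAGE"],
    ),
)

# Save the edited image
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("edited.png", "wb") as f:
            f.write(part.inline_data.data)
        break
```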
Pro supports up to 14 reference images for multi-image composition and up to 5 human reference images for character/identity consistency.
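For multi-image composition, it helps to validate the reference set before calling the API. The sketch below reads files as `(mime_type, raw_bytes)` pairs, rejects more than the 14-image limit stated above, and maps file extensions to MIME types with a hypothetical lookup table. It cannot enforce the 5-human limit, since it has no way to know which images show people.

```python
from pathlib import Path

MAX_REFERENCE_IMAGES = 14  # Pro limit stated above

# Hypothetical extension -> MIME mapping for common formats.
_MIME_BY_SUFFIX = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".webp": "image/webp",
}


def load_reference_images(
    paths: list[str], limit: int = MAX_REFERENCE_IMAGES
) -> list[tuple[str, bytes]]:
    """Read reference images as (mime_type, raw_bytes), validating count and format."""
    if len(paths) > limit:
        raise ValueError(f"{len(paths)} reference images given; the limit is {limit}")
    images = []
    for p in map(Path, paths):
        mime_type = _MIME_BY_SUFFIX.get(p.suffix.lower())
        if mime_type is None:
            raise ValueError(f"unsupported image type: {p.name}")
        images.append((mime_type, p.read_bytes()))
    return images
```

Usage: build the request with `[types.Part.from_bytes(data=data, mime_type=mime) for mime, data in load_reference_images(paths)]` and append the text instruction as the last part.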
Batch Generation (for slides)
When generating multiple images (e.g. one per slide), loop sequentially and save with numbered filenames:
```python
import os
import time

from google import genai
from google.genai import types

client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

prompts = [...]  # list of prompt strings

for i, prompt in enumerate(prompts, 1):
    response = client.models.generate_content(
        model="gemini-3-pro-image-preview",
        contents=prompt,
        config=types.GenerateContentConfig(
            response_modalities=["TEXT", "IMAGE"],
            image_config=types.ImageConfig(aspect_ratio="16:9"),
        ),
    )
    # Save the first image part as slide_<i>.png
    for part in response.candidates[0].content.parts:
        if part.inline_data is not None:
            with open(f"slide_{i}.png", "wb") as f:
                f.write(part.inline_data.data)
            break
    time.sleep(1)  # rate limit courtesy
```
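The same sequential loop can be factored so that one failed slide does not stop the batch and the missing slides are reported at the end. In this sketch, `generate` is a hypothetical callable wrapping `generate_content` that returns image bytes or `None`. Filenames are zero-padded (`slide_01.png`) so they sort correctly past slide 9; that differs from the `slide_{i}.png` naming above and is a design choice, not a requirement.

```python
import time
from pathlib import Path
from typing import Callable, Optional


def run_batch(
    prompts: list[str],
    generate: Callable[[str], Optional[bytes]],
    out_dir: str = ".",
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[int]:
    """Generate one image per prompt, in order; return the 1-based indices that failed."""
    failed = []
    width = max(2, len(str(len(prompts))))  # zero-pad so files sort correctly
    for i, prompt in enumerate(prompts, 1):
        data = generate(prompt)
        if data is None:
            failed.append(i)  # no image returned; retry later with a refined prompt
        else:
            Path(out_dir, f"slide_{i:0{width}d}.png").write_bytes(data)
        if i < len(prompts):
            sleep(delay)  # spacing between requests, as in the loop above
    return failed
```

Passing `sleep` as a parameter keeps the pacing logic testable without real delays.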
Error Handling
- Safety filter block: The model may refuse prompts it deems unsafe. Adjust the prompt to be less ambiguous (remove violent/adult/medical imagery language) and retry.
- Empty response: If `response.candidates` is empty or has no image parts, the prompt may be too vague. Add concrete scene details and retry.
- Rate limit (429): Back off with exponential delay. Default: `time.sleep(2 ** attempt)`.
- Timeout: Set a reasonable timeout; the Pro model may take 10–30 s for complex prompts.
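The 429 guidance can be wrapped in a generic retry helper. This sketch sleeps `2 ** attempt` seconds between attempts, as stated above, plus a small random jitter (an addition so parallel workers do not retry in lockstep). Non-retryable errors and the final failure are re-raised. The helper is hypothetical and SDK-agnostic; the caller decides what counts as retryable.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying retryable errors after 2**attempt seconds plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            sleep(2 ** attempt + random.uniform(0, 0.5))
    raise AssertionError("unreachable")
```

With `google-genai`, a plausible predicate is `lambda e: isinstance(e, errors.APIError) and e.code == 429` (after `from google.genai import errors`), assuming your SDK version exposes the HTTP status as `APIError.code`; wrap the request as `with_backoff(lambda: client.models.generate_content(...), is_retryable)`.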
Prompt Best Practices
- Structure: scene → subject → style → composition → constraints
- Always specify art style: "flat vector illustration", "watercolor painting", "3D render", "photorealistic photograph"
- Include lighting and mood: "soft diffused lighting", "dramatic rim light", "golden hour"
- For text in images: quote exact text, specify font style and placement
- For slide illustrations: add "negative space on [side]" to leave room for text overlay
- Use English prompts even for non-English content (better generation quality)
- Keep prompts under 500 words; be specific but not verbose
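The ordering and checks above can be encoded in a small prompt builder. This is an illustrative sketch with hypothetical parameter names: it joins the pieces in scene → subject → style → composition → constraints order (lighting placed after style), optionally quotes exact text and requests negative space, and enforces the 500-word guideline.

```python
from typing import Optional

MAX_PROMPT_WORDS = 500  # guideline above


def build_prompt(
    scene: str,
    subject: str,
    style: str,
    composition: str = "",
    constraints: str = "",
    lighting: str = "",
    exact_text: Optional[str] = None,
    negative_space_side: Optional[str] = None,
) -> str:
    """Assemble a comma-separated prompt in scene -> subject -> style -> composition -> constraints order."""
    parts = [scene, subject, style]
    if lighting:
        parts.append(lighting)
    if composition:
        parts.append(composition)
    if negative_space_side:
        parts.append(f"negative space on the {negative_space_side} for text overlay")
    if exact_text:
        parts.append(f'with the exact text "{exact_text}"')  # quote text verbatim
    if constraints:
        parts.append(constraints)
    prompt = ", ".join(p.strip() for p in parts if p.strip())
    if len(prompt.split()) > MAX_PROMPT_WORDS:
        raise ValueError(f"prompt exceeds {MAX_PROMPT_WORDS} words; tighten it")
    return prompt
```

Font style and placement for quoted text still belong in `composition`; the builder only guarantees ordering and quoting.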
Style Consistency for Multi-Image Sets
When generating a series (e.g. slide deck), prepend a style prefix to every prompt:
Style prefix: "flat vector illustration, soft pastel color palette, clean lines, minimal detail, 16:9 widescreen"
Slide 1 prompt: "{style_prefix}, a wide establishing shot of a modern office building at sunrise"
Slide 2 prompt: "{style_prefix}, a close-up of hands typing on a laptop keyboard"
This ensures visual coherence across all generated images.
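Applied in code, the prefix is simply prepended to each scene description before the batch loop runs. The names below are illustrative. The prefix's "16:9 widescreen" wording describes the look; the output shape is still set by `image_config.aspect_ratio`, as in the batch example.

```python
STYLE_PREFIX = (
    "flat vector illustration, soft pastel color palette, clean lines, "
    "minimal detail, 16:9 widescreen"
)


def with_style(prefix: str, scenes: list[str]) -> list[str]:
    """Prepend the same style prefix to every scene description."""
    return [f"{prefix}, {scene.strip()}" for scene in scenes]


prompts = with_style(STYLE_PREFIX, [
    "a wide establishing shot of a modern office building at sunrise",
    "a close-up of hands typing on a laptop keyboard",
])
```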