image-generator
Image Generator
This skill generates and edits images with Google's Nano Banana Pro (Gemini 3 Pro Image) model, gemini-3-pro-image-preview.
IMPORTANT: Setup Required
Before using this skill, the user must set the GEMINI_API_KEY environment variable:
- Get a free API key from Google AI Studio
- Export the key in your shell profile (~/.zshrc, ~/.bashrc, etc.):
export GEMINI_API_KEY="your_api_key_here"
- Restart your terminal or run source ~/.zshrc (or ~/.bashrc)
The skill will not work without this configuration.
Pre-flight Check
Before making any API call, verify the key is set:
if [ -z "$GEMINI_API_KEY" ]; then
echo "ERROR: GEMINI_API_KEY is not set. Please export it in your shell profile."
exit 1
fi
If the key is missing, stop and tell the user to set it using the instructions above.
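Python scripts that call the API directly (through the google-genai SDK or the REST endpoint) can run the same check before doing any work. A minimal sketch; the function name require_gemini_api_key is hypothetical:

```python
import os
import sys


def require_gemini_api_key() -> str:
    """Return the value of GEMINI_API_KEY, or exit like the shell check does."""
    key = os.environ.get("GEMINI_API_KEY", "")
    if not key:
        # sys.exit with a string prints it to stderr and exits with status 1
        sys.exit("ERROR: GEMINI_API_KEY is not set. Please export it in your shell profile.")
    return key
```

As with the shell version, an empty value is treated the same as an unset one.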
Configuration
Model: gemini-3-pro-image-preview
API Key: Read from the GEMINI_API_KEY environment variable
Iterating on User-Provided Images
When the user provides a path to an image they want to edit or iterate on, use this workflow:
Step 1: Read and encode the image to base64
# Get the image path from user
IMG_PATH="/path/to/user/image.png"
# Detect MIME type from the file extension (case-insensitive); default to PNG
EXT=$(printf '%s' "${IMG_PATH##*.}" | tr '[:upper:]' '[:lower:]')
case "$EXT" in
  png) MIME_TYPE="image/png" ;;
  jpg|jpeg) MIME_TYPE="image/jpeg" ;;
  webp) MIME_TYPE="image/webp" ;;
  *) MIME_TYPE="image/png" ;;
esac
# Encode to base64 (works on both macOS and Linux)
if [[ "$(uname)" == "Darwin" ]]; then
IMG_BASE64=$(base64 -i "$IMG_PATH")
else
IMG_BASE64=$(base64 -w0 "$IMG_PATH")
fi
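The same step in Python needs no macOS/Linux branching, because base64.b64encode never inserts line breaks. A minimal sketch, assuming the same extension-to-MIME mapping as the shell snippet (encode_image and MIME_BY_EXTENSION are hypothetical names); unlike the original shell test it also accepts upper-case extensions such as .JPG:

```python
import base64
from pathlib import Path
from typing import Tuple

# Same mapping as the shell snippet; unknown extensions fall back to PNG.
MIME_BY_EXTENSION = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".webp": "image/webp",
}


def encode_image(path: str) -> Tuple[str, str]:
    """Return (mime_type, base64_data) for an image file on disk."""
    mime_type = MIME_BY_EXTENSION.get(Path(path).suffix.lower(), "image/png")
    # b64encode output has no newlines, so it can go straight into a JSON string.
    data = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return mime_type, data
```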
Step 2: Send image with edit prompt (File-Based Approach)
IMPORTANT: Always use a file-based approach for the request body. Base64-encoded images are too large for command-line arguments and will cause "argument list too long" errors.
# User's edit request (avoid double quotes, backslashes and newlines: the heredoc below does not JSON-escape them)
EDIT_PROMPT="Add a santa hat to the person in this image"
# Write request to a JSON file (avoids command line length limits)
cat > /tmp/gemini_request.json << JSONEOF
{
"contents": [{
"parts": [
{"text": "$EDIT_PROMPT"},
{
"inline_data": {
"mime_type": "$MIME_TYPE",
"data": "$IMG_BASE64"
}
}
]
}],
"generationConfig": {
"responseModalities": ["TEXT", "IMAGE"]
}
}
JSONEOF
# Call the API using the file
curl -s -X POST \
"https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type: application/json" \
-d @/tmp/gemini_request.json > /tmp/gemini_response.json
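The heredoc pastes $EDIT_PROMPT into the JSON verbatim, so a prompt containing a double quote, backslash, or newline yields an invalid request body. Building the body with Python's json module avoids that. A minimal sketch (write_request is a hypothetical helper) that writes the same structure as the heredoc, for one or more images, so the curl command above can send the file unchanged:

```python
import json
from typing import List, Tuple


def write_request(path: str, prompt: str, images: List[Tuple[str, str]]) -> dict:
    """Write a generateContent request body to `path` and return it.

    `images` holds (mime_type, base64_data) pairs, in the order the prompt
    refers to them ("the first image", "the second image", ...).
    """
    parts = [{"text": prompt}]
    for mime_type, data in images:
        parts.append({"inline_data": {"mime_type": mime_type, "data": data}})
    body = {
        "contents": [{"parts": parts}],
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }
    # json.dump escapes quotes, backslashes, and newlines in the prompt.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(body, f)
    return body
```

For example, write_request("/tmp/gemini_request.json", prompt, [("image/png", img_b64)]), where img_b64 is the base64 string of the input image, produces a file that curl -d @/tmp/gemini_request.json posts as-is.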
Step 3: Extract and save the edited image
# Extract image from response and save
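python3 -c "
import json
import base64
with open('/tmp/gemini_response.json') as f:
    data = json.load(f)
if not data.get('candidates'):
    # API error or blocked prompt: show the raw response instead of a KeyError
    print(json.dumps(data, indent=2))
    raise SystemExit(1)
for part in data['candidates'][0].get('content', {}).get('parts', []):
    if 'inlineData' in part:
        img_data = part['inlineData']['data']
        mime = part['inlineData']['mimeType']
        # image/png -> png, image/jpeg -> jpg, image/webp -> webp
        ext = mime.split('/')[-1].replace('jpeg', 'jpg')
        with open('edited_image.' + ext, 'wb') as out:
            out.write(base64.b64decode(img_data))
        print(f'Saved: edited_image.{ext}')
    elif 'text' in part:
        print(part['text'])
"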
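When Step 3 saves nothing, the response JSON usually says why: a top-level error object, a promptFeedback.blockReason, or a finishReason on the candidate. A minimal sketch (extract_images is a hypothetical name) that reads those camelCase REST fields from the parsed /tmp/gemini_response.json and either returns the decoded images or raises with the reason it found:

```python
import base64
from typing import List


def extract_images(response: dict) -> List[bytes]:
    """Return decoded image bytes from a parsed generateContent response.

    Raises RuntimeError carrying whatever reason the response gives when
    no image is present (API error, blocked prompt, or finish reason).
    """
    if "error" in response:
        raise RuntimeError(f"API error: {response['error'].get('message')}")
    block_reason = response.get("promptFeedback", {}).get("blockReason")
    if block_reason:
        raise RuntimeError(f"Prompt blocked: {block_reason}")
    images, reasons = [], []
    for candidate in response.get("candidates", []):
        reasons.append(candidate.get("finishReason", "UNKNOWN"))
        for part in candidate.get("content", {}).get("parts", []):
            if "inlineData" in part:
                images.append(base64.b64decode(part["inlineData"]["data"]))
    if not images:
        raise RuntimeError(f"No image returned (finishReason: {reasons or 'none'})")
    return images
```

Typical use is extract_images(json.load(open("/tmp/gemini_response.json"))), then writing each returned bytes object to a file.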
Complete Example (File-Based)
For iterating on images, always use file-based requests:
# Variables
IMG_PATH="/path/to/image.png"
EDIT_PROMPT="Make the background a sunset beach"
OUTPUT_PATH="edited_output.png"
# Detect mime type and encode
MIME_TYPE=$([[ "$IMG_PATH" == *.png ]] && echo "image/png" || echo "image/jpeg")
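# base64 from stdin with line breaks stripped works on macOS and Linux (GNU "base64 -i" succeeds but wraps at 76 columns, which breaks the JSON)
IMG_BASE64=$(base64 < "$IMG_PATH" | tr -d '\n')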
# Write request to file (required - base64 images are too large for command line)
cat > /tmp/gemini_request.json << JSONEOF
{
"contents": [{
"parts": [
{"text": "$EDIT_PROMPT"},
{"inline_data": {"mime_type": "$MIME_TYPE", "data": "$IMG_BASE64"}}
]
}],
"generationConfig": {
"responseModalities": ["TEXT", "IMAGE"]
}
}
JSONEOF
# Call API and extract image
curl -s -X POST \
"https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type: application/json" \
-d @/tmp/gemini_request.json > /tmp/gemini_response.json
# Save the output image
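python3 -c "
import json, base64
with open('/tmp/gemini_response.json') as f:
    data = json.load(f)
candidates = data.get('candidates') or [{}]
for part in candidates[0].get('content', {}).get('parts', []):
    if 'inlineData' in part:
        with open('$OUTPUT_PATH', 'wb') as out:
            out.write(base64.b64decode(part['inlineData']['data']))
        print('Saved: $OUTPUT_PATH')
        break
else:
    # No image part: print the start of the response (error, block reason, or text)
    print('No image returned:', json.dumps(data)[:500])
"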
Multi-Image Input (Combine/Compose)
To combine elements from multiple images (also uses file-based approach):
IMG1_PATH="/path/to/image1.png"
IMG2_PATH="/path/to/image2.png"
PROMPT="Put the dress from the first image on the person in the second image"
# Portable base64 without line breaks (macOS and Linux)
IMG1_BASE64=$(base64 < "$IMG1_PATH" | tr -d '\n')
IMG2_BASE64=$(base64 < "$IMG2_PATH" | tr -d '\n')
# Write request to file (both parts are labeled image/png; change mime_type for JPEG or WebP inputs)
cat > /tmp/gemini_request.json << JSONEOF
{
"contents": [{
"parts": [
{"text": "$PROMPT"},
{"inline_data": {"mime_type": "image/png", "data": "$IMG1_BASE64"}},
{"inline_data": {"mime_type": "image/png", "data": "$IMG2_BASE64"}}
]
}],
"generationConfig": {"responseModalities": ["TEXT", "IMAGE"]}
}
JSONEOF
curl -s -X POST \
"https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type: application/json" \
-d @/tmp/gemini_request.json > /tmp/gemini_response.json
# Extract the result from /tmp/gemini_response.json as in Step 3
Capabilities
Text-to-Image Generation
- Generate high-quality images from text descriptions
- Support for photorealistic, stylized, and artistic outputs
- Accurate text rendering in images (logos, infographics, diagrams)
Image Editing
- Add or remove elements from images
- Inpainting with semantic masking (edit specific parts)
- Style transfer (apply artistic styles to photos)
- Multi-image composition (combine elements from multiple images)
Advanced Features
- High Resolution: 1K, 2K, or 4K output
- Aspect Ratios: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
- Google Search Grounding: Generate images based on real-time data
- Multi-turn Editing: Iteratively refine images through conversation
- Up to 14 Reference Images: Combine multiple inputs for complex compositions
API Usage
Basic Text-to-Image (Python)
from google import genai
from google.genai import types
client = genai.Client()
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=["Your prompt here"],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="16:9",  # Optional
            image_size="2K"  # Optional: "1K", "2K", "4K"
        )
    )
)
for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        image = part.as_image()
        image.save("generated_image.png")
Basic Text-to-Image (JavaScript)
// Run as an ES module (e.g. a .mjs file) so top-level await is allowed
import { GoogleGenAI } from "@google/genai";
import * as fs from "node:fs";
const ai = new GoogleGenAI({});
const response = await ai.models.generateContent({
  model: "gemini-3-pro-image-preview",
  contents: "Your prompt here",
  config: {
    responseModalities: ['TEXT', 'IMAGE'],
    imageConfig: {
      aspectRatio: "16:9",
      imageSize: "2K"
    }
  }
});
for (const part of response.candidates[0].content.parts) {
  if (part.text) {
    console.log(part.text);
  } else if (part.inlineData) {
    const buffer = Buffer.from(part.inlineData.data, "base64");
    fs.writeFileSync("generated_image.png", buffer);
  }
}
REST API (curl)
curl -s -X POST \
"https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent" \
-H "x-goog-api-key: $GEMINI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"contents": [{
"parts": [{"text": "Your prompt here"}]
}],
"generationConfig": {
"responseModalities": ["TEXT", "IMAGE"],
"imageConfig": {
"aspectRatio": "16:9",
"imageSize": "2K"
}
}
}' | jq -r '.candidates[0].content.parts[] | select(.inlineData) | .inlineData.data' | base64 --decode > output.png
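Where jq is not installed, the same request can be made with Python's standard library alone. A minimal sketch mirroring the curl call above; build_request and send_and_save are hypothetical names, and the 300-second timeout is an assumption chosen to leave room for slow 4K generations:

```python
import base64
import json
import urllib.request

ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    "gemini-3-pro-image-preview:generateContent"
)


def build_request(prompt: str, api_key: str, aspect_ratio: str = "16:9",
                  image_size: str = "2K") -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "responseModalities": ["TEXT", "IMAGE"],
            "imageConfig": {"aspectRatio": aspect_ratio, "imageSize": image_size},
        },
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-goog-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send_and_save(req: urllib.request.Request, out_path: str = "output.png",
                  timeout: float = 300) -> str:
    """Send the request and write the first returned image to out_path."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        response = json.load(resp)
    for part in response["candidates"][0]["content"]["parts"]:
        if "inlineData" in part:
            with open(out_path, "wb") as f:
                f.write(base64.b64decode(part["inlineData"]["data"]))
            return out_path
    raise RuntimeError("No image part in the response")
```

Usage: send_and_save(build_request("Your prompt here", os.environ["GEMINI_API_KEY"])). Non-2xx responses surface as urllib.error.HTTPError from urlopen.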
Image Editing (with input image)
from google import genai
from google.genai import types
from PIL import Image
client = genai.Client()
input_image = Image.open('input.png')
prompt = "Add a wizard hat to the cat in this image"
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[prompt, input_image],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE']
    )
)
for part in response.parts:
    if part.inline_data is not None:
        image = part.as_image()
        image.save("edited_image.png")
Multi-Image Composition
from google import genai
from google.genai import types
from PIL import Image
client = genai.Client()
image1 = Image.open('dress.png')
image2 = Image.open('model.png')
prompt = "Put the dress from the first image on the model from the second image"
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[image1, image2, prompt],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="3:4",
            image_size="2K"
        )
    )
)
for part in response.parts:
    if part.inline_data is not None:
        part.as_image().save("composed_image.png")
With Google Search Grounding
from google import genai
from google.genai import types
client = genai.Client()
response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents="Visualize the current weather forecast for San Francisco",
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(aspect_ratio="16:9"),
        tools=[{"google_search": {}}]
    )
)
for part in response.parts:
    if part.text is not None:
        print(part.text)
    elif part.inline_data is not None:
        part.as_image().save("weather_forecast.png")
Prompting Best Practices
1. Be Descriptive, Not Keyword-Based
Instead of: cat, wizard hat, cute
Write: A fluffy orange cat wearing a small knitted wizard hat, sitting on a wooden floor with soft natural lighting from a window
2. Specify Style and Mood
- Photography terms: "shot with 85mm lens", "soft bokeh background", "golden hour lighting"
- Artistic styles: "in the style of Van Gogh", "minimalist illustration", "photorealistic"
- Mood: "warm and cozy atmosphere", "dramatic noir lighting"
3. For Text in Images
Be explicit about:
- The exact text to render
- Font style (descriptively): "clean, bold, sans-serif font"
- Placement and size
4. For Editing
- Describe what to change and what to preserve
- Add an explicit instruction such as "keep everything else unchanged"
- Reference specific elements clearly
5. For Product/Commercial Images
Mention:
- Lighting setup: "three-point softbox lighting"
- Background: "clean white studio background"
- Camera angle: "slightly elevated 45-degree shot"
Resolution and Aspect Ratio Reference
| Aspect Ratio | 1K (width x height, px) | 2K (width x height, px) | 4K (width x height, px) |
|---|---|---|---|
| 1:1 | 1024x1024 | 2048x2048 | 4096x4096 |
| 16:9 | 1376x768 | 2752x1536 | 5504x3072 |
| 9:16 | 768x1376 | 1536x2752 | 3072x5504 |
| 3:2 | 1264x848 | 2528x1696 | 5056x3392 |
| 2:3 | 848x1264 | 1696x2528 | 3392x5056 |
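Every 2K and 4K entry in the table is exactly two or four times its 1K entry (for example 1376 x 2 = 2752 and 1376 x 4 = 5504), so the table fits in a small lookup. A sketch (expected_size is a hypothetical helper) that could be used, for instance, to sanity-check a saved file against Pillow's Image.open(path).size; aspect ratios from the Capabilities list that this table does not cover raise ValueError instead of being guessed:

```python
from typing import Tuple

# 1K sizes (width, height) copied from the table; 2K and 4K are exact multiples.
BASE_1K = {
    "1:1": (1024, 1024),
    "16:9": (1376, 768),
    "9:16": (768, 1376),
    "3:2": (1264, 848),
    "2:3": (848, 1264),
}
SCALE = {"1K": 1, "2K": 2, "4K": 4}


def expected_size(aspect_ratio: str, image_size: str = "1K") -> Tuple[int, int]:
    """Return the (width, height) the table lists for this ratio and size."""
    if aspect_ratio not in BASE_1K or image_size not in SCALE:
        raise ValueError(f"not in the table: {aspect_ratio} at {image_size}")
    width, height = BASE_1K[aspect_ratio]
    factor = SCALE[image_size]
    return width * factor, height * factor
```

Note that the sizes only approximate the nominal ratio: 1376 / 768 is about 1.79, not exactly 16 / 9.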
Common Use Cases
Logo Creation
Create a modern, minimalist logo for a coffee shop called 'The Daily Grind'.
The text should be in a clean, bold, sans-serif font.
Black and white color scheme. Put the logo in a circle.
Product Photography
A high-resolution, studio-lit product photograph of a minimalist ceramic
coffee mug in matte black on a polished concrete surface. Three-point
softbox lighting with soft, diffused highlights. Slightly elevated
45-degree camera angle. Sharp focus on steam rising from the coffee.
Style Transfer
Transform this photograph of a city street at night into Vincent van Gogh's
'Starry Night' style. Preserve the composition but render with swirling,
impasto brushstrokes and deep blues with bright yellows.
Infographic
Create a vibrant infographic explaining photosynthesis as a recipe.
Show "ingredients" (sunlight, water, CO2) and "finished dish" (sugar/energy).
Style like a colorful kids' cookbook, suitable for 4th graders.
Error Handling
Common issues:
- No image returned: Check that response_modalities includes 'IMAGE'
- Safety filters: Some prompts may be blocked; try rephrasing
- Rate limits: Implement exponential backoff for retries
- Large images: For 4K, ensure sufficient timeout settings
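For the rate-limit case, a minimal sketch of exponential backoff with full jitter around a raw REST call made with urllib. The retryable status codes (429, 500, 503), the delay values, and the name with_backoff are assumptions; with the google-genai SDK, the except clause would need to catch the SDK's own exception type instead of urllib.error.HTTPError:

```python
import random
import time
import urllib.error
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 32.0,
    retry_on: Iterable[int] = (429, 500, 503),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying retryable HTTP errors with exponential backoff and full jitter."""
    retryable = set(retry_on)
    for attempt in range(max_attempts):
        try:
            return call()
        except urllib.error.HTTPError as err:
            # Give up on non-retryable errors (e.g. 400) or after the last attempt.
            if err.code not in retryable or attempt == max_attempts - 1:
                raise
            # The cap doubles each attempt (1s, 2s, 4s, ...); sleep a random time up to it.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise RuntimeError("max_attempts must be at least 1")
```

A call looks like with_backoff(lambda: urllib.request.urlopen(req, timeout=300).read()), where req is a urllib.request.Request for the generateContent endpoint. The sleep parameter exists so tests can record delays instead of waiting.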
Dependencies
To use the Python SDK:
pip install google-genai pillow
For JavaScript:
npm install @google/genai
The shell examples also need curl and python3, and the REST example pipes through jq.
Important Notes
- All generated images include a SynthID watermark
- The model uses a "thinking" process for complex prompts
- For best text rendering, generate the text first, then request an image containing that text
- Images are not stored by the API - save outputs locally