nanobanana-skill
Nanobanana Image Skill
Use the bundled nanobanana.py tool to generate or edit images with Gemini image models. The default path now targets Nano Banana 2 (gemini-3.1-flash-image-preview) and turns on thinking summaries plus Google Search grounding by default, because those defaults are the main reason to use this skill instead of a generic image prompt.
When to use this skill
Use it for:
- Creating a new image from text
- Editing or remixing one or more existing images
- Combining several reference images into one composition
- Requests where factual freshness matters and image grounding should use live web data
- Requests that need strong text rendering, layout reasoning, or more deliberate composition
Do not use it for:
- Non-image tasks
- Image work that explicitly must use another provider or another installed image skill
Requirements
- GEMINI_API_KEY must exist in ~/.nanobanana.env or the shell environment.
- Python dependencies from requirements.txt must be installed.
- The executable is the nanobanana.py file in this same skill directory. Resolve its absolute path once before running it.
Example env file:
GEMINI_API_KEY=sk-dummy
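The key requirement above can be checked before any run. The following is a minimal sketch, not the tool's own loader: parse_env_text and find_api_key are hypothetical helper names, and checking the shell environment before the env file is an assumption, since this document does not say which source wins.

```python
import os
from pathlib import Path


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_api_key(environ=None, env_file=None):
    """Return (key, source), or (None, None) when the key is missing everywhere."""
    environ = os.environ if environ is None else environ
    env_file = Path.home() / ".nanobanana.env" if env_file is None else Path(env_file)
    # Assumption: the shell environment takes precedence over the env file.
    if environ.get("GEMINI_API_KEY"):
        return environ["GEMINI_API_KEY"], "shell environment"
    if env_file.is_file():
        key = parse_env_text(env_file.read_text(encoding="utf-8")).get("GEMINI_API_KEY")
        if key:
            return key, str(env_file)
    return None, None
```

Report only the source of the key to the user, never the key itself.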
Default behavior
Unless the user explicitly asks otherwise, prefer these defaults:
- Model: gemini-3.1-flash-image-preview
- Search grounding: enabled
- Thinking summaries: enabled
- Thinking level: high
- Resolution: 1K
- Aspect ratio: leave unspecified unless the user clearly wants a shape
Leaving aspect ratio unspecified is usually better for edits because Gemini can match the input image shape. For text-only generation, pick an aspect ratio only when the user implies a format such as poster, square post, banner, phone wallpaper, or ultrawide hero image.
Workflow
1. Clarify only the missing constraints
Ask only for details that materially affect the result:
- The image brief or editing instruction
- Any input/reference images to use
- Target format or aspect ratio if implied by the use case
- Output filename if the user cares where it lands
Do not force the user to choose a model, search mode, or thinking mode unless they asked for that level of control. The latest Nanobanana 2 path is already the default.
2. Choose the right mode
Use generate when there are no input images.
Use edit/composite when there are one or more input images. Nanobanana 2 can mix multiple references, so do not artificially limit the task to a single image if the user is clearly asking for a blend, lineup, storyboard, or consistency pass.
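The mode rule above reduces to whether any input images are present. As a rough sketch, the hypothetical helper build_command below picks the mode and assembles an argument list using only flags that appear in this document's examples (--prompt, --input, --aspect-ratio, --output). The script path is a placeholder to replace with the resolved absolute path.

```python
import shlex


def build_command(script, prompt, output, inputs=(), aspect_ratio=None):
    """Return (mode, argv) for a nanobanana.py run."""
    mode = "edit/composite" if inputs else "generate"
    argv = ["python3", str(script), "--prompt", prompt]
    if inputs:
        # A single --input flag followed by every reference image.
        argv += ["--input", *map(str, inputs)]
    if aspect_ratio:
        # Left out by default so edits can follow the input image's shape.
        argv += ["--aspect-ratio", aspect_ratio]
    argv += ["--output", str(output)]
    return mode, argv


mode, argv = build_command(
    "/absolute/path/to/nanobanana.py",
    "Blend these product shots into one consistent lineup",
    "/absolute/path/to/output/lineup.png",
    inputs=["/absolute/path/to/ref1.png", "/absolute/path/to/ref2.png"],
)
print(mode)
print(shlex.join(argv))  # shell-quoted form, handy for showing the user what ran
```

Building a list and passing it to subprocess.run avoids shell-quoting problems when prompts contain quotes or apostrophes.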
3. Run the bundled tool
Use the script beside this skill file.
Basic generation:
python3 /absolute/path/to/nanobanana.py \
--prompt "Create a high-end coffee bag package design with tactile paper texture and clear typography" \
--output /absolute/path/to/output/package.png
Editing or compositing:
python3 /absolute/path/to/nanobanana.py \
--prompt "Turn these product photos into a clean 4:5 ecommerce hero image with a soft studio shadow and subtle headline area" \
--input /absolute/path/to/ref1.png /absolute/path/to/ref2.png \
--aspect-ratio 4:5 \
--output /absolute/path/to/output/hero.png
Grounded generation with saved text metadata:
python3 /absolute/path/to/nanobanana.py \
--prompt "Use Google Search to ground an editorial illustration about the most recent lunar mission and create a clean magazine cover concept" \
--aspect-ratio 2:3 \
--text-output /absolute/path/to/output/cover.txt \
--metadata-output /absolute/path/to/output/cover.json \
--output /absolute/path/to/output/cover.png
4. Return the result clearly
Always tell the user:
- The saved image path or paths
- Whether search grounding stayed enabled
- Whether text/thought summaries were saved anywhere
- Any relevant limitation or warning from the run
If the model returns text but no image, report that plainly and suggest a more explicit image-focused prompt instead of pretending the run succeeded.
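One way to keep that report consistent is to fill a small record after each run and format it the same way every time. This is only an illustrative sketch: RunReport and summarize are hypothetical names, and how the fields get populated depends on what the tool actually prints or saves.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunReport:
    image_paths: list = field(default_factory=list)
    search_grounding: bool = True
    text_output: Optional[str] = None      # path passed to --text-output, if any
    metadata_output: Optional[str] = None  # path passed to --metadata-output, if any
    warnings: list = field(default_factory=list)
    model_text: str = ""                   # any text the model returned


def summarize(report):
    """Format the four items the user should always be told."""
    lines = []
    if report.image_paths:
        lines.append("Saved image(s): " + ", ".join(report.image_paths))
    elif report.model_text:
        # Text without an image is reported as a failure, not a success.
        lines.append("The model returned text but no image. "
                     "Try a prompt that states the image deliverable explicitly.")
    else:
        lines.append("No image was produced.")
    lines.append("Search grounding: " + ("enabled" if report.search_grounding else "disabled"))
    saved = [p for p in (report.text_output, report.metadata_output) if p]
    lines.append("Text/thought summaries: " + (", ".join(saved) if saved else "not saved"))
    lines.extend("Warning: " + w for w in report.warnings)
    return "\n".join(lines)
```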
Recommended options
Models
- gemini-3.1-flash-image-preview: default. Best default for fast, high-volume image generation and editing with Nanobanana 2 features.
- gemini-3-pro-image-preview: slower, but a good override for very detail-heavy or typography-sensitive work.
- gemini-2.5-flash-image: legacy fallback if the user specifically wants the older Nanobanana model.
Aspect ratios
Supported ratios:
1:1, 1:4, 1:8, 2:3, 3:2, 3:4, 4:1, 4:3, 4:5, 5:4, 8:1, 9:16, 16:9, 21:9
Quick picks:
- 1:1 for logos, icons, thumbnails, and general social posts
- 4:5 for feed posts and product cards
- 2:3 for posters and book-cover style work
- 9:16 for stories, shorts, and phone wallpaper
- 16:9 or 21:9 for slides, banners, and desktop hero art
- 1:4, 4:1, 1:8, 8:1 for very tall or very wide experimental layouts now supported by Nanobanana 2
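The quick picks can be treated as a lookup table. The sketch below uses hypothetical names (QUICK_PICKS, pick_aspect_ratio) and illustrative use-case keys. Where the guidance offers 16:9 or 21:9, it assumes 16:9 unless the request is explicitly ultrawide. It returns None when nothing implies a shape, which matches the default of leaving the ratio unspecified.

```python
SUPPORTED_RATIOS = {
    "1:1", "1:4", "1:8", "2:3", "3:2", "3:4", "4:1",
    "4:3", "4:5", "5:4", "8:1", "9:16", "16:9", "21:9",
}

# Illustrative keys; extend to match the phrasing users actually use.
QUICK_PICKS = {
    "logo": "1:1", "icon": "1:1", "thumbnail": "1:1", "social post": "1:1",
    "feed post": "4:5", "product card": "4:5",
    "poster": "2:3", "book cover": "2:3",
    "story": "9:16", "short": "9:16", "phone wallpaper": "9:16",
    "slide": "16:9", "banner": "16:9", "desktop hero": "16:9",
    "ultrawide hero": "21:9",  # assumption: ultrawide maps to 21:9
}


def pick_aspect_ratio(use_case=None, explicit=None):
    """Return a ratio string, or None to leave --aspect-ratio unset."""
    if explicit is not None:
        # An explicit user choice wins, as long as it is supported.
        if explicit not in SUPPORTED_RATIOS:
            raise ValueError(f"Unsupported aspect ratio: {explicit}")
        return explicit
    if use_case is None:
        return None
    return QUICK_PICKS.get(use_case.strip().lower())
```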
Resolution
- 512px: Nanobanana 2 only. Best for quick ideation.
- 1K: default. Good tradeoff for most requests.
- 2K: use for polished deliverables.
- 4K: use when the user explicitly needs a high-resolution final.
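The one hard constraint in this list is that 512px is limited to Nanobanana 2. A small guard, sketched here with the hypothetical helper check_resolution, can catch that before a run. It checks only what this document states and makes no claim about which other sizes each model accepts.

```python
NANOBANANA_2 = "gemini-3.1-flash-image-preview"
RESOLUTIONS = ("512px", "1K", "2K", "4K")


def check_resolution(resolution="1K", model=NANOBANANA_2):
    """Return the resolution if it is usable with the model, else raise ValueError."""
    if resolution not in RESOLUTIONS:
        raise ValueError(f"Unknown resolution: {resolution}")
    if resolution == "512px" and model != NANOBANANA_2:
        raise ValueError("512px is only available with " + NANOBANANA_2)
    return resolution
```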
Thinking and search
- Keep search grounding on by default when the request could benefit from live facts, current events, or real product references.
- Keep thinking summaries on by default because Nanobanana 2 often composes better on multi-constraint tasks.
- Lower --thinking-level to low or minimal when the user prioritizes latency over refinement.
- Disable search only when the user wants a purely imaginative result or explicitly requests no web grounding.
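These rules amount to two small decisions per request. The sketch below, with hypothetical helpers run_settings and thinking_args, encodes them. Only --thinking-level is named in this document, so the code emits no flag for disabling search; check the tool's own help output for that option.

```python
def run_settings(prioritize_latency=False, purely_imaginative=False, no_web_grounding=False):
    """Return the thinking level and search choice implied by the guidance above."""
    return {
        # "low" is used here; "minimal" is the other lower level named above.
        "thinking_level": "low" if prioritize_latency else "high",
        "search_grounding": not (purely_imaginative or no_web_grounding),
    }


def thinking_args(settings):
    """Return extra CLI arguments for the chosen thinking level."""
    if settings["thinking_level"] == "high":
        return []  # high is already the default, so nothing needs to be passed
    return ["--thinking-level", settings["thinking_level"]]
```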
Prompting guidance
Good Nanobanana prompts are direct production briefs, not vague art wishes. Include:
- Subject
- Visual style
- Composition or camera framing
- Required text if any
- Output use case
- Constraints such as brand colors, empty space, or realism level
Prefer prompts like:
Create a premium sparkling water can advertisement. Use a cold studio product-photo look, silver highlights, condensation droplets, and a clean dark-teal background. Leave negative space in the upper-right for headline copy.
Instead of:
make a cool drink ad
For edits, tell the model what to preserve and what to change:
Keep the shoe silhouette and logo placement intact. Replace the background with a bright outdoor basketball court, add dynamic afternoon shadows, and keep the image looking like a real sports campaign photo.
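The brief fields listed above can be assembled mechanically, so no field is forgotten. This is a minimal sketch with a hypothetical helper, build_brief. The sentence templates are illustrative and not wording the model requires.

```python
def build_brief(subject, style=None, composition=None, required_text=None,
                use_case=None, constraints=()):
    """Join the brief fields into one direct, production-style prompt."""
    parts = [f"Create {subject}."]
    if style:
        parts.append(f"Visual style: {style}.")
    if composition:
        parts.append(f"Composition: {composition}.")
    if required_text:
        parts.append(f'Render this text exactly: "{required_text}".')
    if use_case:
        parts.append(f"Intended use: {use_case}.")
    parts.extend(f"Constraint: {c}." for c in constraints)
    return " ".join(parts)


print(build_brief(
    "a premium sparkling water can advertisement",
    style="cold studio product photo with silver highlights and condensation droplets",
    composition="can centered on a clean dark-teal background",
    constraints=["leave negative space in the upper-right for headline copy"],
))
```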
Nanobanana 2 capabilities to lean on
- Multi-reference composition with many input images
- Better grounded visuals with Google Search and Google Image Search support behind the built-in search tool
- Wider aspect-ratio support, including very tall and very wide outputs
- 512px fast ideation output in addition to 1K, 2K, and 4K
- Stronger iterative reasoning via Gemini 3 thinking controls
Error handling
If the run fails:
- Check ~/.nanobanana.env and confirm GEMINI_API_KEY is present.
- Confirm each input image path exists and is readable.
- Confirm the output directory is writable.
- If a feature looks unsupported, retry with gemini-3.1-flash-image-preview first.
- If the response contains only text, rewrite the prompt so the image deliverable is explicit.
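The first three checks are local and can run before retrying. The sketch below uses a hypothetical helper, preflight, which returns a list of human-readable problems. It assumes the env file holds plain KEY=VALUE lines, as in the example env file, and does not validate the key against the API.

```python
import os
from pathlib import Path


def preflight(inputs=(), output=None, environ=None, env_file=None):
    """Return a list of problems found; an empty list means the local checks passed."""
    problems = []
    environ = os.environ if environ is None else environ
    env_file = Path.home() / ".nanobanana.env" if env_file is None else Path(env_file)

    has_key = bool(environ.get("GEMINI_API_KEY"))
    if not has_key and env_file.is_file():
        text = env_file.read_text(encoding="utf-8")
        has_key = any(line.strip().startswith("GEMINI_API_KEY=") for line in text.splitlines())
    if not has_key:
        problems.append(f"GEMINI_API_KEY not found in the environment or {env_file}")

    for path in inputs:
        if not (os.path.isfile(path) and os.access(path, os.R_OK)):
            problems.append(f"Input image missing or unreadable: {path}")

    if output is not None:
        out_dir = Path(output).resolve().parent
        if not (out_dir.is_dir() and os.access(out_dir, os.W_OK)):
            problems.append(f"Output directory missing or not writable: {out_dir}")
    return problems
```

Pass the returned list straight into the warnings reported to the user so a failed run explains itself.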
Examples
Fast ideation
python3 /absolute/path/to/nanobanana.py \
--prompt "Create three-dimensional sticker-style fruit mascots on white" \
--resolution 512px \
--output /absolute/path/to/output/stickers.png
Grounded current-events visual
python3 /absolute/path/to/nanobanana.py \
--prompt "Use Google Search to ground a newspaper-style illustration about the latest Mars mission and create a restrained front-page visual" \
--aspect-ratio 3:2 \
--output /absolute/path/to/output/mars.png
Multi-reference composite
python3 /absolute/path/to/nanobanana.py \
--prompt "Create a single brand moodboard from these references. Keep the ceramic texture from the first image, the palette from the second, and the lighting mood from the third." \
--input /absolute/path/to/a.png /absolute/path/to/b.png /absolute/path/to/c.png \
--aspect-ratio 16:9 \
--output /absolute/path/to/output/moodboard.png