Nano Banana Pro Image Generation, Editing & Prompting

Generate new images, edit existing ones, or craft precise prompts for Google's Nano Banana Pro API (Gemini 3 Pro Image), including style transfer, color-grade transfer, locked-variable edits, text-heavy layouts, and subject replacement.

Usage

Run the script using its absolute path (do NOT cd to the skill directory first):

Generate new image:

uv run ~/skills/nano-banana-pro/scripts/generate_image.py --prompt "your image description" --filename "output-name.png" [--resolution 1K|2K|4K] [--api-key KEY]

Edit existing image:

uv run ~/skills/nano-banana-pro/scripts/generate_image.py --prompt "editing instructions" --filename "output-name.png" --input-image "path/to/input.png" [--resolution 1K|2K|4K] [--api-key KEY]

Important: Always run from the user's current working directory so images are saved where the user is working, not in the skill directory.
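
The two invocations above differ only in the optional --input-image flag, so the argument list can be assembled in one place. The following is a minimal sketch, assuming the script path shown above; build_command is a hypothetical helper, not part of the skill itself.

```python
import os

# Script path copied from the usage lines above, expanded to an absolute path.
SCRIPT = os.path.expanduser("~/skills/nano-banana-pro/scripts/generate_image.py")

VALID_RESOLUTIONS = ("1K", "2K", "4K")


def build_command(prompt, filename, input_image=None, resolution=None, api_key=None):
    """Return the argv list for one generate or edit run (hypothetical helper)."""
    cmd = ["uv", "run", SCRIPT, "--prompt", prompt, "--filename", filename]
    if input_image:
        # Presence of --input-image turns the run into an edit.
        cmd += ["--input-image", input_image]
    if resolution:
        if resolution not in VALID_RESOLUTIONS:
            raise ValueError(f"resolution must be one of {VALID_RESOLUTIONS}")
        cmd += ["--resolution", resolution]
    if api_key:
        cmd += ["--api-key", api_key]
    return cmd
```

Passing the returned list to subprocess.run(cmd, check=True) without changing directory keeps the output in the user's working directory.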

Default Workflow (draft → iterate → final)

Goal: fast iteration without burning time on 4K until the prompt is correct.

Draft (1K): quick feedback loop
- uv run ~/skills/nano-banana-pro/scripts/generate_image.py --prompt "<draft prompt>" --filename "yyyy-mm-dd-hh-mm-ss-draft.png" --resolution 1K
Iterate: adjust the prompt in small diffs; use a new filename for each run
- If editing: keep the same --input-image for every iteration until you’re happy.
Final (4K): only when prompt is locked
- uv run ~/skills/nano-banana-pro/scripts/generate_image.py --prompt "<final prompt>" --filename "yyyy-mm-dd-hh-mm-ss-final.png" --resolution 4K

Resolution Options

The Gemini 3 Pro Image API supports three resolutions (uppercase K required):

1K (default) - ~1024px resolution
2K - ~2048px resolution
4K - ~4096px resolution

Map user requests to API parameters:

No mention of resolution → 1K
"low resolution", "1080", "1080p", "1K" → 1K
"2K", "2048", "normal", "medium resolution" → 2K
"high resolution", "high-res", "hi-res", "4K", "ultra" → 4K

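The mapping above can be sketched as a small keyword lookup. This is an illustrative sketch only: pick_resolution is a hypothetical name, and the keyword matching is an assumption about how a request might be scanned, not the skill's actual logic.

```python
import re

# Keyword groups taken from the mapping above; 4K cues are checked first.
PATTERN_4K = re.compile(r"\b(4k|ultra|high[- ]res(olution)?|hi[- ]res)\b")
PATTERN_2K = re.compile(r"\b(2k|2048|normal|medium resolution)\b")


def pick_resolution(request: str) -> str:
    """Map a free-text user request to the --resolution value (hypothetical helper)."""
    text = request.lower()
    if PATTERN_4K.search(text):
        return "4K"
    if PATTERN_2K.search(text):
        return "2K"
    # No mention, or any low-resolution cue ("1080p", "1K"), falls back to the default.
    return "1K"
```

The return value always uses an uppercase K, matching what the API expects.
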
API Key

The script checks for an API key in this order:

--api-key argument (use if user provided key in chat)
GEMINI_API_KEY environment variable

If neither is available, the script exits with an error message.
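
The lookup order described above can be sketched as follows. This illustrates the documented precedence under stated assumptions; resolve_api_key is a hypothetical name and the script's real implementation may differ.

```python
import os


def resolve_api_key(cli_key=None, env=None):
    """Return the key from --api-key if given, else from GEMINI_API_KEY (sketch)."""
    env = os.environ if env is None else env
    # The command-line argument wins over the environment variable.
    key = cli_key or env.get("GEMINI_API_KEY")
    if not key:
        # Mirrors the documented behavior: exit with an error message.
        raise SystemExit("Error: No API key provided.")
    return key
```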

Preflight + Common Failures (fast fixes)

Preflight:
- command -v uv (must exist)
- test -n "$GEMINI_API_KEY" (or pass --api-key)
- If editing: test -f "path/to/input.png"
Common failures:
- Error: No API key provided. → set GEMINI_API_KEY or pass --api-key
- Error loading input image: → wrong path / unreadable file; verify --input-image points to a real image
- “quota/permission/403” style API errors → wrong key, no access, or quota exceeded; try a different key/account
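
The preflight checks above can also be run from Python before invoking the script. This is a minimal sketch; preflight is a hypothetical helper and its messages are illustrative, not output from the script.

```python
import os
import shutil


def preflight(input_image=None, api_key=None, env=None):
    """Return a list of problems found; an empty list means ready to run (sketch)."""
    env = os.environ if env is None else env
    problems = []
    # Equivalent of: command -v uv
    if shutil.which("uv") is None:
        problems.append("uv not found on PATH")
    # Equivalent of: test -n "$GEMINI_API_KEY" (or pass --api-key)
    if not (api_key or env.get("GEMINI_API_KEY")):
        problems.append("no API key: set GEMINI_API_KEY or pass --api-key")
    # Equivalent of: test -f "path/to/input.png" (only when editing)
    if input_image is not None and not os.path.isfile(input_image):
        problems.append(f"input image not found: {input_image}")
    return problems
```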

Filename Generation

Generate filenames with the pattern: yyyy-mm-dd-hh-mm-ss-name.png

Format: {timestamp}-{descriptive-name}.png

Timestamp: Current date/time in format yyyy-mm-dd-hh-mm-ss (24-hour format)
Name: Descriptive lowercase text with hyphens
Keep the descriptive part concise (1-5 words typically)
Use context from user's prompt or conversation
If unclear, use a random identifier (e.g., x9k2, a7b3)

Examples:

Prompt "A serene Japanese garden" → 2025-11-23-14-23-05-japanese-garden.png
Prompt "sunset over mountains" → 2025-11-23-15-30-12-sunset-mountains.png
Prompt "create an image of a robot" → 2025-11-23-16-45-33-robot.png
Unclear context → 2025-11-23-17-12-48-x9k2.png
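
The filename pattern above can be sketched in a few lines. make_filename is a hypothetical helper; it assumes the descriptive words have already been chosen from the prompt, and it uses a 4-character hex string as the random fallback, which is one possible choice of identifier.

```python
import re
import secrets
from datetime import datetime


def make_filename(name="", now=None):
    """Build {timestamp}-{descriptive-name}.png from a short description (sketch)."""
    now = now or datetime.now()
    stamp = now.strftime("%Y-%m-%d-%H-%M-%S")  # 24-hour clock
    # Lowercase, hyphen-separated, at most five words.
    words = re.findall(r"[a-z0-9]+", name.lower())[:5]
    # Fall back to a short random identifier when no usable name is given.
    slug = "-".join(words) if words else secrets.token_hex(2)
    return f"{stamp}-{slug}.png"
```

For instance, make_filename("Japanese Garden", datetime(2025, 11, 23, 14, 23, 5)) returns 2025-11-23-14-23-05-japanese-garden.png.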

Image Editing

When the user wants to modify an existing image:

Check if they provide an image path or reference an image in the current directory
Use --input-image parameter with the path to the image
The prompt should contain editing instructions (e.g., "make the sky more dramatic", "remove the person", "change to cartoon style")
Common editing tasks: add/remove elements, change style, adjust colors, blur background, etc.

Prompt Handling

For simple generation: Pass the user's image description as-is to --prompt. Only rework if clearly insufficient.

For editing: Pass editing instructions in --prompt (e.g., "add a rainbow in the sky", "make it look like a watercolor painting").

For prompt-writing requests: Do not run generation unless the user asks for an image. Deliver a ready-to-run prompt as the primary output.

For complex prompts: Use labeled sections and explicit rules when the task includes text, counts, layouts, typography, multiple inputs, diagrams, or exact constraints.

Preserve the user's creative intent in all cases.

Prompt Engineering Workflow

Use this when the user asks for a Nano Banana prompt, wants prompt optimization, or the generation requires precise structure.

Clarify goal and constraints from the request: subject, medium, layout, text, counts, input images, output format.
Choose the smallest pattern that enforces the constraints.
Draft the prompt with labeled sections and explicit rules.
Add validation anchors for exact counts, placement, readable text, and forbidden changes.
Provide 1-2 variants only if useful or requested.

Prompt Skeleton

GOAL: <what to generate>
INPUTS: <image refs or none>
LAYOUT: <spatial regions, hierarchy, placement>
SUBJECTS: <entities, counts, poses>
TEXT: <exact strings, fonts, placement>
STYLE: <medium, rendering, aesthetic>
LIGHTING/CAMERA: <angle, lens, lighting>
CONSTRAINTS: <must/never rules, exact counts>
VALIDATION: <"ensure exactly X", "no extra text">
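
The skeleton above can be assembled programmatically. This is a sketch under assumptions: build_prompt is a hypothetical helper, and dropping empty sections is a design choice made here, not a rule stated by the skill.

```python
# Section labels in the order given by the skeleton above.
SECTION_ORDER = [
    "GOAL", "INPUTS", "LAYOUT", "SUBJECTS", "TEXT",
    "STYLE", "LIGHTING/CAMERA", "CONSTRAINTS", "VALIDATION",
]


def build_prompt(sections):
    """Join the filled-in sections into one labeled prompt string (sketch)."""
    unknown = set(sections) - set(SECTION_ORDER)
    if unknown:
        raise ValueError(f"unknown sections: {sorted(unknown)}")
    # Emit sections in skeleton order regardless of the dict's order; skip empty ones.
    lines = [f"{label}: {sections[label]}" for label in SECTION_ORDER if sections.get(label)]
    return "\n".join(lines)
```

Keeping the order fixed makes small prompt diffs between iterations easier to compare.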

Prompt Pattern Selection

Layout-heavy, text-heavy, or multi-panel: use structured layout anchors.
Multi-input synthesis or system-level control: use JSON prompt.
Cinematic scene with people/props: use scene composition.
Character identity transfer: use character pipeline.

When to Load References

references/prompt-patterns.md — detailed templates, checklists, and pattern selection.
references/prompt-bank.md — full example prompts and use-case blueprints.

Prompt Templates (high hit-rate)

Use templates when the user is vague or when edits must be precise.

Generation template:
- “Create an image of: <subject>. Style: <style>. Composition: <camera/shot>. Lighting: <lighting>. Background: <background>. Color palette: <palette>. Avoid: <list>.”
Editing template (preserve everything else):
- “Change ONLY: <change>. Keep identical: subject, composition/crop, pose, lighting, color palette, background, text, and overall style. Do not add new objects. If text exists, keep it unchanged.”

Motion-Ready Start Frames

Use this when the still image is meant to become the starting frame for an animated AI video shot.

Rule

Treat the frame like a keyframe pulled from motion, not like a neutral photo.

What to emphasize

implied movement already underway
active posture and directional energy
asymmetry, momentum, and continuation-ready gesture
hair/fabric/environmental motion where appropriate

Helpful motion language

kinetic, dynamic, mid-motion, caught in movement, in-action, directional energy, wind-swept, hair in motion, fabric in motion

Avoid

Static portrait energy when the next step is animation.

Advanced Edit Patterns

Use these when the job is tighter than a generic edit.

Locked-Variables Edit Pattern

Use this when only one element should change and everything else must stay fixed.

Change ONLY: [single variable].
Keep locked: subject identity, pose, framing/crop, camera angle/lens feel, lighting direction, color grade, background, wardrobe, and overall style.
Do not change facial structure, expression, proportions, or any unmentioned element.
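
The locked-variables pattern above has a single slot, so it can be filled with a small helper. This is a minimal sketch; locked_edit_prompt is a hypothetical name, and the template text is copied from the pattern above.

```python
LOCKED_EDIT_TEMPLATE = (
    "Change ONLY: {change}.\n"
    "Keep locked: subject identity, pose, framing/crop, camera angle/lens feel, "
    "lighting direction, color grade, background, wardrobe, and overall style.\n"
    "Do not change facial structure, expression, proportions, or any unmentioned element."
)


def locked_edit_prompt(change):
    """Fill the single-variable slot of the locked-variables pattern (sketch)."""
    # Strip whitespace and a trailing period so the template's own period is not doubled.
    change = change.strip().rstrip(".")
    if not change:
        raise ValueError("describe exactly one variable to change")
    return LOCKED_EDIT_TEMPLATE.format(change=change)
```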

Color-Grade Transfer Pattern

Use this when the target shot should keep its composition but inherit the look of a reference image.

Transfer the exact color grade and tonal treatment from the reference style image onto the target image. Keep the target image's composition, framing, focus, lighting direction, pose, depth of field, and camera perspective exactly the same. Do not alter subject position or lens behavior. Apply only the tonal palette, contrast behavior, highlight rolloff, shadow density, and overall cinematic color treatment from the style reference.

Subject Replacement Pattern

Use this when the composition/style should stay the same but the main person/object must be swapped.

Replace the original subject with the person from the reference images, seamlessly integrated into the exact same pose, body positioning, framing, camera angle, and environment. Preserve natural biomechanics, perspective, scale, and shadow consistency. Match the original lighting conditions exactly. Maintain realistic skin texture, visible pores, fine facial detail, and natural tonal variation. No smoothing, no distortion, no artificial blending artifacts.

Upscale / Enhancement Templates

Use these when the user wants to enhance an existing image to cinematic quality without changing its content. Always use with --input-image. Resolution should be 4K for final upscales, 2K for draft review.

Portrait / Person Upscale

Enhance the uploaded image to a flawless, ultra-high-quality cinematic version while preserving the subject with absolute precision. The person's identity, facial anatomy, expression, body pose, clothing, accessories, surroundings, framing, and overall composition must remain completely unchanged. Do not alter, reinterpret, replace, or introduce any new visual elements. Restore and refine micro-level details including precise facial contours, authentic skin texture with naturally visible pores, individually rendered hair strands, sharp and vivid eyes, and clean, well-defined edges throughout the image. Enhance dynamic range, contrast, and dimensionality with balanced, studio-quality cinematic lighting.

Automotive / Car Upscale

Enhance the uploaded image to a flawless, ultra-high-quality cinematic version while preserving the vehicle with absolute precision. The car's make, model, body shape, paint color, finish type, livery, badges, wheels, aerodynamic elements, surroundings, framing, and overall composition must remain completely unchanged. Do not alter, reinterpret, replace, or introduce any new visual elements. Restore and refine micro-level details including precise panel contours, authentic paint surface with visible metallic flake or matte grain, individually resolved mesh and spoke geometry in the wheels, sharp and legible badges and emblems, clean brake caliper detail, and well-defined edges throughout the image. Enhance dynamic range, contrast, and dimensionality with balanced, cinematic lighting that preserves the original light direction and color temperature.

Universal Upscale (any image)

Enhance the uploaded image to a flawless, ultra-high-quality cinematic version while preserving every subject and element with absolute precision. All subjects, objects, materials, colors, textures, surroundings, framing, and overall composition must remain completely unchanged. Do not alter, reinterpret, replace, or introduce any new visual elements. Restore and refine micro-level details including precise contours, authentic surface textures, individually resolved fine structures, sharp focal elements, and clean, well-defined edges throughout the image. Enhance dynamic range, contrast, and dimensionality with balanced, cinematic lighting that preserves the original light direction and color temperature.

Choosing the right template

Image contains	Use
People, faces, portraits	Portrait / Person
Cars, vehicles, automotive	Automotive / Car
Anything else or mixed subjects	Universal
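
The selection table above reduces to a short decision function. This is an illustrative sketch: choose_upscale_template is a hypothetical name, and the two boolean inputs assume the image content has already been identified.

```python
def choose_upscale_template(has_people: bool, has_vehicles: bool) -> str:
    """Pick the upscale template name from the table above (sketch)."""
    # Mixed subjects fall through to the Universal template.
    if has_people and not has_vehicles:
        return "Portrait / Person"
    if has_vehicles and not has_people:
        return "Automotive / Car"
    return "Universal"
```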

Output

Saves PNG to current directory (or specified path if filename includes directory)
Script outputs the full path to the generated image
Do not read the image back - just inform the user of the saved path

Examples