nano-banana-image-gen

SKILL.md

Nano Banana 2 image generation reference

Nano Banana (NB) is Google's Gemini image generation family. NB2 = Gemini 3.1 Flash.

Model selection

Model Best for Tradeoffs
NB1 Existing workflows that already work, lowest cost, speed-critical No thinking, no visual grounding, no extreme ratios
NB2 Default for new projects (~95% of Pro quality) ~15% slower than Pro at 2K, weaker arm/leg composition, spelling errors in infographics
Pro Complex multi-layered prompts, extreme logical constraints, interior scale/logic Most expensive

Decision: Start with NB2. Step up to Pro only if NB2 consistently fails your specific prompt.

Visual grounding (NB2 only)

NB2 searches the internet for reference images before generating. Useful for:

  • Specific real-world locations (churches, bridges, city squares, niche buildings)
  • Exact animal species, breeds, insects
  • Historically accurate scenes

Example: "Generate a cinematic, golden-hour photograph of the main historical church in Voiron, France. Ensure the architectural details, the spire, the surrounding square, and the landscape (mountains) are accurate to reality."

Cost optimization

512px batch-to-upscale workflow:

  1. Use Batch API (50% discount) to generate dozens of 512px variations
  2. Review and pick the best composition
  3. Upscale that image to 1K/2K/4K

512px output runs faster and costs roughly the same as NB1.

Parameters

Parameter Values Notes
Resolution 512px, 1K, 2K, 4K 512px for drafts, upscale winners
Aspect ratio Standard + 1:4, 1:8, 4:1, 8:1 Extreme ratios for banners, comics, scrolling
Thinking mode On/Off Keep OFF by default. Enable for complex infographics or grounding + spatial reasoning

Prompt recipes

3D character selfie (requires image upload): Transform personal photos into stylized 3D characters interacting with real selves.

Anime to photorealistic (requires image upload): "Convert this uploaded animated still into an ultra-realistic, cinematic, and fully photorealistic scene. Transform the animated characters into real humans while perfectly preserving their original identities, facial structures, outfits, expressions, and overall likeness."

Historical street view: "Generate a hyper-realistic image of [event] perfectly replicating a Google Maps Street View capture. Include a 123-degree wide-angle barrel distortion..."

Crayon filter: "A child's crayon drawing on white lined notebook paper of [subject]. Use chunky wax-crayon strokes, wobbly outlines, and bright bold colors that messily overflow the lines. Include visible heavy pressure marks, waxy smudges, and uneven scribble shading."

Comic strip: "Create a 4-panel horizontal comic strip (aspect ratio 4:1). [Story]. Use a vibrant, Franco-Belgian comic book style. Keep the [character] design consistent across all panels."

Known limitations

  • Spelling mistakes in infographics (NB1 was better at this)
  • Consistency across multiple generations is weaker than Pro
  • Arm/leg composition issues more frequent than other models
  • Knowledge cutoff prevents referencing very recent products even with search on
  • No seed value support yet (style changes between generations)

Attribution

Based on "Getting the most out of Nano Banana 2" by @NanoBanana on X (Mar 11, 2026). Original thread covers model comparisons, visual grounding, parameters, and prompt techniques for Google's Gemini image generation models.

Weekly Installs
6
GitHub Stars
56
First Seen
3 days ago
Installed on
openclaw6
gemini-cli6
github-copilot6
codex6
kimi-cli6
cursor6