nano-banana-image-gen
Nano Banana 2 image generation reference
Nano Banana (NB) is Google's Gemini image generation family. NB2 = Gemini 3.1 Flash.
Model selection
| Model | Best for | Tradeoffs |
|---|---|---|
| NB1 | Existing workflows that already work, lowest cost, speed-critical | No thinking, no visual grounding, no extreme ratios |
| NB2 | Default for new projects (~95% of Pro quality) | ~15% slower than Pro at 2K, weaker arm/leg composition, spelling errors in infographics |
| Pro | Complex multi-layered prompts, extreme logical constraints, interior scale/logic | Most expensive |
Decision: Start with NB2. Step up to Pro only if NB2 consistently fails your specific prompt.
Visual grounding (NB2 only)
NB2 searches the internet for reference images before generating. Useful for:
- Specific real-world locations (churches, bridges, city squares, niche buildings)
- Exact animal species, breeds, insects
- Historically accurate scenes
Example: "Generate a cinematic, golden-hour photograph of the main historical church in Voiron, France. Ensure the architectural details, the spire, the surrounding square, and the landscape (mountains) are accurate to reality."
Cost optimization
512px batch-to-upscale workflow:
- Use Batch API (50% discount) to generate dozens of 512px variations
- Review and pick the best composition
- Upscale that image to 1K/2K/4K
512px output runs faster and costs roughly the same as NB1.
Parameters
| Parameter | Values | Notes |
|---|---|---|
| Resolution | 512px, 1K, 2K, 4K | 512px for drafts, upscale winners |
| Aspect ratio | Standard + 1:4, 1:8, 4:1, 8:1 | Extreme ratios for banners, comics, scrolling |
| Thinking mode | On/Off | Keep OFF by default. Enable for complex infographics or grounding + spatial reasoning |
Prompt recipes
3D character selfie (requires image upload): Transform personal photos into stylized 3D characters interacting with real selves.
Anime to photorealistic (requires image upload): "Convert this uploaded animated still into an ultra-realistic, cinematic, and fully photorealistic scene. Transform the animated characters into real humans while perfectly preserving their original identities, facial structures, outfits, expressions, and overall likeness."
Historical street view: "Generate a hyper-realistic image of [event] perfectly replicating a Google Maps Street View capture. Include a 123-degree wide-angle barrel distortion..."
Crayon filter: "A child's crayon drawing on white lined notebook paper of [subject]. Use chunky wax-crayon strokes, wobbly outlines, and bright bold colors that messily overflow the lines. Include visible heavy pressure marks, waxy smudges, and uneven scribble shading."
Comic strip: "Create a 4-panel horizontal comic strip (aspect ratio 4:1). [Story]. Use a vibrant, Franco-Belgian comic book style. Keep the [character] design consistent across all panels."
Known limitations
- Spelling mistakes in infographics (NB1 was better at this)
- Consistency across multiple generations is weaker than Pro
- Arm/leg composition issues more frequent than other models
- Knowledge cutoff prevents referencing very recent products even with search on
- No seed value support yet (style changes between generations)
Attribution
Based on "Getting the most out of Nano Banana 2" by @NanoBanana on X (Mar 11, 2026). Original thread covers model comparisons, visual grounding, parameters, and prompt techniques for Google's Gemini image generation models.