gemini-image
Gemini Image Generation
Generate high-quality images from text prompts using Google's Gemini and Imagen models through executable scripts.
When to Use This Skill
Use this skill when you need to:
- Create visual content from text descriptions
- Generate multiple image variations
- Create images at specific resolutions (1K, 2K, 4K)
- Produce images for different aspect ratios (social media, banners, etc.)
- Generate photorealistic images or artistic visuals
- Create images with person generation controls
- Batch generate multiple images at once
- Combine with text generation for complete content creation
Available Scripts
scripts/generate_image.js
Purpose: Generate images using Gemini 3 Pro Image or Imagen 4 models
When to use:
- Any image generation task
- Multiple image generation (1-4 per request)
- Custom resolution and aspect ratio needs
- Professional asset creation
- Photorealistic or artistic image generation
Key parameters:
| Parameter | Description | Example |
|---|---|---|
prompt |
Text description (required) | "A futuristic city at sunset" |
--model, -m |
Model to use | gemini-3.1-flash-image-preview |
--output-dir, -o |
Output directory for images | images/ |
--name, -n |
Base name for output files | artwork |
--no-timestamp |
Disable auto timestamp | Flag |
--aspect, -a |
Aspect ratio | 16:9 |
--size, -s |
Resolution | 2K or 4K |
--num |
Number of images (1-4) | 4 |
--person |
Person generation policy | allow_adult |
Output: List of saved PNG file paths
Workflows
Workflow 1: Basic Image Generation
node scripts/generate_image.js "A futuristic city at sunset with flying cars"
- Best for: Quick image generation, prototypes
- Model:
gemini-3.1-flash-image-preview(default, Nano Banana 2) - Output:
images/generated_image_YYYYMMDD_HHMMSS.png
Workflow 2: Social Media (Instagram, Facebook)
node scripts/generate_image.js "Minimalist coffee shop interior" --aspect 1:1 --size 2K --name coffee-shop
- Best for: Instagram posts, profile pictures
- Aspect: 1:1 (square format)
- Resolution: 2K (2048x2048)
- Output:
images/coffee-shop_YYYYMMDD_HHMMSS.png
Workflow 3: YouTube Thumbnails (16:9)
node scripts/generate_image.js "Tech gadget review thumbnail with vibrant colors" --aspect 16:9 --size 2K --name thumbnail
- Best for: YouTube, video thumbnails
- Aspect: 16:9 (widescreen)
- Resolution: 2K (2752x1536)
- Output:
images/thumbnail_YYYYMMDD_HHMMSS.png
Workflow 4: Multiple Variations
node scripts/generate_image.js "Abstract geometric patterns in blue and gold" --num 4 --name abstract
- Best for: A/B testing, design options
- Generates: 4 distinct variations
- Output:
images/abstract_YYYYMMDD_HHMMSS_0.png,images/abstract_YYYYMMDD_HHMMSS_1.png, etc.
Workflow 5: Custom Output Directory
node scripts/generate_image.js "Detailed architectural rendering of modern museum" --aspect 16:9 --size 4K --output-dir ./professional/ --name museum
- Best for: Print materials, high-end assets, organized projects
- Model:
gemini-3.1-flash-image-previeworgemini-3-pro-image-preview(for 4K) - Resolution: 4K (5504x3072 for 16:9)
- Directory created automatically if it doesn't exist
Workflow 6: Photorealistic Images (Imagen 4)
node scripts/generate_image.js "Robot holding a red skateboard in urban setting" --model imagen-4.0-generate-001 --aspect 16:9 --size 2K --num 2 --name robot-skate
- Best for: Realistic photos, product shots
- Model:
imagen-4.0-generate-001(photorealistic) - Notes: English prompts only
- Max 4 images per request
Workflow 7: Blog Post Featured Image
node scripts/generate_image.js "Serene mountain lake at sunrise with reflections" --aspect 16:9 --size 2K --output-dir ./blog-images/ --name featured-image
- Best for: Blog headers, article images
- Combines well with: gemini-text for blog content generation
Workflow 8: Content Creation Pipeline (Text + Image)
# 1. Generate content (gemini-text skill)
node skills/gemini-text/scripts/generate.js "Write a product description for smart home device"
# 2. Generate product image (this skill)
node scripts/generate_image.js "Sleek modern smart home device on white background" --aspect 4:3 --size 2K --name product
# 3. Create social media post
- Best for: E-commerce, marketing campaigns
- Combines with: gemini-text, gemini-batch for batch production
Workflow 9: Disable Timestamp
node scripts/generate_image.js "Fixed filename image" --name my-image --no-timestamp
- Best for: When you want complete control over filename
- Output:
images/my-image.png(no timestamp) - Use when: Generating files for specific naming schemes or automated pipelines
Parameters Reference
Model Selection
| Model | Nickname | Quality | Max Size | Best For |
|---|---|---|---|---|
gemini-3.1-flash-image-preview |
Nano Banana 2 | Pro-level | 4K | New default, fast + strong quality |
gemini-3-pro-image-preview |
Nano Banana Pro | Highest | 4K | Maximum quality and complex text rendering |
gemini-2.5-flash-image |
Nano Banana | Good | 2K | High-volume, low-latency |
imagen-4.0-generate-001 |
Imagen 4 | Photorealistic | 2K | Realistic photos, product shots |
Aspect Ratios
| Ratio | Use Case | 1K Size | 2K Size |
|---|---|---|---|
| 1:1 | Instagram, avatars | 1024x1024 | 2048x2048 |
| 16:9 | YouTube, presentations | 1376x768 | 2752x1536 |
| 9:16 | Instagram Stories, TikTok | 768x1376 | 1536x2752 |
| 4:3 | Traditional displays | 1024x768 | 2048x1536 |
| 3:4 | Portrait orientation | 768x1024 | 1536x2048 |
| 21:9 | Ultrawide | - | 5504x2400 |
Note: 4K resolution is available with gemini-3.1-flash-image-preview and gemini-3-pro-image-preview
Resolution Guide
| Size | Use Case | Best Model |
|---|---|---|
| 1K (1024px) | Web thumbnails, previews | Any model |
| 2K (2048px) | Standard web, social media | Any model |
| 4K (4096px) | Print, high-end assets | gemini-3-pro only |
Person Generation Policy
| Policy | Description | Restrictions |
|---|---|---|
dont_allow |
No people in images | None |
allow_adult |
Adults only | Recommended default |
allow_all |
All ages | Restricted in EU, UK, CH, MENA |
Output Interpretation
File Naming
- Default format:
{name}_YYYYMMDD_HHMMSS.png(auto timestamp) - Single image example:
artwork_20260130_031643.png - Multiple images:
{name}_YYYYMMDD_HHMMSS_0.png,{name}_YYYYMMDD_HHMMSS_1.png, etc. - Without timestamp (
--no-timestamp):{name}.png - Script prints: "Saved: /path/to/file.png"
Image Quality
- All images include SynthID watermark for authenticity
- PNG format for lossless quality
- Can be converted to JPEG/WEBP if needed
- 4K images are significantly larger file sizes
Error Messages
- "Model not available": Check model name spelling
- "Unsupported size": Verify size/model combination
- "Aspect ratio error": Use supported ratios for selected model
Common Issues
"google-genai or pillow not installed"
cd scripts && npm install
"Image generation failed"
- Check prompt length (too verbose can fail)
- Try simpler, more focused prompts
- Verify model availability in your region
- Check API quota limits
"Unsupported aspect ratio"
- Check if ratio is supported by selected model
- Imagen 4 has fewer ratio options than Gemini
- Use 16:9 or 1:1 for best compatibility
"4K not supported"
- 4K works best with
gemini-3.1-flash-image-previeworgemini-3-pro-image-preview - Use
--size 2Kfor older models - Try
--model gemini-3.1-flash-image-preview --size 4K
"Imagen prompt language error"
- Imagen models support English prompts only
- Use
gemini-3.1-flash-image-previewfor other languages - Translate prompt to English for Imagen
File too large for storage
- Use
--size 1Kfor smaller files - Compress images after generation
- Convert PNG to JPEG for web use
Best Practices
Prompt Engineering
- Be specific and descriptive
- Include style descriptors (e.g., "photorealistic", "digital art")
- Mention lighting, mood, and composition
- Use analogies for complex concepts
- Avoid negative prompts (describe what you want, not what to avoid)
Model Selection
- Use
gemini-3.1-flash-image-previewfor: Best default balance, quality, speed, 4K - Use
gemini-3-pro-image-previewfor: Maximum quality, complex text rendering - Use
gemini-2.5-flash-imagefor: Speed, high volume - Use
imagen-4.0-generate-001for: Photorealism, product shots
Performance Optimization
- Generate multiple images at once with
--num - Use lower resolution for previews
- Batch requests for high-volume needs (gemini-batch skill)
- Cache results for repeated requests
Quality Tips
- Use 2K resolution for most web uses
- 4K only when maximum detail is needed
- Combine specific prompts with style guidance
- Test prompts with
--num 1before generating batches
Cost Management
- Use flash models for cost efficiency
- 4K generation costs significantly more
- Batch multiple requests when possible
- Generate at 1K for testing, 2K/4K for final
Related Skills
- gemini-text: Generate text content alongside images
- gemini-tts: Create audio for image-based content
- gemini-batch: Process multiple image requests efficiently
- gemini-embeddings: Generate image embeddings for similarity search
Quick Reference
# Basic
node scripts/generate_image.js "Your prompt"
# Social media (1:1)
node scripts/generate_image.js "Prompt" --aspect 1:1 --size 2K --name social-post
# YouTube thumbnail (16:9)
node scripts/generate_image.js "Prompt" --aspect 16:9 --size 2K --name thumbnail
# 4K high quality
node scripts/generate_image.js "Prompt" --aspect 16:9 --size 4K --name high-res
# Multiple variations
node scripts/generate_image.js "Prompt" --num 4 --name variations
# Custom directory
node scripts/generate_image.js "Prompt" --output-dir ./my-images/ --name custom
# Photorealistic
node scripts/generate_image.js "Prompt" --model imagen-4.0-generate-001 --aspect 16:9 --size 2K --name photo
# No timestamp
node scripts/generate_image.js "Prompt" --name fixed-name --no-timestamp
Reference
- See
references/for model documentation (if available) - Get API key: https://aistudio.google.com/apikey
- Documentation: https://ai.google.dev/gemini-api/docs/image-generation
- SynthID: https://deepmind.google/technologies/synthid/
More from akrindev/google-studio-skills
gemini-tts
Generate speech from text using Google Gemini TTS models via scripts/. Use for text-to-speech, audio generation, voice synthesis, multi-speaker conversations, and creating audio content. Supports multiple voices and streaming. Triggers on "text to speech", "TTS", "generate audio", "voice synthesis", "speak this text".
68gemini-embeddings
Generate text embeddings using Gemini Embedding API via scripts/. Use for creating vector representations of text, semantic search, similarity matching, clustering, and RAG applications. Triggers on "embeddings", "semantic search", "vector search", "text similarity", "RAG", "retrieval".
16gemini-files
Upload and manage files using Google Gemini File API via scripts/. Use for uploading images, audio, video, PDFs, and other files for use with Gemini models. Supports file upload, status checking, and file management. Triggers on "upload file", "file API", "upload image", "upload PDF", "upload video", "file management".
13gemini-batch
Process large volumes of requests using Gemini Batch API via scripts/. Use for batch processing, bulk text generation, processing JSONL files, async job execution, and cost-efficient high-volume AI tasks. Triggers on "batch processing", "bulk requests", "JSONL", "async job", "batch job".
13gemini-text
Generate text content using Google Gemini models via scripts/. Use for text generation, multimodal prompts with images, thinking mode for complex reasoning, JSON-formatted outputs, and Google Search grounding for real-time information. Triggers on "generate with gemini", "use gemini for text", "AI text generation", "multimodal prompt", "gemini thinking mode", "grounded response".
9