prompt-to-image
Prompt To Image
Generate prompt-based images with a bundled Node script that calls provider REST APIs directly.
This skill is for creating new images from prompts, and it also supports basic reference-image-guided generation for specific OpenAI and Gemini models. It is not for complex multi-step compositing workflows or rendering markdown into graphics.
Use This Skill For
- OpenAI image generation with current GPT Image models such as `gpt-image-1.5` (default), `gpt-image-1`, or `gpt-image-1-mini`
- Gemini image generation with current Google image models such as `gemini-2.5-flash-image` (Nano Banana), `gemini-3.1-flash-image-preview` (Nano Banana 2), or `gemini-3-pro-image-preview` (Nano Banana Pro)
- Reference-image prompting and image edits when the provider/model supports it
- Single images or small batches of prompt variations
- Tasks where the user wants a local image file as output
Do Not Use This Skill For
- Markdown, quote cards, code screenshots, or social post rendering from text blocks. Use `markdown-to-image` instead.
- Heavy multi-step image-edit workflows, advanced compositing, or provider-agnostic reference pipelines. This skill is deliberately simpler than `baoyu-image-gen`.
Script Path
`SKILL_DIR/scripts/main.mjs`
Resolve `SKILL_DIR` as the directory containing this `SKILL.md`.
Setup
Provide API keys via environment variables:
- `OPENAI_API_KEY` for OpenAI models
- `GOOGLE_GENERATIVE_AI_API_KEY` for Gemini models
Aliases also supported by the script:
- `GOOGLE_API_KEY`
- `GEMINI_API_KEY`
Optional default model environment variables:
- `OPENAI_IMAGE_MODEL` (default: `gpt-image-1.5`)
- `GOOGLE_IMAGE_MODEL` or `GEMINI_IMAGE_MODEL` (default: `gemini-2.5-flash-image`)
Optional .env lookup paths:
- `<cwd>/.prompt-to-image/.env`
- `~/.prompt-to-image/.env`
Values from those files only fill in variables that are not already set in the environment. The script also normalizes the Google key aliases.
No repo dependency changes are required. The script uses Node's built-in `fetch`.
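Here is a minimal sketch of that lookup order. It assumes the `.env` files contain simple `KEY=value` lines, and that the `<cwd>` file is consulted before the home-directory one. `parseDotEnv` and `resolveEnv` are hypothetical names, not exports of `main.mjs`. Copying the first available alias into `GOOGLE_GENERATIVE_AI_API_KEY` is one plausible reading of "normalizes", not confirmed behavior.

```javascript
// Hypothetical sketch of env resolution; not the bundled script's code.

// Minimal .env parser: KEY=value lines, "#" comments, optional surrounding quotes.
function parseDotEnv(text) {
  const vars = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue;
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching quotes, e.g. "abc" or 'abc'.
    if (/^(["']).*\1$/.test(value)) value = value.slice(1, -1);
    vars[key] = value;
  }
  return vars;
}

// Real environment variables win; .env files only fill gaps (earlier files first).
function resolveEnv(processEnv, dotEnvTexts) {
  const env = { ...processEnv };
  for (const text of dotEnvTexts) {
    for (const [key, value] of Object.entries(parseDotEnv(text))) {
      if (env[key] === undefined) env[key] = value;
    }
  }
  // Assumed alias normalization: copy the first Google alias into the canonical name.
  const alias = env.GOOGLE_API_KEY || env.GEMINI_API_KEY;
  if (!env.GOOGLE_GENERATIVE_AI_API_KEY && alias) {
    env.GOOGLE_GENERATIVE_AI_API_KEY = alias;
  }
  return env;
}
```

In a real run, `process.env` would be the first argument. The texts would come from reading the two `.env` paths, skipping any that do not exist.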
Quick Start
```bash
# Default provider selection prefers Google when both keys are present
node ${SKILL_DIR}/scripts/main.mjs \
  --prompt "A cinematic koi fish temple at sunrise" \
  --output out.png
# Force OpenAI with an explicit image model
node ${SKILL_DIR}/scripts/main.mjs \
  --provider openai \
  --model gpt-image-1.5 \
  --prompt "A paper-cut poster of a lunar rover in a desert" \
  --size 1536x1024 \
  --output rover.png
# Use Nano Banana / Gemini image generation
node ${SKILL_DIR}/scripts/main.mjs \
  --provider google \
  --model gemini-2.5-flash-image \
  --prompt "A charming ramen shop mascot, flat vector style" \
  --ar 4:5 \
  --output mascot.png
# Generate multiple concepts
node ${SKILL_DIR}/scripts/main.mjs \
  --provider openai \
  --prompt "Three different thumbnail concepts for a retro sci-fi podcast" \
  --n 3 \
  --output thumbnails/concept.png
```
CLI Options
| Option | Description |
|---|---|
| `-p, --prompt <text>` | Inline prompt text |
| `--promptfiles <files...>` | Read prompt text from files and concatenate them |
| `--ref, --reference <files...>` | One or more reference images |
| `-o, --output <path>` | Output file path, or basename for multi-image runs |
| `--provider openai\|google` | Force provider selection |
| `-m, --model <id>` | Model override |
| `--size <WxH>` | Exact size, for OpenAI image models |
| `--ar, --aspect-ratio <W:H>` | Aspect ratio. Gemini 2.5 Flash Image accepts ratios such as 1:1, 4:5, 9:16, and 16:9 directly; on OpenAI the ratio is mapped to the closest supported size |
| `--n <count>` | Number of images to generate |
| `--quality <value>` | OpenAI `quality` option passthrough |
| `--style <value>` | OpenAI `style` passthrough, e.g. `vivid` or `natural` (OpenAI documents `style` as a DALL·E 3 parameter, so GPT Image models may reject or ignore it) |
| `--timeout-ms <ms>` | Abort the request after the given number of milliseconds |
| `--json` | Print machine-readable metadata after saving files |
Provider Selection
The script uses this order:
1. `--provider`, if provided
2. If only one provider key is available, use that provider
3. If both are available, default to Google, because Nano Banana is the more flexible default here
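The selection order can be expressed as a small function. This is an illustrative sketch with a hypothetical name (`pickProvider`). It assumes the same key names and Google aliases listed under Setup.

```javascript
// Hypothetical sketch of the provider-selection order; not the bundled script's code.
function pickProvider(explicitProvider, env) {
  // An explicit --provider always wins.
  if (explicitProvider) return explicitProvider;
  const hasOpenAI = Boolean(env.OPENAI_API_KEY);
  const hasGoogle = Boolean(
    env.GOOGLE_GENERATIVE_AI_API_KEY || env.GOOGLE_API_KEY || env.GEMINI_API_KEY
  );
  // If only one key exists, use it; if both exist, prefer Google.
  if (hasGoogle) return "google";
  if (hasOpenAI) return "openai";
  // No keys at all: fail with a clear setup message.
  throw new Error(
    "No API key found. Set OPENAI_API_KEY or GOOGLE_GENERATIVE_AI_API_KEY."
  );
}
```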
Model ID Cheat Sheet
When choosing a model, prefer the stable, commonly used IDs below unless the user explicitly requests something else.
OpenAI: most common choices
- `gpt-image-1`: Use when the user explicitly asks for it, or when you need to match the older GPT Image pricing/performance tier.
- `gpt-image-1.5`: Default choice for most OpenAI image generation tasks. Use this first for general-purpose text-to-image work.
- `gpt-image-1-mini`: Use when the user explicitly wants a cheaper/faster GPT Image variant.
Preferred OpenAI default in this skill: `gpt-image-1.5`.
Google Gemini: most common choices
- `gemini-2.5-flash-image`: Default Nano Banana model. Use this first for most Gemini image generation tasks.
- `gemini-3.1-flash-image-preview`: Also known as Nano Banana 2. It offers high-quality image generation and conversational editing at a mainstream price point and low latency. Treat it as the high-efficiency counterpart to Gemini 3 Pro Image, and prefer it over Pro when the user wants speed, lower cost, or high-volume developer workflows. Use it when the user explicitly asks for Nano Banana 2 or wants a newer Gemini image variant.
- `gemini-3-pro-image-preview`: Also known as Nano Banana Pro. Use it when the user mentions Nano Banana Pro, or explicitly asks for higher-end quality from a more premium, instruction-heavy Gemini image model.
Preferred Google default in this skill: `gemini-2.5-flash-image`.
Selection rules for the LLM using this skill
- If the user names a model, use that exact model ID.
- If the user asks for OpenAI image generation but does not name a model, start with `gpt-image-1.5`.
- If the user asks for Gemini, Nano Banana, or Google image generation without naming a model, start with `gemini-2.5-flash-image`.
- Only reach for `gpt-image-1`, `gpt-image-1-mini`, `gemini-3.1-flash-image-preview` (Nano Banana 2), or `gemini-3-pro-image-preview` (Nano Banana Pro) when the user asks for them, or when there is a clear reason to optimize for that tradeoff.
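This sketch shows how a model ID might be resolved once the provider is known. It assumes the precedence `--model` first, then the optional default-model environment variables from Setup, then this skill's built-in defaults. `resolveModel` and `DEFAULT_MODELS` are hypothetical names.

```javascript
// Hypothetical sketch of model resolution; precedence is an assumption.
const DEFAULT_MODELS = {
  openai: "gpt-image-1.5",
  google: "gemini-2.5-flash-image",
};

function resolveModel(provider, explicitModel, env) {
  // A model the user named (--model) is used verbatim.
  if (explicitModel) return explicitModel;
  // Otherwise fall back to the optional default-model environment variables.
  const fromEnv =
    provider === "openai"
      ? env.OPENAI_IMAGE_MODEL
      : env.GOOGLE_IMAGE_MODEL || env.GEMINI_IMAGE_MODEL;
  return fromEnv || DEFAULT_MODELS[provider];
}
```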
Pricing Notes
Pricing changes over time. If cost matters, prefer checking the provider's current pricing page before quoting numbers externally.
Current Google image pricing notes
Assume 1024x1024 pricing unless noted otherwise.
| Model | Price per image (USD) |
|---|---|
| `gemini-2.5-flash-image` | $0.039 |
| `gemini-3.1-flash-image-preview` | $0.067 |
| `gemini-3-pro-image-preview` | $0.134 |
Current OpenAI image pricing notes
Assume medium quality and 1024x1024 pricing.
| Model | Price per image (USD) |
|---|---|
| `gpt-image-1` | $0.042 |
| `gpt-image-1.5` | $0.034 |
| `gpt-image-1-mini` | $0.011 |
These numbers were taken from the providers' official pricing pages. Treat them as guidance, not permanent constants.
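As a worked example with the listed prices, three `gpt-image-1.5` images at medium quality and 1024x1024 cost about 3 × $0.034 = $0.102. The hypothetical helper below does the same arithmetic for any model in the tables. The prices are hard-coded snapshots and will go stale.

```javascript
// Per-image prices copied from the tables above, in thousandths of a dollar
// (1024x1024; medium quality for OpenAI). Integers avoid float rounding noise.
const PRICE_MILLIDOLLARS = {
  "gemini-2.5-flash-image": 39,
  "gemini-3.1-flash-image-preview": 67,
  "gemini-3-pro-image-preview": 134,
  "gpt-image-1": 42,
  "gpt-image-1.5": 34,
  "gpt-image-1-mini": 11,
};

// Hypothetical helper: rough batch cost as a dollar string, e.g. for --n 3.
function estimateCost(model, n) {
  const each = PRICE_MILLIDOLLARS[model];
  if (each === undefined) throw new Error(`No price noted for ${model}`);
  return `$${((each * n) / 1000).toFixed(3)}`;
}
```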
Provider Notes
OpenAI
- Default model: `gpt-image-1.5`
- Uses `POST /v1/images/generations`
- With `--ref`, uses `POST /v1/images/edits`
- Best when the user wants explicit image sizes like `1024x1024` or `1536x1024`
- Supports `--size`, `--n`, and OpenAI request fields such as `--quality` and `--style`
- If the user gives only an aspect ratio, the script maps it to the closest supported OpenAI size
- Reference images are supported in this skill for GPT Image models such as `gpt-image-1.5`, `gpt-image-1`, and `gpt-image-1-mini`
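One way to implement the aspect-ratio mapping is to compare ratios on a log scale and pick the nearest size. This sketch assumes the three fixed sizes that GPT Image models accept (`1024x1024`, `1536x1024`, `1024x1536`). The bundled script's actual mapping may differ, and `closestOpenAISize` is a hypothetical name.

```javascript
// Hypothetical sketch: map a W:H aspect ratio to the closest fixed OpenAI size.
// Assumes the GPT Image size set 1024x1024, 1536x1024, 1024x1536.
const OPENAI_SIZES = ["1024x1024", "1536x1024", "1024x1536"];

function closestOpenAISize(aspectRatio) {
  const [w, h] = aspectRatio.split(":").map(Number);
  if (!(w > 0 && h > 0)) throw new Error(`Invalid aspect ratio: ${aspectRatio}`);
  const target = Math.log(w / h);
  let best = OPENAI_SIZES[0];
  let bestDistance = Infinity;
  for (const size of OPENAI_SIZES) {
    const [sw, sh] = size.split("x").map(Number);
    // Compare on a log scale so 2:1 and 1:2 are equally far from 1:1.
    const distance = Math.abs(Math.log(sw / sh) - target);
    if (distance < bestDistance) {
      best = size;
      bestDistance = distance;
    }
  }
  return best;
}
```

With this rule, `--ar 16:9` maps to `1536x1024`, and `--ar 9:16` maps to `1024x1536`. Portrait ratios such as `4:5` also land on `1024x1536`.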
Google Gemini Image ("Nano Banana")
- Default model: `gemini-2.5-flash-image`
- Uses `POST https://generativelanguage.googleapis.com/v1beta/models/<model>:generateContent`
- `--ar` is sent in `generationConfig.imageConfig.aspectRatio`
- For `--n > 1`, the script makes sequential calls and asks for one distinct variation per call
- `--size` is not used for Nano Banana in this skill
- Reference images in this skill are supported for `gemini-3.1-flash-image-preview` and `gemini-3-pro-image-preview`
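This sketch builds the request bodies for the Gemini path, including the sequential variations for `--n > 1`. The `contents`/`parts`/`inlineData` and `generationConfig.imageConfig.aspectRatio` fields follow the public Gemini REST API. The variation wording appended to the prompt and the helper name `buildGeminiRequests` are assumptions, not the script's exact text.

```javascript
// Hypothetical sketch of generateContent request bodies for --n > 1.
// Field names follow the public Gemini REST API; the variation wording is assumed.
function buildGeminiRequests({ prompt, aspectRatio, n = 1, refs = [] }) {
  const requests = [];
  for (let i = 1; i <= n; i++) {
    const text =
      n > 1
        ? `${prompt}\n\nVariation ${i} of ${n}: make this one clearly distinct from the others.`
        : prompt;
    // Reference images go in as inline base64 parts after the text prompt.
    const parts = [
      { text },
      ...refs.map((ref) => ({ inlineData: { mimeType: ref.mimeType, data: ref.base64 } })),
    ];
    const body = { contents: [{ role: "user", parts }] };
    if (aspectRatio) {
      body.generationConfig = { imageConfig: { aspectRatio } };
    }
    requests.push(body);
  }
  return requests;
}
```

Each body would be POSTed in turn, not in parallel, to the `generateContent` URL above, with the API key in the `x-goog-api-key` header.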
Output Behavior
- If `--n 1`, `--output` is treated as the final file path; when no extension is provided, the script appends one.
- If `--n > 1` and `--output` includes an extension, files are saved as `name-1.png`, `name-2.png`, etc.
- If `--n > 1` and `--output` has no extension, it is treated as a directory.
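The naming rules can be sketched as a pure function. `outputPaths` is hypothetical, and it assumes `.png` when no extension is given. The `image-N` file names used inside an output directory are also an assumption, since the rules above do not specify them.

```javascript
// Hypothetical helper: split "dir/name.png" into stem and extension.
function splitExt(p) {
  const slash = p.lastIndexOf("/");
  const dot = p.lastIndexOf(".");
  // A dot counts only if it is inside the last path segment and not its first char.
  if (dot > slash + 1) return { stem: p.slice(0, dot), ext: p.slice(dot) };
  return { stem: p, ext: "" };
}

// Hypothetical sketch of the output naming rules above.
function outputPaths(output, n, defaultExt = ".png") {
  const { stem, ext } = splitExt(output);
  if (n === 1) return [ext ? output : output + defaultExt];
  if (ext) return Array.from({ length: n }, (_, i) => `${stem}-${i + 1}${ext}`);
  // No extension with n > 1: treat output as a directory.
  const dir = output.replace(/\/+$/, "");
  return Array.from({ length: n }, (_, i) => `${dir}/image-${i + 1}${defaultExt}`);
}
```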
Reference Images
Use `--ref` or `--reference` to pass one or more source images.
This makes the skill compatible with `baoyu-xhs-images`-style reference-image chains. In these chains, image 1 is generated first, and later slides reuse image 1 as `--ref` for visual consistency.
Example with OpenAI edits:
```bash
node ${SKILL_DIR}/scripts/main.mjs \
  --provider openai \
  --model gpt-image-1.5 \
  --prompt "Turn this product photo into a clean editorial hero image" \
  --ref source.png \
  --output edited.png
```
Example with Gemini reference-image prompting:
```bash
node ${SKILL_DIR}/scripts/main.mjs \
  --provider google \
  --model gemini-3.1-flash-image-preview \
  --prompt "Keep the subject identity but restyle this as a cozy watercolor poster" \
  --ref character.png \
  --output poster.png
```
Selection rules:
- For OpenAI + `--ref`, prefer GPT Image models such as `gpt-image-1.5`.
- For Google + `--ref`, prefer `gemini-3.1-flash-image-preview` or `gemini-3-pro-image-preview`.
- If the user wants consistent follow-on images from source art, use reference images instead of plain text-only prompting.
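For the OpenAI path, `/v1/images/edits` takes multipart form data rather than JSON. This sketch builds that form with Node 18+ globals (`FormData`, `Blob`). The `image[]` field name for multiple reference images follows OpenAI's GPT Image edit examples. `buildOpenAIEditForm` is a hypothetical helper.

```javascript
// Hypothetical sketch of an OpenAI /v1/images/edits form with reference images.
// Uses Node 18+ globals (FormData, Blob).
function buildOpenAIEditForm({ model, prompt, refs, size, n }) {
  const form = new FormData();
  form.append("model", model);
  form.append("prompt", prompt);
  for (const ref of refs) {
    // Each reference is attached as a file part with its own MIME type.
    form.append("image[]", new Blob([ref.bytes], { type: ref.mimeType }), ref.filename);
  }
  if (size) form.append("size", size);
  if (n) form.append("n", String(n));
  return form;
}
```

The form would be POSTed to `https://api.openai.com/v1/images/edits` with an `Authorization: Bearer <key>` header. When `fetch` receives a `FormData` body, it sets the multipart `Content-Type` (with boundary) itself.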
Interop
`prompt-to-image` interoperates cleanly with both `baoyu-xhs-images` and `markdown-to-image`.
With baoyu-xhs-images
Use `prompt-to-image` as the image generation engine behind the XHS workflow:
1. `baoyu-xhs-images` creates the cover and content prompt files
2. Generate image 1 without `--ref`
3. Generate images 2+ with `--ref <image-1>` for visual consistency
This works because `prompt-to-image` supports:
- `--promptfiles`
- `--ref`
- `--ar 3:4`
- `--quality 2k`
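The XHS chain can be driven by running the script once per prompt file. This sketch only builds the argument lists. `buildChainArgs` and the zero-padded `01.png`, `02.png` output names are hypothetical.

```javascript
// Hypothetical sketch: argument lists for an XHS-style reference chain.
// Image 1 is generated without --ref; every later slide reuses image 1 as --ref.
function buildChainArgs(scriptPath, promptFiles, outDir) {
  const first = `${outDir}/01.png`;
  return promptFiles.map((promptFile, i) => {
    const output = `${outDir}/${String(i + 1).padStart(2, "0")}.png`;
    const args = [
      scriptPath,
      "--promptfiles", promptFile,
      "--ar", "3:4",
      "--quality", "2k",
      "--output", output,
    ];
    if (i > 0) args.push("--ref", first);
    return args;
  });
}
```

Run each list in order, for example with `execFileSync("node", args, { stdio: "inherit" })` from `node:child_process`. Running them sequentially ensures image 1 exists before later slides reference it.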
For the OpenAI path, `--quality 2k` is normalized to an OpenAI-compatible high-quality request.
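Here is a sketch of that normalization, assuming GPT Image models accept `low`, `medium`, `high`, or `auto` for `quality`. Mapping any `<digits>k` value to `high` generalizes the `2k` case described above and is an assumption. `normalizeOpenAIQuality` is a hypothetical name.

```javascript
// Hypothetical sketch of --quality normalization for the OpenAI path.
const OPENAI_QUALITIES = new Set(["low", "medium", "high", "auto"]);

function normalizeOpenAIQuality(value) {
  if (value === undefined) return undefined;
  const v = String(value).toLowerCase();
  if (OPENAI_QUALITIES.has(v)) return v;
  // Resolution-style values such as "2k" become "high" (other "Nk" values assumed).
  if (/^\d+k$/.test(v)) return "high";
  // Unknown values: warn and let the provider default apply (best effort).
  console.warn(`Ignoring unsupported OpenAI quality "${value}"`);
  return undefined;
}
```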
With markdown-to-image
Use `prompt-to-image` output as the visual layer for Inkframe:
1. Generate the image with `prompt-to-image`
2. Convert the saved image to a base64 data URL
3. Put that data URL into Inkframe design fields such as `backgroundImageUrl` or `contentBoxImageUrl`
4. Render the final card/poster with `markdown-to-image`
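Step 2 can be done with Node's `Buffer`. In this sketch, the bytes would normally come from `fs.readFileSync` on the saved file. `toDataUrl` and the extension-to-MIME table are hypothetical. The `design` object only illustrates where the data URL goes; it is not Inkframe's full schema.

```javascript
// Hypothetical sketch: turn saved image bytes into a data URL for Inkframe fields.
const MIME_BY_EXT = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".webp": "image/webp",
};

function toDataUrl(bytes, filename) {
  const ext = filename.slice(filename.lastIndexOf(".")).toLowerCase();
  const mime = MIME_BY_EXT[ext] || "application/octet-stream";
  return `data:${mime};base64,${Buffer.from(bytes).toString("base64")}`;
}

// Illustrative design field (name from the list above; value is a stand-in image).
const design = { backgroundImageUrl: toDataUrl(Buffer.from("hi"), "mascot.png") };
```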
All three together
Recommended pipeline:
- `baoyu-xhs-images` handles Xiaohongshu-specific structure, style, and prompt planning
- `prompt-to-image` renders the actual images
- `markdown-to-image` composes branded summary cards, title cards, or text overlays using those generated images
Use this combined workflow when the user wants both AI-generated illustration and polished typographic composition.
Working Style
- Prefer concise but concrete prompts. Mention subject, style, composition, lighting, and mood.
- Use `--size` when the user gives an exact output size and you are on OpenAI.
- Use `--ar` when the user cares more about layout shape than exact pixel dimensions, especially on Gemini.
- For multiple concepts, keep `--n` modest unless the user explicitly wants a large batch.
- Save to a real file path and hand the user the created image file(s), not just console output.
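This hypothetical helper assembles a prompt from the five elements listed above, skipping any the user did not specify. The `Style:`/`Lighting:` labels are one possible convention; the providers do not require them.

```javascript
// Hypothetical helper: build a concise but concrete prompt from named elements.
function buildPrompt({ subject, style, composition, lighting, mood }) {
  return [
    subject,
    style && `Style: ${style}.`,
    composition && `Composition: ${composition}.`,
    lighting && `Lighting: ${lighting}.`,
    mood && `Mood: ${mood}.`,
  ]
    .filter(Boolean) // drop elements the user did not give
    .join(" ");
}
```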
Error Handling
- Missing provider key: fail with a clear setup message
- No image returned: surface the provider response details if available
- Unsupported combination: prefer warning plus best-effort behavior, unless the request cannot work
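This sketch covers the "no image returned" path. It assumes OpenAI responses carry images in `data[].b64_json` and Gemini responses carry them in `candidates[].content.parts[].inlineData.data`. `extractImages` is a hypothetical name, and truncating the raw response to 500 characters is an arbitrary choice.

```javascript
// Hypothetical sketch: pull base64 images out of either provider's JSON response,
// and surface the raw response when no image came back.
function extractImages(provider, json) {
  const images = [];
  if (provider === "openai") {
    // Images API shape: { data: [{ b64_json: "..." }, ...] }
    for (const item of json.data || []) {
      if (item.b64_json) images.push(item.b64_json);
    }
  } else {
    // generateContent shape: candidates[].content.parts[].inlineData.data
    for (const candidate of json.candidates || []) {
      for (const part of candidate.content?.parts || []) {
        if (part.inlineData?.data) images.push(part.inlineData.data);
      }
    }
  }
  if (images.length === 0) {
    const details = JSON.stringify(json).slice(0, 500);
    throw new Error(`No image returned by ${provider}. Response: ${details}`);
  }
  return images;
}
```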
Test Prompts
These are good manual eval prompts for this skill:
- Create two square app icon concepts for a budgeting app called PennyTrail. Make them warm, simple, and premium.
- Use Nano Banana to generate a vertical travel poster for Kyoto in rain, woodblock print style, lots of negative space.
- Generate an OpenAI image for a YouTube thumbnail about debugging React hydration issues. High contrast, modern, readable composition.