openai-image
OpenAI Image Generation & Editing
Generate images from text prompts, edit existing images, analyze with vision, and apply editorial transforms. Default model is gpt-image-1.5 (newest, best text rendering; gpt-image-1-mini is the cheapest option).
Quick Start
- Confirm OPENAI_API_KEY is set: echo $OPENAI_API_KEY | head -c 10
- Install the SDK if needed: pip install openai
- Generate an image: python3 scripts/openai_image.py generate "A watercolor painting of a sunset over Mos Eisley" --output sunset.png
Commands
Generate
Create images from a text prompt.
# Basic generation
python3 scripts/openai_image.py generate "your prompt" --output result.png
# High quality, specific size
python3 scripts/openai_image.py generate "your prompt" --quality high --size 1536x1024 -o landscape.png
# Transparent background (PNG only)
python3 scripts/openai_image.py generate "a logo on transparent background" --background transparent -o logo.png
# Multiple images at once
python3 scripts/openai_image.py generate "your prompt" -n 4 --output-dir ./variants/
# Compressed JPEG output
python3 scripts/openai_image.py generate "your prompt" --format jpeg --compression 80 -o photo.jpg
# Use the older model if needed
python3 scripts/openai_image.py generate "your prompt" --model gpt-image-1 -o result.png
Edit
Modify existing images with a text prompt. Optionally supply a mask to constrain edits.
# Edit a single image
python3 scripts/openai_image.py edit "make the sky dramatic and stormy" -i photo.jpg -o dramatic.png
# Edit with a mask (transparent areas in the mask = regions to change)
python3 scripts/openai_image.py edit "replace with a garden" -i room.jpg --mask mask.png -o garden_room.png
# Combine multiple images
python3 scripts/openai_image.py edit "merge these into a collage with consistent lighting" -i img1.jpg img2.jpg img3.jpg -o collage.png
# High input fidelity (preserves more of the original style)
python3 scripts/openai_image.py edit "add a hat" -i portrait.jpg --input-fidelity high -o hat.png
When to Use Input Fidelity
The --input-fidelity flag controls how much the output preserves the source image's structure:
- Use high when you want to preserve the spatial layout of the source: walls, windows, furniture placement, body poses. Good for stylizing a venue photo while keeping the architecture intact, or retouching a portrait without changing the pose.
- Omit it (or use low) when the source is a loose reference: you want the AI to use the shape or composition as a starting point but reimagine the contents freely. Good for filling an empty glass with a different liquid, or using a product shot as a structural anchor.
Rule of thumb: if the edit prompt describes changing what's in the image, omit fidelity. If it describes changing how the image looks, use high.
Reference-Based Generation
The most powerful edit pattern is using a photo as a structural anchor while completely reimagining its contents. Feed a product photo to edit not to modify the product, but to let the AI use its shape and proportions as a scaffold for something new.
# Use an empty coupe glass photo as a structural reference, reimagine the contents
python3 scripts/openai_image.py edit \
"Fill this coupe glass with a bright blue butterfly pea tea cocktail, violet-shifting ice cubes, condensation on the glass" \
-i ref_empty_coupe.jpg --quality high -o cocktail_blue.png
# Use a rocks glass photo as a shape anchor for a completely different drink
python3 scripts/openai_image.py edit \
"Golden amber old fashioned with a large ice sphere, orange peel garnish, smoke wisps" \
-i ref_rocks_glass.jpg --quality high -o cocktail_amber.png
# Use a venue photo as a layout reference for a different setting
python3 scripts/openai_image.py edit \
"Transform this space into a 1920s speakeasy with warm Edison bulbs, dark wood, and brass fixtures" \
-i venue_photo.jpg --input-fidelity high -o speakeasy.png
Notice: the first two examples omit --input-fidelity because the glass shape is a loose reference. The third uses --input-fidelity high because the wall/window layout should be preserved.
Describe
Analyze images using GPT-4o vision. Returns alt text, captions, tags, or structured analysis.
# Generate alt text for web accessibility (default)
python3 scripts/openai_image.py describe photo.jpg
# Get a natural language caption
python3 scripts/openai_image.py describe photo.jpg --mode caption
# Detailed multi-paragraph description
python3 scripts/openai_image.py describe photo.jpg --mode detailed
# Keyword tags
python3 scripts/openai_image.py describe photo.jpg --mode tags
# Structured JSON (alt_text, caption, tags, colors, objects, scene)
python3 scripts/openai_image.py describe photo.jpg --mode json
# Custom analysis
python3 scripts/openai_image.py describe photo.jpg --custom "what fonts and colors are used in this design?"
# Multiple images
python3 scripts/openai_image.py describe img1.jpg img2.png img3.webp
# Use the full gpt-4o model for better accuracy
python3 scripts/openai_image.py describe photo.jpg --model gpt-4o
Background Remove
Remove background to transparent PNG.
python3 scripts/openai_image.py bg-remove product.jpg -o product-nobg.png
Style Transfer
Apply an art style to an image. 10 built-in presets plus custom.
# Built-in styles: watercolor, oil-painting, pixel-art, pencil-sketch,
# anime, pop-art, art-deco, minimalist, cyberpunk, stained-glass
python3 scripts/openai_image.py style-transfer photo.jpg --style watercolor -o watercolor.png
python3 scripts/openai_image.py style-transfer photo.jpg --style pixel-art -o pixel.png
# Custom style
python3 scripts/openai_image.py style-transfer photo.jpg --style custom --custom-style "1920s art nouveau poster" -o nouveau.png
Restore
Restore damaged, faded, or degraded photographs. Uses high input fidelity by default.
python3 scripts/openai_image.py restore old_photo.jpg -o restored.png
Thumbnail
Generate web-optimized thumbnails (JPEG at 80% compression by default).
# From a text prompt
python3 scripts/openai_image.py thumbnail "a cozy coffee shop interior" -o thumb.jpg
# From an existing image
python3 scripts/openai_image.py thumbnail "clean product shot" --from-image product.jpg -o thumb.jpg
Batch
Process multiple image jobs from a JSON manifest. Each job can generate or edit independently, sharing a common style prefix and defaults.
python3 scripts/openai_image.py --retries 3 batch drinks.json --output-dir ./public/images/
Manifest format (drinks.json):
{
  "style_prefix": "Vivid, hyper-real 1920s cinematic movie still. Rich jewel tones, warm golden lighting, film grain.",
  "defaults": {
    "quality": "high",
    "size": "1024x1024",
    "model": "gpt-image-1.5",
    "format": "png"
  },
  "jobs": [
    {
      "name": "cold_open",
      "input": "ref_coupe.jpg",
      "prompt": "Blue butterfly pea tea cocktail with violet-shifting ice cubes, condensation on glass",
      "output": "drink_cold_open.png"
    },
    {
      "name": "smoking_gun",
      "input": "ref_rocks.jpg",
      "prompt": "Golden amber with smoke cloche, large ice sphere, orange peel",
      "output": "drink_smoking_gun.png"
    },
    {
      "name": "hero_banner",
      "prompt": "Elegant bar counter with three cocktails backlit by warm Edison bulbs",
      "output": "hero_banner.png",
      "size": "1536x1024"
    }
  ]
}
Each job inherits from defaults and can override any field. Jobs with input use the edit API (reference-based generation); jobs without input use generate. The style_prefix is prepended to every job's prompt.
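That inheritance rule can be sketched in Python. This is illustrative only; the script's actual merge logic and field names are assumptions based on the manifest format above.

```python
# Sketch of how a batch job's effective settings are resolved:
# defaults first, per-job fields override, style_prefix prepended to the prompt.
# Illustrative only -- the script's internals may differ.

def resolve_job(job: dict, manifest: dict) -> dict:
    settings = {**manifest.get("defaults", {}), **job}  # job overrides defaults
    prefix = manifest.get("style_prefix", "")
    if prefix:
        settings["prompt"] = f"{prefix} {job['prompt']}"
    return settings

manifest = {
    "style_prefix": "Vivid, hyper-real 1920s cinematic movie still.",
    "defaults": {"quality": "high", "size": "1024x1024"},
    "jobs": [{"name": "hero_banner",
              "prompt": "Elegant bar counter with three cocktails",
              "output": "hero_banner.png",
              "size": "1536x1024"}],
}

resolved = resolve_job(manifest["jobs"][0], manifest)
print(resolved["size"])     # per-job override wins
print(resolved["quality"])  # inherited from defaults
```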
Batch also generates an index.html gallery in the output directory with thumbnails and job info. Open it in a browser to review all results at a glance.
Output is a summary JSON with per-job status:
{
  "status": "success",
  "message": "Batch complete: 3/3 succeeded",
  "results": [
    {"name": "cold_open", "status": "success", "path": "/abs/path/drink_cold_open.png"},
    {"name": "smoking_gun", "status": "success", "path": "/abs/path/drink_smoking_gun.png"},
    {"name": "hero_banner", "status": "success", "path": "/abs/path/hero_banner.png"}
  ]
}
Parameters Reference
Global flags
These flags go before the subcommand name:
| Flag | Values | Default | Notes |
|---|---|---|---|
| --retries | 0-10 | 0 | Retry transient API errors with exponential backoff (1s, 2s, 4s... capped at 30s) |
| --prefix | string | none | Style preamble prepended to prompts in generate, edit, and style-transfer |
| --preset | draft, balanced, final | none | Quality preset. draft = mini/low ($0.005), balanced = 1.5/medium ($0.034), final = 1.5/high ($0.133). Explicit --model/--quality override. |
| --dry-run | flag | off | Estimate cost in USD without making API calls. Works with all commands and batch. |
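The backoff schedule described for --retries works out as below; a quick sketch of that schedule (the script's exact implementation may differ):

```python
def backoff_delays(retries: int, cap: float = 30.0) -> list[float]:
    """Exponential backoff schedule as described for --retries:
    1s, 2s, 4s, ... doubling each attempt, capped at `cap` seconds."""
    return [min(float(2 ** attempt), cap) for attempt in range(retries)]

print(backoff_delays(3))  # [1.0, 2.0, 4.0]
print(backoff_delays(7))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```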
# Example: retry up to 3 times with a style prefix
python3 scripts/openai_image.py --retries 3 --prefix "Photorealistic, 8K, shallow depth of field." generate "a cup of coffee" -o coffee.png
# Use a preset for quick iteration
python3 scripts/openai_image.py --preset draft generate "concept sketch of a robot" -o robot_draft.png
# Estimate cost before running
python3 scripts/openai_image.py --preset final --dry-run generate "hero image" -n 4
# Dry-run a whole batch manifest
python3 scripts/openai_image.py --dry-run batch drinks.json
Presets
Presets map to model + quality combinations. Use them to switch between iteration and production without remembering flag combos:
| Preset | Model | Quality | Approx. Cost (square) |
|---|---|---|---|
| draft | gpt-image-1-mini | low | $0.005 |
| balanced | gpt-image-1.5 | medium | $0.034 |
| final | gpt-image-1.5 | high | $0.133 |
If you pass --model or --quality explicitly, those override the preset values.
Dry Run
--dry-run calculates the estimated cost without calling the API. The output is JSON:
{
  "status": "dry_run",
  "estimated_cost_usd": 0.532,
  "breakdown": [
    {"model": "gpt-image-1.5", "quality": "high", "size": "1024x1024", "n": 1, "cost_usd": 0.133}
  ]
}
For batch manifests, the breakdown includes each job by name. When --quality is auto, the estimate uses medium pricing as a reasonable midpoint.
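The auto-as-medium rule can be mimicked for quick offline estimates. A sketch, not the script's code; prices are the gpt-image-1.5 square rates from the cost table below.

```python
# Estimate gpt-image-1.5 square-image cost, treating --quality auto as medium.
PRICES = {"low": 0.009, "medium": 0.034, "high": 0.133}  # USD per 1024x1024 image

def estimate_cost(quality: str, n: int = 1) -> float:
    effective = "medium" if quality == "auto" else quality  # auto -> medium midpoint
    return round(n * PRICES[effective], 3)

print(estimate_cost("auto", n=4))  # 0.136
print(estimate_cost("high"))      # 0.133
```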
Generation & Editing flags
| Flag | Values | Default | Notes |
|---|---|---|---|
| --model | gpt-image-1.5, gpt-image-1, gpt-image-1-mini, dall-e-3 | gpt-image-1.5 | 1.5 is newest and recommended; mini is cheapest. DALL-E 3 is deprecated (shutdown 2026-05-12). |
| --size | auto, 1024x1024, 1536x1024, 1024x1536 | auto | DALL-E 3 also supports 1792x1024, 1024x1792 |
| --quality | auto, low, medium, high | auto | GPT Image models. DALL-E 3 uses standard / hd instead. |
| --format | png, jpeg, webp | png | GPT Image models only; DALL-E returns a URL |
| --compression | 0-100 | none | JPEG/WebP only |
| --background | auto, transparent, opaque | auto | Transparent requires PNG or WebP format. Best at medium or high quality. |
| -n | 1-10 | 1 | Number of images |
| -o / --output | file path | auto-named | Explicit path for a single image |
| --output-dir | directory | . | Where auto-named files go |
| --input-fidelity | low, high | low | Edit only. high preserves source layout; low (default) uses source as a loose reference. |
Describe flags
| Flag | Values | Default | Notes |
|---|---|---|---|
| --mode | alt-text, caption, detailed, tags, json | alt-text | Output format for vision analysis |
| --custom | string | none | Freeform analysis prompt (overrides --mode) |
| --model | gpt-4o, gpt-4o-mini | gpt-4o-mini | Vision model; mini is cheaper, 4o is more accurate |
Style transfer flags
| Flag | Values | Default |
|---|---|---|
| --style | watercolor, oil-painting, pixel-art, pencil-sketch, anime, pop-art, art-deco, minimalist, cyberpunk, stained-glass, custom | required |
| --custom-style | string | none (required when --style custom) |
Thumbnail flags
| Flag | Values | Default |
|---|---|---|
| --from-image | file path | none (generates from prompt if omitted) |
| --format | png, jpeg, webp | jpeg |
| --compression | 0-100 | 80 |
Batch flags
| Flag | Values | Default | Notes |
|---|---|---|---|
| manifest (positional) | file path | required | Path to JSON manifest |
| --output-dir | directory | . | Base directory for output files |
Manifest fields: style_prefix (string), defaults (object with model/quality/size/format/compression/background/input_fidelity), jobs (array of objects with name/prompt/input/output and optional per-job overrides).
Resolution Expectations
Output dimensions vary by model, quality, and --size. This table shows what to expect:
| Model | Quality Levels | Available Sizes | Notes |
|---|---|---|---|
| gpt-image-1.5 | low, medium, high, auto | 1024x1024, 1536x1024, 1024x1536, auto | State of the art, recommended |
| gpt-image-1 | low, medium, high, auto | 1024x1024, 1536x1024, 1024x1536, auto | Previous generation |
| gpt-image-1-mini | low, medium, high, auto | 1024x1024, 1536x1024, 1024x1536, auto | Budget option, all sizes supported |
| dall-e-3 | standard, hd | 1024x1024, 1792x1024, 1024x1792 | Deprecated; shutdown 2026-05-12 |
| dall-e-2 | standard | 256x256, 512x512, 1024x1024 | Deprecated; shutdown 2026-05-12 |
When --size auto (the default), the API picks the best size for the prompt. For predictable output, set size explicitly. Use 1536x1024 for landscape backgrounds and hero images, 1024x1024 for product shots and thumbnails, 1024x1536 for portrait/mobile.
Note: DALL-E 2 and DALL-E 3 are deprecated and will stop working on May 12, 2026. Migrate to gpt-image-1.5 for new work.
Cost Guidance
Per-image costs in USD (as of late 2025). Check OpenAI pricing for current rates.
| Model | Quality | Square (1024x1024) | Landscape/Portrait (1536x) |
|---|---|---|---|
| gpt-image-1.5 | low | $0.009 | $0.013 |
| gpt-image-1.5 | medium | $0.034 | $0.050 |
| gpt-image-1.5 | high | $0.133 | $0.200 |
| gpt-image-1-mini | low | $0.005 | $0.006 |
| gpt-image-1-mini | medium | $0.011 | $0.015 |
| gpt-image-1-mini | high | $0.036 | $0.052 |
| dall-e-3 (deprecated) | standard | $0.040 | $0.080 |
| dall-e-3 (deprecated) | hd | $0.080 | $0.120 |
Cost-aware usage for agents:
- Draft with low, ship with high. Use --quality low while iterating on prompts ($0.009/image). Switch to high only for the final version ($0.133). That is a 15x cost difference.
- Use gpt-image-1-mini for throwaway work. At $0.005/image (low quality), it is essentially free for drafting, testing prompts, or generating placeholder images.
- Batch math matters. A 10-image batch at gpt-image-1.5 high quality landscape runs $2.00. The same batch at low quality is $0.13. Ask yourself whether every image in the batch needs high quality, or whether some (backgrounds, textures) can use medium or low.
- Describe is nearly free. gpt-4o-mini vision calls cost fractions of a cent per image. Use describe --mode json freely for analysis, alt text, and tagging.
- Edit costs the same as generate. Using a reference photo does not add cost but dramatically improves quality. Always prefer edit with a reference photo over blind generation.
- Avoid DALL-E 3. It is deprecated (shutdown May 2026), costs 4x more than gpt-image-1.5 at standard quality, and produces lower-quality results. Use gpt-image-1.5 for everything.
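The batch math above, as arithmetic (prices from the cost table; a sketch for sanity-checking your own batches):

```python
# 10 landscape images at gpt-image-1.5: high vs low quality.
PRICE_LANDSCAPE = {"high": 0.200, "low": 0.013}  # USD per 1536x1024 image

high_total = round(10 * PRICE_LANDSCAPE["high"], 2)
low_total = round(10 * PRICE_LANDSCAPE["low"], 2)
print(high_total)  # 2.0
print(low_total)   # 0.13
```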
Prompt Engineering
The GPT image models respond well to detailed, specific prompts. A few things that help:
- Be specific about style: "oil painting", "3D render", "pixel art", "watercolor", "photorealistic"
- Describe composition: "close-up", "aerial view", "centered", "rule of thirds"
- Mention lighting: "golden hour", "dramatic shadows", "soft diffused light"
- Include context: "on a white background", "in a forest setting", "floating in space"
For edits, describe the full desired result rather than just the change. "A portrait of a person wearing a red hat in a garden" works better than "add a hat".
Prompt Spec Scaffold
When building prompts for image generation, use this structured template. Fill in each segment that applies, skip the rest. The agent should compose the final prompt by concatenating the filled segments into a single string.
[SUBJECT] What is the main focus? e.g. "A Bengal cat sitting on a stack of old books"
[STYLE] Art style or medium. e.g. "Hyper-real photograph" or "Ukiyo-e woodblock print"
[COMPOSITION] Camera angle and framing. e.g. "Close-up, shallow depth of field, rule of thirds"
[LIGHTING] Light source and quality. e.g. "Warm golden hour side-lighting, long shadows"
[COLOR] Palette or mood. e.g. "Muted earth tones with a pop of teal"
[BACKGROUND] Setting and context. e.g. "In a dimly lit library with leather-bound volumes"
[CONSTRAINTS] Technical limits. e.g. "No text, no watermarks, transparent background"
Example assembled prompt:
A Bengal cat sitting on a stack of old books. Hyper-real photograph. Close-up, shallow depth of field, rule of thirds. Warm golden hour side-lighting, long shadows. Muted earth tones with a pop of teal. In a dimly lit library with leather-bound volumes. No text, no watermarks.
The agent should auto-enhance user prompts by filling in missing segments. If the user says "make me a picture of a cat", the agent adds style, composition, lighting, and color based on context. No API call needed for prompt enhancement -- the agent does it.
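One way an agent might assemble the scaffold mechanically; `compose_prompt` is a hypothetical helper for illustration, not part of the script:

```python
def compose_prompt(**segments: str) -> str:
    """Join filled scaffold segments in template order, skipping empty ones.
    Each segment gets exactly one trailing period."""
    order = ["subject", "style", "composition", "lighting",
             "color", "background", "constraints"]
    return " ".join(
        segments[key].rstrip(".") + "." for key in order if segments.get(key)
    )

prompt = compose_prompt(
    subject="A Bengal cat sitting on a stack of old books",
    style="Hyper-real photograph",
    lighting="Warm golden hour side-lighting, long shadows",
    constraints="No text, no watermarks",
)
print(prompt)
```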
See references/sample-prompts.md for curated examples by category.
Consistent Series
When generating a cohesive set of images (product shots, menu items, page backgrounds), use these techniques to keep them visually unified:
1. Use --prefix for a shared style preamble. Every prompt gets the same visual DNA:
PREFIX="Vivid, hyper-real 1920s cinematic movie still. Rich jewel tones, warm golden lighting, film grain."
python3 scripts/openai_image.py --prefix "$PREFIX" generate "blue cocktail in a coupe glass" --quality high -o drink1.png
python3 scripts/openai_image.py --prefix "$PREFIX" generate "amber old fashioned with smoke" --quality high -o drink2.png
python3 scripts/openai_image.py --prefix "$PREFIX" generate "emerald absinthe drip" --quality high -o drink3.png
2. Use batch for manifests. Define the prefix once, list all jobs:
python3 scripts/openai_image.py --retries 3 batch drinks.json --output-dir ./public/images/
3. Keep quality and size consistent. Mixing --quality medium and --quality high across a series produces visible inconsistency. Pick one and stick with it.
4. Use reference photos as structural anchors. Feed the same glass, product, or venue photo into multiple edit calls with different prompts. The shared geometry keeps the series grounded. See "Reference-Based Generation" above.
5. The "Mise en place" method. When generating images that involve multiple steps, variations, or use the same recurring elements (like ingredients for a recipe, tools for a craft, or parts of a product), first generate a single "mise en place" style image that contains all the individual elements laid out clearly on a flat, neutral surface. You can then use this initial "ingredients" image as a structural anchor (using edit with --input-fidelity) for subsequent generations, ensuring visual consistency of the core components across the entire series.
Masks
Masks are PNG files with an alpha channel. Fully transparent pixels mark the area to edit; opaque pixels protect the original. To create a mask:
- Open the source image in any editor (GIMP, Photoshop, Preview)
- Erase the region you want to change (make it transparent)
- Save as PNG (preserves alpha channel)
The mask must match the source image dimensions.
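Masks can also be created programmatically. A sketch using Pillow (assumes pip install pillow; the coordinates and output path are placeholders):

```python
from PIL import Image, ImageDraw

# Stand-in source image; in practice, open your real photo instead.
src = Image.new("RGB", (512, 512), "white")

# Start fully opaque (protected), then punch a transparent editable region.
mask = Image.new("RGBA", src.size, (0, 0, 0, 255))
draw = ImageDraw.Draw(mask)
draw.rectangle([100, 50, 400, 300], fill=(0, 0, 0, 0))  # transparent = edit here

mask.save("mask.png")  # PNG preserves the alpha channel
assert mask.size == src.size  # must match source dimensions
```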
Output
All commands return structured JSON:
{
  "status": "success",
  "message": "Generated 1 image(s)",
  "model": "gpt-image-1.5",
  "images": [
    {"index": 0, "path": "/absolute/path/to/gen_20260304_143022.png"}
  ]
}
The describe command returns text instead of images:
{
  "status": "success",
  "message": "Described 1 image",
  "model": "gpt-4o-mini",
  "mode": "alt-text",
  "result": {
    "file": "/path/to/photo.jpg",
    "description": "Pixel art spaceship with blue cockpit and orange thrusters on white background"
  }
}
PNG output files include embedded metadata (tEXt chunks) with the prompt, model, quality, and size used to generate them. View with identify -verbose file.png (ImageMagick) or any PNG metadata viewer.
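The embedded tEXt metadata can also be read without ImageMagick. A minimal stdlib sketch (hypothetical helper; skips CRC validation, which real tooling should not):

```python
import struct

def png_text_chunks(path: str) -> dict[str, str]:
    """Read tEXt chunks (keyword/value pairs) from a PNG file."""
    meta = {}
    with open(path, "rb") as f:
        if f.read(8) != b"\x89PNG\r\n\x1a\n":
            raise ValueError("not a PNG file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            length, ctype = struct.unpack(">I4s", header)
            data = f.read(length)
            f.read(4)  # skip CRC (not validated in this sketch)
            if ctype == b"tEXt":
                key, _, value = data.partition(b"\x00")
                meta[key.decode("latin-1")] = value.decode("latin-1")
            if ctype == b"IEND":
                break
    return meta
```

For example, png_text_chunks("gen_20260304_143022.png") would return a dict of whatever keywords the script embedded (prompt, model, quality, size).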
Troubleshooting
- "OPENAI_API_KEY not set": Run
export OPENAI_API_KEY='sk-...'or add it to~/.bashrc. - "openai package is not installed": Run
pip install openai. - Billing/quota errors: Check your OpenAI account at platform.openai.com for usage limits.
- Mask dimension mismatch: Resize the mask to match the source image exactly.
- DALL-E 3 format errors: DALL-E 3 does not support
--formator--background. Omit those flags or use a GPT image model. - Transient API errors (connection, timeout, 502/503): The OpenAI Images API has a roughly 10-20% transient failure rate under load. Use
--retries 3to automatically retry with exponential backoff (1s, 2s, 4s delays). Retry status is logged to stderr so you can monitor progress. For batch jobs, always use--retries 3. - Empty output file: The script now verifies every saved file is non-empty. If you see a
WriteError, the API returned empty data. Retry the command or check your API quota.