# gpt-image
Agent runbook for GPT Image 2 generation/editing. Use the prompt library + packaged CLI. Do not reimplement image API code.
## Operating loop
- Classify request: `generate`, `edit`, `inpaint`, or `multi-reference`; identify asset type, exact text, aspect ratio, references, safety constraints, and budget/quality.
- Search references first: open `references/gallery.md`; load/search the closest `references/gallery-<category>.md` file(s). Read the actual **Prompt** text before choosing a pattern.
- Refine with craft: load `references/craft.md` for dense text, diagrams, UI, data visualization, multi-panel layouts, weak prompts, or no close gallery match.
- Confer when useful: before costly/ambiguous/high-polish calls, present 1–3 matched directions plus the planned size/quality; ask at most one concise question. Skip long discussion for precise “generate now” requests.
- Preflight, no side effects: use the existing CLI/skill if present. Check command availability (`command -v gpt-image`), installed tool lists when a tool manager exists, or the runtime’s own skill registry when available. Do not assume a local home path in cloud/hosted runtimes.
- No blind setup: do not reinstall, overwrite skill folders, create/modify `.env`, or write API keys unless the user explicitly requested setup. Global/shared installs are opt-in only.
- Execute via CLI only: call `gpt-image` or `scripts/generate.py`. Do not create a new `generate.py`, SDK wrapper, or ad-hoc script for normal image requests.
- Report: output file path(s), key flags, and one concise refinement suggestion if useful.
Fast path: precise prompt + explicit “generate now” → quick reference/craft check, then CLI.
## CLI resolution
Preferred call order:

```bash
# Existing CLI on PATH
gpt-image -p "PROMPT" [-f OUT] [-i REF...] [-m MASK] [options]

# Installed skill folder; use the runtime-provided skill path when available
uv run "$SKILL_DIR/scripts/generate.py" -p "PROMPT" [-f OUT] [-i REF...] [-m MASK] [options]

# Direct transient CLI when the user requested setup/one-off CLI execution
uvx --from git+https://github.com/wuyoscar/gpt_image_2_skill gpt-image -p "PROMPT" [options]
```

`scripts/generate.py` is a launcher; its resolution order is: repo-local `src/gpt_image_cli` → installed `gpt-image` → `gpt-image` on PATH → transient `uvx`/`uv` fallback.
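Under a POSIX shell, the call order above could be wrapped as a small dispatcher. This is a sketch, not part of the skill: `run_gpt_image` is a hypothetical helper name, and `SKILL_DIR` is assumed to be provided by the runtime.

```shell
# Resolve the CLI per the preferred call order: PATH binary first,
# then the installed skill launcher, then a transient uvx invocation.
run_gpt_image() {
  if command -v gpt-image >/dev/null 2>&1; then
    gpt-image "$@"
  elif [ -n "${SKILL_DIR:-}" ] && [ -f "${SKILL_DIR}/scripts/generate.py" ]; then
    uv run "${SKILL_DIR}/scripts/generate.py" "$@"
  else
    uvx --from git+https://github.com/wuyoscar/gpt_image_2_skill gpt-image "$@"
  fi
}
```

Because the final fallback reaches `uvx`, a dispatcher like this should only run after the no-blind-setup checks above have passed.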
## Key and cost rules
- The CLI reads `OPENAI_API_KEY` from the process env, then `.env`, then `~/.env`, without overriding existing env; successful API calls may bill the user’s OpenAI account.
- If the host/runtime has native platform-managed image generation and the user wants that path, use the host tool instead of this CLI.
- If `OPENAI_API_KEY` is unset, report the missing key or use host-native generation when requested; do not write secrets.
- If the user wants to avoid local-key use, respect `unset OPENAI_API_KEY`; if a key exists in `.env`/`~/.env`, tell them to remove/rename it for the session rather than working around it.
- Never print secret values.
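A non-leaking key check might look like the following sketch; `has_api_key` is a hypothetical helper, and the secret value itself is never echoed.

```shell
# Report whether OPENAI_API_KEY is set, without ever printing its value.
has_api_key() {
  [ -n "${OPENAI_API_KEY:-}" ]
}

if has_api_key; then
  echo "OPENAI_API_KEY: present"
else
  echo "OPENAI_API_KEY: missing; set it or request host-native generation"
fi
```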
## Flags
| Flag | Values | Use |
|---|---|---|
| `-p, --prompt` | string | Required prompt/edit instruction |
| `-f, --file` | path | Output path; auto-named if omitted |
| `-i, --image` | repeatable path | Use edits endpoint; supports multiple references |
| `-m, --mask` | PNG path | Inpaint with alpha mask; requires `-i` |
| `--model` | default `gpt-image-2` | Image model |
| `--size` | `1k`, `2k`, `4k`, `portrait`, `landscape`, `square`, `wide`, `tall`, or literal | Canvas size |
| `--quality` | `low`, `medium`, `high`, `auto` | Cost/quality dial |
| `-n, --n` | integer | Number of images |
| `--background` | `auto`, `opaque` | Generation background |
| `--moderation` | `auto`, `low` | Generation moderation setting |
| `--format` | `png`, `jpeg`, `webp` | Output encoding |
| `--compression` | 0–100 | JPEG/WebP compression |
| `--user` | string | Optional end-user identifier |
Quality policy:

- `low`: cheap drafts, broad exploration, many variants.
- `medium`: normal exploration, style probing, balanced cost.
- `high`: final assets, Chinese text, posters, diagrams, UI, paper figures, dense labels.
Size policy:
- default/social square: `1k` / `1024x1024`
- poster/mobile/beauty: `portrait`
- landscape/gameplay/photo: `landscape`
- print/paper figure: `2k`
- widescreen hero: `4k`
- vertical story/banner: `tall`
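The two policies can be sketched as a lookup from a request intent to flags. `pick_flags` and the intent keywords are hypothetical, and the quality pairings for `hero` and `story` are illustrative assumptions, not stated by the policy tables.

```shell
# Map a request intent to --quality/--size flags per the policies above.
pick_flags() {
  case "$1" in
    draft)  echo "--quality low --size 1k" ;;        # cheap exploration, many variants
    poster) echo "--quality high --size portrait" ;; # final poster asset
    print)  echo "--quality high --size 2k" ;;       # print/paper figure
    hero)   echo "--quality high --size 4k" ;;       # widescreen hero (quality assumed)
    story)  echo "--quality medium --size tall" ;;   # vertical story/banner (quality assumed)
    *)      echo "--quality medium --size 1k" ;;     # default square
  esac
}

pick_flags poster   # → --quality high --size portrait
```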
## Endpoint routing
| Mode | Trigger | Endpoint |
|---|---|---|
| Text-to-image | no `-i` | `/v1/images/generations` |
| Reference edit | one or more `-i` | `/v1/images/edits` |
| Inpaint | `-i` + `-m` | `/v1/images/edits` with mask |
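The routing table reduces to a simple precedence: a mask implies edits-with-mask, any reference implies edits, otherwise plain generation. A sketch, with `route_endpoint` as a hypothetical helper:

```shell
# Decide the endpoint from the count of -i references and an optional mask path.
route_endpoint() {
  refs="$1"   # number of -i reference images
  mask="$2"   # mask path, or empty string when no -m
  if [ -n "$mask" ]; then
    echo "/v1/images/edits (with mask)"
  elif [ "$refs" -gt 0 ]; then
    echo "/v1/images/edits"
  else
    echo "/v1/images/generations"
  fi
}
```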
Surface API errors verbatim enough for debugging. Exit codes: `0` success, `1` API error/refusal, `2` bad arguments or missing key.
## Reference loading
- `references/gallery.md`: routing index for the 162-prompt Reference Gallery Atlas. Load first.
- `references/gallery-*.md`: concrete prompts, previews, paths, metadata, attribution. Load one category for normal requests; 2–3 for hybrids.
- `references/craft.md`: prompt-craft checklist. Load for prompt repair, exact text, UI/data/diagram grammar, edit invariants, and multi-panel consistency.
- `references/openai-cookbook.md`: official parameter/model semantics. Load for API behavior or model capability questions.
Reference loading policy: load the smallest useful slice; never load all category files by default.
## Verification
- Before API call: confirm endpoint mode, size, quality, output path, and required reference/mask files.
- After CLI call: report path(s) printed by the CLI and surface stderr on failure.
- For edits/inpaints: verify `-i` paths exist; verify `-m` exists when used.
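The file checks can be scripted before invoking the CLI. `preflight` is a hypothetical helper that mirrors the exit-code convention above (`2` for bad arguments); call it as `preflight "$MASK" "$REF1" "$REF2" ...`, passing an empty string when there is no mask.

```shell
# Fail fast (return 2, matching the bad-args exit code) when a
# reference image or the mask file is missing.
preflight() {
  mask="$1"; shift
  for ref in "$@"; do
    [ -f "$ref" ] || { echo "missing reference: $ref" >&2; return 2; }
  done
  if [ -n "$mask" ] && [ ! -f "$mask" ]; then
    echo "missing mask: $mask" >&2
    return 2
  fi
}
```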
Preserve Curated vs. Author + Source metadata when adapting examples. Add newly collected prompts to the Reference Gallery before README promotion.