seo-images
SEO Images — Premium AI Image Generation for Content
Generate publication-quality images for blog posts, newsletters, and social media using structured JSON prompts and Nano Banana 2 (Gemini 3.1 Flash Image).
No stock photo slop. No generic AI art. Images that match the quality of your writing.
Quick Start
# Generate a dark dashboard hero image
/seo-image "OpenClaw Meta Ads: $1,400/mo Saved" --style dark_data --ratio 16:9
# Generate a founder portrait
/seo-image "founder working late" --style founder_editorial --preset office_morning --ratio 4:5
# Generate a product shot
/seo-image "premium leather wallet on marble" --style product_lifestyle --ratio 1:1
# Generate a social share card
/seo-image "Ridge Wallet: $200M/Year" --style social_card --ratio 16:9
Setup
Option A: Replicate (recommended)
# Get your API token from https://replicate.com/account/api-tokens
export REPLICATE_API_TOKEN="r8_your_token_here"
Model: google/nano-banana-2 on Replicate
Cost: ~$0.02-0.05 per image at 1K, ~$0.08-0.12 at 2K
Speed: 5-15 seconds per image
Option B: Google AI Studio (free tier available)
# Get your API key from https://aistudio.google.com/apikey
export GOOGLE_AI_API_KEY="your_key_here"
Model: gemini-3.1-flash-image-preview
Cost: Free tier available (rate limited), then pay-as-you-go
Endpoint: https://generativelanguage.googleapis.com/v1beta/models/gemini-3.1-flash-image-preview:generateContent
Option C: fal.ai
export FAL_KEY="your_key_here"
Option D: Any provider with Nano Banana 2 access
The skill generates structured JSON prompts. The prompt is the valuable part — pipe it to whatever provider you prefer.
How It Works
- You provide: Article context + style selection + optional customization
- Skill builds: A cinema-grade structured JSON prompt with full camera/lighting/color specs
- Skill flattens: JSON into an optimized text prompt for the model
- Skill calls: Replicate API (or your preferred provider)
- Skill outputs: 1-3 variations at your specified aspect ratio and resolution
The 6 Visual Styles
Read the full style definitions in styles/. Each style has:
- Full JSON schema with every parameter documented
- Presets for common use cases
- Negative prompts to avoid common AI failures
- Vertical targeting (which industries/content types it works for)
1. founder_editorial — Cinematic People Photography
Hyper-realistic editorial portraits. Real people in real environments. Magazine-quality lighting and composition. Shot on named cameras with specific lenses.
Best for: About pages, founder stories, team pages, interview headers, newsletter author images
Presets: rooftop_golden, office_morning, street_candid
2. dark_data — Premium Data Visualization Cards
Dark glass dashboard cards with bold stats, clean typography, and subtle depth. Rendered text that's actually legible. Think Bloomberg Terminal meets Stripe marketing.
Best for: Stats callouts, performance reports, case study headers, metric hero images, newsletter stats
Presets: revenue_card, performance_card
3. product_lifestyle — Editorial Product Photography
Premium product shots in lifestyle contexts. The product is hero but the environment tells a story. Phase One quality.
Best for: Product launches, comparison posts, e-commerce features, DTC content
4. social_card — Scroll-Stopping Typography
High-impact text-forward cards for blog OG images, tweet cards, LinkedIn headers. Bold type, high contrast, designed to stop the scroll at thumbnail size.
Best for: Blog share images, tweet cards, LinkedIn post images, newsletter hero, chapter headers
5. warm_lifestyle — Bright Editorial Spaces
Warm, aspirational lifestyle photography. Natural light, earth tones, beautiful spaces. Film-stock quality.
Best for: Wellness content, real estate, creator economy, coaching, course promos
6. cinematic_scene — Movie-Still Narratives
Film-quality scenes with dramatic lighting and deliberate composition. Named director references for specific visual languages.
Best for: Long-form editorial, investigative pieces, dramatic feature stories, newsletter covers
Prompt Architecture
Every style uses this structured JSON format:
{
"prompt": "the flattened text prompt sent to the model",
"style_id": "which style template was used",
"subject": { ... },
"environment": { ... },
"camera": {
"camera_model": "specific camera body (e.g. Leica M11)",
"lens_focal_length_mm": 50,
"aperture": "f/1.4",
"shot_type": "medium_close_up",
"perspective": "eye_level",
"depth_of_field": "shallow"
},
"lighting": {
"key_light": "direction, quality, falloff description",
"fill_light": "source and intensity",
"rim_light": "position and effect",
"color_temperature": "warm | neutral | cool | mixed"
},
"color_and_style": {
"palette": ["#hex1", "#hex2", "#hex3"],
"grade": "specific color grading description",
"grain": "film grain type and intensity",
"overall_style": "cultural reference (e.g. 'A24 meets Acne Studios')"
},
"output_settings": {
"aspect_ratio": "16:9",
"resolution": {"width": 1920, "height": 1080},
"format": "png"
},
"negative_prompt": ["array of specific things to avoid"]
}
Why this matters: Generic prompts produce generic images. Specifying the exact camera model, aperture, film stock, lighting rig, and color grade gives Nano Banana 2 the detail it needs to produce images that look like they were shot by a real photographer with real equipment.
Generating Images
Via the generation script
# Set your API token
export REPLICATE_API_TOKEN="your_token"
# Generate from a style + context
bash scripts/generate.sh \
--style dark_data \
--context "Article about Ridge Wallet scaling to \$200M revenue" \
--headline "Ridge Wallet: \$200M/Year" \
--subtext "4-Voice Content System · 500+ AI Creatives/Day" \
--ratio 16:9 \
--resolution 2K \
--output ./output/ridge-hero.png
# Generate with a preset
bash scripts/generate.sh \
--style founder_editorial \
--preset office_morning \
--context "Newsletter author headshot" \
--ratio 4:5 \
--output ./output/author.png
# Generate a social card
bash scripts/generate.sh \
--style social_card \
--headline "The Agent That Never Sleeps" \
--subtext "How AI Replaced 90 Minutes of Ads Manager" \
--ratio 16:9 \
--output ./output/og-image.png
Via the agent (OpenClaw / Claude Code / Codex)
The agent reads the article context, selects the appropriate style, builds the full JSON prompt, and calls the API. Example:
/seo-image for my article "OpenClaw Meta Ads Complete Guide" — I need a hero image and a social share card
The agent will:
- Read the article to understand the content
- Select
dark_datafor the hero (tech/data content) - Select
social_cardfor the share image - Build full structured prompts for each
- Generate via Replicate
- Save both images to the specified output directory
Reference Images
For style consistency across articles, maintain reference images:
references/
founder_editorial/ → 3-5 gold-standard portrait references
dark_data/ → 3-5 reference dashboard cards
product_lifestyle/ → 3-5 reference product shots
social_card/ → 3-5 reference typography cards
warm_lifestyle/ → 3-5 reference lifestyle shots
cinematic_scene/ → 3-5 reference film stills
Pass reference images to the API via image_input to maintain visual consistency:
curl -s -X POST "https://api.replicate.com/v1/models/google/nano-banana-2/predictions" \
-H "Authorization: Bearer $REPLICATE_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"input": {
"prompt": "your flattened prompt here",
"image_input": ["https://url-to-reference-image.png"],
"aspect_ratio": "16:9",
"resolution": "2K",
"output_format": "png"
}
}'
API Reference
Replicate — google/nano-banana-2
# Create prediction
curl -s -X POST "https://api.replicate.com/v1/models/google/nano-banana-2/predictions" \
-H "Authorization: Bearer $REPLICATE_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"input": {
"prompt": "string — the full text prompt",
"image_input": ["array of image URLs — up to 14 reference images"],
"aspect_ratio": "16:9",
"resolution": "1K | 2K | 4K",
"output_format": "jpg | png",
"google_search": false,
"image_search": false
}
}'
# Poll for result
curl -s -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
"https://api.replicate.com/v1/predictions/{prediction_id}"
Parameters:
| Param | Type | Default | Description |
|---|---|---|---|
prompt |
string | required | The image description |
image_input |
array of URLs | [] | Reference images (up to 14) |
aspect_ratio |
enum | match_input_image | 1:1, 3:2, 2:3, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9, 1:4, 4:1, 1:8, 8:1 |
resolution |
enum | 1K | 0.5K, 1K, 2K, 4K |
output_format |
enum | jpg | jpg, png |
google_search |
bool | false | Use real-time web data |
image_search |
bool | false | Use Google Image Search for visual context |
Google AI Studio — gemini-3.1-flash-image-preview
curl -s -X POST \
"https://generativelanguage.googleapis.com/v1beta/models/gemini-3.1-flash-image-preview:generateContent?key=$GOOGLE_AI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"contents": [{"parts": [{"text": "your prompt here"}]}],
"generationConfig": {
"responseModalities": ["TEXT", "IMAGE"],
"imageSizeOptions": {"aspectRatio": "16:9"}
}
}'
Prompt Flattening
The structured JSON is for YOUR reference and consistency. The model receives a flattened text version. The scripts/generate.sh script handles this automatically, but here's the logic:
JSON structured prompt → Flatten to narrative text → Send to model
Example:
{
"subject": {"age": 32, "pose": "leaning on railing"},
"camera": {"camera_model": "Leica M11", "aperture": "f/1.4"},
"lighting": {"key_light": "warm sunset from camera-left"}
}
→ "hyper-realistic portrait of a 32-year-old man leaning casually on
a railing, warm sunset light from camera-left with soft falloff,
shot on Leica M11 at f/1.4, shallow depth of field..."
The structured format ensures consistency and makes it easy to tweak individual parameters without rewriting the entire prompt.
Tips
- Camera specificity matters. "Shot on Leica M11 at f/1.4" produces dramatically different results than "professional photography."
- Name the color grade. "Cinematic teal and orange" or "desaturated Portra 400 emulation" gives the model a clear reference point.
- Negative prompts prevent common failures. Always include "no warped hands" for people shots and "no illegible text" for typography.
- Reference images are your secret weapon. Feed 1-3 reference images via
image_inputto maintain style consistency across a content series. - Text rendering works now. Use quotes around text you want rendered:
"GROWTH PLAYS". Specify the font:"bold white Impact font". NB2 handles this well. - Don't over-prompt. If the prompt exceeds ~500 words, the model starts losing focus. Keep the flattened version tight. The JSON is for your records.
More from themattberman/seo-kit
seo-health
Technical SEO monitoring. Weekly PageSpeed audits, Core Web Vitals tracking, crawl health checks, image optimization, and mobile usability. Catches problems before they tank your rankings.
4seo-agent
Self-improving SEO system. Finds keywords via DataForSEO, writes content that ranks, monitors GSC for what's climbing, then writes MORE content to push winners higher. A feedback loop, not a one-shot tool.
4seo-checklist
SEO technical checklist and publishing reference. Meta tags, schema markup, llms.txt, topical authority, internal linking rules, Core Web Vitals targets, and image optimization. The agent uses this when publishing content. You can use it to audit existing pages.
4seo-forge
Matt Berman's custom SEO content engine. Learns your brand through a smart interview, then creates content that ranks AND converts using Puppet Strings, Scroll Traps, and Care To Click psychology. Anti-AI-Overview by design. DataForSEO integrated. The content your competitors can't copy because it requires real experience.
4seo-links
Backlink acquisition engine. Mines competitor backlinks, finds unlinked brand mentions, discovers broken link opportunities, audits internal linking, and prospects resource pages. Turns link building from guesswork into a repeatable system.
4