picture-it
Photoshop for AI agents. Composable image operations from the CLI.
Source: https://github.com/geongeorge/picture-it | npm: https://www.npmjs.com/package/picture-it
Prerequisites
picture-it must be installed and configured. Requires Node.js 18+.
```bash
# Install (pick one)
npm install -g picture-it
pnpm add -g picture-it
bun install -g picture-it
# Setup
picture-it download-fonts
```
Credentials
The FAL API key is required for AI operations (generate, edit, remove-bg, upscale). Set it via environment variable or the CLI:
```bash
# Option 1: Environment variable (preferred: use platform-managed secrets)
export FAL_KEY=your-key-here
# Option 2: CLI config (stored in ~/.picture-it/config.json with 0600 permissions)
picture-it auth --fal <fal-api-key>
```
NEVER paste API keys into chat. Always use environment variables or the CLI auth command. Get a FAL key from https://fal.ai.
Note: User images are uploaded to fal.ai for AI processing when using generate, edit, remove-bg, or upscale commands. Local-only commands (crop, grade, grain, vignette, text, compose, template, info) do not transmit data.
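Before running any AI command, an agent may want to fail fast when no key is configured. The sketch below is a POSIX shell helper. It assumes that the mere existence of `~/.picture-it/config.json` (the file written by `picture-it auth`) means a key is stored, and it does not parse that file. Both function names are hypothetical, not part of picture-it.

```bash
# Hypothetical helper: succeeds if a FAL key appears to be configured,
# either via FAL_KEY or via the config file written by `picture-it auth`.
# Assumption: the config file's existence is taken to mean a key is stored;
# its contents are not parsed or validated.
fal_key_available() {
  if [ -n "${FAL_KEY:-}" ]; then
    return 0
  fi
  [ -f "${HOME}/.picture-it/config.json" ]
}

# Hypothetical guard for AI commands (generate, edit, remove-bg, upscale).
# Prints a hint on stderr (never the key itself) and returns non-zero.
require_fal_key() {
  if ! fal_key_available; then
    echo "No FAL key found: export FAL_KEY or run 'picture-it auth --fal <key>'" >&2
    return 1
  fi
}
```

Typical use is `require_fal_key && picture-it generate ...`, so a missing key stops the pipeline before any upload or charge.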
Core Concept
Every command takes an image in and outputs an image. Chain them to build anything. The agent calling picture-it IS the planner — there is no AI planner inside the tool.
Before You Generate Anything — Think First
Image generation costs real money: from $0.003 to $0.25 per FAL call, depending on the model. A 4-pass workflow with mid-range models is $0.10+. Don't burn budget on a vague idea; spend time planning before running any commands.
Step 1: Understand the purpose
Before touching picture-it, get full clarity on what the user wants. Ask yourself:
- What is this image for? (blog header, Instagram ad, YouTube thumbnail, product comparison, poster)
- Who is the audience? (developers, consumers, enterprise buyers)
- What should someone FEEL when they see it? (excitement, trust, urgency, curiosity)
- What's the one message? Every good image communicates exactly one thing.
- Where will it be displayed? This determines size, text sizing, and composition rules.
If any of these are unclear, ask the user before proceeding. A 30-second question saves $0.15 in wasted generation.
Step 2: Plan the composition
Think through at least 3 different approaches before picking one. Consider:
- Can this be done without FAL? Templates and Satori compose are free. A solid gradient + good typography is often enough.
- What's the minimum number of FAL calls? Each call costs money. Plan the fewest passes that achieve the goal.
- Which technique fits? Text-behind-subject for thumbnails, remove-bg + compose for product photos, multi-pass for cinematic scenes.
Present your top 2-3 ideas to the user briefly — one sentence each — and let them pick before generating. Example:
"Here are a few directions:
- Dramatic product shot — generate a dark stage, edit to place your logo as a glowing 3D object ($0.07)
- Clean comparison — remove-bg from both products, compose on gradient with text ($0.01)
- Text-behind-subject — generate an action scene, edit to weave the title behind the subject ($0.07)
Which direction, or a mix?"
Step 3: Plan the pipeline
Before running the first command, write out the full pipeline:
1. generate (flux-dev $0.03) — dark stage scene
2. edit (seedream $0.04) — place logo into scene
3. compose (free) — add text overlay
4. grade + vignette (free) — post-process
Total: ~$0.07
This avoids discovering mid-way that you need a different approach and wasting the earlier calls.
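The pipeline total is simple arithmetic over per-call prices. As a hedged sketch, the helper below hard-codes the prices from the Model Selection tables in this document (fal.ai pricing can change, so treat them as a snapshot) and sums one argument per step, with `free` standing for local-only steps. `remove-bg` pricing isn't listed here, so it is deliberately rejected as unknown. Function names are hypothetical.

```bash
# Hypothetical helper: per-call price (USD) of a model, copied from the
# Model Selection tables. Unknown names fail rather than guess a price.
model_cost() {
  case "$1" in
    flux-schnell) echo 0.003 ;;
    reve-fast) echo 0.02 ;;
    imagineart|flux-dev|seedream-v4) echo 0.03 ;;
    kontext-lora) echo 0.035 ;;
    recraft-v3|fibo|kontext|reve|fibo-edit|seedream) echo 0.04 ;;
    banana2) echo 0.08 ;;
    banana-pro) echo 0.15 ;;
    recraft-v4) echo 0.25 ;;
    free) echo 0 ;;  # local-only steps: crop, grade, grain, vignette, text, compose
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: total cost of a planned pipeline, one argument per step.
pipeline_cost() {
  total=0
  for step in "$@"; do
    price=$(model_cost "$step") || return 1
    # awk handles the decimal addition that POSIX sh arithmetic cannot.
    total=$(awk -v a="$total" -v b="$price" 'BEGIN { printf "%.3f", a + b }')
  done
  echo "$total"
}
```

For the plan above, `pipeline_cost flux-dev seedream free free` prints `0.070`, matching the ~$0.07 total.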
Commands Quick Reference
| Command | What it does | Needs FAL? |
|---|---|---|
| `generate` | Create image from text prompt | Yes |
| `edit` | Edit image(s) with AI | Yes |
| `remove-bg` | Remove background | Yes |
| `replace-bg` | Remove bg + generate new one | Yes |
| `crop` | Resize/crop to exact dimensions | No |
| `grade` | Apply color grading | No |
| `grain` | Add film grain | No |
| `vignette` | Add edge darkening | No |
| `text` | Render text onto image (Satori) | No |
| `compose` | Overlay images/text/shapes from JSON | No |
| `template` | Built-in templates (no AI) | No |
| `info` | Analyze image dimensions/colors | No |
Model Selection
Choose the right model for the job — don't overspend.
Generation-only models:
| Model | Cost (per call) | Best for |
|---|---|---|
| `flux-schnell` | $0.003 | Default. Fast drafts, backgrounds, base scenes |
| `imagineart` | $0.03 | High-fidelity realism, accurate text rendering |
| `flux-dev` | $0.03 | Detailed scenes, portraits, cinematic quality |
| `recraft-v3` | $0.04 | Text in images, vector art, brand-style graphics |
| `fibo` | $0.04 | Enterprise, structured/controlled generation |
| `recraft-v4` | $0.25 | Premium. Best composition, lighting, materials. Use sparingly |
Edit-only models:
| Model | Cost (per call) | Best for |
|---|---|---|
| `reve-fast` | $0.02 | Cheapest. Quick iterations, speed over refinement |
| `kontext-lora` | $0.035 | Edits with LoRA styles, brand-consistent modifications |
| `kontext` | $0.04 | Default. Targeted local edits, scene transforms, text placement |
| `reve` | $0.04 | Style transforms, product variations, context-aware edits |
| `fibo-edit` | $0.04 | Precise control with JSON + masks, object add/remove, restyling |
Both generate AND edit (use --model with either generate or edit):
| Model | Cost (per call) | Best for |
|---|---|---|
| `seedream-v4` | $0.03 | Budget option. Good multi-image compositing |
| `seedream` | $0.04 | Multi-image compositing (up to 10 inputs), placing objects in scenes |
| `banana2` | $0.08 | Better image preservation, >10 inputs, extreme aspect ratios, web search |
| `banana-pro` | $0.15 | Premium. Best realism, typography, character consistency for up to 5 people |
How to pick the right model:
| Task | Best model | Cost (per call) | Why |
|---|---|---|---|
| Quick background/draft | `flux-schnell` | $0.003 | Fastest, cheapest |
| Quality hero image | `flux-dev` or `imagineart` | $0.03 | Good balance of quality/cost |
| Text-heavy generation | `recraft-v3` | $0.04 | Best text rendering in generated images |
| Quick edit iteration | `reve-fast` | $0.02 | Half the price, good enough for drafts |
| Single image edit (bg swap, add text) | `kontext` | $0.04 | Best targeted edits |
| Compose multiple images into one scene | `seedream` | $0.04 | Handles up to 10 inputs |
| Style transfer / product variations | `reve` | $0.04 | Context-aware transforms |
| Precise masked edit (add/remove object) | `fibo-edit` | $0.04 | JSON + mask control |
| Subject must stay very faithful | `banana2` | $0.08 | Best preservation |
| Premium quality, complex scene | `banana-pro` | $0.15 | Best overall but expensive |
Background removal:
- `bria` (default): Best edge quality, clean cutouts
- `birefnet`: Good general purpose
- `pixelcut`: Alternative
- `rembg`: Cheapest
How to Write Good Prompts
This is the difference between mediocre and professional output. Read references/prompt-library.md for a full library of tested prompts you can copy and adapt. Key rules:
For generation: Be specific about lighting ("dramatic side lighting from upper right"), camera ("shot on Canon R5 70-200mm f2.8"), and atmosphere ("dust particles visible in the light beam"). Vague prompts produce generic results.
For text-behind-subject: The key phrase is: "Add '[TEXT]' in large bold [color] letters BEHIND the [subject] — the [subject's] body overlaps and partially covers the letters." Without "BEHIND" and the occlusion instruction, the text floats on top.
For edits: Always end with "Keep everything else exactly the same" and list what to preserve. Without this, the AI changes things you didn't want changed.
For background replacement: Use realistic, specific locations ("modern upscale mall entrance during daytime, natural warm daylight"). Over-dramatic backgrounds ("city at night with neon reflections") look obviously fake.
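The text-behind-subject phrasing and the "keep everything else" rule for edits can be combined in a small prompt builder, so every such edit carries both the BEHIND instruction and the occlusion instruction. A minimal sketch; the function name and argument order are hypothetical, and the wording follows the template above with a sentence break between its two clauses.

```bash
# Hypothetical helper: build a text-behind-subject edit prompt.
# It includes "BEHIND" plus the occlusion instruction (without them the
# text floats on top) and ends with the preservation instruction for edits.
behind_subject_prompt() {
  text=$1     # words to render, e.g. RUN FASTER
  color=$2    # letter color, e.g. black
  subject=$3  # what occludes the text, e.g. runner
  printf "Add '%s' in large bold %s letters BEHIND the %s. The %s's body overlaps and partially covers the letters. Keep everything else exactly the same." \
    "$text" "$color" "$subject" "$subject"
}
```

It slots into an edit call as `--prompt "$(behind_subject_prompt "RUN FASTER" black runner)"`.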
Typography
For big titles and hero text: Use the FAL model via edit — it handles large text well and integrates it into the scene naturally. No font size math needed, just say "very large bold" in the prompt.
For precise small text (credits, URLs, badges, coverlines): Use compose or text with Satori. This is where font sizing matters — images display much smaller on phones. Quick rule: on a 1080px Instagram image, nothing under 36px is readable. Run picture-it download-fonts first if fonts aren't installed.
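The 36px-on-1080px rule can be carried over to other canvas widths. This rough sketch assumes readability scales linearly with image width; that is an assumption of the example, not a picture-it rule, and the function name is hypothetical.

```bash
# Hypothetical helper: smallest readable font size (px) for a canvas width,
# scaling "nothing under 36px on a 1080px-wide image" linearly, rounded up.
min_font_px() {
  awk -v w="$1" 'BEGIN {
    v = 36 * w / 1080            # proportional minimum
    c = int(v)
    if (c < v) c = c + 1         # round up to a whole pixel
    print c
  }'
}
```

Under that assumption, `min_font_px 1200` prints `40` for a blog header and `min_font_px 1280` prints `43` for a YouTube thumbnail.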
Hierarchy: Max 3 text sizes per image. Brand name should be larger than tagline.
Font pairing: Serif + sans-serif works best. For FAL model text, just describe the style in the prompt. For Satori, 3 fonts are bundled; drop more .ttf files into ~/.picture-it/fonts/. See references/composition-guide.md for pairing suggestions.
Composition Techniques
Read references/composition-guide.md for detailed multi-pass workflows, product photography, magazine covers, and overlay composition.
Common Workflows
Simple: Generate an image
```bash
picture-it generate --prompt "dark cosmic background with nebula" --size 1200x630 -o bg.png
```
Simple: Add text to an image
```bash
picture-it text -i bg.png --title "Hello World" --font "Space Grotesk" --color white --font-size 64 -o hero.png
```
Medium: Blog header with AI background + text
```bash
picture-it generate --prompt "abstract dark tech background" --size 1200x630 -o bg.png
picture-it text -i bg.png --title "My Blog Post" --font "DM Serif Display" --font-size 72 -o header.png
picture-it grade -i header.png --name cinematic -o header-graded.png
```
Medium: Edit a photo background
```bash
picture-it edit -i photo.jpg --prompt "replace background with modern hotel entrance, keep subject identical" --model banana-pro -o edited.jpg
```
Advanced: Text behind subject (YouTube thumbnail style)
```bash
# 1. Generate a scene
picture-it generate --prompt "runner on mountain trail at golden hour" --model flux-dev --size 1280x720 -o runner.png
# 2. Use FAL edit to add text BEHIND the subject
picture-it edit -i runner.png --prompt "Add 'RUN FASTER' in large bold black letters BEHIND the runner. The runner's body overlaps and partially covers the letters. Keep everything else exactly the same." --model seedream -o thumbnail.png
```
Advanced: Product comparison with real photos
```bash
# 1. Remove backgrounds from product photos
picture-it remove-bg -i product-a.png --model bria -o a-cutout.png
picture-it remove-bg -i product-b.png --model bria -o b-cutout.png
# 2. Generate a background
picture-it generate --prompt "split gradient, blue left to orange right" --size 1200x630 -o bg.png
# 3. Compose cutouts onto background with text
picture-it compose -i bg.png --overlays overlays.json -o comparison.png
```
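With more than two products, the `remove-bg` step can be looped. This hedged sketch pins `--model bria` (see Gotchas), derives each output name from its input, and stops at the first failure, relying on picture-it's exit code 1. The function name and the `PICTURE_IT` override (handy for dry runs with a stub) are hypothetical conveniences, not picture-it features. Each real call uploads the image to fal.ai.

```bash
# Command to run; override (e.g. with a stub) for dry runs.
# Hypothetical convenience, not a picture-it feature.
PICTURE_IT=${PICTURE_IT:-picture-it}

# Hypothetical helper: cut out every product photo passed as an argument.
# Prints one cutout path per line (picture-it writes only the path to stdout).
cutout_all() {
  for src in "$@"; do
    "$PICTURE_IT" remove-bg -i "$src" --model bria -o "${src%.*}-cutout.png" || return 1
  done
}
```

For example, `cutout_all product-a.png product-b.png` writes `product-a-cutout.png` and `product-b-cutout.png`, ready to reference from the compose step.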
Advanced: Multi-pass cinematic composition
```bash
# 1. Generate base scene
picture-it generate --prompt "dark stage with green spotlight" --model flux-dev --size 2048x1080 -o stage.png
# 2. Edit scene to place objects
picture-it edit -i stage.png -i logo.png --prompt "Place Figure 2 as glowing 3D cube in the spotlight" --model seedream -o composed.png
# 3. Post-process
picture-it crop -i composed.png --size 1200x630 --position attention -o cropped.png
picture-it grade -i cropped.png --name cinematic -o graded.png
picture-it vignette -i graded.png --opacity 0.3 -o final.png
```
Platform Presets
Use --platform <name> with generate or crop:
| Preset | Size (px) |
|---|---|
| `blog-featured` | 1200x630 |
| `og-image` | 1200x630 |
| `youtube-thumbnail` | 1280x720 |
| `instagram-square` | 1080x1080 |
| `instagram-story` | 1080x1920 |
| `twitter-header` | 1500x500 |
Output Behavior
- stdout: only the output file path
- stderr: progress logs
- Exit 0 on success, Exit 1 on failure
Read stdout to get the file path. This is how you chain commands.
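Because stdout carries only the output path and failures exit with 1, a pipeline can capture each step's path and stop at the first error. Below is a minimal POSIX shell sketch of the Medium blog-header workflow, with an explicit crop after generation as the Gotchas recommend. The function name and the `PICTURE_IT` override (useful for dry runs with a stub) are hypothetical, not picture-it features.

```bash
# Command to run; override (e.g. with a stub) for dry runs.
# Hypothetical convenience, not a picture-it feature.
PICTURE_IT=${PICTURE_IT:-picture-it}

# Hypothetical helper: the blog-header workflow as one chained function.
# Each step's stdout is just the output path, so it feeds the next step's -i.
make_blog_header() {
  title=$1
  # FAL call (costs money); the output size is only approximate.
  raw=$("$PICTURE_IT" generate --prompt "abstract dark tech background" --size 1200x630 -o bg.png) || return 1
  # Local crop to the exact target size.
  bg=$("$PICTURE_IT" crop -i "$raw" --size 1200x630 --position attention -o bg-cropped.png) || return 1
  header=$("$PICTURE_IT" text -i "$bg" --title "$title" --font "DM Serif Display" --font-size 72 -o header.png) || return 1
  # Last step: its stdout (the final path) becomes the function's stdout.
  "$PICTURE_IT" grade -i "$header" --name cinematic -o header-graded.png
}
```

A caller can then write `final=$(make_blog_header "My Blog Post") || echo "pipeline failed" >&2`; progress logs still appear on stderr without polluting the captured path.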
Gotchas
- Always use `--model bria` for `remove-bg`; `birefnet` leaves rectangular artifacts that cause ugly glow/shadow halos when compositing.
- The `glow` effect in compose mode blurs the entire rectangular buffer, not the shape. Avoid using glow on cutout images; use the background color/lighting to create the glow effect instead.
- The `shadow` effect has the same rectangular artifact issue. For cutout images on clean backgrounds, skip shadows entirely.
- When editing with FAL, the model may alter product details (logos, text, design elements). For product images where accuracy matters, use `remove-bg` + `compose` instead of `edit` to preserve the original exactly.
- Seedream takes ~60 seconds per generation. Don't assume it failed if it's slow.
- For `edit` with banana-pro, don't pass `resolution` or `limit_generations` params; it auto-detects.
- Always `crop` to exact dimensions after FAL generation, because FAL models output approximate sizes.
- Use `flux-dev` ($0.03), not `flux-schnell` ($0.003), when image quality matters (hero images, portraits). The quality difference is significant.
- Satori does NOT support `display: grid`, transforms, animations, `box-shadow`, or filters. Use flexbox only.
- When adding text behind a subject with `edit`, be very explicit in the prompt: "the text is BEHIND the subject — the subject's body overlaps and partially covers the letters."