BananaHub

Generate or edit provider-backed images from non-English or mixed-language requests inside one /bananahub workflow. Gemini/Nano Banana remains the default model family, and OpenAI GPT Image support is provider-routed. BananaHub keeps prompt optimization, conservative enhancement, model fallback, image editing, template use, and BananaHub discovery in a single skill instead of splitting them across separate installs.

Quick Start

  • Install via Open Agent Skills: npx skills add https://github.com/bananahub-ai/bananahub-skill --skill bananahub
  • Install in Claude Code directly: claude skill install https://github.com/bananahub-ai/bananahub-skill
  • Run setup once: /bananahub init
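  • Generate from a natural-language request: /bananahub 一只橘猫趴在键盘上打盹 (English: "an orange cat napping on a keyboard")
  • Edit an image: /bananahub edit 把背景换成海滩 --input photo.png (English: "change the background to a beach")
  • Discover a reusable template: /bananahub discover 代码库讲解图 (English: "codebase explainer diagram")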

Key Paths

  • Generation script: {baseDir}/scripts/bananahub.py
  • Provider adapters: {baseDir}/scripts/providers/ — Gemini, OpenAI Images, and chat/completions-compatible runtime adapters
  • Runtime config module: {baseDir}/scripts/runtime_config.py — provider constants, aliases, transport defaults, config keys, and endpoint normalization
  • Config store module: {baseDir}/scripts/config_store.py — config loading, profile merge, validation, provider override, and serialization helpers
  • Prompt optimization rules: references/prompt-guide.md — read during Phase 1 (base optimization)
  • Enhancement profiles: references/profiles/{name}.md — read during Phase 3 (on-demand)
  • Official references: references/official-sources.md — authoritative source URLs, core example library
  • Capability registry: references/capability-registry.md — provider/model feature routing and fallback policy
  • Model registry: references/model-registry.json — canonical model ids, aliases, defaults, and provider families
  • Provider guides: references/providers/{provider}.md — lazy-loaded model-family prompt and runtime rules
  • Template system: references/template-system.md — read when handling templates/use/create-template commands
  • Hub discovery guide: references/hub-discovery.md — read when handling discover or when local template matching is weak
  • Template files: {baseDir}/references/templates/<id>/template.md (built-in) + ~/.config/bananahub/templates/<id>/template.md (user-installed)
  • Telemetry helper: python3 {baseDir}/scripts/bananahub.py telemetry ... — use for built-in/installed template adoption events
  • Telemetry state: ~/.config/bananahub/telemetry.json — stores the local anonymous usage id
  • Init guide: references/init-guide.md — read when handling init command
  • Optimization pipeline: references/optimization-pipeline.md — read when optimizing prompts
  • Template format spec: references/template-format-spec.md — detailed field definitions, repo structure, sample requirements
  • Template validator: python3 {baseDir}/scripts/validate_templates.py — validates bundled/user template metadata for schema v1/v2 compatibility
  • Mode detector: python3 {baseDir}/scripts/bananahub.py check-mode — reports provider-backed / host-native / prompt-only execution mode and capability layer boundaries
  • Prompt archive: current working directory bananahub-prompts/ when --save-prompt, --prompt-output, or BANANAHUB_SAVE_PROMPTS=1 is used
  • API config (priority high→low):
    1. --config <file> CLI flag
    2. Environment variables (GOOGLE_API_KEY, GEMINI_API_KEY, BANANAHUB_PROVIDER, BANANAHUB_AUTH_MODE, BANANAHUB_MODEL, GOOGLE_GEMINI_BASE_URL, GEMINI_BASE_URL, BANANAHUB_BASE_URL, GOOGLE_CLOUD_PROJECT, GOOGLE_CLOUD_LOCATION)
    3. Skill config: ~/.config/bananahub/config.json
      • {"provider": "google-ai-studio", "api_key": "...", "model": "gemini-3-pro-image-preview"}
      • {"provider": "gemini-compatible", "api_key": "...", "base_url": "https://..."}
      • {"provider": "openai", "openai_api_key": "...", "model": "gpt-image-2"}
      • {"provider": "openai-compatible", "api_key": "...", "base_url": "https://...", "model": "gpt-image-2"}
      • {"provider": "chatgpt-compatible", "chatgpt_api_key": "...", "chatgpt_base_url": "https://...", "model": "gpt-5.4"}
      • multi-profile: {"default_profile":"nano","profiles":{"nano":{"provider":"google-ai-studio","api_key":"..."},"gpt":{"provider":"openai","openai_api_key":"...","model":"gpt-image-2"}}}
      • {"provider": "vertex-ai", "auth_mode": "adc", "project": "...", "location": "global"}
    4. Persistent config helpers:
      • python3 {baseDir}/scripts/bananahub.py config show
      • python3 {baseDir}/scripts/bananahub.py config doctor --json
      • python3 {baseDir}/scripts/bananahub.py init --wizard
      • python3 {baseDir}/scripts/bananahub.py init --wizard --install-deps
      • python3 {baseDir}/scripts/bananahub.py config quickset --provider openai-compatible --profile gpt --default-profile --base-url https://your-openai-compatible-endpoint --api-key <key> --model gpt-image-2
      • python3 {baseDir}/scripts/bananahub.py config quickset --provider openai --profile gpt --default-profile --api-key <key> --model gpt-image-2
      • python3 {baseDir}/scripts/bananahub.py config quickset --provider google-ai-studio --profile nano --default-profile --api-key <key> --model gemini-3-pro-image-preview
      • python3 {baseDir}/scripts/bananahub.py config quickset --provider vertex-ai --profile vertex --default-profile --auth-mode adc --project <gcp-project> --location global
      • python3 {baseDir}/scripts/bananahub.py config set --clear-base-url
  • Output directory: current working directory (where the skill is invoked)
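
The API config priority listed above can be sketched as a small layered merge. This is only an illustration, not the logic in config_store.py: the function names resolve_config and flatten_profiles, the mapping from environment variables to config keys, and the way the default profile is merged over top-level keys are all assumptions. Only the variable names and config keys are taken from the list.

```python
# Assumed mapping from a few environment variables (named in the list above)
# to config keys. The real mapping lives in the skill's own modules.
ENV_TO_KEY = {
    "BANANAHUB_PROVIDER": "provider",
    "BANANAHUB_AUTH_MODE": "auth_mode",
    "BANANAHUB_MODEL": "model",
    "BANANAHUB_BASE_URL": "base_url",
}


def flatten_profiles(raw):
    """Merge the default profile, if any, over the top-level keys (assumed behavior)."""
    merged = {k: v for k, v in raw.items() if k not in ("profiles", "default_profile")}
    name = raw.get("default_profile")
    merged.update(raw.get("profiles", {}).get(name, {}))
    return merged


def resolve_config(skill_cfg, env, cli_cfg=None):
    """Apply the layers from lowest to highest priority so later layers win."""
    resolved = flatten_profiles(skill_cfg)           # 3. skill config file
    for var, key in ENV_TO_KEY.items():              # 2. environment variables
        if env.get(var):
            resolved[key] = env[var]
    if cli_cfg:                                      # 1. --config <file>
        resolved.update(flatten_profiles(cli_cfg))
    return resolved


if __name__ == "__main__":
    skill_cfg = {
        "default_profile": "nano",
        "profiles": {"nano": {"provider": "google-ai-studio", "api_key": "..."}},
    }
    print(resolve_config(skill_cfg, {"BANANAHUB_MODEL": "gemini-3-pro-image-preview"}))
```

Applying the layers in reverse priority order keeps the rule "highest priority wins" in one place: whichever layer writes last is the one that sticks.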

First-Run Detection

Before executing any command other than help, check if the environment is ready:

  1. Run python3 {baseDir}/scripts/bananahub.py config doctor --json when setup status is unclear.
  2. If status is needs_setup, read references/init-guide.md and either run init --wizard, run init --wizard --install-deps when dependencies are missing, or offer the suggested_commands[0] quickset command.
  3. Never ask the user to paste real API keys into chat; prefer the local wizard or a terminal command with <key> placeholder.
  4. If config exists but generation fails with auth/dependency errors → suggest config doctor --json or init --wizard.
  5. Persist new config into ~/.config/bananahub/config.json, preferably as a named profile (gpt, nano, vertex, or chat).
  6. Treat gpt-image-2 as the overall default model; provider-specific defaults still apply for Gemini/Vertex paths.
  7. Supported runtime providers:
    • google-ai-studio: generate / edit / models / init
    • gemini-compatible: generate / edit / models / init
    • vertex-ai: generate / edit / models / init
    • openai: OpenAI-native GPT Image generate / edit / models / init
    • openai-compatible: OpenAI-style endpoint generate / models / init, capability-dependent
    • chatgpt-compatible: chat/completions endpoint that returns images inside assistant replies
  8. openai-compatible is not the same as OpenAI-native GPT Image. Do not assume edit, mask edit, or GPT Image parameters unless the endpoint declares support.
  9. Endpoint normalization rules:
    • gemini-compatible: if the user pastes a URL ending in /v1beta, accept it as given; the runtime normalizes the trailing version segment so that it is not duplicated
    • openai-compatible: if the user pastes a bare host, the runtime may append /v1; for Google's official endpoint, resolve it to /v1beta/openai
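
The two endpoint normalization rules above might look roughly like the following. This is a hedged sketch, not the code in runtime_config.py: the function name normalize_base_url is hypothetical, stripping the trailing /v1beta is one assumed way of avoiding duplication, and generativelanguage.googleapis.com is assumed to be the host meant by "Google's official endpoint".

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: this is the host the text calls "Google's official endpoint".
GOOGLE_HOST = "generativelanguage.googleapis.com"


def normalize_base_url(provider, url):
    """Return a base URL with the version segment handled per provider."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    if provider == "gemini-compatible":
        # Drop one trailing version segment so a caller can append it exactly once.
        if path.endswith("/v1beta"):
            path = path[: -len("/v1beta")]
    elif provider == "openai-compatible":
        # Only a bare host gets a default path; explicit paths are left alone.
        if path == "":
            path = "/v1beta/openai" if parts.netloc == GOOGLE_HOST else "/v1"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


if __name__ == "__main__":
    print(normalize_base_url("gemini-compatible", "https://proxy.example.com/v1beta/"))
    print(normalize_base_url("openai-compatible", "https://proxy.example.com"))
```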

Runtime Mode Layers

Run python3 {baseDir}/scripts/bananahub.py check-mode --pretty when the execution path is unclear. BananaHub has three execution modes:

| Mode | Trigger | Behavior |
| --- | --- | --- |
| provider-backed | Config validates for a supported provider | Optimize/render prompt, call generate or edit, and save image outputs |
| host-native | Provider config is missing or incomplete, but BANANAHUB_HOST_IMAGEGEN=1 or the caller explicitly has a native image tool | Optimize/render prompt, optionally archive it, then hand it to the host image tool instead of calling the provider script |
| prompt-only | No valid provider and no host image tool | Act as a prompt/template advisor: return the final prompt and archive it when requested; do not claim image generation succeeded |
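
The trigger conditions for the three modes can be read as a simple precedence check. The sketch below is illustrative only; check-mode remains the authoritative detector, and the function name detect_mode and its parameters are hypothetical.

```python
import os


def detect_mode(provider_config_valid, has_native_image_tool=False, env=None):
    """Pick an execution mode using the precedence described in the mode table."""
    env = os.environ if env is None else env
    if provider_config_valid:
        return "provider-backed"
    # Host-native applies only when no provider config validates.
    if env.get("BANANAHUB_HOST_IMAGEGEN") == "1" or has_native_image_tool:
        return "host-native"
    return "prompt-only"


if __name__ == "__main__":
    print(detect_mode(provider_config_valid=False, env={"BANANAHUB_HOST_IMAGEGEN": "1"}))
```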

Capability ownership is layered:

  • Cross-model skill layer: prompt optimization, translation policy, conservative enhancement, --direct, --raw, prompt archiving, template discovery/activation, host-native delegation, and prompt-only advisory output.
  • Template layer: matching and activation are common, but provider/model compatibility, prompt variants, tested quality, and samples belong to template metadata.
  • Provider/model layer: image edit, mask edit, multi-reference, exact size, native quality, transparent background, output format/compression, and fallback are not universal; route them through references/capability-registry.md, references/model-registry.json, and provider adapters.

If a feature changes request payload shape, file validation, cost, policy behavior, or output parsing, do not treat it as cross-model even if several providers happen to support similar wording.

Command Routing

Route user input to the appropriate action based on arguments:

| Argument | Action |
| --- | --- |
| init | Read references/init-guide.md, then diagnose and fix environment issues |
| help | Show usage instructions (brief list of supported commands and examples) |
| <description> | Read references/optimization-pipeline.md, then: base optimization → intent recognition → optional enhancement → generate |
| edit <description> --input <image-path> [--ref <reference-image>...] | Edit an existing image: optimize prompt → call edit subcommand |
| optimize <description> | Optimize prompt only; display result without generating |
| generate <English prompt> | Generate image directly with given English prompt (skip optimization) |
| models | Run python3 {baseDir}/scripts/bananahub.py models to query image-capable models from API |
| check-mode | Run python3 {baseDir}/scripts/bananahub.py check-mode --pretty to inspect provider-backed / host-native / prompt-only mode and capability layers |
| templates | Read references/template-system.md, then list all templates grouped by profile and type |
| templates <name> | Read references/template-system.md, parse frontmatter type, then show prompt-template or workflow-template details accordingly |
| use <template-id> [custom description] | Read references/template-system.md, parse frontmatter type, then either generate from a prompt template or activate a workflow template |
| discover <request> | Read references/hub-discovery.md, then search BananaHub for matching templates without scraping the visual site |
| discover curated <request> | Read references/hub-discovery.md, then search only the curated BananaHub catalog |
| discover trending | Read references/hub-discovery.md, then show current trending BananaHub templates |
| create-template [description] | Read references/template-system.md, determine whether the user needs a prompt or workflow template, then guide creation |

Note:

  • optimize, --direct, and --raw are skill-layer controls interpreted by you before invoking the script
  • Do not pass --direct or --raw through to {baseDir}/scripts/bananahub.py
  • discover is also a skill-layer command: use BananaHub machine-readable files and npx bananahub add ..., not {baseDir}/scripts/bananahub.py
  • telemetry is an internal helper, not a user-facing chat command. Use it when a template is selected or successfully produces output.
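
Because --direct and --raw must be consumed at the skill layer and never forwarded to the script, one way to picture the handling is a small argument splitter. This is a sketch under assumptions: the function name split_skill_flags is hypothetical, and letting --raw win when both flags appear is a guess, since the text does not say how that case is resolved.

```python
def split_skill_flags(argv):
    """Separate skill-layer flags from arguments that may reach the script.

    Returns (mode, passthrough), where mode is "default", "direct", or "raw".
    """
    mode = "default"
    passthrough = []
    for arg in argv:
        if arg == "--raw":
            mode = "raw"              # assumption: raw takes precedence over direct
        elif arg == "--direct":
            if mode != "raw":
                mode = "direct"
        else:
            passthrough.append(arg)   # everything else is forwarded untouched
    return mode, passthrough


if __name__ == "__main__":
    print(split_skill_flags(["--direct", "--aspect", "16:9"]))
```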

Optional flags (append to any generation command):

  • --model <model_id> — specify model
  • --aspect <ratio> — aspect ratio (e.g., 16:9, 1:1, 9:16)
  • --image-size <preset> — native image-size preset (1K, 2K, 4K)
  • --openai-size <value> — OpenAI-native size for OpenAI-style image generation
  • --quality <value> — provider-native quality preset when supported
  • --background <value> — provider-native background option when supported
  • --output-format <value> — provider-native output format when supported
  • --output-compression <N> — provider-native output compression when supported
  • --resize <WxH> — post-process resize after generation/edit (e.g., 1024x1024)
  • --size <value> — legacy compatibility flag; 1K/2K/4K means native image size, WxH means post-process resize
  • --output <path> — specify output path
  • --save-prompt — archive the final prompt under bananahub-prompts/
  • --prompt-output <path> — archive the final prompt to a specific file or directory
  • --input <path> — source image for edit commands
  • --ref <path> [path...] — reference images for edit commands (Gemini up to 13 refs; OpenAI provider enforces its own lower runtime limit)
  • --mask <path> — OpenAI-native mask image for masked edits
  • --direct — direct mode: skip all confirmations, generate immediately
  • --raw — raw mode: translate only, no optimization
  • --retries <N> — retry count per model on 503 before fallback (default: 1, i.e. try each model twice)
  • --no-fallback — disable automatic model fallback
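
The legacy --size flag carries two meanings depending on its value, which is easy to get wrong. A minimal sketch of that disambiguation follows; the function name interpret_size and the returned tuple shape are hypothetical, and accepting lowercase presets such as 2k is an assumption.

```python
import re

NATIVE_PRESETS = {"1K", "2K", "4K"}


def interpret_size(value):
    """Map a legacy --size value onto its modern equivalent.

    1K/2K/4K  -> ("image-size", preset)    native image-size preset
    WxH       -> ("resize", (w, h))        post-process resize
    """
    token = value.strip()
    if token.upper() in NATIVE_PRESETS:
        return ("image-size", token.upper())
    match = re.fullmatch(r"(\d+)[xX](\d+)", token)
    if match:
        return ("resize", (int(match.group(1)), int(match.group(2))))
    raise ValueError(f"unrecognized --size value: {value!r}")


if __name__ == "__main__":
    print(interpret_size("2K"))
    print(interpret_size("1024x1024"))
```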

Three Optimization Modes

Mode 1: Default (no flag)

User input → Base optimization (silent) → Intent recognition → Profile match?
  ├─ Yes → Show enhancement suggestion → User confirms/edits/rejects → Generate
  └─ No (general) → Generate directly

Mode 2: Direct (--direct, or the user says "直接画" / "直出", i.e. "just draw it" / "direct output")

User input → Base optimization → Intent recognition → Load Profile enhancement → Generate directly

No confirmations. Suitable for experienced users or batch generation.

Mode 3: Raw (--raw)

User input → Translate to English only → Generate directly

No optimization. In-image text is still preserved in its original language.

Prompt Optimization Summary

Read references/optimization-pipeline.md for the full pipeline. Overview:

  1. Phase 0: Extract hard constraints (exact_text, must_keep, must_avoid, style_lock, approved_baseline, allowed_delta when relevant)
  2. Phase 1: Base optimization — format correction, smart translation, structuring, conservative guardrail
  3. Phase 1.5: Capability/provider routing — inspect references/capability-registry.md, resolve model aliases from references/model-registry.json, then lazy-load references/providers/*.md only for the selected model family
  4. Phase 2: Intent recognition — match to one of 10 profiles via keyword table
  5. Phase 2.1: Local template auto-matching — suggest installed templates (progressive disclosure)
  6. Phase 2.2: BananaHub discovery — search remote catalog only when explicitly useful
  7. Phase 2.5: Style overlay detection (hand-drawn sketch-note)
  8. Phase 3: Enhancement — read matching profile from references/profiles/, classify subject, fill missing dimensions
  9. Phase 3.5: Model recommendation — prefer gpt-image-2 for generation-led high-fidelity outputs; prefer Gemini/Nano Banana for edit/reference/consistency-heavy flows unless the user or template overrides it
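
The Phase 3.5 recommendation rule can be expressed as a short decision function. This is a sketch, not the skill's actual heuristic: the function name recommend_model and the task labels are hypothetical, and checking the user override before the template override is an assumption, since the text only says that either may override the default preference.

```python
# Hypothetical task labels for flows the text describes as edit/reference/consistency-heavy.
GEMINI_PREFERRED_TASKS = {"edit", "reference", "consistency"}


def recommend_model(task, user_model=None, template_model=None):
    """Return (choice, reason) so the recommendation can be stated explicitly."""
    if user_model:
        return user_model, "user override"
    if template_model:
        return template_model, "template override"
    if task in GEMINI_PREFERRED_TASKS:
        return "Gemini/Nano Banana", "edit/reference/consistency-heavy flow"
    return "gpt-image-2", "generation-led high-fidelity output"


if __name__ == "__main__":
    choice, reason = recommend_model("edit")
    print(f"Recommended model: {choice} ({reason})")
```

Returning the reason alongside the choice makes it straightforward to surface the recommendation to the user rather than applying it silently.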

Image Generation Flow

  1. Build command:
    python3 {baseDir}/scripts/bananahub.py generate "<prompt>" [--aspect RATIO] [--model MODEL] [--output PATH]
    
    When this generation comes from an active template, also pass: --template-id <id> --template-repo <repo> --template-distribution bundled|remote --template-source curated|discovered
  2. Execute script and parse JSON output
  3. Automatic model fallback: on server error (500/502/503/504), tries the selected provider family fallback chain from references/model-registry.json. Do not cross provider families unless the user explicitly enables cross-provider fallback. Use --no-fallback to disable.
  4. On success:
    ✅ 图片已生成
    📁 路径: [file_path]
    🔧 模型: [model] | 宽高比: [ratio] | 尺寸: [WxH]
    📝 使用的 Prompt: [final prompt used]
    
    If the script returns template_telemetry, treat it as best-effort success reporting only; do not surface failures unless the user asked.
  5. On failure: suggest fix based on error type (content policy → rephrase, auth → check key, network → check proxy)
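
The retry-then-fallback behavior in step 3 can be sketched as two nested loops over a single provider family's chain. Everything here is illustrative: ProviderError, generate_with_fallback, and the call_model callback are hypothetical names, and treating all of 500/502/503/504 as retryable is an assumption (the flag description mentions retries on 503 specifically).

```python
RETRYABLE_STATUS = {500, 502, 503, 504}


class ProviderError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status):
        super().__init__(f"provider returned HTTP {status}")
        self.status = status


def generate_with_fallback(call_model, chain, retries=1, fallback=True):
    """Try each model up to retries + 1 times, then move down the chain.

    chain must contain models from one provider family only; crossing
    families is left to an explicit user opt-in and is not handled here.
    """
    models = list(chain) if fallback else list(chain[:1])
    if not models:
        raise ValueError("fallback chain is empty")
    last_error = None
    for model in models:
        for _attempt in range(retries + 1):
            try:
                return model, call_model(model)
            except ProviderError as err:
                if err.status not in RETRYABLE_STATUS:
                    raise             # auth, policy, and similar errors are not retried
                last_error = err
    raise last_error
```

With the default retries=1, each model is attempted twice before the next entry in the chain is tried; passing fallback=False mirrors --no-fallback by restricting the loop to the first model.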

Image Editing Flow

  1. Validate input: confirm the --input image path exists and validate the --ref images. Reject more than 13 reference images or more than 14 total images.
  2. Extract invariants: what must remain unchanged in the source image
  3. Lock the baseline when applicable: if the source image is an accepted result, treat it as the only source of truth for later rounds
  4. Name the allowed delta: isolate the one change this round is allowed to make
  5. Optimize edit prompt: run Phase 1 only (skip Phase 2/3); keep conservative, isolate the delta
  6. Build command:
    python3 {baseDir}/scripts/bananahub.py edit "<prompt>" --input <image_path> [--ref <ref1> ...] [--model MODEL] [--output PATH]
    
    --ref accepts up to 13 reference images. Total images (input + refs) ≤ 14. When this edit runs inside an active template/workflow, also pass: --template-id <id> --template-repo <repo> --template-distribution bundled|remote --template-source curated|discovered
  7. On success:
    ✅ 图片已编辑
    📁 路径: [file_path]
    📥 原图: [input_path]
    📎 参考图: [ref_images, if any]
    🔧 模型: [model] | 尺寸: [WxH]
    📝 使用的 Prompt: [final prompt used]
    

Multi-image use cases: style transfer, character consistency, multi-image blending, object replacement.
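
Steps 1 and 6 of the editing flow (validate the images, then build the command) might be combined as follows. This is a sketch only: the function name build_edit_command is hypothetical, and it covers just the flags shown in the command line above, not the template telemetry flags.

```python
from pathlib import Path

MAX_REFS = 13
MAX_TOTAL_IMAGES = 14


def build_edit_command(script, prompt, input_path, refs=(), model=None, output=None):
    """Validate the image arguments and return an argv list for the edit subcommand."""
    refs = list(refs)
    if len(refs) > MAX_REFS:
        raise ValueError(f"at most {MAX_REFS} reference images are allowed")
    if 1 + len(refs) > MAX_TOTAL_IMAGES:
        raise ValueError(f"at most {MAX_TOTAL_IMAGES} images in total are allowed")
    missing = [str(p) for p in [input_path, *refs] if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(f"image not found: {', '.join(missing)}")

    cmd = ["python3", str(script), "edit", prompt, "--input", str(input_path)]
    if refs:
        cmd += ["--ref", *[str(r) for r in refs]]
    if model:
        cmd += ["--model", model]
    if output:
        cmd += ["--output", str(output)]
    return cmd
```

Building the command as a list rather than a single string avoids shell-quoting problems when the prompt contains spaces, quotes, or non-ASCII text; the list can be passed directly to subprocess.run.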

Iteration Guide

  • Change one variable at a time
  • Retain the last effective prompt as a base
  • Treat follow-ups as deltas, not full rewrites
  • Preserve locked constraints unless user explicitly changes them
  • After the user accepts an output, treat that file as the approved baseline until the user replaces it
  • For follow-up edits, state the exact keep-unchanged constraints before the allowed delta
  • For deterministic derivative tasks such as invert, crop, export, add safe padding, or build exact lockups, prefer local deterministic transforms instead of asking the model to redraw the asset

Template System Summary

Read references/template-system.md for the full template system. Overview:

  • Search paths: built-in (references/templates/) + user-installed (~/.config/bananahub/templates/)
  • Local vs remote: templates / use operate on installed templates; discover operates on BananaHub catalog, including the official bananahub-ai/templates library, and installs only on demand
  • Format: template.md with YAML frontmatter and type: prompt | workflow
  • Prompt templates: produce a reusable prompt with variables, then generate or edit
  • Workflow templates: act as progressive-disclosure context; load the workflow, ask only for missing blockers, and execute step-by-step with generate / edit primitives when needed
  • Model transparency: when a template or heuristic selects gpt-image-2 or Gemini/Nano Banana automatically, state that recommendation explicitly instead of hiding the model choice
  • Built-in starter examples: info-diagram for one-page infographics, article-one-page-summary for article explainers, background-replace-edit for edit workflows
  • Commands: templates (list installed), templates <name> (details), use <id> [desc] (activate), discover <need> (search hub), create-template (create)
  • Auto-matching: Phase 2.1 suggests installed templates first; Phase 2.2 can search BananaHub when local coverage is weak
  • Adoption telemetry: when a template is selected, call python3 {baseDir}/scripts/bananahub.py telemetry track --event selected ...; when template-driven generate/edit succeeds, pass template telemetry flags so the script can report generate_success / edit_success
  • Install more: prefer discover inside the skill; official rich templates install from bananahub-ai/templates, and known targets can still be installed with npx bananahub add <user/repo[/template]>
  • Publishing rule: when creating templates, save samples as sample-{model-short}-{nn}.png and make README list verified models, supported models, and sample-to-prompt mappings
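
To illustrate the search paths and the frontmatter type field described above, here is a minimal sketch. The function names template_candidates and read_template_type are hypothetical, the candidate order simply follows the order in which the two locations are listed (no precedence is implied), and the frontmatter is scanned line by line rather than parsed as full YAML.

```python
from pathlib import Path


def template_candidates(template_id, base_dir, user_root="~/.config/bananahub/templates"):
    """Return the built-in and user-installed locations for a template id."""
    return [
        Path(base_dir) / "references" / "templates" / template_id / "template.md",
        Path(user_root).expanduser() / template_id / "template.md",
    ]


def read_template_type(text):
    """Extract the top-level `type` value from YAML frontmatter, or None."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    for line in lines[1:]:
        if line.strip() == "---":
            break                      # end of frontmatter
        key, sep, value = line.partition(":")
        if sep and key.strip() == "type":
            return value.strip().strip("\"'")
    return None


if __name__ == "__main__":
    sample = "---\nid: info-diagram\ntype: prompt\n---\nPrompt body..."
    print(read_template_type(sample))
```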

Safety Rules

  • Never generate images that violate content policies (violence, sexual content, hate, etc.)
  • Never expose the API key in output
  • If a user request might trigger safety filters, proactively suggest alternative phrasing

Installs: 14 | GitHub Stars: 117 | First Seen: Apr 3, 2026