paperbanana

Use this skill for PaperBanana operations in the repository at:

$PAPERBANANA_HOME  # default: ~/dev/paperbanana

Core Tasks

  1. Generate methodology diagrams from context text + caption
  2. Generate statistical plots from CSV/JSON + intent
  3. Evaluate generated diagrams vs reference images
  4. Run MCP server for agent tool integration
  5. Diagnose provider/key/quota/timeout failures

Preflight Checklist

Run these checks before generation:

paperbanana --help
paperbanana-mcp --help
echo "OPENROUTER_API_KEY set?"; [ -n "$OPENROUTER_API_KEY" ] && echo yes || echo no

Run scripts/probe_openrouter_models.py in this skill to verify model availability before long runs.
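
The manual checks above can be collected into one function. This is a minimal sketch, not part of the skill: the function names (have_cmd, key_is_set, preflight) are hypothetical, and it only checks that the two executables are on PATH with command -v rather than running --help.

```shell
# Return 0 if the named executable is on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Return 0 if OPENROUTER_API_KEY is set and non-empty.
key_is_set() {
  [ -n "${OPENROUTER_API_KEY:-}" ]
}

# Print one status line per check; return 1 if any check failed.
preflight() {
  preflight_failed=0
  for preflight_cmd in paperbanana paperbanana-mcp; do
    if have_cmd "$preflight_cmd"; then
      echo "ok: $preflight_cmd found"
    else
      echo "missing: $preflight_cmd"
      preflight_failed=1
    fi
  done
  if key_is_set; then
    echo "ok: OPENROUTER_API_KEY is set"
  else
    echo "missing: OPENROUTER_API_KEY"
    preflight_failed=1
  fi
  return "$preflight_failed"
}
```

Calling preflight || exit 1 at the top of a wrapper script stops a run before any provider call is made.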

Recommended Workflow

  1. Provider: OpenRouter ($OPENROUTER_API_KEY)
    • VLM planning: google/gemini-3-flash-preview
    • Image generation: google/gemini-3.1-flash-image-preview
  2. Run a minimal smoke test first:

paperbanana generate \
  --input examples/sample_inputs/transformer_method.txt \
  --caption "Overview of a transformer architecture" \
  --iterations 1

  3. Run the full generation after the smoke test passes
  4. Validate the output directory (outputs/run_*/) and final_output.*
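
The validation step can be scripted roughly as follows. This is a sketch under assumptions: the helper names (latest_run_dir, has_final_output) are hypothetical, "latest" is taken to mean the most recently modified outputs/run_*/ directory, and the check only confirms that a final_output.* file exists, not that its content is correct.

```shell
# Print the most recently modified run directory under an outputs root.
# Usage: latest_run_dir [outputs_root]   (defaults to ./outputs)
latest_run_dir() {
  # ls -t sorts newest first; errors are hidden when no run_*/ directory exists.
  # Parsing ls is acceptable here only because run directory names are simple.
  ls -td "${1:-outputs}"/run_*/ 2>/dev/null | head -n 1
}

# Return 0 if the run directory contains a file named final_output.<ext>.
has_final_output() {
  [ -n "$1" ] || return 1
  for f in "$1"/final_output.*; do
    [ -e "$f" ] && return 0
  done
  return 1
}
```

A typical call would be run_dir=$(latest_run_dir outputs) followed by has_final_output "$run_dir" && echo "ok: $run_dir".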

Language Policy (Default Chinese)

  • Unless the user explicitly requests another language, all visible chart/diagram text must be Simplified Chinese.
  • Keep technical identifiers unchanged when needed for accuracy (API paths, table names, error codes, model names).
  • Put a clear language constraint directly in --input context and --caption, e.g.:
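    • 语言要求:图中所有可见文本必须为简体中文(必要的代码标识除外)。 (English: "Language requirement: all visible text in the figure must be Simplified Chinese, except for necessary code identifiers.")
  • If the generated output still contains English labels, rerun with paperbanana generate --continue-run <run_id> --feedback "<中文化改图要求>" --iterations 1, where the feedback placeholder stands for a revision request, written in Chinese, asking for the remaining labels to be localized.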
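
One way to apply the constraint consistently is to append it to a copy of the context file before passing that copy to --input. The sketch below assumes plain-text input; the function name add_language_constraint and the variable ZH_CONSTRAINT are hypothetical, and the sentence is the one quoted in the policy above.

```shell
# The constraint sentence quoted in the language policy.
ZH_CONSTRAINT='语言要求:图中所有可见文本必须为简体中文(必要的代码标识除外)。'

# Write a copy of the input file with the constraint appended as the final line.
# The original file is left unchanged.
# Usage: add_language_constraint <input.txt> <output.txt>
add_language_constraint() {
  cat "$1" > "$2" && printf '\n%s\n' "$ZH_CONSTRAINT" >> "$2"
}
```

For example, add_language_constraint method.txt method_zh.txt produces a file that can be passed as --input method_zh.txt.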

Command Templates

Generate Diagram

paperbanana generate \
  --input <method.txt> \
  --caption "<figure caption>" \
  --vlm-provider openrouter \
  --vlm-model google/gemini-3-flash-preview \
  --image-provider openrouter_imagen \
  --image-model google/gemini-3.1-flash-image-preview \
  --iterations <n>

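Chinese-first template (the caption placeholder stands for a Chinese figure caption that explicitly requires the text in the figure to be Simplified Chinese):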

paperbanana generate \
  --input <method_zh.txt> \
  --caption "<中文图题,明确要求图中文字使用简体中文>" \
  --vlm-provider openrouter \
  --vlm-model google/gemini-3-flash-preview \
  --image-provider openrouter_imagen \
  --image-model google/gemini-3.1-flash-image-preview \
  --iterations 1
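
Because both templates repeat the same provider and model flags, a small wrapper can keep them in one place and make it easier to swap a provider/model pair. This is a sketch only: pb_generate and the variables PB_VLM_MODEL, PB_IMAGE_MODEL, and PB_DRY_RUN are hypothetical names, not options of the paperbanana CLI. The flags themselves are the ones shown in the templates above.

```shell
# Hypothetical override variables; the defaults are the models named in the templates.
: "${PB_VLM_MODEL:=google/gemini-3-flash-preview}"
: "${PB_IMAGE_MODEL:=google/gemini-3.1-flash-image-preview}"

# Usage: pb_generate <input.txt> <caption> [iterations]
# With PB_DRY_RUN=1 the command is printed (without shell quoting) instead of run.
pb_generate() {
  # Rebuild the positional parameters as the full command line.
  set -- paperbanana generate \
    --input "$1" \
    --caption "$2" \
    --vlm-provider openrouter \
    --vlm-model "$PB_VLM_MODEL" \
    --image-provider openrouter_imagen \
    --image-model "$PB_IMAGE_MODEL" \
    --iterations "${3:-1}"
  if [ "${PB_DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

Running PB_DRY_RUN=1 first shows the exact flags that would be sent, which is a cheap check before a long run.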

Continue Existing Run

paperbanana generate --continue --iterations 2
# or
paperbanana generate --continue-run <run_id> --feedback "<change request>" --iterations 2

Generate Plot

paperbanana plot \
  --data <data.csv|data.json> \
  --intent "<plot intent>" \
  --iterations 1

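Chinese-first intent example (the intent reads: "Please generate a statistical plot in Simplified Chinese: the title, axes, legend, and annotations all use Chinese; variable names may stay in English"):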

paperbanana plot \
  --data <data.csv> \
  --intent "请生成简体中文统计图:标题、坐标轴、图例、注释均使用中文(变量名可保留英文)" \
  --iterations 1

Evaluate Diagram

paperbanana evaluate \
  --generated <generated.png> \
  --reference <reference.png> \
  --context <method.txt> \
  --caption "<figure caption>"

MCP Server

paperbanana-mcp

Troubleshooting Rules

  • 429 or rate-limit errors:
  • Long runs with no stdout output are common during large multimodal steps:
    • Planner and Stylist can take >1 minute
    • Visualizer image generation can take >2 minutes
    • Check outputs/<run_id>/run_input.json to confirm the run started
  • If one provider path is unstable, switch provider/model pair and rerun smoke test
  • Prefer small context + --iterations 1 for first pass
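
The run_input.json check above can be automated with a short polling helper, which is useful when a run produces no stdout for a while. This is a sketch: wait_for_run_start is a hypothetical name, and the 60-second default timeout is an arbitrary choice rather than a documented value.

```shell
# Poll until <run_dir>/run_input.json exists or the timeout (seconds) elapses.
# Usage: wait_for_run_start <run_dir> [timeout_seconds]
wait_for_run_start() {
  waited=0
  while [ "$waited" -lt "${2:-60}" ]; do
    [ -f "$1/run_input.json" ] && return 0
    sleep 1
    waited=$((waited + 1))
  done
  # Final check covers a zero timeout and a file that appeared during the last sleep.
  [ -f "$1/run_input.json" ]
}
```

For example, wait_for_run_start outputs/<run_id> 30 || echo "run did not start" distinguishes a run that never started from one that is merely slow.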

For detailed diagnostics and known failure patterns, read:

  • references/troubleshooting.md

For model availability probes, run:

  • scripts/probe_openrouter_models.py