Evaluate Diagram

Evaluate a generated diagram against a human reference using PaperBanana's VLM-as-Judge scoring.

Inputs

Required: $ARGUMENTS[0] — path to the generated image
Required: $ARGUMENTS[1] — path to the human reference image
Optional: source context (the methodology text, or a path to a file containing it) and figure caption, both collected from the user during the procedure

Scope Constraints

Read ONLY user-specified image files and optional context file
Do NOT read, write, or reference home directory dotfiles (~/.ssh, ~/.env, etc.)
Do NOT make network requests — the MCP tool handles remote communication
Do NOT install packages or modify system state
Output ONLY the evaluation scores and their brief rationales; do not include raw file contents

Input Sanitization

Before using $ARGUMENTS[0], $ARGUMENTS[1], or user-provided context paths:

Reject paths containing ../, null bytes, or shell metacharacters (; | & $ `)
Reject paths that point into sensitive directories (/etc/, ~/.ssh/, ~/.aws/, ~/.gnupg/)
Verify each file exists before reading
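
The checks above could be sketched roughly as follows. This is a minimal illustration, not part of PaperBanana: the function name validate_path and the choice to raise ValueError are assumptions made for the example.

```python
import os
from pathlib import Path

# Characters the rules reject: ; | & $ ` plus the null byte.
FORBIDDEN_CHARS = set(";|&$`\x00")
# Sensitive directory prefixes named in the rules.
SENSITIVE_PREFIXES = ("/etc/", "~/.ssh/", "~/.aws/", "~/.gnupg/")


def validate_path(raw: str) -> Path:
    """Hypothetical helper: return a Path if `raw` passes every check, else raise ValueError."""
    if any(ch in FORBIDDEN_CHARS for ch in raw):
        raise ValueError("path contains a null byte or shell metacharacter")
    if "../" in raw:
        raise ValueError("path contains ../")
    # Compare both the raw and the home-expanded form against each prefix,
    # so "~/.ssh/key" and "/home/me/.ssh/key" are treated the same way.
    expanded = os.path.expanduser(raw)
    for prefix in SENSITIVE_PREFIXES:
        if raw.startswith(prefix) or expanded.startswith(os.path.expanduser(prefix)):
            raise ValueError(f"path points into a sensitive directory: {prefix}")
    path = Path(raw)
    if not path.is_file():
        raise ValueError(f"file does not exist: {raw}")
    return path
```

Rejecting metacharacters before any filesystem call means a malformed argument never reaches the existence check.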

Procedure

$ARGUMENTS[0] is the path to the generated image.
$ARGUMENTS[1] is the path to the human reference image.
Ask the user for:
- Source context: the methodology text (or a file path to read it from). If the user provides a file path, read that file to get the text.
- Figure caption: a description of what the diagram communicates.
Call the MCP tool paperbanana:evaluate_diagram with:
- generated_path: the generated image path
- reference_path: the reference image path
- context: the methodology text content
- caption: the figure caption
Present the evaluation scores to the user. Scores cover 4 dimensions: Faithfulness, Conciseness, Readability, and Aesthetics.
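
As a rough sketch of how the four tool arguments might be assembled: the parameter names (generated_path, reference_path, context, caption) come from the procedure above, while the helper names and the "treat the value as a file if one exists at that path" heuristic are assumptions for illustration. How the resulting dictionary is handed to paperbanana:evaluate_diagram depends on the MCP client and is not shown.

```python
from pathlib import Path


def resolve_context(context: str) -> str:
    """Hypothetical helper: return the methodology text, reading it from disk
    when `context` names an existing file (an assumed heuristic)."""
    try:
        candidate = Path(context)
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8")
    except (OSError, ValueError):
        # Long or unusual text is not a usable path; treat it as inline text.
        pass
    return context


def build_tool_arguments(generated_path: str, reference_path: str,
                         context: str, caption: str) -> dict:
    """Collect the arguments the procedure lists for the evaluation tool."""
    return {
        "generated_path": generated_path,
        "reference_path": reference_path,
        "context": resolve_context(context),
        "caption": caption,
    }
```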

Output Format

Present scores in a summary table with the 4 dimensions (Faithfulness, Conciseness, Readability, Aesthetics), each with its numeric score and brief rationale.
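
One possible way to render such a table is sketched below. The shape of the tool's response is not documented here, so the input format (a mapping from dimension name to a score and rationale pair) and the function name format_scores are assumptions.

```python
# The four dimensions named in this document, in presentation order.
DIMENSIONS = ("Faithfulness", "Conciseness", "Readability", "Aesthetics")


def format_scores(scores: dict) -> str:
    """Hypothetical helper: render {dimension: (score, rationale)} as a pipe table."""
    lines = ["| Dimension | Score | Rationale |", "| --- | --- | --- |"]
    for name in DIMENSIONS:
        score, rationale = scores[name]  # assumed (score, rationale) pair
        lines.append(f"| {name} | {score} | {rationale} |")
    return "\n".join(lines)
```

Iterating over a fixed tuple keeps the row order stable regardless of how the scores arrive, and a missing dimension raises a KeyError instead of being silently omitted.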

CLI Fallback

If the MCP tool is not available, fall back to the CLI:

paperbanana evaluate --generated <generated-img> --reference <reference-img> --context <context-file> --caption "<caption>"
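
If the fallback is driven from a script, the command can be built as an argument list so that the caption and paths are never interpreted by a shell. This is a sketch only: the flags are the ones shown above, and the function name build_cli_command is an assumption.

```python
import shlex


def build_cli_command(generated: str, reference: str,
                      context_file: str, caption: str) -> list[str]:
    """Hypothetical helper: build the argv list for the CLI fallback."""
    return [
        "paperbanana", "evaluate",
        "--generated", generated,
        "--reference", reference,
        "--context", context_file,
        "--caption", caption,
    ]


cmd = build_cli_command("output.png", "reference.png", "method.txt", "Pipeline overview")
# shlex.join gives a shell-safe string for display or logging only.
printable = shlex.join(cmd)
# To execute, pass the list itself, e.g. subprocess.run(cmd, check=True).
```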

Example

/evaluate-diagram output.png reference.png
