evaluate-diagram
Evaluate Diagram
Evaluate a generated diagram against a human reference using PaperBanana's VLM-as-Judge scoring.
Inputs
- $ARGUMENTS[0] (required): path to the generated image
- $ARGUMENTS[1] (required): path to the human reference image
- Optional: user-provided context file path and figure caption (collected in the procedure)
Scope Constraints
- Read ONLY user-specified image files and optional context file
- Do NOT read, write, or reference home directory dotfiles (~/.ssh, ~/.env, etc.)
- Do NOT make network requests — the MCP tool handles remote communication
- Do NOT install packages or modify system state
- Output ONLY evaluation scores — do not include raw file contents
Input Sanitization
Before using $ARGUMENTS[0], $ARGUMENTS[1], or user-provided context paths:
- Reject paths containing ../, null bytes, or shell metacharacters (; | & $ `)
- Reject absolute paths to sensitive directories (/etc/, ~/.ssh/, ~/.aws/, ~/.gnupg/)
- Verify each file exists before reading
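The checks above could be sketched in Python roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function name validate_path and the two constants are hypothetical, and the rejected patterns are taken directly from the list above.

```python
import os

# Shell metacharacters named in the list above: ; | & $ `
SHELL_METACHARACTERS = set(";|&$`")

# Sensitive directories named in the list above; "~" is expanded at check time.
SENSITIVE_DIRS = ("/etc/", "~/.ssh/", "~/.aws/", "~/.gnupg/")


def validate_path(path: str) -> str:
    """Return path unchanged if it passes every check, otherwise raise.

    Hypothetical helper: the name and error messages are assumptions.
    """
    if "\x00" in path:
        raise ValueError("path contains a null byte")
    if "../" in path:
        raise ValueError("path contains '../'")
    if any(ch in SHELL_METACHARACTERS for ch in path):
        raise ValueError("path contains a shell metacharacter")

    # Expand "~" on both sides so "~/.ssh/key" and "/home/me/.ssh/key" both match.
    expanded = os.path.expanduser(path)
    for prefix in SENSITIVE_DIRS:
        if expanded.startswith(os.path.expanduser(prefix)):
            raise ValueError(f"path points into a sensitive directory: {prefix}")

    # Verify the file exists before anything tries to read it.
    if not os.path.isfile(expanded):
        raise FileNotFoundError(path)
    return path
```

The same function would be applied to $ARGUMENTS[0], $ARGUMENTS[1], and any context file path before the first read.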
Procedure
- $ARGUMENTS[0] is the path to the generated image. $ARGUMENTS[1] is the path to the human reference image.
- Ask the user for:
  - Source context: the methodology text (or a file path to read it from). If the user provides a file path, read that file to get the text.
  - Figure caption: a description of what the diagram communicates.
- Call the MCP tool paperbanana:evaluate_diagram with:
  - generated_path: the generated image path
  - reference_path: the reference image path
  - context: the methodology text content
  - caption: the figure caption
- Present the evaluation scores to the user. Scores cover 4 dimensions: Faithfulness, Conciseness, Readability, and Aesthetics.
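As an illustration of the procedure, the four tool arguments might be assembled like this. The helper name build_tool_arguments and the context_is_file flag are hypothetical; only the four argument names come from the procedure above, and how the MCP call itself is issued is left out.

```python
from pathlib import Path


def build_tool_arguments(generated_path, reference_path, context, caption,
                         context_is_file=False):
    """Assemble the named arguments for paperbanana:evaluate_diagram.

    Hypothetical helper: only the four dictionary keys come from the skill text.
    """
    if context_is_file:
        # The user supplied a file path, so read it to get the methodology text.
        context = Path(context).read_text(encoding="utf-8")
    return {
        "generated_path": generated_path,
        "reference_path": reference_path,
        "context": context,
        "caption": caption,
    }
```

Note that context always ends up holding the text itself, never the path, which matches the "methodology text content" wording of the tool call.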
Output Format
Present scores in a summary table with the 4 dimensions (Faithfulness, Conciseness, Readability, Aesthetics), each with its numeric score and brief rationale.
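One way to render such a table as plain text is sketched below. The shape of the scores argument (dimension name mapped to a score and a rationale) is an assumption made for illustration; the actual structure returned by the tool is not specified here.

```python
DIMENSIONS = ("Faithfulness", "Conciseness", "Readability", "Aesthetics")


def format_summary(scores: dict) -> str:
    """Render one row per dimension: name, numeric score, brief rationale.

    Assumed input shape (hypothetical): {dimension: (score, rationale)}.
    """
    rows = [("Dimension", "Score", "Rationale")]
    for name in DIMENSIONS:
        score, rationale = scores[name]
        rows.append((name, str(score), rationale))

    # Pad the first two columns to their widest cell so the columns line up.
    w0 = max(len(r[0]) for r in rows)
    w1 = max(len(r[1]) for r in rows)
    return "\n".join(f"{r[0]:<{w0}}  {r[1]:<{w1}}  {r[2]}" for r in rows)
```

Iterating over a fixed DIMENSIONS tuple keeps the row order stable and raises a KeyError if any of the four dimensions is missing from the result.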
CLI Fallback
If the MCP tool is not available, fall back to the CLI:
paperbanana evaluate --generated <generated-img> --reference <reference-img> --context <context-file> --caption "<caption>"
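If the fallback is driven from a script, the command could be built as an argument list, as in this sketch. The function names are hypothetical; the subcommand and flags are copied from the CLI line above, and nothing about the CLI's output format is assumed.

```python
import subprocess


def build_command(generated, reference, context_file, caption):
    """Build the fallback argv using the flags shown in the CLI line above."""
    return [
        "paperbanana", "evaluate",
        "--generated", generated,
        "--reference", reference,
        "--context", context_file,
        "--caption", caption,
    ]


def run_fallback(generated, reference, context_file, caption):
    # Passing a list (no shell=True) keeps the caption as a single argument
    # and means no shell ever interprets the paths or the caption.
    cmd = build_command(generated, reference, context_file, caption)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Building the argv as a list complements the input sanitization rules: even a caption containing quotes or spaces reaches the CLI unchanged.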
Example
/evaluate-diagram output.png reference.png