figure-results-review

Installation
SKILL.md

Figure Results Review

Audit figures, plots, captions, and result narratives before they become paper evidence or meeting material.

Use this skill when:

  • the user has a figure, plot, result screenshot, figure caption, result section, or slide with experimental evidence
  • the paper stores each figure as a rendered asset such as figures/fig_name.pdf or figures/fig_name.png plus a LaTeX wrapper such as figures/fig_name.tex
  • a paper claim needs to be checked against the actual displayed evidence
  • a plot may be missing baselines, error bars, seeds, labels, units, or fairness context
  • paper figures need a consistent visual style: color palette, markers, symbols, line widths, fonts, sizing, and notation
  • new results require deciding whether to update writing, rerun experiments, diagnose failures, or narrow claims
  • a rebuttal needs a concise visual answer
  • an advisor meeting needs figures that make the decision obvious

Do not use this skill to design experiments from scratch. Use experiment-design-planner before results exist. Use result-diagnosis when the primary issue is why a result is surprising or broken. Use conference-writing-adapter when the main task is prose style after the evidence is already accepted.

Pair this skill with:

  • paper-result-asset-builder when a paper-facing figure needs to be generated or regenerated from CSV result files before visual review
  • paper-evidence-gap-miner when the figure review reveals a missing result and existing CSVs may already contain the needed evidence
  • paper-evidence-board when figures must be linked to paper claims, sections, reviewer risks, and actions
  • result-diagnosis when a plotted result is suspicious, unstable, negative, or contradictory
  • baseline-selection-audit when the visual exposes missing, weak, or unfair baselines
  • experiment-design-planner when the fix requires new experiments, ablations, controls, or metrics
  • experiment-report-writer when raw results need a structured report before figure review
  • conference-writing-adapter when the final figure narrative or visual style must be adapted to a target venue
  • research-project-memory when claim/evidence/provenance/risk/action/handoff updates should persist across sessions

Skill Directory Layout

<installed-skill-dir>/
├── SKILL.md
├── templates/
│   └── visual-style.md
└── references/
    ├── caption-and-narrative.md
    ├── claim-support.md
    ├── memory-writeback.md
    ├── paper-visual-style.md
    ├── report-template.md
    ├── statistical-evidence.md
    ├── style-memory.md
    └── visual-integrity.md

Progressive Loading

  • Always read references/claim-support.md, references/visual-integrity.md, and references/statistical-evidence.md.
  • Read references/paper-visual-style.md and references/style-memory.md when figures are intended for a paper, slide deck, rebuttal, camera-ready, or venue-specific rewrite.
  • Use templates/visual-style.md when initializing paper/.agent/visual-style.md.
  • Read references/caption-and-narrative.md when revising captions, result prose, slide text, or paper figure callouts.
  • Read references/report-template.md before writing the final review.
  • Read references/memory-writeback.md when the project has memory/, component .agent/ folders, or the user asks for persistent project memory.
  • If the expected plotting conventions depend on a target venue, benchmark, or recent paper style, verify with current accepted papers, official benchmark protocols, or user-provided exemplars.
  • If the actual image cannot be inspected, audit the provided data/caption/prose and clearly mark visual-layout judgments as unverified.

Core Principles

  • A figure is evidence for a specific claim, not decoration.
  • A figure bundle has separate layers: rendered asset, LaTeX wrapper, visual description, caption, main-text callout, and provenance.
  • Figure description and caption are different artifacts. Describe what the image shows before interpreting why it matters.
  • Every plot should answer one reviewer question.
  • The main comparison should be visually and numerically easy to find.
  • Captions must state enough setup for the result to be interpreted without searching the paper.
  • Statistical uncertainty, seeds, and variance matter when the claim depends on small differences.
  • Compute, data, baseline, and protocol fairness must be visible when they affect interpretation.
  • Paper figures should share a deliberate visual language. Style choices are part of writing because they control what reviewers notice first.
  • Visual style should evolve as memory: record one-off lessons, promote repeated preferences into project contracts, and only then generalize them into reusable skill rules.
  • A beautiful plot that does not support the claim should be revised or cut.
  • New results must update claims, writing, reviewer risks, and next actions.
  • When a figure causes paper layout trouble, first localize the affected page and figure wrapper. Prefer local wrapper/prose/placement fixes over global float spacing or paragraph settings; route broader submission-layout debugging to submit-paper.

Step 1 - Recover Evidence Context

Collect:

  • figure file path, screenshot, raw data, caption, or result prose
  • rendered asset path such as figures/fig_name.pdf or figures/fig_name.png
  • LaTeX wrapper path such as figures/fig_name.tex, if the paper uses wrapper files
  • paper claim or section the result is meant to support
  • experiment setup: dataset, model, baseline, metric, seed, split, hyperparameters, protocol
  • plotting setup: plotted variables, filters, smoothing, transforms, axis ranges, colormap, annotation rules, aggregation, and code/config path when available
  • target audience: paper, advisor meeting, slide, rebuttal, internal report, or appendix
  • target venue or benchmark expectations
  • current paper location, if any
  • linked project memory IDs such as CLM-###, EVD-###, FIG-###, TAB-###, RSK-###, or ACT-###

Rewrite the intended evidence relation:

This figure is supposed to show that [claim] because [metric/comparison/trend] under [setup].

If that sentence cannot be written, route to paper-evidence-board before polishing the visual.

Step 2 - Resolve the Figure Bundle

For paper figures, identify the bundle by shared stem:

figures/fig_name.pdf or figures/fig_name.png  # rendered asset
figures/fig_name.tex                          # LaTeX wrapper

Inspect both layers when available:

  • rendered asset: what the reader visually sees
  • wrapper .tex: \includegraphics, width, placement, \caption{}, \label{}, subfigure layout, notes, and whether the asset filename matches the intended figure
  • nearby paper source: where the wrapper is \input{} or \include{} and how the main text calls it out

If a wrapper exists without a matching asset, or an asset exists without a wrapper, flag the bundle as incomplete. If multiple asset formats exist, identify which one the wrapper includes.

Step 3 - Describe Before Interpreting

For each figure, produce a visual description before writing or judging the caption.

The visual description should state:

  • figure type and panel structure
  • axes, units, scales, and directionality
  • methods, datasets, variables, colors, markers, and legends
  • visible trends, comparisons, outliers, missing values, and uncertainty
  • plotting parameters needed to reproduce what is shown
  • experiment parameters needed to interpret the result: dataset split, model/checkpoint, baselines, metric, seed count, sampling budget, hyperparameters, compute budget, or protocol
  • what is directly visible versus inferred from logs, configs, filename, caption, or user statement

Do not put the full visual description into the paper caption. Use it as the audit record that checks whether the caption and paper prose are faithful to the figure.

Step 4 - Audit Claim Support

Read references/claim-support.md.

For each figure, answer:

  • what exact claim does it support?
  • is the displayed evidence sufficient for that claim?
  • is the claim too broad for the measured setup?
  • are baselines, ablations, controls, or diagnostics missing?
  • does the result contradict another figure or section?
  • is the result main-paper material, appendix material, diagnostic material, or not ready?

Assign one status:

  • supports-claim
  • supports-narrower-claim
  • ambiguous
  • contradicts-claim
  • diagnostic-only
  • not-ready

Step 5 - Audit Visual Integrity

Read references/visual-integrity.md.

Check:

  • axes, labels, units, scales, and transformations
  • legend readability and method names
  • ordering of methods, datasets, metrics, and ablations
  • whether the main result is visually salient
  • whether color, markers, line styles, or hatching remain readable in grayscale
  • whether figure size works for one-column, two-column, slide, or appendix usage
  • whether captions and labels match the actual plotted data
  • whether figure wrapper width, cropping, subfigure order, and labels match the rendered asset and visual description
  • whether the wrapper creates local whitespace, page-bottom instability, or fragile wrap behavior that should be fixed near this figure rather than by global LaTeX tuning

Flag any issue that could cause a reviewer to misread the result.

Step 6 - Audit Paper Visual Style

Read references/paper-visual-style.md and references/style-memory.md when the output is paper-facing.

Check:

  • color palette and colorblind/grayscale robustness
  • stable method-to-color and method-to-marker mapping across all figures
  • line width, marker size, hatch, symbol, and notation consistency
  • font size, tick density, label length, and final-column readability
  • figure dimensions for one-column, two-column, appendix, or slide use
  • whether visual emphasis matches the paper's claim hierarchy
  • whether the main method is recognizable without relying only on color
  • whether theorem/method symbols in plots match the paper notation

If the paper has no visual style policy, propose one from templates/visual-style.md and record it in paper/.agent/visual-style.md or .agent/conference-writing/project-style.md when appropriate.

For typography and final sizing, check the contract rather than the notebook preview:

  • final LaTeX insertion width, such as \columnwidth, \linewidth, or \textwidth
  • generated figure size in inches
  • axis label, tick label, legend, annotation, and panel-label sizes in points
  • whether LaTeX scales the asset enough to distort the intended typography hierarchy
  • whether code-side plot_style.yaml and paper-side visual-style.md agree when both exist

If a style issue is discovered, classify it as lesson, preference, project contract, or reusable skill rule candidate before writing it back.

Step 7 - Audit Statistical and Experimental Evidence

Read references/statistical-evidence.md.

Check:

  • number of seeds or repeated runs
  • variance, confidence intervals, standard deviation, or standard error
  • significance or effect-size interpretation when differences are small
  • data split and leakage risk
  • metric definition and averaging
  • baseline fairness and tuning budget
  • compute or speed reporting when efficiency is part of the claim
  • failure cases or negative results that should be shown
  • whether plotting parameters and experiment parameters are recoverable from the figure wrapper, caption, result report, logs, config, or paper text

If the plot lacks necessary uncertainty, decide whether to rerun, add error bars, weaken the claim, or move the result to appendix/diagnostic status.

Step 8 - Review Caption and Result Narrative

Read references/caption-and-narrative.md when output text needs revision.

For each figure, produce:

  • visual description
  • provenance summary: rendered asset, wrapper .tex, plotting parameters, experiment parameters, and source certainty
  • caption-image alignment diagnosis
  • caption diagnosis
  • revised caption or caption outline
  • one-sentence paper callout
  • claims to avoid in nearby prose
  • reviewer question answered
  • missing setup details to add

Captions should not oversell. They should state the setup, comparison, metric, and takeaway.

Step 9 - Route Fixes

For every issue, route to one or more actions:

  • fix-wrapper: wrong asset path, stale caption, label mismatch, width/crop/layout issue, or missing subfigure mapping in figures/*.tex
  • edit-figure: labels, ordering, scale, legend, layout, or visual emphasis
  • rewrite-caption: setup, metric, takeaway, caveat, or claim alignment
  • write-description: missing visual description or missing provenance record
  • rewrite-results-text: nearby paper prose overclaims or misses the takeaway
  • define-visual-style: missing or inconsistent paper visual style policy
  • record-style-lesson: new typography, sizing, legend, marker, color, export, or wrapper lesson should be appended to style memory before becoming a hard rule
  • restyle-figure: color, marker, line width, font size, symbol, panel layout, or emphasis
  • build-result-asset: raw CSV evidence exists but the paper-facing figure or wrapper needs to be generated with provenance
  • mine-existing-results: the figure lacks evidence that may already exist in CSVs or reports
  • rerun: missing seeds, variance, baseline, metric, or protocol after existing results are checked
  • diagnose-result: suspicious, negative, unstable, or contradictory pattern
  • baseline-audit: missing or unfair baseline
  • narrow-claim: evidence only supports a smaller statement
  • move-to-appendix: useful but not central enough for main paper
  • cut: visual does not support a paper need

Name the next skill when appropriate.

Step 10 - Write the Review Report

Read references/report-template.md.

If saving to a project and no path is given, use:

docs/results/figure_results_review_YYYY-MM-DD_<short-name>.md

The report must include:

  • figure inventory
  • figure bundle map: rendered asset, wrapper .tex, paper callout location, label
  • visual descriptions
  • plotting and experiment parameter provenance
  • claim-support status
  • visual integrity issues
  • visual style policy and consistency issues
  • new style lessons or preference-to-contract promotions
  • statistical evidence issues
  • caption and narrative fixes
  • reviewer-risk forecast
  • routed actions and next skills
  • memory update section

Step 11 - Write Back to Project Memory

Read references/memory-writeback.md when memory exists.

Update the smallest useful set of entries:

  • memory/evidence-board.md: figure evidence status, rendered asset, wrapper .tex, setup, plotting parameters, experiment parameters, and linked claims
  • memory/claim-board.md: claims supported, narrowed, contradicted, or not ready
  • memory/risk-board.md: reviewer risks from visual ambiguity, missing uncertainty, weak baselines, or overclaiming
  • memory/action-board.md: figure edits, reruns, caption fixes, result diagnosis, or claim revisions
  • paper/.agent/: figure map, asset/wrapper pairings, paper locations, visual descriptions, caption state, provenance gaps, and stale visual warnings
  • paper/.agent/visual-style.md or paper/.agent/style-lessons.md: style lessons, preferences, and project contracts for typography, sizing, encodings, exports, and wrapper behavior
  • .agent/conference-writing/project-style.md: venue-facing figure style decisions when conference adaptation is active
  • worktree .agent/worktree-status.md: result-generation or plotting tasks and exit conditions

Use certainty labels:

  • verified for values checked against raw data, logs, or source figures
  • user-stated for user-supplied context
  • inferred for reviewer-risk and narrative judgments
  • unverified for visual or statistical claims that could not be inspected

Final Sanity Check

Before finalizing:

  • every figure has a linked claim and reviewer question
  • every paper figure has a resolved asset/wrapper bundle when the project uses figures/*.tex
  • every reviewed figure has a visual description separate from its caption
  • plotting parameters and experiment parameters are recorded or explicitly marked unknown
  • main comparison is easy to find
  • axes, units, legends, captions, and labels are unambiguous
  • colors, markers, fonts, symbols, and figure sizes are consistent across the paper
  • style lessons are recorded at the right promotion level rather than silently becoming broad rules
  • uncertainty is present or the lack of uncertainty is justified
  • baseline and compute fairness are visible when relevant
  • overclaims are narrowed
  • fixes are routed to concrete next actions or skills
  • project memory is updated when present
Related skills
Installs
28
GitHub Stars
4
First Seen
8 days ago