figure-results-review
Figure Results Review
Audit figures, plots, captions, and result narratives before they become paper evidence or meeting material.
Use this skill when:
- the user has a figure, plot, result screenshot, figure caption, result section, or slide with experimental evidence
- the paper stores each figure as a rendered asset such as `figures/fig_name.pdf` or `figures/fig_name.png` plus a LaTeX wrapper such as `figures/fig_name.tex`
- a paper claim needs to be checked against the actual displayed evidence
- a plot may be missing baselines, error bars, seeds, labels, units, or fairness context
- paper figures need a consistent visual style: color palette, markers, symbols, line widths, fonts, sizing, and notation
- new results require deciding whether to update writing, rerun experiments, diagnose failures, or narrow claims
- a rebuttal needs a concise visual answer
- an advisor meeting needs figures that make the decision obvious
Do not use this skill to design experiments from scratch. Use experiment-design-planner before results exist. Use result-diagnosis when the primary issue is why a result is surprising or broken. Use conference-writing-adapter when the main task is prose style after the evidence is already accepted.
Pair this skill with:
- `paper-result-asset-builder` when a paper-facing figure needs to be generated or regenerated from CSV result files before visual review
- `paper-evidence-gap-miner` when the figure review reveals a missing result and existing CSVs may already contain the needed evidence
- `paper-evidence-board` when figures must be linked to paper claims, sections, reviewer risks, and actions
- `result-diagnosis` when a plotted result is suspicious, unstable, negative, or contradictory
- `baseline-selection-audit` when the visual exposes missing, weak, or unfair baselines
- `experiment-design-planner` when the fix requires new experiments, ablations, controls, or metrics
- `experiment-report-writer` when raw results need a structured report before figure review
- `conference-writing-adapter` when the final figure narrative or visual style must be adapted to a target venue
- `research-project-memory` when claim/evidence/provenance/risk/action/handoff updates should persist across sessions
Skill Directory Layout
<installed-skill-dir>/
├── SKILL.md
├── templates/
│ └── visual-style.md
└── references/
├── caption-and-narrative.md
├── claim-support.md
├── memory-writeback.md
├── paper-visual-style.md
├── report-template.md
├── statistical-evidence.md
├── style-memory.md
└── visual-integrity.md
Progressive Loading
- Always read `references/claim-support.md`, `references/visual-integrity.md`, and `references/statistical-evidence.md`.
- Read `references/paper-visual-style.md` and `references/style-memory.md` when figures are intended for a paper, slide deck, rebuttal, camera-ready, or venue-specific rewrite.
- Use `templates/visual-style.md` when initializing `paper/.agent/visual-style.md`.
- Read `references/caption-and-narrative.md` when revising captions, result prose, slide text, or paper figure callouts.
- Read `references/report-template.md` before writing the final review.
- Read `references/memory-writeback.md` when the project has `memory/`, component `.agent/` folders, or the user asks for persistent project memory.
- If the expected plotting conventions depend on a target venue, benchmark, or recent paper style, verify with current accepted papers, official benchmark protocols, or user-provided exemplars.
- If the actual image cannot be inspected, audit the provided data/caption/prose and clearly mark visual-layout judgments as unverified.
Core Principles
- A figure is evidence for a specific claim, not decoration.
- A figure bundle has separate layers: rendered asset, LaTeX wrapper, visual description, caption, main-text callout, and provenance.
- Figure description and caption are different artifacts. Describe what the image shows before interpreting why it matters.
- Every plot should answer one reviewer question.
- The main comparison should be visually and numerically easy to find.
- Captions must state enough setup for the result to be interpreted without searching the paper.
- Statistical uncertainty, seeds, and variance matter when the claim depends on small differences.
- Compute, data, baseline, and protocol fairness must be visible when they affect interpretation.
- Paper figures should share a deliberate visual language. Style choices are part of writing because they control what reviewers notice first.
- Visual style should evolve as memory: record one-off lessons, promote repeated preferences into project contracts, and only then generalize them into reusable skill rules.
- A beautiful plot that does not support the claim should be revised or cut.
- New results must update claims, writing, reviewer risks, and next actions.
- When a figure causes paper layout trouble, first localize the affected page and figure wrapper. Prefer local wrapper/prose/placement fixes over global float spacing or paragraph settings; route broader submission-layout debugging to `submit-paper`.
Step 1 - Recover Evidence Context
Collect:
- figure file path, screenshot, raw data, caption, or result prose
- rendered asset path such as `figures/fig_name.pdf` or `figures/fig_name.png`
- LaTeX wrapper path such as `figures/fig_name.tex`, if the paper uses wrapper files
- paper claim or section the result is meant to support
- experiment setup: dataset, model, baseline, metric, seed, split, hyperparameters, protocol
- plotting setup: plotted variables, filters, smoothing, transforms, axis ranges, colormap, annotation rules, aggregation, and code/config path when available
- target audience: paper, advisor meeting, slide, rebuttal, internal report, or appendix
- target venue or benchmark expectations
- current paper location, if any
- linked project memory IDs such as `CLM-###`, `EVD-###`, `FIG-###`, `TAB-###`, `RSK-###`, or `ACT-###`
Rewrite the intended evidence relation:
This figure is supposed to show that [claim] because [metric/comparison/trend] under [setup].
If that sentence cannot be written, route to paper-evidence-board before polishing the visual.
Step 2 - Resolve the Figure Bundle
For paper figures, identify the bundle by shared stem:
```
figures/fig_name.pdf or figures/fig_name.png   # rendered asset
figures/fig_name.tex                           # LaTeX wrapper
```
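A wrapper of this kind commonly looks like the sketch below. The width, placement option, caption text, and label are illustrative placeholders, not a required project convention:

```latex
% figures/fig_name.tex -- hypothetical wrapper sketch
\begin{figure}[t]
  \centering
  \includegraphics[width=\columnwidth]{figures/fig_name.pdf}
  \caption{One-sentence setup, comparison, metric, and takeaway.}
  \label{fig:fig_name}
\end{figure}
```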
Inspect both layers when available:
- rendered asset: what the reader visually sees
- wrapper `.tex`: `\includegraphics`, width, placement, `\caption{}`, `\label{}`, subfigure layout, notes, and whether the asset filename matches the intended figure
- nearby paper source: where the wrapper is `\input{}` or `\include{}` and how the main text calls it out
If a wrapper exists without a matching asset, or an asset exists without a wrapper, flag the bundle as incomplete. If multiple asset formats exist, identify which one the wrapper includes.
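The stem-matching check above can be sketched as a small script. This is a minimal illustration, assuming the flat `figures/` layout described here; real projects may nest subdirectories or use other asset formats:

```python
from pathlib import Path

def resolve_figure_bundles(fig_dir="figures"):
    """Pair rendered assets (.pdf/.png) with .tex wrappers by shared
    stem and flag bundles missing either layer."""
    assets, wrappers = {}, set()
    for p in Path(fig_dir).iterdir():
        if p.suffix in {".pdf", ".png"}:
            assets.setdefault(p.stem, []).append(p.suffix)
        elif p.suffix == ".tex":
            wrappers.add(p.stem)
    bundles = {}
    for stem in sorted(set(assets) | wrappers):
        bundles[stem] = {
            "assets": sorted(assets.get(stem, [])),
            "has_wrapper": stem in wrappers,
            "complete": bool(assets.get(stem)) and stem in wrappers,
        }
    return bundles
```

Stems with `complete: False` are the incomplete bundles to flag; stems listing both `.pdf` and `.png` need the wrapper inspected to see which format is actually included.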
Step 3 - Describe Before Interpreting
For each figure, produce a visual description before writing or judging the caption.
The visual description should state:
- figure type and panel structure
- axes, units, scales, and directionality
- methods, datasets, variables, colors, markers, and legends
- visible trends, comparisons, outliers, missing values, and uncertainty
- plotting parameters needed to reproduce what is shown
- experiment parameters needed to interpret the result: dataset split, model/checkpoint, baselines, metric, seed count, sampling budget, hyperparameters, compute budget, or protocol
- what is directly visible versus inferred from logs, configs, filename, caption, or user statement
Do not put the full visual description into the paper caption. Use it as the audit record that checks whether the caption and paper prose are faithful to the figure.
Step 4 - Audit Claim Support
Read references/claim-support.md.
For each figure, answer:
- what exact claim does it support?
- is the displayed evidence sufficient for that claim?
- is the claim too broad for the measured setup?
- are baselines, ablations, controls, or diagnostics missing?
- does the result contradict another figure or section?
- is the result main-paper material, appendix material, diagnostic material, or not ready?
Assign one status:
`supports-claim`, `supports-narrower-claim`, `ambiguous`, `contradicts-claim`, `diagnostic-only`, `not-ready`
Step 5 - Audit Visual Integrity
Read references/visual-integrity.md.
Check:
- axes, labels, units, scales, and transformations
- legend readability and method names
- ordering of methods, datasets, metrics, and ablations
- whether the main result is visually salient
- whether color, markers, line styles, or hatching remain readable in grayscale
- whether figure size works for one-column, two-column, slide, or appendix usage
- whether captions and labels match the actual plotted data
- whether figure wrapper width, cropping, subfigure order, and labels match the rendered asset and visual description
- whether the wrapper creates local whitespace, page-bottom instability, or fragile wrap behavior that should be fixed near this figure rather than by global LaTeX tuning
Flag any issue that could cause a reviewer to misread the result.
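The grayscale-readability check can be approximated numerically: convert each palette color to its perceived luminance and flag pairs that land too close together. A minimal sketch, with the Rec. 601 weights and the `min_gap` threshold as assumptions rather than a standard:

```python
def hex_luminance(hex_color):
    """Approximate perceived luminance (0-255) of a #RRGGBB color
    using Rec. 601 grayscale weights."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b

def grayscale_collisions(palette, min_gap=40):
    """Return adjacent palette pairs whose luminance gap is below
    min_gap, i.e. series likely to merge when printed in grayscale."""
    lum = {c: hex_luminance(c) for c in palette}
    ordered = sorted(palette, key=lum.get)
    return [(a, b) for a, b in zip(ordered, ordered[1:])
            if lum[b] - lum[a] < min_gap]
```

A collision does not force a palette change on its own; distinct markers or line styles can carry the distinction instead, which is why the checklist also asks about markers and hatching.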
Step 6 - Audit Paper Visual Style
Read references/paper-visual-style.md and references/style-memory.md when the output is paper-facing.
Check:
- color palette and colorblind/grayscale robustness
- stable method-to-color and method-to-marker mapping across all figures
- line width, marker size, hatch, symbol, and notation consistency
- font size, tick density, label length, and final-column readability
- figure dimensions for one-column, two-column, appendix, or slide use
- whether visual emphasis matches the paper's claim hierarchy
- whether the main method is recognizable without relying only on color
- whether theorem/method symbols in plots match the paper notation
If the paper has no visual style policy, propose one from templates/visual-style.md and record it in paper/.agent/visual-style.md or .agent/conference-writing/project-style.md when appropriate.
For typography and final sizing, check the contract rather than the notebook preview:
- final LaTeX insertion width, such as `\columnwidth`, `\linewidth`, or `\textwidth`
- generated figure size in inches
- axis label, tick label, legend, annotation, and panel-label sizes in points
- whether LaTeX scales the asset enough to distort the intended typography hierarchy
- whether code-side `plot_style.yaml` and paper-side `visual-style.md` agree when both exist
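The contract between insertion width and generated size reduces to a unit conversion: if the figure is generated at exactly its final LaTeX width, point-sized fonts survive unscaled. A small sketch, assuming TeX points (72.27 pt per inch); the example width of 234.8775 pt (3.25 in) is a typical two-column value, not a universal one:

```python
TEX_PT_PER_INCH = 72.27  # TeX "pt" units per inch

def fig_size_inches(insert_width_pt, fraction=1.0, aspect=0.62):
    """(width, height) in inches so the asset is generated at its
    final LaTeX insertion width and fonts keep their point sizes."""
    width = fraction * insert_width_pt / TEX_PT_PER_INCH
    return (round(width, 3), round(width * aspect, 3))
```

Plugging the resulting tuple into the plotting library's figure-size setting, with label and tick sizes specified directly in points, keeps the notebook preview and the typeset column consistent.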
If a style issue is discovered, classify it as lesson, preference, project contract, or reusable skill rule candidate before writing it back.
Step 7 - Audit Statistical and Experimental Evidence
Read references/statistical-evidence.md.
Check:
- number of seeds or repeated runs
- variance, confidence intervals, standard deviation, or standard error
- significance or effect-size interpretation when differences are small
- data split and leakage risk
- metric definition and averaging
- baseline fairness and tuning budget
- compute or speed reporting when efficiency is part of the claim
- failure cases or negative results that should be shown
- whether plotting parameters and experiment parameters are recoverable from the figure wrapper, caption, result report, logs, config, or paper text
If the plot lacks necessary uncertainty, decide whether to rerun, add error bars, weaken the claim, or move the result to appendix/diagnostic status.
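When seed-level values are available, the uncertainty summary behind an error bar is a short computation. A minimal sketch using the normal approximation; with very few seeds, a t-interval is more defensible, and which interval the paper reports is a project decision:

```python
import math

def summarize_runs(values, z=1.96):
    """Mean, standard error of the mean, and normal-approximation
    95% CI half-width across repeated runs (seeds)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    sem = math.sqrt(var / n)
    return {"n": n, "mean": mean, "sem": sem, "ci95": z * sem}
```

If the gap between two methods is smaller than the combined half-widths, that is a signal to rerun with more seeds, weaken the claim, or demote the figure, exactly as the routing below describes.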
Step 8 - Review Caption and Result Narrative
Read references/caption-and-narrative.md when output text needs revision.
For each figure, produce:
- visual description
- provenance summary: rendered asset, wrapper `.tex`, plotting parameters, experiment parameters, and source certainty
- caption-image alignment diagnosis
- caption diagnosis
- revised caption or caption outline
- one-sentence paper callout
- claims to avoid in nearby prose
- reviewer question answered
- missing setup details to add
Captions should not oversell. They should state the setup, comparison, metric, and takeaway.
Step 9 - Route Fixes
For every issue, route to one or more actions:
- `fix-wrapper`: wrong asset path, stale caption, label mismatch, width/crop/layout issue, or missing subfigure mapping in `figures/*.tex`
- `edit-figure`: labels, ordering, scale, legend, layout, or visual emphasis
- `rewrite-caption`: setup, metric, takeaway, caveat, or claim alignment
- `write-description`: missing visual description or missing provenance record
- `rewrite-results-text`: nearby paper prose overclaims or misses the takeaway
- `define-visual-style`: missing or inconsistent paper visual style policy
- `record-style-lesson`: new typography, sizing, legend, marker, color, export, or wrapper lesson should be appended to style memory before becoming a hard rule
- `restyle-figure`: color, marker, line width, font size, symbol, panel layout, or emphasis
- `build-result-asset`: raw CSV evidence exists but the paper-facing figure or wrapper needs to be generated with provenance
- `mine-existing-results`: the figure lacks evidence that may already exist in CSVs or reports
- `rerun`: missing seeds, variance, baseline, metric, or protocol after existing results are checked
- `diagnose-result`: suspicious, negative, unstable, or contradictory pattern
- `baseline-audit`: missing or unfair baseline
- `narrow-claim`: evidence only supports a smaller statement
- `move-to-appendix`: useful but not central enough for main paper
- `cut`: visual does not support a paper need
Name the next skill when appropriate.
Step 10 - Write the Review Report
Read references/report-template.md.
If saving to a project and no path is given, use:
docs/results/figure_results_review_YYYY-MM-DD_<short-name>.md
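The default path convention can be sketched as a tiny helper; the slugging rule for `<short-name>` is an assumption for illustration:

```python
import datetime
import re

def review_report_path(short_name, date=None):
    """Default save path for a figure results review report."""
    date = date or datetime.date.today().isoformat()  # YYYY-MM-DD
    slug = re.sub(r"[^a-z0-9]+", "-", short_name.lower()).strip("-")
    return f"docs/results/figure_results_review_{date}_{slug}.md"
```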
The report must include:
- figure inventory
- figure bundle map: rendered asset, wrapper `.tex`, paper callout location, label
- visual descriptions
- plotting and experiment parameter provenance
- claim-support status
- visual integrity issues
- visual style policy and consistency issues
- new style lessons or preference-to-contract promotions
- statistical evidence issues
- caption and narrative fixes
- reviewer-risk forecast
- routed actions and next skills
- memory update section
Step 11 - Write Back to Project Memory
Read references/memory-writeback.md when memory exists.
Update the smallest useful set of entries:
- `memory/evidence-board.md`: figure evidence status, rendered asset, wrapper `.tex`, setup, plotting parameters, experiment parameters, and linked claims
- `memory/claim-board.md`: claims supported, narrowed, contradicted, or not ready
- `memory/risk-board.md`: reviewer risks from visual ambiguity, missing uncertainty, weak baselines, or overclaiming
- `memory/action-board.md`: figure edits, reruns, caption fixes, result diagnosis, or claim revisions
- `paper/.agent/`: figure map, asset/wrapper pairings, paper locations, visual descriptions, caption state, provenance gaps, and stale visual warnings
- `paper/.agent/visual-style.md` or `paper/.agent/style-lessons.md`: style lessons, preferences, and project contracts for typography, sizing, encodings, exports, and wrapper behavior
- `.agent/conference-writing/project-style.md`: venue-facing figure style decisions when conference adaptation is active
- worktree `.agent/worktree-status.md`: result-generation or plotting tasks and exit conditions
Use certainty labels:
- `verified` for values checked against raw data, logs, or source figures
- `user-stated` for user-supplied context
- `inferred` for reviewer-risk and narrative judgments
- `unverified` for visual or statistical claims that could not be inspected
Final Sanity Check
Before finalizing:
- every figure has a linked claim and reviewer question
- every paper figure has a resolved asset/wrapper bundle when the project uses `figures/*.tex`
- every reviewed figure has a visual description separate from its caption
- plotting parameters and experiment parameters are recorded or explicitly marked unknown
- main comparison is easy to find
- axes, units, legends, captions, and labels are unambiguous
- colors, markers, fonts, symbols, and figure sizes are consistent across the paper
- style lessons are recorded at the right promotion level rather than silently becoming broad rules
- uncertainty is present or the lack of uncertainty is justified
- baseline and compute fairness are visible when relevant
- overclaims are narrowed
- fixes are routed to concrete next actions or skills
- project memory is updated when present