figure-results-review
Figure Results Review
Audit figures, plots, captions, and result narratives before they become paper evidence or meeting material.
Use this skill when:
- the user has a figure, plot, result screenshot, figure caption, result section, or slide with experimental evidence
- the paper stores each figure as a rendered asset such as `figures/fig_name.pdf` or `figures/fig_name.png` plus a LaTeX wrapper such as `figures/fig_name.tex`
- a paper claim needs to be checked against the actual displayed evidence
- a plot may be missing baselines, error bars, seeds, labels, units, or fairness context
- paper figures need a consistent visual style: color palette, markers, symbols, line widths, fonts, sizing, and notation
- new results require deciding whether to update writing, rerun experiments, diagnose failures, or narrow claims
- a rebuttal needs a concise visual answer
- an advisor meeting needs figures that make the decision obvious
Do not use this skill to design experiments from scratch. Use experiment-design-planner before results exist. Use result-diagnosis when the primary issue is why a result is surprising or broken. Use conference-writing-adapter when the main task is prose style after the evidence is already accepted.
Pair this skill with:
- `paper-result-asset-builder` when a paper-facing figure needs to be generated or regenerated from CSV result files before visual review
- `paper-evidence-gap-miner` when the figure review reveals a missing result and existing CSVs may already contain the needed evidence
- `paper-evidence-board` when figures must be linked to paper claims, sections, reviewer risks, and actions
- `result-diagnosis` when a plotted result is suspicious, unstable, negative, or contradictory
- `baseline-selection-audit` when the visual exposes missing, weak, or unfair baselines
- `experiment-design-planner` when the fix requires new experiments, ablations, controls, or metrics
- `experiment-report-writer` when raw results need a structured report before figure review
- `conference-writing-adapter` when the final figure narrative or visual style must be adapted to a target venue
- `research-project-memory` when claim/evidence/provenance/risk/action/handoff updates should persist across sessions
Skill Directory Layout
<installed-skill-dir>/
├── SKILL.md
├── templates/
│ └── visual-style.md
└── references/
├── caption-and-narrative.md
├── claim-support.md
├── memory-writeback.md
├── paper-visual-style.md
├── report-template.md
├── statistical-evidence.md
├── style-memory.md
└── visual-integrity.md
Progressive Loading
- Always read `references/claim-support.md`, `references/visual-integrity.md`, and `references/statistical-evidence.md`.
- Read `references/paper-visual-style.md` and `references/style-memory.md` when figures are intended for a paper, slide deck, rebuttal, camera-ready, or venue-specific rewrite.
- Use `templates/visual-style.md` when initializing `paper/.agent/visual-style.md`.
- Read `references/caption-and-narrative.md` when revising captions, result prose, slide text, or paper figure callouts.
- Read `references/report-template.md` before writing the final review.
- Read `references/memory-writeback.md` when the project has `memory/`, component `.agent/` folders, or the user asks for persistent project memory.
- If the expected plotting conventions depend on a target venue, benchmark, or recent paper style, verify with current accepted papers, official benchmark protocols, or user-provided exemplars.
- If the actual image cannot be inspected, audit the provided data/caption/prose and clearly mark visual-layout judgments as unverified.
Core Principles
- A figure is evidence for a specific claim, not decoration.
- A figure bundle has separate layers: rendered asset, LaTeX wrapper, visual description, caption, main-text callout, and provenance.
- Figure description and caption are different artifacts. Describe what the image shows before interpreting why it matters.
- Every plot should answer one reviewer question.
- The main comparison should be visually and numerically easy to find.
- Captions must state enough setup for the result to be interpreted without searching the paper.
- Statistical uncertainty, seeds, and variance matter when the claim depends on small differences.
- Compute, data, baseline, and protocol fairness must be visible when they affect interpretation.
- Paper figures should share a deliberate visual language. Style choices are part of writing because they control what reviewers notice first.
- Visual style should evolve as memory: record one-off lessons, promote repeated preferences into project contracts, and only then generalize them into reusable skill rules.
- A beautiful plot that does not support the claim should be revised or cut.
- New results must update claims, writing, reviewer risks, and next actions.
- When a figure causes paper layout trouble, first localize the affected page and figure wrapper. Prefer local wrapper/prose/placement fixes over global float spacing or paragraph settings; route broader submission-layout debugging to `submit-paper`.
Step 1 - Recover Evidence Context
Collect:
- figure file path, screenshot, raw data, caption, or result prose
- rendered asset path such as `figures/fig_name.pdf` or `figures/fig_name.png`
- LaTeX wrapper path such as `figures/fig_name.tex`, if the paper uses wrapper files
- paper claim or section the result is meant to support
- experiment setup: dataset, model, baseline, metric, seed, split, hyperparameters, protocol
- plotting setup: plotted variables, filters, smoothing, transforms, axis ranges, colormap, annotation rules, aggregation, and code/config path when available
- target audience: paper, advisor meeting, slide, rebuttal, internal report, or appendix
- target venue or benchmark expectations
- current paper location, if any
- linked project memory IDs such as `CLM-###`, `EVD-###`, `FIG-###`, `TAB-###`, `RSK-###`, or `ACT-###`
Rewrite the intended evidence relation:
This figure is supposed to show that [claim] because [metric/comparison/trend] under [setup].
If that sentence cannot be written, route to paper-evidence-board before polishing the visual.
Step 2 - Resolve the Figure Bundle
For paper figures, identify the bundle by shared stem:
```
figures/fig_name.pdf or figures/fig_name.png   # rendered asset
figures/fig_name.tex                           # LaTeX wrapper
```
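A wrapper of this kind commonly looks like the sketch below. The width, placement option, caption text, and label are illustrative placeholders, not a required project convention:

```latex
% figures/fig_name.tex -- hypothetical wrapper sketch
\begin{figure}[t]
  \centering
  \includegraphics[width=\columnwidth]{figures/fig_name.pdf}
  \caption{One-sentence setup, comparison, metric, and takeaway.}
  \label{fig:fig_name}
\end{figure}
```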
Inspect both layers when available:
- rendered asset: what the reader visually sees
- wrapper `.tex`: `\includegraphics`, width, placement, `\caption{}`, `\label{}`, subfigure layout, notes, and whether the asset filename matches the intended figure
- nearby paper source: where the wrapper is `\input{}` or `\include{}` and how the main text calls it out
If a wrapper exists without a matching asset, or an asset exists without a wrapper, flag the bundle as incomplete. If multiple asset formats exist, identify which one the wrapper includes.
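The stem-matching check above can be sketched as a small script. This is a minimal illustration, assuming the flat `figures/` layout described here; real projects may nest subdirectories or use other asset formats:

```python
from pathlib import Path

def resolve_figure_bundles(fig_dir="figures"):
    """Pair rendered assets (.pdf/.png) with .tex wrappers by shared
    stem and flag bundles missing either layer."""
    assets, wrappers = {}, set()
    for p in Path(fig_dir).iterdir():
        if p.suffix in {".pdf", ".png"}:
            assets.setdefault(p.stem, []).append(p.suffix)
        elif p.suffix == ".tex":
            wrappers.add(p.stem)
    bundles = {}
    for stem in sorted(set(assets) | wrappers):
        bundles[stem] = {
            "assets": sorted(assets.get(stem, [])),
            "has_wrapper": stem in wrappers,
            "complete": bool(assets.get(stem)) and stem in wrappers,
        }
    return bundles
```

Stems with `complete: False` are the incomplete bundles to flag; stems listing both `.pdf` and `.png` need the wrapper inspected to see which format is actually included.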
Step 3 - Describe Before Interpreting
For each figure, produce a visual description before writing or judging the caption.
The visual description should state:
- figure type and panel structure
- axes, units, scales, and directionality
- methods, datasets, variables, colors, markers, and legends
- visible trends, comparisons, outliers, missing values, and uncertainty
- plotting parameters needed to reproduce what is shown
- experiment parameters needed to interpret the result: dataset split, model/checkpoint, baselines, metric, seed count, sampling budget, hyperparameters, compute budget, or protocol
- what is directly visible versus inferred from logs, configs, filename, caption, or user statement
Do not put the full visual description into the paper caption. Use it as the audit record that checks whether the caption and paper prose are faithful to the figure.
Step 4 - Audit Claim Support
Read references/claim-support.md.
For each figure, answer:
- what exact claim does it support?
- is the displayed evidence sufficient for that claim?
- is the claim too broad for the measured setup?
- are baselines, ablations, controls, or diagnostics missing?
- does the result contradict another figure or section?
- is the result main-paper material, appendix material, diagnostic material, or not ready?
Assign one status:
`supports-claim`, `supports-narrower-claim`, `ambiguous`, `contradicts-claim`, `diagnostic-only`, `not-ready`
Step 5 - Audit Visual Integrity
Read references/visual-integrity.md.
Check:
- axes, labels, units, scales, and transformations
- legend readability and method names
- ordering of methods, datasets, metrics, and ablations
- whether the main result is visually salient
- whether color, markers, line styles, or hatching remain readable in grayscale
- whether figure size works for one-column, two-column, slide, or appendix usage
- whether captions and labels match the actual plotted data
- whether figure wrapper width, cropping, subfigure order, and labels match the rendered asset and visual description
- whether the wrapper creates local whitespace, page-bottom instability, or fragile wrap behavior that should be fixed near this figure rather than by global LaTeX tuning
Flag any issue that could cause a reviewer to misread the result.
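The grayscale-readability check can be approximated numerically: convert each palette color to its perceived luminance and flag pairs that land too close together. A minimal sketch, with the Rec. 601 weights and the `min_gap` threshold as assumptions rather than a standard:

```python
def hex_luminance(hex_color):
    """Approximate perceived luminance (0-255) of a #RRGGBB color
    using Rec. 601 grayscale weights."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b

def grayscale_collisions(palette, min_gap=40):
    """Return adjacent palette pairs whose luminance gap is below
    min_gap, i.e. series likely to merge when printed in grayscale."""
    lum = {c: hex_luminance(c) for c in palette}
    ordered = sorted(palette, key=lum.get)
    return [(a, b) for a, b in zip(ordered, ordered[1:])
            if lum[b] - lum[a] < min_gap]
```

A collision does not force a palette change on its own; distinct markers or line styles can carry the distinction instead, which is why the checklist also asks about markers and hatching.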
Step 6 - Audit Paper Visual Style
Read references/paper-visual-style.md and references/style-memory.md when the output is paper-facing.
Check:
- color palette and colorblind/grayscale robustness
- stable method-to-color and method-to-marker mapping across all figures
- line width, marker size, hatch, symbol, and notation consistency
- font size, tick density, label length, and final-column readability
- figure dimensions for one-column, two-column, appendix, or slide use
- whether visual emphasis matches the paper's claim hierarchy
- whether the main method is recognizable without relying only on color
- whether theorem/method symbols in plots match the paper notation
If the paper has no visual style policy, propose one from templates/visual-style.md and record it in paper/.agent/visual-style.md or .agent/conference-writing/project-style.md when appropriate.
For typography and final sizing, check the contract rather than the notebook preview:
- final LaTeX insertion width, such as `\columnwidth`, `\linewidth`, or `\textwidth`
- generated figure size in inches
- axis label, tick label, legend, annotation, and panel-label sizes in points
- whether LaTeX scales the asset enough to distort the intended typography hierarchy
- whether code-side `plot_style.yaml` and paper-side `visual-style.md` agree when both exist
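The contract between insertion width and generated size reduces to a unit conversion: if the figure is generated at exactly its final LaTeX width, point-sized fonts survive unscaled. A small sketch, assuming TeX points (72.27 pt per inch); the example width of 234.8775 pt (3.25 in) is a typical two-column value, not a universal one:

```python
TEX_PT_PER_INCH = 72.27  # TeX "pt" units per inch

def fig_size_inches(insert_width_pt, fraction=1.0, aspect=0.62):
    """(width, height) in inches so the asset is generated at its
    final LaTeX insertion width and fonts keep their point sizes."""
    width = fraction * insert_width_pt / TEX_PT_PER_INCH
    return (round(width, 3), round(width * aspect, 3))
```

Plugging the resulting tuple into the plotting library's figure-size setting, with label and tick sizes specified directly in points, keeps the notebook preview and the typeset column consistent.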
If a style issue is discovered, classify it as lesson, preference, project contract, or reusable skill rule candidate before writing it back.
Step 7 - Audit Statistical and Experimental Evidence
Read references/statistical-evidence.md.
Check:
- number of seeds or repeated runs
- variance, confidence intervals, standard deviation, or standard error
- significance or effect-size interpretation when differences are small
- data split and leakage risk
- metric definition and averaging
- baseline fairness and tuning budget
- compute or speed reporting when efficiency is part of the claim
- failure cases or negative results that should be shown
- whether plotting parameters and experiment parameters are recoverable from the figure wrapper, caption, result report, logs, config, or paper text
If the plot lacks necessary uncertainty, decide whether to rerun, add error bars, weaken the claim, or move the result to appendix/diagnostic status.
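When seed-level values are available, the uncertainty summary behind an error bar is a short computation. A minimal sketch using the normal approximation; with very few seeds, a t-interval is more defensible, and which interval the paper reports is a project decision:

```python
import math

def summarize_runs(values, z=1.96):
    """Mean, standard error of the mean, and normal-approximation
    95% CI half-width across repeated runs (seeds)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    sem = math.sqrt(var / n)
    return {"n": n, "mean": mean, "sem": sem, "ci95": z * sem}
```

If the gap between two methods is smaller than the combined half-widths, that is a signal to rerun with more seeds, weaken the claim, or demote the figure, exactly as the routing below describes.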
Step 8 - Review Caption and Result Narrative
Read references/caption-and-narrative.md when output text needs revision.
For each figure, produce:
- visual description
- provenance summary: rendered asset, wrapper `.tex`, plotting parameters, experiment parameters, and source certainty
- caption-image alignment diagnosis
- caption diagnosis
- revised caption or caption outline
- one-sentence paper callout
- claims to avoid in nearby prose
- reviewer question answered
- missing setup details to add
Captions should not oversell. They should state the setup, comparison, metric, and takeaway.
Step 9 - Route Fixes
For every issue, route to one or more actions:
- `fix-wrapper`: wrong asset path, stale caption, label mismatch, width/crop/layout issue, or missing subfigure mapping in `figures/*.tex`
- `edit-figure`: labels, ordering, scale, legend, layout, or visual emphasis
- `rewrite-caption`: setup, metric, takeaway, caveat, or claim alignment
- `write-description`: missing visual description or missing provenance record
- `rewrite-results-text`: nearby paper prose overclaims or misses the takeaway
- `define-visual-style`: missing or inconsistent paper visual style policy
- `record-style-lesson`: new typography, sizing, legend, marker, color, export, or wrapper lesson should be appended to style memory before becoming a hard rule
- `restyle-figure`: color, marker, line width, font size, symbol, panel layout, or emphasis
- `build-result-asset`: raw CSV evidence exists but the paper-facing figure or wrapper needs to be generated with provenance
- `mine-existing-results`: the figure lacks evidence that may already exist in CSVs or reports
- `rerun`: missing seeds, variance, baseline, metric, or protocol after existing results are checked
- `diagnose-result`: suspicious, negative, unstable, or contradictory pattern
- `baseline-audit`: missing or unfair baseline
- `narrow-claim`: evidence only supports a smaller statement
- `move-to-appendix`: useful but not central enough for main paper
- `cut`: visual does not support a paper need
Name the next skill when appropriate.
Step 10 - Write the Review Report
Read references/report-template.md.
If saving to a project and no path is given, use:
docs/results/figure_results_review_YYYY-MM-DD_<short-name>.md
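The default path convention can be sketched as a tiny helper; the slugging rule for `<short-name>` is an assumption for illustration:

```python
import datetime
import re

def review_report_path(short_name, date=None):
    """Default save path for a figure results review report."""
    date = date or datetime.date.today().isoformat()  # YYYY-MM-DD
    slug = re.sub(r"[^a-z0-9]+", "-", short_name.lower()).strip("-")
    return f"docs/results/figure_results_review_{date}_{slug}.md"
```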
The report must include:
- figure inventory
- figure bundle map: rendered asset, wrapper `.tex`, paper callout location, label
- visual descriptions
- plotting and experiment parameter provenance
- claim-support status
- visual integrity issues
- visual style policy and consistency issues
- new style lessons or preference-to-contract promotions
- statistical evidence issues
- caption and narrative fixes
- reviewer-risk forecast
- routed actions and next skills
- memory update section
Step 11 - Write Back to Project Memory
Read references/memory-writeback.md when memory exists.
Update the smallest useful set of entries:
- `memory/evidence-board.md`: figure evidence status, rendered asset, wrapper `.tex`, setup, plotting parameters, experiment parameters, and linked claims
- `memory/claim-board.md`: claims supported, narrowed, contradicted, or not ready
- `memory/risk-board.md`: reviewer risks from visual ambiguity, missing uncertainty, weak baselines, or overclaiming
- `memory/action-board.md`: figure edits, reruns, caption fixes, result diagnosis, or claim revisions
- `paper/.agent/`: figure map, asset/wrapper pairings, paper locations, visual descriptions, caption state, provenance gaps, and stale visual warnings
- `paper/.agent/visual-style.md` or `paper/.agent/style-lessons.md`: style lessons, preferences, and project contracts for typography, sizing, encodings, exports, and wrapper behavior
- `.agent/conference-writing/project-style.md`: venue-facing figure style decisions when conference adaptation is active
- worktree `.agent/worktree-status.md`: result-generation or plotting tasks and exit conditions
Use certainty labels:
- `verified` for values checked against raw data, logs, or source figures
- `user-stated` for user-supplied context
- `inferred` for reviewer-risk and narrative judgments
- `unverified` for visual or statistical claims that could not be inspected
Final Sanity Check
Before finalizing:
- every figure has a linked claim and reviewer question
- every paper figure has a resolved asset/wrapper bundle when the project uses `figures/*.tex`
- every reviewed figure has a visual description separate from its caption
- plotting parameters and experiment parameters are recorded or explicitly marked unknown
- main comparison is easy to find
- axes, units, legends, captions, and labels are unambiguous
- colors, markers, fonts, symbols, and figure sizes are consistent across the paper
- style lessons are recorded at the right promotion level rather than silently becoming broad rules
- uncertainty is present or the lack of uncertainty is justified
- baseline and compute fairness are visible when relevant
- overclaims are narrowed
- fixes are routed to concrete next actions or skills
- project memory is updated when present