table-results-review
Table Results Review
Audit standalone paper tables before they become paper evidence, meeting material, or rebuttal material.
Use this skill when:
- the paper stores tables as independent files such as `tables/results.tex` and inserts them into sections with `\input{tables/results}`
- the user has a benchmark, ablation, speed, model-spec, metric-definition, oracle, sanity-check, or appendix table
- a table caption may not match the table content
- row/column meanings, metric direction, bolding rules, footnotes, or missing values are unclear
- table numbers need provenance: result CSV, logs, configs, seeds, aggregation, rounding, or manual edits
- a paper claim depends on table evidence
Do not use this skill for rendered figure assets or plot styling. Use figure-results-review for figures/*.pdf, figures/*.png, and figures/*.tex figure bundles. Use paper-evidence-board when the main task is linking many figures and tables to claims across the whole paper.
Pair this skill with:
- `paper-result-asset-builder` when a paper-facing table needs to be generated or regenerated from CSV result files before table review
- `paper-evidence-gap-miner` when the table review reveals a missing comparison, slice, variance, or baseline and existing CSVs may already contain it
- `paper-evidence-board` when tables must be linked to paper claims, sections, reviewer risks, and actions
- `baseline-selection-audit` when a comparison table may miss important baselines or use unfair settings
- `result-diagnosis` when table numbers are surprising, unstable, negative, or contradictory
- `experiment-design-planner` when a table exposes missing controls, seeds, metrics, or ablations
- `experiment-report-writer` when raw logs need a structured report before table review
- `conference-writing-adapter` when final table narrative or compactness must match a target venue
- `research-project-memory` when claim/evidence/provenance/risk/action/handoff updates should persist across sessions
Core Principles
- A table is evidence for a specific claim, not a dump of numbers.
- A table bundle has separate layers: standalone `.tex` source, table description, caption, main-text callout, and provenance.
- Table description and table caption are different artifacts. Describe what the table reports before interpreting why it matters.
- The reader's comparison path should be obvious from row/column grouping.
- Bolding, underlining, arrows, missing values, and footnotes must have explicit rules.
- Numeric provenance matters. Record where values came from, how they were aggregated, how they were rounded, and whether they were manually edited.
- A strong table caption states the setup, comparison, metric, key parameter, and takeaway without becoming a full experiment report.
- When a table causes paper layout trouble, first localize the affected page and table source. Prefer local table/prose/placement fixes over global float spacing or paragraph settings; route broader submission-layout debugging to `submit-paper`.
Step 1 - Recover Table Context
Collect:
- standalone table path such as `tables/results.tex`
- paper section where it is inserted with `\input{tables/results}` or equivalent
- current caption, label, and main-text callout
- intended paper claim or reviewer question
- table purpose: main result, ablation, baseline comparison, speed/compute, model spec, metric definition, oracle, sanity check, or appendix detail
- experiment setup: dataset, split, model/checkpoint, baselines, metric, seeds, sampling budget, hyperparameters, compute budget, protocol
- table-generation setup: source CSV/log/report, aggregation rule, row/column selection, sorting, bolding rule, rounding rule, missing-value convention, and manual edits
- linked project memory IDs such as `CLM-###`, `EVD-###`, `TAB-###`, `RSK-###`, or `ACT-###`
Rewrite the intended evidence relation:
This table supports [claim] by showing [comparison/ranking/trend/tradeoff] under [setup].
If that sentence cannot be written, route to paper-evidence-board before polishing the table.
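A filled-in instance of the template, with hypothetical specifics: "This table supports the claim that our sampler matches baseline quality at lower cost by showing FID parity at a 4x smaller step budget on CIFAR-10."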
Step 2 - Resolve the Table Bundle
For paper tables, identify the standalone source:
`tables/table_name.tex`
Inspect:
- table environment: `table` or `table*`
- tabular structure: `tabular`, `tabularx`, `longtable`, `booktabs`, `resizebox`, `small`, or custom macros
- `\caption{}`, `\label{}`, footnotes, arrows, bold/underline, row groups, column groups, and missing values
- whether values appear hand-entered, macro-generated, or imported from a script
- nearby paper source: where the table is input and how the main text calls it out
Flag the bundle as incomplete if it lacks caption, label, callout, source provenance, or a clear bolding/rounding/missing-value rule.
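For orientation, here is a minimal sketch of what a complete bundle can look like. It assumes the `booktabs` package; the file name, methods, and numbers are hypothetical, not a prescribed layout.

```latex
% tables/main_results.tex -- hypothetical standalone source, inserted in
% the paper with \input{tables/main_results}; requires \usepackage{booktabs}.
\begin{table}[t]
  \centering
  \small
  \begin{tabular}{lcc}
    \toprule
    Method     & Accuracy $\uparrow$ & Latency (ms) $\downarrow$ \\
    \midrule
    Baseline A & 71.2                & 14 \\
    Baseline B & 73.5                & 22 \\
    Ours       & \textbf{75.1}       & 15 \\
    \bottomrule
  \end{tabular}
  % Caption states the comparison, metric direction, and bolding rule.
  \caption{Hypothetical main results. Bold marks the best value per
    column; each number is a mean over 3 seeds.}
  \label{tab:main_results}
\end{table}
```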
Step 3 - Describe Before Interpreting
Produce a table description before judging the caption.
The table description should state:
- table purpose and paper location
- row groups, column groups, metrics, units, and directionality
- comparison path: which rows/columns the reader should compare first
- highlighted values and the bolding/underlining rule
- missing values, footnotes, caveats, and decimal precision
- visible pattern: best method, strongest baseline, tradeoff, ablation trend, failure case, or inconsistency
- experiment parameters needed to interpret the numbers
- source of key values when available, and what is hand-entered versus generated
- what is directly visible versus inferred from logs, configs, filename, caption, or user statement
Do not put the full table description into the caption. Use it as the audit record that checks whether the caption and paper prose are faithful to the table.
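A compressed, hypothetical example of such a description: "Table 3 (Sec. 5.1) compares our method against three baselines on two datasets. Columns report accuracy (higher is better) and latency (lower is better); bold marks the best value per column; dashes mark runs that did not converge (footnote a). The visible pattern is a quality-latency tradeoff. Seed count and aggregation rule are inferred from the run config, not visible in the table."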
Step 4 - Audit Claim Support
For each table, answer:
- what exact claim does it support?
- is the table sufficient for that claim?
- is the claim too broad for the measured setup?
- are baselines, ablations, controls, seeds, or metrics missing?
- does the table contradict another figure, table, or section?
- is the table main-paper material, appendix material, diagnostic material, or not ready?
Assign one status:
`supports-claim`, `supports-narrower-claim`, `ambiguous`, `contradicts-claim`, `diagnostic-only`, or `not-ready`
Step 5 - Audit Table Integrity
Check:
- rows and columns follow the reader's comparison path
- main method and primary baselines are easy to compare
- metric direction is shown with arrows or text
- bolding and underlining rules are defined and not misleading
- decimal precision matches metric noise
- missing values and failed runs have footnotes
- compute, parameters, data, or NFE columns appear when relevant
- main results and ablations are not mixed in a confusing way
- appendix tables do not hide essential comparisons
- caption, label, row/column names, and paper callout match the table source
- table placement does not rely on fragile wrap behavior near a page boundary; `[H]` can still leave vertical skips, so fix local whitespace in or around the table before changing global settings
Flag any issue that could cause a reviewer to misread the result.
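One hedged sketch of the local-fix preference from the placement check above; the specifier swap and the spacing value are illustrative, not prescriptive.

```latex
% Hypothetical local placement fix: swap [H] for a flexible specifier and
% trim caption spacing inside the float, leaving global settings untouched.
\begin{table}[t]                       % [t]/[htbp] instead of [H]
  \centering
  \setlength{\abovecaptionskip}{4pt}   % local change only; scope ends at \end{table}
  \begin{tabular}{lc}
    \toprule
    Setting & Value \\
    \midrule
    Seeds   & 3 \\
    \bottomrule
  \end{tabular}
  \caption{Hypothetical table used to illustrate local spacing fixes.}
  \label{tab:local_spacing}
\end{table}
```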
Step 6 - Audit Statistical and Experimental Evidence
Check:
- number of seeds or repeated runs
- variance, confidence interval, standard deviation, or standard error when differences are small
- metric definition and aggregation
- data split and leakage risk
- baseline fairness and tuning budget
- compute or speed reporting when efficiency is part of the claim
- table-generation parameters: source file, filtering, aggregation, sorting, rounding, bolding, missing-value convention, and manual edits
If the table lacks necessary uncertainty or provenance, decide whether to rerun, add columns/footnotes, weaken the claim, or move the table to appendix/diagnostic status.
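A minimal sketch of how uncertainty can be surfaced directly in the table when differences are small; the numbers and seed count are hypothetical.

```latex
% Hypothetical sketch: report mean +/- std so readers can judge whether
% the bolded difference exceeds metric noise; requires \usepackage{booktabs}.
\begin{table}[t]
  \centering
  \begin{tabular}{lc}
    \toprule
    Method   & Accuracy $\uparrow$ (mean $\pm$ std, 5 seeds) \\
    \midrule
    Baseline & $72.4 \pm 0.6$ \\
    Ours     & $\mathbf{73.1 \pm 0.5}$ \\
    \bottomrule
  \end{tabular}
  \caption{Hypothetical example. Overlapping standard deviations suggest
    the caption and prose should hedge the comparison.}
  \label{tab:uncertainty}
\end{table}
```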
Step 7 - Review Caption and Result Narrative
For each table, produce:
- table description
- provenance summary: table `.tex`, source data/log/config/report, table-generation parameters, experiment parameters, and source certainty
- caption-table alignment diagnosis
- caption diagnosis
- revised caption or caption outline
- one-sentence paper callout
- claims to avoid in nearby prose
- reviewer question answered
- missing setup details to add
Caption pattern:
[What the table reports.] We compare [methods] on [task/dataset] using [metrics; direction] under [key experiment parameters].
[Grouping or fairness detail.] [Takeaway tied to the claim]. Bold marks [bolding rule].
For model-spec, metric-definition, or method-comparison tables:
[What the table defines or compares.] Columns summarize [fields] used in [paper section or experiment].
[Interpretive note.] [Takeaway tied to the claim or reader task].
Do not put every hyperparameter in the caption. Include the parameters needed to interpret the claim. Put full provenance in the review report, appendix, artifact, or `paper/.agent/` record.
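A hypothetical caption instantiating the first pattern; the task, methods, and numbers are invented for illustration.

```latex
% Lives inside the table environment, as in the Step 2 sketch.
\caption{Sampler comparison on ImageNet-1k validation. We compare our
  sampler against DDIM and DPM-Solver using FID ($\downarrow$) at a fixed
  budget of 10 function evaluations, with a shared checkpoint and identical
  preprocessing. Our sampler matches DPM-Solver quality at equal cost.
  Bold marks the best FID per column; means over 3 seeds.}
```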
Step 8 - Route Fixes
For every issue, route to one or more actions:
- `fix-table-wrapper`: stale caption, label mismatch, unclear bolding rule, wrong resize, broken footnote, or row/column mismatch in `tables/*.tex`
- `edit-table`: grouping, decimals, bolding, footnotes, missing values, row/column order, or metric arrows
- `rewrite-caption`: setup, metric, takeaway, caveat, bolding rule, or claim alignment
- `write-description`: missing table description or missing provenance record
- `rewrite-results-text`: nearby paper prose overclaims or misses the takeaway
- `build-result-asset`: raw CSV evidence exists but the paper-facing table needs to be generated with documented aggregation, rounding, and provenance
- `mine-existing-results`: missing comparison, slice, variance, or baseline may already exist in CSVs or reports
- `rerun`: missing seeds, variance, baseline, metric, or protocol after existing results are checked
- `diagnose-result`: suspicious, negative, unstable, or contradictory numbers
- `baseline-audit`: missing or unfair baseline
- `narrow-claim`: evidence only supports a smaller statement
- `move-to-appendix`: useful but not central enough for main paper
- `cut`: table does not support a paper need
Name the next skill when appropriate.
Step 9 - Write the Review Report
If saving to a project and no path is given, use:
`docs/results/table_results_review_YYYY-MM-DD_<short-name>.md`
The report must include:
- table inventory
- table bundle map: source `.tex`, input location, label, caption, paper callout location
- table descriptions
- table-generation and experiment parameter provenance
- claim-support status
- table integrity issues
- statistical evidence issues
- caption and narrative fixes
- reviewer-risk forecast
- routed actions and next skills
- memory update section
Step 10 - Write Back to Project Memory
When memory exists, update the smallest useful set of entries:
- `memory/evidence-board.md`: table evidence status, source `.tex`, setup, table-generation parameters, experiment parameters, and linked claims
- `memory/claim-board.md`: claims supported, narrowed, contradicted, or not ready
- `memory/risk-board.md`: reviewer risks from table ambiguity, missing uncertainty, weak baselines, missing provenance, or overclaiming
- `memory/action-board.md`: table edits, reruns, caption fixes, result diagnosis, baseline audit, or claim revisions
- `paper/.agent/`: table map, source/input pairings, paper locations, table descriptions, caption state, provenance gaps, and stale table warnings
- worktree `.agent/worktree-status.md`: result-generation or table-generation tasks and exit conditions
Use certainty labels:
- `verified` for values checked against raw data, logs, generated table, or paper text
- `user-stated` for user-supplied context
- `inferred` for reviewer-risk and narrative judgments
- `unverified` for numeric or statistical claims that could not be inspected
Final Sanity Check
Before finalizing:
- every table has a linked claim and reviewer question
- every paper table has a resolved standalone source when the project uses `tables/*.tex`
- every reviewed table has a table description separate from its caption
- table-generation parameters and experiment parameters are recorded or explicitly marked unknown
- row/column meanings and metric directions are unambiguous
- bolding, underlining, footnotes, rounding, and missing values have rules
- uncertainty is present or the lack of uncertainty is justified
- baseline and compute fairness are visible when relevant
- overclaims are narrowed
- fixes are routed to concrete next actions or skills
- project memory is updated when present