Experiment Report Writer

Turn experiment evidence into a clear research report that a reader can evaluate without rerunning the experiment.

Use this skill to write a standalone document, a section for a paper or lab note, a mentor-facing update, or a presentation-ready experiment summary.

Pair this skill with research-project-memory when completed results should update project claims, evidence, risks, actions, figures, or worktree decisions.

Skill Directory Layout

<installed-skill-dir>/
├── SKILL.md
└── templates/
    └── experiment-report.md

Progressive Loading

Use templates/experiment-report.md as the default Markdown skeleton when saving a report.
If the user only wants a draft in chat, follow the same section order without needing to read or copy the template verbatim.

Core Principles

Ground every claim in evidence: configs, commands, logs, metrics, tables, figures, commit hashes, or user-provided notes.
Separate observed results from interpretation. Do not present a hypothesis as a measured fact.
Make the report reproducible enough that another researcher can identify what was run.
Explain why the experiment matters before listing numbers.
Compare against the right reference point: baseline, previous run, ablation control, expected behavior, or published number.
Preserve uncertainty. If evidence is missing, mark it as missing and ask for the smallest useful clarification.
Write for the intended audience. A lab notebook can be dense; a mentor update should emphasize decisions, evidence, and next steps.

Step 1 - Classify the Report

Identify the report mode:

single-experiment: one run or one controlled comparison
ablation-report: several variants testing one factor
batch-summary: many related runs from a sweep or experiment batch
mentor-update: concise progress report with decision-oriented discussion
paper-section: polished text intended to become part of a paper

Also identify:

audience
output format: Markdown, LaTeX, slide outline, or chat draft
save path, if the user wants a file
expected length
whether figures, tables, configs, logs, or notebooks are available

If the user gives no format, default to Markdown. If they ask for a file and no path is given, use:

docs/reports/experiment_report_YYYY-MM-DD.md

Step 2 - Gather Evidence

Prefer primary evidence over memory.

Look for:

experiment commands or scripts
config files and parameter overrides
random seeds and number of runs
dataset name, split, preprocessing, and sample count
model, method variant, checkpoint, or algorithm version
hardware and runtime if relevant
metrics, logs, result tables, figures, and failure cases
git commit hash or code version, when available

Useful local checks include:

git rev-parse --short HEAD
find . -maxdepth 3 -type f \( -name "*.yaml" -o -name "*.yml" -o -name "*.json" -o -name "*.csv" -o -name "*.md" \)
find . -maxdepth 4 -type f \( -name "*.png" -o -name "*.jpg" -o -name "*.pdf" -o -name "*.svg" \)

If the user only provides informal notes, use them but label missing reproducibility details explicitly.

Step 3 - Extract the Experiment Story

Before drafting, organize the experiment into:

question: what was this experiment trying to learn?
motivation: why does the question matter?
hypothesis: what did we expect and why?
method: what changed compared with the baseline?
controls: what stayed fixed?
measurement: which metrics answer the question?
outcome: what happened?
interpretation: what does the outcome suggest?
decision: what should happen next?

For ablations or sweeps, make the independent variable explicit and keep the comparison fair.

Required Report Structure

Use these sections unless the user requests a different format:

# [Experiment Report Title]

## Summary
## 1. Experiment Motivation
## 2. Experiment Setup
## 3. Core Algorithm or Method
## 4. Metrics
## 5. Results
## 6. How to Read the Figures
## 7. Interpretation
## 8. Conclusion and Discussion
## 9. Limitations and Caveats
## 10. Next Steps
## Reproducibility Notes

If there is no core algorithm, write "Not applicable" and briefly explain whether the experiment changes data, hyperparameters, evaluation, infrastructure, or analysis instead.

If there are no figures, omit "How to Read the Figures" or replace it with "How to Read the Tables" when tables are the main evidence.

Section Guidance

Summary

Write 3-6 bullets covering:

experiment question
most important setup details
headline result
interpretation
recommended next step

1. Experiment Motivation

Explain the research or engineering reason for the experiment:

problem being tested
expected mechanism
why the result would affect the project
what decision the experiment supports

2. Experiment Setup

Include enough detail to reproduce or audit the run:

dataset, split, preprocessing
baseline and compared variants
key hyperparameters and parameter changes
training/evaluation command, config file, or run ID
random seed and number of trials
hardware, runtime, and code version when relevant

Use a table for parameters when there are more than five important settings.

3. Core Algorithm or Method

Describe the algorithm only at the level needed to understand the experiment:

what input it consumes
what output it produces
key steps or objective
what is new or different from the baseline
complexity, assumptions, or implementation details that affect interpretation

Do not over-explain standard background unless the audience needs it.

4. Metrics

For each metric, explain:

definition
direction: higher is better, lower is better, or target range
unit
aggregation: mean, median, best checkpoint, final epoch, confidence interval, or standard deviation
why it is relevant to the experiment question

Flag metrics that can conflict with each other.

5. Results

Present results before interpretation.

Use:

tables for exact numeric comparisons
figures for trends, distributions, or qualitative examples
short text for the main deltas

Always identify the baseline and report absolute values plus meaningful deltas when possible.

6. How to Read the Figures

For every figure, explain:

what the figure is meant to show
x-axis: variable, unit, and scale
y-axis: metric, unit, and direction
legend: method names, groups, colors, markers, or line styles
error bars or shaded regions, if present
whether points are individual runs, averages, checkpoints, epochs, or samples
the main visual pattern the reader should notice

If an axis is log-scaled, normalized, clipped, or unitless, say so explicitly.

7. Interpretation

Connect the observed results back to the motivation:

whether the hypothesis was supported
what changed relative to the baseline
likely explanation
alternative explanations
surprising or negative results
whether the evidence is strong enough to act on

Use cautious wording when there is only one seed, weak statistical evidence, or missing controls.

8. Conclusion and Discussion

State the practical conclusion:

what we learned
what decision this supports
whether to keep, reject, or further test the method
how the result affects the broader project

9. Limitations and Caveats

Include risks that could change the conclusion:

small number of seeds
narrow dataset or subset
missing baseline
unstable training
possible implementation bug
metric mismatch
data leakage or evaluation contamination risk
hardware/runtime constraints

10. Next Steps

Recommend concrete follow-ups:

one immediate verification step
one high-value extension
one cleanup or documentation task when needed

Tie each next step to the uncertainty it resolves.

Project Memory Writeback

If the project uses research-project-memory, write back the result after the report is drafted:

memory/evidence-board.md: completed EVD-### summary, source paths, linked claim IDs, limitations, and certainty
memory/provenance-board.md: source-to-evidence traceability from run pointers, CSVs, reports, aggregation rules, and produced paper-facing candidates
memory/claim-board.md: move linked claims through the lifecycle, such as supported, weakened, revised, evidence-needed, provisional, or cut
memory/risk-board.md: close mitigated risks or add new risks exposed by the result
memory/action-board.md: next steps from the report, including rerun, write, revise-method, park, or kill decisions
memory/handoff-board.md: create a handoff when results are ready for paper-result-asset-builder, paper-evidence-board, result-diagnosis, or writing
memory/phase-dashboard.md: update the active gate if the project moved into evidence production, paper asset building, drafting, or regressed because the result weakened a claim
memory/current-status.md: latest reliable experiment state and next session entry point
worktree .agent/worktree-status.md: latest result and exit condition if the experiment belongs to a worktree

Do not write an interpretation as a measured fact. Use observed for metrics from logs/tables and inferred for explanations.

Output Quality Checklist

Before finalizing, check that:

the report states the experiment question and decision context
all key parameters and baselines are named
metrics include direction and units
results are separated from interpretation
every figure/table has reading guidance
missing evidence is labeled instead of invented
conclusions do not overclaim beyond the data
next steps are actionable
project memory is updated when present and relevant

experiment-report-writer