Experiment Design

Design rigorous machine learning experiments that produce credible, reproducible results.

Design Checklist

Work through each section before writing any code.

1. Hypothesis

State the hypothesis as a falsifiable claim:

"We claim that [method X] achieves [metric Y] on [dataset Z] because [mechanism]."

If the hypothesis is vague, help the user sharpen it before proceeding.

2. Independent and Dependent Variables

Independent variable: What is being changed (e.g., architecture, loss function, data augmentation)?
Dependent variable: What is being measured (e.g., accuracy, FID score, wall-clock time)?
Controlled variables: List everything held constant.
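
One way to verify that everything except the independent variable is held constant is to diff the run configurations before launching. The sketch below assumes each run is described by a flat dictionary; the function name `changed_keys` and the example keys are hypothetical, not part of any particular framework.

```python
def changed_keys(baseline_config, treatment_config):
    """Return the sorted list of keys whose values differ between two run configs."""
    missing = object()  # sentinel so a key absent from one config counts as a difference
    all_keys = set(baseline_config) | set(treatment_config)
    return sorted(
        key
        for key in all_keys
        if baseline_config.get(key, missing) != treatment_config.get(key, missing)
    )


# Hypothetical configs: only the loss function is meant to change.
baseline_run = {"loss": "cross_entropy", "lr": 0.001, "batch_size": 64}
treatment_run = {"loss": "focal", "lr": 0.001, "batch_size": 64}

differences = changed_keys(baseline_run, treatment_run)
```

If `differences` contains anything beyond the intended independent variable, a controlled variable has drifted and the comparison is confounded.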

3. Baselines

Select baselines at three levels:

Naive: A trivially simple method (majority class, mean predictor)
Standard: The most widely used existing approach
Strong: The current state-of-the-art on the chosen benchmark

Justify each choice. Avoid strawman baselines.
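
The naive level is cheap to implement and sets the floor that any real method must clear. A minimal sketch of the two naive baselines named above, using only the standard library; the helper names and the toy labels are illustrative assumptions.

```python
from collections import Counter
from statistics import mean


def majority_class_baseline(train_labels):
    """Build a predictor that always outputs the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda inputs: [most_common_label for _ in inputs]


def mean_predictor_baseline(train_targets):
    """Build a regressor that always outputs the mean of the training targets."""
    target_mean = mean(train_targets)
    return lambda inputs: [target_mean for _ in inputs]


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Toy data for illustration only.
train_labels = ["cat", "dog", "cat", "cat"]
test_inputs = ["image_1", "image_2"]
test_labels = ["cat", "dog"]

predict = majority_class_baseline(train_labels)
naive_accuracy = accuracy(predict(test_inputs), test_labels)
```

Reporting this number next to the proposed method shows how much of the headline metric comes from class imbalance alone.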

4. Datasets and Splits

Name the dataset, version, and source.
Specify train/val/test splits. Use standard splits if they exist.
Flag any data leakage risks.
Note dataset limitations (bias, domain coverage, size).
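
When no standard split exists, a deterministic split keyed on a stable example ID keeps the assignment reproducible across machines and reruns. The sketch below is one possible approach, not a prescribed method: it hashes each ID into a bucket and includes a simple overlap check for one common leakage risk (the same ID appearing in two splits). The function names and default fractions are assumptions.

```python
import hashlib


def assign_split(example_id, val_fraction=0.1, test_fraction=0.1):
    """Map an example ID to 'train', 'val', or 'test' deterministically."""
    digest = hashlib.sha256(example_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    if bucket < test_fraction:
        return "test"
    if bucket < test_fraction + val_fraction:
        return "val"
    return "train"


def find_overlap(split_a_ids, split_b_ids):
    """Return IDs present in both splits; a non-empty result signals leakage."""
    return set(split_a_ids) & set(split_b_ids)


leaked_ids = find_overlap(["a", "b"], ["b", "c"])
```

An ID-level check does not catch near-duplicates or examples derived from the same source (same patient, same document), so those risks still need to be flagged separately.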

5. Metrics

Choose metrics that align with the task objective.
Prefer metrics with established semantics over novel ones.
Report multiple metrics when they capture different aspects.
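Quantify run-to-run variability: report mean ± standard deviation over N seeds and state N. A standard deviation alone does not establish statistical significance, so name the test or confidence interval used when claiming a significant difference.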
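
Aggregating over seeds can be sketched with the standard library. The scores below are made-up placeholder values, and the helper names are assumptions; the sample standard deviation (`statistics.stdev`) is used because the seeds are a sample of possible runs.

```python
from statistics import mean, stdev


def summarize_over_seeds(scores):
    """Return (mean, sample standard deviation) for per-seed scores."""
    if len(scores) < 2:
        raise ValueError("need at least two seeds to estimate a standard deviation")
    return mean(scores), stdev(scores)


def format_result(scores):
    """Format per-seed scores as 'mean ± std (N=... seeds)'."""
    score_mean, score_std = summarize_over_seeds(scores)
    return f"{score_mean:.3f} ± {score_std:.3f} (N={len(scores)} seeds)"


# Placeholder accuracies from three hypothetical seeds.
example_scores = [0.80, 0.82, 0.84]
summary = format_result(example_scores)
```

Stating N alongside the spread matters because a standard deviation estimated from two or three seeds is itself very noisy.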

6. Compute Budget

State the hardware, estimated runtime, and number of seeds. This enables reproducibility and contextualizes cost.
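
A rough budget follows from simple arithmetic: configurations times seeds times hours per run times devices per run. The sketch below is a back-of-the-envelope estimate under the assumption that every run costs about the same; the function name and the example figures are hypothetical.

```python
def estimate_gpu_hours(num_configs, num_seeds, hours_per_run, gpus_per_run=1):
    """Estimate total GPU-hours, assuming all runs have similar cost."""
    return num_configs * num_seeds * hours_per_run * gpus_per_run


# Hypothetical plan: 4 configurations (method, baselines, ablations), 5 seeds, 2 hours each.
total_gpu_hours = estimate_gpu_hours(num_configs=4, num_seeds=5, hours_per_run=2.0)
```

The estimate excludes hyperparameter search and failed runs, so the plan should state whether those are counted separately.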

7. Ablations

Design ablations that isolate each component's contribution. Each ablation should remove or replace exactly one thing.
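
The "exactly one thing" rule can be enforced mechanically by generating each ablation from the full configuration. A minimal sketch, assuming components are represented as boolean flags in a dictionary; the component names are placeholders.

```python
def single_component_ablations(full_config):
    """Return one variant per enabled component, each with only that component disabled."""
    ablations = {}
    for name, enabled in full_config.items():
        if not enabled:
            continue  # nothing to remove for a component that is already off
        variant = dict(full_config)  # copy so the full config stays untouched
        variant[name] = False
        ablations[f"no_{name}"] = variant
    return ablations


# Placeholder components of a hypothetical method.
full_method = {"augmentation": True, "aux_loss": True, "warmup": True}
ablation_configs = single_component_ablations(full_method)
```

Replacement ablations (swapping a component for a simpler alternative rather than removing it) would need a richer config than boolean flags, but the same one-change-per-variant constraint applies.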

8. Failure Modes

Identify at least two ways the experiment could give misleading results, and how to detect or mitigate them.

Output Format

Produce a structured experiment plan as a markdown document with all sections above filled in. Highlight any section where the user needs to make a decision before proceeding.
