# Analytic Workbench
Human-directed, AI-operated analysis. Use early when analysis may become a real workflow.
## Start Here
- Read `AGENTS.md` or `agents.md` if present
- Inspect the repo before changing structure
- Separate repo facts from agent assumptions
- Separate frozen inputs from live pulls
Then establish per area: current mode, likely next mode, review surface, first boundary-sensitive change.
## Modes
A mode is a workflow shape, not what you compute. Different areas of work can be in different modes simultaneously.
| Mode | Use when | Read next | Common mistake |
|---|---|---|---|
| probe | first contact, uncertain framing | this file only | over-structuring early or treating probes as durable |
| explore | work worth keeping formally | module-conventions.md, marimo-patterns.md | logic in cells, hidden acquisition |
| experiment | reruns, comparisons, sweeps | hydra-config.md | adding Hydra before repeated runs exist |
| operate | scale, schedules, team work | mlflow-guide.md + orchestration docs | overbuilding early |
All references above are in references/.
## Operators
Five operators drive all work, firing from conversational intent. Default
rhythm: plan -> run -> review -> summary, with promote inserting when the
current mode's wiring is no longer sufficient.
Each operator must produce a visible declaration.
### plan
Set up or revise the workflow. Fires at start of substantive work or direction change.
```
Plan
- Area: ...
- Mode: ...
- Likely next mode: ...
- Primary surface: marimo | qmd | chat-only | other
- Review surface: ...
- First boundary-sensitive change: ...
- Steering docs read: ...
```
Add for ML/causal: `Sampling strategy: ...` and
`Target framing: binary | multiclass | regression | ranking | causal`.
For probe, a shorter declaration suffices: Area, `Mode: probe`, First probe, Review surface, Risk.
### run
Execute the next planned step.
```
Run
- Area: ...
- Step: ...
- Expected output: ...
```

```
Run outcome
- What ran: ...
- Output: ...
- Anomaly or caveat: ...
```
### review
Sanity-check outputs before presenting — fires automatically, not on request.
```
Review
- Area: ...
- Artifacts reviewed: ...
- Findings: ...
- Caveats: ...
- Verdict: proceed | revise | back up to [phase]
```
### summary
Report status at natural stopping points.
```
EDA Update
- Area: ...
- Step: ...
- What ran: ...
- Outcome: ...
- Caveat: ...
- Next step: ...
```
### promote
Advance to a stronger mode. This is where mode transitions happen — probe to explore when work is worth keeping, explore to experiment when reruns matter. Transitions can also emerge from user intent (e.g., asking for comparisons). Transitions are additive wiring, not rewrites.
```
Promote
- Area: ...
- From mode: ...
- To mode: ...
- What survives: ...
- What changes: ...
```
## Core Workflow
The analytical spine, used within any mode:
Frame -> Acquire -> Profile -> Hypothesize -> Model or Analyze -> Review -> Promote
- Frame: question, decision, constraints, review surface
- Acquire: data access as workflow shape, not hidden setup
- Profile: coverage, missingness, slices, label shape
- Hypothesize: likely drivers before over-engineering
- Review: findings early enough to redirect
- Promote: worth-keeping work into modules, config, runs, reports, and the runbook
Backtrack when stuck:
- Unusable data → Acquire or reframe
- All hypotheses fail → Profile with fresh slices
- Suspicious metrics → check leakage before celebrating
- Wrong assumption surfaced → Frame
## Guardrails
- Prefer what is on disk over generic assumptions
- Do not silently change folder layout, artifact semantics, or execution path — breaks resumption and review
- Once structure exists, stop using throwaway snippets — they diverge from the notebook contract
- First live extraction is usually a run artifact, not a frozen input
- Do not split a phase across half-wired surfaces — hides state, blocks promotion
- Do not let AutoML choose the framing — it optimizes whatever metric you hand it
- Match sampling/CV to temporal, grouped, spatial, or event structure — naive splits leak and inflate
- Keep project knowledge in repo docs, not agent memory
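The grouped-split guardrail can be sketched without any modeling library. This is a minimal illustration, assuming only numpy; in real work reach for `sklearn.model_selection.GroupKFold` or `TimeSeriesSplit` rather than hand-rolling splits.

```python
import numpy as np

def group_split(groups: np.ndarray, test_frac: float = 0.2, seed: int = 0):
    """Split row indices so every row of an entity lands on one side only."""
    rng = np.random.default_rng(seed)
    unique = rng.permutation(np.unique(groups))
    n_test = max(1, int(len(unique) * test_frac))
    test_set = set(unique[:n_test].tolist())
    mask = np.array([g in test_set for g in groups])
    return np.where(~mask)[0], np.where(mask)[0]

# 10 entities, 4 rows each. A naive row-level split would put the same
# entity on both sides and inflate the score; this one cannot.
groups = np.repeat(np.arange(10), 4)
train_idx, test_idx = group_split(groups)
assert set(groups[train_idx]).isdisjoint(set(groups[test_idx]))
```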
## Acquisition and DAG Shape
explore+ should follow references/module-conventions.md: pure functions in
`src/<project>/analysis/`, function name = output name, parameter name =
dependency name; a manual driver is fine. Hamilton-style naming is
encouraged; the Hamilton package itself is optional.
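A minimal sketch of these conventions — the function names, the `amount`/`region` columns, and the parquet source are all hypothetical, not prescribed by this document:

```python
import pandas as pd

# Function name = output name; parameter name = dependency name.
def raw_orders(orders_path: str) -> pd.DataFrame:
    """Load the saved orders artifact from disk."""
    return pd.read_parquet(orders_path)

def clean_orders(raw_orders: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing amounts; the parameter names its dependency."""
    return raw_orders.dropna(subset=["amount"])

def revenue_by_region(clean_orders: pd.DataFrame) -> pd.Series:
    """Aggregate output, named for what it produces."""
    return clean_orders.groupby("region")["amount"].sum()

# Manual driver: wire the DAG by hand; no framework required.
def run(orders_path: str) -> pd.Series:
    return revenue_by_region(clean_orders(raw_orders(orders_path)))
```

Because each function is pure and named for its output, a Hamilton driver could later assemble the same DAG automatically without rewriting the functions.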
- probe: direct fetches are acceptable
- explore+: use a reusable command/tool/pipeline if the pull will be rerun or handed off
- Notebooks read saved artifacts; they do not embed live fetch logic
## ML and Causal Rules
Read references/modeling-guardrails.md before modeling. Key rules:
- Frame target before any model runs — unexamined framing wastes downstream work
- Match sampling/CV to data structure
- AutoML as comparison accelerator after framing, not as problem framing
- Predictive importance is not causal effect — keep claims separate
- Interpretability artifacts for important outputs: references/interpretability.md
- Causal design discipline: references/causal-analysis.md
## Reference Guide
Read only what the current mode or task needs.
| Reference | When to read | Key rules |
|---|---|---|
| environment-setup.md | packaging, imports, deps | Use real package setup; do not patch with editor settings |
| module-conventions.md | before explore code | Pure functions, output-oriented names, manual driver; no durable logic in cells |
| marimo-patterns.md | marimo surface | Verify in marimo itself; explicit paths; no hidden cross-cell state |
| hydra-config.md | entering experiment | Config separate from computation; no Hydra for one-off probes |
| review-workflow.md | presenting results | Review before presenting; marimo for interactive, qmd for narrative |
| artifact-strategy.md | repeated outputs, comparisons | Materialize clearly; do not mix frozen inputs with run outputs |
| dvc-guide.md / kedro-guide.md | reproducibility, lineage | Add only when beyond ordinary experiment needs |
All references above are in references/.
## Runbook
The runbook (runbook.md at the repo root) is the human-readable reproduction
guide. It is derived from project structure — configs, scripts, notebooks,
runs — not maintained as a separate log.
Generate or update at promote time: milestones, handoff requests, or mode transitions. Rewrite stale sections rather than appending.
Contents:
- Prerequisites — env setup, data acquisition, external tools
- Pipeline overview — ASCII diagram of data flow
- Numbered steps — one per major pipeline step, each with: the exact command, expected runtime, and what to inspect afterward (including expected results so the reader knows what "correct" looks like)
- Configuration reference — table of config files and what they control
- Key directories — paths, contents, git-tracked or not
- Troubleshooting — known failure modes and fixes
The runbook ties per-run artifacts together: config.yaml says what params
were used, metrics.json says what happened, the runbook says which command
produced the run, what results to expect, and how to open the review surface.
If .aw/ bookkeeping is active, note the runbook's last-updated timestamp in
.aw/status.json.
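As a sketch of how those per-run pieces can be tied together, a small helper could collate each run's metrics into the rows behind a runbook comparison table. The `runs/<run-id>/metrics.json` layout here is an assumption for illustration, not a contract this document defines:

```python
import json
from pathlib import Path

def collect_runs(runs_dir: str) -> list[dict]:
    """Gather metrics.json from each run folder into comparison rows."""
    rows = []
    for run in sorted(Path(runs_dir).iterdir()):
        metrics_file = run / "metrics.json"  # assumed per-run artifact name
        if not metrics_file.exists():
            continue  # skip folders with no recorded metrics
        row = {"run": run.name}
        row.update(json.loads(metrics_file.read_text()))
        rows.append(row)
    return rows
```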
## Bookkeeping
Maintain .aw/ when work continues across sessions:
```
.aw/
  status.json        # current state across all areas
  stages/
    <area-name>/     # one folder per area of work
      plan.md        # framing and plan revisions
      status.json    # machine-readable state
      review.md      # review notes and approvals
```
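A hypothetical shape for the root `.aw/status.json` — the keys below are illustrative only, not a schema this document prescribes:

```json
{
  "runbook_updated": "2025-01-14T17:05:00Z",
  "areas": {
    "churn-eda": {"mode": "explore", "likely_next_mode": "experiment"},
    "pricing-probe": {"mode": "probe", "likely_next_mode": "explore"}
  }
}
```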
Create or advance a stage folder only on mode changes or user-requested checkpoints. Plan revisions, reruns, and parameter changes belong in the existing folder, not a new one.
Joining mid-project:
- `.aw/status.json` exists → authoritative
- `stages/` present but no root status → reconstruct from the stage folders
- Hydra `conf/` or `multirun/` without `.aw/` → likely `experiment`; confirm
- Notebook + modules, no config management → likely `explore`
- Nothing structured → `probe`