# Analytic Workbench
Human-directed, AI-operated analysis. Use early when analysis may become a real workflow.
## Start Here
- Read `AGENTS.md` or `agents.md` if present
- Inspect the repo before changing structure
- Separate repo facts from agent assumptions
- Separate frozen inputs from live pulls
Then establish per area: current mode, likely next mode, review surface, first boundary-sensitive change.
## Modes
A mode is a workflow shape, not what you compute. Different areas of work can be in different modes simultaneously.
| Mode | Use when | Read next | Common mistake |
|---|---|---|---|
| probe | first contact, uncertain framing | this file only | over-structuring early or treating probes as durable |
| explore | work worth keeping formally | module-conventions.md, marimo-patterns.md | logic in cells, hidden acquisition |
| experiment | reruns, comparisons, sweeps | hydra-config.md | adding Hydra before repeated runs exist |
| operate | scale, schedules, team work | mlflow-guide.md + orchestration docs | overbuilding early |
All references above are in references/.
## Operators
Five operators drive all work, firing from conversational intent. Default
rhythm: plan -> run -> review -> summary, with promote inserting when the
current mode's wiring is no longer sufficient.
Each operator must produce a visible declaration.
### plan
Set up or revise the workflow. Fires at start of substantive work or direction change.
```
Plan
- Area: ...
- Mode: ...
- Likely next mode: ...
- Primary surface: marimo | qmd | chat-only | other
- Review surface: ...
- First boundary-sensitive change: ...
- Steering docs read: ...
```
Add for ML/causal: `Sampling strategy: ...` and
`Target framing: binary | multiclass | regression | ranking | causal`.
For probe, a shorter declaration suffices: Area, `Mode: probe`, First probe, Review surface, Risk.
### run
Execute the next planned step.
```
Run
- Area: ...
- Step: ...
- Expected output: ...
```

```
Run outcome
- What ran: ...
- Output: ...
- Anomaly or caveat: ...
```
### review
Sanity-check outputs before presenting — fires automatically, not on request.
```
Review
- Area: ...
- Artifacts reviewed: ...
- Findings: ...
- Caveats: ...
- Verdict: proceed | revise | back up to [phase]
```
### summary
Report status at natural stopping points.
```
EDA Update
- Area: ...
- Step: ...
- What ran: ...
- Outcome: ...
- Caveat: ...
- Next step: ...
```
### promote
Advance to a stronger mode. This is where mode transitions happen — probe to explore when work is worth keeping, explore to experiment when reruns matter. Transitions can also emerge from user intent (e.g., asking for comparisons). Transitions are additive wiring, not rewrites.
```
Promote
- Area: ...
- From mode: ...
- To mode: ...
- What survives: ...
- What changes: ...
```
## Core Workflow
The analytical spine, used within any mode:
Frame -> Acquire -> Profile -> Hypothesize -> Model or Analyze -> Review -> Promote
- Frame: question, decision, constraints, review surface
- Acquire: data access as workflow shape, not hidden setup
- Profile: coverage, missingness, slices, label shape
- Hypothesize: likely drivers before over-engineering
- Review: findings early enough to redirect
- Promote: worth-keeping work into modules, config, runs, reports, and the runbook
Backtrack when stuck:
- Unusable data → Acquire or reframe
- All hypotheses fail → Profile with fresh slices
- Suspicious metrics → check leakage before celebrating
- Wrong assumption surfaced → Frame
## Guardrails
- Prefer what is on disk over generic assumptions
- Do not silently change folder layout, artifact semantics, or execution path — breaks resumption and review
- Once structure exists, stop using throwaway snippets — they diverge from the notebook contract
- First live extraction is usually a run artifact, not a frozen input
- Do not split a phase across half-wired surfaces — hides state, blocks promotion
- Do not let AutoML choose the framing — it optimizes whatever metric you hand it
- Match sampling/CV to temporal, grouped, spatial, or event structure — naive splits leak and inflate
- Keep project knowledge in repo docs, not agent memory
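The grouped-split guardrail can be sketched without any modeling library. This is a minimal illustration, assuming only numpy; in real work reach for `sklearn.model_selection.GroupKFold` or `TimeSeriesSplit` rather than hand-rolling splits.

```python
import numpy as np

def group_split(groups: np.ndarray, test_frac: float = 0.2, seed: int = 0):
    """Split row indices so every row of an entity lands on one side only."""
    rng = np.random.default_rng(seed)
    unique = rng.permutation(np.unique(groups))
    n_test = max(1, int(len(unique) * test_frac))
    test_set = set(unique[:n_test].tolist())
    mask = np.array([g in test_set for g in groups])
    return np.where(~mask)[0], np.where(mask)[0]

# 10 entities, 4 rows each. A naive row-level split would put the same
# entity on both sides and inflate the score; this one cannot.
groups = np.repeat(np.arange(10), 4)
train_idx, test_idx = group_split(groups)
assert set(groups[train_idx]).isdisjoint(set(groups[test_idx]))
```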
## Acquisition and DAG Shape
explore+ should follow references/module-conventions.md: pure functions in
`src/<project>/analysis/`, function name = output name, parameter name =
dependency name; a manual driver is fine. Hamilton-style naming is
encouraged; the Hamilton package itself is optional.
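A minimal sketch of these conventions — the function names, the `amount`/`region` columns, and the parquet source are all hypothetical, not prescribed by this document:

```python
import pandas as pd

# Function name = output name; parameter name = dependency name.
def raw_orders(orders_path: str) -> pd.DataFrame:
    """Load the saved orders artifact from disk."""
    return pd.read_parquet(orders_path)

def clean_orders(raw_orders: pd.DataFrame) -> pd.DataFrame:
    """Drop rows with missing amounts; the parameter names its dependency."""
    return raw_orders.dropna(subset=["amount"])

def revenue_by_region(clean_orders: pd.DataFrame) -> pd.Series:
    """Aggregate output, named for what it produces."""
    return clean_orders.groupby("region")["amount"].sum()

# Manual driver: wire the DAG by hand; no framework required.
def run(orders_path: str) -> pd.Series:
    return revenue_by_region(clean_orders(raw_orders(orders_path)))
```

Because each function is pure and named for its output, a Hamilton driver could later assemble the same DAG automatically without rewriting the functions.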
- probe: direct fetches are acceptable
- explore+: use a reusable command/tool/pipeline if the pull will be rerun or handed off
- Notebooks read saved artifacts; they do not embed live fetch logic
## ML and Causal Rules
Read references/modeling-guardrails.md before modeling. Key rules:
- Frame target before any model runs — unexamined framing wastes downstream work
- Match sampling/CV to data structure
- AutoML as comparison accelerator after framing, not as problem framing
- Predictive importance is not causal effect — keep claims separate
- Interpretability artifacts for important outputs: references/interpretability.md
- Causal design discipline: references/causal-analysis.md
## Reference Guide
Read only what the current mode or task needs.
| Reference | When to read | Key rules |
|---|---|---|
| environment-setup.md | packaging, imports, deps | Use real package setup; do not patch with editor settings |
| module-conventions.md | before explore code | Pure functions, output-oriented names, manual driver; no durable logic in cells |
| marimo-patterns.md | marimo surface | Verify in marimo itself; explicit paths; no hidden cross-cell state |
| hydra-config.md | entering experiment | Config separate from computation; no Hydra for one-off probes |
| review-workflow.md | presenting results | Review before presenting; marimo for interactive, qmd for narrative |
| artifact-strategy.md | repeated outputs, comparisons | Materialize clearly; do not mix frozen inputs with run outputs |
| dvc-guide.md / kedro-guide.md | reproducibility, lineage | Add only when beyond ordinary experiment needs |
All references above are in references/.
## Runbook
The runbook (runbook.md at the repo root) is the human-readable reproduction
guide. It is derived from project structure — configs, scripts, notebooks,
runs — not maintained as a separate log.
Generate or update at promote time: milestones, handoff requests, or mode transitions. Rewrite stale sections rather than appending.
Contents:
- Prerequisites — env setup, data acquisition, external tools
- Pipeline overview — ASCII diagram of data flow
- Numbered steps — one per major pipeline step, each with: the exact command, expected runtime, and what to inspect afterward (including expected results so the reader knows what "correct" looks like)
- Configuration reference — table of config files and what they control
- Key directories — paths, contents, git-tracked or not
- Troubleshooting — known failure modes and fixes
The runbook ties per-run artifacts together: config.yaml says what params
were used, metrics.json says what happened, the runbook says which command
produced the run, what results to expect, and how to open the review surface.
If .aw/ bookkeeping is active, note the runbook's last-updated timestamp in
.aw/status.json.
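As a sketch of how those per-run pieces can be tied together, a small helper could collate each run's metrics into the rows behind a runbook comparison table. The `runs/<run-id>/metrics.json` layout here is an assumption for illustration, not a contract this document defines:

```python
import json
from pathlib import Path

def collect_runs(runs_dir: str) -> list[dict]:
    """Gather metrics.json from each run folder into comparison rows."""
    rows = []
    for run in sorted(Path(runs_dir).iterdir()):
        metrics_file = run / "metrics.json"  # assumed per-run artifact name
        if not metrics_file.exists():
            continue  # skip folders with no recorded metrics
        row = {"run": run.name}
        row.update(json.loads(metrics_file.read_text()))
        rows.append(row)
    return rows
```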
## Bookkeeping
Maintain .aw/ when work continues across sessions:
```
.aw/
  status.json        # current state across all areas
  stages/
    <area-name>/     # one folder per area of work
      plan.md        # framing and plan revisions
      status.json    # machine-readable state
      review.md      # review notes and approvals
```
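A hypothetical shape for the root `.aw/status.json` — the keys below are illustrative only, not a schema this document prescribes:

```json
{
  "runbook_updated": "2025-01-14T17:05:00Z",
  "areas": {
    "churn-eda": {"mode": "explore", "likely_next_mode": "experiment"},
    "pricing-probe": {"mode": "probe", "likely_next_mode": "explore"}
  }
}
```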
Create or advance a stage folder only on mode changes or user-requested checkpoints. Plan revisions, reruns, and parameter changes belong in the existing folder, not a new one.
Joining mid-project:
- `.aw/status.json` exists → authoritative
- `stages/` present but no root status → reconstruct from the stage folders
- Hydra `conf/` or `multirun/` without `.aw/` → likely `experiment`; confirm
- Notebook + modules, no config management → likely `explore`
- Nothing structured → `probe`