# inno-experiment-dev
## Canonical Summary
Creates implementation plan, writes project code with judge feedback loop, and submits final experiment run. Use after code-survey in both Idea and Plan branches.
## Trigger Rules
Use this skill when the user request matches its research workflow scope. Prefer the bundled resources instead of recreating templates or reference material. Keep outputs traceable to project files, citations, scripts, or upstream evidence.
## Resource Use Rules

- Read from `references/` only when the current task needs the extra detail.
## Execution Contract
- Resolve every relative path from this skill directory first.
- Prefer inspection before mutation when invoking bundled scripts.
- If a required runtime, CLI, credential, or API is unavailable, explain the blocker and continue with the best manual fallback instead of silently skipping the step.
- Do not write generated artifacts back into the skill directory; save them inside the active project workspace.
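A minimal sketch of these path rules; the function and argument names are illustrative, not part of the pipeline:

```python
from pathlib import Path

def resolve_paths(skill_dir: Path, workspace: Path) -> tuple[Path, Path]:
    # Bundled, read-only resources resolve against the skill directory first.
    resource = skill_dir / "references" / "coding_plan_agent.md"
    # Generated artifacts never go back into the skill directory; they belong
    # in the active project workspace.
    artifact_dir = workspace / "Experiment" / "core_code"
    return resource, artifact_dir
```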
## Upstream Instructions

### Inno Experiment Dev (Planning, Implementation, and Submission)

Merges the former inno-implementation-plan, inno-ml-dev-iteration, and the submit step of inno-experiment-submit-refine. Mirrors `_create_implementation_plan` (830-858), `_implement_and_iterate` (861-920), and the submit portion of `_submit_and_refine_experiments` (922-945) in `run_infer_idea_ours.py`.
### Inputs
| Variable | Source | Description |
|---|---|---|
| `survey_res` | inno-idea-generation or user | The finalized selected idea (or `refined_for_downstream`) |
| `references` | pipeline config | Pre-formatted string of source papers |
| `updated_prepare_res` | inno-prepare-resources | JSON with `reference_codebases` and `reference_paths` |
| `code_survey_res` | inno-code-survey | Comprehensive implementation report / model survey notes |
| `dataset_description` | from prepare step / context | Description of available datasets (not in instance.json) |
| `core_code` | instance.json `Experiment.core_code` | Absolute path when created by Dr. Claw (e.g. `<project_path>/Experiment/core_code`); use as-is, or resolve with `path.join(project_path, value)` if relative (see the sketch below) |
| `code_references` | instance.json `Experiment.code_references` | Absolute path when created by Dr. Claw (e.g. `<project_path>/Experiment/code_references`); use as-is, or resolve if relative |
| `max_iter_times` | pipeline config | Max judge-iteration rounds (default 2) |
| `context_variables` | shared state | Mutable dict carrying state across agents |
Plan mode additionally uses `ideas` and survey-specific prompt variants (`build_plan_query_with_survey`, `build_iteration_query_for_plan`, etc.).
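A minimal sketch of the path-resolution rule for `core_code` and `code_references` (the helper name is illustrative):

```python
import os

def resolve_experiment_path(project_path: str, value: str) -> str:
    # Dr. Claw-created projects store absolute paths; use them as-is.
    # Relative values are joined onto the project path.
    return value if os.path.isabs(value) else os.path.join(project_path, value)
```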
### Outputs
| Variable | Description |
|---|---|
| `plan_res` | Detailed implementation plan with dataset, model, training, and testing sections |
| `ml_dev_res` | Final ML Agent implementation result |
| `judge_res` | Final Judge Agent feedback |
| `judge_messages` | Full conversation thread (preserved for inno-experiment-analysis) |
| `submit_res` | Experiment submission result with statistical outputs |
| `context_variables` | Updated with `dataset_plan`, `training_plan`, `testing_plan`, `suggestion_dict`, `raw_error_stats` |
### Cache Artifacts
| File | Agent | Content |
|---|---|---|
| `Experiment/core_code/logs/coding_plan_agent.json` | Coding Plan Agent | `context_variables` + messages from the planning phase |
| `Experiment/core_code/logs/machine_learning_agent.json` | ML Agent | Initial implementation messages (plus `_iter_{N}.json` files for judge iterations) |
| `Experiment/core_code/logs/judge_agent.json` | Judge Agent | Evaluation messages (plus `_iter_{N}.json` files for iterations) |
| `Experiment/core_code/logs/machine_learning_agent_iter_submit.json` | ML Agent | Submission run messages and results |
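A hedged sketch of writing one of these cache artifacts; the helper and the exact payload shape are assumptions inferred from the table, since the real writer lives in the pipeline:

```python
import json
import os

def save_agent_cache(log_dir: str, agent_name: str, messages: list,
                     context_variables: dict | None = None,
                     iter_suffix: str | None = None) -> str:
    # e.g. machine_learning_agent_iter_2.json or machine_learning_agent_iter_submit.json
    suffix = f"_iter_{iter_suffix}" if iter_suffix is not None else ""
    path = os.path.join(log_dir, f"{agent_name}{suffix}.json")
    payload = {"messages": messages}
    if context_variables is not None:
        payload["context_variables"] = context_variables
    os.makedirs(log_dir, exist_ok=True)
    with open(path, "w") as f:
        json.dump(payload, f, indent=2, default=str)
    return path
```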
## Instructions

### Phase 1: Create Implementation Plan

Mirrors `_create_implementation_plan`.
1. Optional pre-step (Idea mode only): If refining the idea for implementation clarity, call the idea refinement agent to produce `refined_for_downstream` with tensor interfaces and a forward-pass sketch.
2. Build the plan query:
   - Idea mode: `plan_query = build_plan_query(survey_res, references, updated_prepare_res, code_survey_res, dataset_description)` (see `prompts/build_plan_query.md`)
   - Plan mode: use `build_plan_query_with_survey(ideas, references, prepare_res, code_survey_res, dataset_description)`
3. Call the Coding Plan Agent with `messages = [{"role": "user", "content": plan_query}]` (see the sketch after this list).
   - The agent reviews codebases using `tree`/`cat`, then creates structured plans via `plan_dataset`, `plan_training`, and `plan_testing`.
   - It calls `case_resolved` to merge the plans.
   - Set `plan_res = plan_messages[-1]["content"]`.
   - See `references/coding_plan_agent.md` for agent details.
4. Verify the plan has clear sections: dataset, model, training, evaluation, file layout.
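A minimal sketch of the planning call above, assuming `build_plan_query` from `prompt_templates.py`; `call_plan_agent` is a hypothetical stand-in for the real agent runner:

```python
def create_implementation_plan(call_plan_agent, build_plan_query, survey_res,
                               references, updated_prepare_res,
                               code_survey_res, dataset_description):
    # One planning turn; the plan is read off the last message, mirroring
    # _create_implementation_plan.
    plan_query = build_plan_query(survey_res, references, updated_prepare_res,
                                  code_survey_res, dataset_description)
    plan_messages = call_plan_agent(
        messages=[{"role": "user", "content": plan_query}])
    plan_res = plan_messages[-1]["content"]
    return plan_res, plan_messages
```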
### Phase 2: Implement and Iterate

Mirrors `_implement_and_iterate`.
1. Initial implementation: Build `ml_dev_query = build_ml_dev_query(survey_res, prepare_res, code_survey_res, plan_res, dataset_description, core_code, code_references)` (see `prompts/build_ml_dev_query.md`). Use paths from `instance.json`: `Experiment.core_code` and `Experiment.code_references` (absolute in Dr. Claw–created projects; use as-is, or resolve with the project path if relative). Call the ML Agent with `messages = [{"role": "user", "content": ml_dev_query}]`. Set `ml_dev_res = ml_messages[-1]["content"]`.
   - See `references/ml_agent_instructions.md` for agent details.
2. Initial judge evaluation: Build `judge_query = build_judge_query(survey_res, prepare_res, plan_res, ml_dev_res)` (see `prompts/build_judge_query.md`). Call the Judge Agent with `input_messages = [{"role": "user", "content": judge_query}]`. Set `judge_res = judge_messages[-1]["content"]`.
   - See `references/judge_agent_instructions.md` for agent details.
3. Iteration loop (`for i in 0..max_iter_times - 1`; see the loop sketch after this list):
   a. Build `iteration_query = build_iteration_query(survey_res, prepare_res, code_survey_res, plan_res, ml_dev_res, judge_res, core_code, code_references)` (see `prompts/build_iteration_query.md`). Use paths from `instance.json` (absolute in Dr. Claw–created projects; use as-is, or resolve if relative). Plan mode uses `build_iteration_query_for_plan`.
   b. Append it as a user message to `judge_messages`. Call the ML Agent with `iter_times=i+1`. Update `ml_dev_res`.
   c. Build `judge_simple_query = build_judge_simple_query(survey_res, prepare_res, plan_res, ml_dev_res)` (see `prompts/build_judge_simple_query.md`). Plan mode uses `build_judge_simple_query_for_plan`.
   d. Append it as a user message to `judge_messages`. Call the Judge Agent with `iter_times=i+1`. Update `judge_res`.
   e. If `"fully_correct": true` appears in the last message, break early.
4. Preserve `judge_messages` for the submit step and for the downstream inno-experiment-analysis.
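A hedged sketch of the loop above, mirroring `_implement_and_iterate`. Here `ml_agent` and `judge_agent` are hypothetical callables that append their reply to the shared thread and return its content; the real runner's signature may differ:

```python
def iterate_with_judge(ml_agent, judge_agent, build_iteration_query,
                       build_judge_simple_query, judge_messages, state,
                       max_iter_times=2):
    for i in range(max_iter_times):
        # (a, b) Ask the ML Agent to revise the implementation.
        iteration_query = build_iteration_query(
            state["survey_res"], state["prepare_res"], state["code_survey_res"],
            state["plan_res"], state["ml_dev_res"], state["judge_res"],
            state["core_code"], state["code_references"])
        judge_messages.append({"role": "user", "content": iteration_query})
        state["ml_dev_res"] = ml_agent(judge_messages, iter_times=i + 1)

        # (c, d) Re-evaluate with the lighter judge prompt.
        judge_simple_query = build_judge_simple_query(
            state["survey_res"], state["prepare_res"],
            state["plan_res"], state["ml_dev_res"])
        judge_messages.append({"role": "user", "content": judge_simple_query})
        state["judge_res"] = judge_agent(judge_messages, iter_times=i + 1)

        # (e) Early exit once the judge marks the implementation fully correct.
        if '"fully_correct": true' in judge_messages[-1]["content"]:
            break
    return state
```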
### Phase 3: Submit Experiment

Mirrors the submit portion of `_submit_and_refine_experiments`.
1. Build the submit query: `submit_query = build_submit_query(survey_res, ml_dev_res, judge_res, core_code)` (see `prompts/build_submit_query.md`). Resolve `core_code` from `instance.Experiment.core_code`. Plan mode uses `build_submit_query_for_plan`.
2. Append it to `judge_messages` as a user message. Call the ML Agent with `iter_times="submit"` (see the sketch after this list).
   - The agent adjusts epochs (3-10), runs `run_training_testing.py`, and ensures checkpoints are saved.
   - Set `submit_res = judge_messages[-1]["content"]`.
3. If the implementation is not runnable, the ML Agent calls `case_not_resolved`; otherwise it calls `case_resolved` with statistical results and analysis.
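A minimal sketch of the submit step under the same assumptions as the loop sketch above (`ml_agent` is a hypothetical callable that appends its reply to the thread):

```python
def submit_experiment(ml_agent, build_submit_query, judge_messages,
                      survey_res, ml_dev_res, judge_res, core_code):
    submit_query = build_submit_query(survey_res, ml_dev_res, judge_res, core_code)
    judge_messages.append({"role": "user", "content": submit_query})
    ml_agent(judge_messages, iter_times="submit")
    # The submission result is read off the end of the shared thread.
    submit_res = judge_messages[-1]["content"]
    return submit_res
```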
## Tool Mappings
All custom Python tools map to Claude Code built-in capabilities:
| Original Tool | Claude Code Equivalent |
|---|---|
| `execute_command` | Shell tool (direct execution) |
| `run_python` | `python <script>` via Shell tool |
| `create_file` / `write_file` | Write tool |
| `read_file` | Read tool or `cat <path>` |
| `create_directory` | `mkdir -p <path>` |
| `list_files` | `ls <path>` |
| `gen_code_tree_structure` | `tree -L 3 <path>` |
| `diagnose_code_error` | Analyze stderr output and inspect the code |
| `rollback_and_reimplement` | Rewrite the file with a different approach |
| `view_error_history` | Track error fingerprints in agent memory |
| `plan_dataset` / `plan_training` / `plan_testing` | Structure plan sections in the agent response |
| `case_resolved` / `case_not_resolved` | Agent returns the result / failure reason |
## Checklist

- Optional idea refinement applied if desired (Idea mode).
- Correct `build_plan_query` variant used for Idea vs Plan mode.
- Coding Plan Agent called; `plan_res` has clear dataset/model/training/testing sections.
- ML Agent initial implementation completed; `ml_dev_res` recorded.
- Judge Agent initial evaluation completed; `judge_res` recorded.
- Iteration loop runs with correct prompt variants; early exit on `fully_correct`.
- `judge_messages` preserved across all phases.
- Submit query appended to `judge_messages`; ML Agent submission run completed.
- Final model checkpoint saved to `Experiment/core_code/checkpoints/model_final.pth`.
- Cache artifacts saved to `Experiment/core_code/logs/`: `coding_plan_agent.json`, `machine_learning_agent.json`, `judge_agent.json`, `machine_learning_agent_iter_submit.json`.
## References

- `run_infer_idea_ours.py`: `_create_implementation_plan` (830-858), `_implement_and_iterate` (861-920), `_submit_and_refine_experiments` submit step (922-945)
- `prompt_templates.py`: `build_plan_query` (203-233), `build_ml_dev_query` (236-381), `build_judge_query` (384-417), `build_iteration_query` (420-468), `build_judge_simple_query` (471-494), `build_submit_query` (497-527)
- Agent definitions: `plan_agent.py`, `ml_agent.py`, `judge_agent.py` in `inno/agents/inno_agent/`