# inno-code-survey

## Canonical Summary
Acquires missing code repositories for the selected idea (Phase A) and conducts a comprehensive code survey mapping academic concepts to implementations (Phase B). Outputs `acquired_code_repos`, `updated_prepare_res`, and `model_survey` for downstream pipeline stages.
## Trigger Rules
Use this skill when the user request matches its research workflow scope. Prefer the bundled resources instead of recreating templates or reference material. Keep outputs traceable to project files, citations, scripts, or upstream evidence.
## Resource Use Rules
- Read from `references/` only when the current task needs the extra detail.
- Treat `scripts/` as optional helpers. Run them only when their dependencies are available, keep outputs in the project workspace, and explain a manual fallback if execution is blocked.
## Execution Contract
- Resolve every relative path from this skill directory first.
- Prefer inspection before mutation when invoking bundled scripts.
- If a required runtime, CLI, credential, or API is unavailable, explain the blocker and continue with the best manual fallback instead of silently skipping the step.
- Do not write generated artifacts back into the skill directory; save them inside the active project workspace.
## Upstream Instructions
### Inno Code Survey (Repo Acquisition + Code Survey)
Merges `_acquire_missing_repos`, `_update_prepare_res_with_new_repos`, and `_conduct_code_survey` from `run_infer_idea_ours.py` (lines 639–828, 1038–1052) into a single two-phase skill.
### Directory structure
```
skills/inno-code-survey/
├── SKILL.md                              ← this file
├── prompts/
│   ├── build_repo_acquisition_query.md   ← Phase A query template
│   └── build_code_survey_query.md        ← Phase B query template
├── references/
│   ├── repo_acquisition_agent.md         ← Phase A agent system prompt & tools
│   └── code_survey_agent.md              ← Phase B agent system prompt & tools
└── scripts/
    └── github_search_clone.py            ← GitHub search + clone helper
```
### Path conventions
All file paths use semantic directory names under the project root:
| Path | Contents |
|---|---|
| `Ideation/references/papers/` | Downloaded arXiv LaTeX sources (`.tex`, `.txt`, `.md`) |
| `Experiment/code_references/<repo_name>/` | Cloned GitHub repositories |
| `Experiment/code_references/model_survey.md` | Code survey implementation report |
| `Experiment/code_references/logs/` | Phase A & B agent cache files |
### Inputs
These are aligned with outputs from inno-idea-generation and inno-prepare-resources:
| Input | Source | Description |
|---|---|---|
| `selected_idea` | `Ideation/ideas/selected_idea.txt` or `final_selected_idea_data` | The finalized selected idea (full markdown) |
| `download_res` | inno-prepare-resources output | Result log from downloading arXiv paper sources |
| `prepare_res` | inno-prepare-resources output (JSON) | Contains `reference_codebases` and `reference_paths` |
| `context_variables` | Shared context dict | Accumulated pipeline context |
| `instance.json` | `<project_path>/instance.json` | Paths are absolute when created by Dr. Claw (`Experiment.code_references`, `Ideation.references`); use as-is, or resolve with `path.join(project_path, value)` if relative. `date_limit` also comes from context. |
### Outputs
| Output | Description | Consumer |
|---|---|---|
| `acquired_code_repos` | Dict of `{name: path}` for newly cloned repos | Phase B, cache |
| `updated_prepare_res` | `prepare_res` JSON with new repos merged into `reference_codebases` / `reference_paths` | Downstream pipeline |
| `extra_repo_info` | Formatted string listing acquired repos | Phase B query |
| `model_survey` | Comprehensive code survey implementation report | inno-experiment-dev |
### Phase A — Repo Acquisition
- Full template & parameter docs: `prompts/build_repo_acquisition_query.md`
- Agent system prompt & tools: `references/repo_acquisition_agent.md`

Maps to `_acquire_missing_repos` (lines 745–792) + `_update_prepare_res_with_new_repos` (lines 639–686).
#### Step A1: Analyze the selected idea and identify gaps
Read `selected_idea` and identify 2–3 missing technical components — novel or specialized parts that are likely NOT in the standard repos already present in `Experiment/code_references/`.
#### Step A2: Search GitHub using the "Cascade" strategy
For each missing component, perform 6 distinct queries using progressive decomposition:
- Level 1 (Specific): Search for the exact mechanism name
- Level 2 (Broad): Strip context adjectives, search core technique
- Level 3 (Atomic): Search for 3 base mathematical operators
Use the helper script or the GitHub API directly:

```bash
# Option 1: Helper script
python scripts/github_search_clone.py --query "sinkhorn attention pytorch" --limit 5 --date-limit 2025-12-31

# Option 2: Direct GitHub API via curl
curl -s "https://api.github.com/search/repositories?q=sinkhorn+attention&per_page=5" \
  -H "Accept: application/vnd.github.v3+json"
```
#### Step A3: Clone selected repos
Clone the best candidate for each gap into `Experiment/code_references/`:

```bash
GIT_TERMINAL_PROMPT=0 git clone --depth 1 <clone_url> Experiment/code_references/<repo_name>
```
#### Step A4: Verify each clone
For each cloned repo:
- Read `README.md`: `cat Experiment/code_references/<repo_name>/README.md`
- Check language and domain relevance
- Reject repos that don't match (wrong domain, empty, HTML-only); a minimal check is sketched below
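A minimal sketch of such a check, assuming a Python-dominant repo and a hand-picked keyword list (both are assumptions to adapt to the idea's domain):

```python
from pathlib import Path

def looks_relevant(repo_dir: str, keywords: tuple[str, ...] = ("attention", "transformer")) -> bool:
    """Rough sanity check for a freshly cloned repo: non-empty, has a README, mentions the domain."""
    repo = Path(repo_dir)
    readme = repo / "README.md"
    if not readme.exists():
        return False  # empty or HTML-only clones usually lack a real README
    text = readme.read_text(errors="ignore").lower()
    has_code = any(repo.rglob("*.py"))  # assumes a Python codebase; adjust per language
    return has_code and any(k in text for k in keywords)

# Repos that fail the check are removed before building acquired_code_repos.
```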
#### Step A5: Build `acquired_code_repos` and update `prepare_res`
- Build the `acquired_code_repos` dict from verified clones:
  ```json
  {
    "repo_name_1": "Experiment/code_references/repo_name_1",
    "repo_name_2": "Experiment/code_references/repo_name_2"
  }
  ```
- Set `context_variables["acquired_code_repos"] = acquired_code_repos`
- Parse the `prepare_res` JSON and ensure the `reference_codebases` and `reference_paths` arrays exist
- For each entry in `acquired_code_repos`, if `path` is not already in `reference_paths`:
  - Append the repo name to `reference_codebases`
  - Append the repo path to `reference_paths`
- Serialize back to JSON as `updated_prepare_res` (see the sketch after this list)
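A plain-Python sketch of this merge; variable names mirror the pipeline's but are otherwise illustrative:

```python
import json

def merge_acquired_repos(prepare_res: str, acquired_code_repos: dict) -> str:
    """Merge newly cloned repos into prepare_res and return the updated JSON string."""
    data = json.loads(prepare_res)
    data.setdefault("reference_codebases", [])
    data.setdefault("reference_paths", [])
    for name, path in acquired_code_repos.items():
        if path not in data["reference_paths"]:  # skip repos the pipeline already knows about
            data["reference_codebases"].append(name)
            data["reference_paths"].append(path)
    return json.dumps(data, indent=2)

updated_prepare_res = merge_acquired_repos(prepare_res, acquired_code_repos)
```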
#### Step A6: Save Phase A cache
- Build the `extra_repo_info` string (empty string if no repos were acquired):
  ```text
  - Name: <name1> | Path: <path1>
  - Name: <name2> | Path: <path2>
  ```
- Write `Experiment/code_references/logs/repo_acquisition_agent.json` (a write-out sketch follows this list):
  ```json
  {
    "context_variables": {
      "code_references_path": "<instance.Experiment.code_references if absolute (Dr. Claw), else path.join(project_path, ...)>",
      "references_path": "<instance.Ideation.references if absolute (Dr. Claw), else path.join(project_path, ...)>",
      "date_limit": "YYYY-MM-DD",
      "prepare_result": { ... },
      "acquired_code_repos": { "<name>": "<path>", ... },
      "updated_prepare_res": "<JSON string of updated prepare_res>"
    }
  }
  ```
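A write-out sketch for this step, assuming the values resolved earlier in Phase A are in scope (names are illustrative, not pipeline APIs):

```python
import json
from pathlib import Path

# Build the extra_repo_info block; it stays "" when nothing was acquired.
extra_repo_info = "\n".join(
    f"- Name: {name} | Path: {path}" for name, path in acquired_code_repos.items()
)

cache = {
    "context_variables": {
        "code_references_path": code_references_path,  # resolved from instance.json earlier
        "references_path": references_path,
        "date_limit": date_limit,
        "prepare_result": prepare_result,
        "acquired_code_repos": acquired_code_repos,
        "updated_prepare_res": updated_prepare_res,
    }
}
log_path = Path("Experiment/code_references/logs/repo_acquisition_agent.json")
log_path.parent.mkdir(parents=True, exist_ok=True)
log_path.write_text(json.dumps(cache, indent=2))
```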
### Phase B — Code Survey
- Full template & parameter docs: `prompts/build_code_survey_query.md`
- Agent system prompt & tools: `references/code_survey_agent.md`

Maps to `_conduct_code_survey` (lines 794–828).
#### Step B1: Build the code survey query
Construct the query using `selected_idea`, `download_res`, and `extra_repo_info` (from Phase A); a sketch of the assembly follows the template:

```text
I have an innovative idea related to machine learning:
{selected_idea}

I have carefully gone through these papers' GitHub repositories and downloaded
some of them to my local machine, in the directory `Experiment/code_references/`; use `ls`, `tree`,
and `find` to navigate the directory.
And I have also downloaded the corresponding papers (LaTeX sources, markdown, txt),
with the following information:
{download_res}
{extra_repo_info_block}

Your task is to carefully understand the innovative idea, then thoroughly review the
codebases and generate a comprehensive implementation report for the innovative
idea. Do NOT stop reviewing the codebases until you have covered every academic
concept in the innovative idea.
Note that the code implementation should be as complete as possible.
```
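One way to assemble the query, assuming the bundled template can be read as a Python format string with the placeholder names shown above; the wrapper sentence around `extra_repo_info` is an assumption, so check `prompts/build_code_survey_query.md` for the exact wording:

```python
from pathlib import Path

# Assumes the only {braces} in the template file are the three placeholders above.
template = Path("prompts/build_code_survey_query.md").read_text()

# extra_repo_info comes from Phase A; wrap it only when non-empty.
extra_repo_info_block = (
    f"I have additionally cloned these repositories:\n{extra_repo_info}"
    if extra_repo_info
    else ""
)
code_survey_query = template.format(
    selected_idea=selected_idea,
    download_res=download_res,
    extra_repo_info_block=extra_repo_info_block,
)
```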
#### Step B2: Survey all repos in `Experiment/code_references/`
Use Linux commands to navigate and read code:
| Action | Command |
|---|---|
| List repos | `ls Experiment/code_references/` or `tree Experiment/code_references/ -L 1` |
| View repo structure | `tree Experiment/code_references/<repo>/ -L 3` |
| Find Python files | `find Experiment/code_references/<repo>/ -name "*.py" -type f` |
| Read source file | `cat Experiment/code_references/<repo>/model/attention.py` |
| Search across repos | `rg "class.*Attention" Experiment/code_references/` or `grep -rn "sinkhorn" Experiment/code_references/` |
| Read specific lines | `sed -n '100,200p' Experiment/code_references/<repo>/file.py` |
#### Step B3: Map each innovative module to code
For each atomic academic concept in the idea:
- Identify the mathematical formula
- Locate the corresponding implementation across repos
- Extract complete code snippets with file paths and function signatures (one possible record layout is sketched below)
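Each mapping can be captured as a small record whose fields match the `notes` entries cached in Step B6; the concrete values below are purely illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class ConceptNote:
    """One atomic academic concept mapped from the idea to real code in the surveyed repos."""
    definition: str                      # academic concept name
    math_formula: str                    # precise formulation, e.g. LaTeX
    code_implementation: str             # verbatim snippet copied from the repo
    reference_papers: list[str] = field(default_factory=list)
    reference_codebases: list[str] = field(default_factory=list)  # include file paths

note = ConceptNote(
    definition="Sinkhorn normalization",
    math_formula=r"P = \operatorname{diag}(u)\, K \,\operatorname{diag}(v)",
    code_implementation="def sinkhorn(K, n_iters=50): ...  # copied from the repo source",
    reference_papers=["<paper1>"],
    reference_codebases=["Experiment/code_references/<repo>/model/sinkhorn.py"],
)
```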
#### Step B4: Generate comprehensive implementation report
The report must include for each concept:
- Academic definition — the concept name
- Mathematical formula — precise formulation
- Code implementation — real code from repos (not pseudocode)
- Reference papers — which papers define this
- Reference codebases — which repos implement it, with file paths
#### Step B5: Store result
Set `context_variables["model_survey"] = code_survey_response` (the full implementation report text).
#### Step B6: Save Phase B cache
Write `Experiment/code_references/logs/code_survey_agent.json`:
```json
{
  "context_variables": {
    "code_references_path": "<instance.Experiment.code_references if absolute (Dr. Claw), else path.join(project_path, ...)>",
    "references_path": "<instance.Ideation.references if absolute (Dr. Claw), else path.join(project_path, ...)>",
    "date_limit": "YYYY-MM-DD",
    "prepare_result": { ... },
    "acquired_code_repos": { ... },
    "notes": [
      {
        "definition": "<atomic concept>",
        "math_formula": "<formula>",
        "code_implementation": "<code snippet>",
        "reference_papers": ["<paper1>"],
        "reference_codebases": ["<repo1>"]
      }
    ],
    "model_survey": "<FULL text of the comprehensive implementation report>"
  }
}
```

**IMPORTANT:** The `model_survey` field must contain the complete report text — never a summary or abbreviation.
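A write-out sketch for the Phase B cache, assuming the `ConceptNote` records from Step B3 and the Phase A variables are in scope (illustrative names, not pipeline APIs):

```python
import json
from dataclasses import asdict
from pathlib import Path

cache = {
    "context_variables": {
        "code_references_path": code_references_path,
        "references_path": references_path,
        "date_limit": date_limit,
        "prepare_result": prepare_result,
        "acquired_code_repos": acquired_code_repos,
        "notes": [asdict(n) for n in concept_notes],  # ConceptNote records from Step B3
        "model_survey": model_survey,                  # the FULL report text, never a summary
    }
}
out_path = Path("Experiment/code_references/logs/code_survey_agent.json")
out_path.parent.mkdir(parents=True, exist_ok=True)
out_path.write_text(json.dumps(cache, indent=2))
```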
### Tool mappings (reference -> Linux/Claude Code)
| Reference tool | Replacement |
|---|---|
| `search_github_repos_wrapper` | `python scripts/github_search_clone.py --query "..." --limit 5` or `curl` to the GitHub API |
| `tracked_execute_command` (git clone) | `GIT_TERMINAL_PROMPT=0 git clone --depth 1 <url> Experiment/code_references/<name>` |
| `list_files` | `ls`, `find`, `tree` |
| `read_file` | `cat`, `head`, `tail`, `sed -n` |
| `gen_code_tree_structure` | `tree -L 3` |
| `terminal_page_down/up/to` | N/A (not needed with `cat`/`less`) |
| `search_github_code` | `rg`, `grep -rn` across local repos |
### Checklist
#### Phase A (Repo Acquisition)
- Selected idea analyzed; 2–3 missing components identified
- Cascade search performed (6 queries per gap across 3 levels)
- Best candidates cloned into `Experiment/code_references/`
- Each clone verified (README.md read, domain/language checked)
- `context_variables["acquired_code_repos"]` set as dict `{name: path}`
- `prepare_res` updated with new `reference_codebases` / `reference_paths`
- `extra_repo_info` string built for Phase B
- `Experiment/code_references/logs/repo_acquisition_agent.json` written
#### Phase B (Code Survey)
- Code survey query built with `selected_idea` + `download_res` + `extra_repo_info`
- All repos in `Experiment/code_references/` surveyed using `tree`, `cat`, `grep`, `find`
- Every atomic academic concept in the idea has matching code identified
- Implementation report includes: code snippets, file paths, function signatures, formula-to-code mappings
- `context_variables["model_survey"]` set with full report text
- `Experiment/code_references/logs/code_survey_agent.json` written with complete `model_survey`
### References
- `run_infer_idea_ours.py`: `_acquire_missing_repos` (745–792), `_update_prepare_res_with_new_repos` (639–686), `_conduct_code_survey` (794–828)
- Prompts: `build_repo_acquisition_query` (`prompt_templates.py:153–171`), `build_code_survey_query` (`prompt_templates.py:173–200`)
- Agents: `repo_agent.py` (Repo Acquisition Agent definition + tools), `survey_agent.py` (Code Survey Agent definition)
- Cache examples: `repo_acquisition_agent.json`, `code_survey_agent.json` from reference pipeline output