# inno-code-survey

## Canonical Summary
Acquires missing code repositories for the selected idea (Phase A) and conducts a comprehensive code survey mapping academic concepts to implementations (Phase B). Outputs `acquired_code_repos`, `updated_prepare_res`, and `model_survey` for downstream pipeline stages.
## Trigger Rules
Use this skill when the user request matches its research workflow scope. Prefer the bundled resources instead of recreating templates or reference material. Keep outputs traceable to project files, citations, scripts, or upstream evidence.
## Resource Use Rules
- Read from `references/` only when the current task needs the extra detail.
- Treat `scripts/` as optional helpers. Run them only when their dependencies are available, keep outputs in the project workspace, and explain a manual fallback if execution is blocked.
## Execution Contract
- Resolve every relative path from this skill directory first.
- Prefer inspection before mutation when invoking bundled scripts.
- If a required runtime, CLI, credential, or API is unavailable, explain the blocker and continue with the best manual fallback instead of silently skipping the step.
- Do not write generated artifacts back into the skill directory; save them inside the active project workspace.
## Upstream Instructions
### Inno Code Survey (Repo Acquisition + Code Survey)
Merges `_acquire_missing_repos`, `_update_prepare_res_with_new_repos`, and `_conduct_code_survey` from `run_infer_idea_ours.py` (lines 639–828, 1038–1052) into a single two-phase skill.
### Directory structure
```
skills/inno-code-survey/
├── SKILL.md                              ← this file
├── prompts/
│   ├── build_repo_acquisition_query.md   ← Phase A query template
│   └── build_code_survey_query.md        ← Phase B query template
├── references/
│   ├── repo_acquisition_agent.md         ← Phase A agent system prompt & tools
│   └── code_survey_agent.md              ← Phase B agent system prompt & tools
└── scripts/
    └── github_search_clone.py            ← GitHub search + clone helper
```
### Path conventions
All file paths use semantic directory names under the project root:
| Path | Contents |
|---|---|
| `Ideation/references/papers/` | Downloaded arXiv LaTeX sources (`.tex`, `.txt`, `.md`) |
| `Experiment/code_references/<repo_name>/` | Cloned GitHub repositories |
| `Experiment/code_references/model_survey.md` | Code survey implementation report |
| `Experiment/code_references/logs/` | Phase A & B agent cache files |
### Inputs
These are aligned with outputs from inno-idea-generation and inno-prepare-resources:
| Input | Source | Description |
|---|---|---|
| `selected_idea` | `Ideation/ideas/selected_idea.txt` or `final_selected_idea_data` | The finalized selected idea (full markdown) |
| `download_res` | inno-prepare-resources output | Result log from downloading arXiv paper sources |
| `prepare_res` | inno-prepare-resources output (JSON) | Contains `reference_codebases` and `reference_paths` |
| `context_variables` | Shared context dict | Accumulated pipeline context |
| `instance.json` | `<project_path>/instance.json` | Paths are absolute when created by Dr. Claw (`Experiment.code_references`, `Ideation.references`); use as-is, or resolve with `path.join(project_path, value)` if relative. `date_limit` also comes from context. |
### Outputs
| Output | Description | Consumer |
|---|---|---|
| `acquired_code_repos` | Dict of `{name: path}` for newly cloned repos | Phase B, cache |
| `updated_prepare_res` | `prepare_res` JSON with new repos merged into `reference_codebases` / `reference_paths` | Downstream pipeline |
| `extra_repo_info` | Formatted string listing acquired repos | Phase B query |
| `model_survey` | Comprehensive code survey implementation report | inno-experiment-dev |
### Phase A — Repo Acquisition
- Full template & parameter docs: `prompts/build_repo_acquisition_query.md`
- Agent system prompt & tools: `references/repo_acquisition_agent.md`

Maps to `_acquire_missing_repos` (lines 745–792) + `_update_prepare_res_with_new_repos` (lines 639–686).
#### Step A1: Analyze the selected idea and identify gaps
Read `selected_idea` and identify 2–3 missing technical components — novel or specialized parts that are likely NOT in the standard repos already present in `Experiment/code_references/`.
#### Step A2: Search GitHub using the "Cascade" strategy
For each missing component, perform 6 distinct queries using progressive decomposition:
- Level 1 (Specific): Search for the exact mechanism name
- Level 2 (Broad): Strip context adjectives, search core technique
- Level 3 (Atomic): Search for 3 base mathematical operators
Use the helper script or the GitHub API directly:

```bash
# Option 1: Helper script
python scripts/github_search_clone.py --query "sinkhorn attention pytorch" --limit 5 --date-limit 2025-12-31

# Option 2: Direct GitHub API via curl
curl -s "https://api.github.com/search/repositories?q=sinkhorn+attention&per_page=5" \
  -H "Accept: application/vnd.github.v3+json"
```
#### Step A3: Clone selected repos
Clone the best candidate for each gap into `Experiment/code_references/`:

```bash
GIT_TERMINAL_PROMPT=0 git clone --depth 1 <clone_url> Experiment/code_references/<repo_name>
```
#### Step A4: Verify each clone
For each cloned repo:
- Read `README.md`: `cat Experiment/code_references/<repo_name>/README.md`
- Check language and domain relevance
- Reject repos that don't match (wrong domain, empty, HTML-only); a minimal check is sketched below
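A minimal sketch of such a check, assuming a Python-dominant repo and a hand-picked keyword list (both are assumptions to adapt to the idea's domain):

```python
from pathlib import Path

def looks_relevant(repo_dir: str, keywords: tuple[str, ...] = ("attention", "transformer")) -> bool:
    """Rough sanity check for a freshly cloned repo: non-empty, has a README, mentions the domain."""
    repo = Path(repo_dir)
    readme = repo / "README.md"
    if not readme.exists():
        return False  # empty or HTML-only clones usually lack a real README
    text = readme.read_text(errors="ignore").lower()
    has_code = any(repo.rglob("*.py"))  # assumes a Python codebase; adjust per language
    return has_code and any(k in text for k in keywords)

# Repos that fail the check are removed before building acquired_code_repos.
```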
#### Step A5: Build `acquired_code_repos` and update `prepare_res`
- Build the `acquired_code_repos` dict from verified clones:
  ```json
  {
    "repo_name_1": "Experiment/code_references/repo_name_1",
    "repo_name_2": "Experiment/code_references/repo_name_2"
  }
  ```
- Set `context_variables["acquired_code_repos"] = acquired_code_repos`
- Parse the `prepare_res` JSON and ensure the `reference_codebases` and `reference_paths` arrays exist
- For each entry in `acquired_code_repos`, if `path` is not already in `reference_paths`:
  - Append the repo name to `reference_codebases`
  - Append the repo path to `reference_paths`
- Serialize back to JSON as `updated_prepare_res` (see the sketch after this list)
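A plain-Python sketch of this merge; variable names mirror the pipeline's but are otherwise illustrative:

```python
import json

def merge_acquired_repos(prepare_res: str, acquired_code_repos: dict) -> str:
    """Merge newly cloned repos into prepare_res and return the updated JSON string."""
    data = json.loads(prepare_res)
    data.setdefault("reference_codebases", [])
    data.setdefault("reference_paths", [])
    for name, path in acquired_code_repos.items():
        if path not in data["reference_paths"]:  # skip repos the pipeline already knows about
            data["reference_codebases"].append(name)
            data["reference_paths"].append(path)
    return json.dumps(data, indent=2)

updated_prepare_res = merge_acquired_repos(prepare_res, acquired_code_repos)
```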
#### Step A6: Save Phase A cache
- Build the `extra_repo_info` string (empty string if no repos were acquired):
  ```text
  - Name: <name1> | Path: <path1>
  - Name: <name2> | Path: <path2>
  ```
- Write `Experiment/code_references/logs/repo_acquisition_agent.json` (a write-out sketch follows this list):
  ```json
  {
    "context_variables": {
      "code_references_path": "<instance.Experiment.code_references if absolute (Dr. Claw), else path.join(project_path, ...)>",
      "references_path": "<instance.Ideation.references if absolute (Dr. Claw), else path.join(project_path, ...)>",
      "date_limit": "YYYY-MM-DD",
      "prepare_result": { ... },
      "acquired_code_repos": { "<name>": "<path>", ... },
      "updated_prepare_res": "<JSON string of updated prepare_res>"
    }
  }
  ```
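A write-out sketch for this step, assuming the values resolved earlier in Phase A are in scope (names are illustrative, not pipeline APIs):

```python
import json
from pathlib import Path

# Build the extra_repo_info block; it stays "" when nothing was acquired.
extra_repo_info = "\n".join(
    f"- Name: {name} | Path: {path}" for name, path in acquired_code_repos.items()
)

cache = {
    "context_variables": {
        "code_references_path": code_references_path,  # resolved from instance.json earlier
        "references_path": references_path,
        "date_limit": date_limit,
        "prepare_result": prepare_result,
        "acquired_code_repos": acquired_code_repos,
        "updated_prepare_res": updated_prepare_res,
    }
}
log_path = Path("Experiment/code_references/logs/repo_acquisition_agent.json")
log_path.parent.mkdir(parents=True, exist_ok=True)
log_path.write_text(json.dumps(cache, indent=2))
```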
### Phase B — Code Survey
- Full template & parameter docs: `prompts/build_code_survey_query.md`
- Agent system prompt & tools: `references/code_survey_agent.md`

Maps to `_conduct_code_survey` (lines 794–828).
#### Step B1: Build the code survey query
Construct the query using `selected_idea`, `download_res`, and `extra_repo_info` (from Phase A); a sketch of the assembly follows the template:

```text
I have an innovative idea related to machine learning:
{selected_idea}

I have carefully gone through these papers' GitHub repositories and downloaded
some of them to my local machine, in the directory `Experiment/code_references/`; use `ls`, `tree`,
and `find` to navigate the directory.
And I have also downloaded the corresponding papers (LaTeX sources, markdown, txt),
with the following information:
{download_res}
{extra_repo_info_block}

Your task is to carefully understand the innovative idea, then thoroughly review the
codebases and generate a comprehensive implementation report for the innovative
idea. Do NOT stop reviewing the codebases until you have covered every academic
concept in the innovative idea.
Note that the code implementation should be as complete as possible.
```
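One way to assemble the query, assuming the bundled template can be read as a Python format string with the placeholder names shown above; the wrapper sentence around `extra_repo_info` is an assumption, so check `prompts/build_code_survey_query.md` for the exact wording:

```python
from pathlib import Path

# Assumes the only {braces} in the template file are the three placeholders above.
template = Path("prompts/build_code_survey_query.md").read_text()

# extra_repo_info comes from Phase A; wrap it only when non-empty.
extra_repo_info_block = (
    f"I have additionally cloned these repositories:\n{extra_repo_info}"
    if extra_repo_info
    else ""
)
code_survey_query = template.format(
    selected_idea=selected_idea,
    download_res=download_res,
    extra_repo_info_block=extra_repo_info_block,
)
```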
#### Step B2: Survey all repos in `Experiment/code_references/`
Use Linux commands to navigate and read code:
| Action | Command |
|---|---|
| List repos | `ls Experiment/code_references/` or `tree Experiment/code_references/ -L 1` |
| View repo structure | `tree Experiment/code_references/<repo>/ -L 3` |
| Find Python files | `find Experiment/code_references/<repo>/ -name "*.py" -type f` |
| Read source file | `cat Experiment/code_references/<repo>/model/attention.py` |
| Search across repos | `rg "class.*Attention" Experiment/code_references/` or `grep -rn "sinkhorn" Experiment/code_references/` |
| Read specific lines | `sed -n '100,200p' Experiment/code_references/<repo>/file.py` |
#### Step B3: Map each innovative module to code
For each atomic academic concept in the idea:
- Identify the mathematical formula
- Locate the corresponding implementation across repos
- Extract complete code snippets with file paths and function signatures (one possible record layout is sketched below)
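Each mapping can be captured as a small record whose fields match the `notes` entries cached in Step B6; the concrete values below are purely illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class ConceptNote:
    """One atomic academic concept mapped from the idea to real code in the surveyed repos."""
    definition: str                      # academic concept name
    math_formula: str                    # precise formulation, e.g. LaTeX
    code_implementation: str             # verbatim snippet copied from the repo
    reference_papers: list[str] = field(default_factory=list)
    reference_codebases: list[str] = field(default_factory=list)  # include file paths

note = ConceptNote(
    definition="Sinkhorn normalization",
    math_formula=r"P = \operatorname{diag}(u)\, K \,\operatorname{diag}(v)",
    code_implementation="def sinkhorn(K, n_iters=50): ...  # copied from the repo source",
    reference_papers=["<paper1>"],
    reference_codebases=["Experiment/code_references/<repo>/model/sinkhorn.py"],
)
```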
#### Step B4: Generate comprehensive implementation report
The report must include for each concept:
- Academic definition — the concept name
- Mathematical formula — precise formulation
- Code implementation — real code from repos (not pseudocode)
- Reference papers — which papers define this
- Reference codebases — which repos implement it, with file paths
#### Step B5: Store result
Set `context_variables["model_survey"] = code_survey_response` (the full implementation report text).
#### Step B6: Save Phase B cache
Write `Experiment/code_references/logs/code_survey_agent.json`:
```json
{
  "context_variables": {
    "code_references_path": "<instance.Experiment.code_references if absolute (Dr. Claw), else path.join(project_path, ...)>",
    "references_path": "<instance.Ideation.references if absolute (Dr. Claw), else path.join(project_path, ...)>",
    "date_limit": "YYYY-MM-DD",
    "prepare_result": { ... },
    "acquired_code_repos": { ... },
    "notes": [
      {
        "definition": "<atomic concept>",
        "math_formula": "<formula>",
        "code_implementation": "<code snippet>",
        "reference_papers": ["<paper1>"],
        "reference_codebases": ["<repo1>"]
      }
    ],
    "model_survey": "<FULL text of the comprehensive implementation report>"
  }
}
```

**IMPORTANT:** The `model_survey` field must contain the complete report text — never a summary or abbreviation.
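A write-out sketch for the Phase B cache, assuming the `ConceptNote` records from Step B3 and the Phase A variables are in scope (illustrative names, not pipeline APIs):

```python
import json
from dataclasses import asdict
from pathlib import Path

cache = {
    "context_variables": {
        "code_references_path": code_references_path,
        "references_path": references_path,
        "date_limit": date_limit,
        "prepare_result": prepare_result,
        "acquired_code_repos": acquired_code_repos,
        "notes": [asdict(n) for n in concept_notes],  # ConceptNote records from Step B3
        "model_survey": model_survey,                  # the FULL report text, never a summary
    }
}
out_path = Path("Experiment/code_references/logs/code_survey_agent.json")
out_path.parent.mkdir(parents=True, exist_ok=True)
out_path.write_text(json.dumps(cache, indent=2))
```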
### Tool mappings (reference -> Linux/Claude Code)
| Reference tool | Replacement |
|---|---|
| `search_github_repos_wrapper` | `python scripts/github_search_clone.py --query "..." --limit 5` or `curl` to the GitHub API |
| `tracked_execute_command` (git clone) | `GIT_TERMINAL_PROMPT=0 git clone --depth 1 <url> Experiment/code_references/<name>` |
| `list_files` | `ls`, `find`, `tree` |
| `read_file` | `cat`, `head`, `tail`, `sed -n` |
| `gen_code_tree_structure` | `tree -L 3` |
| `terminal_page_down/up/to` | N/A (not needed with `cat`/`less`) |
| `search_github_code` | `rg`, `grep -rn` across local repos |
### Checklist
#### Phase A (Repo Acquisition)
- Selected idea analyzed; 2–3 missing components identified
- Cascade search performed (6 queries per gap across 3 levels)
- Best candidates cloned into `Experiment/code_references/`
- Each clone verified (README.md read, domain/language checked)
- `context_variables["acquired_code_repos"]` set as dict `{name: path}`
- `prepare_res` updated with new `reference_codebases` / `reference_paths`
- `extra_repo_info` string built for Phase B
- `Experiment/code_references/logs/repo_acquisition_agent.json` written
#### Phase B (Code Survey)
- Code survey query built with `selected_idea` + `download_res` + `extra_repo_info`
- All repos in `Experiment/code_references/` surveyed using `tree`, `cat`, `grep`, `find`
- Every atomic academic concept in the idea has matching code identified
- Implementation report includes: code snippets, file paths, function signatures, formula-to-code mappings
- `context_variables["model_survey"]` set with full report text
- `Experiment/code_references/logs/code_survey_agent.json` written with complete `model_survey`
### References
- `run_infer_idea_ours.py`: `_acquire_missing_repos` (745–792), `_update_prepare_res_with_new_repos` (639–686), `_conduct_code_survey` (794–828)
- Prompts: `build_repo_acquisition_query` (`prompt_templates.py:153–171`), `build_code_survey_query` (`prompt_templates.py:173–200`)
- Agents: `repo_agent.py` (Repo Acquisition Agent definition + tools), `survey_agent.py` (Code Survey Agent definition)
- Cache examples: `repo_acquisition_agent.json`, `code_survey_agent.json` from reference pipeline output