auto-review-loop-llm
Auto Review Loop (Generic LLM): Autonomous Research Improvement
Autonomously iterate: review → implement fixes → re-review, until the external reviewer gives a positive assessment or MAX_ROUNDS is reached.
Context: $ARGUMENTS
Constants
- MAX_ROUNDS = 4
- POSITIVE_THRESHOLD: score >= 6/10, or verdict contains "accept", "sufficient", "ready for submission"
- REVIEW_DOC: review-stage/AUTO_REVIEW.md (cumulative log; fall back to ./AUTO_REVIEW.md for legacy projects)
LLM Configuration
This skill uses any OpenAI-compatible API for external review via the llm-chat MCP server.
Configuration via MCP Server (Recommended)
Add to ~/.claude/settings.json:
{
"mcpServers": {
"llm-chat": {
"command": "/usr/bin/python3",
"args": ["/Users/yourname/.claude/mcp-servers/llm-chat/server.py"],
"env": {
"LLM_API_KEY": "your-api-key",
"LLM_BASE_URL": "https://api.deepseek.com/v1",
"LLM_MODEL": "deepseek-chat"
}
}
}
}
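You can catch a misconfigured server entry before the first review round by checking that the three LLM_* variables are present. This is a minimal sketch. It assumes the settings file follows the JSON layout shown above. The helper names are hypothetical and are not part of the llm-chat server.

```python
import json
from pathlib import Path
from typing import List

# Keys the llm-chat entry is expected to pass through "env" (per the example config).
REQUIRED_ENV_KEYS = ("LLM_API_KEY", "LLM_BASE_URL", "LLM_MODEL")


def missing_llm_chat_keys(settings: dict) -> List[str]:
    """Return the required env keys that are absent or empty in mcpServers.llm-chat.env."""
    server = settings.get("mcpServers", {}).get("llm-chat", {})
    env = server.get("env", {})
    return [key for key in REQUIRED_ENV_KEYS if not env.get(key)]


def check_settings_file(path: Path) -> List[str]:
    """Load a settings.json file (e.g. ~/.claude/settings.json) and report missing keys."""
    with path.open(encoding="utf-8") as fh:
        return missing_llm_chat_keys(json.load(fh))
```

For example, `check_settings_file(Path.home() / ".claude" / "settings.json")` returns an empty list when the entry is complete.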
Supported Providers
| Provider | LLM_BASE_URL | LLM_MODEL |
|---|---|---|
| OpenAI | https://api.openai.com/v1 | gpt-4o, o3 |
| DeepSeek | https://api.deepseek.com/v1 | deepseek-chat, deepseek-reasoner |
| MiniMax | https://api.minimax.io/v1 | MiniMax-M2.7 |
| Kimi (Moonshot) | https://api.moonshot.cn/v1 | moonshot-v1-8k, moonshot-v1-32k |
| ZhiPu (GLM) | https://open.bigmodel.cn/api/paas/v4 | glm-4, glm-4-plus |
| SiliconFlow | https://api.siliconflow.cn/v1 | Qwen/Qwen2.5-72B-Instruct |
| Alibaba Cloud Bailian (阿里云百炼) | https://dashscope.aliyuncs.com/compatible-mode/v1 | qwen-max |
| 01.AI (零一万物) | https://api.lingyiwanwu.com/v1 | yi-large |
API Call Method
Primary: MCP Tool
mcp__llm-chat__chat:
prompt: |
[Review prompt content]
model: "deepseek-chat"
system: "You are a senior ML reviewer..."
Fallback: curl
# The JSON body is single-quoted, so ${LLM_MODEL} is spliced in outside the quotes to let the shell expand it.
curl -s "${LLM_BASE_URL}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${LLM_API_KEY}" \
  -d '{
    "model": "'"${LLM_MODEL}"'",
    "messages": [
      {"role": "system", "content": "You are a senior ML reviewer..."},
      {"role": "user", "content": "[review prompt]"}
    ],
    "max_tokens": 4096
  }'
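A real review prompt contains quotes and newlines. If you paste it straight into the curl body, the resulting JSON is invalid. Building the body with a JSON serializer avoids this. The sketch below uses only the Python standard library and the same LLM_BASE_URL / LLM_API_KEY / LLM_MODEL variables. It assumes the standard OpenAI-compatible chat/completions request shape. The function names are hypothetical.

```python
import json
import os
import urllib.request


def build_chat_request(system: str, prompt: str, model: str, max_tokens: int = 4096) -> bytes:
    """Serialize an OpenAI-style chat/completions body; json.dumps escapes quotes and newlines."""
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
    }
    return json.dumps(body).encode("utf-8")


def post_chat(system: str, prompt: str) -> dict:
    """POST to ${LLM_BASE_URL}/chat/completions and return the decoded JSON response."""
    request = urllib.request.Request(
        os.environ["LLM_BASE_URL"].rstrip("/") + "/chat/completions",
        data=build_chat_request(system, prompt, os.environ["LLM_MODEL"]),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['LLM_API_KEY']}",
        },
        method="POST",
    )
    # Long reviews can take a while, so allow a generous timeout (an arbitrary choice).
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)
```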
State Persistence (Compact Recovery)
Persist state to review-stage/REVIEW_STATE.json after each round:
{
"round": 2,
"status": "in_progress",
"last_score": 5.0,
"last_verdict": "not ready",
"pending_experiments": [],
"timestamp": "2026-03-15T10:00:00"
}
Write this file at the end of every Phase E (after documenting the round).
On completion, set "status": "completed".
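Below is a minimal sketch of the state write, assuming the field names in the JSON example above. Writing to a temporary file and renaming it keeps a crash mid-write from leaving a truncated state file. That is a design choice of this sketch and is not stated by the skill. `write_state` is a hypothetical helper.

```python
import json
from datetime import datetime
from pathlib import Path

STATE_PATH = Path("review-stage/REVIEW_STATE.json")


def write_state(round_no, status, last_score, last_verdict,
                pending_experiments=None, path=STATE_PATH):
    """Persist loop state after Phase E; use status="completed" on termination."""
    state = {
        "round": round_no,
        "status": status,
        "last_score": last_score,
        "last_verdict": last_verdict,
        "pending_experiments": list(pending_experiments or []),
        "timestamp": datetime.now().isoformat(timespec="seconds"),
    }
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(state, indent=2), encoding="utf-8")
    tmp.replace(path)  # rename over the old file (atomic on POSIX within one filesystem)
    return state
```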
Workflow
Initialization
- Check review-stage/REVIEW_STATE.json for recovery (fall back to ./REVIEW_STATE.json if not found; legacy path)
- Read project context and prior reviews
- Initialize round counter
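Recovery can be sketched as follows. Look for the state file at the primary path, then at the legacy path. If an interrupted run is found, resume at the round after the last documented one. Resuming at `round + 1` and restarting from 1 after a completed run are assumptions of this sketch. The helper names are hypothetical.

```python
import json
from pathlib import Path

# Primary location first, then the legacy project-root location.
STATE_CANDIDATES = (Path("review-stage/REVIEW_STATE.json"), Path("REVIEW_STATE.json"))


def load_state(candidates=STATE_CANDIDATES):
    """Return the first state file found as a dict, or None if there is none."""
    for path in candidates:
        if path.exists():
            return json.loads(path.read_text(encoding="utf-8"))
    return None


def starting_round(state):
    """State is written at the end of Phase E, so an in-progress run resumes at round + 1."""
    if state and state.get("status") == "in_progress":
        return int(state.get("round", 0)) + 1
    return 1
```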
Loop (up to MAX_ROUNDS)
Phase A: Review
If MCP available:
mcp__llm-chat__chat:
system: "You are a senior ML reviewer (NeurIPS/ICML level)."
prompt: |
[Round N/MAX_ROUNDS of autonomous review loop]
[Full research context: claims, methods, results, known weaknesses]
[Changes since last round, if any]
1. Score this work 1-10 for a top venue
2. List remaining critical weaknesses (ranked by severity)
3. For each weakness, specify the MINIMUM fix
4. State clearly: is this READY for submission? Yes/No/Almost
Be brutally honest. If the work is ready, say so clearly.
If MCP NOT available:
# The JSON body is single-quoted, so ${LLM_MODEL} is spliced in outside the quotes to let the shell expand it.
curl -s "${LLM_BASE_URL}/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${LLM_API_KEY}" \
  -d '{
    "model": "'"${LLM_MODEL}"'",
    "messages": [
      {"role": "system", "content": "You are a senior ML reviewer (NeurIPS/ICML level)."},
      {"role": "user", "content": "[Full review prompt]"}
    ],
    "max_tokens": 4096
  }'
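The curl call returns a JSON envelope rather than the review text itself. Save that output verbatim, for example by redirecting it to a file. Phase B then needs the reviewer's message. The sketch below assumes the standard OpenAI-compatible response shape (`choices[0].message.content`) and an `error` object on failure. `extract_review_text` is a hypothetical helper.

```python
import json


def extract_review_text(raw_response: str) -> str:
    """Return choices[0].message.content from an OpenAI-compatible chat/completions response."""
    payload = json.loads(raw_response)
    if "error" in payload:
        # Surface API failures instead of treating them as a (very negative) review.
        raise RuntimeError(f"LLM API error: {payload['error']}")
    return payload["choices"][0]["message"]["content"]
```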
Phase B: Parse Assessment
CRITICAL: Save the FULL raw response verbatim. Then extract:
- Score (numeric 1-10)
- Verdict ("ready" / "almost" / "not ready")
- Action items (ranked list of fixes)
STOP: If score >= 6 AND the verdict contains "ready" or "almost", end the loop and go to Termination.
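Below is a heuristic sketch of the extraction and the stop rule. The score is taken from the first "X/10" in the text, falling back to a number after the word "score". The verdict comes from keyword matching, with "not ready" checked before "almost" and "ready". These patterns are assumptions about how reviewers phrase answers. Check them against the saved raw response, which remains the source of truth.

```python
import re
from typing import Optional


def parse_score(text: str) -> Optional[float]:
    """Extract a 1-10 score, preferring an explicit 'X/10'."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*/\s*10\b", text)
    if match is None:
        match = re.search(r"score\W*(\d+(?:\.\d+)?)", text, re.IGNORECASE)
    return float(match.group(1)) if match else None


def parse_verdict(text: str) -> str:
    """Map free text to 'ready' / 'almost' / 'not ready' (defaults to 'not ready')."""
    lowered = text.lower()
    if re.search(r"\bnot\s+(?:yet\s+)?ready\b", lowered):
        return "not ready"
    if "almost" in lowered:
        return "almost"
    if re.search(r"\bready\b", lowered):
        return "ready"
    return "not ready"


def should_stop(score: Optional[float], verdict: str) -> bool:
    """Phase B stop rule: score >= 6 AND verdict is 'ready' or 'almost'."""
    return score is not None and score >= 6 and verdict in ("ready", "almost")
```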
Phase C: Implement Fixes
Priority: metric additions > reframing > new experiments
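If each action item is tagged with the kind of fix it needs, the priority order can be applied mechanically. The category tags and `order_fixes` are hypothetical. Because Python's sort is stable, the reviewer's severity ranking is preserved within each category.

```python
# Hypothetical category tags assigned while reading the reviewer's action items.
PRIORITY = {"metric": 0, "reframing": 1, "experiment": 2}


def order_fixes(action_items):
    """Sort (category, description) pairs by Phase C priority; unknown categories go last."""
    return sorted(action_items, key=lambda item: PRIORITY.get(item[0], len(PRIORITY)))
```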
Phase D: Wait for Results
Monitor any remote experiments launched in Phase C and wait for their results before documenting the round.
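The wait can be sketched as a polling loop with a timeout. `is_done` is a hypothetical callable standing in for whatever check fits the setup, such as testing for a results file or querying a job scheduler. The poll interval and six-hour timeout are arbitrary defaults, not values from the skill.

```python
import time
from typing import Callable


def wait_for_experiments(is_done: Callable[[], bool],
                         poll_seconds: float = 60.0,
                         timeout_seconds: float = 6 * 3600,
                         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll is_done() until it returns True or the timeout elapses; return whether it finished."""
    waited = 0.0
    while waited <= timeout_seconds:
        if is_done():
            return True
        sleep(poll_seconds)
        waited += poll_seconds
    return False  # timed out: record the experiment as pending in REVIEW_STATE.json
```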
Phase E: Document Round
Append to review-stage/AUTO_REVIEW.md:
## Round N (timestamp)
### Assessment (Summary)
- Score: X/10
- Verdict: [ready/almost/not ready]
- Key criticisms: [bullet list]
### Reviewer Raw Response
<details>
<summary>Click to expand full reviewer response</summary>
[Paste the COMPLETE raw response here — verbatim, unedited.]
</details>
### Actions Taken
- [what was implemented/changed]
### Results
- [experiment outcomes, if any]
### Status
- [continuing to round N+1 / stopping]
Write review-stage/REVIEW_STATE.json with current state.
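Rendering the Phase E entry from parsed fields keeps each round's log consistent. The sketch below mirrors the template above. It inserts the raw response unmodified, with blank lines so it renders inside `<details>`. The function names and the minute-resolution timestamp are assumptions.

```python
from datetime import datetime
from pathlib import Path


def render_round(n, score, verdict, criticisms, raw_response, actions, results, status):
    """Build one '## Round N' section following the AUTO_REVIEW.md template."""
    lines = [
        f"## Round {n} ({datetime.now().isoformat(timespec='minutes')})",
        "### Assessment (Summary)",
        f"- Score: {score}/10",
        f"- Verdict: {verdict}",
        "- Key criticisms:",
        *[f"  - {c}" for c in criticisms],
        "### Reviewer Raw Response",
        "<details>",
        "<summary>Click to expand full reviewer response</summary>",
        "",
        raw_response,  # verbatim, unedited
        "",
        "</details>",
        "### Actions Taken",
        *[f"- {a}" for a in actions],
        "### Results",
        *[f"- {r}" for r in results],
        "### Status",
        f"- {status}",
    ]
    return "\n".join(lines) + "\n\n"


def append_round(entry: str, path: Path = Path("review-stage/AUTO_REVIEW.md")) -> None:
    """Append the entry to the cumulative log, creating the directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(entry)
```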
Termination
- Set the status in review-stage/REVIEW_STATE.json to "completed"
- Write final summary
Key Rules
- Large file handling: If the Write tool fails due to file size, immediately retry using Bash (cat << 'EOF' > file) to write in chunks. Do NOT ask the user for permission; just do it silently.
- Anti-hallucination citations: When adding references, NEVER fabricate BibTeX. Use the DBLP → CrossRef → [VERIFY] chain. Do NOT generate BibTeX from memory.
- Be honest about weaknesses
- Implement fixes BEFORE re-reviewing
- Document everything
- Include previous context in round 2+ prompts
- Prefer MCP tool over curl when available
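The citation rule is a fallback chain that never synthesizes an entry. In this sketch the lookups are pluggable functions, so real DBLP and CrossRef queries (both expose public HTTP APIs) can be dropped in. `resolve_bibtex` and the exact `[VERIFY]` marker text are hypothetical.

```python
from typing import Callable, Optional, Sequence

# A resolver takes a paper title and returns verified BibTeX, or None if it finds nothing.
Resolver = Callable[[str], Optional[str]]


def resolve_bibtex(title: str, resolvers: Sequence[Resolver]) -> str:
    """Try each resolver in order (e.g. DBLP, then CrossRef); never invent an entry."""
    for resolver in resolvers:
        bibtex = resolver(title)
        if bibtex:
            return bibtex
    # Nothing verifiable found: leave a visible marker instead of guessed BibTeX.
    return f"% [VERIFY] no verified BibTeX found for: {title}"
```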
Prompt Template for Round 2+
mcp__llm-chat__chat:
system: "You are a senior ML reviewer (NeurIPS/ICML level)."
prompt: |
[Round N/MAX_ROUNDS of autonomous review loop]
## Previous Review Summary (Round N-1)
- Previous Score: X/10
- Previous Verdict: [ready/almost/not ready]
- Previous Key Weaknesses: [list]
## Changes Since Last Review
1. [Action 1]: [result]
2. [Action 2]: [result]
## Updated Results
[paste updated metrics/tables]
Please re-score and re-assess:
1. Score this work 1-10 for a top venue
2. List remaining critical weaknesses (ranked by severity)
3. For each weakness, specify the MINIMUM fix
4. State clearly: is this READY for submission? Yes/No/Almost
Be brutally honest. If the work is ready, say so clearly.
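Filling this template from the previous round's parsed fields keeps follow-up prompts consistent. The sketch below reproduces the template text. It assumes the changes are passed as (action, result) pairs. `build_followup_prompt` is a hypothetical helper.

```python
def build_followup_prompt(n, max_rounds, prev_score, prev_verdict,
                          prev_weaknesses, changes, updated_results):
    """Fill the round 2+ prompt template; changes is a list of (action, result) pairs."""
    parts = [
        f"[Round {n}/{max_rounds} of autonomous review loop]",
        f"## Previous Review Summary (Round {n - 1})",
        f"- Previous Score: {prev_score}/10",
        f"- Previous Verdict: {prev_verdict}",
        f"- Previous Key Weaknesses: {'; '.join(prev_weaknesses)}",
        "## Changes Since Last Review",
        *[f"{i}. {action}: {result}" for i, (action, result) in enumerate(changes, start=1)],
        "## Updated Results",
        updated_results,
        "Please re-score and re-assess:",
        "1. Score this work 1-10 for a top venue",
        "2. List remaining critical weaknesses (ranked by severity)",
        "3. For each weakness, specify the MINIMUM fix",
        "4. State clearly: is this READY for submission? Yes/No/Almost",
        "Be brutally honest. If the work is ready, say so clearly.",
    ]
    return "\n".join(parts)
```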
Output Protocols
Follow these shared protocols for all output files:
- Output Versioning Protocol — write timestamped file first, then copy to fixed name
- Output Manifest Protocol — log every output to MANIFEST.md
- Output Language Protocol — respect the project's language setting
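The versioning and manifest protocols can be sketched together. The file is written first under a timestamped name, then copied to the fixed name, and the pair is logged. The `_YYYYmmdd_HHMMSS` naming and the manifest line format are assumptions, not the shared protocol's exact format.

```python
import shutil
from datetime import datetime
from pathlib import Path


def write_versioned(content: str, fixed_path: Path, manifest: Path = Path("MANIFEST.md")) -> Path:
    """Write a timestamped copy, copy it to the fixed name, and log both in the manifest."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    versioned = fixed_path.with_name(f"{fixed_path.stem}_{stamp}{fixed_path.suffix}")
    versioned.parent.mkdir(parents=True, exist_ok=True)
    versioned.write_text(content, encoding="utf-8")   # 1. timestamped file first
    shutil.copyfile(versioned, fixed_path)            # 2. then the fixed name
    with manifest.open("a", encoding="utf-8") as fh:  # 3. record the output
        fh.write(f"- {stamp}: {versioned} -> {fixed_path}\n")
    return versioned
```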