
AutoResearchClaw — Autonomous Research Pipeline

Skill by ara.so — Daily 2026 Skills collection.

AutoResearchClaw is a fully autonomous 23-stage research pipeline that takes a natural language topic and produces a complete academic paper: real arXiv/Semantic Scholar citations, sandboxed experiments, statistical analysis, multi-agent peer review, and conference-ready LaTeX (NeurIPS/ICML/ICLR). No hallucinated references. No human babysitting.


Installation

# Clone and install
git clone https://github.com/aiming-lab/AutoResearchClaw.git
cd AutoResearchClaw
python3 -m venv .venv && source .venv/bin/activate
pip install -e .

# Verify CLI is available
researchclaw --help

Requirements: Python 3.11+


Configuration

cp config.researchclaw.example.yaml config.arc.yaml

Minimum config (config.arc.yaml)

project:
  name: "my-research"

research:
  topic: "Your research topic here"

llm:
  provider: "openai"
  base_url: "https://api.openai.com/v1"
  api_key_env: "OPENAI_API_KEY"
  primary_model: "gpt-4o"
  fallback_models: ["gpt-4o-mini"]

experiment:
  mode: "sandbox"
  sandbox:
    python_path: ".venv/bin/python"

Export the key named in api_key_env:

export OPENAI_API_KEY="$YOUR_OPENAI_KEY"

OpenRouter config (200+ models)

llm:
  provider: "openrouter"
  api_key_env: "OPENROUTER_API_KEY"
  primary_model: "anthropic/claude-3.5-sonnet"
  fallback_models:
    - "google/gemini-pro-1.5"
    - "meta-llama/llama-3.1-70b-instruct"

Export the key named in api_key_env:

export OPENROUTER_API_KEY="$YOUR_OPENROUTER_KEY"

ACP (Agent Client Protocol) — no API key needed

llm:
  provider: "acp"
  acp:
    agent: "claude"   # or: codex, gemini, opencode, kimi
    cwd: "."

The agent CLI (e.g. claude) handles its own authentication.

OpenClaw bridge (optional advanced capabilities)

openclaw_bridge:
  use_cron: true              # Scheduled research runs
  use_message: true           # Progress notifications
  use_memory: true            # Cross-session knowledge persistence
  use_sessions_spawn: true    # Parallel sub-sessions
  use_web_fetch: true         # Live web search in literature review
  use_browser: false          # Browser-based paper collection

Key CLI Commands

# Basic run — fully autonomous, no prompts
researchclaw run --topic "Your research idea" --auto-approve

# Run with explicit config file
researchclaw run --config config.arc.yaml --topic "Mixture-of-experts routing efficiency" --auto-approve

# Run with topic defined in config (omit --topic flag)
researchclaw run --config config.arc.yaml --auto-approve

# Interactive mode — pauses at gate stages for approval
researchclaw run --config config.arc.yaml --topic "Your topic"

# Check pipeline status / resume a run
researchclaw status --run-id rc-20260315-120000-abc123

# List past runs
researchclaw list

Gate stages (5, 9, 20) pause for human approval in interactive mode. Pass --auto-approve to skip all gates.
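Run IDs encode their start time, so runs can be sorted or filtered by name alone. A minimal sketch that relies on the rc-YYYYMMDD-HHMMSS-&lt;hash&gt; format shown in the examples above; the `parse_run_id` helper is hypothetical, not part of the CLI:

```python
from datetime import datetime

def parse_run_id(run_id: str) -> datetime:
    """Extract the start timestamp from an ID like rc-20260315-120000-abc123."""
    _, date_part, time_part, _ = run_id.split("-", 3)
    return datetime.strptime(f"{date_part}{time_part}", "%Y%m%d%H%M%S")

# Pick the newest run using only the IDs themselves
run_ids = ["rc-20260314-090000-def456", "rc-20260315-120000-abc123"]
latest = max(run_ids, key=parse_run_id)
print(latest)  # rc-20260315-120000-abc123
```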


Python API

from researchclaw.pipeline import Runner
from researchclaw.config import load_config

# Load config and run
config = load_config("config.arc.yaml")
config.research.topic = "Efficient attention mechanisms for long-context LLMs"
config.auto_approve = True

runner = Runner(config)
result = runner.run()

# Access outputs
print(result.artifact_dir)          # artifacts/rc-YYYYMMDD-HHMMSS-<hash>/
print(result.deliverables_dir)      # .../deliverables/
print(result.paper_draft_path)      # .../deliverables/paper_draft.md
print(result.latex_path)            # .../deliverables/paper.tex
print(result.bibtex_path)           # .../deliverables/references.bib
print(result.verification_report)   # .../deliverables/verification_report.json

# Run specific stages only
from researchclaw.pipeline import Runner, StageRange

runner = Runner(config)
result = runner.run(stages=StageRange(start="LITERATURE_COLLECT", end="KNOWLEDGE_EXTRACT"))

# Access knowledge base after a run
from researchclaw.knowledge import KnowledgeBase

kb = KnowledgeBase.load(result.artifact_dir)
findings = kb.get("findings")
literature = kb.get("literature")
decisions = kb.get("decisions")
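Because the knowledge base is stored as plain JSON files on disk (see the Output Structure section), it can also be read without the library. A stdlib-only sketch; the on-disk layout is taken from this document, but the JSON schema of each file is not guaranteed here:

```python
import json
from pathlib import Path

def load_knowledge_base(artifact_dir: str) -> dict:
    """Read every knowledge_base/*.json file into one dict keyed by file stem.

    Assumes the knowledge_base/ layout shown under Output Structure; the
    exact contents of each JSON file may vary between versions.
    """
    kb_dir = Path(artifact_dir) / "knowledge_base"
    return {p.stem: json.loads(p.read_text()) for p in sorted(kb_dir.glob("*.json"))}
```

Handy for post-hoc analysis scripts that should not depend on the researchclaw package itself.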

Output Structure

After a run, all outputs land in artifacts/rc-YYYYMMDD-HHMMSS-<hash>/:

artifacts/rc-20260315-120000-abc123/
├── deliverables/
│   ├── paper_draft.md          # Full academic paper (Markdown)
│   ├── paper.tex               # Conference-ready LaTeX
│   ├── references.bib          # Real BibTeX — auto-pruned to inline citations
│   ├── verification_report.json # 4-layer citation integrity report
│   └── reviews.md              # Multi-agent peer review
├── experiment_runs/
│   ├── run_001/
│   │   ├── code/               # Generated experiment code
│   │   ├── results.json        # Structured metrics
│   │   └── sandbox_output.txt  # Execution logs
├── charts/
│   └── *.png                   # Auto-generated comparison charts
├── evolution/
│   └── lessons.json            # Self-learning lessons for future runs
└── knowledge_base/
    ├── decisions.json
    ├── experiments.json
    ├── findings.json
    ├── literature.json
    ├── questions.json
    └── reviews.json

Pipeline Stages Reference

| Phase | Stage | Name | Notes |
|-------|-------|------|-------|
| A | 1 | TOPIC_INIT | Parse and scope research topic |
| A | 2 | PROBLEM_DECOMPOSE | Break into sub-problems |
| B | 3 | SEARCH_STRATEGY | Build search queries |
| B | 4 | LITERATURE_COLLECT | Real API calls to arXiv + Semantic Scholar |
| B | 5 | LITERATURE_SCREEN | Gate: approve/reject literature |
| B | 6 | KNOWLEDGE_EXTRACT | Extract structured knowledge |
| C | 7 | SYNTHESIS | Synthesize findings |
| C | 8 | HYPOTHESIS_GEN | Multi-agent debate to form hypotheses |
| D | 9 | EXPERIMENT_DESIGN | Gate: approve/reject design |
| D | 10 | CODE_GENERATION | Generate experiment code |
| D | 11 | RESOURCE_PLANNING | GPU/MPS/CPU auto-detection |
| E | 12 | EXPERIMENT_RUN | Sandboxed execution |
| E | 13 | ITERATIVE_REFINE | Self-healing on failure |
| F | 14 | RESULT_ANALYSIS | Multi-agent analysis |
| F | 15 | RESEARCH_DECISION | PROCEED / REFINE / PIVOT |
| G | 16 | PAPER_OUTLINE | Structure the paper |
| G | 17 | PAPER_DRAFT | Write full paper |
| G | 18 | PEER_REVIEW | Evidence-consistency check |
| G | 19 | PAPER_REVISION | Incorporate review feedback |
| H | 20 | QUALITY_GATE | Gate: final approval |
| H | 21 | KNOWLEDGE_ARCHIVE | Save lessons to KB |
| H | 22 | EXPORT_PUBLISH | Emit LaTeX + BibTeX |
| H | 23 | CITATION_VERIFY | 4-layer anti-hallucination check |

Common Patterns

Pattern: Quick paper on a topic

export OPENAI_API_KEY="$YOUR_OPENAI_KEY"
researchclaw run \
  --topic "Self-supervised learning for protein structure prediction" \
  --auto-approve

Pattern: Reproducible run with full config

# config.arc.yaml
project:
  name: "protein-ssl-research"

research:
  topic: "Self-supervised learning for protein structure prediction"

llm:
  provider: "openai"
  api_key_env: "OPENAI_API_KEY"
  primary_model: "gpt-4o"
  fallback_models: ["gpt-4o-mini"]

experiment:
  mode: "sandbox"
  sandbox:
    python_path: ".venv/bin/python"
  max_iterations: 3
  timeout_seconds: 300

researchclaw run --config config.arc.yaml --auto-approve

Pattern: Use Claude via OpenRouter for best reasoning

export OPENROUTER_API_KEY="$YOUR_OPENROUTER_KEY"

cat > config.arc.yaml << 'EOF'
project:
  name: "my-research"
llm:
  provider: "openrouter"
  api_key_env: "OPENROUTER_API_KEY"
  primary_model: "anthropic/claude-3.5-sonnet"
  fallback_models: ["google/gemini-pro-1.5"]
experiment:
  mode: "sandbox"
  sandbox:
    python_path: ".venv/bin/python"
EOF

researchclaw run --config config.arc.yaml \
  --topic "Efficient KV cache compression for transformer inference" \
  --auto-approve

Pattern: Resume after a failed run

# List runs to find the run ID
researchclaw list

# Resume from last completed stage
researchclaw run --resume rc-20260315-120000-abc123

Pattern: Programmatic batch research

from researchclaw.pipeline import Runner
from researchclaw.config import load_config

topics = [
    "LoRA fine-tuning on limited hardware",
    "Speculative decoding for LLM inference",
    "Flash attention variants comparison",
]

config = load_config("config.arc.yaml")
config.auto_approve = True

for topic in topics:
    config.research.topic = topic
    runner = Runner(config)
    result = runner.run()
    print(f"[{topic}] → {result.deliverables_dir}")

Pattern: OpenClaw one-liner (if using OpenClaw agent)

Share the repo URL with OpenClaw, then say:
"Research mixture-of-experts routing efficiency"

OpenClaw auto-reads RESEARCHCLAW_AGENTS.md, clones, installs, configures, and runs the full pipeline.


Compile the LaTeX Output

# Navigate to deliverables
cd artifacts/rc-*/deliverables/

# Compile (requires a LaTeX distribution)
pdflatex paper.tex
bibtex paper
pdflatex paper.tex
pdflatex paper.tex

# Or upload paper.tex + references.bib directly to Overleaf

Troubleshooting

researchclaw: command not found

# Make sure the venv is active and package is installed
source .venv/bin/activate
pip install -e .
which researchclaw

API key errors

# Verify env var is set
echo $OPENAI_API_KEY
# Should print your key (not empty)

# Set it explicitly for the session
export OPENAI_API_KEY="sk-..."

Experiment sandbox failures

The pipeline self-heals at Stage 13 (ITERATIVE_REFINE). If it keeps failing:

# Increase timeout and iterations in config
experiment:
  max_iterations: 5
  timeout_seconds: 600
  sandbox:
    python_path: ".venv/bin/python"

Citation hallucination warnings

Stage 23 (CITATION_VERIFY) runs a 4-layer check. If references are pruned:

  • This is expected behaviour — fake citations are removed automatically
  • Check verification_report.json for details on which citations were rejected and why
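To get a quick overview of how many citations survived, the report can be tallied directly. A sketch under an assumed schema (a JSON list of per-citation entries, each with a "status" field); adjust the field names to match your actual verification_report.json:

```python
import json

def summarise_verification(report_path: str) -> dict:
    """Count citations per verification status.

    Assumes the report is a JSON list of objects each carrying a "status"
    field (e.g. "verified" / "rejected"); the real schema may differ.
    """
    with open(report_path) as f:
        entries = json.load(f)
    counts: dict = {}
    for entry in entries:
        status = entry.get("status", "unknown")
        counts[status] = counts.get(status, 0) + 1
    return counts
```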

PIVOT loop running indefinitely

Stage 15 (RESEARCH_DECISION) may pivot multiple times. To cap iterations:

research:
  max_pivots: 2
  max_refines: 3

LaTeX compilation errors

# Check for missing packages
pdflatex paper.tex 2>&1 | grep "File.*not found"

# Install missing packages (TeX Live)
tlmgr install <package-name>

Out of memory during experiments

# Force CPU mode in config
experiment:
  sandbox:
    device: "cpu"
    max_memory_gb: 4

Key Concepts

  • PIVOT/REFINE Loop: Stage 15 autonomously decides PROCEED, REFINE (tweak params), or PIVOT (new hypothesis direction). All artifacts are versioned.
  • Multi-Agent Debate: Stages 8, 14, 18 use structured multi-perspective debate — not a single LLM pass.
  • Self-Learning: Each run extracts lessons with 30-day time decay. Future runs on similar topics benefit from past mistakes.
  • Sentinel Watchdog: Background monitor detects NaN/Inf in results, checks paper-evidence consistency, scores citation relevance, and guards against fabrication throughout the run.
  • 4-Layer Citation Verification: arXiv lookup → CrossRef lookup → DataCite lookup → LLM relevance scoring. A citation must pass all layers to survive.
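The "30-day time decay" on lessons can be modelled as exponential down-weighting. This is an illustrative sketch, not the project's actual formula:

```python
from datetime import datetime, timedelta

def lesson_weight(recorded_at: datetime, now: datetime, half_life_days: float = 30.0) -> float:
    """Exponential decay: a lesson loses half its weight every half_life_days.

    Illustrative only; the pipeline's real decay curve may differ.
    """
    age_days = (now - recorded_at).total_seconds() / 86400.0
    return 0.5 ** (age_days / half_life_days)

now = datetime(2026, 3, 15)
print(lesson_weight(now - timedelta(days=30), now))  # 0.5
```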