AI Research Skills
86 skills powering autonomous AI research in 2026
Keywords: ai-research-skills · autoresearch · ml-experiments
Source: Orchestra-Research/AI-Research-SKILLs | Fork: akillness/AI-Research-SKILLs
When to use this skill
- Conducting autonomous AI/ML research from idea to paper
- Fine-tuning LLMs with Axolotl, LLaMA-Factory, PEFT, or Unsloth
- Running post-training (RLHF, GRPO, DPO, SimPO, verl)
- Distributed training with Megatron-Core, DeepSpeed, FSDP, or Accelerate
- Optimizing inference with vLLM, TensorRT-LLM, llama.cpp, or SGLang
- Building RAG pipelines (Chroma, FAISS, Pinecone, Qdrant)
- Mechanistic interpretability with TransformerLens, SAELens, pyvene
- Writing ML papers (LaTeX templates for NeurIPS, ICML, ICLR, ACL)
- Running ML benchmarks and evaluations (lm-eval-harness, BigCode, NeMo Evaluator)
- Multimodal tasks: CLIP, Whisper, LLaVA, Stable Diffusion, SAM
Do not use this skill when
- You need a simple code fix unrelated to ML/AI research
- You want general software engineering workflows (use `omg`, `bmad`, or `ralph` instead)
Overview: 86 Skills × 22 Categories
| Category | Count | Key Skills |
|---|---|---|
| Autoresearch | 1 | Autonomous research orchestration (central layer) |
| Model Architecture | 5 | LitGPT, Mamba, RWKV, NanoGPT, TorchTitan |
| Fine-Tuning | 4 | Axolotl, LLaMA-Factory, PEFT, Unsloth |
| Post-Training | 8 | TRL, GRPO, OpenRLHF, SimPO, verl, slime, miles, torchforge |
| Distributed Training | 6 | DeepSpeed, FSDP, Accelerate, Megatron-Core, Lightning, Ray Train |
| Optimization | 6 | Flash Attention, bitsandbytes, GPTQ, AWQ, HQQ, GGUF |
| Inference & Serving | 4 | vLLM, TensorRT-LLM, llama.cpp, SGLang |
| RAG | 5 | Chroma, FAISS, Pinecone, Qdrant, Sentence Transformers |
| Multimodal | 7 | CLIP, Whisper, LLaVA, BLIP-2, SAM, Stable Diffusion, AudioCraft |
| Mech Interp | 4 | TransformerLens, SAELens, pyvene, nnsight |
| Safety & Alignment | 4 | Constitutional AI, LlamaGuard, NeMo Guardrails, Prompt Guard |
| Evaluation | 3 | lm-eval-harness, BigCode, NeMo Evaluator |
| MLOps | 3 | W&B, MLflow, TensorBoard |
| Agents | 4 | LangChain, LlamaIndex, CrewAI, AutoGPT |
| Prompt Engineering | 4 | DSPy, Instructor, Guidance, Outlines |
| Observability | 2 | LangSmith, Phoenix |
| Infrastructure | 3 | Modal, Lambda Labs, SkyPilot |
| Data Processing | 2 | NeMo Curator, Ray Data |
| Tokenization | 2 | HuggingFace Tokenizers, SentencePiece |
| Emerging Techniques | 6 | MoE, Model Merging, Long Context, Speculative Decoding, Distillation, Pruning |
| ML Paper Writing | 1 | LaTeX templates (NeurIPS, ICML, ICLR, ACL, AAAI, COLM) |
| Ideation | 2 | Research Brainstorming, Creative Thinking |
Instructions
Step 1: Install the library
```bash
# Interactive installer (auto-detects Claude Code, Codex, Gemini, Cursor)
npx @orchestra-research/ai-research-skills

# Install all 86 skills non-interactively
npx @orchestra-research/ai-research-skills install --all

# Or use the install script from this skill
bash scripts/install.sh
```
After installation, restart your agent session so skills are loaded.
Step 2: Start autonomous research (autoresearch)
For full autonomous research (idea → experiments → paper):
Read the autoresearch SKILL.md and follow its instructions to begin.
The autoresearch skill orchestrates:
- Literature survey and ideation
- Experiment design and execution (routes to domain skills)
- Results synthesis and benchmarking
- Paper writing with LaTeX templates
Step 3: Use domain skills directly
For targeted work on a specific framework, call the skill by keyword:
```
# Fine-tuning
fine-tune with axolotl    # → activates axolotl skill

# Post-training / RLHF
run grpo training         # → activates GRPO skill

# Inference optimization
optimize with vllm        # → activates vLLM skill

# Distributed training
setup deepspeed           # → activates DeepSpeed skill
```
Step 4: Claude Code marketplace (alternative install)
```
# Add marketplace
/plugin marketplace add orchestra-research/AI-research-SKILLs

# Install by category
/plugin install fine-tuning@ai-research-skills
/plugin install post-training@ai-research-skills
/plugin install inference-serving@ai-research-skills
/plugin install distributed-training@ai-research-skills
/plugin install optimization@ai-research-skills
```
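To confirm what was installed, Claude Code's interactive plugin manager can be opened directly (assuming a recent Claude Code build):

```
# Open the plugin manager to review marketplaces and installed plugins
/plugin
```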
Step 5: Update or manage skills
```bash
# Update all installed skills
npx @orchestra-research/ai-research-skills update

# List installed skills
npx @orchestra-research/ai-research-skills list
```
Autonomous Research Loop
The autoresearch skill uses a two-loop architecture:
```
Outer Loop (Synthesis):
  ↓ Research question → Literature survey → Hypothesis
  ↓ Route to domain skills
  Inner Loop (Optimization):
    ↓ Run experiment → Collect results → Analyze → Adjust
    ↑ Ratchet improvements via git
  ↓ Synthesize findings → Write paper
```
This enables fully autonomous overnight GPU experiments (Karpathy-style ratchet via git).
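The ratchet itself is just disciplined use of git: commit only when a tracked metric improves, so the repository monotonically accumulates wins across unattended runs. A minimal sketch, assuming a hypothetical `eval.py` that prints a single scalar score:

```bash
# Hypothetical ratchet step: eval.py and best_metric.txt are assumptions,
# not files shipped by this library.
best=$(cat best_metric.txt 2>/dev/null || echo 0)
new=$(python eval.py)   # prints one float, e.g. 0.873
if (( $(echo "$new > $best" | bc -l) )); then
  echo "$new" > best_metric.txt
  git add -A
  git commit -m "ratchet: metric improved $best -> $new"
fi
```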
Examples
Example 1: Start autonomous research
```
Activate ai-research-skills.
Read the autoresearch SKILL.md and begin research on:
"Does LoRA training stability correlate with layer-wise norm heterogeneity?"
```
The agent will: survey literature → design experiments → fine-tune with LoRA → run benchmarks → analyze results → write paper.
Example 2: Fine-tune Llama 3 with LoRA
```
Use the fine-tuning skill (axolotl) to fine-tune Llama-3.1-8B with LoRA
on my dataset at ./data/train.jsonl with 4-bit quantization.
```
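Under the hood, the skill typically drives Axolotl's CLI. A minimal sketch, assuming a QLoRA config at `configs/llama31-qlora.yml` (a hypothetical path; 4-bit loading and the dataset path live inside that YAML):

```bash
# Launch Axolotl training from a YAML config (config path is an assumption;
# the YAML would set adapter: qlora, load_in_4bit: true, and the dataset path).
accelerate launch -m axolotl.cli.train configs/llama31-qlora.yml
```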
Example 3: Optimize inference with vLLM
```
Set up vLLM for serving Mistral-7B with tensor parallelism on 2 GPUs,
with continuous batching and PagedAttention. Target: <50ms TTFT.
```
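For reference, the serving command usually reduces to something like the sketch below (continuous batching and PagedAttention are vLLM defaults; the model revision and context length are illustrative choices):

```bash
# OpenAI-compatible server with 2-way tensor parallelism.
vllm serve mistralai/Mistral-7B-Instruct-v0.2 \
  --tensor-parallel-size 2 \
  --max-model-len 8192
```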
Example 4: Run GRPO post-training
```
Implement GRPO training for my reward model using TRL.
Dataset: ./data/preferences.json. Base: Llama-3.1-8B-Instruct.
```
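A sketch of the resulting launch, assuming the skill produces a `train_grpo.py` built on TRL's `GRPOTrainer` (the script name and all flags are hypothetical):

```bash
# Multi-GPU launch of an assumed TRL GRPO training script; flags are illustrative.
accelerate launch train_grpo.py \
  --model_name_or_path meta-llama/Llama-3.1-8B-Instruct \
  --dataset_path ./data/preferences.json \
  --output_dir ./outputs/grpo
```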
Architecture: Skill Structure
Each of the 86 skills follows this structure:
```
skill-name/
├── SKILL.md           # Expert guidance (200–600 lines)
├── references/        # Official docs, API refs, GitHub issues, release notes
│   ├── README.md
│   ├── api.md
│   ├── tutorials.md
│   ├── issues.md      # Real GitHub issues with solutions
│   └── releases.md
├── scripts/           # Helper scripts (optional)
└── templates/         # Code templates (optional)
```
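To browse an installed skill directly, for example vLLM's (the install path depends on your agent; `~/.claude/skills` is an assumption for Claude Code):

```bash
# Read a skill's expert guidance and its curated issue notes.
cat ~/.claude/skills/vllm/SKILL.md
cat ~/.claude/skills/vllm/references/issues.md
```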
Best practices
- Start with autoresearch — it routes to the right domain skills automatically
- Restart after install — skills load at session start; restart if newly installed skills aren't recognized
- Use the two-loop architecture — let the inner loop optimize, outer loop synthesize
- Reference real GitHub issues — each skill's `references/issues.md` contains battle-tested solutions
- Combine with oh-my-gods orchestration — use `ralph` for persistence, `bmad` for structured phases, and `survey` for landscape scanning before research
Integration with oh-my-gods
| oh-my-gods skill | Integration |
|---|---|
| `survey` | Pre-research landscape scan before launching autoresearch |
| `ralph` | Persistent loop — keep autoresearch running until paper complete |
| `bmad` | Structured phases for the research lifecycle |
| `autoresearch` | Native skill within this library (enhanced) |