autoresearch

"The researcher's job shifts from writing Python to writing Markdown." — Andrej Karpathy

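Autoresearch is an autonomous ML experimentation framework. An AI agent iteratively modifies train.py, runs fixed 5-minute GPU experiments, evaluates each run with a single metric (val_bpb, validation bits per byte, where lower is better), and commits a change only if it improves that metric. This is git ratcheting: the committed state can only move forward. The result: you wake up to 100+ logged experiments and a committed model whose val_bpb never regressed along the way.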
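The ratchet rule described above can be sketched in a few lines. This is a minimal illustration, not autoresearch's actual code: the Ratchet class, the experiment labels, and the val_bpb numbers are all hypothetical, and the sketch assumes that a lower val_bpb is better. In a real loop, "keep" would correspond to committing the modified train.py and "discard" to reverting it; the git commands themselves are left out here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Ratchet:
    """Tracks the best val_bpb seen so far (hypothetical helper, lower is better)."""

    best_bpb: float = float("inf")
    history: List[Tuple[str, float, str]] = field(default_factory=list)

    def consider(self, label: str, val_bpb: float) -> bool:
        """Record one experiment and return True if it should be kept."""
        improved = val_bpb < self.best_bpb
        if improved:
            # Only a strict improvement moves the ratchet forward.
            self.best_bpb = val_bpb
        self.history.append((label, val_bpb, "keep" if improved else "discard"))
        return improved


# Made-up results standing in for three 5-minute experiments.
ratchet = Ratchet()
for label, bpb in [("baseline", 1.20), ("wider-mlp", 1.25), ("lr-warmup", 1.18)]:
    ratchet.consider(label, bpb)

for label, bpb, status in ratchet.history:
    print(f"{label}\t{bpb:.2f}\t{status}")
```

Because a result is kept only when it strictly beats the best value so far, the sequence of kept val_bpb values can only decrease, which is what makes the committed history monotonic.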

When to use this skill

Setting up autoresearch on a GPU machine for the first time
Writing or refining program.md research directives for the agent
Launching an overnight autonomous experiment loop
Interpreting results.tsv to understand what the agent found
Configuring the system for constrained hardware (limited VRAM)
Understanding the ratcheting mechanism and git workflow
Porting to Apple Silicon (MLX) or Windows RTX

Core Architecture