openclaw-rl-training
Installation
SKILL.md
OpenClaw-RL Training
Skill by ara.so — Daily 2026 Skills collection.
OpenClaw-RL is a fully asynchronous reinforcement learning framework that converts live multi-turn conversations into training signals for personalized AI agents. It wraps a self-hosted model as an OpenAI-compatible API via OpenClaw, intercepts conversations, and continuously optimizes the policy in the background without interrupting usage. It also supports scalable RL for terminal, GUI, SWE, and tool-call agents.
Architecture Overview
Four independent async loops that never block each other:
- Agent Serving — OpenClaw-compatible API serving rollouts
- Rollout Collection — Captures multi-turn conversations as training trajectories
- PRM/Judge Evaluation — Scores turns using next-state feedback (majority voting optional)
- Policy Training — GRPO/OPD/Combine training via slime or Tinker
Installation
git clone https://github.com/Gen-Verse/OpenClaw-RL
cd OpenClaw-RL