metaclaw-evolving-agent
MetaClaw Evolving Agent
Skill by ara.so — Daily 2026 Skills collection
MetaClaw is an OpenAI-compatible proxy agent that intercepts conversations, injects learned skills, and continuously improves itself through real-world interactions. It supports three modes: lightweight skills injection, immediate RL training, and a smart "madmax" scheduler that defers weight updates to idle/sleep windows.
Installation
# Minimal — skills injection only, no GPU required
pip install -e .
# Full RL training support (torch, transformers, tinker)
pip install -e ".[rl]"
# Skill evolution via LLM summarization
pip install -e ".[evolve]"
# Google Calendar scheduler for madmax mode
pip install -e ".[scheduler]"
# Recommended: everything
pip install -e ".[rl,evolve,scheduler]"
Quick Start
# One-time interactive config wizard
metaclaw setup
# Start in default madmax mode (skills + RL + smart scheduler)
metaclaw start
# Skills only — no GPU, no Tinker needed
metaclaw start --mode skills_only
# RL mode — trains immediately when batch is full
metaclaw start --mode rl
After metaclaw start, a local OpenAI-compatible proxy is running. Point your client (OpenClaw or any OpenAI SDK consumer) at http://localhost:<port> instead of the upstream LLM endpoint.
Configuration
metaclaw setup writes a config file (default: ~/.metaclaw/config.yaml). You can also edit it directly:
# ~/.metaclaw/config.yaml
proxy:
  host: 0.0.0.0
  port: 8080

llm:
  provider: kimi            # kimi | qwen | claude | minimax | openai | gemini
  base_url: https://api.moonshot.cn/v1
  model: moonshot-v1-8k
  # api_key loaded from env: METACLAW_LLM_API_KEY

skills:
  enabled: true
  max_injected: 5           # max skills injected per turn
  summarize_after_session: true

rl:
  enabled: true
  backend: auto             # auto | tinker | mint
  batch_size: 32
  algorithm: grpo
  opd_teacher: false        # optional teacher distillation

scheduler:                  # madmax mode only
  enabled: true
  sleep_hours: [22, 7]      # local 22:00–07:00
  idle_timeout_minutes: 15
  google_calendar: false    # set true + configure OAuth for meeting detection

logging:
  level: info
  log_dir: ~/.metaclaw/logs
Environment Variables
export METACLAW_LLM_API_KEY="your-llm-api-key"
export METACLAW_TINKER_API_KEY="your-tinker-api-key" # rl mode
export METACLAW_MINT_API_KEY="your-mint-api-key" # if backend=mint
export GOOGLE_CALENDAR_CREDENTIALS_PATH="path/to/creds.json" # scheduler
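Missing keys are easiest to debug when they fail fast at startup. A minimal sketch of that pattern — `resolve_api_key` is an illustrative helper, not a metaclaw function:

```python
import os

def resolve_api_key(env_var: str = "METACLAW_LLM_API_KEY") -> str:
    """Read an API key from the environment, failing fast with a clear error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it before running `metaclaw start`."
        )
    return key
```

Checking all required variables up front beats discovering a missing key on the first proxied request.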
Operating Modes
| Mode | Command | GPU Required | Description |
|---|---|---|---|
| skills_only | `metaclaw start --mode skills_only` | No | Proxy + skills injection + auto-summarization |
| rl | `metaclaw start --mode rl` | Via API | Skills + GRPO training when batch fills |
| madmax | `metaclaw start` | Via API | Skills + RL + scheduler (trains only during idle/sleep/meetings) |
Python API
Programmatic startup
import asyncio
from metaclaw import MetaClawAgent, AgentConfig, Mode
async def main():
    config = AgentConfig.from_yaml("~/.metaclaw/config.yaml")
    agent = MetaClawAgent(config, mode=Mode.MADMAX)
    await agent.start()

asyncio.run(main())
Manual skill injection
from metaclaw.skills import SkillStore, SkillInjector
store = SkillStore(path="~/.metaclaw/skills")
# Add a skill manually
store.add(
    name="code-review-checklist",
    content="Always check for: 1) error handling, 2) type hints, 3) docstrings.",
    tags=["code", "review"],
)

# Retrieve top-k relevant skills for a query
injector = SkillInjector(store)
relevant = injector.retrieve(query="review my Python function", top_k=3)
for skill in relevant:
    print(skill.name, skill.score)
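Conceptually, retrieval scores every stored skill against the query and keeps the top-k. A self-contained keyword-overlap sketch of that idea (the real SkillInjector uses vector search over the SkillStore; `Skill` and `retrieve` here are illustrative stand-ins, not the metaclaw API):

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    content: str
    tags: list[str]

def retrieve(skills: list[Skill], query: str, top_k: int = 3) -> list[Skill]:
    """Score each skill by word overlap with the query and return the top_k."""
    query_words = set(query.lower().split())

    def score(skill: Skill) -> int:
        text = f"{skill.name} {skill.content} {' '.join(skill.tags)}".lower()
        return len(query_words & set(text.replace("-", " ").split()))

    return sorted(skills, key=score, reverse=True)[:top_k]
```

The vector-search version replaces the overlap score with embedding similarity, but the ranking-and-truncation shape is the same.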
Intercepting and recording conversations
from metaclaw.proxy import ConversationInterceptor
from metaclaw.memory import ExperienceBuffer
buffer = ExperienceBuffer(max_size=1000)

interceptor = ConversationInterceptor(
    upstream_url="https://api.moonshot.cn/v1",
    on_complete=buffer.record,  # called after each turn with (messages, response)
)

# buffer.record signature:
async def on_complete(messages: list[dict], response: dict) -> None:
    ...
Triggering RL training manually
from metaclaw.training import RLTrainer, TrainingConfig
trainer = RLTrainer(
    config=TrainingConfig(
        backend="tinker",  # or "mint"
        algorithm="grpo",
        batch_size=32,
        lora_rank=16,
    )
)

# Collect a batch from the experience buffer and train
async def run_training(buffer):
    batch = buffer.sample(n=32, split="support")  # support/query separation
    result = await trainer.train(batch)
    print(f"Training complete. Loss: {result.loss:.4f}, Steps: {result.steps}")
Reward modeling
from metaclaw.rewards import RewardModel
reward_model = RewardModel(provider="llm")  # uses the configured LLM for scoring

async def score_turn(prompt: str, response: str) -> float:
    score = await reward_model.score(prompt=prompt, response=response)
    return score  # float in [-1.0, 1.0]
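If your scoring rubric uses a different scale (say 1–10), a linear remap keeps rewards inside the expected [-1.0, 1.0] range. `normalize_reward` is an illustrative helper, not part of metaclaw.rewards:

```python
def normalize_reward(raw: float, lo: float = 1.0, hi: float = 10.0) -> float:
    """Map a rubric score in [lo, hi] linearly onto [-1.0, 1.0], clamping outliers."""
    raw = max(lo, min(hi, raw))
    return 2.0 * (raw - lo) / (hi - lo) - 1.0
```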
Skills Lifecycle
Conversation turn
│
▼
SkillInjector.retrieve() ← vector search over SkillStore
│ injects top-k skills into system prompt
▼
LLM responds
│
▼
ExperienceBuffer.record() ← stores (context, response, metadata)
│
▼ (end of session)
SkillSummarizer.run() ← LLM extracts reusable patterns
│
▼
SkillStore.upsert() ← new/updated skills persisted to disk
Integration: OpenAI SDK as Client
Point any OpenAI SDK client at the MetaClaw proxy:
from openai import OpenAI
# MetaClaw proxy is running on localhost:8080
client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-used-but-required-by-sdk",
)

response = client.chat.completions.create(
    model="moonshot-v1-8k",  # passed through to upstream
    messages=[
        {"role": "user", "content": "Review my pull request strategy."}
    ],
)
print(response.choices[0].message.content)
Skills are injected transparently — the client code does not change.
Scheduler (MadMax Mode)
The scheduler ensures RL weight updates never interrupt active use:
from metaclaw.scheduler import MadMaxScheduler, SchedulerConfig
scheduler = MadMaxScheduler(
    config=SchedulerConfig(
        sleep_hours=(22, 7),      # train between 22:00–07:00 local time
        idle_timeout_minutes=15,  # train after 15 min of no conversations
        google_calendar=True,     # also train during calendar meetings
        credentials_path="creds.json",
    )
)

# Check if it's safe to train right now
if await scheduler.is_training_window():
    await trainer.train(batch)
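The window logic can be sketched in plain Python. Note the sleep window crosses midnight with the default (22, 7), so the hour check has to wrap; these helpers are illustrative, not the MadMaxScheduler internals:

```python
from datetime import datetime, timedelta

def in_sleep_window(now: datetime, sleep_hours: tuple = (22, 7)) -> bool:
    """True if `now` falls in the sleep window; handles ranges that cross midnight."""
    start, end = sleep_hours
    if start <= end:
        return start <= now.hour < end
    return now.hour >= start or now.hour < end  # e.g. 22:00–07:00 wraps midnight

def is_training_window(now: datetime, last_activity: datetime,
                       sleep_hours: tuple = (22, 7),
                       idle_timeout_minutes: int = 15) -> bool:
    """Safe to train if we are inside the sleep window or the proxy has been idle."""
    idle = now - last_activity >= timedelta(minutes=idle_timeout_minutes)
    return in_sleep_window(now, sleep_hours) or idle
```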
Google Calendar Setup
# 1. Enable Google Calendar API in Google Cloud Console
# 2. Download OAuth2 credentials as creds.json
# 3. Set path in config or env
export GOOGLE_CALENDAR_CREDENTIALS_PATH="/path/to/creds.json"
# 4. First run will open browser for OAuth consent
metaclaw start
Support/Query Set Separation
MetaClaw splits recorded experience into disjoint support and query sets, so the examples used to compute the reward signal never overlap with the examples used for the gradient update:
from metaclaw.memory import ExperienceBuffer
buffer = ExperienceBuffer(
    max_size=2000,
    support_ratio=0.5,  # 50% support, 50% query
)

# During training:
support_batch = buffer.sample(n=16, split="support")  # used to compute reward signal
query_batch = buffer.sample(n=16, split="query")      # used for gradient update

await trainer.train_meta(support=support_batch, query=query_batch)
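The disjointness guarantee is the important part. A minimal sketch of one way to produce such a split — `split_support_query` is illustrative, not part of metaclaw.memory:

```python
import random

def split_support_query(experiences: list, support_ratio: float = 0.5,
                        seed: int = 0) -> tuple:
    """Shuffle once, then cut into disjoint support and query sets so no
    example contributes to both the reward signal and the gradient update."""
    rng = random.Random(seed)
    shuffled = experiences[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * support_ratio)
    return shuffled[:cut], shuffled[cut:]
```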
RL Backends
Tinker (default)
rl:
  backend: tinker
  tinker_project: my-metaclaw-project
  lora_rank: 16
  learning_rate: 1e-4
MinT
# Install MinT compatibility layer separately
pip install metaclaw-mint
rl:
  backend: mint
  mint_endpoint: https://your-mint-endpoint
Auto-detection
rl:
  backend: auto  # tries tinker first, falls back to mint, errors if neither is available
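The auto-detection order can be sketched with importlib's spec lookup; the module names "tinker" and "mint" are assumptions about what the backend packages are called, and `detect_backend` is illustrative rather than MetaClaw's actual resolver:

```python
import importlib.util

def detect_backend(preference: str = "auto") -> str:
    """Resolve the RL backend: honor an explicit choice, otherwise try
    tinker first, fall back to mint, and error if neither is installed."""
    if preference != "auto":
        return preference
    for candidate in ("tinker", "mint"):
        if importlib.util.find_spec(candidate) is not None:
            return candidate
    raise RuntimeError("No training backend available: install tinker or mint.")
```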
Troubleshooting
Proxy not reachable after metaclaw start
- Check port conflicts: lsof -i :8080
- Change proxy.port in config and restart
rl mode: "No training backend available"
- Ensure pip install -e ".[rl]" completed successfully
- Verify METACLAW_TINKER_API_KEY or METACLAW_MINT_API_KEY is set
- Try rl.backend: tinker explicitly instead of auto
Skills not persisting between sessions
- Confirm skills.summarize_after_session: true in config
- Check write permissions on ~/.metaclaw/skills/
- Run metaclaw skills list to inspect stored skills
Madmax mode never trains
- Verify scheduler.sleep_hours covers your timezone's night
- Lower scheduler.idle_timeout_minutes for testing (e.g., 1)
- Check scheduler logs: ~/.metaclaw/logs/scheduler.log
Google Calendar integration fails
- Re-run the OAuth flow: delete ~/.metaclaw/token.json and restart
- Ensure the Calendar API is enabled in your Google Cloud project
OPD teacher distillation errors
- Only supported with rl.backend: tinker
- Requires a separate teacher model endpoint in config:

rl:
  opd_teacher: true
  teacher_base_url: https://api.openai.com/v1
  teacher_model: gpt-4o
CLI Reference
metaclaw setup # interactive config wizard
metaclaw start # start in madmax mode
metaclaw start --mode skills_only
metaclaw start --mode rl
metaclaw start --config path/to/config.yaml
metaclaw skills list # show all stored skills
metaclaw skills delete <name> # remove a skill
metaclaw skills export skills.json
metaclaw status # show proxy, scheduler, training status
metaclaw logs # tail all logs
metaclaw logs --component scheduler