# monitor-experiment (SKILL.md)

## Monitor Experiment Results

Monitor: $ARGUMENTS

## Workflow

### Step 1: Check What's Running

```bash
ssh <server> "screen -ls"
```
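The session names in the `screen -ls` output feed the loop in the next step. A minimal extraction sketch, assuming the standard `screen -ls` line format (`<tab><pid>.<name><tab>(Detached)`); the sample session names are hypothetical:

```shell
# Extract session names from `screen -ls` output.
# Session lines start with a tab and look like "12345.train_baseline  (Detached)".
parse_sessions() {
  awk '/^\t/ { sub(/^[0-9]+\./, "", $1); print $1 }'
}

# Simulated `screen -ls` output (hypothetical sessions):
printf '\t12345.train_baseline\t(Detached)\n\t12346.train_method_a\t(Detached)\n' | parse_sessions
```

In practice, pipe the real output through the same filter: `ssh <server> "screen -ls" | parse_sessions`.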

### Step 2: Collect Output from Each Screen

For each screen session, capture the last ~50 lines of its scrollback:

```bash
ssh <server> "screen -S <name> -X hardcopy /tmp/screen_<name>.txt && tail -50 /tmp/screen_<name>.txt"
```

If `hardcopy` fails, fall back to log files or `tee` output from the run command.
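The hardcopy step can be scripted per session. A sketch that only builds the command strings so you can inspect them before running (the server name `myserver` and the session names are hypothetical placeholders):

```shell
# Build the hardcopy-and-tail command for one session.
hardcopy_cmd() {
  printf 'screen -S %s -X hardcopy /tmp/screen_%s.txt && tail -50 /tmp/screen_%s.txt' "$1" "$1" "$1"
}

# Loop over session names collected in Step 1 (hypothetical names):
for s in train_baseline train_method_a; do
  echo "ssh myserver \"$(hardcopy_cmd "$s")\""
done
```

Printing first, then executing once the commands look right, avoids clobbering a live session with a typo'd name.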

### Step 3: Check for JSON Result Files

```bash
ssh <server> "ls -lt <results_dir>/*.json 2>/dev/null | head -20"
```

If JSON results exist, fetch and parse them:

```bash
ssh <server> "cat <results_dir>/<latest>.json"
```
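Once a result file is fetched, a key metric can be pulled out for the table in Step 4. A sketch using `python3` for robust JSON parsing; the `metric` key and the sample payload are assumptions about your logging schema, not a known format:

```shell
# Hypothetical result payload; in practice this is the `cat ...json` output.
json='{"metric": 0.91, "step": 10000}'

# Extract one field with python3 (avoids fragile grep/sed on JSON).
metric=$(printf '%s' "$json" | python3 -c 'import json, sys; print(json.load(sys.stdin)["metric"])')
echo "$metric"   # 0.91
```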

### Step 4: Summarize Results

Present results in a comparison table:

| Experiment | Metric | Delta vs Baseline | Status |
|------------|--------|-------------------|--------|
| Baseline   | X.XX   | —                 | done   |
| Method A   | X.XX   | +Y.Y              | done   |
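The delta column can be computed with `awk` rather than by eye (the metric values below are hypothetical):

```shell
baseline=0.85
method_a=0.91

# Signed delta, two decimals, matching the table's "+Y.Y" style.
delta=$(awk -v b="$baseline" -v m="$method_a" 'BEGIN { printf "%+.2f", m - b }')
echo "$delta"   # +0.06
```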

### Step 5: Interpret

- Compare against known baselines
- Flag unexpected results (negative delta, NaN, divergence)
- Suggest next steps based on findings
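The flagging rules above can be sketched as a small check on the formatted metric or delta string (the function name and cases are illustrative, not part of the skill):

```shell
# Flag a metric string that warrants a closer look before reporting.
check_metric() {
  case "$1" in
    nan|NaN|-nan|inf|-inf) echo "FLAG: non-finite value" ;;
    -*)                    echo "FLAG: negative delta" ;;
    *)                     echo "ok" ;;
  esac
}

check_metric "+0.06"   # ok
check_metric "nan"     # FLAG: non-finite value
check_metric "-0.12"   # FLAG: negative delta
```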

### Step 6: Feishu Notification (if configured)

After results are collected, check `~/.claude/feishu.json`:

- Send an `experiment_done` notification with the results summary table and delta vs baseline
- If the config is absent or its mode is `"off"`, skip entirely (no-op)
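The skip logic can be sketched as follows; the `{"mode": ...}` schema is an assumption about what `~/.claude/feishu.json` contains, not a documented format:

```shell
# Return "notify" or "skip" based on a Feishu config file.
# Assumed schema: {"mode": "on" | "off", ...}; a missing file means skip.
should_notify() {
  cfg="$1"
  [ -f "$cfg" ] || { echo "skip"; return; }
  mode=$(python3 -c 'import json, sys; print(json.load(open(sys.argv[1])).get("mode", "off"))' "$cfg")
  if [ "$mode" = "off" ]; then echo "skip"; else echo "notify"; fi
}

should_notify "$HOME/.claude/feishu.json"
```

Gating on this result keeps the notification a true no-op when unconfigured, rather than failing loudly mid-summary.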

## Key Rules

- Always show raw numbers before interpretation
- Compare against the correct baseline (same config)
- Note if experiments are still running (check progress bars, iteration counts)
- If results look wrong, check training logs for errors before concluding