os-guide
Dependencies
This skill requires Python 3.8+ and standard library only. No external packages needed.
To install this skill's dependencies:
pip-compile ./requirements.in
pip install -r ./requirements.txt
See ../../requirements.txt for the dependency lockfile (currently empty — standard library only).
Agentic OS Guide
The core insight: LLMs are stateless functions. CLAUDE.md is the only file loaded by
default into every conversation. The Agentic OS pattern turns this constraint into a
full operating system metaphor.
| OS Concept | Agent Equivalent |
|---|---|
| Kernel | CLAUDE.md hierarchy (global -> org -> project -> local) |
| RAM | context/ folder (soul, user prefs, memory) |
| Disk | context/memory/YYYY-MM-DD.md dated session logs |
| Stdlib | skills/ procedural knowledge bundles |
| Processes | .claude/agents/ sub-agents with isolated context |
| Shell | .claude/commands/ slash commands |
| Cron | /loop + heartbeat.md scheduled background tasks |
| Boot | START_HERE.md + MEMORY.md bootstrap on session start |
| Autoresearch Loop | os-eval-runner + improvement-ledger.md |
Skill Categories (Mental Model)
| Category | Skill | One-liner |
|---|---|---|
| Orchestration | os-improvement-loop |
Multi-agent concurrent loop: ORCHESTRATOR + PEER + INNER |
| Evaluation | os-eval-runner |
Autoresearch eval engine — scores and gates SKILL.md iterations |
| Evaluation | os-eval-lab-setup |
Bootstraps isolated lab repos for eval runs |
| Evaluation | os-eval-backport |
Reviews lab results, applies approved changes to master |
| Mutation | os-skill-improvement |
RED-GREEN-REFACTOR routing accuracy improvement |
| Memory | os-memory-manager |
Session log writing, L2→L3 promotion, deduplication |
| Reporting | os-improvement-report |
Progress charts from results.tsv + improvement ledger |
| Bootstrap | os-init |
Deploys kernel.py, agents.json, flywheel files to new project |
| Utility | os-clean-locks |
Clears stale .locks/ directories after agent crash |
Agents (not skills): os-learning-loop (trigger/diagnostic), os-health-check (liveness), agentic-os-setup (bootstrap interview)
Execution Flow
Execute these phases in order. Do not skip phases. This skill uses Progressive Disclosure. Load only what you need:
- For CLAUDE.md scope rules and precedence -> read
references/claude-md-hierarchy.md - For context/ folder patterns (soul.md, user.md, memory.md) -> read
references/context-folder-patterns.md - For /loop and heartbeat.md scheduling -> read
references/loop-scheduler.md - For sub-agents, hooks, auto-memory -> read
references/sub-agents-and-hooks.md - For memory hygiene (write/promote/archive rules) -> read
references/memory-hygiene.md - For the full canonical directory tree -> read
references/canonical-file-structure.md - For the self-improving OS flywheel and 3-file autoresearch framework -> read
references/research/optimizer-engine-patterns.mdandreferences/research/karpathy-autoresearch-3-file-eval.md
Quick Orientation
Anthropic-Native vs Community-Layered
What Anthropic ships natively:
- CLAUDE.md layered discovery (global, org, project, local, subdirectory scopes - most specific wins)
- Auto-memory (
MEMORY.md) - Claude writes this itself with build commands, style prefs, architecture decisions /loopcommand for cron-style scheduling (up to 50 tasks per session, auto-expire after 3 days)- Agent Skills:
SKILL.md-based procedural knowledge bundles - Sub-agents in
.claude/agents/with isolated tool contexts
What the community layered on top:
context/soul.md,context/user.md,context/memory/{date}.mdfolder conventionsSTART_HERE.mdbootstrap prompt pattern- Lessons-learned -> update-skills-after-session loop
heartbeat.mdscheduled task definition files
Design Principle
Every line in
CLAUDE.mdcompetes for attention with actual work. Keep it under 300 lines. Focus on what Claude would get wrong without it. Use@import context/soul.mdto load identity on demand, not always.
Discovery: What Does the User Need?
Ask the user which aspect they need help with:
- Setting up a new Agentic OS from scratch -> read
references/canonical-file-structure.md, walk them through the setup - Understanding a specific layer (context/, hooks, /loop) -> load the matching reference file
- Memory management (what to record, promote, archive) -> invoke
os-memory-managerskill - Continuous Improvement (retrospectives, skill updates) -> invoke
os-learning-loopagent - Troubleshooting (context not loading, skills not triggering) -> read
references/claude-md-hierarchy.mdfor scope precedence
The Improvement Flywheel (Mandatory Close Protocol)
Every significant work session — especially eval runs, skill edits, backports, and agent loop completions — must close through this two-phase protocol. Do not consider a session complete without running both phases.
Session Lifecycle Invariant: The OUTER loop (
os-improvement-loop) owns session lifecycle. INNER loops (os-eval-runner,os-skill-improvement) never close a session. A session is incomplete until Phase 6 is executed.os-learning-loop(agent) is the trigger/diagnostic layer that feeds both flywheels — it detects friction and identifies targets;os-improvement-loop(skill) is the execution protocol the agents follow once a target is identified.
Work → Backport/Ship → Phase 6: Capture → Phase 7: Improve
Phase 6: Capture Learnings (os-memory-manager)
⚠️ Cross-Plugin Boundary —
os-memory-manageris part ofagent-agentic-os. Install:npx skills add agent-agentic-os
After any backport, eval run, or skill change:
Invoke os-memory-manager to write a dated session log and promote non-obvious
findings to long-term memory. Apply the non-obvious filter:
- CAPTURE: snags, footguns, scoring behaviors, architectural decisions, ADAPT patterns
- SKIP: routine score improvements, changes self-evident from the diff
What to capture:
- What was accomplished and what changed
- Any errors, workarounds, or unexpected behaviors encountered
- Key decisions and why (especially ACCEPT/ADAPT/REJECT rationale from backports)
- Open items and follow-up rounds
Where it writes:
context/memory/YYYY-MM-DD.md— dated session log (git-tracked, not temp/)context/memory.md— promoted long-term facts with dedup IDs- Agent's native
MEMORY.mdsystem — cross-session feedback entries - Survey save path rule: lab/eval sessions write surveys to
temp/retrospectives/; loop sessions (os-improvement-loop) write tocontext/memory/retrospectives/. post_run_metrics.py only scanscontext/memory/retrospectives/— lab surveys are not counted in loop metrics.
Phase 7: Continuous Skill Improvement (os-skill-improvement)
⚠️ Cross-Plugin Boundary —
os-skill-improvementis part ofagent-agentic-os. Install:npx skills add agent-agentic-os
After memory is captured, check whether any skill's routing or trigger accuracy was found to be weak during the session:
If any skill mis-triggered, failed to trigger, or scored below 0.90 on eval:
→ Invoke os-skill-improvement on that skill
→ Run RED-GREEN-REFACTOR to harden the description and trigger phrases
→ Run os-eval-runner to verify the score improvement before committing
Apply the improvement filter — only invoke if:
- A skill failed to route correctly during the session
- An eval score was below threshold (< 0.90 quality_score, or accuracy/F1 below the established baseline in results.tsv)
- A trigger description was found to be too broad or too narrow
- A new
<example>block or keyword was identified that would close a routing gap
Skip if:
- All skills routed correctly and eval scores held
- The session was purely additive (new files, no existing skill changes)
The Full Flywheel
1. Work / Eval Run / Backport
2. os-eval-backport → ACCEPT/ADAPT/REJECT each change, apply to master
3. os-memory-manager → Session log + promote non-obvious findings (Phase 6)
4. os-skill-improvement → Harden any skill whose routing was found weak (Phase 7)
5. Commit + push → Close the loop in git history
This flywheel is what makes the OS self-improving. Skipping Phase 6 or 7 means knowledge evaporates at session end and skill quality drifts.
Next Actions
- For memory write/promote/archive decisions -> invoke
os-memory-manager - To orchestrate an end-to-end setup of a new environment -> run
agentic-os-setup - To perform a retrospective and improve the OS -> run
os-learning-loop - To add a scheduled heartbeat -> read
references/loop-scheduler.md
Mandatory Close: Friction Signal (Every Invocation)
After answering the user's question, emit a friction event for anything that was unclear, missing from the references, or required more turns than expected to explain:
# Only emit if friction was encountered — do not emit if explanation was clean
python3 context/kernel.py emit_event --agent os-guide \
--type friction --action encountered \
--summary "step:[which-reference] cause:[what-was-unclear]"
Then answer: What one addition to the guide references would have made this explanation
clearer or faster? Record the answer as a comment in the next session log or flag it
to os-learning-loop if the same gap appears across multiple sessions.
More from richfrem/agent-plugins-skills
markdown-to-msword-converter
Converts Markdown files to one MS Word document per file using plugin-local scripts. V2 includes L5 Delegated Constraint Verification for strict binary artifact linting.
52excel-to-csv
>
32zip-bundling
Create technical ZIP bundles of code, design, and documentation for external review or context sharing. Use when you need to package multiple project files into a portable `.zip` archive instead of a single Markdown file.
29learning-loop
(Industry standard: Loop Agent / Single Agent) Primary Use Case: Self-contained research, content generation, and exploration where no inner delegation is required. Self-directed research and knowledge capture loop. Use when: starting a session (Orientation), performing research (Synthesis), or closing a session (Seal, Persist, Retrospective). Ensures knowledge survives across isolated agent sessions.
26ollama-launch
Start and verify the local Ollama LLM server. Use when Ollama is needed for RLM distillation, seal snapshots, embeddings, or any local LLM inference — and it's not already running. Checks if Ollama is running, starts it if not, and verifies the health endpoint.
26spec-kitty-checklist
A standard Spec-Kitty workflow routine.
26