aios-long-running-harness
AIOS Long-Running Harness
Overview
Use this harness to keep long tasks stable under UI drift, model variability, and partial failures. It maps Anthropic's long-running-agent harness ideas into this repository's file-based workflow.
Harness Loop
- Preflight: lock objective, stop conditions, budgets, and required artifacts.
- Plan: split into idempotent steps with explicit success/failure evidence.
- Execute: run one step at a time with tool output capture.
- Verify: assert completion from page evidence, not assumptions.
- Checkpoint: persist current state, artifacts, and next action.
- Recover: on failure, classify and retry only with a changed hypothesis.
- Complete: run final verification and write summary doc.
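The loop above can be sketched roughly as follows. This is a minimal illustration, not the harness's actual implementation: `Step`, `RunState`, `checkpoint`, and `run_harness` are hypothetical names, and the checkpoint is assumed to be a single JSON file.

```python
import json
from dataclasses import dataclass, field, asdict
from pathlib import Path
from typing import Callable, List


@dataclass
class Step:
    name: str
    run: Callable[[int], str]      # receives the attempt index, returns captured tool output
    verify: Callable[[str], bool]  # asserts completion from evidence in the output


@dataclass
class RunState:
    objective: str
    done: List[str] = field(default_factory=list)
    next_action: str = ""


def checkpoint(state: RunState, path: Path) -> None:
    # Persist current state so an interrupted run can resume at next_action.
    path.write_text(json.dumps(asdict(state), indent=2))


def run_harness(objective: str, steps: List[Step], checkpoint_path: Path,
                max_retries: int = 2) -> RunState:
    state = RunState(objective=objective)
    for step in steps:
        state.next_action = step.name
        checkpoint(state, checkpoint_path)
        for attempt in range(max_retries + 1):
            # The attempt index lets a step change its hypothesis on retry
            # instead of repeating the same action unchanged.
            output = step.run(attempt)
            if step.verify(output):
                state.done.append(step.name)
                break
        else:
            # Retry budget exhausted: stop with next_action still pointing here.
            return state
    state.next_action = ""
    checkpoint(state, checkpoint_path)
    return state
```

A step that fails every attempt leaves `next_action` set to its own name, so a later run can read the checkpoint file and resume from that step.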
Pairing with Superpowers Skills
- Plan step should be produced through superpowers:writing-plans (or superpowers:brainstorming first when scope is unclear).
- For 2+ independent domains, use superpowers:dispatching-parallel-agents; for coupled domains, run sequentially.
- If the runtime has no true subagent tool, emulate dispatch with explicit per-domain task queues and only parallelize safe independent reads/checks.
- Always finish with superpowers:verification-before-completion before claiming run success.
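One way to emulate dispatch without a subagent tool is sketched below. It is an illustration under assumptions: `Task`, `read_only`, and `dispatch` are hypothetical names, and a task is treated as safe to parallelize only when it is explicitly marked read-only and does not depend on any pending write.

```python
from collections import defaultdict, deque
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Task:
    domain: str
    name: str
    run: Callable[[], str]
    read_only: bool = False  # assumption: only read-only tasks are safe to parallelize


def dispatch(tasks: List[Task]) -> Dict[Tuple[str, str], str]:
    # Build one explicit queue per domain, preserving submission order.
    queues = defaultdict(deque)
    for task in tasks:
        queues[task.domain].append(task)

    results: Dict[Tuple[str, str], str] = {}

    # Phase 1: run independent reads/checks in parallel.
    reads = [t for t in tasks if t.read_only]
    with ThreadPoolExecutor(max_workers=4) as pool:
        for task, output in zip(reads, pool.map(lambda t: t.run(), reads)):
            results[(task.domain, task.name)] = output

    # Phase 2: run everything else sequentially, domain by domain.
    for queue in queues.values():
        while queue:
            task = queue.popleft()
            if not task.read_only:
                results[(task.domain, task.name)] = task.run()
    return results
```

Running reads first keeps the parallel phase free of side effects; any read that depends on a write in the same domain should not be marked `read_only` in this sketch.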
Orchestrate Live Notes
- aios orchestrate --execute live currently supports AIOS_SUBAGENT_CLIENT=codex-cli only.
- Codex CLI v0.114+ structured exec outputs (--output-schema, --output-last-message, stdin) are required for handoff parsing; schema fallback to raw stdout is rejected.
- Transient upstream_error/server_error failures are retried with exponential backoff via AIOS_SUBAGENT_UPSTREAM_MAX_ATTEMPTS and AIOS_SUBAGENT_UPSTREAM_BACKOFF_MS.
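The retry behavior for transient failures might look like the sketch below. The two environment variable names come from the notes above; everything else is an assumption for illustration, including the default values, the doubling schedule, the reading of the backoff value as a base delay in milliseconds, and the hypothetical `SubagentError` and `call_with_backoff` names.

```python
import os
import time

TRANSIENT = {"upstream_error", "server_error"}


class SubagentError(Exception):
    """Hypothetical error type carrying a failure kind string."""

    def __init__(self, kind: str):
        super().__init__(kind)
        self.kind = kind


def call_with_backoff(call, env=os.environ, sleep=time.sleep):
    # Defaults here are illustrative guesses, not documented values.
    max_attempts = int(env.get("AIOS_SUBAGENT_UPSTREAM_MAX_ATTEMPTS", "3"))
    base_ms = int(env.get("AIOS_SUBAGENT_UPSTREAM_BACKOFF_MS", "500"))
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except SubagentError as err:
            # Non-transient failures and the final attempt are re-raised as-is.
            if err.kind not in TRANSIENT or attempt == max_attempts:
                raise
            # Assumed schedule: the delay doubles after each failed attempt.
            sleep(base_ms * 2 ** (attempt - 1) / 1000)
```

Passing `env` and `sleep` as parameters keeps the sketch testable without real delays or process-level environment changes.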
Required Controls
- Time budget per step and per run.
- Retry budget per failure class.
- Human-gate checkpoints for login, payment, or policy-sensitive actions.
- Structured logs for every major transition.
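A few of these controls can be sketched as small helpers. The names `TimeBudget`, `needs_human_gate`, and `log_transition` are hypothetical, as are the JSON-lines log format and the action labels in `HUMAN_GATED`.

```python
import json
import time

# Assumed labels for actions that must pause for a human checkpoint.
HUMAN_GATED = {"login", "payment", "policy-sensitive"}


class TimeBudget:
    """Tracks elapsed time against a limit; use one per step and one per run."""

    def __init__(self, limit_s: float, clock=time.monotonic):
        self.limit_s = limit_s
        self.clock = clock
        self.started = clock()

    def remaining(self) -> float:
        return self.limit_s - (self.clock() - self.started)

    def exceeded(self) -> bool:
        return self.remaining() <= 0


def needs_human_gate(action_kind: str) -> bool:
    return action_kind in HUMAN_GATED


def log_transition(event: str, **fields) -> str:
    # Emit one JSON object per major transition so logs stay machine-readable.
    line = json.dumps({"ts": time.time(), "event": event, **fields}, sort_keys=True)
    print(line)
    return line
```

A caller would typically check `exceeded()` before each step, call `needs_human_gate` before executing a sensitive action, and call `log_transition` at every loop phase change.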
Failure Classes
- Selector/UI drift.
- Authentication/session loss.
- Policy rejection/content moderation.
- Network/transient failures.
- Tool/runtime errors.
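The failure classes pair naturally with a per-class retry budget. The sketch below is illustrative only: the budget numbers are made up, and `FailureClass`, `RETRY_BUDGET`, and `RetryTracker` are hypothetical names.

```python
from enum import Enum


class FailureClass(Enum):
    UI_DRIFT = "selector/UI drift"
    AUTH = "authentication/session loss"
    POLICY = "policy rejection/content moderation"
    NETWORK = "network/transient failure"
    TOOL = "tool/runtime error"


# Illustrative budgets; tune per project.
RETRY_BUDGET = {
    FailureClass.UI_DRIFT: 2,
    FailureClass.AUTH: 1,
    FailureClass.POLICY: 0,
    FailureClass.NETWORK: 3,
    FailureClass.TOOL: 1,
}


class RetryTracker:
    def __init__(self, budget=None):
        self.budget = dict(RETRY_BUDGET if budget is None else budget)
        self.used = {}
        self.tried = set()

    def may_retry(self, failure: FailureClass, hypothesis: str) -> bool:
        # Allow a retry only when the hypothesis is new for this failure class
        # and the class still has budget left.
        key = (failure, hypothesis)
        if key in self.tried:
            return False
        if self.used.get(failure, 0) >= self.budget.get(failure, 0):
            return False
        self.tried.add(key)
        self.used[failure] = self.used.get(failure, 0) + 1
        return True
```

Recording the hypothesis alongside the class makes "retry only with a changed hypothesis" checkable rather than a convention.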
Completion Gate
Declare success only when all are true:
- Target action succeeded.
- Expected artifact exists.
- Evidence snapshot/log exists.
- Updated runbook reflects newly observed drift.
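The gate can be expressed as a single check that reports which conditions are unmet. This is a sketch under assumptions: `CompletionEvidence` and `completion_gate` are hypothetical names, and artifacts and evidence are assumed to be files on disk.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Tuple


@dataclass
class CompletionEvidence:
    action_succeeded: bool
    artifact_path: Path
    evidence_path: Path
    runbook_updated: bool


def completion_gate(ev: CompletionEvidence) -> Tuple[bool, List[str]]:
    checks = {
        "target action succeeded": ev.action_succeeded,
        "expected artifact exists": ev.artifact_path.is_file(),
        "evidence snapshot/log exists": ev.evidence_path.is_file(),
        "runbook reflects observed drift": ev.runbook_updated,
    }
    failed = [name for name, ok in checks.items() if not ok]
    # Success requires every check to pass; the list names any that did not.
    return (not failed, failed)
```

Returning the failed check names gives the summary doc a concrete reason whenever the run cannot be declared successful.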
Resources
- references/harness-checklist.md: operational checklist template.
- references/anthropic-mapping.md: principle-to-project mapping.