Agentflow
Agentflow is a supervised local runtime for long-running coding work. Humans author a graph with intent and outcome boundaries; Codex CLI or Cursor CLI executes substantial nodes; the supervisor records bounded interventions; terminal runs produce a delivery package.
Must Know
- Graphs are execution contracts: intent, authority, context, artifacts, validation, supervision, and delivery.
- Every executable node (`agent`, `exec`, `check`, `checkpoint`) needs an `intent` block with a meaningful `goal` and non-empty `acceptance_criteria`; `intent.constraints` defaults to `[]`. `intent.acceptance_criteria` are runtime-enforced by outcome verification for passing `agent` attempts and used by the supervisor to interpret deterministic nodes.
- Context is prompt design. Prefer exact, high-signal material over broad dumps; validate real token cost with `agentflow validate --graph <path>`.
- Artifacts are durable handoffs. Downstream nodes should consume named artifacts, not raw logs or assumed workspace state.
- Inside `repeat` loops, prior-iteration artifacts need explicit selectors such as `iteration: "previous"` or `iteration: "latest_failed"`; Agentflow also injects `repeat_history` so retrying nodes can see what already happened.
- Checks prove hard facts or gate control flow. Do not add AI checks just to repeat outcome verification.
- Agents are capable terminal users. Inventory useful local CLIs and let nodes use native commands when that is enough.
- Do not wrap mature CLIs or protocols just to make them "agent tools"; wrappers should add auth isolation, stable I/O, reuse, or auditability.
- Do not over-prescribe implementation mechanics. Give agents clear intent, authority, context, artifacts, and validation; let them decide exact files and approach unless the user specified them.
- In GitHub repos, consider rollout strategy before authoring: prefer small reviewable PRs, `establish_base -> parallel_prs`, or `cascading_prs` over one large PR unless the user asks otherwise.
- A graph is not complete until plugin resolution passes when needed and `agentflow validate --graph <path>` reports the graph run-ready. Use `--show-compiled` or `--output-dir` only when inspection artifacts are useful. `supervision.profile` is required and must point at a real profile so supervisor verification and recovery have explicit harness/model settings.
- Supervisor recovery is graph-causal: a failed node may be a symptom of an upstream node, artifact, context, workspace, validation strategy, or environment problem. Do not author supervisor authority pauses as planned workflow nodes.
- `repos`, `profiles`, sandbox, and tools define authority. Constraints should be prohibition-style boundaries that start with `Do not`; put positive success requirements in `acceptance_criteria`.
- Harness-native Codex/Cursor config is isolated by default. Declare native MCP/plugins/config only in `profiles.*.harness_config`; use `isolation: "inherit_user"` only when accepting non-reproducible local harness behavior.
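As a concrete anchor for the contract above, here is a minimal sketch of a single-node graph. The field names (`intent`, `goal`, `acceptance_criteria`, `constraints`, `supervision.profile`, `artifacts`) come from this document; the node ids, values, and overall JSON layout are illustrative assumptions, not the authoritative schema — see references/graph-contract.md for exact fields.

```json
{
  "supervision": { "profile": "default-supervisor" },
  "nodes": [
    {
      "id": "fix_flaky_test",
      "type": "agent",
      "intent": {
        "goal": "Make the sync test suite pass deterministically",
        "acceptance_criteria": [
          "The sync test suite passes on repeated consecutive runs",
          "Root cause is documented in the published artifact"
        ],
        "constraints": ["Do not modify production retry logic"]
      },
      "artifacts": ["flake-root-cause.md"]
    }
  ]
}
```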
Route By Task
- Author or review a graph, choose primitive shape, or pressure-test graph quality: read references/graph-authoring.md.
- Need reusable authored workflow compositions that are not managed patterns: read references/common-patterns.md.
- Working in a GitHub repo or planning PR rollout: read references/github-rollout.md.
- Need exact fields: read references/graph-contract.md.
- Choose compiler-supported managed pattern nodes: read references/managed-workflows.md.
- Need CLI validation or launch behavior: read references/cli-and-validation.md.
- Debug failures, resume, or inspect delivery: read references/run-debugging.md.
- Need implementation mechanics: read `docs/technical/` in the repository.
- Need failure semantics: read references/failure-and-validation.md.
- Need examples: read references/examples.md.
- Need workflow eval suites, scenarios, criteria, environment simulation, trajectory checks, scorecards, benchmarks, or prompt-pack comparisons: use `agentflow-evals`.
- Need reusable plugin workflows or tools: use `agentflow-plugins`.
Default Workflow
- Capture graph intent: goal, acceptance criteria, constraints, and out-of-scope boundaries.
- In GitHub repos, choose rollout shape before node shape: one focused PR, `establish_base -> parallel_prs`, or `cascading_prs`.
- Choose the graph shape: primitive flow, common authored pattern, or managed pattern.
- Define authority: repos, profiles, workspace backend, sandbox, tools, credentials, and high-impact limits.
- Inventory relevant local CLIs and decide what stays as ordinary terminal use versus plugin-bundled tools.
- Define node contracts: each executable node gets `intent.goal`, `intent.acceptance_criteria`, `intent.constraints`, context when needed, and named artifacts when it must publish durable evidence.
- Add checks, a required supervisor profile, and a bounded supervision budget that match risk; terminal delivery is automatic.
- Resolve plugins when declared, then run `agentflow validate --graph <path>` before considering the graph complete.
- Use `agentflow validate --graph <path> --show-compiled` or `--output-dir <path>` when reviewers or downstream agents need the compiled contract and validation package.
- After a run, inspect `delivery/reviewer-guide.md`, `delivery/manifest.json`, `delivery/run-map.md`, and declared artifacts before raw runtime files.
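The node-contract steps above can be pre-checked before handing the graph to `agentflow validate`. The following is a hypothetical lint script, not part of Agentflow: the field names (`intent.goal`, `intent.acceptance_criteria`, `intent.constraints`, `supervision.profile`) follow this document, while the graph dictionary shape is an assumption for illustration.

```python
# Hypothetical pre-validate lint for the node contract described above.
# Not part of Agentflow; graph shape is assumed, field names follow the docs.

EXECUTABLE_TYPES = {"agent", "exec", "check", "checkpoint"}

def lint_node(node: dict) -> list[str]:
    """Return contract violations for a single node."""
    problems = []
    if node.get("type") not in EXECUTABLE_TYPES:
        return problems  # only executable nodes require an intent block
    node_id = node.get("id", "?")
    intent = node.get("intent", {})
    if not intent.get("goal", "").strip():
        problems.append(f"{node_id}: intent.goal is missing or empty")
    if not intent.get("acceptance_criteria"):
        problems.append(f"{node_id}: acceptance_criteria must be non-empty")
    # Constraints default to [] and should be prohibition-style boundaries.
    for c in intent.get("constraints", []):
        if not c.startswith("Do not"):
            problems.append(f"{node_id}: constraint should start with 'Do not': {c!r}")
    return problems

def lint_graph(graph: dict) -> list[str]:
    problems = [p for n in graph.get("nodes", []) for p in lint_node(n)]
    if not graph.get("supervision", {}).get("profile"):
        problems.append("supervision.profile is required")
    return problems
```

A clean graph returns an empty list; a node with an empty `intent` and a graph missing `supervision.profile` each produce explicit violations.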
Authoring Posture
- Treat the authored DAG as the human contract, not a prose plan.
- Use `context` for node material and `artifacts` for durable handoffs.
- Treat `repos` and `profiles` as operational authority; put scope boundaries and out-of-scope notes in graph or node `intent.constraints`, phrased as `Do not ...` constraints.
- Keep downstream references on named artifacts from public node ids.
- Treat `intent.acceptance_criteria` as a runtime contract: passing `agent` attempts are graded by the outcome verifier, and deterministic failures use the same contract for causal recovery. Vague criteria produce vague verification and weak recovery.
- Do not author boilerplate iteration guidance ("iterate until done", "investigate ambiguity", "stop only when blocked") in graph or node `intent.constraints`. The runtime injects a `## Working Loop` section into every standard agent prompt that already covers this, and outcome verification will reject early bailing.
- Use deterministic checks for hard facts. Reach for AI checks only when another node depends on the gate or when the deterministic command is genuinely unavailable; do not stack an AI `check` after every agent node to re-evaluate the same acceptance criteria.
- Treat checks, outcome verification, supervisor `semantic_evaluation`, managed pattern evaluation, and `agentflow eval` as separate lanes. Use `agentflow-evals` for the offline eval lane.
- Make high-impact limits explicit in graph or node `intent.constraints` before granting credential-backed, external, or mutating tools, and phrase them as `Do not ...` boundaries.
- Do not widen scope through supervisor behavior; use repeat-scoped checkpoints or graph edits for planned human decisions, and reserve `pause_for_human` for authority boundaries the runtime must not infer.
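The phrasing split above can be sketched as an `intent` fragment. The split itself (prohibitions in `intent.constraints`, positive requirements in `intent.acceptance_criteria`) is from this document; the concrete values and JSON layout are illustrative assumptions.

```json
{
  "intent": {
    "acceptance_criteria": [
      "All new endpoints return typed error payloads",
      "Integration tests cover the auth failure path"
    ],
    "constraints": [
      "Do not change the public API schema",
      "Do not call external services outside the sandbox"
    ]
  }
}
```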
Runtime CLI Posture
- Humans use `agentflow`; agents inside running nodes use `af`. `af` is injected into agent nodes on `PATH` and reads `$AGENTFLOW_RUNTIME_METADATA`.
- Use `af --help` and `af <command> --help` for exact runtime CLI arguments, defaults, output shape, examples, and safety notes.
- Prefer `af status`, `af tools list`, and `af context show` when debugging what a node actually received.
- During supervisor investigation, prefer `af diagnose ... --json` for stable run evidence and `af learn <failure-kind>` for focused recovery playbooks.
- Prefer `af artifact write` for declared handoffs instead of ad hoc output files.
- Use `af log --type` for worker evidence and helper coordination notes, including `af log --type decision --decision ... --rationale ... --evidence ...` for major scope-affecting decisions, but keep durable conclusions in artifacts.
- Treat `af spawn` helpers as supervised sessions with their own artifacts, not persistent coworkers. Use `--purpose investigation` for read-only causal analysis and `--purpose repair` only when the selected node authority allows scoped edits.