Fix
Intent
Make risky or unclear code safe with the smallest sound, validated change, then keep listening to the final self-review until no actionable self-review change remains.
When the validated changeset survives the post-self-review rerun, $fix stops at that boundary; broader architecture, product, or roadmap analysis belongs to another skill.
Double Diamond fit
fix spans the full Double Diamond, but keeps divergence mostly internal to stay autonomous:
- Discover: reproduce/characterize + collect counterexamples.
- Define: state the contract and invariants (before/after).
- Develop: (internal) consider 2-3 plausible fix strategies; pick the smallest sound one per default policy.
- Deliver: implement + prove with a real validation signal.
If a fix requires a product-sensitive choice (cannot be derived or characterized safely), stop and ask (or invoke $creative-problem-solver when multiple viable strategies exist).
Skill composition (required)
- `$invariant-ace`: default invariant/protocol engine for `fix`; run it before invariant-affecting edits.
- `$complexity-mitigator`: default complexity analysis engine; use it for risk-driven complexity judgments before reshaping code.
- `$refine`: default path when the task is to improve `fix` itself (or other skills); apply the `$refine` workflow and `quick_validate`.
- Delegation order when both are needed: `$invariant-ace` first, then `$complexity-mitigator`.
Delegation triggers (deterministic):
- Run `$invariant-ace` when the slice touches stateful boundaries (parse/construct/API/DB/lock/txn/retry ordering), ownership/lifetime, or any "should never happen" claim.
- Run `$complexity-mitigator` when auditability is at risk (for example: branch soup, deep nesting, cross-file reasoning, or unclear ownership/control flow).
- Run `$refine` when the requested target is a skill artifact (for example `SKILL.md`, `agents/openai.yaml`, skill scripts/references/assets).
- If both invariant and complexity triggers fire, run `$invariant-ace` first, then `$complexity-mitigator`.
Inputs
- User request text.
- Repo state (code + tests + scripts).
- Validation signal (failing test/repro/log) OR a proof hook you create.
Outputs (chat)
- Emit the exact sections in `Deliverable format (chat)`.
- Use section headings verbatim (for example, `**Findings (severity order)**`, not `Findings`).
- During execution, emit one-line pass progress updates at pass start and pass end using: `Pass <n>/<total_planned>: <name> — <start|done>; edits=<yes|no|n/a>; signal=<cmd|n/a>; result=<ok|fail|n/a>`.
- In `Validation`, include a machine-checkable JSON object with keys: `baseline_cmd`, `baseline_result`, `proof_hook`, `final_cmd`, `final_result`.
- Keep self-review transcript shapes aligned with `references/self_review_loop_examples.md` when editing this contract.
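The two machine-readable shapes in this list can be illustrated with a minimal Python sketch. The helper names `pass_progress` and `validation_json` are hypothetical and not part of the skill; only the field names, separators, and key order are taken from the contract text.

```python
import json

SEP = "\u2014"  # em dash separator used by the progress-line contract
REQUIRED_KEYS = ("baseline_cmd", "baseline_result", "proof_hook", "final_cmd", "final_result")


def pass_progress(n, total_planned, name, phase, edits="n/a", signal="n/a", result="n/a"):
    """Format one pass progress update (hypothetical helper)."""
    if phase not in ("start", "done"):
        raise ValueError(f"phase must be 'start' or 'done', got {phase!r}")
    return (
        f"Pass {n}/{total_planned}: {name} {SEP} {phase}; "
        f"edits={edits}; signal={signal}; result={result}"
    )


def validation_json(baseline_cmd, baseline_result, proof_hook, final_cmd, final_result):
    """Build the single-line validation object with the five required keys."""
    record = dict(zip(
        REQUIRED_KEYS,
        (baseline_cmd, baseline_result, proof_hook, final_cmd, final_result),
    ))
    # Compact separators keep the object on one line with no extra spaces.
    return json.dumps(record, separators=(",", ":"))
```

A caller would emit `pass_progress(1, 3, "Safety", "start")` when pass 1 begins and place the `validation_json(...)` string under `Validation` in the deliverable.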
Embedded mode (when $fix is invoked inside another skill)
- You may emit a compact Fix Record instead of the full deliverable.
- If another skill requires a primary artifact (for example patch-only diff), append Fix Record immediately after that artifact in the same assistant message.
- Do not mention “Using $fix” without emitting either the full deliverable or a Fix Record.
Hard rules (MUST / MUST NOT)
- MUST default to review + implement (unless review-only is requested).
- MUST triage and present findings in severity order: security > crash > corruption > logic.
- MUST NOT claim done without a passing validation signal.
- MUST NOT edit when no local signal/proof hook can be found or created under `Validation signal selection`; fail fast and report blocker.
- MUST NOT do product/feature work.
- MUST NOT do intentional semantic changes without clarifying, except correctness tightening.
- MUST resolve trade-offs using `Default policy (non-interactive)`.
- MUST emit pass progress updates while running the multi-pass loop.
- MUST include a final `Pass trace` section in the deliverable/Fix Record with executed pass count and per-pass outcomes.
- MUST include machine-checkable validation evidence keys in `Validation`: `baseline_cmd`, `baseline_result`, `proof_hook`, `final_cmd`, `final_result`.
- MUST use the exact heading names from `Deliverable format (chat)` / `Fix Record`; do not alias or shorten heading labels.
- MUST produce a complete finding record for every acted-on issue:
- counterexample
- invariant_before
- invariant_after
- fix
- proof
- proof_strength
- compatibility_impact
- MUST follow the delegation contract in `Skill composition (required)` (including order: `$invariant-ace` -> `$complexity-mitigator`).
- MUST route skill-self edits (for example `codex/skills/fix`) through `$refine` and run `quick_validate`.
- MUST NOT put fixable items in `Residual risks / open questions`; if it is fixable under the autonomy gate + guardrails, treat it as a finding and fix it.
- MUST include a final `Self-review loop trace` section in the deliverable/Fix Record.
- MUST run the final agent-directed self-review loop only after at least one change is applied and validation is passing.
- MUST run the self-review loop against the final validated changeset only.
- MUST invalidate and rerun the self-review loop if any non-self-review edit occurs after a self-review round.
- MUST ask internally exactly: `If you could change one thing about this changeset what would you change?`
- MUST treat the final agent-directed self-review phase as current-worktree scoped once the first validated changeset exists; do not reject a self-review suggestion solely because it broadens the diff.
- MUST treat a self-review answer as actionable whenever it identifies a concrete compatible/provable improvement on the current validated changeset, even if the improvement is ergonomic, structural, or API-shaping rather than a baseline bug fix.
- MUST answer that question internally, apply at most one new actionable self-review change per self-round, re-run validation, and repeat until a self-round yields no new actionable self-review change or only blocked changes.
- MUST NOT reject a self-review suggestion solely because the baseline is already green, the concern sounds architectural, or the change would reshape a public/API seam; if it can stay backward-compatible and be revalidated locally, implement it.
- MUST, when a self-review critique is broader than the smallest bug repair, apply the narrowest compatible/provable slice that materially addresses the critique before considering broader follow-up skills.
- MUST rerun the non-self-review `$fix` passes once after the self-review loop reaches `no_new_actionable_changes` or `blocked`.
- MUST NOT use `scope_guardrail` as the reason to reject a self-review suggestion once the self-review phase has started.
- MUST NOT report a self-review answer that was already applied before the final self-review round; record `finding=none` only when the current final validated changeset yields no concrete compatible/provable self-review change.
- MUST NOT emit `If you could change one thing about this changeset what would you change?` as a user-facing terminal line during normal successful completion.
- MUST stop `$fix` once no new actionable self-review change remains and the post-self-review rerun is clean; do not continue under `$fix` into broader architecture, product, roadmap, or conceptual analysis.
- MUST, if the user asks for broader or bolder analysis after a clean or closed `$fix` pass, close the `$fix` deliverable first and recommend the next skill explicitly (`$grill-me`, `$parse`, `$plan`, or `$creative-problem-solver`) instead of continuing under `$fix`.
- When paired with `$tk` in wave execution, MUST treat `$fix` as the final mutating pass before artifactization:
  - `commit_first`: hand off immediately to `$commit` after passing validation.
  - `patch_first`: hand off immediately to `$patch` after passing validation.
Default policy (non-interactive)
Use these defaults to maximize autonomy (avoid asking).
Priority order (highest first):
- correctness + data safety + security
- compatibility for PROVEN_USED behavior
- performance
Definitions:
- PROVEN_USED = behavior with evidence (see checklist below).
- NON_PROVEN_USED = everything else.
PROVEN_USED evidence checklist (deterministic)
Goal: classify behavior as PROVEN_USED vs NON_PROVEN_USED without asking.
Algorithm:
- Enumerate affected behavior tokens:
- API symbols (functions/types/methods).
- CLI flags/commands.
- config keys / env vars.
- file/wire formats (field names, JSON keys).
- error codes/messages consumed by callers.
- Collect evidence in this order (stop at first match):
- User repro/signal references the token.
- Tests assert on the token/behavior.
- Docs/README/examples/CHANGELOG/config examples describe or demonstrate the token/behavior.
- Repo callsites use the token (non-test, non-doc).
- IF any evidence exists, THEN mark PROVEN_USED; ELSE mark NON_PROVEN_USED.
Evidence scan procedure (repo-local):
- Tests: search `tests/`, `test/`, `__tests__/`, `spec/`, and files matching `*test*` / `*.spec.*`.
- Docs/examples: search `README*`, `docs/`, `examples/`, `CHANGELOG*`, `config.example*`, and `*.md`.
- Callsites: search remaining source files.
- CI/config surfaces: also scan `.github/workflows/` for env vars, flags, and script entrypoints.
Tooling hint: prefer `rg -n "<token>"` with `--glob` filters.
Deterministic scan command template (run in this order):
- `rg -n "<token>" tests test __tests__ spec --glob '*test*' --glob '*.spec.*'`
- `rg -n "<token>" README* docs examples CHANGELOG* config.example* --glob '*.md'`
- `rg -n "<token>" .github/workflows`
- `rg -n "<token>" .`
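The "stop at first match" classification can be sketched in Python as follows. This is an illustration only: the source names in `EVIDENCE_ORDER`, the `classify_token` helper, and the in-memory `corpus` are hypothetical stand-ins for the real repo scans described above.

```python
from typing import Callable, Optional

# Evidence sources in the order the checklist prescribes (names are illustrative).
EVIDENCE_ORDER = ("user_signal", "tests", "docs", "callsites")


def classify_token(
    token: str, sources: dict[str, Callable[[str], bool]]
) -> tuple[str, Optional[str]]:
    """Return (classification, first evidence source that matched)."""
    for name in EVIDENCE_ORDER:
        probe = sources.get(name)
        if probe is not None and probe(token):
            return "PROVEN_USED", name  # stop at first match
    return "NON_PROVEN_USED", None


# Hypothetical in-memory corpus standing in for the rg scans.
corpus = {
    "tests": "assert cfg.path == 'x'",
    "docs": "Set config.path to the data directory.",
}
sources = {
    name: (lambda tok, text=text: tok in text) for name, text in corpus.items()
}
```

With this corpus, `config.path` is classified as PROVEN_USED on the strength of the docs entry, while a token that appears nowhere falls through to NON_PROVEN_USED.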
Externally-used surface checklist (deterministic)
Treat a surface as external if ANY is true:
- Token appears in docs/README/examples.
- Token is a CLI flag/command, config key, env var, or file format field.
- Token is exported/public API (examples: Rust `pub`, TS/JS `export`, Python in `__all__`, Go exported identifier).
If external use is plausible but uncertain:
- Prefer an additive compatibility path (wrapper/alias/adapter) if small.
- Ask only if a breaking change is unavoidable.
Compatibility rules:
- MUST preserve PROVEN_USED behavior unless it is unsafe (crash/corruption/security). If unsafe, tighten and return a clear error.
- For NON_PROVEN_USED behavior, treat it as undefined; tightening is allowed.
API/migration rules:
- IF a fix touches an externally-used surface (use `Externally-used surface checklist`), THEN prefer additive/backward-compatible changes (new option/new function/adapter) over breaking changes.
- IF a breaking change is unavoidable, THEN stop and ask.
Performance rules:
- MUST avoid obvious asymptotic regressions in hot paths.
- IF performance impact is plausible but unmeasurable locally, THEN choose the smallest correctness-first fix and record the risk in `Residual risks / open questions` (do not ask).
Autonomy gate (conviction)
Proceed without asking only if ALL are true:
- SIGNAL: you have (or can create) a local repro/signal without product ambiguity.
- CONTRACT: you can derive contract from repo evidence OR characterize current behavior without product ambiguity.
- INVARIANT: you can state invariant_before and invariant_after.
- DIFF: the change is localized and reviewable for the chosen strategy.
- PROOF: at least one validation signal passes after changes.
If ANY gate fails:
- Apply Defaults + contract derivation.
- Ask only if still blocked.
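The all-or-nothing nature of the gate can be made explicit with a small Python sketch. The `AutonomyGate` class is a hypothetical illustration; only the five gate names come from the list above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AutonomyGate:
    """One boolean per gate; names mirror SIGNAL/CONTRACT/INVARIANT/DIFF/PROOF."""
    signal: bool
    contract: bool
    invariant: bool
    diff: bool
    proof: bool

    def failed(self) -> list[str]:
        # Report failed gates by name so the blocker can be stated precisely.
        return [f.name.upper() for f in fields(self) if not getattr(self, f.name)]

    def may_proceed(self) -> bool:
        # Proceed without asking only if ALL gates hold.
        return not self.failed()
```

If `failed()` is non-empty, the text above applies: fall back to defaults and contract derivation first, and ask only if the gate is still blocked.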
Correctness tightening (default)
Definition: tightening = rejecting invalid/ambiguous states earlier, removing silent failure paths, or making undefined behavior explicit.
Rule:
- IF tightening prevents crash/corruption/security issue, THEN apply it (even if PROVEN_USED), return a clear error, and record the compatibility impact.
- ELSE IF tightening affects only NON_PROVEN_USED inputs/states, THEN apply it without asking.
- ELSE (tightening might change PROVEN_USED behavior):
- prefer a backward-compatible shape (adapter/additive API), OR
- lock current behavior with a characterization test,
- ask only if compatibility cannot be preserved.
Allowed tightening moves (examples only):
- Parse/coerce: implicit coercion/partial parse -> explicit parse + error (JS implicit string->number; Rust `parse().ok()` fallback).
- Fallbacks: warn+fallback/default-without-signal -> explicit error OR explicit "unknown"/"invalid".
- Lossy conversion: truncation/clamp/wrap -> checked conversion + error (Rust `as`; C casts; Go int conversions).
- Partial updates: multi-step mutation -> atomic/transactional boundary OR explicit partial result.
- Ignored errors: ignore -> propagate with context; if intentionally ignored -> compensating signal (assert/test/metric/log).
- Sentinels: `-1`/`0`/`""`/`None` -> richer returns.
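To make the parse/coerce and fallback moves concrete, here is a minimal Python sketch of a before/after pair. The port-parsing scenario, the function names, and the `8080` default are invented for illustration and do not come from any real codebase.

```python
def parse_port_lenient(raw):
    """Before: a silent fallback hides invalid input behind a default."""
    try:
        return int(raw)
    except (TypeError, ValueError):
        return 8080  # default-without-signal: the caller never learns the input was bad


def parse_port_strict(raw):
    """After: explicit parse + error; rejects non-strings, partial parses, and bad ranges."""
    if not isinstance(raw, str) or not raw.isascii() or not raw.isdigit():
        raise ValueError(f"port must be a decimal string, got {raw!r}")
    port = int(raw)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range 1..65535: {port}")
    return port
```

Valid inputs such as `"443"` behave identically in both versions; only invalid or ambiguous inputs change from a silent default to a clear error, which is the shape of tightening this section describes.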
Defaults (use only when semantics-preserving for valid inputs)
Ownership / lifetimes
- MUST inventory in-slice resources: allocations, handles, sockets, locks, txns, temp files.
- For each resource, record: acquire_site, owner, release_action.
- MUST ensure every exit releases exactly once.
- MUST free/release ONLY via the proven owner/allocator/handle.
- Never assume a "free" is safe without proving allocator/ownership.
- Arena/bump allocators: per-object `free` is safe only if explicitly documented as permitted/no-op.
Validation signal selection
Algorithm:
- IF the user provided a command, use it.
- ELSE choose the cheapest local signal (no network) in this order:
- README/QUICKSTART
- `scripts/check` / `scripts/test`
- `Makefile` / `justfile` / `Taskfile.yml`
- ELSE create a proof hook in this order:
- focused regression/characterization test
- boundary assertion that fails loudly
- scoped + rate-limited diagnostic log tied to ONE invariant (never log secrets/PII)
No-signal fail-fast (deterministic):
- If no signal/proof hook is available after executing the selection algorithm once, stop before editing.
- Return one blocker with `blocked_by=no_repro_or_proof` and include the exact commands you attempted.
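The selection order and the fail-fast exit can be sketched as a single Python function. The dictionary keys (`readme`, `scripts`, `task_runner`) and hook labels are hypothetical names chosen for this sketch; the ordering follows the algorithm above.

```python
from typing import Optional, Sequence

# Cheapest-first local signal sources, then proof hooks, in the documented order.
LOCAL_SIGNAL_ORDER = ("readme", "scripts", "task_runner")
PROOF_HOOK_ORDER = ("regression_test", "boundary_assertion", "diagnostic_log")


def select_signal(
    user_cmd: Optional[str],
    local_signals: dict[str, str],
    feasible_hooks: Sequence[str],
) -> dict:
    """Pick a validation signal, or report the no-signal blocker."""
    if user_cmd:
        return {"kind": "user", "value": user_cmd}
    for source in LOCAL_SIGNAL_ORDER:
        if source in local_signals:
            return {"kind": "local", "value": local_signals[source]}
    for hook in PROOF_HOOK_ORDER:
        if hook in feasible_hooks:
            return {"kind": "proof_hook", "value": hook}
    # Nothing found or creatable: stop before editing.
    return {"kind": "blocked", "value": "blocked_by=no_repro_or_proof"}
```

A real run would also record the exact commands attempted alongside the blocker, as the fail-fast rule requires.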
PR/diff scope guardrail (default in review mode)
- Default slice = changed lines/paths (git diff/PR diff) + at most one boundary seam (parse/construct/API edge) required to make the fix sound.
- Do not fix pre-existing issues outside the slice unless:
- severity is security/crash/corruption AND
- the fix is localized and provable without widening the slice.
- Otherwise: record as `Residual risks / open questions`.
Scope widening trigger (deterministic):
- Widen beyond the diff only when ALL are true:
  - severity is `security`, `crash`, or `corruption`
  - you can show a concrete causal chain from a diff token to the out-of-slice location
  - the widening is limited to one adjacent seam (at most one additional file/module boundary)
  - one local validation signal can prove the widened fix
- If any check fails, keep scope fixed and record `blocked_by=scope_guardrail`.
- This guardrail applies to the core review passes.
- Once the final agent-directed self-review loop starts, scope expands to the current repo/worktree and `scope_guardrail` is not a valid reason to reject the self-review suggestion.
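The widening trigger reduces to a conjunction of four checks plus the self-review exception. A minimal Python sketch, with a hypothetical `widening_decision` helper and simplified inputs:

```python
WIDENING_SEVERITIES = {"security", "crash", "corruption"}


def widening_decision(severity, has_causal_chain, extra_seams, has_local_signal,
                      in_self_review=False):
    """Return 'widen' or the scope_guardrail blocker string."""
    if in_self_review:
        # Simplification: during self-review the guardrail is never a valid blocker.
        return "widen"
    all_true = (
        severity in WIDENING_SEVERITIES
        and has_causal_chain
        and extra_seams <= 1  # at most one adjacent seam
        and has_local_signal
    )
    return "widen" if all_true else "blocked_by=scope_guardrail"
```

For example, a `logic` issue outside the diff stays blocked during the core passes even when every other check holds.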
Generated / third-party code guardrail
- If a file appears generated or is under third-party/build output, do not edit it directly.
- Prefer to locate the source-of-truth + regeneration step; apply fixes there and regenerate as needed.
- Common no-edit zones (examples): `dist/`, `build/`, `vendor/`, `node_modules/`.
Proof discipline for passing baselines
When the baseline signal is ok:
- For every acted-on finding, prefer a proof hook that fails before the fix (focused regression/characterization test).
- If you cannot produce a failing proof hook, do not edit; attempt to create one first by:
- turning the counterexample into a focused regression/characterization test
- enforcing the invariant at a single boundary seam (parse/refine once) and testing the new error
- reducing effects to a pure helper and unit-testing it
- using an existing fuzzer/property test harness if present
- Only if you still cannot create a proof hook without product ambiguity, record the blocker as residual risk.
- Self-review exception: when the self-review finding is a concrete compatible simplification or auditability improvement on an already-green changeset, a fresh failing hook is preferred but not mandatory; the primary validation bundle may serve as the proof hook if it still proves the improved invariant after the change.
Residual risks / open questions policy (last resort)
Residual risks are a record of what you could not safely fix (not a to-do list).
Rules:
- Before emitting a residual item, attempt to convert it into an actionable finding:
- produce a concrete counterexample
- attach a proof hook (test/assert) that fails before the fix
- apply the smallest localized fix within guardrails
- re-run the validation signal
- Only emit residual items that are truly blocked by one of these blockers:
  - `product_ambiguity` (semantics cannot be derived/characterized safely)
  - `breaking_change` (no additive path; fix would be breaking)
  - `no_repro_or_proof` (cannot create a repro/proof hook locally)
  - `scope_guardrail` (outside diff slice and not severe enough to widen)
  - `generated_output` (generated/third-party output; need source-of-truth + regen)
  - `external_dependency` (needs network/creds/services/hardware)
  - `perf_unmeasurable` (impact plausible; no local measurement)
- For self-review-originated changes, do not emit `blocked_by=scope_guardrail`; if scope is the only blocker, widen and continue.
- Every residual bullet MUST include: a location or token, `blocked_by=<product_ambiguity|breaking_change|no_repro_or_proof|scope_guardrail|generated_output|external_dependency|perf_unmeasurable>`, and `next=<one action>`.
- If there are no residual items, output `- None`.
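A residual bullet is easy to validate mechanically. The Python sketch below uses hypothetical helpers (`residual_bullet`, `render_residuals`) to enforce the allowed blocker set and the self-review exception; the output shape follows the bullet format used elsewhere in this document.

```python
SEP = "\u2014"  # em dash separator used in residual bullets

ALLOWED_BLOCKERS = frozenset({
    "product_ambiguity", "breaking_change", "no_repro_or_proof", "scope_guardrail",
    "generated_output", "external_dependency", "perf_unmeasurable",
})


def residual_bullet(location, blocked_by, next_action, self_review_origin=False):
    """Format one residual item, rejecting blockers the policy does not allow."""
    if blocked_by not in ALLOWED_BLOCKERS:
        raise ValueError(f"unknown blocker: {blocked_by!r}")
    if self_review_origin and blocked_by == "scope_guardrail":
        raise ValueError("self-review changes must widen instead of citing scope_guardrail")
    return f"- {location} {SEP} blocked_by={blocked_by} {SEP} next={next_action}"


def render_residuals(bullets):
    """Join bullets, or emit the literal '- None' when there are no residual items."""
    return "\n".join(bullets) if bullets else "- None"
```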
Multi-pass loop (default)
Goal: reduce missed issues in PR/diff reviews without widening scope.
Run 3 core passes (mandatory). Run 2 additional delta passes only if pass 3 edits code.
Pass 1) Safety (highest severity)
- Scope: diff-driven slice + required boundary seams.
- Focus: security/crash/corruption hazards, unsafe tightening, missing error propagation.
- Change budget: smallest sound fix only.
Pass 2) Surface (compat + misuse)
- Scope: externally-used surfaces touched by the diff (exports/CLI/config/format/docs).
- Focus: additive compatibility, defuse top footguns, clearer errors.
- Change budget: additive/wrapper/adapter preferred; breaking change => stop and ask.
Pass 3) Audit (invariants + ownership + proof quality)
- Scope: final diff slice.
- Focus: invariants enforced at strongest cheap boundary; ownership release-on-all-paths; proof strength.
- Delegation: run `$invariant-ace` first for invariant framing, then `$complexity-mitigator` for complexity verdicts that affect auditability.
- Change budget: no refactors unless they directly reduce risk or improve auditability of invariants.
Early exit (stop after pass 3):
- If pass 3 applies no edits, stop (after running/confirming the primary validation signal).
Delta passes (only if pass 3 applied edits)
Pass 4) Safety delta rescan
- Scope: ONLY lines/paths changed by passes 1-3 and their immediate boundaries.
- Focus: new security/crash/corruption hazards introduced by the fixes.
Pass 5) Surface + proof delta rescan
- Re-enumerate behavior tokens from the FINAL diff (including newly introduced errors/flags/exports/config keys).
- Re-run PROVEN_USED/external-surface checks for newly introduced tokens.
- Ensure proof is still strong for the final diff.
Rules:
- After any pass that edits code, run a local signal before continuing.
- Do not edit on suspicion: every edit must have a concrete counterexample, invariant_before/after, and a proof hook.
- During the self-review phase, the counterexample may be an auditability or responsibility-split failure in the current validated changeset; if the change is compatible and locally validated, that is actionable under `$fix` even when the baseline was already green.
- Merge/de-duplicate findings across passes; final output format stays unchanged.
- Emit pass progress updates in real time at pass start/end; these updates are required and do not replace the final `Pass trace` section.
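The pass schedule (three core passes, two delta passes only if pass 3 edits) can be sketched as a small driver. `run_passes`, its callbacks, and the choice to stop on a failing signal are assumptions made for this illustration; in the skill itself a failing signal is handled by the close-the-loop step.

```python
CORE_PASSES = ("Safety", "Surface", "Audit")
DELTA_PASSES = ("Safety delta rescan", "Surface + proof delta rescan")


def run_passes(run_pass, run_signal):
    """run_pass(name) -> True if the pass edited code; run_signal() -> True if ok.

    Returns a trace of (pass name, edited, signal result) tuples.
    """
    trace = []
    queue = list(CORE_PASSES)
    while queue:
        name = queue.pop(0)
        edited = run_pass(name)
        # After any pass that edits code, run a local signal before continuing.
        result = ("ok" if run_signal() else "fail") if edited else "n/a"
        trace.append((name, edited, result))
        if result == "fail":
            break  # assumption: hand off to the repair loop rather than continue
        if name == CORE_PASSES[-1] and edited:
            queue.extend(DELTA_PASSES)  # delta passes only if pass 3 edited
    return trace
```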
Clarify before changes
Stop and ask ONLY if any is true:
- Behavior is contradictory/product-sensitive AND cannot be derived from tests/docs/callsites AND cannot be characterized safely.
- A breaking API change or irreversible migration is unavoidable (no backward-compatible path).
- No local signal/proof hook can be found or created without product ambiguity.
Finding record schema (internal)
For every issue you act on, construct this record before editing:
- id: `F<number>`
- location: `file:line`
- severity: `security|crash|corruption|logic`
- issue:
- tokens: <affected behavior tokens; empty if none>
- proven_used: <yes/no + evidence if yes>
- external_surface: <yes/no + why>
- diff_touch: <yes/no>
- counterexample: <input/timeline>
- invariant_before: <what is allowed/assumed today>
- invariant_after: <what becomes guaranteed/rejected>
- fix:
- proof: <test/assert/log + validation command + result>
- proof_strength: `characterization|targeted_regression|property_or_fuzz`
- compatibility_impact: `none|tightening|additive|breaking`
Canonical finding example (fully-filled)
Use this as the reference shape when writing findings.
F1 `src/config_loader.py:88` — crash — untrusted config type causes uncaught attribute access
- Surface: Tokens=config.path; PROVEN_USED=yes (tests/config/test_loader.py::test_reads_path); External=yes; Diff_touch=yes
- Counterexample: `{"path":null}` reaches `cfg.path.strip()` and raises `AttributeError`
- Invariant (before): loader assumes `path` is a non-empty string
- Invariant (after): loader accepts only non-empty string `path`; invalid type returns explicit error
- Fix: add boundary validation in `parse_config()` and reject invalid `path` before use
- Proof: `uv run pytest tests/config/test_loader.py::test_rejects_null_path` -> ok
- Proof strength: `targeted_regression`
- Compatibility impact: `tightening`
Residual risks / open questions
- `vendor/generated/config_schema.py` — blocked_by=generated_output — next=edit source schema and regenerate artifact
Workflow (algorithm)
0) Preflight
- Determine mode: review-only vs fix.
- If the requested target is this skill (or another skill), route through `$refine`:
  - follow `$refine` Discover -> Define -> Develop -> Deliver,
  - apply minimal skill diffs,
  - run `uv run --with pyyaml -- python3 codex/skills/.system/skill-creator/scripts/quick_validate.py codex/skills/<skill-name>`.
  - if `codex/skills/fix/SKILL.md` was touched, also run `uv run python codex/skills/fix/scripts/lint_fix_skill_contract.py codex/skills/fix/SKILL.md`.
  - Return to the rest of `fix` only if the request also includes code/runtime defects.
- Define slice:
  - If in a git repo and there is a diff, derive slice/tokens from the diff (paths, changed symbols/strings).
  - Otherwise: entrypoint, inputs, outputs, state.
  - If the request is PR-scoped (for example "$fix this PR", "$fix current branch", CI failure on a PR), anchor the slice to PR diff/base and keep the work PR-local unless severity widening is required.
- Apply `PR/diff scope guardrail` for review mode.
- Apply `Generated / third-party code guardrail` before editing.
- Select validation signal (or create proof hook).
- If signal/proof creation fails, stop before editing and return `blocked_by=no_repro_or_proof` with attempted commands.
1) Contract + baseline
- Determine PROVEN_USED behavior:
  - Apply `PROVEN_USED evidence checklist` to the behavior tokens affected by the slice.
- Derive contract without asking (in this order):
- tests that exercise the slice
- callsites in the repo
- docs/README/examples
- if none: add a characterization test for current behavior
- Write contract (1 sentence): "Working means …".
- Run baseline signal once; record result.
- If baseline signal is ok, apply `Proof discipline for passing baselines`.
2) Create initial findings
- Enumerate candidate failure modes for the slice.
- Rank security > crash > corruption > logic.
- For each issue you will act on, create a finding record.
3) Mandatory scans (multi-pass)
Run the Multi-pass loop (default) for the touched slice.
3a) Unsoundness scan
For each hazard class that applies:
- Identify the first failure point.
- Provide a concrete counterexample (input/timeline).
- Specify the smallest sound fix that removes the bug-class.
Hazard classes:
- Security boundaries: authz/authn, injection, path traversal, SSRF, unsafe deserialization, secrets/PII handling.
- Nullability/uninitialized state; sentinel values.
- Ownership/lifetime: leaks, double-free, use-after-close, lock not released.
- Concurrency/order/time: races, lock ordering, reentrancy, timeouts/retries, TOCTOU.
- Bounds/arithmetic: indexing/slicing, overflow/underflow, signedness.
- Persistence/atomicity: partial writes, torn updates, non-transactional multi-step updates.
- Encoding/units: bytes vs chars, timezone/locale, unit mismatches, lossy normalization.
- Error handling: swallowed errors, missing context, silent fallback.
Language cues (examples only):
- Rust: `unsafe`, `unwrap`/`expect`, `as` casts.
- JS/TS: `any`, implicit coercions, `undefined` flows.
- Python: bare `except`, truthiness-based branching.
- C/C++: raw pointers, unchecked casts, manual `malloc`/`free`.
3b) Invariant strengthening scan
Delegate to $invariant-ace and import its artifacts into fix.
Default execution:
- Run `$invariant-ace` Compact Mode first.
- Capture at minimum:
  - `Counterexample`
  - `Invariants`
  - `Owner and Scope`
  - `Enforcement Boundary`
  - `Seam (Before -> After)`
  - `Verification`
- Map these to `fix` finding fields:
  - `invariant_before` from broken trace + prior scope,
  - `invariant_after` from chosen predicate(s) + holds scope,
  - `fix` from seam,
  - `proof` from verification signal.
- Escalate to full `$invariant-ace` protocol if Compact Mode does not yield an inductive predicate.
3c) Footgun scan + defusal
Trigger: you touched an API surface OR a caller can plausibly misuse the code.
For top-ranked misuse paths:
- Provide minimal misuse snippet + surprising behavior.
- Defuse by changing the surface:
- options structs / named params
- split booleans into enums or separate functions
- explicit units/encodings
- richer returns (no sentinels)
- separate pure computation from effects
- Lock with a regression test or boundary assertion.
3d) Complexity scan (risk-driven)
Delegate to $complexity-mitigator for analysis; implement only via fix.
Rule: reshape only when it reduces risk and improves auditability of invariants/ownership.
Algorithm:
- Run `$complexity-mitigator` on the touched slice (heat read + essential/incidental verdict + ranked options + TRACE).
- Select the smallest viable cut that directly supports a finding (safety/surface/invariant ownership/proof quality).
- If simplification depends on an unstated invariant, run/refresh `$invariant-ace` first.
- Keep complexity-only cleanup out of scope unless it closes a concrete risk.
4) Implement fixes (per finding)
For findings in severity order:
- Implement the smallest sound fix that removes the bug-class.
- Apply correctness tightening when allowed.
- Ensure invariants + ownership hold on all paths.
- Keep diff reviewable; avoid drive-by refactors.
5) Close the loop (required)
- Run the chosen validation signal.
- IF it fails:
- update findings with the new counterexample,
- apply the smallest additional fix,
- re-run the SAME signal.
- Repeat until the signal passes.
- If you cannot make progress after 3 repair cycles, stop and ask with:
- the last failing output (key lines/stack),
- what you tried,
- the smallest remaining decision that blocks you.
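The repair loop with its three-cycle budget can be sketched as follows. `close_the_loop` and its two callbacks are hypothetical; the sketch assumes `run_signal` always re-runs the same command, as the step requires.

```python
MAX_REPAIR_CYCLES = 3


def close_the_loop(run_signal, apply_fix):
    """run_signal() -> (ok, output); apply_fix(output) -> short description of the fix.

    Returns status 'ok' once the signal passes, or 'ask' after three
    repair cycles that still leave the signal failing.
    """
    ok, output = run_signal()
    tried = []
    while not ok:
        if len(tried) == MAX_REPAIR_CYCLES:
            # Stop and ask: report last failing output and what was tried.
            return {"status": "ask", "last_output": output, "tried": tried}
        tried.append(apply_fix(output))  # smallest additional fix for the new counterexample
        ok, output = run_signal()        # re-run the SAME signal
    return {"status": "ok", "last_output": output, "tried": tried}
```

The `ask` result carries the pieces the step asks for: the last failing output and the list of attempted fixes. The smallest blocking decision still has to be stated by the author.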
6) Residual risk sweep (required)
- Re-check `Residual risks / open questions policy`.
- Any item without a valid blocker becomes a finding and is fixed (with proof) or dropped.
7) Agent-directed self-review loop (required final step when a changeset exists)
1. Precondition gate: run this step only when `Changes applied` is not `None` and the latest validation signal result is `ok`.
2. Freeze the self-review baseline as the latest validated changeset.
3. Ask internally exactly: `If you could change one thing about this changeset what would you change?`
4. Summarize the self-round in `Self-review loop trace` with one delta row.
5. If the answer yields one new actionable self-review change:
  - convert it into one concrete finding,
  - treat compatible ergonomic/structural/API-shaping improvements as actionable when they materially improve the current validated changeset and can be proven locally,
  - widen anywhere in the current repo/worktree as needed,
  - apply the smallest sound change that materially addresses the critique,
  - re-run the chosen validation signal and update findings/proof,
  - record the `(validated_changeset_fingerprint, normalized_answer_summary)` pair so repeated suggestions cannot loop forever,
  - repeat from step 2.
6. If the answer yields only blocked changes, record `stop_reason=blocked`, carry blockers to `Residual risks / open questions`, and continue to step 8.
7. If the answer yields no new actionable self-review change, record `stop_reason=no_new_actionable_changes` and continue to step 8.
  - This check is against the current final validated changeset, not an earlier pre-delta review state.
  - `already green`, `architectural`, `not fix-shaped`, or `no failing proof hook` are not sufficient reasons by themselves when a concrete compatible/provable improvement still exists.
8. Run the non-self-review `$fix` passes once against the full resulting changeset.
9. If that rerun edits code, discard the stale self-review state, revalidate, and restart from step 2 against the new final changeset.
10. If that rerun applies no edits, exit the loop and proceed to output.
11. Skip gate: if `Changes applied` is `None` or the run is blocked before edits, output `- None (skip_gate)` in `Self-review loop trace` and proceed to output.
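The loop guard based on the `(validated_changeset_fingerprint, normalized_answer_summary)` pair can be sketched in Python. The hashing scheme, the whitespace/case normalization, and the `SelfReviewLedger` class are all assumptions made for illustration; the skill text does not prescribe how the fingerprint or summary is computed.

```python
import hashlib


def fingerprint(diff_text: str) -> str:
    """Assumed scheme: short SHA-256 digest of the validated changeset's diff."""
    return hashlib.sha256(diff_text.encode("utf-8")).hexdigest()[:12]


def normalize(answer: str) -> str:
    """Assumed normalization: lowercase and collapse whitespace."""
    return " ".join(answer.lower().split())


class SelfReviewLedger:
    """Remembers (fingerprint, summary) pairs so a repeated suggestion is not re-applied."""

    def __init__(self):
        self._seen = set()

    def is_new(self, diff_text: str, answer: str) -> bool:
        key = (fingerprint(diff_text), normalize(answer))
        if key in self._seen:
            return False  # same suggestion on the same changeset
        self._seen.add(key)
        return True
```

A suggestion that repeats against an unchanged changeset is treated as not new, which lets the round end with `stop_reason=no_new_actionable_changes`. The same suggestion against a different changeset is considered afresh.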
7a) Post-fix boundary (required)
- If a clean or closed `$fix` pass surfaces broader non-fix opportunities, do not continue exploring them under `$fix`.
- If the user explicitly asks for broader or deeper follow-up after closure, recommend the next skill explicitly:
  - `$parse` for architecture or purpose analysis
  - `$grill-me` for interrogation, pressure-testing, or narrowing the next move
  - `$plan` for a decision-complete roadmap
  - `$creative-problem-solver` for wider option generation
- Do not place those broader opportunities in `Residual risks / open questions` unless a valid blocker from the allowed set applies.
8) Output lock (required)
Before sending the final message, verify all are true:
- Heading set is exact and complete (`Contract`, `Findings (severity order)`, `Changes applied`, `Pass trace`, `Validation`, `Self-review loop trace`, `Residual risks / open questions`).
- `Pass trace` includes planned/executed counts and P1/P2/P3 lines (plus P4/P5 when executed).
- Runtime pass updates (`Pass <n>/<total_planned>: ...`) were emitted during execution.
- If embedded mode was used, include Fix Record in the same assistant message (after any required artifact).
- `Validation` includes machine-checkable keys: `baseline_cmd`, `baseline_result`, `proof_hook`, `final_cmd`, `final_result`.
- Every acted-on finding includes `Proof strength` and `Compatibility impact` using the allowed enums.
- Every residual `blocked_by` uses only: `product_ambiguity|breaking_change|no_repro_or_proof|scope_guardrail|generated_output|external_dependency|perf_unmeasurable`.
- If `Changes applied` is not `None` AND the latest validation result is `ok`, `Self-review loop trace` includes a terminal row with `stop_reason=<no_new_actionable_changes|blocked>` and the user-facing final line is not the question.
Deliverable format (chat)
Output exactly these sections in this order.
If no findings:
- Findings (severity order): `None`.
- Changes applied: `None`.
- Self-review loop trace: `- None (skip_gate)`.
- Residual risks / open questions: `- None`.
- Still include Pass trace and Validation.
Contract
Findings (severity order)
For each finding:
F# <file:line> — <security|crash|corruption|logic> —
- Surface: Tokens=<...>; PROVEN_USED=<yes/no + evidence>; External=<yes/no>; Diff_touch=<yes/no>
- Counterexample: <input/timeline>
- Invariant (before): <what was assumed/allowed>
- Invariant (after): <what is now guaranteed/rejected>
- Fix:
- Proof: <test/assert/log + validation command> -> <ok/fail>
- Proof strength: `<characterization|targeted_regression|property_or_fuzz>`
- Compatibility impact: `<none|tightening|additive|breaking>`
Changes applied
- —
Pass trace
- Core passes planned: `3`; core passes executed: `<3>`
- Delta passes planned: `<0|2>`; delta passes executed: `<0|2>`
- Total passes executed: `<3|5>`
- P1 Safety -> <done>; edits=<yes|no>; signal=<cmd|n/a>; result=<ok|fail|n/a>
- P2 Surface -> <done>; edits=<yes|no>; signal=<cmd|n/a>; result=<ok|fail|n/a>
- P3 Audit -> <done>; edits=<yes|no>; signal=<cmd|n/a>; result=<ok|fail|n/a>
- If delta passes executed, also include `P4` and `P5` lines in the same format.
Validation
- -> <ok/fail>
- `{"baseline_cmd":"<cmd|n/a>","baseline_result":"<ok|fail|n/a>","proof_hook":"<test/assert/log|n/a>","final_cmd":"<cmd>","final_result":"<ok|fail>"}` (single-line JSON)
Self-review loop trace
- If none: `- None (skip_gate)`
- Otherwise:
  - `S# prompt=If you could change one thing about this changeset what would you change?; answer_summary=<...>; finding=<F#|none>; change_applied=<yes|no>; proof=<cmd|n/a>; result=<ok|fail|n/a>; stop_reason=<continue|no_new_actionable_changes|blocked>`
Residual risks / open questions
- If none: `- None`
- Otherwise:
  - <file:line or token> — blocked_by=<product_ambiguity|breaking_change|no_repro_or_proof|scope_guardrail|generated_output|external_dependency|perf_unmeasurable> — next=<one action>
Fix Record (embedded mode only)
Use only when $fix is invoked inside another skill.
Findings (severity order)
For each finding:
F# <file:line> — <security|crash|corruption|logic> —
- Surface: Tokens=<...>; PROVEN_USED=<yes/no + evidence>; External=<yes/no>; Diff_touch=<yes/no>
- Counterexample: <input/timeline>
- Invariant (before): <what was assumed/allowed>
- Invariant (after): <what is now guaranteed/rejected>
- Fix:
- Proof: <test/assert/log + validation command> -> <ok/fail>
- Proof strength: `<characterization|targeted_regression|property_or_fuzz>`
- Compatibility impact: `<none|tightening|additive|breaking>`
Changes applied
- —
Pass trace
- Core passes planned: `3`; core passes executed: `<3>`
- Delta passes planned: `<0|2>`; delta passes executed: `<0|2>`
- Total passes executed: `<3|5>`
- P1 Safety -> <done>; edits=<yes|no>; signal=<cmd|n/a>; result=<ok|fail|n/a>
- P2 Surface -> <done>; edits=<yes|no>; signal=<cmd|n/a>; result=<ok|fail|n/a>
- P3 Audit -> <done>; edits=<yes|no>; signal=<cmd|n/a>; result=<ok|fail|n/a>
- If delta passes executed, also include `P4` and `P5` lines in the same format.
Validation
- -> <ok/fail>
- `{"baseline_cmd":"<cmd|n/a>","baseline_result":"<ok|fail|n/a>","proof_hook":"<test/assert/log|n/a>","final_cmd":"<cmd>","final_result":"<ok|fail>"}` (single-line JSON)
Self-review loop trace
- If none: `- None (skip_gate)`
- Otherwise:
  - `S# prompt=If you could change one thing about this changeset what would you change?; answer_summary=<...>; finding=<F#|none>; change_applied=<yes|no>; proof=<cmd|n/a>; result=<ok|fail|n/a>; stop_reason=<continue|no_new_actionable_changes|blocked>`
Residual risks / open questions
- If none: `- None`
- Otherwise:
  - <file:line or token> — blocked_by=<product_ambiguity|breaking_change|no_repro_or_proof|scope_guardrail|generated_output|external_dependency|perf_unmeasurable> — next=<one action>
Pitfalls
- Vague advice without locations.
- "Nice-to-have" refactors that don’t reduce risk.
- Feature creep.
- Continuing broad repo critique or planning under a completed `$fix` label.
- Asking for risk tolerance instead of applying the default policy.
- Tightening presented as optional.
- Ownership fixes without proven allocator/owner.
- Editing generated/third-party outputs instead of source-of-truth.
- Code edits without a failing proof hook when baseline was ok.
- Using `Residual risks / open questions` as a substitute for fixable findings.
Activation cues
- "resolve" / "fix" / "crash" / "data corruption"
- "$fix this PR" / "$fix current branch" / "fix this branch"
- "CI failed" / "fix the red checks" / "repair failing PR"
- "footgun" / "misuse" / "should never happen"
- "invariant" / "lifetime" / "nullable surprise"
- "too complex" / "branch soup" / "cross-file hops"
- "refine $fix" / "improve fix skill workflow" / "update fix skill docs"