Meta-governance — 6 decision-layer patterns

Self-Evolving Skill: If any pattern here misled decisions, update the section AND append to references/evolution-log.md. Don't defer.

These patterns are meta-level — they're about the investigation itself, not its content. Invoke when a decision must be made: pivot vs persist, kill vs narrow, ship vs hold.

1. Physical-constraint-first pivot

When brute force yields null, extract the execution constraint and redesign the hypothesis class to fit it. Don't iterate on a hypothesis that ignores reality.

Session example: 17 directional-signal null campaigns → user pivoted:

"What's the best strategy for a highly random walk market?" "I can only trade on a traditional MT5 broker that allows hedging positions."

From this came the synthetic straddle (BUY_STOP + SELL_STOP pending orders, OCO). Constraint-driven design unlocked the strategy class. The math (diffusive displacement in random walks: E[|ΔS|] > 0) was always available; what was missing was honoring the execution venue.

Ask yourself:

What execution venue is the user actually on?
What types of orders are possible?
What's the realistic slippage / commission / spread?
What position-sizing constraints apply?

If the hypothesis doesn't survive these questions, pivot the hypothesis, not the statistics.

2. Incremental artifact promotion (/tmp → repo early)

Move findings from /tmp/ to the persistent repo (audits/YYYY-MM-DD-slug/) as soon as a result survives two independent tests, not "when done".

Session anti-pattern: reproducers written in /tmp/ during exploration, causing reproducibility loss on reboot. The moment a result passed Gate C (OOS) it should have been promoted — not after the 4-gate suite completed.

Promotion triggers (at least one required):

Result passed shuffled-null z > 3 AND hasn't been contradicted
An agent synthesized a verdict that supersedes an earlier one
A reproducer script ran successfully twice

Mechanics:

mkdir -p findings/evolution/audits/$(date +%Y-%m-%d)-slug
cp /tmp/reproducer.py /tmp/artifact.json findings/evolution/audits/.../
# Write CLAUDE.md navigator + verdict.md
# Append to evolution.jsonl

What's impermanent gets lost.

3. Gate-failure scopes not kills

When a signal fails one of the serial gates (see Skill B §2), downgrade its scope, don't kill it outright.

Failed gate	Action
Gate A (directional breakdown)	Learn which side — often simplify to one-side
Gate B (mirror symmetry)	Note asymmetry; record as "direction-biased" feature
Gate C (OOS time-split)	Kill. No scope-narrowing rescues in-sample overfit.
Gate D (cross-asset)	Downgrade to `<asset>-specific`; keep
Gate E (per-year)	Flag bad years as "regime-unfavorable"; explore regime filters

NGRAM3FU-STRADDLE-001 failed Gate D (XAUUSD, GBPUSD) but passed A/B/C/E. Status downgraded to eur-only, NOT killed. A year later, if XAUUSD develops different microstructure, it could be retested — this is the resurrect_if: trigger (see Skill D).

Principle: scope-narrowing preserves optionality. Hard kills lose negative knowledge.

4. Agent-lens disagreement as signal

When parallel agents DISAGREE, the disagreement itself is diagnostic.

Session example: 4 agents reported "lower rejection at bottom → 67.8% UP" as a signal. Agent 5 (hidden-signal hunter, critic) flagged it as label leakage. The disagreement pointed precisely at the bug.

When agents disagree:

Don't average or vote — map WHAT they disagree about
Check: does one agent's evidence involve an implicit assumption the other rejects?
Disagreement about mechanism → investigate mechanism (may be label leakage, confound, or real but lens-bound effect)
Disagreement about significance → check each agent's multiple-testing burden

Anti-pattern: picking the agent that gives the answer you want. If the critic-agent disagrees with the proposer-agents, the critic is usually right.

5. Context-budget discipline

Conversation and data context are scarce. Reserve them for the most ambiguous questions; compress known-good findings ruthlessly.

Hierarchy of compression:

Raw bars (not for agents; 67 MB)
Token-rendered bar sequences (60 KB; good for one agent)
Stats tables (60 KB; consumable by 5 parallel agents) — PREFERRED
Ledger entries (1 KB; tracks findings)

When context feels tight:

Emit a fresh audit folder with artifacts; future sessions load that, not the transcript
Drop detailed raw data from agent prompts; use markdown summaries
If you must hand off mid-session, write a handoff file in .planning/ (not plugin scope; see project root)

Signal: context is BLOCKED when: you find yourself re-reading the same file twice in one session; or agents ask for re-briefings; or you can't remember what was decided 10 turns ago. Compress to an audit folder.

6. Supersede-not-rewrite

When a later finding replaces an earlier one, add a new ledger entry with supersedes: "OLD-ID"; update the old entry with superseded_by: "NEW-ID". Never rewrite or delete.

Why:

Future auditors need the trail, not the final answer
A superseded finding may contain negative knowledge (why it failed) that informs future work
Deletions create "mysterious silences" that agents can't interpret

Canonical chain from session:

NGRAM3FU-STRADDLE-001                     preliminary-positive
  ↓ supplemented by
NGRAM3FU-STRADDLE-001-GATES               gates-validated (Gate D failed → eur-only)
  ↓ supplemented by
NGRAM3FU-STRADDLE-001-FULL-HISTORY        confirmed at 7.18M bars
  ↓ supplemented by
NGRAM3FU-STRADDLE-001-FILTERED            Phase-L filter validated
  ↓ supplemented by
NGRAM3FU-STRADDLE-001-FULL-STACK          Phase-L + Phase-M final

Note supplements vs supersedes: supplement EXTENDS; supersede REPLACES. Pick the right relationship.

Anti-pattern: editing an old ledger entry because the finding "got better". That's rewriting history. Add a new entry.

Confirmation counts

Pattern	Confirmed	Notes
1. physical-constraint pivot	1	The session-defining pivot (directional → straddle)
2. artifact promotion	Multiple	Every /tmp → audit folder move
3. gate-failure scopes	1	NGRAM3FU-STRADDLE Gate D → eur-only
4. disagreement as signal	2	Act-2 label leakage catch; Phase L agent variance
5. context-budget	Implicit	Used every time we preferred stats tables over raw
6. supersede-not-rewrite	5	NGRAM3FU-STRADDLE chain, 5 entries

Post-Execution Reflection

After invoking this skill:

Did a pattern save you from a bad decision? Increment confirmed count; note in references/evolution-log.md.
Did a pattern produce the wrong call? Demote it; record context + link to where it misled.
A new decision pattern emerged that isn't here? Draft a section.
A pattern could be better-phrased for future agents? Edit the text directly; log why.

crucible-meta-governance