crucible-meta-governance
Meta-governance — 6 decision-layer patterns
Self-Evolving Skill: If any pattern here misled decisions, update the section AND append to
references/evolution-log.md. Don't defer.
These patterns are meta-level — they're about the investigation itself, not its content. Invoke when a decision must be made: pivot vs persist, kill vs narrow, ship vs hold.
1. Physical-constraint-first pivot
When brute force yields null, extract the execution constraint and redesign the hypothesis class to fit it. Don't iterate on a hypothesis that ignores reality.
Session example: 17 directional-signal null campaigns → user pivoted:
"What's the best strategy for a highly random walk market?" "I can only trade on a traditional MT5 broker that allows hedging positions."
From this came the synthetic straddle (BUY_STOP + SELL_STOP pending orders, OCO). Constraint-driven design unlocked the strategy class. The math (diffusive displacement in random walks: E[|ΔS|] > 0) was always available; what was missing was honoring the execution venue.
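The displacement claim can be checked numerically. The sketch below assumes the simplest possible model, a symmetric ±1 random walk (an assumption for illustration; the session's price model is not specified here), and computes E[|S_n|] exactly from the binomial distribution, alongside the standard large-n approximation sqrt(2n/π). Both function names are hypothetical.

```python
import math

def expected_abs_displacement(n: int) -> float:
    """Exact E[|S_n|] for a symmetric +/-1 random walk of n steps."""
    total = 0
    for k in range(n + 1):  # k = number of up-steps, so S_n = 2k - n
        total += math.comb(n, k) * abs(2 * k - n)
    return total / 2 ** n

def gaussian_approximation(n: int) -> float:
    """Large-n approximation: E[|S_n|] is close to sqrt(2n / pi)."""
    return math.sqrt(2 * n / math.pi)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, expected_abs_displacement(n), gaussian_approximation(n))
```

The expected absolute displacement is strictly positive and grows roughly with the square root of the number of steps, even though the expected signed displacement is zero. That is the property a non-directional order structure can target.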
Ask yourself:
- What execution venue is the user actually on?
- What types of orders are possible?
- What's the realistic slippage / commission / spread?
- What position-sizing constraints apply?
If the hypothesis doesn't survive these questions, pivot the hypothesis, not the statistics.
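The four questions can be turned into a mechanical pre-check. This is a minimal sketch, assuming a hypothesis and a venue can each be summarized by a handful of fields; the `Venue` and `Hypothesis` classes, their field names, and the numbers in the usage example are all hypothetical and not taken from the session.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Venue:
    order_types: frozenset    # order types the broker accepts
    allows_hedging: bool      # can long and short positions coexist?
    round_trip_cost: float    # spread + commission + slippage, in price units
    min_lot: float            # smallest tradable position size

@dataclass(frozen=True)
class Hypothesis:
    name: str
    required_order_types: frozenset
    needs_hedging: bool
    expected_edge: float      # expected gross capture per trade, in price units
    lot_size: float           # position size the hypothesis assumes

def constraint_violations(h: Hypothesis, v: Venue) -> list:
    """Return the reasons the hypothesis cannot run on this venue (empty if none)."""
    problems = []
    missing = h.required_order_types - v.order_types
    if missing:
        problems.append("unsupported order types: " + ", ".join(sorted(missing)))
    if h.needs_hedging and not v.allows_hedging:
        problems.append("venue does not allow hedged positions")
    if h.expected_edge <= v.round_trip_cost:
        problems.append("expected edge does not cover round-trip cost")
    if h.lot_size < v.min_lot:
        problems.append("position size is below the venue minimum")
    return problems

if __name__ == "__main__":
    # Illustrative numbers only.
    venue = Venue(frozenset({"BUY_STOP", "SELL_STOP"}), True, 0.0002, 0.01)
    straddle = Hypothesis("straddle", frozenset({"BUY_STOP", "SELL_STOP"}), True, 0.0010, 0.01)
    print(constraint_violations(straddle, venue))
```

A non-empty result is a reason to redesign the hypothesis class, not to rerun the statistics.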
2. Incremental artifact promotion (/tmp → repo early)
Move findings from /tmp/ to the persistent repo (audits/YYYY-MM-DD-slug/) as soon as a result survives two independent tests, not "when done".
Session anti-pattern: reproducers were written in /tmp/ during exploration and lost on reboot, so the results could no longer be reproduced. The moment a result passed Gate C (OOS) it should have been promoted, not after the 4-gate suite completed.
Promotion triggers (at least one required):
- Result passed shuffled-null z > 3 AND hasn't been contradicted
- An agent synthesized a verdict that supersedes an earlier one
- A reproducer script ran successfully twice
Mechanics:
mkdir -p findings/evolution/audits/$(date +%Y-%m-%d)-slug
cp /tmp/reproducer.py /tmp/artifact.json findings/evolution/audits/.../
# Write CLAUDE.md navigator + verdict.md
# Append to evolution.jsonl
What's impermanent gets lost.
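The triggers and the copy step can be sketched in Python as well. This is a hedged illustration, not the session's tooling: `should_promote` and `promote` are hypothetical names, the argument shapes are assumptions, and the date and slug are passed in explicitly so the destination is predictable.

```python
import shutil
from pathlib import Path

def should_promote(null_z, contradicted, superseding_verdict, reproducer_successes):
    """True if at least one promotion trigger fires."""
    passed_null = null_z is not None and null_z > 3 and not contradicted
    return bool(passed_null or superseding_verdict or reproducer_successes >= 2)

def promote(artifacts, repo_root, date_str, slug):
    """Copy artifacts into findings/evolution/audits/<date>-<slug>/ under repo_root."""
    dest = Path(repo_root) / "findings" / "evolution" / "audits" / f"{date_str}-{slug}"
    dest.mkdir(parents=True, exist_ok=True)
    for src in artifacts:
        shutil.copy2(src, dest / Path(src).name)  # copy2 also keeps timestamps
    return dest
```

Writing the navigator, the verdict file, and the evolution.jsonl entry is left out of the sketch; those steps still follow the copy.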
3. Gate-failure scopes not kills
When a signal fails one of the serial gates (see Skill B §2), downgrade its scope rather than killing it outright. The one exception is Gate C: an out-of-sample failure is a kill.
| Failed gate | Action |
|---|---|
| Gate A (directional breakdown) | Learn which side — often simplify to one-side |
| Gate B (mirror symmetry) | Note asymmetry; record as "direction-biased" feature |
| Gate C (OOS time-split) | Kill. No scope-narrowing rescues in-sample overfit. |
| Gate D (cross-asset) | Downgrade to <asset>-specific; keep |
| Gate E (per-year) | Flag bad years as "regime-unfavorable"; explore regime filters |
NGRAM3FU-STRADDLE-001 failed Gate D (XAUUSD, GBPUSD) but passed A/B/C/E. Status downgraded to eur-only, NOT killed. A year later, if XAUUSD develops different microstructure, it could be retested — this is the resurrect_if: trigger (see Skill D).
Principle: scope-narrowing preserves optionality. Hard kills lose negative knowledge.
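The table can be encoded as a small lookup so the decision is applied the same way every time. A minimal sketch, assuming each gate is identified by its letter; `Decision` and `on_gate_failure` are hypothetical names, and the note strings paraphrase the table rather than quote any ledger schema.

```python
from typing import NamedTuple

class Decision(NamedTuple):
    killed: bool
    note: str

def on_gate_failure(gate: str, passing_asset: str = "") -> Decision:
    """Map a failed gate to a kill or a scope downgrade, following the table above."""
    gate = gate.upper()
    if gate == "C":
        # Only the OOS time-split failure is a hard kill.
        return Decision(True, "killed: failed OOS time-split")
    notes = {
        "A": "narrowed: find which side carries the effect; consider a one-sided variant",
        "B": "kept: recorded as direction-biased",
        "D": f"downgraded: {passing_asset}-specific",
        "E": "kept: bad years flagged regime-unfavorable; explore regime filters",
    }
    if gate not in notes:
        raise ValueError(f"unknown gate: {gate!r}")
    return Decision(False, notes[gate])
```

For example, `on_gate_failure("D", "EURUSD")` returns a non-killed decision whose note records the asset-specific downgrade.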
4. Agent-lens disagreement as signal
When parallel agents DISAGREE, the disagreement itself is diagnostic.
Session example: 4 agents reported "lower rejection at bottom → 67.8% UP" as a signal. Agent 5 (hidden-signal hunter, critic) flagged it as label leakage. The disagreement pointed precisely at the bug.
When agents disagree:
- Don't average or vote — map WHAT they disagree about
- Check: does one agent's evidence involve an implicit assumption the other rejects?
- Disagreement about mechanism → investigate mechanism (may be label leakage, confound, or real but lens-bound effect)
- Disagreement about significance → check each agent's multiple-testing burden
Anti-pattern: picking the agent that gives the answer you want. If the critic-agent disagrees with the proposer-agents, the critic is usually right.
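Mapping what agents disagree about, instead of voting, can be sketched as follows. The report format (agent, claim, verdict, assumptions) is an assumption made for this example, as is the function name; real agent output would need to be normalized into this shape first.

```python
from collections import defaultdict

def map_disagreements(reports):
    """Group reports by claim and return only the contested claims.

    reports: iterable of (agent, claim, verdict, assumptions) tuples.
    No averaging or voting is done; the output shows who holds which
    position and which assumptions are not shared by every agent.
    """
    by_claim = defaultdict(list)
    for agent, claim, verdict, assumptions in reports:
        by_claim[claim].append((agent, verdict, frozenset(assumptions)))

    contested = {}
    for claim, entries in by_claim.items():
        if len({verdict for _, verdict, _ in entries}) < 2:
            continue  # everyone agrees; nothing to map
        shared = frozenset.intersection(*(a for _, _, a in entries))
        contested[claim] = {
            "positions": {agent: verdict for agent, verdict, _ in entries},
            # Assumptions held by some agents but not all: inspect these first.
            "unshared_assumptions": {
                agent: sorted(a - shared) for agent, _, a in entries if a - shared
            },
        }
    return contested
```

The unshared assumptions are the natural starting point for the mechanism check described above.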
5. Context-budget discipline
Conversation and data context are scarce. Reserve them for the most ambiguous questions; compress known-good findings ruthlessly.
Hierarchy of compression:
- Raw bars (not for agents; 67 MB)
- Token-rendered bar sequences (60 KB; good for one agent)
- Stats tables (60 KB; consumable by 5 parallel agents) — PREFERRED
- Ledger entries (1 KB; tracks findings)
When context feels tight:
- Emit a fresh audit folder with artifacts; future sessions load that, not the transcript
- Drop detailed raw data from agent prompts; use markdown summaries
- If you must hand off mid-session, write a handoff file in .planning/ (not plugin scope; see project root)
Signal: context is BLOCKED when you find yourself re-reading the same file twice in one session, when agents ask for re-briefings, or when you can't remember what was decided 10 turns ago. When any of these occurs, compress to an audit folder.
6. Supersede-not-rewrite
When a later finding replaces an earlier one, add a new ledger entry with supersedes: "OLD-ID"; update the old entry with superseded_by: "NEW-ID". Never rewrite or delete.
Why:
- Future auditors need the trail, not the final answer
- A superseded finding may contain negative knowledge (why it failed) that informs future work
- Deletions create "mysterious silences" that agents can't interpret
Canonical chain from session:
NGRAM3FU-STRADDLE-001 preliminary-positive
↓ supplemented by
NGRAM3FU-STRADDLE-001-GATES gates-validated (Gate D failed → eur-only)
↓ supplemented by
NGRAM3FU-STRADDLE-001-FULL-HISTORY confirmed at 7.18M bars
↓ supplemented by
NGRAM3FU-STRADDLE-001-FILTERED Phase-L filter validated
↓ supplemented by
NGRAM3FU-STRADDLE-001-FULL-STACK Phase-L + Phase-M final
Note supplements vs supersedes: supplement EXTENDS; supersede REPLACES. Pick the right relationship.
Anti-pattern: editing an old ledger entry because the finding "got better". That's rewriting history. Add a new entry.
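The supersede relationship can be sketched on an in-memory ledger. This assumes each entry is a dict with an `id` key, which is an assumption about the ledger format; only the `supersedes` and `superseded_by` field names come from the text above, and the function name is hypothetical.

```python
def supersede(ledger, old_id, new_entry):
    """Append new_entry as the replacement for old_id.

    The old entry's content is left untouched; it only gains a
    superseded_by pointer. Nothing is rewritten or deleted.
    """
    by_id = {entry["id"]: entry for entry in ledger}
    if old_id not in by_id:
        raise KeyError(f"no ledger entry with id {old_id!r}")
    if new_entry["id"] in by_id:
        raise ValueError(f"id {new_entry['id']!r} already exists")
    old = by_id[old_id]
    if "superseded_by" in old:
        raise ValueError(f"{old_id!r} is already superseded by {old['superseded_by']!r}")

    ledger.append({**new_entry, "supersedes": old_id})
    old["superseded_by"] = new_entry["id"]
    return ledger
```

Refusing to supersede an entry twice keeps the chain linear, so an auditor can walk it from the first entry to the latest.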
Confirmation counts
| Pattern | Confirmed | Notes |
|---|---|---|
| 1. physical-constraint pivot | 1 | The session-defining pivot (directional → straddle) |
| 2. artifact promotion | Multiple | Every /tmp → audit folder move |
| 3. gate-failure scopes | 1 | NGRAM3FU-STRADDLE Gate D → eur-only |
| 4. disagreement as signal | 2 | Act-2 label leakage catch; Phase L agent variance |
| 5. context-budget | Implicit | Used every time we preferred stats tables over raw |
| 6. supersede-not-rewrite | 5 | NGRAM3FU-STRADDLE chain, 5 entries |
Post-Execution Reflection
After invoking this skill:
- Did a pattern save you from a bad decision? Increment its Confirmed count; note in references/evolution-log.md.
- Did a pattern produce the wrong call? Demote it; record context + link to where it misled.
- Did a new decision pattern emerge that isn't here? Draft a section.
- Could a pattern be better phrased for future agents? Edit the text directly; log why.