crucible-meta-governance

Meta-governance — 6 decision-layer patterns

Self-Evolving Skill: If any pattern here misled decisions, update the section AND append to references/evolution-log.md. Don't defer.

These patterns are meta-level: they concern the investigation itself, not its content. Invoke them whenever a decision is due: pivot vs persist, kill vs narrow, ship vs hold.


1. Physical-constraint-first pivot

When brute force yields null, extract the execution constraint and redesign the hypothesis class to fit it. Don't iterate on a hypothesis that ignores reality.

Session example: 17 directional-signal null campaigns → user pivoted:

"What's the best strategy for a highly random walk market?" "I can only trade on a traditional MT5 broker that allows hedging positions."

From this came the synthetic straddle: BUY_STOP and SELL_STOP pending orders linked as OCO (one-cancels-the-other). Constraint-driven design unlocked the strategy class. The math (diffusive displacement in a random walk: E[|ΔS|] > 0 even when E[ΔS] = 0) was always available; what was missing was honoring the execution venue.
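A minimal sketch of both ideas follows. It assumes Gaussian increments with standard deviation `sigma` per bar, which is a simplification because real returns are fat-tailed. Under that assumption, a driftless random walk has S_n - S_0 ~ N(0, nσ²), so E[|ΔS|] = σ·sqrt(2n/π) > 0 even though E[ΔS] = 0. `oco_straddle_fill` is a hypothetical helper that illustrates only the OCO mechanic: a BUY_STOP above the entry, a SELL_STOP below it, and the first one touched fills and cancels the other. It ignores spread, commission, and slippage, and those costs decide whether the idea survives on a real venue.

```python
import math
import random


def mean_abs_displacement(n_steps: int, sigma: float, n_paths: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E[|S_n - S_0|] for a driftless Gaussian random walk."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        s = 0.0
        for _ in range(n_steps):
            s += rng.gauss(0.0, sigma)  # zero-mean increment: no directional edge
        total += abs(s)
    return total / n_paths


def expected_abs_displacement(n_steps: int, sigma: float) -> float:
    """Closed form: S_n - S_0 ~ N(0, n * sigma^2), and E|X| = sd * sqrt(2 / pi)."""
    return sigma * math.sqrt(2.0 * n_steps / math.pi)


def oco_straddle_fill(path, entry: float, offset: float):
    """Return (side, bar_index) of the first pending stop touched, or (None, None).

    BUY_STOP sits at entry + offset and SELL_STOP at entry - offset. The first fill
    cancels the other order (OCO), so at most one side is ever filled.
    """
    for i, price in enumerate(path):
        if price >= entry + offset:
            return "BUY", i
        if price <= entry - offset:
            return "SELL", i
    return None, None
```

For example, `oco_straddle_fill([1.0, 1.2, 1.6, 0.2], entry=1.0, offset=0.5)` returns `("BUY", 2)`. The later drop to 0.2 is ignored because the SELL_STOP was already cancelled.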

Ask yourself:

  • What execution venue is the user actually on?
  • What types of orders are possible?
  • What's the realistic slippage / commission / spread?
  • What position-sizing constraints apply?

If the hypothesis doesn't survive these questions, pivot the hypothesis, not the statistics.
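The checklist can be turned into an explicit check that runs before any statistics are computed. This is a minimal sketch. `Venue`, `Hypothesis`, and `venue_violations` are hypothetical names, and all costs are assumed to be in the same price units. An empty result means only that the hypothesis is executable on the venue, not that it is profitable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Venue:
    allows_hedging: bool
    order_types: frozenset
    round_trip_cost: float  # spread + commission + expected slippage, in price units
    min_lot: float


@dataclass(frozen=True)
class Hypothesis:
    name: str
    needs_hedging: bool
    order_types: frozenset
    expected_edge: float  # gross expected gain per trade, same units as round_trip_cost
    lot: float


def venue_violations(h: Hypothesis, v: Venue) -> list:
    """Return the reasons a hypothesis cannot run on a venue (empty list = survives)."""
    problems = []
    if h.needs_hedging and not v.allows_hedging:
        problems.append("venue does not allow hedged positions")
    missing = h.order_types - v.order_types
    if missing:
        problems.append(f"unsupported order types: {sorted(missing)}")
    if h.expected_edge <= v.round_trip_cost:
        problems.append("expected edge does not cover round-trip cost")
    if h.lot < v.min_lot:
        problems.append("position size below venue minimum lot")
    return problems
```

A non-empty list points at what to redesign in the hypothesis, for example the order types or the position size. Re-tuning the statistics does not address it.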


2. Incremental artifact promotion (/tmp → repo early)

Move findings from /tmp/ to the persistent repo (audits/YYYY-MM-DD-slug/) as soon as a result survives two independent tests, not "when done".

Session anti-pattern: reproducers were written to /tmp/ during exploration and lost on reboot, so results could no longer be reproduced. A result should have been promoted the moment it passed Gate C (OOS), not after the 4-gate suite completed.

Promotion triggers (at least one required):

  • Result passed shuffled-null z > 3 AND hasn't been contradicted
  • An agent synthesized a verdict that supersedes an earlier one
  • A reproducer script ran successfully twice
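The triggers can be checked mechanically. This sketch assumes that "shuffled null" means re-running the same test on label-shuffled data and collecting one statistic per shuffle. `shuffled_null_z` and `promotion_reasons` are hypothetical helper names.

```python
import statistics


def shuffled_null_z(observed: float, null_stats: list) -> float:
    """z-score of the observed statistic against statistics from shuffled reruns."""
    mu = statistics.mean(null_stats)
    sd = statistics.stdev(null_stats)  # sample std; needs at least two shuffled runs
    return (observed - mu) / sd


def promotion_reasons(z: float, contradicted: bool, supersedes_verdict: bool,
                      successful_runs: int) -> list:
    """Which promotion triggers fired; promote to the repo if the list is non-empty."""
    reasons = []
    if z > 3 and not contradicted:
        reasons.append("shuffled-null z > 3, uncontradicted")
    if supersedes_verdict:
        reasons.append("verdict supersedes an earlier one")
    if successful_runs >= 2:
        reasons.append("reproducer ran successfully twice")
    return reasons
```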

Mechanics:

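```bash
# "slug" is a placeholder: replace it with a short name for the finding
dest="findings/evolution/audits/$(date +%Y-%m-%d)-slug"
mkdir -p "$dest"
cp /tmp/reproducer.py /tmp/artifact.json "$dest/"
# Then write a CLAUDE.md navigator and verdict.md into "$dest"
# and append an entry to evolution.jsonl
```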

What's impermanent gets lost.
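If promotion is scripted from Python rather than the shell, the same steps look roughly like the sketch below. `promote` is a hypothetical helper, and the ledger fields other than the caller's own entry (here `audit_dir`) are assumptions. It writes the ledger append-only, and CLAUDE.md and verdict.md are still written by hand.

```python
import json
import shutil
from datetime import date
from pathlib import Path


def promote(files, audits_root, slug: str, ledger_path, entry: dict) -> Path:
    """Copy scratch artifacts into a dated audit folder and append one ledger line."""
    dest = Path(audits_root) / f"{date.today().isoformat()}-{slug}"  # YYYY-MM-DD-slug
    dest.mkdir(parents=True, exist_ok=True)
    for f in files:
        shutil.copy2(f, dest)  # keeps the original file name inside dest
    record = {**entry, "audit_dir": str(dest)}  # "audit_dir" is an assumed field name
    # Append-only: earlier ledger lines are never rewritten.
    with open(ledger_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return dest
```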


3. Gate-failure scopes not kills

When a signal fails one of the serial gates (see Skill B §2), downgrade its scope instead of killing it outright. The one exception is Gate C, where failure means in-sample overfit.

| Failed gate | Action |
|---|---|
| Gate A (directional breakdown) | Learn which side carries the signal; often simplify to one side |
| Gate B (mirror symmetry) | Note the asymmetry; record as a "direction-biased" feature |
| Gate C (OOS time-split) | Kill. No scope narrowing rescues in-sample overfit. |
| Gate D (cross-asset) | Downgrade to <asset>-specific; keep |
| Gate E (per-year) | Flag bad years as "regime-unfavorable"; explore regime filters |

Example: NGRAM3FU-STRADDLE-001 failed Gate D (XAUUSD, GBPUSD) but passed A, B, C, and E. Its status was downgraded to eur-only, NOT killed. If XAUUSD later develops different microstructure, the signal can be retested; that condition is what the resurrect_if: trigger records (see Skill D).

Principle: scope-narrowing preserves optionality. Hard kills lose negative knowledge.
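The gate table can be encoded as a lookup so that triage is consistent from run to run. This is a sketch. The status strings "active" and "killed" and the `resurrect_if` text are hypothetical, and Skill D defines the real trigger format.

```python
GATE_ACTIONS = {
    "A": "learn which side carries the signal; often simplify to one side",
    "B": "record as a direction-biased feature",
    "C": "kill: no scope narrowing rescues in-sample overfit",
    "D": "downgrade to asset-specific; keep",
    "E": "flag bad years as regime-unfavorable; explore regime filters",
}


def triage(failed_gates: set, home_asset: str, failed_assets=()) -> dict:
    """Map gate failures to a scoped status instead of a blanket kill."""
    unknown = set(failed_gates) - set(GATE_ACTIONS)
    if unknown:
        raise ValueError(f"unknown gates: {sorted(unknown)}")
    if "C" in failed_gates:  # the only gate whose failure is a hard kill
        return {"status": "killed", "notes": [GATE_ACTIONS["C"]], "resurrect_if": []}
    status = f"{home_asset}-only" if "D" in failed_gates else "active"
    notes = [GATE_ACTIONS[g] for g in sorted(failed_gates)]
    # Keep the failing assets so the finding can be retested if their microstructure changes.
    resurrect_if = [f"retest on {a}" for a in failed_assets] if "D" in failed_gates else []
    return {"status": status, "notes": notes, "resurrect_if": resurrect_if}
```

With the NGRAM3FU-STRADDLE-001 inputs, `triage({"D"}, "eur", ("XAUUSD", "GBPUSD"))` yields status `"eur-only"` and keeps both assets as retest candidates.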


4. Agent-lens disagreement as signal

When parallel agents DISAGREE, the disagreement itself is diagnostic.

Session example: 4 agents reported "lower rejection at bottom → 67.8% UP" as a signal. Agent 5 (hidden-signal hunter, critic) flagged it as label leakage. The disagreement pointed precisely at the bug.

When agents disagree:

  1. Don't average or vote — map WHAT they disagree about
  2. Check: does one agent's evidence involve an implicit assumption the other rejects?
  3. Disagreement about mechanism → investigate the mechanism (it may be label leakage, a confound, or a real but lens-bound effect)
  4. Disagreement about significance → check each agent's multiple-testing burden

Anti-pattern: picking the agent that gives the answer you want. If the critic-agent disagrees with the proposer-agents, the critic is usually right.
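Steps 1, 3, and 4 can be sketched as a function that names what two agents disagree about rather than voting. All names are hypothetical. Bonferroni is used here only as one simple way to price an agent's multiple-testing burden; the procedure above does not prescribe a specific correction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentReport:
    agent: str
    claim: str       # the finding under discussion, e.g. a bar pattern -> direction
    mechanism: str   # the agent's explanation of why the claim holds (or doesn't)
    p_value: float   # raw p-value the agent reported for the claim
    n_tests: int     # how many hypotheses the agent screened before reporting this one


def bonferroni(p_value: float, n_tests: int) -> float:
    """Bonferroni-adjusted p-value: one simple way to price an agent's search burden."""
    return min(1.0, p_value * n_tests)


def map_disagreement(a: AgentReport, b: AgentReport, alpha: float = 0.05) -> list:
    """Say WHAT two agents disagree about instead of averaging or voting."""
    if a.claim != b.claim:
        raise ValueError("compare two reports about the same claim")
    kinds = []
    if a.mechanism != b.mechanism:
        # Investigate: label leakage, a confound, or a real but lens-bound effect.
        kinds.append("mechanism")
    sig_a = bonferroni(a.p_value, a.n_tests) < alpha
    sig_b = bonferroni(b.p_value, b.n_tests) < alpha
    if sig_a != sig_b:
        # Same claim, different verdicts once each agent's testing burden is priced in.
        kinds.append("significance")
    return kinds
```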


5. Context-budget discipline

Conversation and data context are scarce. Reserve them for the most ambiguous questions; compress known-good findings ruthlessly.

Hierarchy of compression:

  • Raw bars (not for agents; 67 MB)
  • Token-rendered bar sequences (60 KB; good for one agent)
  • Stats tables (60 KB; consumable by 5 parallel agents) — PREFERRED
  • Ledger entries (1 KB; tracks findings)

When context feels tight:

  1. Emit a fresh audit folder with artifacts; future sessions load that, not the transcript
  2. Drop detailed raw data from agent prompts; use markdown summaries
  3. If you must hand off mid-session, write a handoff file in .planning/ at the project root (not inside the plugin's scope)

Signal: context is BLOCKED when you find yourself reading the same file twice in one session, when agents ask for re-briefings, or when you can't remember what was decided 10 turns ago. When any of these appears, compress to an audit folder.
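Two of the three BLOCKED signals can be counted directly, and the third is a self-report. This is a minimal sketch with a hypothetical `context_blocked` helper. It assumes a session log of file reads and flags any file read two or more times.

```python
from collections import Counter


def context_blocked(file_reads, rebriefing_requests: int, forgot_recent_decision: bool) -> list:
    """Return the BLOCKED signals that fired this session; any entry means compress."""
    reasons = []
    repeated = sorted(path for path, n in Counter(file_reads).items() if n >= 2)
    if repeated:
        reasons.append(f"same file read twice this session: {', '.join(repeated)}")
    if rebriefing_requests > 0:
        reasons.append(f"{rebriefing_requests} agent re-briefing request(s)")
    if forgot_recent_decision:  # self-reported: decision from ~10 turns ago is gone
        reasons.append("lost track of a decision made about 10 turns ago")
    return reasons
```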


6. Supersede-not-rewrite

When a later finding replaces an earlier one, add a new ledger entry with supersedes: "OLD-ID"; update the old entry with superseded_by: "NEW-ID". Never rewrite or delete.

Why:

  • Future auditors need the trail, not the final answer
  • A superseded finding may contain negative knowledge (why it failed) that informs future work
  • Deletions create "mysterious silences" that agents can't interpret

Canonical chain from session:

NGRAM3FU-STRADDLE-001                     preliminary-positive
  ↓ supplemented by
NGRAM3FU-STRADDLE-001-GATES               gates-validated (Gate D failed → eur-only)
  ↓ supplemented by
NGRAM3FU-STRADDLE-001-FULL-HISTORY        confirmed at 7.18M bars
  ↓ supplemented by
NGRAM3FU-STRADDLE-001-FILTERED            Phase-L filter validated
  ↓ supplemented by
NGRAM3FU-STRADDLE-001-FULL-STACK          Phase-L + Phase-M final

Note supplements vs supersedes: supplement EXTENDS; supersede REPLACES. Pick the right relationship.

Anti-pattern: editing an old ledger entry because the finding "got better". That's rewriting history. Add a new entry.
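A sketch of the two relationships on an in-memory ledger (a list of dicts, one per line of the JSONL file) follows. `superseded_by` comes from the text above. `supplemented_by` and the helper names are hypothetical. The only change ever made to an existing entry is adding one forward pointer.

```python
BACK_POINTER = {"supersedes": "superseded_by", "supplements": "supplemented_by"}


def add_linked_entry(ledger: list, new_entry: dict, relation: str, old_id: str) -> None:
    """Append new_entry linked to old_id; the old entry only gains a forward pointer."""
    if relation not in BACK_POINTER:
        raise ValueError(f"relation must be one of {sorted(BACK_POINTER)}")
    old = next((e for e in ledger if e["id"] == old_id), None)
    if old is None:
        raise KeyError(old_id)
    back = BACK_POINTER[relation]
    if back in old:
        raise ValueError(f"{old_id} already has {back}={old[back]!r}; link from that entry")
    ledger.append({**new_entry, relation: old_id})
    old[back] = new_entry["id"]  # no other field of the old entry is touched


def follow_chain(ledger: list, start_id: str) -> list:
    """Walk forward pointers from start_id and return the ids in order."""
    by_id = {e["id"]: e for e in ledger}
    chain, current = [], by_id[start_id]
    while True:
        chain.append(current["id"])
        nxt = current.get("superseded_by") or current.get("supplemented_by")
        if nxt is None:
            return chain
        current = by_id[nxt]
```

Linking each NGRAM3FU-STRADDLE-001 entry with `"supplements"` reproduces the five-entry chain above. The first entry keeps its `preliminary-positive` status and gains only a pointer.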


Confirmation counts

| Pattern | Confirmed | Notes |
|---|---|---|
| 1. Physical-constraint pivot | 1 | The session-defining pivot (directional → straddle) |
| 2. Artifact promotion | Multiple | Every /tmp → audit-folder move |
| 3. Gate-failure scopes | 1 | NGRAM3FU-STRADDLE Gate D → eur-only |
| 4. Disagreement as signal | 2 | Act-2 label-leakage catch; Phase L agent variance |
| 5. Context budget | Implicit | Applied every time stats tables were preferred over raw data |
| 6. Supersede-not-rewrite | 5 | NGRAM3FU-STRADDLE chain, 5 entries |

Post-Execution Reflection

After invoking this skill:

  1. Did a pattern save you from a bad decision? Increment confirmed count; note in references/evolution-log.md.
  2. Did a pattern produce the wrong call? Demote it; record context + link to where it misled.
  3. A new decision pattern emerged that isn't here? Draft a section.
  4. A pattern could be better-phrased for future agents? Edit the text directly; log why.