feynman

Originally fromneurofoo/agent-skills
Installation
SKILL.md

/feynman — The Integrity Audit

Apply Richard Feynman's framework for detecting self-deception, cargo cult reasoning, and institutional dishonesty to any analysis, business plan, investment thesis, or decision. The output is not "is this a good idea?" — it is "is the analysis of this idea honest, or is it fooling itself?"

"The first principle is that you must not fool yourself — and you are the easiest person to fool." — Richard Feynman, "Cargo Cult Science" (1974)

This skill is a meta-tool. Where /munger asks "Is this a good business?", /feynman asks "Is the analysis trustworthy?" Run it after /munger to audit the lattice analysis itself, or standalone on any claim, plan, or thesis that demands integrity before action.

Core Principles

These are non-negotiable and come from Feynman's actual methodology across Appendix F of the Rogers Commission Report (1986), "Cargo Cult Science" (1974), "What is Science?" (1966), and "Surely You're Joking, Mr. Feynman!" (1985):

  1. The First Principle — "You must not fool yourself — and you are the easiest person to fool." Self-deception is the primary threat. External dishonesty is secondary. The audit starts inward.

  2. Nature Cannot Be Fooled — Physical reality is the final arbiter. No amount of consensus, institutional authority, or public relations changes what the material will do. The O-ring fails at 32 degrees regardless of what management believes.

  3. Leaning Over Backwards — Scientific integrity is "a kind of utter honesty — a kind of leaning over backwards." Not passive non-lying, but actively seeking and presenting evidence against your own conclusion. Report everything that might make your analysis invalid.

  4. Names Are Not Knowledge — "You can know the name of that bird in every language and still know absolutely nothing about the bird." If a claim requires jargon to state and cannot be translated into plain language with concrete examples, the claimant may not understand what they're claiming.

  5. Russian Roulette Reasoning — "When playing Russian roulette the fact that the first shot got off safely is little comfort for the next." Past survival under anomalous conditions is not evidence of safety. Survivorship is not validation.

  6. The Planes Must Land — The cargo cult islanders had runways, control towers, wooden headphones — the form was perfect. But the planes didn't land. The practical test: does the thing actually work? Does the prediction survive contact with reality?

  7. Follow the Disparity — When NASA management estimated shuttle failure at 1-in-100,000 and engineers estimated 1-in-100, the question was not "which is right?" but "what explains this thousand-fold gap?" The divergence itself is the most important finding.

Invocation

When invoked with $ARGUMENTS:

  1. If arguments contain an analysis, plan, thesis, or claim to audit — proceed directly
  2. If arguments reference a prior /munger or /thiel analysis — read the output file and audit that analysis
  3. If no arguments or too vague, ask ONE clarifying question via AskUserQuestion: "What analysis, plan, or claim do you want me to integrity-audit? Describe the thing you're about to trust and act on."
  4. Do NOT ask more than one round of questions. Audit with what you have.

Phase 1: Understand the Claim (Lead Only)

Before spawning the team, the lead must establish:

  • The claim: What is being asserted, in one sentence
  • The stakes: What happens if this analysis is wrong and you act on it anyway
  • The source: Who produced this analysis and what are their incentives
  • The confidence level: How confident is the claimant, and is that confidence earned

Present this back to the user:

## Feynman Integrity Audit: [Claim/Analysis Name]

I understand the claim as: [one sentence]

Stakes if wrong: [what happens if you act on a dishonest analysis]

I'm spawning five integrity auditors, each applying a different lens from
Feynman's framework. They'll probe independently, then I'll synthesize
for compounding self-deception patterns.

**The Team:**
1. The Source Auditor — bypasses abstractions to check primary data and ground truth
2. The Self-Deception Hunter — identifies Feynman's 10 patterns of self-deception
3. The Translation Tester — checks whether claims survive translation to plain language
4. The Cargo Cult Inspector — examines whether the form of rigor exists without substance
5. The Confidence Inverter — targets highest-confidence claims as most dangerous

Starting audit...

Phase 2: Spawn the Team

echo "${CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS:-not_set}"

If teams are not enabled, fall back to sequential Agent calls (one per auditor) with run_in_background: true, then collect results. The analysis quality should be identical — teams just enable cross-talk.

If teams ARE enabled:

TeamCreate: team_name = "feynman-<claim-slug>"

Create five tasks and spawn five teammates. Each teammate gets a detailed prompt with the FULL context of the claim/analysis and their specific auditing lens.

Teammate 1: The Source Auditor

TaskCreate: {
  subject: "Source Audit: primary data and ground truth",
  description: "Bypass abstractions to check whether claims are grounded in reality",
  activeForm: "Checking primary sources"
}

Spawn prompt:

You are The Source Auditor on Feynman's integrity audit team. Your discipline:
going to primary sources and bypassing every layer of abstraction between the
claim and reality.

Feynman's method: When NASA management claimed shuttle failure probability was
1-in-100,000, he didn't argue with the number — he went directly to the engineers
who built the thing and asked them independently. Their answers: 1-in-50 to
1-in-200. The gap was the finding. At Los Alamos, he didn't argue about whether
the safes were secure — he opened them. In Brazil, he didn't debate whether
students understood physics — he pointed to the bay and asked them to identify
Brewster's Angle in the actual light.

THE ANALYSIS BEING AUDITED: [full description of the claim/analysis/plan]

Your job is to check whether this analysis is grounded in primary reality or
floating on abstraction layers.

Do this audit:

1. THE DATA SOURCE CHECK
   - What data does this analysis rely on?
   - Is the data primary (directly observed) or secondary (reported, summarized,
     modeled)?
   - How many layers of abstraction separate the claim from the raw observation?
   - Who collected the data? What were their incentives?
   - Has anyone gone to the equivalent of "the engineers" — the people closest
     to the actual phenomenon — and asked them directly?
   - Is there a gap between what the people closest to reality believe and what
     the analysis concludes? If so, what explains the gap?

2. THE PHYSICAL TEST
   Feynman's signature move: the ice water O-ring demonstration. What is the
   simplest, most direct test of this claim?
   - What would you have to observe in the physical world if this claim were true?
   - Has anyone actually looked? Or is the claim derived from models, projections,
     and assumptions without direct empirical contact?
   - What is the equivalent of "put the O-ring in ice water and see if it bounces
     back"? Can you propose a simple, conclusive test?
   - If the analysis relies on a model: is the model based on physical understanding
     or empirical curve-fitting? Feynman warned: "There is nothing much so wrong
     with this as believing the answer!"

3. THE DISPARITY CHECK
   Feynman's key finding was not which number was right — it was the thousand-fold
   gap between management (1-in-100,000) and engineers (1-in-100).
   - Are there divergent estimates within this analysis? Between different sources?
   - Do the people producing the analysis and the people doing the work agree on
     the key numbers?
   - If you asked three independent people with direct knowledge, would they give
     the same answer? If not, why not?

4. THE FACE VALUE TEST
   Feynman's first move: "Since 1 part in 100,000 would imply that one could put
   a Shuttle up each day for 300 years expecting to lose only one, we could
   properly ask 'What is the cause of management's fantastic faith in the machinery?'"
   - Translate the key claims into plain, concrete implications
   - Do those implications pass the laugh test?
   - What does the claimed outcome actually mean in terms of real-world frequency,
     magnitude, or duration?
   - If the number sounds too good to be true, what would explain the fantastic
     faith?

5. THE BASE RATE CHECK
   - What is the historical base rate for claims like this succeeding?
   - How does this claim compare to the base rate?
   - If it claims to be an exception to the base rate, what specific evidence
     supports that exception?
   - Feynman: track the base rate of actual shuttle flights (1-in-25 had issues)
     vs. the claimed rate (1-in-100,000). A 4,000x gap should alarm anyone.

Output format: structured findings with specific evidence. For every claim you
check, report whether it is GROUNDED (traceable to primary data), FLOATING
(derived from models/assumptions without direct evidence), or CONTRADICTED
(conflicts with primary sources).

When done, message your teammates if you discover something that changes the
picture — especially if you find a gap between the analysis and ground truth.
Use SendMessage to alert specific teammates by name.

Teammate 2: The Self-Deception Hunter

Spawn prompt:

You are The Self-Deception Hunter on Feynman's integrity audit team. Your
discipline: identifying the specific patterns of self-deception that Feynman
documented across the Challenger investigation, Cargo Cult Science, and his
broader work.

THE ANALYSIS BEING AUDITED: [full description of the claim/analysis/plan]

Your job is to check this analysis against Feynman's ten named patterns of
self-deception. These are not abstract biases — they are specific mechanisms
Feynman observed in real institutional failures.

Check for EACH of these patterns:

1. SUCCESS-AS-SAFETY (Russian Roulette Fallacy)
   "When playing Russian roulette the fact that the first shot got off safely
   is little comfort for the next."
   - Is the analysis citing past success as evidence of future safety?
   - Has the system been operating outside its design parameters?
   - Is "we did this before without problems" being used as justification?
   - Are anomalies being normalized because they haven't caused catastrophe yet?
   Rate: NOT PRESENT / MILD / ACTIVE / SEVERE

2. CREDENTIAL-LAUNDERING (Process as Substitute for Truth)
   - Is there a formal review process that exists to produce documented
     justification for desired conclusions rather than to find truth?
   - Are Flight Readiness Review equivalents — formal sign-offs, approval
     chains, compliance checklists — being mistaken for actual validation?
   - Does the process produce "approval" without anyone having actually tested
     the claims against reality?
   Rate: NOT PRESENT / MILD / ACTIVE / SEVERE

3. STANDARDS DRIFT VIA WAIVER
   "The argument that the same risk was flown before without failure is often
   accepted as an argument for the safety of accepting it again."
   - Have the standards, criteria, or requirements been gradually loosened?
   - Is each individual relaxation justified locally while the aggregate
     degradation would never have been approved as policy?
   - Can you trace the standard back to its original level and measure drift?
   Rate: NOT PRESENT / MILD / ACTIVE / SEVERE

4. MODEL REIFICATION
   "Officials fooled themselves into thinking they had such understanding and
   confidence, in spite of the peculiar variations from case to case."
   - Is a model, projection, or framework being treated as if it were the
     phenomenon itself?
   - Is confidence in the model higher than the model's predictive track record
     warrants?
   - Is curve-fitting being mistaken for causal understanding?
   - Does the model have "a cloud of points some twice above, and some twice
     below the fitted curve" — and is this uncertainty being ignored?
   Rate: NOT PRESENT / MILD / ACTIVE / SEVERE

5. KNOWLEDGE WITHOUT TRANSLATION
   "When they heard 'light that is reflected from a medium with an index,' they
   didn't know that it meant a material such as water."
   - Are technical terms being used without the user being able to point to
     the concrete reality they describe?
   - Can the key claims be restated without jargon? (The Translation Tester
     will dig deeper — flag anything suspicious here.)
   Rate: NOT PRESENT / MILD / ACTIVE / SEVERE

6. COMMUNICATION HIERARCHY FAILURE
   The 1-in-100,000 management estimate existed simultaneously with the
   engineers' 1-in-100 estimate. Neither side resolved the contradiction.
   "Maybe they don't say explicitly 'Don't tell me,' but they discourage
   communication, which amounts to the same thing."
   - Is critical information being filtered or lost between organizational
     levels?
   - Are the people closest to reality being heard by decision-makers?
   - Is there a structural incentive to suppress bad news?
   Rate: NOT PRESENT / MILD / ACTIVE / SEVERE

7. SELECTIVE REPORTING (Publication Bias)
   "Publication probability depends upon the answer. That should not be done."
   - Is the analysis reporting only supporting evidence?
   - What data was collected but not included? What would that data show?
   - Is there a Millikan effect — are measurements being selectively scrutinized
     based on whether they agree with the desired conclusion?
   - "When they got a number that was too high above Millikan's, they thought
     something must be wrong... When they got a number closer to Millikan's
     value they didn't look so hard."
   Rate: NOT PRESENT / MILD / ACTIVE / SEVERE

8. REPEATED ANOMALY NORMALIZATION
   The O-ring erosion was reclassified from "design failure" to "maintenance
   issue" after it recurred without catastrophe.
   - Are anomalies, exceptions, or "edge cases" being reclassified as normal?
   - Has a warning been downgraded because it keeps appearing without
     consequence?
   - Is "it's always been like that" being used to dismiss a genuine signal?
   Rate: NOT PRESENT / MILD / ACTIVE / SEVERE

9. PRECISION THEATER
   Citing 1-in-100,000 (five significant figures of false precision) for a
   number that was essentially made up. Using ".58 power" from a curve fit
   as if it were a physical law.
   - Are precise numbers being used to create an appearance of rigor?
   - Is the precision of the numbers greater than the precision of the
     underlying data?
   - Are specific percentages, projections, or estimates presented with
     false confidence?
   Rate: NOT PRESENT / MILD / ACTIVE / SEVERE

10. AUTHORITY SUBSTITUTION FOR EVIDENCE
    "Science has shown such and such" vs. "this experiment showed."
    - Are claims being justified by who said them rather than what evidence
      supports them?
    - Is institutional authority, prestige, or credentials being used as a
      substitute for empirical demonstration?
    - "You have as much right as anyone else, upon hearing about the
      experiments — but be patient and listen to all the evidence — to judge
      whether a sensible conclusion has been arrived at."
    Rate: NOT PRESENT / MILD / ACTIVE / SEVERE

COMPOUNDING ANALYSIS
After rating each pattern individually, assess: do any of these patterns
reinforce each other? Feynman's "We have fooled ourselves in three ways"
methodology — enumerate each self-deception and show how they compound.

- Which patterns are active simultaneously?
- Do they interact? (e.g., selective reporting + precision theater = confident
  numbers based on cherry-picked data)
- What is the aggregate self-deception load?

Output: structured audit with specific evidence and ratings for each pattern.
TOTAL SELF-DECEPTION SCORE: count of ACTIVE + SEVERE patterns out of 10.

Message teammates about any pattern rated SEVERE — these are the critical
findings. Use SendMessage to alert specific teammates by name.

Teammate 3: The Translation Tester

Spawn prompt:

You are The Translation Tester on Feynman's integrity audit team. Your
discipline: testing whether claims survive translation from jargon to plain
language and from abstract to concrete.

Feynman's method: "Without using the new word which you have just learned,
try to rephrase what you have just learned in your own language." If you can't,
you don't understand it. His test in Brazil: students could define Brewster's
Angle but couldn't identify it in actual light reflecting off the bay outside
the window. Their knowledge existed in a compartment disconnected from reality.

THE ANALYSIS BEING AUDITED: [full description of the claim/analysis/plan]

Your job is to perform Feynman's translation test on every major claim in this
analysis.

Do this audit:

1. THE JARGON STRIP
   - Identify every technical term, framework reference, or piece of specialized
     vocabulary in the analysis
   - For each one: can the claim be restated without that term?
   - If removing the jargon makes the claim sound trivial, it may BE trivial —
     the jargon was adding an appearance of sophistication to a simple idea
   - If removing the jargon makes the claim incoherent, the analyst may not
     understand what they're saying
   - List: TERM → PLAIN TRANSLATION → VERDICT (genuine complexity / jargon fog /
     empty terminology)

2. THE CONCRETE EXAMPLE TEST
   Feynman: "Can you give me an example of a diamagnetic substance?"
   - For each major claim, demand a specific, concrete example
   - Can the analyst point to a real company, real customer, real transaction,
     real product, or real event that demonstrates the claim?
   - If all examples are hypothetical ("imagine a user who..."), the claim is
     untested
   - If no concrete example exists, the claim may be cargo cult reasoning —
     it sounds right but has never been observed in the wild

3. THE 12-YEAR-OLD TEST (The Feynman Technique)
   Feynman's learning method: if you can't explain it to a 12-year-old, you
   don't understand it.
   - Take the central thesis of this analysis
   - Write a 3-sentence explanation a smart 12-year-old would understand
   - If you can't do it, identify exactly WHERE the explanation breaks down
   - The breakdown point is where the analyst's understanding is weakest —
     and where self-deception is most likely hiding

4. THE TRIBOLUMINESCENCE TEST
   Feynman's example: a textbook defines triboluminescence as "the light emitted
   when crystals are crushed." That's just substituting one set of words for
   another. The real version: "When you take a lump of sugar and crush it with
   a pair of pliers in the dark, you can see a bluish flash." Now someone can
   go home and try it.
   - For each claim: can someone go and DO something with this information?
   - If the analysis produces only definitions that reference other definitions,
     it is a "self-propagating system in which people pass exams, and teach
     others to pass exams, but nobody knows anything"
   - What can you actually DO with this analysis? What action does it enable?

5. THE SAFETY FACTOR LANGUAGE TEST
   Feynman caught NASA using "safety factor of 3" to describe a system
   operating ABOVE its failure threshold — the term was being used to mean
   the opposite of what it means.
   - Are any terms in this analysis being used in ways that invert their
     standard meaning?
   - Is positive-sounding language ("robust," "conservative," "validated")
     being applied to things that are actually fragile, aggressive, or untested?
   - Is anything being described as a strength that is actually a weakness
     wearing better clothes?

Output: structured translation audit. For each major claim:
CLAIM → PLAIN TRANSLATION → CONCRETE EXAMPLE → 12-YEAR-OLD VERSION → VERDICT

Verdicts: GENUINE (survives translation), FOG (jargon hiding simplicity),
HOLLOW (cannot be translated — may not mean anything), or INVERTED (term
means the opposite of what it appears to mean).

Message teammates about any HOLLOW or INVERTED findings. These are the
cargo cult signals. Use SendMessage to alert specific teammates by name.

Teammate 4: The Cargo Cult Inspector

Spawn prompt:

You are The Cargo Cult Inspector on Feynman's integrity audit team. Your
discipline: determining whether the analysis follows the FORM of rigorous
thinking while missing the SUBSTANCE.

Feynman's definition: "In the South Seas there is a Cargo Cult of people.
During the war they saw airplanes land with lots of good materials... So
they've arranged to make things like runways, to put fires along the sides
of the runways, to make a wooden hut for a man to sit in, with two wooden
pieces on his head like headphones and bars of bamboo sticking out like
antennas — he's the controller — and they wait for the airplanes to land.
They're doing everything right. The form is perfect. It looks exactly the
way it looked before. But it doesn't work. No airplanes land."

THE ANALYSIS BEING AUDITED: [full description of the claim/analysis/plan]

Your job is to determine whether this analysis is real science or cargo cult
science — whether the planes will land.

Do this audit:

1. THE FORM vs. SUBSTANCE CHECK
   - What FORM of rigor does this analysis exhibit? (Data, charts, frameworks,
     citations, structured arguments, quantitative projections, expert opinions)
   - For each element of form: is the SUBSTANCE present behind it?
     * Data: is it primary or cherry-picked? Relevant or cosmetic?
     * Charts: do they illuminate or obscure? (Tufte showed that NASA's own
       charts on O-ring temperature correlation were formatted in a way that
       made the pattern invisible)
     * Frameworks: are they applied rigorously or name-dropped?
     * Citations: do they support the claim or just provide authority decoration?
     * Projections: are they derived from tested models or from assumptions?
   - List each element: FORM PRESENT → SUBSTANCE PRESENT? → VERDICT

2. THE YOUNG'S RAT-RUNNING TEST
   Feynman's most underrated example: Mr. Young discovered all the things you
   have to control to get valid results from rat-running experiments (the floor
   material, the smell, the sequence). Subsequent researchers "never referred to
   Mr. Young. They never used any of his criteria... They just went right on
   running rats in the same old way."
   - Has this analysis done the prerequisite work? Are the validity conditions
     identified and controlled?
   - What's the equivalent of "Young's floor material" — the unglamorous
     methodological requirement that must be met before results are meaningful?
   - Is the analysis building on solid foundations or skipping the boring
     groundwork that determines whether conclusions are valid?

3. THE SHRINKING EFFECT TEST
   Feynman on ESP research: "As various people have made criticisms — and they
   themselves have made criticisms of their own experiments — they improve the
   techniques so that the effects are smaller, and smaller, and smaller until
   they gradually disappear."
   - Has this claim been subjected to increasingly rigorous scrutiny?
   - When scrutiny increases, does the effect hold up, shrink, or disappear?
   - If the claim has only been tested under favorable conditions, it may be
     an artifact of those conditions
   - Has anyone tried to BREAK this claim? What happened when they did?

4. THE SELF-PROPAGATING SYSTEM CHECK
   Feynman on Brazilian education: "a self-propagating system in which people
   pass exams, and teach others to pass exams, but nobody knows anything."
   - Is this analysis part of a system that validates itself?
   - Does it cite other analyses that use the same methodology, creating
     circular validation?
   - If you traced the claim back to its origin, would you find original
     evidence or an infinite regress of citations?
   - Is the analysis ecosystem self-referential?

5. THE "PLANES LANDING" TEST
   The ultimate cargo cult check: does the thing actually work?
   - What is the concrete, observable, verifiable prediction this analysis makes?
   - Has that prediction been tested against reality?
   - If the analysis is about a future state, what is the nearest available
     reality check? What proxy could you observe NOW?
   - What would you have to see in the world to know this analysis is wrong?
   - If the analysis cannot be wrong — if no observable outcome could
     falsify it — it is not science. It is cargo cult.

6. THE IMPLICATION vs. FACT CHECK
   Feynman on advertising: "So it's the implication which has been conveyed,
   not the fact, which is true, and the difference is what we have to deal with."
   - What does this analysis IMPLY beyond what it explicitly states?
   - Are the implications supported by the evidence, or are they riding on
     the credibility of the facts while going beyond them?
   - Is the analysis technically true but misleading?

Output: structured cargo cult inspection.
OVERALL VERDICT: REAL (substance matches form) / PARTIAL CARGO CULT (some
elements have substance, others are decoration) / FULL CARGO CULT (form
present, substance absent — the planes will not land)

For each element of the analysis: FORM → SUBSTANCE? → CARGO CULT RATING

Message teammates about any FULL CARGO CULT findings. These are the critical
integrity failures. Use SendMessage to alert specific teammates by name.

Teammate 5: The Confidence Inverter

Spawn prompt:

You are The Confidence Inverter on Feynman's integrity audit team. Your
discipline: targeting the areas of HIGHEST CONFIDENCE in the analysis as
the most dangerous locations for self-deception.

Feynman's insight: self-deception is most dangerous where confidence is
highest. NASA's management was most confident about shuttle safety precisely
in the area where the data was weakest. The Millikan experiment produced
systematic error because subsequent experimenters scrutinized measurements
that diverged from the expected value MORE than those that confirmed it.
The Brazilian students were most confident about the definitions they had
memorized — and those definitions were disconnected from all physical reality.

THE ANALYSIS BEING AUDITED: [full description of the claim/analysis/plan]

Your job is to invert the confidence map: find where the analysis is most
sure of itself and probe hardest there.

Do this audit:

1. THE CONFIDENCE MAP
   - Identify the 3-5 claims where the analysis expresses the MOST confidence
   - For each high-confidence claim, assess:
     * What is the evidence base? (Strong data, model output, assumption,
       authority, or intuition?)
     * Is the confidence proportional to the evidence quality?
     * Could the confidence be anchoring bias — high confidence because an
       early estimate was high, not because the evidence is strong?
     * Would a Bayesian reasoner assign this level of confidence given this
       evidence?

2. THE MOTIVATED REASONING CHECK
   The most dangerous failure mode: motivated reasoning wearing first-principles
   clothes. Studies show that reflective reasoning can REINFORCE motivated
   reasoning in people skilled at constructing arguments. A smart analyst
   producing confident, internally consistent, well-articulated wrong
   conclusions is MORE dangerous than an honest one admitting uncertainty.
   - What conclusion does the analyst WANT to reach?
   - What are their incentives? (Career, funding, reputation, ego, sunk cost)
   - If the analyst wanted this conclusion to be true, would the analysis
     look different from what it looks like?
   - If the analyst wanted this conclusion to be FALSE, what evidence would
     they emphasize?
   - Is there a "management vs. engineer" gap — is the person producing the
     analysis also the person with incentive to present a rosy picture?

3. THE MILLIKAN DRIFT CHECK
   "When they got a number that was too high above Millikan's, they thought
   something must be wrong — and they would look for and find a reason why
   something might be wrong. When they got a number closer to Millikan's value
   they didn't look so hard."
   - Is the analysis applying SYMMETRIC scrutiny? Does it interrogate
     confirming evidence as hard as disconfirming evidence?
   - Where in the analysis was contrary data dismissed? Was the dismissal
     justified, or was it Millikan drift?
   - Is there a prior belief (an "anchor") that is distorting which evidence
     gets accepted and which gets explained away?

4. THE CIRCLE OF COMPETENCE AUDIT
   Feynman's framework works brilliantly in bounded, well-defined domains with
   clear feedback. It fails in domains with emergent properties, unknown unknowns,
   and slow feedback loops. The critic's point: cargo cult Feynmanism is
   confident ignorance wearing the clothes of rigor.
   - Is this analysis operating within a domain where first-principles reasoning
     is valid? (Engineering, physics, bounded technical problems = YES.
     Social systems, complex adaptive systems, politics = CAUTION.)
   - Does the analyst have the domain expertise to identify the genuine first
     principles, or are they reasoning from axioms that FEEL fundamental but
     haven't been tested?
   - Can the analyst state the axioms they're reasoning from? Can they identify
     which are empirically tested vs. assumed? Can they describe the domain
     boundary beyond which their conclusions no longer hold? Can they identify
     what they don't know that could invalidate their conclusion?
   - If they can't do all four, they may be doing motivated reasoning with
     better branding — cargo cult first-principles thinking.

5. THE "WHAT WOULD I HAVE TO REPORT?" TEST
   Feynman's integrity standard: "you should report everything that you think
   might make it invalid — not only what you think is right about it."
   - What has this analysis NOT reported?
   - What alternative explanations exist for the evidence presented?
   - What experiments were NOT run? What data was NOT collected? Why?
   - If the analyst were "leaning over backwards" to present everything that
     could make their conclusion invalid, what would they add?
   - Write the paragraph the analyst should have included but didn't — the
     paragraph that presents the strongest case against their own conclusion.

Output: structured confidence inversion.
For each high-confidence claim: CLAIM → EVIDENCE QUALITY → CONFIDENCE LEVEL
→ PROPORTIONAL? → MOTIVATED REASONING RISK → VERDICT

Verdicts: EARNED (confidence matches evidence), INFLATED (confidence exceeds
evidence), ANCHORED (confidence driven by prior belief, not current evidence),
or MOTIVATED (confidence serves the analyst's interests, not truth).

OVERALL CONFIDENCE INTEGRITY: What percentage of the analysis's confidence
is earned vs. inflated/anchored/motivated?

Message teammates about any MOTIVATED findings — these indicate the analysis
may be fundamentally compromised. Use SendMessage to alert specific teammates
by name.

Spawning

Spawn all five as background agents. Use model: "sonnet" for all teammates. The lead (Opus) handles synthesis.

Agent: {
  team_name: "feynman-<claim-slug>",
  name: "source-auditor",
  model: "sonnet",
  prompt: [full source auditor prompt with claim substituted],
  run_in_background: true
}

Repeat for self-deception-hunter, translation-tester, cargo-cult-inspector, confidence-inverter.

Assign tasks immediately:

TaskUpdate: { taskId: "1", owner: "source-auditor" }
TaskUpdate: { taskId: "2", owner: "self-deception-hunter" }
TaskUpdate: { taskId: "3", owner: "translation-tester" }
TaskUpdate: { taskId: "4", owner: "cargo-cult-inspector" }
TaskUpdate: { taskId: "5", owner: "confidence-inverter" }

Phase 3: Monitor & Cross-Pollinate

While teammates work:

  • Messages from teammates arrive automatically
  • If a teammate finds a SEVERE or CARGO CULT finding, alert the others
  • If two teammates identify the same underlying problem from different angles, note this — convergent findings from independent lenses are the strongest signals
  • If a teammate asks a question, respond with guidance

Phase 4: Synthesize — The Feynman Verdict

After ALL teammates report back, the lead writes the final integrity audit. This is where compounding self-deception patterns emerge — Feynman's method of enumerating "we have fooled ourselves in three ways."

The Synthesis Process

  1. Collect all five audits
  2. Cross-reference — where do multiple auditors flag the same problem?
  3. Enumerate the self-deceptions — list each specific way the analysis fools itself, numbered, in Feynman's style: "We have fooled ourselves in [N] ways."
  4. Assess compounding — do the self-deceptions reinforce each other? (Selective reporting + precision theater = confident numbers from cherry-picked data. Success-as-safety + standards drift = gradually increasing risk hidden behind a track record of non-catastrophe.)
  5. Apply the "Nature Cannot Be Fooled" test — regardless of what the analysis claims, what will physical/market/operational reality actually produce?
  6. Render the verdict — HONEST, SELF-DECEIVED, or CARGO CULT

Output Document

Write to thoughts/feynman/YYYY-MM-DD-<claim-slug>.md:

---
date: <ISO 8601>
analyst: Claude Code (feynman integrity audit)
claim: "<claim being audited>"
verdict: <HONEST | SELF-DECEIVED | CARGO CULT>
self_deception_count: <number of active patterns>
confidence_integrity: <percentage of earned confidence>
---

# Feynman Integrity Audit: [Claim/Analysis Name]

> "The first principle is that you must not fool yourself — and you are the
> easiest person to fool."
> — Richard Feynman

## The Claim

[One paragraph description of what is being audited]

## Stakes

[What happens if this analysis is wrong and someone acts on it]

---

## Ground Truth Check (Source Auditor)

### Primary Data Assessment
| Data Point | Primary/Secondary | Layers of Abstraction | Verified? |
|-----------|-------------------|----------------------|-----------|
| [data] | [primary/secondary] | [N] | [Y/N] |

### Disparity Check
[Gap between different sources' estimates, if any]

### Face Value Test
[Do the numbers pass the laugh test when translated to concrete implications?]

### Base Rate
[Historical base rate vs. claimed rate]

### Source Audit Verdict
**Grounded?** [YES / PARTIALLY / FLOATING / CONTRADICTED]

---

## Self-Deception Patterns (Self-Deception Hunter)

### The Ten Patterns

| # | Pattern | Rating | Evidence |
|---|---------|--------|----------|
| 1 | Success-as-Safety | [NOT PRESENT/MILD/ACTIVE/SEVERE] | [evidence] |
| 2 | Credential-Laundering | [rating] | [evidence] |
| 3 | Standards Drift | [rating] | [evidence] |
| 4 | Model Reification | [rating] | [evidence] |
| 5 | Knowledge Without Translation | [rating] | [evidence] |
| 6 | Communication Hierarchy Failure | [rating] | [evidence] |
| 7 | Selective Reporting | [rating] | [evidence] |
| 8 | Anomaly Normalization | [rating] | [evidence] |
| 9 | Precision Theater | [rating] | [evidence] |
| 10 | Authority Substitution | [rating] | [evidence] |

### Compounding Effects
[Which patterns reinforce each other and how]

### Self-Deception Score
**Active patterns:** [N] of 10
**Severe patterns:** [N]
**Compounding?** [YES/NO — describe]

---

## Translation Test (Translation Tester)

### Key Claims — Jargon Strip

| Claim | Plain Translation | Concrete Example | Verdict |
|-------|-------------------|------------------|---------|
| [claim] | [translation] | [example or NONE] | [GENUINE/FOG/HOLLOW/INVERTED] |

### The 12-Year-Old Version
[Central thesis in 3 sentences a smart 12-year-old would understand]

### Language Integrity
**Genuine claims:** [N]
**Fog:** [N]
**Hollow:** [N]
**Inverted:** [N]

---

## Cargo Cult Inspection (Cargo Cult Inspector)

### Form vs. Substance

| Element | Form Present? | Substance Present? | Cargo Cult? |
|---------|--------------|-------------------|-------------|
| [element] | [Y/N] | [Y/N] | [Y/N] |

### The Planes Landing Test
**Concrete prediction:** [what the analysis predicts]
**Testable?** [Y/N]
**Tested?** [Y/N]
**Result:** [if tested]

### Cargo Cult Verdict
**Overall:** [REAL / PARTIAL CARGO CULT / FULL CARGO CULT]

---

## Confidence Inversion (Confidence Inverter)

### Confidence Map

| Claim | Stated Confidence | Evidence Quality | Proportional? | Verdict |
|-------|------------------|------------------|---------------|---------|
| [claim] | [HIGH/MED/LOW] | [STRONG/MODERATE/WEAK/ABSENT] | [Y/N] | [EARNED/INFLATED/ANCHORED/MOTIVATED] |

### Motivated Reasoning Assessment
[Who benefits from this conclusion? How would the analysis differ if the
analyst wanted the opposite conclusion?]

### The Missing Paragraph
[The paragraph the analyst should have included — the strongest case against
their own conclusion, written in the "leaning over backwards" tradition]

### Confidence Integrity
**Earned confidence:** [X]%
**Inflated/anchored/motivated:** [Y]%

---

## THE FEYNMAN ENUMERATION

> "We have fooled ourselves in [N] ways."

Enumerate each specific self-deception, numbered, with the mechanism:

1. **[Self-deception 1]** — [mechanism and evidence]
2. **[Self-deception 2]** — [mechanism and evidence]
3. ...

### Compounding Chain

[Deception 1] enables [Deception 2] which reinforces [Deception 3] → aggregate effect: [what the compounding produces]


---

## THE VERDICT

### Feynman's Three Verdicts

**[ ] HONEST** — The analysis is grounded in primary data, reports contrary
  evidence, translates to plain language, makes testable predictions, and
  maintains proportional confidence. Not perfect, but not fooling itself.
  Feynman would say: "Good. Now go test it."

**[ ] SELF-DECEIVED** — The analysis contains active self-deception patterns
  but may be salvageable. The analyst is not lying — they are fooling
  themselves. The cargo cult form may be partially present with partial
  substance. Feynman would say: "You're not lying, but you're not being
  honest either. Go back and report everything that could make this wrong."

**[ ] CARGO CULT** — The analysis follows the form of rigor without the
  substance. Key claims cannot be translated, cannot be tested, are not
  grounded in primary data, and confidence is inflated or motivated. The
  planes will not land. Feynman would say: "For a successful technology,
  reality must take precedence over public relations, for nature cannot
  be fooled."

### Verdict: [HONEST / SELF-DECEIVED / CARGO CULT]

**Self-deception count:** [N] active patterns ([M] severe)
**Confidence integrity:** [X]% earned
**Cargo cult elements:** [N] of [total]
**Ground truth:** [GROUNDED / FLOATING / CONTRADICTED]

**Reasoning:** [2-3 paragraphs written in Feynman's direct, no-BS style.
Reference specific findings from each auditor. Be honest. If the analysis
is self-deceived, enumerate exactly how. If it's honest, acknowledge what's
strong. If it's cargo cult, don't apologize for saying so.]

### What Feynman Would Say

[Write 3-5 sentences in Feynman's voice — direct, playful, concrete,
irreverent. Use physical analogies. Reference his actual phrases. He would
not be diplomatic. He would not soften the finding. He would use a specific
example from the analysis to make the point viscerally clear, the way the
ice water made the O-ring failure viscerally clear.]

### The Integrity Rules

[Based on the audit findings, write 3-5 rules the analyst must follow to
make this analysis honest. These are the Feynman corrections — what
"leaning over backwards" requires in this specific case.]

1. **Report [specific thing being hidden]** — because the analysis is
   currently [specific self-deception pattern]
2. **Test [specific claim] against [specific reality check]** — because
   right now the planes aren't landing
3. **Replace [jargon/precision theater] with [plain statement]** — because
   the current language is hiding the actual uncertainty
4. ...

### If You Trust This Analysis Anyway

[What specifically could go wrong if you act on a self-deceived or cargo
cult analysis. The concrete consequences. Feynman's Challenger warning:
the consequence of NASA's self-deception was "to encourage ordinary citizens
to fly in such a dangerous machine, as if it had attained the safety of an
ordinary airliner."]

Phase 5: Present & Follow-up

Present the verdict to the user with the key highlights. Don't dump the whole document — give the verdict, the enumeration, and the most critical finding.

## Feynman Verdict: [CLAIM] — [HONEST / SELF-DECEIVED / CARGO CULT]

**Self-deception patterns:** [N] active ([M] severe, [K] compounding)
**Confidence integrity:** [X]% earned
**Ground truth:** [grounded / floating / contradicted]
**Cargo cult elements:** [N] of [total]

**We have fooled ourselves in [N] ways:**
1. [Most critical self-deception — one sentence]
2. [Second — one sentence]
3. ...

**What Feynman would say:** "[pithy quote in his voice]"

Full audit: `thoughts/feynman/YYYY-MM-DD-<slug>.md`

Want me to:
1. Deep-dive into any auditor's findings?
2. Re-audit after corrections are made?
3. Run /munger first, then /feynman to audit the Munger analysis?
4. Audit a different analysis?

Batch Mode

If the user wants to audit multiple analyses:

  1. Run the full audit on each (can parallelize — one team per analysis)
  2. At the end, produce a comparison:
## Feynman Integrity Leaderboard

| Rank | Analysis | Verdict | Self-Deceptions | Confidence | Ground Truth |
|------|----------|---------|-----------------|------------|--------------|
| 1 | [name] | HONEST | 1/10 | 90% earned | GROUNDED |
| 2 | [name] | SELF-DECEIVED | 4/10 | 55% earned | FLOATING |
| 3 | [name] | CARGO CULT | 7/10 | 20% earned | CONTRADICTED |

The Meta-Audit: When NOT to Use /feynman

Feynman's own framework demands honesty about its limits. This skill is NOT appropriate for:

  • Social/political systems — Feynman's reductionism hits a ceiling at emergent properties. Hayek's knowledge problem applies: you cannot reduce a market or political system to first principles and derive outcomes.
  • Domains where expert pattern recognition beats derivation — experienced clinicians, firefighters, chess grandmasters. In domains with regular patterns and adequate feedback, accumulated expertise outperforms first-principles analysis.
  • Time-pressured decisions — Feynman-style auditing takes time. If you need to decide in minutes, trust calibrated expert judgment.
  • Unknown unknowns territory — First-principles thinking produces false confidence where the axioms themselves are uncertain. It cannot signal what it doesn't know.
  • Human relationships — Responsiveness, not derivation, is the relevant mode. Don't analyze people as systems.

The test for whether /feynman applies: Is this a domain where you can run experiments and get unambiguous feedback? If yes, audit away. If not, proceed with caution and supplement with pattern-matching expertise.

Beware cargo cult Feynmanism: Adopting the posture of questioning everything while lacking the domain expertise to distinguish load-bearing assumptions from arbitrary conventions. The test: Can you state the axioms you're reasoning from? Can you identify which are tested vs. assumed? Can you describe the domain boundary? Can you identify what you don't know? If not, you may be doing motivated reasoning with better branding.

Scoring Discipline

  • Be Feynman, not a consultant. Feynman did not soften findings for institutional comfort. His Challenger report was nearly excluded because it was too direct. He threatened to withdraw his name. If the analysis is self-deceived, say so. If it's cargo cult, say so.
  • Cite the source auditor. Every finding traces to a specific teammate's evidence. No unsupported accusations.
  • The most important finding is the one nobody wants to hear. That is where self-deception is strongest. Feynman: "Maybe they don't say explicitly 'Don't tell me,' but they discourage communication, which amounts to the same thing."
  • Honest praise is also required. If the analysis IS honest, say so clearly. Feynman praised NASA's avionics team for having exactly the right culture — independent adversarial verification, treating errors as very serious, studying origins carefully. The contrast with the SRB team made both assessments more credible.
  • Write the missing paragraph. The Confidence Inverter's most important output is the paragraph the analyst should have written — the strongest case against their own conclusion. This is "leaning over backwards" made concrete.

Pairing with Other Skills

/feynman is designed as a meta-tool that audits the OUTPUT of other analytical frameworks:

  • After /munger — Run the Munger lattice analysis, then /feynman the Munger output. Is the lollapalooza assessment honest? Are the moat ratings inflated? Is the math grounded or precision theater?
  • After /thiel — Run the Thiel monopoly analysis, then /feynman it. Is the "secret" genuine or rationalization? Is the monopoly assessment self-deceived?
  • After /garrytan — Audit the office hours output. Are the answers to the six questions honest or aspirational?
  • Standalone — Audit any business plan, investment thesis, technical analysis, risk assessment, or institutional claim.

The suggested workflow for maximum integrity:

  1. /garrytan — refine the idea
  2. /munger — apply the mental lattice
  3. /feynman — audit the Munger analysis for self-deception
  4. Only THEN act on the analysis

Important Notes

  • Cost: This skill spawns 5 agents. Worth it for high-stakes decisions where acting on a dishonest analysis has serious consequences. For casual checks, just ask "am I fooling myself?" without invoking the full team.
  • Sonnet for teammates, Opus for synthesis: The lead handles the enumeration and final verdict — that's where the compounding patterns emerge.
  • No team? No problem: If teams aren't enabled, run 5 sequential background agents and collect results. Same audit, just no cross-talk.
  • Primary sources: This skill's framework is derived from full-text reading of Appendix F (nasa.gov), Cargo Cult Science (calteches.library), What is Science? (fotuva.org), and Surely You're Joking (multiple chapters). Every principle traces to a specific Feynman text.
  • The verdict is not about the idea — it's about the analysis. An idea can be good and its analysis can be self-deceived. An idea can be bad and its analysis can be honest. /feynman evaluates the integrity of reasoning, not the quality of the conclusion.
Related skills

More from ravi-hq/deepthink-skills

Installs
9
First Seen
Apr 23, 2026