probabilistic-thinking
Probabilistic & Bayesian Thinking
Core principle: Probabilistic thinking replaces vague confidence with calibrated estimates. Bayesian thinking updates those estimates as evidence arrives — neither clinging to priors nor overreacting to new data.
Core Concepts
Probability as Degree of Belief
"Will probably work" → 60%? 90%? Forcing a number exposes vague confidence and creates a baseline for updating.
Base Rates
Find the base rate before estimating a specific event — how often does this event type occur in a reference class?
"Will this feature succeed?" → What % of similar features in similar products succeeded?
Ignoring base rates (the base rate fallacy) is one of the most common reasoning errors.
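A minimal sketch of why base rates dominate, with all numbers hypothetical: even a fairly accurate signal produces mostly false positives when the base rate is low.

```python
# Base rate sketch (hypothetical numbers): a 90%-accurate test sounds
# convincing, but with a 5% base rate most positives are false positives.
base_rate = 0.05       # P(condition) in the reference class
sensitivity = 0.90     # P(positive | condition)
false_positive = 0.10  # P(positive | no condition)

p_positive = sensitivity * base_rate + false_positive * (1 - base_rate)
posterior = sensitivity * base_rate / p_positive
print(f"P(condition | positive) = {posterior:.0%}")  # ~32%, not 90%
```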
Bayesian Updating
Update proportionally — not by ignoring priors, not by overwriting them.
Posterior ∝ Prior × Likelihood (Bayes' rule)
- Prior: belief before evidence
- Likelihood: P(evidence | hypothesis true) vs. P(evidence | hypothesis false)
- Posterior: belief after evidence
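A minimal sketch of proportional updating in odds form, assuming the evidence can be summarized as a single likelihood ratio; the numbers are hypothetical:

```python
# Bayesian updating in odds form: posterior odds = prior odds * likelihood ratio.
def update(prior: float, likelihood_ratio: float) -> float:
    """Posterior probability from a prior and a likelihood ratio,
    where the ratio is P(evidence | H true) / P(evidence | H false)."""
    posterior_odds = (prior / (1 - prior)) * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Hypothetical: prior 30%, evidence 3x more likely if the hypothesis is true.
print(f"{update(0.30, 3.0):.0%}")  # 56%: a proportional move, not a jump to certainty
```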
Expected Value
EV = probability × value, summed across outcomes
A 10% chance of +€100 (EV = €10) beats a 90% chance of +€5 (EV = €4.50).
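A minimal sketch of the comparison above; `ev` assumes the simplest case, a single-outcome bet where failure pays nothing:

```python
# Single-outcome expected value: EV = probability * value (failure pays 0).
def ev(probability: float, value: float) -> float:
    return probability * value

print(ev(0.10, 100))  # 10.0 -> the long shot wins on EV
print(ev(0.90, 5))    # 4.5  -> higher probability, lower EV
```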
Confidence Intervals
Point estimates are usually wrong. Ranges are honest.
- "4 weeks" → "3–7 weeks (80% confidence)"
- Wide intervals on uncertain things = calibration, not weakness.
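A minimal sketch of deriving an 80% interval from history, the middle 80% of comparable past outcomes; the durations below are hypothetical placeholder data:

```python
import statistics

past_durations_weeks = [3, 3, 4, 4, 5, 5, 6, 7, 8, 10]  # hypothetical history
cuts = statistics.quantiles(past_durations_weeks, n=10)
low, high = cuts[0], cuts[-1]  # 10th and 90th percentiles
print(f"{low:.0f}-{high:.0f} weeks (80% confidence)")  # "3-10 weeks (80% confidence)"
```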
Output Format
Probability Estimates
| Claim | Prior | Evidence | Updated | Confidence |
|---|---|---|---|---|
| "Feature will succeed" | 30% (base rate) | Strong user signal | 55% | Medium |
| "Will ship on time" | 40% (historical) | Experienced team | 50% | Low |
Base Rate Check
- Reference class for this situation?
- Historical base rate for this outcome?
- How does this case differ from base rate (and does that justify adjustment)?
Bayesian Update
- Prior: belief before
- New evidence: what we now know
- Likelihood ratio: is the evidence more likely if the hypothesis is true or if it is false?
- Posterior: belief now
- Update size: did evidence move the needle? (Strong evidence → large; weak → small.)
Expected Value Comparison
| Option | Probability | Value if succeeds | Value if fails | EV |
|---|---|---|---|---|
| A | 70% | +€50k | -€10k | +€32k |
| B | 30% | +€200k | -€20k | +€46k |
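A minimal sketch reproducing the table's arithmetic with a two-outcome EV:

```python
# Two-outcome expected value: EV = p * value_if_success + (1 - p) * value_if_failure.
def expected_value(p: float, win: float, loss: float) -> float:
    return p * win + (1 - p) * loss

print(expected_value(0.70, 50_000, -10_000))   # 32000.0 (option A)
print(expected_value(0.30, 200_000, -20_000))  # 46000.0 (option B)
```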
Confidence Ranges
- Optimistic (10th percentile): [value]
- Expected (50th percentile): [value]
- Pessimistic (90th percentile): [value]
- Black swan: [tail scenario]
Probability Hygiene Flags
- Probabilities treated as certainties (0%/100%)? Almost nothing is certain.
- Base rate ignored for the specific case?
- Overreaction to the latest evidence (recency bias)?
- Conjunction fallacy? (P(A and B) ≤ P(A): more specific means lower probability)
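A minimal sketch of the conjunction check, with hypothetical numbers:

```python
# A more specific claim can never be more probable than its parts.
p_ships = 0.40                   # P(feature ships this quarter)
p_hits_target_if_shipped = 0.50  # P(hits adoption target | shipped)
p_both = p_ships * p_hits_target_if_shipped

assert p_both <= p_ships  # 0.20 <= 0.40: added specifics lower probability
```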
Calibration Heuristics
Fermi Estimation — break unknowns into estimable parts:
- "How many users?" → market size × awareness % × conversion % × retention %
Reference Class Forecasting — historical data from similar projects:
- "This feature type took 4–8 weeks for 80% of teams in our class"
Outside View vs. Inside View:
- Inside: "We're special, we'll beat the average"
- Outside: "What does the data say for projects like this?"
- Default to the outside view. Adjust only with specific, strong evidence.
Pre-commit to what would change your mind:
- "If we see X, I'll move probability from 60% to below 30%"
- Prevents post-hoc rationalization.
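A minimal sketch of turning that pre-commitment into a required evidence strength, using the odds form of Bayes' rule:

```python
# How strong must evidence be to move a 60% belief below 30%?
# Posterior odds = prior odds * likelihood ratio, so solve for the ratio.
prior, target = 0.60, 0.30
prior_odds = prior / (1 - prior)        # 1.5
target_odds = target / (1 - target)     # ~0.43
required_lr = target_odds / prior_odds  # ~0.29

# The evidence must be ~3.5x more likely if the hypothesis is false.
print(f"required likelihood ratio <= {required_lr:.2f}")
```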
Thinking Triggers
- "What's the base rate?"
- "Are we treating 70% like certainty?"
- "What's the EV of each option, not just the upside?"
- "How much should this evidence actually move our belief?"
- "What would change our mind significantly?"
- "Are we in the reference class we think we're in?"
- "What's the downside, and are we weighting it correctly?"
Example Applications
- "Should we build this?" → % of similar features that drove retention? Cost if it fails?
- "A/B test showed a lift" → Sample size sufficient? Prior for this change type?
- "We'll ship in 2 weeks" → Historical distribution? 80th percentile?
- "Agent failed once — bug?" → Base rate of one-off failures? Evidence that would confirm systematic?