Probabilistic & Bayesian Thinking

Core principle: Most real decisions happen under uncertainty. Probabilistic thinking replaces vague confidence with calibrated estimates. Bayesian thinking adds the discipline of updating those estimates as new evidence arrives — neither clinging to prior beliefs nor overreacting to new data.


Core Concepts

Probability as Degree of Belief

Probability isn't just for coin flips. It's a measure of how confident we are in any claim, given current evidence.

  • "This will probably work" → What probability? 60%? 90%? The difference matters.
  • Forcing a number exposes vague confidence and creates a baseline for updating.

Base Rates

Before estimating the probability of a specific event, find the base rate — how often does this type of event occur in a reference class?

"Will this feature succeed?" → What % of similar features in similar products succeeded?

Ignoring base rates (the base rate fallacy) is one of the most common reasoning errors.

Bayesian Updating

When new evidence arrives, update beliefs proportionally — not by ignoring prior beliefs, and not by overwriting them entirely.

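Posterior odds = Prior odds × Likelihood ratio

Here odds = p / (1 − p), and the likelihood ratio is P(evidence | hypothesis true) ÷ P(evidence | hypothesis false). A ratio near 1 means the evidence barely distinguishes the two cases and should barely move the belief.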

Key questions:

  • Prior: What did we believe before this evidence?
  • Likelihood: How probable is this evidence if the hypothesis is true? If it's false?
  • Posterior: What should we believe now?
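The three questions above map directly onto Bayes' rule. The following is a minimal sketch, not a prescribed implementation: the function name `bayes_update` and all the numbers are illustrative assumptions.

```python
def bayes_update(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Return P(hypothesis | evidence) using Bayes' rule."""
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must be between 0 and 1")
    # Probability of seeing the evidence AND the hypothesis being true.
    numerator = prior * p_evidence_if_true
    # Total probability of seeing the evidence under either case.
    denominator = numerator + (1.0 - prior) * p_evidence_if_false
    if denominator == 0.0:
        raise ValueError("the evidence has zero probability under both cases")
    return numerator / denominator


# Hypothetical numbers: a 30% prior, and evidence that shows up 80% of the
# time when the hypothesis is true but 30% of the time when it is false.
posterior = bayes_update(prior=0.30, p_evidence_if_true=0.80, p_evidence_if_false=0.30)
print(f"posterior = {posterior:.2f}")  # posterior = 0.53
```

If the evidence were equally likely under both cases (say 0.5 and 0.5), the posterior would equal the prior: evidence that does not discriminate should not move the estimate.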

Expected Value

When choosing between options under uncertainty, compare expected values:

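EV = Σ (Probability of outcome × Value of outcome), summed over all possible outcomes

A 10% chance of +€100 (EV = €10) has a higher expected value than a 90% chance of +€5 (EV = €4.50), assuming the other outcome in each case is worth €0.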

Confidence Intervals

Point estimates are almost always wrong. Ranges are more honest.

  • Instead of "this will take 4 weeks" → "this will take 3–7 weeks (80% confidence)"
  • Wide intervals are not weakness — they're calibration. Narrow intervals on uncertain things are overconfidence.

Output Format

📊 Probability Estimates

For each key claim or outcome:

| Claim | Prior probability | Evidence | Updated probability | Confidence |
|---|---|---|---|---|
| "Feature will succeed" | 30% (base rate) | Strong user signal | 55% | Medium |
| "This will ship on time" | 40% (historical) | Team is experienced | 50% | Low |

🔢 Base Rate Check

  • What is the reference class for this situation?
  • What is the historical base rate for this type of outcome?
  • How does this specific case differ from the base rate (and does that justify adjusting up or down)?

🔄 Bayesian Update

When new evidence has arrived:

  • Prior belief: What did we think before?
  • New evidence: What do we now know?
  • Likelihood ratio: Is this evidence more consistent with the hypothesis being true or false?
  • Posterior belief: What should we believe now?
  • Update size: Did this evidence move the needle significantly? (Strong evidence = large update. Weak evidence = small update.)

⚖️ Expected Value Comparison

When choosing between options:

| Option | Probability | Value if succeeds | Value if fails | Expected Value |
|---|---|---|---|---|
| Option A | 70% | +€50k | -€10k | +€32k |
| Option B | 30% | +€200k | -€20k | +€46k |
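The expected values in the table can be reproduced with a few lines of code. This is a sketch for the two-outcome case only (success or failure); the helper name `expected_value` is an assumption made for illustration.

```python
def expected_value(p_success: float, value_if_success: float, value_if_failure: float) -> float:
    """Probability-weighted average of a two-outcome option."""
    return p_success * value_if_success + (1.0 - p_success) * value_if_failure


# Figures taken from the table above, in euros:
# (probability of success, value if it succeeds, value if it fails)
options = {
    "Option A": (0.70, 50_000, -10_000),
    "Option B": (0.30, 200_000, -20_000),
}

for name, (p, win, loss) in options.items():
    print(f"{name}: EV = {expected_value(p, win, loss):+,.0f}")
# Option A: EV = +32,000
# Option B: EV = +46,000
```

Option B has the higher expected value but fails 70% of the time, so the comparison is only a starting point when a single failure would be hard to absorb.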

📏 Confidence Ranges

Replace point estimates with ranges:

  • Optimistic case (10th percentile): [value]
  • Expected case (50th percentile): [value]
  • Pessimistic case (90th percentile): [value]
  • Black swan scenario: [What happens in the tail?]
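When historical data for a reference class exists, the percentile cases can be read off it rather than guessed. The sketch below assumes a quantity where lower is better (task duration), so the 10th percentile is the optimistic case; the durations are made-up placeholder data.

```python
import statistics

# Hypothetical durations, in weeks, of past tasks from the same reference class.
past_durations_weeks = [3, 4, 4, 5, 5, 6, 6, 7, 9, 12]

# With n=10, statistics.quantiles returns the 9 cut points between deciles:
# index 0 is the 10th percentile, index 4 the 50th, index 8 the 90th.
deciles = statistics.quantiles(past_durations_weeks, n=10, method="inclusive")
p10, p50, p90 = deciles[0], deciles[4], deciles[8]

print(f"Optimistic  (10th percentile): {p10:.1f} weeks")
print(f"Expected    (50th percentile): {p50:.1f} weeks")
print(f"Pessimistic (90th percentile): {p90:.1f} weeks")
```

A sample this small says little about the tail, so the black swan scenario still needs separate, explicit thought.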

⚠️ Probability Hygiene Flags

  • Are any probabilities being treated as certainties (0% or 100%)? Almost nothing is certain.
  • Is base rate being ignored in favor of the specific case?
  • Is new evidence causing overreaction (recency bias: overweighting the latest data)?
  • Is there a conjunction fallacy? (P(A and B) ≤ P(A) always; adding detail to a scenario can only keep its probability the same or lower it)
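The conjunction point is easy to underestimate for multi-step plans. As a rough illustration, assuming a hypothetical plan whose five steps are independent and must all succeed:

```python
import math

# Hypothetical plan: every step must succeed for the plan to succeed.
step_probabilities = [0.9, 0.9, 0.9, 0.9, 0.9]

# Assumption: the steps are independent, so the joint probability is the product.
p_all_steps = math.prod(step_probabilities)

print(f"Each step: 90%, all five together: {p_all_steps:.0%}")
# Each step: 90%, all five together: 59%
```

If the steps are positively correlated, the joint probability is higher than the product, but it can never exceed the probability of the least likely step.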

Calibration Heuristics

Fermi Estimation — For unknown quantities, break into smaller estimable parts:

  • Instead of "how many users will we get?" → estimate: market size × awareness % × conversion % × retention %
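A Fermi estimate is more honest when each factor carries a range rather than a single guess. The sketch below multiplies the low ends and the high ends of each factor; every number is a hypothetical placeholder, and because all-low and all-high rarely coincide, the resulting bounds are deliberately wider than a realistic interval.

```python
import math

# Hypothetical (low, high) ranges for each factor in the decomposition.
factors = {
    "market_size": (500_000, 2_000_000),  # people
    "awareness": (0.02, 0.10),            # fraction who hear about the product
    "conversion": (0.05, 0.20),           # fraction of aware people who sign up
    "retention": (0.30, 0.60),            # fraction of sign-ups who stay
}

low = math.prod(lo for lo, _ in factors.values())
high = math.prod(hi for _, hi in factors.values())
# The geometric mean is a common midpoint when bounds span orders of magnitude.
midpoint = math.sqrt(low * high)

print(f"Retained users: {low:,.0f} to {high:,.0f} (geometric midpoint ~{midpoint:,.0f})")
```

The width of the output range shows which factors deserve more research: narrowing the widest input range shrinks the final estimate the most.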

Reference Class Forecasting — Use historical data from similar projects:

  • "This type of feature took 4–8 weeks for 80% of teams in our reference class"

Outside View vs. Inside View:

  • Inside view: "Our situation is special, we'll beat the average"
  • Outside view: "What does the data say for projects like this?"
  • Default to the outside view. Adjust only with specific, strong evidence.

Pre-commit to what would change your mind:

  • "If we see X, I will update my probability from 60% to below 30%"
  • This prevents post-hoc rationalization of new evidence

Thinking Triggers

  • "What's the base rate for this?"
  • "Are we treating a 70% probability like a certainty?"
  • "What's the expected value of each option, not just the upside?"
  • "How much should this new evidence actually move our belief?"
  • "What would we need to see to change our mind significantly?"
  • "Are we in the reference class we think we're in?"
  • "What's the downside scenario, and are we weighting it correctly?"

Example Applications

  • "Should we build this feature?" → What % of similar features drove meaningful retention? What's the cost if it fails?
  • "This A/B test showed a lift" → Is the sample size sufficient? What's the prior for this type of change?
  • "We'll ship in 2 weeks" → What's the historical distribution for similar tasks? What's the 80th percentile?
  • "The agent failed once — is it a bug?" → What's the base rate of one-off failures? What evidence would confirm it's systematic?