probabilistic-thinking


Probabilistic & Bayesian Thinking

Core principle: Probabilistic thinking replaces vague confidence with calibrated estimates. Bayesian thinking updates those estimates as evidence arrives — neither clinging to priors nor overreacting to new data.


Core Concepts

Probability as Degree of Belief

"Will probably work" → 60%? 90%? Forcing a number exposes vague confidence and creates a baseline for updating.

Base Rates

Find the base rate before estimating a specific event — how often does this event type occur in a reference class?

"Will this feature succeed?" → What % of similar features in similar products succeeded?

Ignoring base rates (the base rate fallacy) is one of the most common reasoning errors.

Bayesian Updating

Update proportionally — not by ignoring priors, not by overwriting them.

Posterior odds = Prior odds × Likelihood ratio, where odds = p / (1 − p)
  • Prior: belief before the evidence, P(H)
  • Likelihood ratio: P(evidence | hypothesis true) ÷ P(evidence | hypothesis false)
  • Posterior: belief after the evidence, P(H | evidence)
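Here is a minimal Python sketch of the odds-form update, posterior odds = prior odds × P(E | H) / P(E | not H). The function name `bayes_update` and the example likelihoods (60% vs. 20%, 50% vs. 40%) are illustrative assumptions and do not come from any real case. The sketch shows that the same 30% prior moves a lot under strong evidence and only a little under weak evidence.

```python
def bayes_update(prior: float, p_e_if_true: float, p_e_if_false: float) -> float:
    """Return P(H | E) from P(H), P(E | H) and P(E | not H), via the odds form of Bayes' rule."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if p_e_if_false <= 0.0:
        raise ValueError("P(E | not H) must be positive")
    likelihood_ratio = p_e_if_true / p_e_if_false
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    # Convert odds back to a probability: p = odds / (1 + odds).
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical numbers: same 30% prior, evidence of different strength.
strong = bayes_update(0.30, 0.60, 0.20)  # likelihood ratio 3.0
weak = bayes_update(0.30, 0.50, 0.40)    # likelihood ratio 1.25
print(f"strong evidence: {strong:.3f}, weak evidence: {weak:.3f}")
```

With these assumed numbers, strong evidence (likelihood ratio 3) lifts the belief from 30% to about 56%. Weak evidence (likelihood ratio 1.25) only reaches about 35%. Working in odds keeps the update multiplicative, so conditionally independent pieces of evidence can be applied one after another.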

Expected Value

EV = Σ (Probability × Value), summed over every possible outcome

A 10% chance of +€100 (EV = €10) beats a 90% chance of +€5 (EV = €4.50) on expected value, assuming each bet otherwise pays €0.

Confidence Intervals

Point estimates are usually wrong. Ranges are honest.

  • "4 weeks" → "3–7 weeks (80% confidence)"
  • Wide intervals on uncertain things = calibration, not weakness.
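One way to test whether your intervals are calibrated is to record past 80% intervals and count how often the actual outcome landed inside. The sketch below uses a hypothetical track record of five estimates. `interval_hit_rate` is an illustrative helper and is not part of any library.

```python
from typing import Sequence, Tuple


def interval_hit_rate(intervals: Sequence[Tuple[float, float]], actuals: Sequence[float]) -> float:
    """Share of actual outcomes that landed inside their stated [low, high] interval."""
    if not actuals or len(intervals) != len(actuals):
        raise ValueError("need exactly one actual outcome per interval")
    hits = sum(low <= actual <= high for (low, high), actual in zip(intervals, actuals))
    return hits / len(actuals)


# Hypothetical track record: five past estimates, each stated as an 80% interval in weeks.
past_intervals = [(3, 7), (2, 4), (5, 9), (1, 3), (4, 6)]
actual_weeks = [6, 5, 8, 2, 7]
rate = interval_hit_rate(past_intervals, actual_weeks)
print(f"hit rate {rate:.0%} vs. 80% claimed")
```

Here 3 of 5 actuals fall inside their interval. That is a 60% hit rate against the 80% claimed, which would suggest the intervals are too narrow. Five data points are far too few to conclude much, though; the check only becomes meaningful over a longer record.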

Output Format

Probability Estimates

| Claim | Prior | Evidence | Updated | Confidence |
|---|---|---|---|---|
| "Feature will succeed" | 30% (base rate) | Strong user signal | 55% | Medium |
| "Will ship on time" | 40% (historical) | Experienced team | 50% | Low |

Base Rate Check

  • Reference class for this situation?
  • Historical base rate for this outcome?
  • How does this case differ from base rate (and does that justify adjustment)?

Bayesian Update

  • Prior: belief before
  • New evidence: what we now know
  • Likelihood ratio: how much more likely is this evidence if the hypothesis is true than if it is false?
  • Posterior: belief now
  • Update size: did evidence move the needle? (Strong evidence → large; weak → small.)

Expected Value Comparison

| Option | P(success) | Value if succeeds | Value if fails | EV |
|---|---|---|---|---|
| A | 70% | +€50k | -€10k | +€32k |
| B | 30% | +€200k | -€20k | +€46k |
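The EV column can be reproduced with a small helper that sums probability × value over mutually exclusive, exhaustive outcomes. This is a sketch; `expected_value` and `option_ev` are hypothetical names. The same helper also checks the two €100/€5 bets from the Expected Value section, assuming each bet otherwise pays €0.

```python
from typing import Sequence, Tuple


def expected_value(outcomes: Sequence[Tuple[float, float]]) -> float:
    """Sum of probability × value over mutually exclusive, exhaustive outcomes."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total_p}, not 1")
    return sum(p * v for p, v in outcomes)


def option_ev(p_success: float, value_if_success: float, value_if_fail: float) -> float:
    """Two-outcome option: succeed with probability p_success, otherwise fail."""
    return expected_value([(p_success, value_if_success), (1.0 - p_success, value_if_fail)])


# Options A and B from the table (values in euros).
ev_a = option_ev(0.70, 50_000, -10_000)
ev_b = option_ev(0.30, 200_000, -20_000)

# The two bets from the Expected Value section, assuming each otherwise pays €0.
ev_long_shot = expected_value([(0.10, 100), (0.90, 0)])
ev_safe_bet = expected_value([(0.90, 5), (0.10, 0)])
print(round(ev_a), round(ev_b), round(ev_long_shot, 2), round(ev_safe_bet, 2))
```

Option A comes out at €32k and B at €46k, matching the table. EV alone ignores spread, though. With these numbers B fails 70% of the time, so the downside still deserves its own look.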

Confidence Ranges

For costs and durations (lower is better); swap optimistic and pessimistic for gains.

  • Optimistic (10th pct): [value]
  • Median (50th pct): [value]
  • Pessimistic (90th pct): [value]
  • Black swan: [tail scenario]
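Given a reference class of past outcomes, the three percentiles can be read off the empirical distribution. The durations below are invented for illustration. `statistics.quantiles` is from the Python standard library (3.8+), and `n=10` returns the nine decile cut points.

```python
import statistics

# Hypothetical reference class: durations (weeks) of 12 comparable past projects.
durations = [3, 4, 4, 5, 5, 5, 6, 6, 7, 8, 9, 12]

# n=10 returns the 9 decile cut points (10th, 20th, ..., 90th percentile).
deciles = statistics.quantiles(durations, n=10)
p10, p50, p90 = deciles[0], deciles[4], deciles[8]
print(f"optimistic {p10:.1f} wk, median {p50:.1f} wk, pessimistic {p90:.1f} wk")
```

With this made-up data the sketch gives roughly 3.3, 5.5 and 11.1 weeks. A dozen past projects say little about the far tail, so the black swan line still needs separate thought.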

Probability Hygiene Flags

  • Probabilities treated as certainties (0%/100%)? Almost nothing is certain.
  • Base rate ignored for the specific case?
  • Overreaction to the latest evidence (recency bias), or underreaction because of anchoring on the first estimate?
  • Conjunction fallacy? (P(A and B) ≤ P(A): a more specific scenario is never more probable)
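The conjunction rule follows from the chain rule, P(A and B) = P(A) × P(B | A). Since P(B | A) ≤ 1, the joint probability can never exceed P(A). Below is a minimal sketch with hypothetical probabilities; `joint_probability` is an illustrative helper (it uses `math.prod`, Python 3.8+).

```python
import math


def joint_probability(p_first: float, *conditionals: float) -> float:
    """Chain rule: P(A and B and C ...) = P(A) × P(B | A) × P(C | A, B) × ..."""
    probs = (p_first, *conditionals)
    if not all(0.0 <= p <= 1.0 for p in probs):
        raise ValueError("every probability must lie in [0, 1]")
    return math.prod(probs)


# Hypothetical plan where three things must all go right.
p_a = 0.8                                 # P(A)
p_ab = joint_probability(0.8, 0.7)        # P(A and B), with P(B | A) = 0.7
p_abc = joint_probability(0.8, 0.7, 0.6)  # P(A and B and C), with P(C | A, B) = 0.6
```

Each added condition either keeps the probability level or lowers it: here 80% becomes 56% and then about 34%. A more detailed story may feel more convincing, but by this rule it is no more likely, and usually less.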

Calibration Heuristics

Fermi Estimation — break unknowns into estimable parts:

  • "How many users?" → market size × awareness % × conversion % × retention %
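A Fermi estimate multiplies the factors together. Carrying a low and a high guess for each factor gives a rough range as well as a point estimate. Every number below is a hypothetical placeholder. Taking the geometric mean of the bounds as the central estimate is one common convention, not the only one.

```python
import math

# Hypothetical (low, high) guesses for each factor; none of these come from real data.
factors = {
    "market_size": (200_000, 500_000),  # people who could plausibly use the product
    "awareness": (0.05, 0.15),          # share who hear about it
    "conversion": (0.02, 0.05),         # share of aware people who sign up
    "retention": (0.30, 0.60),          # share of sign-ups still active later
}

low = math.prod(lo for lo, _ in factors.values())
high = math.prod(hi for _, hi in factors.values())
# Geometric midpoint: a common Fermi choice when the range spans orders of magnitude.
central = math.sqrt(low * high)
print(f"users: ~{central:,.0f} (range {low:,.0f} to {high:,.0f})")
```

With these placeholders the range is 60 to 2,250 users, with a central estimate near 370. Multiplying all the lows (or all the highs) assumes every factor hits its bound at once, so the spread of likely outcomes is narrower than this range.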

Reference Class Forecasting — historical data from similar projects:

  • "This feature type took 4–8 weeks for 80% of teams in our class"

Outside View vs. Inside View:

  • Inside: "We're special, we'll beat the average"
  • Outside: "What does the data say for projects like this?"
  • Default outside. Adjust only with specific, strong evidence.

Pre-commit to what would change your mind:

  • "If we see X, I'll move probability from 60% to below 30%"
  • Prevents post-hoc rationalization.
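Pre-commitment can be made quantitative: the odds form of Bayes' rule tells you how diagnostic X must be to justify the move. Below is a minimal sketch. `required_likelihood_ratio` is a hypothetical helper, and the 60% and 30% come from the example above.

```python
def required_likelihood_ratio(prior: float, target: float) -> float:
    """Likelihood ratio P(E | H) / P(E | not H) that would move `prior` exactly to `target`."""
    for p in (prior, target):
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    target_odds = target / (1.0 - target)
    return target_odds / prior_odds


# Pre-commitment from the example: X should move us from 60% to below 30%.
lr = required_likelihood_ratio(0.60, 0.30)
print(f"X must be at least {1 / lr:.1f}x more likely if the hypothesis is false")
```

Moving from 60% to 30% requires a likelihood ratio of 2/7. In other words, X must be about 3.5 times more likely if the hypothesis is false than if it is true. Knowing that bar before the evidence arrives makes the pre-commitment concrete.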

Thinking Triggers

  • "What's the base rate?"
  • "Are we treating 70% like certainty?"
  • "What's the EV of each option, not just the upside?"
  • "How much should this evidence actually move our belief?"
  • "What would change our mind significantly?"
  • "Are we in the reference class we think we're in?"
  • "What's the downside, and are we weighting it correctly?"

Example Applications

  • "Should we build this?" → % of similar features that drove retention? Cost if it fails?
  • "A/B test showed a lift" → Sample size sufficient? Prior for this change type?
  • "We'll ship in 2 weeks" → Historical distribution? 80th percentile?
  • "Agent failed once — bug?" → Base rate of one-off failures? Evidence that would confirm systematic?

More from andurilcode/skills

First Seen
Mar 5, 2026