causal-inference
Causal Inference
Core principle: Correlation is not causation — but sometimes it is, and knowing which matters enormously. Use counterfactuals, confounders, and causal structure to ask "did X actually cause Y?" rigorously before acting on data.
The Core Distinction
Correlation: X and Y move together. Causation: Changing X changes Y — and we know why.
Why it matters:
- Intervening on a correlate with no causal path wastes effort
- Missing a confounder leads to attributing effects to the wrong cause
- Acting on spurious correlation can make things worse
Key Concepts
Counterfactual Reasoning
The fundamental question:
"What would have happened to Y if X had been different, all else equal?"
You never observe both the treated and untreated state of the same unit at the same time — the fundamental problem of causal inference. Every causal claim is implicitly counterfactual; make it explicit.
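A minimal sketch of the potential-outcomes framing (all names and numbers are invented for illustration): each unit has two potential outcomes, only one of which is ever observed, and random assignment is what lets the observed difference in means stand in for the unobservable counterfactual.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical potential outcomes: what each unit WOULD do untreated (y0)
# and treated (y1). In reality we never observe both for the same unit.
y0 = rng.normal(10.0, 2.0, n)
y1 = y0 + 1.5                                   # true individual effect = 1.5

treated = rng.integers(0, 2, n).astype(bool)    # random assignment
y_observed = np.where(treated, y1, y0)          # only one outcome per unit

# With randomization, the difference in observed means recovers the
# true average treatment effect (up to sampling noise).
ate_estimate = y_observed[treated].mean() - y_observed[~treated].mean()
print(f"estimated ATE ~= {ate_estimate:.2f} (true effect 1.5)")
```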
Confounders
A third variable Z that causally affects both X and Y, creating correlation between them without a direct causal path.
Z → X
Z → Y
X and Y correlate, but X doesn't cause Y. Intervening on X does nothing.
Example: Ice cream sales and drowning rates correlate. Confounder: hot weather → more ice cream AND more swimming → more drowning. Banning ice cream doesn't reduce drowning.
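The ice-cream example can be simulated in a few lines (all numbers invented for illustration): X and Y correlate strongly even though there is no arrow from X to Y, and the correlation largely disappears once the confounder is held roughly fixed.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

heat = rng.normal(25, 5, n)                    # confounder Z (temperature)
ice_cream = 2.0 * heat + rng.normal(0, 5, n)   # Z -> X, but no X -> Y arrow
drownings = 0.3 * heat + rng.normal(0, 2, n)   # Z -> Y

# Raw correlation looks impressive...
print("corr(ice_cream, drownings):", np.corrcoef(ice_cream, drownings)[0, 1])

# ...but within a narrow temperature band (Z held roughly fixed) it vanishes.
band = (heat > 24) & (heat < 26)
print("corr within a heat band:   ", np.corrcoef(ice_cream[band], drownings[band])[0, 1])
```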
Common confounders in product/engineering work:
- Seasonality (feature adoption and engagement move together)
- Selection bias (users who adopt are already more engaged)
- External events (a competitor shut down the same week you shipped)
- Time trends (both metrics were already moving before intervention)
Mediators vs. Confounders
A mediator is on the causal path — X → M → Y. Blocking it blocks the effect. A confounder is upstream of both — control for it.
Confusing them causes overcorrection (controlling for a mediator removes the effect you're looking for).
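A sketch of that overcorrection on simulated data (variable names and effect sizes are invented): X affects Y only through M, so conditioning on M makes X look inert even though intervening on X would move Y.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

x = rng.normal(0, 1, n)
m = 2.0 * x + rng.normal(0, 1, n)    # X -> M
y = 1.5 * m + rng.normal(0, 1, n)    # M -> Y, so total effect of X on Y = 3.0

# Regressing Y on X alone recovers the total effect (~3.0).
slope_total = np.polyfit(x, y, 1)[0]

# "Controlling for" the mediator M: regress Y on both X and M.
X = np.column_stack([np.ones(n), x, m])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"total effect of X:                     {slope_total:.2f}")
print(f"effect of X after controlling for M:   {coef[1]:.2f}  (shrinks toward 0)")
```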
Simpson's Paradox
An observed trend can reverse when data is aggregated or split into subgroups. A treatment can appear harmful in aggregate yet beneficial in every subgroup (or vice versa) when group sizes are unequal and group membership is itself related to the outcome.
Always ask: Does disaggregating change the conclusion?
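A minimal numeric sketch (the counts are illustrative): the treatment wins in both segments but loses in aggregate because the two arms are unevenly split across segments.

```python
# (successes, trials) for a hypothetical treatment vs. control, split by segment
data = {
    "small accounts": {"treatment": (81, 87),   "control": (234, 270)},
    "large accounts": {"treatment": (192, 263), "control": (55, 80)},
}

for segment, arms in data.items():
    for arm, (wins, n) in arms.items():
        print(f"{segment:15s} {arm:9s} {wins / n:.1%}")

# Aggregate across segments: the direction flips.
for arm in ("treatment", "control"):
    wins = sum(arms[arm][0] for arms in data.values())
    n = sum(arms[arm][1] for arms in data.values())
    print(f"{'aggregate':15s} {arm:9s} {wins / n:.1%}")
```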
Tools for Establishing Causation
Randomized Controlled Experiment (Gold Standard)
Random assignment eliminates confounding in expectation by making treatment statistically independent of all other variables, measured and unmeasured.
In product work: A/B tests are RCTs. Validity depends on:
- Random assignment (not self-selection)
- Sufficient sample size (statistical power)
- Single treatment change (no simultaneous changes)
- No interference between units (SUTVA)
- Correct metric selection
A/B test failure modes:
- Novelty effect: early lift decays as users habituate
- Sample Ratio Mismatch: unequal group sizes indicating randomization failure (see the SRM check sketch after this list)
- Multiple comparisons: testing 20 metrics at p = 0.05 yields about one false positive by chance alone
- Peeking: stopping early when results look good inflates false positive rate
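The Sample Ratio Mismatch check is cheap to automate. A minimal sketch using scipy's chi-square goodness-of-fit test; the 50/50 expected split and the counts are illustrative assumptions to replace with your own allocation.

```python
from scipy.stats import chisquare

# Observed user counts per arm; a 50/50 allocation is assumed here.
observed = [100_480, 99_120]
expected_ratio = [0.5, 0.5]
total = sum(observed)
expected = [r * total for r in expected_ratio]

stat, p_value = chisquare(observed, f_exp=expected)

# A very small p-value means this split is unlikely under correct
# randomization -- investigate before trusting any metric movement.
if p_value < 0.001:
    print(f"Sample Ratio Mismatch suspected (p = {p_value:.2g})")
else:
    print(f"No SRM detected (p = {p_value:.2g})")
```

A strict threshold such as p < 0.001 is commonly used so the alert only fires on clear randomization problems rather than ordinary sampling noise.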
Difference-in-Differences (DiD)
Compare the change for a treated group vs. control over time.
Effect = (Treated_after - Treated_before) - (Control_after - Control_before)
Assumes: Without treatment, both groups would have followed parallel trends. Use when: You have pre/post data and a natural control group but couldn't randomize.
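A minimal sketch of the DiD arithmetic on made-up group means (pandas is an assumed dependency; the same calculation works with plain numbers):

```python
import pandas as pd

# Hypothetical weekly conversion rates, before and after the rollout.
df = pd.DataFrame({
    "group":  ["treated", "treated", "control", "control"],
    "period": ["before",  "after",   "before",  "after"],
    "rate":   [0.120,     0.145,     0.118,     0.125],
})

means = df.set_index(["group", "period"])["rate"]
did = (means["treated", "after"] - means["treated", "before"]) \
    - (means["control", "after"] - means["control", "before"])
print(f"DiD estimate: {did:+.3f}")   # +0.018: lift beyond the shared trend
```

The estimate is only as good as the parallel-trends assumption; plot both groups' pre-period trends before trusting it.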
Natural Experiments
External factors create quasi-random treatment variation — policy changes, geographic boundaries, system outages, cohort-based rollouts.
Example: Feature rolled out by sign-up date — early users are treatment, later users are control (if no self-selection in timing).
Causal Graph (DAG)
Map all variables and their causal relationships. Makes confounders and mediators explicit and determines what to control for.
[Confounder Z] → [Treatment X] → [Mediator M] → [Outcome Y]
[Confounder Z] → [Outcome Y]
Reading the DAG: control for Z (confounder), don't control for M (mediator).
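As a sketch, the same DAG can be written down with networkx (an assumed dependency, not something the skill requires) so the causal and backdoor paths are explicit and inspectable:

```python
import networkx as nx

# Edges mirror the diagram above: Z confounds X and Y, M mediates X -> Y.
dag = nx.DiGraph([("Z", "X"), ("Z", "Y"), ("X", "M"), ("M", "Y")])

# All undirected paths from X to Y: the causal path runs through M,
# the backdoor path runs through Z.
for path in nx.all_simple_paths(dag.to_undirected(), "X", "Y"):
    kind = "causal (don't block)" if "M" in path else "backdoor (control for Z)"
    print(" - ".join(path), "|", kind)
```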
Output Format
🔍 Causal Claim Under Examination
- Stated claim: [What is asserted to cause what]
- Reformulated as counterfactual: "Would Y have been different if X had not occurred, all else equal?"
🕸️ Causal Structure
Sketch the causal graph:
- Proposed causal paths?
- Potential confounders?
- Mediators (on the causal path)?
- Colliders (caused by both X and Y — controlling opens spurious paths)?
⚠️ Threats to Causal Interpretation
For each: Present / Possible / Unlikely
| Threat | Present? | Evidence | Impact on Conclusion |
|---|---|---|---|
| Confounding | |||
| Selection bias | |||
| Reverse causation (Y → X) | |||
| Common cause (Z → X, Z → Y) | |||
| Seasonality / time trend | |||
| Coincidental timing | |||
| Simpson's Paradox | |||
📊 Evidence Quality
- Design used: [RCT / DiD / Natural experiment / Observational]
- Evidence strength: [Strong / Moderate / Weak]
- Key assumptions: [What must be true for the design to be valid]
- Assumption violations: [Any signs assumptions don't hold]
🎯 Conclusion
- Causal claim warranted?: [Yes / Probably / Unclear / No]
- If yes: Estimated effect size and confidence
- If unclear: What evidence would resolve it?
- If no: What alternative explanation better fits the data?
🔬 Next Steps
- What experiment would establish causation most efficiently?
- What natural variation in the data could be exploited?
- What confounders should be measured and controlled for?
Causal Inference Checklist for A/B Tests
Before trusting a result:
- Was assignment truly random? Check Sample Ratio Mismatch.
- Was only one thing changed?
- Is sample size sufficient for the expected effect? (See the power-calculation sketch after this checklist.)
- Was the test run for a full weekly cycle?
- Is the primary metric pre-specified?
- Do secondary metrics that should move actually move?
- Is there a plausible mechanism explaining why X would cause Y?
- Is the effect consistent across segments? (Check Simpson's Paradox)
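For the sample-size item, a hedged sketch using statsmodels' power calculation for a two-proportion test; the baseline rate and minimum detectable effect below are assumptions to replace with your own:

```python
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

baseline = 0.10    # assumed control conversion rate
mde = 0.11         # smallest treatment rate worth detecting (+1 point)

effect = proportion_effectsize(mde, baseline)    # Cohen's h
n_per_arm = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.8, alternative="two-sided"
)
print(f"~{n_per_arm:,.0f} users needed per arm")
```

Halving the minimum detectable effect roughly quadruples the required sample size, which is why the expected effect should be pinned down before the test starts, not after.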
Thinking Triggers
- "What's the counterfactual? What would have happened without this change?"
- "What else changed at the same time that could explain this?"
- "Are the units we're comparing actually comparable?"
- "Is there a third variable that could explain the correlation?"
- "Does the mechanism make sense — why would X cause Y?"
- "Does disaggregating the data change the conclusion?"
- "Would we see the same result if we ran this experiment again?"