# Causal Inference

## Framework

### IRON LAW: Correlation Is Not Causation — But Causation Is Estimable
Observational data cannot prove causation through correlation alone.
BUT with the right methodology (matching, IV, DID, RDD), we CAN
estimate causal effects from observational data — IF the assumptions
of each method are satisfied and explicitly tested.
The key question is always: "What would have happened WITHOUT the treatment?"
(the counterfactual)
## The Fundamental Problem
- We observe: Y_i(treated) — what happened to the treated unit.
- We want to know: Y_i(treated) - Y_i(untreated) — the causal effect.
- We can never observe: Y_i(untreated) for the same unit at the same time.

All causal inference methods therefore estimate the counterfactual: what would have happened without the treatment.
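A small simulation makes the missing-counterfactual problem concrete (a hedged sketch; the data-generating process and all numbers are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Potential outcomes: y1 = y0 + 2, so the true causal effect is exactly 2
y0 = rng.normal(0.0, 1.0, n)
y1 = y0 + 2.0

# Confounded assignment: units with high y0 are more likely to be treated
d_conf = rng.random(n) < 1 / (1 + np.exp(-2 * y0))
# Randomized assignment: treatment independent of potential outcomes
d_rand = rng.random(n) < 0.5

def naive_diff(d):
    """Mean observed outcome of treated minus untreated units."""
    y_obs = np.where(d, y1, y0)
    return y_obs[d].mean() - y_obs[~d].mean()

confounded = naive_diff(d_conf)   # well above 2: treated units had higher y0 anyway
randomized = naive_diff(d_rand)   # close to the true effect of 2
```

Under confounded assignment the naive treated-vs-untreated difference mixes the true effect with selection bias; under randomization the same comparison recovers the effect.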
## Method Selection Guide
| Method | When to Use | Key Assumption |
|---|---|---|
| RCT | You can randomize | Random assignment eliminates confounders |
| Propensity Score Matching (PSM) | Treatment is non-random but based on observables | No unobserved confounders (selection on observables) |
| Instrumental Variables (IV) | Unobserved confounders exist but you have an instrument | Instrument affects treatment but not outcome directly |
| Difference-in-Differences (DID) | Policy/event creates natural treatment/control groups | Parallel trends: groups would have trended similarly without treatment |
| Regression Discontinuity (RDD) | Treatment assigned by a cutoff | Observations just above/below cutoff are comparable |
| Synthetic Control | One treated unit, multiple control units (aggregate data) | Synthetic weighted combination matches pre-treatment trends |
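As one worked example from the table, the DID estimator can be sketched with simulated data (a hedged illustration; all numbers are invented). Under parallel trends, both the baseline level difference and the common time trend cancel out:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
effect, trend = 1.5, 0.7   # true treatment effect and shared time trend

# Parallel trends: both groups drift by the same trend; only the
# treated group gets the extra effect in the post period
ctrl_pre   = rng.normal(0.0, 1.0, n)
ctrl_post  = rng.normal(0.0 + trend, 1.0, n)
treat_pre  = rng.normal(2.0, 1.0, n)                  # different baseline level is fine
treat_post = rng.normal(2.0 + trend + effect, 1.0, n)

# Difference-in-differences: (treated change) minus (control change)
did = (treat_post.mean() - treat_pre.mean()) - (ctrl_post.mean() - ctrl_pre.mean())
```

The `did` estimate lands near 1.5: the trend of 0.7 appears in both group changes and subtracts out, as does the level gap of 2.0.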
## Analysis Steps

1. Define the causal question: What is the treatment? What is the outcome?
2. Identify threats to validity: What confounders could explain the association?
3. Choose a method: Based on the data structure and the available identification strategy.
4. Check assumptions: Each method has testable and untestable assumptions.
5. Estimate the effect: Run the analysis.
6. Run a sensitivity analysis: How much would the results change if the assumptions were partially violated?
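The steps above can be sketched end-to-end for a selection-on-observables design with a toy nearest-neighbor matching estimator (a hedged illustration with invented numbers; a real analysis would estimate a propensity score, assess covariate balance, and bootstrap the standard error):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000

x = rng.normal(0.0, 1.0, n)                        # observed confounder
d = rng.random(n) < 1 / (1 + np.exp(-x))           # treatment depends on x
y = 2.0 * d + 1.5 * x + rng.normal(0.0, 1.0, n)    # true effect = 2

# Threat to validity: the naive comparison is confounded by x (biased upward here)
naive = y[d].mean() - y[~d].mean()

# Matching: pair each treated unit with the nearest control on x
# (x stands in for a propensity score, which here is monotone in x)
xc, yc = x[~d], y[~d]
order = np.argsort(xc)
xc, yc = xc[order], yc[order]
idx = np.clip(np.searchsorted(xc, x[d]), 1, len(xc) - 1)
nearest = np.where(np.abs(xc[idx] - x[d]) < np.abs(xc[idx - 1] - x[d]), idx, idx - 1)
att = (y[d] - yc[nearest]).mean()                  # close to the true effect of 2
```

The matched estimate is valid only under the method's key assumption: no confounders beyond `x`. If an unobserved confounder existed, matching on `x` alone would still be biased.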
## Output Format

```markdown
# Causal Analysis: {Treatment} → {Outcome}

## Causal Question
- Treatment: {what intervention/event}
- Outcome: {what we're measuring}
- Counterfactual: {what would have happened without treatment}

## Identification Strategy
- Method: {PSM / IV / DID / RDD / etc.}
- Rationale: {why this method fits}
- Key assumption: {stated explicitly}
- Assumption test: {how we check, or acknowledge if untestable}

## Results
- Estimated causal effect: {magnitude with CI}
- Robustness checks: {alternative specifications}

## Limitations
{What could still invalidate these results}
```
## Gotchas

- **"Controlling for X" doesn't guarantee causation:** adding control variables to a regression reduces some confounding but does nothing about unobserved confounders. If the treatment wasn't random, OLS with controls is not causal.
- **Parallel trends is untestable:** for DID, we can check pre-treatment parallel trends but can't prove they would have continued after treatment. It is an assumption, not a fact.
- **Weak instruments invalidate IV:** an instrument that barely moves the treatment produces biased estimates, often worse than plain OLS. Check instrument strength with the first-stage F-statistic (conventional threshold: F > 10).
- **External validity:** causal effects estimated in one context may not generalize. An effect estimated for users near a cutoff (RDD) may not apply to the full population.
- **Causal inference requires domain knowledge:** statistical methods alone can't determine what is a confounder, what is a mediator, or what is a collider. Draw the causal diagram (DAG) first.
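The weak-instrument check can be made concrete with a hand-rolled 2SLS sketch (all coefficients invented; in practice a package such as linearmodels' `IV2SLS` would also report proper standard errors):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

z = rng.normal(0.0, 1.0, n)                      # instrument
u = rng.normal(0.0, 1.0, n)                      # unobserved confounder
d = 0.5 * z + u + rng.normal(0.0, 1.0, n)        # endogenous treatment
y = 2.0 * d + 3.0 * u + rng.normal(0.0, 1.0, n)  # true effect = 2

X = lambda v: np.column_stack([np.ones(n), v])   # add an intercept column

# OLS is biased because u drives both d and y
ols_est = np.linalg.lstsq(X(d), y, rcond=None)[0][1]

# First stage: regress d on z and compute the F-statistic for the instrument
b1 = np.linalg.lstsq(X(z), d, rcond=None)[0]
d_hat = X(z) @ b1
r2 = 1 - ((d - d_hat) ** 2).sum() / ((d - d.mean()) ** 2).sum()
f_stat = r2 / (1 - r2) * (n - 2)                 # should clear the F > 10 rule of thumb

# Second stage: regress y on the fitted values of d
iv_est = np.linalg.lstsq(X(d_hat), y, rcond=None)[0][1]   # close to 2
```

Here the instrument is strong, so `iv_est` lands near the true effect while `ols_est` is pulled upward by the confounder; with a weak instrument (first-stage coefficient near zero), the F-statistic would fail the threshold and the 2SLS estimate would be unreliable.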
## References

- For directed acyclic graphs (DAGs), see references/causal-dags.md
- For DID implementation in Python/R, see references/did-implementation.md