# Fermi Estimation
Core principle: Almost any quantity can be estimated to within an order of magnitude by decomposing it into estimable factors and multiplying. The goal is the right number of zeros, not precision. A 10× error is informative; a 1000× error changes the decision.
## The Core Process
### Step 1: Define the Target Quantity Precisely
Specify what, over what period, for what scope, in what units.
"How many tokens does this use?" → "Total token count of one Constellation pipeline run, medium-complexity feature, across all agent turns?"
### Step 2: Decompose into Estimable Factors
Target = Factor_1 × Factor_2 × Factor_3 × ...
Each factor independently estimable; units cancel correctly; no factor is the original unknown in disguise.
Patterns: Rate × Time · Count × Average · Population × Fraction · Flow × Duration
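A minimal sketch of such a decomposition, with hypothetical factor names and values (none of these numbers come from the text):

```python
# Illustrative Rate x Time x Average decomposition.
# Factor names and values are hypothetical, for shape only.
factors = {
    "deploys_per_week":   10,   # Rate
    "weeks_per_year":     50,   # Time
    "minutes_per_deploy": 30,   # Average
}
# Units: (deploys/week) * (weeks/year) * (min/deploy) = min/year — they cancel.
target = 1
for value in factors.values():
    target *= value
print(target)  # 15000 engineer-minutes/year spent deploying
```

Note that each factor is something a practitioner could estimate on its own; none of them is the original unknown in disguise.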
### Step 3: Estimate Each Factor
Explicit reasoning per factor. Round numbers — order of magnitude, not false precision.
### Step 4: Compute and Sanity-Check
Multiply through. Does it pass common sense? Match reference points? Which factor, if wrong, most changes the result?
### Step 5: Bound the Estimate
Low (each factor at its low) / Central (best guess) / High (each factor at its high). High and low within ~3× of central = well-bounded. Orders of magnitude apart = one factor is too uncertain (validate it).
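Steps 4 and 5 can be sketched together: multiply each factor at its low, central, and high value, then check whether the range is well-bounded. The factor names and ranges below are illustrative, not prescribed by the text:

```python
# Illustrative factor ranges as (low, central, high) triples.
factors = {
    "agent_turns":     (6, 8, 12),
    "tokens_per_turn": (3_000, 5_000, 10_000),
}

def scenario(idx):
    """Multiply every factor at the same scenario index (0=low, 1=central, 2=high)."""
    result = 1
    for triple in factors.values():
        result *= triple[idx]
    return result

low, central, high = scenario(0), scenario(1), scenario(2)
print(low, central, high)  # 18000 40000 120000
# Well-bounded if high and low are each within ~3x of the central estimate.
well_bounded = high / central <= 3 and central / low <= 3
print(well_bounded)  # True
```

Multiplying all factors at their extremes overstates the spread slightly (the factors are unlikely to all go wrong in the same direction), which is acceptable for an order-of-magnitude bound.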
## Output Format
### Target Quantity
- Estimating: [Precisely defined quantity]
- Units: [What we're counting in]
### Decomposition
| Factor | Estimate | Reasoning |
|---|---|---|
| [Factor 1] | [Value] | [Why] |
| [Factor 2] | [Value] | [Why] |
| Product | = [Result] | |
### Range
| Scenario | Estimate | Key driver |
|---|---|---|
| Low | [Value] | [Factor at low] |
| Central | [Value] | Best guess |
| High | [Value] | [Factor at high] |
### Key Driver
- Which factor contributes most?
- If you could validate one, which?
- A 2× error in [key factor] produces a 2× error in result — worth checking.
### Sanity Checks
- Reference point: [comparable known value]
- Common sense pass? [If no, which factor is suspect?]
- Order-of-magnitude conclusion: [zeros that matter]
## Reference Points
### Time
- One focused engineering task: ~1–4 hrs
- Working hours/week: ~40 (effective ~25–30)
- Working days/month: ~22
### Compute / LLM
- Token density: ~750 words / 1,000 tokens
- GPT-4-class input: ~$2–10 / M tokens
- LLM response time: 1–10s
- Code file: 50–500 lines; ~100–2,000 tokens
### Scale
- Small SaaS: 1k–10k MAU
- Mid-size: 100k–1M MAU
- Large platform: 10M+ MAU
### Money
- Fully-loaded engineer (EU/US): €80k–€200k/yr
- Per-hour: €40–€100
- AWS small instance: ~$10–50/month
## Anti-Patterns
- False precision: Reporting "42,381 tokens" for an order-of-magnitude estimate. Use round numbers.
- Single-path decomposition: Relying on one factoring of the problem. Cross-check with an independent decomposition.
- Forgetting units: If they don't cancel, the decomposition is wrong.
- Treating estimate as answer: Starting point and sanity check, not a substitute for measurement when measurement is warranted.
- Refusing to estimate: "I don't have enough data" is rarely right when a decision needs to be made. Decompose what you can; flag what you can't.
## Thinking Triggers
- "What does this equal as a product of things I can estimate?"
- "What's the right number of zeros?"
- "Which single factor, if wrong by 10×, changes my conclusion?"
- "What reference point can I sanity-check against?"
- "If off by 2×, does the decision change? By 10×?"
## Example: Token Budget for an Agent Pipeline
Question: How many tokens does one Constellation run consume?
| Factor | Estimate | Reasoning |
|---|---|---|
| Agent turns | 8 | 6 agents + orchestrator + review |
| Avg input tokens/turn | 4,000 | System ~1k + context ~2k + task ~1k |
| Avg output tokens/turn | 1,000 | Structured response |
| Total per run | = 8 × 5,000 = 40,000 | |
Range: 20k (simple, short context) to 120k (complex, full history).
Key driver: Input context size dominates. Compressing context is highest-leverage.
Sanity check: 40k @ $5/M = $0.20/run. 100 runs/day = $20/day = ~$600/month. Plausible for a dev tool.
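The sanity-check arithmetic can be reproduced in a few lines (the $5/M token price and 100 runs/day are the assumptions stated above):

```python
# Reproduces the worked example's arithmetic from the table and sanity check.
turns = 8
tokens_per_turn = 4_000 + 1_000              # avg input + avg output per turn
tokens_per_run = turns * tokens_per_turn     # 40,000 tokens
cost_per_run = tokens_per_run / 1_000_000 * 5.0   # $0.20 at $5/M tokens
cost_per_month = cost_per_run * 100 * 30          # ~$600 at 100 runs/day
print(tokens_per_run, round(cost_per_run, 2), round(cost_per_month))
```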