# Systems Thinking Evaluation Skill

Apply a rigorous systems lens to evaluate what's working, what isn't, and why in any system — technical, organizational, product, or process.

## When to Use This Skill
Trigger this skill whenever the user asks you to:
- Evaluate or assess a system, product, process, strategy, or architecture
- Understand why something isn't working as expected
- Identify bottlenecks, failure points, or risks
- Suggest where to intervene for the most impact
- Review a design or plan before execution
## Core Mental Models to Apply

### 1. Identify the System Boundary
- What's inside the system (in scope)?
- What's outside (environment, dependencies, constraints)?
- Where does the system begin and end?
### 2. Stocks and Flows
- Stocks: What accumulates over time? (users, debt, trust, knowledge, bugs, revenue)
- Flows: What increases or decreases those stocks? (acquisition, churn, learning, entropy)
- Where are flows blocked, broken, or leaking?
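The stock-and-flow bullets above can be made concrete with a minimal simulation. This is an illustrative sketch, not part of the skill itself; all numbers (initial users, acquisition, churn rate) are assumptions chosen for the example.

```python
# A minimal stock-and-flow sketch: a "users" stock fed by an acquisition
# inflow and drained by a churn outflow. All numbers are illustrative.

def simulate_users(initial=1000, acquisition=120, churn_rate=0.05, months=12):
    """Each month: stock += inflow - outflow. Churn leaks in proportion
    to the stock, so the stock approaches acquisition / churn_rate."""
    users = initial
    history = [users]
    for _ in range(months):
        inflow = acquisition            # flow in: new users per month
        outflow = churn_rate * users    # flow out: the churn leak
        users += inflow - outflow
        history.append(users)
    return history

history = simulate_users()  # climbs toward the 120 / 0.05 = 2400 equilibrium
```

Note how the "leak" scales with the stock: fixing churn moves the equilibrium itself, while raising acquisition only refills a leaking tub faster.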
### 3. Feedback Loops

- Reinforcing loops (R): Self-amplifying dynamics — virtuous cycles or vicious spirals
  - Example: More users → more content → more users (growth flywheel)
  - Example: More bugs → less trust → fewer contributors → more bugs
- Balancing loops (B): Self-correcting dynamics — goal-seeking behaviors
  - Example: High load → auto-scale → stable performance
  - Example: User complaints → support → resolution → satisfaction
- Ask: Which loops dominate the system's current behavior?
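The two loop types produce visibly different trajectories, which a few lines of simulation can show. The gains and goals below are illustrative assumptions, not values from the text.

```python
# A minimal sketch of the two loop types, with illustrative numbers.

def reinforcing(stock=10.0, gain=0.2, steps=10):
    """R loop: each step's growth is proportional to the stock itself,
    so the stock compounds (more users -> more content -> more users)."""
    out = [stock]
    for _ in range(steps):
        stock += gain * stock
        out.append(stock)
    return out

def balancing(stock=10.0, goal=100.0, correction=0.3, steps=10):
    """B loop: each step closes part of the gap to a goal, so the stock
    converges (high load -> auto-scale -> stable performance)."""
    out = [stock]
    for _ in range(steps):
        stock += correction * (goal - stock)
        out.append(stock)
    return out

r = reinforcing()   # compounds away from its starting point
b = balancing()     # settles toward the goal
```

Asking "which loop dominates?" amounts to asking which of these shapes the observed time series more closely resembles.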
### 4. Delays
- Where are there time lags between cause and effect?
- Delays often cause oscillation, overcorrection, or invisible failures
- Example: Hiring takes 3 months → team overloads → burnout → more attrition
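The oscillation-and-overcorrection point can be demonstrated directly: the same balancing loop that converges smoothly starts overshooting once its measurement is stale. The controller, gain, and delay below are hypothetical.

```python
# Delays are what turn a well-behaved balancing loop into an oscillator.
# Here a controller corrects toward a target but only sees the stock as
# it was `delay` steps ago. All numbers are illustrative.

def correct_with_delay(target=100.0, start=0.0, gain=0.4, delay=0, steps=40):
    history = [start]
    for t in range(steps):
        observed = history[max(0, t - delay)]  # stale reading of the stock
        history.append(history[-1] + gain * (target - observed))
    return history

no_delay = correct_with_delay(delay=0)  # smooth approach, never overshoots
delayed = correct_with_delay(delay=3)   # overshoots the target, then oscillates
```

This is the hiring example in miniature: by the time the three-month lag surfaces in headcount, the team has already overcorrected.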
### 5. System Archetypes (common failure patterns)
Match observed behavior to known archetypes:
| Archetype | Pattern | Signal |
|---|---|---|
| Limits to Growth | Growth hits a constraint and stalls | Plateau despite investment |
| Fixes that Fail | Quick fix creates new problems | Recurring issues after "solutions" |
| Shifting the Burden | Symptomatic fixes erode fundamental ones | Team always firefighting |
| Tragedy of the Commons | Shared resources are depleted | Quality/performance degrades over time |
| Escalation | Competing actors amplify each other | Bidding wars, arms races |
| Drifting Goals | Performance gap closed by lowering standards | "Good enough" keeps declining |
| Accidental Adversaries | Well-meaning actors undermine each other | Misaligned incentives between teams |
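The archetype table can also be kept as a small lookup so observed signals can be matched mechanically. The archetype names follow the table; the keyword lists are illustrative assumptions, not a canonical taxonomy.

```python
# Map the archetype table to signal keywords (keyword lists are
# illustrative assumptions) and match free-text observations against it.

ARCHETYPES = {
    "Limits to Growth": ["plateau", "stalled", "despite investment"],
    "Fixes that Fail": ["recurring", "quick fix", "came back"],
    "Shifting the Burden": ["firefighting", "workaround", "symptom"],
    "Tragedy of the Commons": ["shared", "degrades", "depleted"],
    "Escalation": ["arms race", "bidding", "outdo"],
    "Drifting Goals": ["lowered the bar", "good enough", "standards"],
    "Accidental Adversaries": ["misaligned", "undermine", "blame"],
}

def match_archetypes(observation):
    """Return the archetypes whose signal keywords appear in the text."""
    text = observation.lower()
    return [name for name, keywords in ARCHETYPES.items()
            if any(keyword in text for keyword in keywords)]

hits = match_archetypes("Growth has stalled at a plateau despite investment")
```

Keyword matching is only a prompt for the real work: confirming that the underlying loop structure, not just the surface signal, fits the archetype.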
### 6. Leverage Points
Rank interventions by impact (from lowest to highest leverage):
- Numbers (parameters, budgets, quotas) — low leverage
- Buffer sizes and stock capacities
- Flow rates and delays
- Feedback loop strength
- Information flows (who has access to what, when)
- Rules and incentives
- Goals of the system
- Power to change the system's structure
- Mindsets and paradigms — highest leverage
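The gap between low- and high-leverage interventions shows up even in a toy model. The scenario below is hypothetical: it compares raising a parameter (acquisition spend) against strengthening a reinforcing loop (referrals), with all rates invented for the example.

```python
# A toy comparison of intervention leverage, with hypothetical numbers:
# raising a parameter (ad spend) vs. adding a reinforcing loop (referrals).

def project_users(months=24, acquisition=100.0, churn_rate=0.08, referral_rate=0.0):
    """Simple user-stock projection: inflow is paid acquisition plus
    referrals (proportional to the stock); outflow is churn."""
    users = 1000.0
    for _ in range(months):
        inflow = acquisition + referral_rate * users
        users += inflow - churn_rate * users
    return users

baseline = project_users()
more_spend = project_users(acquisition=150.0)   # change a number (low leverage)
referrals = project_users(referral_rate=0.05)   # add a loop (higher leverage)
```

The parameter change shifts the equilibrium once; the loop change alters how the system compounds, so its advantage grows with time.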
## Evaluation Output Format

When evaluating a system, structure the response as follows:

### 🟢 Where the System Works
- Identify functioning feedback loops, healthy stocks, aligned incentives
- Call out genuine strengths (not to be polite — to understand what to protect)
### 🔴 Where the System Breaks Down
- Point to broken loops, leaking flows, missing feedback, or misaligned incentives
- For each issue, name the archetype if one applies
- Identify delays that hide the problem
### ⚠️ Key Risks and Failure Modes
- What could cause the system to tip into a bad equilibrium?
- What reinforcing loop could go negative?
- What constraint will be hit next?
### 🎯 High-Leverage Interventions
- Ranked list of where to intervene
- For each: what changes, what loop or flow it affects, expected result
- Flag quick fixes that might backfire (Fixes that Fail archetype)
### 📊 System Diagram (optional, when helpful)
Describe or sketch a causal loop diagram in text:
[Variable A] → (+) [Variable B] → (+) [Variable A] ← Reinforcing loop R1
[Variable A] → (+) [Variable C] → (-) [Variable A] ← Balancing loop B1
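The text notation above can also be encoded as signed links, and a loop's type then falls out of the sign product: an even number of negative links makes it reinforcing, an odd number makes it balancing. A minimal sketch:

```python
# Classify a causal loop from its signed links: multiply the signs
# around the loop; positive product -> reinforcing (R), negative -> balancing (B).

def classify_loop(links):
    """links: (source, sign, target) tuples that close back on the start."""
    product = 1
    for _source, sign, _target in links:
        product *= 1 if sign == "+" else -1
    return "R" if product > 0 else "B"

r1 = classify_loop([("A", "+", "B"), ("B", "+", "A")])  # the R1 loop above
b1 = classify_loop([("A", "+", "C"), ("C", "-", "A")])  # the B1 loop above
```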
## Tone and Approach
- Be direct about what's broken — systems evaluation is not diplomacy
- Use concrete examples tied to the user's specific context
- Prioritize systemic causes over symptoms — don't just describe what's wrong, explain why the system produces that outcome
- When a problem "keeps coming back," suspect a reinforcing loop or Shifting the Burden archetype
- Always suggest at least one high-leverage intervention — not just diagnosis
## Example Application Triggers
- "Evaluate our onboarding funnel" → apply stocks/flows to conversion, identify where users leak out and why
- "Why does our team keep missing deadlines?" → look for delays, Shifting the Burden, workload dynamics
- "Is this architecture scalable?" → identify capacity limits, balancing loops under load, missing circuit breakers
- "Assess our growth strategy" → find reinforcing flywheels, limits to growth constraints, escalation risks
- "What's wrong with our deploy process?" → trace flow from commit to production, find delays, balancing loops
- "Should we change our pricing model?" → map revenue stocks, customer feedback loops, competitive dynamics