systems-thinking
Systems Thinking Evaluation Skill
Apply a rigorous systems lens to evaluate what's working, what isn't, and why in any system — technical, organizational, product, or process.
When to Use This Skill
Trigger this skill whenever the user asks you to:
- Evaluate or assess a system, product, process, strategy, or architecture
- Understand why something isn't working as expected
- Identify bottlenecks, failure points, or risks
- Suggest where to intervene for the most impact
- Review a design or plan before execution
Core Mental Models to Apply
1. Identify the System Boundary
- What's inside the system (in scope)?
- What's outside (environment, dependencies, constraints)?
- Where does the system begin and end?
2. Stocks and Flows
- Stocks: What accumulates over time? (users, debt, trust, knowledge, bugs, revenue)
- Flows: What increases or decreases those stocks? (acquisition, churn, learning, entropy)
- Where are flows blocked, broken, or leaking?
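The stock-and-flow questions above can be made concrete with a small simulation. This is a minimal sketch, assuming a hypothetical user base with a constant monthly inflow and a churn outflow proportional to the stock; the function name and all numbers are illustrative, not taken from a real system.

```python
def simulate_stock(initial, inflow_per_step, outflow_rate, steps):
    """Track one stock under a constant inflow and a proportional outflow."""
    stock = initial
    history = [stock]
    for _ in range(steps):
        outflow = stock * outflow_rate  # the leak grows with the stock
        stock = stock + inflow_per_step - outflow
        history.append(stock)
    return history


# Hypothetical numbers: 1000 users, 100 sign-ups per month, 5% monthly churn.
users = simulate_stock(initial=1000, inflow_per_step=100, outflow_rate=0.05, steps=120)

# The stock stops changing where inflow equals outflow.
equilibrium = 100 / 0.05
```

In this toy model the user count drifts toward the equilibrium level instead of growing without bound. That is the kind of ceiling a leaking flow imposes even when acquisition looks healthy.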
3. Feedback Loops
- Reinforcing loops (R): Self-amplifying dynamics, either virtuous cycles or vicious spirals
  - Example: More users → more content → more users (growth flywheel)
  - Example: More bugs → less trust → fewer contributors → more bugs
- Balancing loops (B): Self-correcting, goal-seeking dynamics
  - Example: High load → auto-scale → stable performance
  - Example: User complaints → support → resolution → satisfaction
- Ask: Which loops dominate the system's current behavior?
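The two loop types behave very differently over time. The sketch below contrasts them under simple assumptions: the reinforcing loop adds a fixed fraction of the current value each step, and the balancing loop closes a fixed fraction of the gap to a goal. The function names and parameters are hypothetical.

```python
def reinforcing_loop(value, gain, steps):
    """Each step adds a fraction of the current value, so growth feeds on itself."""
    history = [value]
    for _ in range(steps):
        value += gain * value
        history.append(value)
    return history


def balancing_loop(value, goal, correction, steps):
    """Each step closes a fraction of the gap between the value and its goal."""
    history = [value]
    for _ in range(steps):
        value += correction * (goal - value)
        history.append(value)
    return history


growth = reinforcing_loop(value=100.0, gain=0.10, steps=10)
settle = balancing_loop(value=100.0, goal=50.0, correction=0.5, steps=10)
```

Here `growth` compounds away from its starting point while `settle` converges on the goal. When both kinds of loop act on the same variable, whichever has the larger effect at the moment dominates the observed behavior.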
4. Delays
- Where are there time lags between cause and effect?
- Delays often cause oscillation, overcorrection, or invisible failures
- Example: Hiring takes 3 months → team overloads → burnout → more attrition
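To see why delays cause overcorrection, consider a goal-seeking loop that acts on stale information. This is a minimal sketch with hypothetical parameters: the correction at each step is based on the value observed `delay` steps earlier.

```python
def goal_seeking(goal, correction, delay, steps, start=0.0):
    """Balancing loop whose correction uses a reading that is `delay` steps old."""
    history = [start]
    for t in range(steps):
        observed = history[max(0, t - delay)]  # stale measurement of the stock
        history.append(history[t] + correction * (goal - observed))
    return history


no_delay = goal_seeking(goal=100.0, correction=0.3, delay=0, steps=30)
delayed = goal_seeking(goal=100.0, correction=0.3, delay=3, steps=30)

peak_without_delay = max(no_delay)
peak_with_delay = max(delayed)
```

With these numbers the undelayed run approaches the goal from below, while the delayed run keeps correcting against an outdated gap and overshoots it.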
5. System Archetypes (common failure patterns)
Match observed behavior to known archetypes:
| Archetype | Pattern | Signal |
|---|---|---|
| Limits to Growth | Growth hits a constraint and stalls | Plateau despite investment |
| Fixes that Fail | Quick fix creates new problems | Recurring issues after "solutions" |
| Shifting the Burden | Symptomatic fixes erode fundamental ones | Team always firefighting |
| Tragedy of the Commons | Shared resources are depleted | Quality/performance degrades over time |
| Escalation | Competing actors amplify each other | Bidding wars, arms races |
| Drifting Goals | Performance gap closed by lowering standards | "Good enough" keeps declining |
| Accidental Adversaries | Well-meaning actors undermine each other | Misaligned incentives between teams |
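As an illustration of the first archetype in the table, Limits to Growth can be sketched as a reinforcing loop damped by a balancing loop that strengthens as the stock nears a constraint. The logistic form and all parameters here are assumptions chosen for illustration.

```python
def limits_to_growth(value, growth_rate, capacity, steps):
    """Reinforcing growth damped by a balancing loop as value nears capacity."""
    history = [value]
    for _ in range(steps):
        headroom = 1 - value / capacity  # shrinks toward 0 near the limit
        value += growth_rate * value * headroom
        history.append(value)
    return history


adoption = limits_to_growth(value=10.0, growth_rate=0.5, capacity=1000.0, steps=40)

early_gain = adoption[5] - adoption[4]    # growth while headroom is plentiful
late_gain = adoption[40] - adoption[39]   # growth once the constraint binds
```

The late gain is far smaller than the early gain even though the growth rate never changed. This matches the "plateau despite investment" signal: more effort on the growth engine does little until the constraint itself moves.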
6. Leverage Points
Rank interventions by impact (from lowest to highest leverage):
- Numbers (parameters, budgets, quotas) — low leverage
- Buffer sizes and stock capacities
- Flow rates and delays
- Feedback loop strength
- Information flows (who has access to what, when)
- Rules and incentives
- Goals of the system
- Power to change the system's structure
- Mindsets and paradigms — highest leverage
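One way to apply this ordering is to tag each candidate intervention with its leverage level and sort on it. The sketch below assumes short hypothetical labels for the nine levels listed above and invented example interventions.

```python
# Levels in the order listed above, from lowest to highest leverage.
LEVERAGE_LEVELS = [
    "numbers",
    "buffers",
    "flow rates and delays",
    "feedback loop strength",
    "information flows",
    "rules and incentives",
    "goals",
    "power to change structure",
    "mindsets and paradigms",
]


def rank_interventions(interventions):
    """Sort (description, level) pairs from highest to lowest leverage."""
    return sorted(
        interventions,
        key=lambda item: LEVERAGE_LEVELS.index(item[1]),
        reverse=True,
    )


# Hypothetical candidates for a reliability problem.
candidates = [
    ("Raise the on-call budget", "numbers"),
    ("Share incident data with product teams", "information flows"),
    ("Tie bonuses to reliability", "rules and incentives"),
]
ranked = rank_interventions(candidates)
```

The ranking is only a starting point. Higher-leverage changes are usually harder to make, so cost and feasibility still need to be weighed separately.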
Evaluation Output Format
When evaluating a system, structure the response as follows:
🟢 Where the System Works
- Identify functioning feedback loops, healthy stocks, aligned incentives
- Call out genuine strengths (not to be polite — to understand what to protect)
🔴 Where the System Breaks Down
- Point to broken loops, leaking flows, missing feedback, or misaligned incentives
- For each issue, name the archetype if one applies
- Identify delays that hide the problem
⚠️ Key Risks and Failure Modes
- What could cause the system to tip into a bad equilibrium?
- Which reinforcing loop could flip from a virtuous cycle into a vicious spiral?
- What constraint will be hit next?
🎯 High-Leverage Interventions
- Ranked list of where to intervene
- For each: what changes, what loop or flow it affects, expected result
- Flag quick fixes that might backfire (Fixes that Fail archetype)
📊 System Diagram (optional, when helpful)
Describe or sketch a causal loop diagram in text:
[Variable A] → (+) [Variable B] → (+) [Variable A] ← Reinforcing loop R1
[Variable A] → (+) [Variable C] → (-) [Variable A] ← Balancing loop B1
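The R and B labels in a sketch like this follow from the link signs: a closed loop with an even number of negative links is reinforcing, and one with an odd number is balancing. A minimal sketch of that rule, using a hypothetical helper name:

```python
def loop_polarity(link_signs):
    """Classify a closed loop from its link signs (+1 or -1)."""
    product = 1
    for sign in link_signs:
        product *= sign
    # A positive product means an even number of negative links.
    return "R" if product > 0 else "B"


r1 = loop_polarity([+1, +1])  # A -> (+) B -> (+) A
b1 = loop_polarity([+1, -1])  # A -> (+) C -> (-) A
```

Checking each drawn loop this way catches diagrams where a loop is labeled reinforcing but its links actually describe a balancing dynamic.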
Tone and Approach
- Be direct about what's broken — systems evaluation is not diplomacy
- Use concrete examples tied to the user's specific context
- Prioritize systemic causes over symptoms — don't just describe what's wrong, explain why the system produces that outcome
- When a problem "keeps coming back," suspect a reinforcing loop or the Fixes that Fail or Shifting the Burden archetype
- Always suggest at least one high-leverage intervention — not just diagnosis
Example Application Triggers
- "Evaluate our onboarding funnel" → apply stocks/flows to conversion, identify where users leak out and why
- "Why does our team keep missing deadlines?" → look for delays, Shifting the Burden, workload dynamics
- "Is this architecture scalable?" → identify capacity limits, balancing loops under load, missing circuit breakers
- "Assess our growth strategy" → find reinforcing flywheels, limits to growth constraints, escalation risks
- "What's wrong with our deploy process?" → trace flow from commit to production, find delays, balancing loops
- "Should we change our pricing model?" → map revenue stocks, customer feedback loops, competitive dynamics