governance-and-risk
Governance and Risk
Overview
This skill implements the Decision Analysis & Resolution (DAR) and Risk Management (RSKM) process areas from the CMMI-based SDLC prescription.
Core principle: Proactive governance prevents costly reactive firefighting. Documentation and risk management are investments that pay 3-10x returns by avoiding crisis mode.
Critical distinction:
- Reactive: Handle problems when they occur (expensive, stressful, compounding)
- Proactive: Identify and mitigate problems before they occur (cheap, controlled, preventive)
Reference: See docs/sdlc-prescription-cmmi-levels-2-4.md Sections 3.4.1 (DAR) and 3.4.2 (RSKM) for complete policy.
When to Use
Use this skill when:
- Making architectural or technical decisions without ADRs
- Hearing "it's obvious" or "everyone agrees" (groupthink red flag)
- Skipping risk identification ("what could go wrong?")
- Accepting risks without mitigation plans
- Deferring to authority without independent analysis (CTO says, tech lead suggests)
- Using sunk cost to justify decisions ("we've already invested...")
- Treating governance as bureaucracy or overhead
- Running with no ongoing risk monitoring ("set and forget")
Do NOT use for:
- Trivial decisions (variable names, code style) → Use coding standards
- Implementation details → Use design-and-build skill
- Security-specific risk analysis → Use ordis-security-architect
Quick Reference
| Situation | Framework | Mandatory At | Key Action |
|---|---|---|---|
| "Obvious" architectural decision | DAR with ADR | Level 3+ | Document alternatives even if choice is clear |
| High-risk decision (vendor, framework) | DAR with decision matrix | Level 2+ for high-risk | Evaluate alternatives before committing |
| Authority wants specific option | DAR with independent analysis | Level 3+ | Analyze alternatives BEFORE authority input |
| External dependency (API, vendor) | RSKM with mitigation | Level 2+ | Risk register + mitigation plan mandatory |
| "Low-risk" project | RSKM with risk identification | Level 2+ | Optimism bias - identify risks proactively |
| Mid-project (risk monitoring) | RSKM review cadence | Level 3+ | Scheduled reviews, not set-and-forget |
Governance Level Framework
When Practices Are MANDATORY
Level 2 Baseline (All Projects):
- ADRs for high-risk decisions (vendor selection, framework choice, data storage)
- Risk identification with basic register
- Mitigation plans for high-probability or high-impact risks
Level 3 Organizational Standard:
- ADRs for all architectural decisions (not just high-risk)
- Alternatives analysis with decision criteria
- Risk register with probability/impact classification
- Scheduled risk reviews (not set-and-forget)
- Independent analysis before authority/consensus input
Level 4 Quantitative:
- Statistical risk models
- Quantitative decision criteria
- Process performance baselines for decision quality
When Practices Are OPTIONAL
Level 1 or Low-Risk Projects:
- Internal prototypes (< 2 week lifespan)
- Single-developer projects with no audit requirements
- Throwaway code (spikes, experiments)
CRITICAL: "Low-risk" is often optimism bias. Verify with risk assessment before declaring optional.
Anti-Patterns and Rationalizations
"It's Obvious"
Detection: "Everyone agrees", "clear choice", "no brainer"
Why it's tempting: Saves time, reduces documentation burden, team aligned
Why it fails: Today's "obvious" is tomorrow's mysterious. Future maintainers lack context, assumptions go unvalidated, and alternatives go unconsidered.
Counter:
- Level 3 requirement: Document even "obvious" decisions
- Context loss timeline: 6 months for team turnover, 3 months for forgotten assumptions
- Question to ask: "If someone joins the team in 6 months, will they know WHY we chose this?"
- Lightweight ADR takes 20 minutes, saves hours of future confusion
Red flags: "We all know", "Obviously", "No need to write it down"
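To make the "lightweight ADR" concrete, here is a minimal sketch in Python. The field names and the example decision are illustrative assumptions, not the template from templates.md; the only point is that an ADR records the context, the choice, the WHY, and the alternatives that were rejected.

```python
from dataclasses import dataclass, field


@dataclass
class Alternative:
    name: str
    rejected_because: str


@dataclass
class LightweightADR:
    """Hypothetical ADR shape; field names are assumptions for illustration."""
    title: str
    context: str      # the problem and constraints at decision time
    decision: str     # what was chosen
    rationale: str    # WHY it was chosen
    alternatives: list[Alternative] = field(default_factory=list)
    status: str = "Accepted"

    def missing_sections(self) -> list[str]:
        """Return the sections a reader in 6 months would find empty."""
        missing = [name for name in ("context", "decision", "rationale")
                   if not getattr(self, name).strip()]
        if not self.alternatives:
            missing.append("alternatives")
        return missing

    def render(self) -> str:
        """Render the ADR as plain text, one section per line."""
        lines = [
            f"ADR: {self.title}",
            f"Status: {self.status}",
            f"Context: {self.context}",
            f"Decision: {self.decision}",
            f"Rationale: {self.rationale}",
            "Alternatives considered:",
        ]
        lines += [f"- {a.name}: rejected because {a.rejected_because}"
                  for a in self.alternatives]
        return "\n".join(lines)


# Made-up example of an "obvious" decision written down anyway.
adr = LightweightADR(
    title="Use PostgreSQL for order storage",
    context="Orders need transactional writes; the team already runs PostgreSQL.",
    decision="Store orders in the existing PostgreSQL cluster.",
    rationale="Transactional guarantees and no new operational burden.",
    alternatives=[Alternative("Document store", "no multi-row transactions needed elsewhere")],
)
incomplete = LightweightADR(title="Pick a queue", context="", decision="Use queue X", rationale="")
```

Calling `incomplete.missing_sections()` lists the gaps (context, rationale, alternatives), which is a cheap check to run before an "obvious" decision is treated as documented.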
"Low-Risk Project"
Detection: "Simple project", "Internal only", "We've done this before", "What could go wrong?"
Why it's tempting: Small scope, experienced team, reduces overhead
Why it fails: Scope creep, resource constraints, and timeline slips hit "simple" projects just as often as complex ones. Optimism bias blinds the team to these risks.
Counter:
- Level 2 requirement: Risk identification for ALL projects
- Common risks for "simple" projects: scope creep (stakeholders add "just one more thing"), resource availability (PTO, competing priorities), data access (permissions, security approvals), timeline slip (integration surprises)
- Reactive firefighting costs 3-10x proactive planning
- 30-minute risk session saves days of crisis mode
Red flags: "What could go wrong?", "It's just...", "Low-risk"
"Authority/CTO Prefers It"
Detection: "CTO met with vendor", "Tech lead suggested", "Management wants"
Why it's tempting: Reduces conflict, speeds decision, aligns with leadership
Why it fails: Authority bias prevents genuine alternatives analysis. Senior stakeholders have blind spots, vendor relationships create bias, title ≠ technical correctness
Counter:
- Level 3 requirement: Independent alternatives analysis BEFORE authority input
- Document decision criteria first (security, cost, integration, vendor stability)
- Evaluate options against criteria WITHOUT authority preference
- Present analysis to authority: "Here's what the data shows, here's your preference, here's my recommendation"
- Authority can override, but the override must be documented as "decision override based on non-technical factors"
Red flags: "CTO wants", "We should align with leadership", "Don't want to contradict"
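The "criteria first, preference second" sequence can be sketched as a weighted decision matrix. This is a minimal illustration, not the tool from dar-methodology.md: the weights, the 1-5 ratings, and the option names are all made-up assumptions. What matters is that weights are fixed before anyone states a preference.

```python
def score_options(
    weights: dict[str, float],
    ratings: dict[str, dict[str, int]],
) -> list[tuple[str, float]]:
    """Rank options by weighted average rating, highest first."""
    total_weight = sum(weights.values())
    ranked = []
    for option, option_ratings in ratings.items():
        missing = set(weights) - set(option_ratings)
        if missing:
            raise ValueError(f"{option} is missing ratings for: {sorted(missing)}")
        score = sum(weights[c] * option_ratings[c] for c in weights) / total_weight
        ranked.append((option, round(score, 2)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Step 1: agree on criteria and weights (hypothetical values).
WEIGHTS = {"security": 4, "cost": 2, "integration": 3, "vendor_stability": 1}

# Step 2: rate each option 1-5 against the criteria (hypothetical values).
RATINGS = {
    "Vendor A": {"security": 3, "cost": 4, "integration": 2, "vendor_stability": 5},
    "Vendor B": {"security": 5, "cost": 2, "integration": 4, "vendor_stability": 3},
    "Build in-house": {"security": 4, "cost": 1, "integration": 5, "vendor_stability": 5},
}

# Step 3: rank, and only then compare against the authority's preference.
ranking = score_options(WEIGHTS, RATINGS)
for option, score in ranking:
    print(f"{option}: {score}")
```

With these example numbers Vendor B ranks first. If leadership still prefers Vendor A, the ranking stays in the ADR next to the documented override.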
"We've Already Invested Time" (Sunk Cost)
Detection: "We've had 2 sales calls", "Demo account set up", "Already started integration"
Why it's tempting: Feels wasteful to "go backwards", momentum toward choice
Why it fails: Sunk cost fallacy - past investment doesn't validate future commitment. Small sunk cost vs large future cost (vendor lock-in, wrong tool).
Counter:
- Name the fallacy: "This is sunk cost fallacy"
- Calculate future cost: "2 sales calls (4 hours sunk) vs 3-year vendor lock-in (hundreds of hours if wrong choice)"
- Reframe: "We invested 4 hours evaluating Option A. Should we invest 2 hours evaluating Options B and C to validate?"
- Past investment gives you evaluation data, not decision commitment
Red flags: "We've already", "Going backwards", "Wasted effort"
"Trust the Vendor" / "99.9% SLA"
Detection: "Established company", "Good reputation", "SLA guarantees uptime"
Why it's tempting: Vendor reputation, SLA promises reduce perceived risk
Why it fails: SLAs are probabilistic, not guarantees. 99.9% = 43 minutes downtime per month. All vendors have outages. Trust ≠ technical mitigation.
Counter:
- Calculate SLA impact: 99.9% uptime = 43 min/month, 8.76 hours/year. Acceptable for your use case?
- Mitigation still required: Circuit breaker, fallback, queueing, graceful degradation
- Vendor reputation reduces probability but doesn't eliminate risk
- Question: "What happens to our users if vendor API is down for 1 hour? Do we have a plan?"
Red flags: "We can trust them", "SLA is good enough", "Reputable company"
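The SLA arithmetic above is easy to reproduce for other uptime figures. A minimal sketch, assuming a 30-day month and a 365-day year (the function name is made up for illustration):

```python
def allowed_downtime_minutes(sla_percent: float, period_hours: float) -> float:
    """Downtime an SLA still permits over a period, in minutes."""
    if not 0 <= sla_percent <= 100:
        raise ValueError("sla_percent must be between 0 and 100")
    return (1 - sla_percent / 100) * period_hours * 60


HOURS_PER_MONTH = 30 * 24   # assumption: 30-day month
HOURS_PER_YEAR = 365 * 24   # assumption: non-leap year

monthly_minutes = allowed_downtime_minutes(99.9, HOURS_PER_MONTH)
yearly_hours = allowed_downtime_minutes(99.9, HOURS_PER_YEAR) / 60

# Roughly 43 minutes per month and 8.76 hours per year for 99.9%.
print(f"99.9% allows about {monthly_minutes:.1f} min/month, {yearly_hours:.2f} h/year")
```

The SLA does not say when that downtime falls or whether it arrives as one outage, so the mitigation question is the same at any percentage.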
"We'll Fix It If It Happens"
Detection: "Handle issues as they come up", "React when needed", "Cross that bridge"
Why it's tempting: Defers work, avoids speculation, focuses on current tasks
Why it fails: Reactive firefighting costs 3-10x proactive mitigation. Incidents occur when you have least capacity to respond (deadlines, weekends, vacations).
Counter:
- Cost math: 1 hour mitigation planning now vs 10 hours firefighting later
- Reactive timing: Incidents don't wait for convenient times - they hit during sprints, before demos, on Friday evenings
- Level 2 requirement: Mitigation plan for high-probability or high-impact risks BEFORE acceptance
- Question: "Do you have 10 hours next week to drop everything and firefight this risk if it materializes?"
Red flags: "We'll handle it", "If it happens", "Cross that bridge when we come to it"
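The Level 2 rule (a mitigation plan for high-probability or high-impact risks BEFORE acceptance) can be expressed as a simple check over a risk register. This is a sketch under assumptions: the three-step rating scale, the field names, and the example risks are illustrative, not the register format from rskm-methodology.md.

```python
from dataclasses import dataclass
from enum import IntEnum


class Rating(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Risk:
    description: str
    probability: Rating
    impact: Rating
    mitigation: str = ""  # empty means no plan yet

    @property
    def needs_mitigation(self) -> bool:
        """High probability OR high impact triggers the mitigation requirement."""
        return Rating.HIGH in (self.probability, self.impact)


def unmitigated(register: list[Risk]) -> list[Risk]:
    """Risks that must not be accepted yet: a plan is required but missing."""
    return [r for r in register if r.needs_mitigation and not r.mitigation.strip()]


# Hypothetical register entries.
register = [
    Risk("Vendor API outage", Rating.MEDIUM, Rating.HIGH,
         mitigation="Circuit breaker plus queued retries"),
    Risk("Scope creep from stakeholders", Rating.HIGH, Rating.MEDIUM),
    Risk("Key developer on PTO during integration", Rating.LOW, Rating.MEDIUM),
]

blocked = unmitigated(register)
for risk in blocked:
    print(f"Needs a mitigation plan before acceptance: {risk.description}")
```

In this example only the scope-creep risk is flagged: the vendor outage already has a plan, and the PTO risk is below the threshold.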
"Risks Haven't Materialized" (Complacency)
Detection: "4 months in, no issues", "Original risks didn't hit", "We're good"
Why it's tempting: Past success validates approach, monitoring feels wasteful
Why it fails: Risks evolve throughout the project lifecycle. Absence of risks to date ≠ absence of future risks. Complacency tends to set in just before the late-stage crunch (integration, final testing, deployment).
Counter:
- Lifecycle risk evolution: Early risks (requirements, team ramp-up) vs late risks (integration, tech debt, timeline crunch)
- Month 4 of 6: Integration testing, timeline pressure, technical debt, scope control
- Level 3 requirement: Scheduled risk reviews, not set-and-forget
- New risks emerge, probabilities shift, priorities change
Red flags: "No problems yet", "We're on track", "Monitoring feels like overhead"
"Process Feels Like Bureaucracy"
Detection: "Overhead", "Red tape", "Meetings for meetings' sake", "We want to code"
Why it's tempting: Team wants to deliver, documentation feels unproductive
Why it fails: Lightweight process prevents heavyweight problems. 30 min planning saves hours of firefighting. Process ≠ bureaucracy.
Counter:
- Process vs bureaucracy: Process has ROI (30 min → saves hours). Bureaucracy has no ROI (forms for forms' sake).
- Lightweight governance: 20-min ADR, 30-min risk session, 15-min risk review
- Cost comparison: 30 min process now vs 10+ hours crisis later
- Question: "Would you rather spend 30 minutes planning or 10 hours firefighting next month?"
Red flags: "Bureaucracy", "Overhead", "Red tape", "Slows us down"
"We're Tired / Under Pressure"
Detection: "Just finished major release", "Deadline is tight", "Team exhausted"
Why it's tempting: Exhaustion and deadlines are real, shortcuts feel necessary
Why it fails: Shortcuts under pressure create more pressure later. Technical debt compounds into crisis. Skipping governance creates future exhaustion.
Counter:
- Compound effect: Skipping governance now creates 3x more work later
- Pressure math: 2 hours deadline pressure now vs 10+ hours crisis pressure later
- When you're exhausted is exactly when you need process (prevents mistakes)
- Question: "Will skipping governance make the NEXT deadline easier or harder?"
Red flags: "We're exhausted", "Too busy", "Under pressure", "Just this once"
"We'll Document Later"
Detection: "After we ship", "When we have time", "In the next sprint"
Why it's tempting: Defers effort, focuses on delivery now
Why it fails: "Later" never comes. Context is lost immediately. Future maintainers suffer.
Counter:
- Historical pattern: Deferred documentation is rarely written ("later" has roughly a 5% success rate as a rule of thumb)
- Context loss: Starts immediately, complete within 2 weeks
- Requirement: Documentation is part of "done", not optional follow-up
- Question: "When exactly is 'later'? Put it on the calendar now."
Red flags: "Later", "After we ship", "When we have time", "Eventually"
Handling "My Project Is Special" Exceptions
Common exception requests:
- "We're a startup, need to move fast"
- "This is just an MVP/prototype"
- "We'll upgrade to proper governance after product-market fit"
- "Our team is experienced, we don't need process"
- "This project is different because..."
Why it's tempting: Context appears legitimately exceptional, constraints feel unique, team confidence is high
Why it fails: Every team thinks they're special. Startups fail from poor decisions as often as slow delivery. "MVP" and "prototype" often become production. "After product-market fit" never arrives.
Response framework:
- Acknowledge the constraint: "Startup time pressure IS real, I understand the urgency"
- Reframe governance as enabler: "Governance prevents the costly mistakes that kill startups - bad vendor choices, unmitigated risks, undocumented decisions that create chaos during scaling"
- Offer Level 2 as minimum: "Not suggesting heavyweight process. Level 2 = 30-min risk session, ADRs for high-risk decisions only. That's ~2% overhead, not 20%."
- Make exception criteria explicit:
  - Non-negotiable Level 2 minimum: Security-critical, customer-facing, financially-material, or >3 month projects
  - Level 1 acceptable: Internal prototype <2 weeks, single developer, throwaway code, no audit trail needed
  - Exception must be documented: If declaring Level 1, document why in project README
- Show the math: "30 minutes planning vs 10+ hours crisis firefighting. Which timeline can your startup afford?"
Red flags indicating exception is rationalization, not legitimate:
- "Just this once" (it's never just once)
- "We'll add governance later" (later has 5% success rate)
- "Trust me, I've done this before" (past success ≠ future guarantee)
- Can't articulate specific criteria for when to upgrade governance
Non-negotiable boundaries:
- Customer-facing code: Minimum Level 2
- Financial transactions: Minimum Level 2
- Security-sensitive (PII, auth, payments): Minimum Level 2
- >3 month timeline: Minimum Level 2
- >5 developers: Minimum Level 3
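These boundaries can be sketched as a small function that returns the governance floor for a project. It is illustrative only: the `Project` fields are hypothetical names, and the last two boundaries are read as "more than 3 months" and "more than 5 developers", an assumption consistent with the ">3 month projects" criterion in the response framework.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical project attributes used to pick a governance floor."""
    customer_facing: bool = False
    financial_transactions: bool = False
    security_sensitive: bool = False  # PII, auth, payments
    timeline_months: float = 0.0
    developers: int = 1


def minimum_level(project: Project) -> int:
    """Return the lowest governance level the boundaries allow."""
    level = 1
    if (project.customer_facing
            or project.financial_transactions
            or project.security_sensitive
            or project.timeline_months > 3):  # assumption: "more than 3 months"
        level = 2
    if project.developers > 5:                # assumption: "more than 5 developers"
        level = 3
    return level


# Made-up projects for illustration.
startup_mvp = Project(customer_facing=True, financial_transactions=True,
                      timeline_months=2, developers=3)
internal_spike = Project(timeline_months=0.5, developers=1)
platform_team = Project(timeline_months=12, developers=8)

for name, project in [("startup MVP", startup_mvp),
                      ("internal spike", internal_spike),
                      ("platform team", platform_team)]:
    print(f"{name}: minimum Level {minimum_level(project)}")
```

The function only computes a floor. A result of 1 still requires the documented exception described above, and a project may always choose a higher level than its minimum.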
Example response:
User: "We're a startup MVP, can we skip the ADRs?"
Agent: "I understand the pressure to move fast. Let's find the right balance:
Your MVP is customer-facing with payment processing → Level 2 minimum, not negotiable.
Level 2 = lightweight, not heavyweight:
- ADRs for high-risk decisions only (database, payment gateway, auth provider)
- 30-min risk session (identify risks, plan mitigation for top 3)
- Total overhead: ~2% of project time
This prevents the mistakes that kill MVPs:
- Wrong payment provider → costly migration mid-growth
- No auth security planning → breach before Series A
- Undocumented decisions → chaos when scaling team
ROI: 2 hours planning saves 20+ hours crisis firefighting.
Can we start with risk identification? 30 minutes now."
Reference Sheets
The following reference sheets provide detailed methodologies for specific governance domains. Load them on-demand when needed.
1. Decision Analysis & Resolution (DAR)
When to use: Making architectural decisions, evaluating alternatives, documenting choices
→ See dar-methodology.md
Covers:
- When ADRs are mandatory vs optional
- ADR template and examples
- Decision criteria frameworks
- Alternatives analysis process
- Decision matrix tools
- Authority bias resistance
2. Risk Management (RSKM)
When to use: Identifying risks, assessing probability/impact, planning mitigation, monitoring risks
→ See rskm-methodology.md
Covers:
- Risk identification techniques
- Probability × Impact matrix
- Risk mitigation strategies (avoid, transfer, mitigate, accept)
- Risk register template
- Monitoring and review cadence
- Risk triggers for ad-hoc reviews
3. Templates and Examples
When to use: Need concrete templates for ADRs or risk registers
→ See templates.md
Covers:
- ADR template (lightweight and comprehensive)
- Risk register format
- Decision matrix template
- Real-world examples
4. Level 2→3→4 Scaling
When to use: Understanding appropriate governance rigor for project tier
→ See level-scaling.md
Covers:
- Level 2 baseline practices
- Level 3 organizational standards
- Level 4 quantitative management
- When to escalate or de-escalate rigor
Common Mistakes
| Mistake | Why It Fails | Better Approach |
|---|---|---|
| "Obvious" decisions undocumented | Context loss in 6 months, assumptions not validated | Level 3: Document all architectural decisions, even "obvious" ones |
| Alternatives analysis after commitment | Analysis becomes validation theater | Evaluate alternatives BEFORE authority/consensus input |
| Risk acceptance without mitigation | Reactive firefighting costs 3-10x | Mitigation plan required for high-probability or high-impact risks |
| Set-and-forget risk planning | Risks evolve, complacency before late-stage crunch | Scheduled reviews based on project length |
| Deferring to authority without analysis | Authority bias, vendor relationships create blind spots | Independent analysis first, authority input second |
| Sunk cost justifies decision | Small sunk cost vs large future cost | Name the fallacy, calculate future cost |
| "We'll document later" | "Later" never comes (5% success rate) | Documentation = part of "done" |
Integration with Other Skills
| When You're Doing | Also Use | For |
|---|---|---|
| Creating ADRs | design-and-build | Technical decision criteria |
| Risk identification for security | ordis-security-architect | Security-specific risk techniques |
| Decision analysis with data | quantitative-management | Quantitative decision criteria |
| Requirements with risks | requirements-lifecycle | Risk-driven requirements prioritization |
Real-World Impact
Without this skill, teams experience:
- "Obvious" decisions become mysterious (context loss)
- Authority bias and groupthink (bad decisions)
- Reactive firefighting (3-10x cost)
- No risk mitigation (crisis mode when risks materialize)
- Documentation never happens ("later")
With this skill, teams achieve:
- Documented decisions with rationale (knowledge retention)
- Independent alternatives analysis (better decisions)
- Proactive risk mitigation (prevent crisis)
- Ongoing risk monitoring (adapt to changing conditions)
- Governance as lightweight process (ROI-positive)
Next Steps
- Determine project level: Check CLAUDE.md or ask user for CMMI target level (default: Level 3)
- Identify situation: Use Quick Reference table to find applicable framework
- Load reference sheet: Read detailed methodology (DAR or RSKM)
- Enforce requirements: Level 3 requires ADRs for all architectural decisions, risk mitigation for high risks
- Counter rationalizations: Use anti-pattern catalog to address shortcuts
- Provide templates: Lightweight ADR or risk register to reduce friction
- Calculate ROI: Show cost comparison (30 min planning vs 10+ hours firefighting)
Remember: Proactive governance prevents costly reactive firefighting. Documentation and risk management are investments with 3-10x returns.