# Risk Register Skill
When creating or updating a risk register, follow this structured process. The goal is to maintain a living document that surfaces project risks early enough to act on them — before they become incidents, missed deadlines, or scope explosions.
IMPORTANT: Always save the output as a markdown file in the project-decisions/ directory at the project root. Create the directory if it doesn't exist.
PRINCIPLE: A good risk register is not a one-time document. It should be reviewed and updated every sprint. Risks change — new ones appear, old ones are mitigated, some become reality.
## 0. Output Setup

```bash
mkdir -p project-decisions

# File naming:
#   First time: project-decisions/YYYY-MM-DD-risk-register.md
#   Updates:    edit the existing file and add to the changelog at the bottom

# If a register already exists, update it; otherwise create a new one
ls project-decisions/*risk-register* 2>/dev/null
```
## 1. Risk Discovery

### 1a. Codebase & Technical Risks
```bash
# Complexity hotspots (high complexity = high risk of bugs)
find . -type f \( -name "*.ts" -o -name "*.js" -o -name "*.py" \) ! -path '*/node_modules/*' ! -path '*/dist/*' -exec wc -l {} + 2>/dev/null | sort -rn | head -15

# Files with highest churn (most changes = most fragile)
git log --name-only --since="3 months ago" --format="" -- src/ app/ 2>/dev/null | sort | uniq -c | sort -rn | head -15

# Files with the most bug fixes (where problems live)
git log --name-only --since="6 months ago" --grep="fix\|bug\|hotfix" --format="" -- src/ app/ 2>/dev/null | sort | uniq -c | sort -rn | head -10

# Dependency vulnerabilities
npm audit --json 2>/dev/null | head -50
pip-audit 2>/dev/null | head -20

# Outdated dependencies
npm outdated 2>/dev/null | head -20
pip list --outdated 2>/dev/null | head -20

# TODO/FIXME/HACK counts (known but unaddressed issues)
for tag in TODO FIXME HACK; do
  echo "$tag: $(grep -rn "$tag" --include='*.ts' --include='*.js' --include='*.py' src/ app/ 2>/dev/null | grep -v 'node_modules' | wc -l)"
done

# Test coverage gaps (untested code = risk)
find src/ app/ -type f \( -name "*.ts" -o -name "*.js" -o -name "*.py" \) ! -name "*.test.*" ! -name "*.spec.*" ! -name "test_*" ! -name "*.d.ts" ! -name "index.*" ! -path '*/node_modules/*' ! -path '*/dist/*' 2>/dev/null | while read -r f; do
  base=$(basename "$f" | sed 's/\.\(ts\|tsx\|js\|jsx\|py\)$//')
  if ! find . \( -name "${base}.test.*" -o -name "${base}.spec.*" -o -name "test_${base}.*" \) ! -path '*/node_modules/*' 2>/dev/null | grep -q .; then
    echo "UNTESTED: $f"
  fi
done | head -20

# Single points of failure (bus factor)
for f in $(git log --name-only --since="12 months ago" --format="" -- src/ 2>/dev/null | sort -u | head -30); do
  authors=$(git log --format='%aN' --since="12 months ago" -- "$f" 2>/dev/null | sort -u | wc -l)
  if [ "$authors" -eq 1 ]; then
    echo "BUS FACTOR 1: $f ($(git log --format='%aN' -1 -- "$f" 2>/dev/null))"
  fi
done | head -15

# Error handling vs. async surface (compare the two counts: a large gap
# suggests async code paths without error handling)
grep -rn "catch\|except\|rescue" --include="*.ts" --include="*.js" --include="*.py" src/ 2>/dev/null | wc -l
grep -rn "async function\|async def\|async (" --include="*.ts" --include="*.js" --include="*.py" src/ 2>/dev/null | wc -l

# Infrastructure configuration
cat docker-compose.yml Dockerfile 2>/dev/null | head -40
cat .github/workflows/*.yml 2>/dev/null | head -40

# Health checks and monitoring
grep -rn "health\|readiness\|liveness\|monitor\|sentry\|datadog\|prometheus" --include="*.ts" --include="*.js" --include="*.py" --include="*.yaml" --include="*.yml" . 2>/dev/null | grep -v "node_modules" | head -10

# Secrets management
grep -rn "process\.env\|os\.environ\|os\.Getenv" --include="*.ts" --include="*.js" --include="*.py" --include="*.go" src/ app/ 2>/dev/null | grep -v "node_modules\|test\|spec" | wc -l
ls .env .env.local .env.production 2>/dev/null
```
### 1b. Project & Delivery Risks

Evaluate from context, PRDs, and recent activity:
```bash
# Recent velocity (commits per week)
for week in 4 3 2 1 0; do
  start=$(date -d "$((week+1)) weeks ago" +%Y-%m-%d 2>/dev/null || date -v-$((week+1))w +%Y-%m-%d 2>/dev/null)
  end=$(date -d "$week weeks ago" +%Y-%m-%d 2>/dev/null || date -v-${week}w +%Y-%m-%d 2>/dev/null)
  count=$(git log --oneline --after="$start" --before="$end" 2>/dev/null | wc -l)
  echo "Week -$week: $count commits"
done

# PR cycle time (how long PRs stay open)
gh pr list --state merged --limit 10 --json number,title,createdAt,mergedAt 2>/dev/null | head -40

# Open PRs (work in progress)
gh pr list --state open --json number,title,createdAt,author 2>/dev/null | head -20

# Pending issues
gh issue list --state open --limit 20 --json number,title,labels,createdAt 2>/dev/null | head -40

# Recent incidents
ls project-decisions/*incident* 2>/dev/null

# Recent scope changes or decision records
ls project-decisions/ 2>/dev/null | tail -10

# Deadline references
grep -rn "deadline\|due date\|launch\|go-live\|ship by\|target date" --include="*.md" . 2>/dev/null | grep -v "node_modules\|\.git" | head -10
```
## 2. Risk Categories

### Technical Risks
| ID | Risk Category | What to Look For |
|---|---|---|
| T1 | Architecture | Single points of failure, monolith pain points, scaling bottlenecks, circular dependencies |
| T2 | Code Quality | High complexity files, low test coverage, excessive tech debt, code smells |
| T3 | Dependencies | Vulnerable packages, outdated major versions, unmaintained libraries, license issues |
| T4 | Security | Exposed secrets, injection vulnerabilities, auth gaps, data exposure |
| T5 | Performance | Slow queries, memory leaks, missing caching, N+1 problems |
| T6 | Data | Missing backups, no migration rollback, data integrity gaps, missing validation |
| T7 | Infrastructure | No redundancy, manual deployments, missing monitoring, no auto-scaling |
| T8 | Integration | Flaky third-party APIs, missing circuit breakers, undocumented API contracts |
### Delivery Risks
| ID | Risk Category | What to Look For |
|---|---|---|
| D1 | Timeline | Unrealistic deadlines, scope creep, incomplete requirements, blocked tasks |
| D2 | Resources | Team capacity constraints, key person dependency, skill gaps, competing priorities |
| D3 | Scope | Vague requirements, missing acceptance criteria, unbounded features, no MVP definition |
| D4 | Dependencies | Cross-team blockers, external vendor timelines, design deliverables, stakeholder approvals |
| D5 | Communication | Unclear ownership, missing documentation, no stakeholder alignment, siloed knowledge |
### Operational Risks
| ID | Risk Category | What to Look For |
|---|---|---|
| O1 | Availability | No SLA defined, missing health checks, no incident response plan, no runbooks |
| O2 | Disaster Recovery | No backup strategy, untested recovery, missing failover, no RTO/RPO targets |
| O3 | Compliance | GDPR gaps, missing audit logging, data retention policy unclear, security certifications pending |
| O4 | Support | No on-call rotation, missing runbooks, no escalation path, knowledge silos |
### Business Risks
| ID | Risk Category | What to Look For |
|---|---|---|
| B1 | Market | Competitive pressure, changing requirements, pivoting product direction |
| B2 | Vendor | Vendor lock-in, pricing changes, vendor stability, contract expiry |
| B3 | Revenue | Payment system reliability, billing accuracy, churn risk from outages |
| B4 | Reputation | Data breach risk, public-facing outage risk, user trust |
## 3. Risk Scoring

### Likelihood Scale
| Score | Level | Definition | Probability |
|---|---|---|---|
| 1 | Rare | Could happen but very unlikely in the next 3 months | < 10% |
| 2 | Unlikely | Possible but not expected | 10-30% |
| 3 | Possible | Could go either way | 30-60% |
| 4 | Likely | More likely than not | 60-85% |
| 5 | Almost Certain | Will very likely happen | > 85% |
### Impact Scale
| Score | Level | Definition | Examples |
|---|---|---|---|
| 1 | Negligible | Minor inconvenience, no user impact | Cosmetic bug, minor delay |
| 2 | Minor | Small user impact, easy to fix | Edge case bug, 1-2 day delay |
| 3 | Moderate | Noticeable impact, workaround exists | Feature degraded, 1 week delay |
| 4 | Major | Significant impact, hard to work around | Core feature broken, 2+ week delay, partial data loss |
| 5 | Severe | Critical failure, no workaround | Full outage, data breach, project cancelled, regulatory fine |
### Risk Score Matrix

```
                     IMPACT
            1     2     3     4     5
         ┌─────┬─────┬─────┬─────┬─────┐
       5 │  5  │ 10  │ 15  │ 20  │ 25  │
L        │ 🟡  │ 🟠  │ 🔴  │ 🔴  │ 🔴  │
I        ├─────┼─────┼─────┼─────┼─────┤
K      4 │  4  │  8  │ 12  │ 16  │ 20  │
E        │ 🟢  │ 🟡  │ 🟠  │ 🔴  │ 🔴  │
L        ├─────┼─────┼─────┼─────┼─────┤
I      3 │  3  │  6  │  9  │ 12  │ 15  │
H        │ 🟢  │ 🟡  │ 🟡  │ 🟠  │ 🔴  │
O        ├─────┼─────┼─────┼─────┼─────┤
O      2 │  2  │  4  │  6  │  8  │ 10  │
D        │ 🟢  │ 🟢  │ 🟡  │ 🟡  │ 🟠  │
         ├─────┼─────┼─────┼─────┼─────┤
       1 │  1  │  2  │  3  │  4  │  5  │
         │ 🟢  │ 🟢  │ 🟢  │ 🟢  │ 🟡  │
         └─────┴─────┴─────┴─────┴─────┘
```
Score ranges:

- 🟢 Low (1-4): Accept — monitor, no immediate action
- 🟡 Medium (5-9): Mitigate — plan mitigation, review regularly
- 🟠 High (10-15): Act — active mitigation required, escalate
- 🔴 Critical (16-25): Urgent — immediate action, executive visibility
### Risk Score Calculation

Risk Score = Likelihood × Impact

Example:

- Risk: "Key developer leaves before project completion"
- Likelihood: 3 (Possible)
- Impact: 4 (Major — critical knowledge loss, 2+ week delay)
- Score: 3 × 4 = 12 (🟠 High)
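The calculation above is mechanical, so it can be sketched as a small helper to use while drafting entries. `score_band` is an illustrative name, not part of the required process; the thresholds follow the score ranges stated above.

```shell
# score_band: likelihood (1-5) x impact (1-5) -> score plus severity band.
# Thresholds mirror the stated ranges: 1-4 Low, 5-9 Medium, 10-15 High, 16-25 Critical.
score_band() {
  local likelihood=$1 impact=$2
  local score=$((likelihood * impact))
  if   [ "$score" -ge 16 ]; then echo "$score 🔴 Critical"
  elif [ "$score" -ge 10 ]; then echo "$score 🟠 High"
  elif [ "$score" -ge 5 ];  then echo "$score 🟡 Medium"
  else                           echo "$score 🟢 Low"
  fi
}

score_band 3 4   # the key-developer example: prints "12 🟠 High"
```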
## 4. Risk Response Strategies
For each identified risk, choose a response strategy:
| Strategy | When to Use | Example |
|---|---|---|
| Avoid | Eliminate the risk entirely by changing approach | Don't use the unproven technology; use the established one instead |
| Mitigate | Reduce likelihood or impact | Add tests, create documentation, build redundancy |
| Transfer | Shift risk to a third party | Use managed service instead of self-hosting; buy insurance |
| Accept | Risk is low enough or unavoidable | Known minor UI bug that doesn't affect core functionality |
| Contingency | Prepare a plan B if the risk materializes | Rollback plan, backup vendor, alternative approach ready |
## 5. Risk Register Entry Format
Each risk should include:
### RISK-[ID]: [Title]
| Field | Value |
|-------|-------|
| **Category** | [Technical / Delivery / Operational / Business] |
| **Subcategory** | [T1-T8 / D1-D5 / O1-O4 / B1-B4] |
| **Description** | [What could happen and why] |
| **Trigger** | [What event or condition would cause this risk to materialize] |
| **Likelihood** | [1-5] [Rare/Unlikely/Possible/Likely/Almost Certain] |
| **Impact** | [1-5] [Negligible/Minor/Moderate/Major/Severe] |
| **Score** | [L × I] [🟢/🟡/🟠/🔴] |
| **Response** | [Avoid / Mitigate / Transfer / Accept / Contingency] |
| **Mitigation** | [Specific actions to reduce likelihood or impact] |
| **Contingency** | [What to do if the risk materializes] |
| **Owner** | [Person responsible for monitoring and acting] |
| **Status** | [Open / Mitigating / Mitigated / Accepted / Realized / Closed] |
| **Due Date** | [When mitigation should be complete] |
| **Evidence** | [Data from codebase scan, metrics, or observations] |
| **Linked Items** | [Related tickets, incidents, decisions] |
| **Last Reviewed** | [Date] |
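To avoid copy-paste drift in the field list, the skeleton above can be stamped out by a helper. `new_risk_entry` is a hypothetical convenience, assuming the exact table layout shown above:

```shell
# new_risk_entry ID TITLE: print a blank risk-entry skeleton to fill in
new_risk_entry() {
  local id=$1 title=$2
  cat <<EOF
### RISK-${id}: ${title}

| Field | Value |
|-------|-------|
| **Category** |  |
| **Description** |  |
| **Trigger** |  |
| **Likelihood** |  |
| **Impact** |  |
| **Score** |  |
| **Response** |  |
| **Mitigation** |  |
| **Contingency** |  |
| **Owner** |  |
| **Status** | Open |
| **Due Date** |  |
| **Evidence** |  |
| **Last Reviewed** | $(date +%Y-%m-%d) |
EOF
}

new_risk_entry 001 "Key developer leaves before project completion"
```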
## 6. Automated Risk Detection Rules

### Auto-Flag as 🔴 Critical

If any of these are true, auto-flag as a critical risk:
- Dependency with known critical CVE (CVSS ≥ 9.0)
- Secrets/credentials committed to git
- Production database has no backup configured
- Zero test coverage on authentication or payment code
- Single point of failure in production architecture
- No rollback strategy for upcoming deployment
- Key person dependency on critical path with no documentation
- Deadline is < 2 weeks and > 30% of scope is incomplete
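The last rule in the list above is purely numeric, so it can be expressed as a check. `days_left` and `pct_incomplete` are hypothetical inputs you would pull from your tracker; the 14-day and 30% thresholds come straight from the rule.

```shell
# deadline_risk DAYS_LEFT PCT_INCOMPLETE: flags the "deadline < 2 weeks
# and > 30% of scope incomplete" condition from the critical rules above
deadline_risk() {
  local days_left=$1 pct_incomplete=$2
  if [ "$days_left" -lt 14 ] && [ "$pct_incomplete" -gt 30 ]; then
    echo "🔴 CRITICAL: ${pct_incomplete}% of scope open with ${days_left} days left"
  else
    echo "OK"
  fi
}

deadline_risk 10 45   # prints the critical flag
deadline_risk 30 45   # prints "OK": deadline is far enough out
```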
### Auto-Flag as 🟠 High

If any of these are true, auto-flag as a high risk:
- Dependency with known high CVE (CVSS ≥ 7.0)
- Test coverage < 30% on modified files
- Files with > 500 lines and no tests
- Bus factor of 1 on > 5 critical files
- More than 20 unresolved TODOs/FIXMEs in critical paths
- No monitoring/alerting on production service
- Third-party API with no circuit breaker or fallback
- Sprint velocity declining for 3+ consecutive sprints
- PR cycle time > 5 days average
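The "files with > 500 lines and no tests" rule can be wired to the discovery commands from step 1a. This sketch assumes the same `src/` layout and `*.test.*` / `*.spec.*` naming; adjust the extension list for your stack.

```shell
# flag_large_untested: list TypeScript source files over 500 lines that
# have no matching test file anywhere in the repo (rule from step 6)
flag_large_untested() {
  find src/ -type f -name '*.ts' ! -name '*.test.*' ! -name '*.spec.*' \
      ! -path '*/node_modules/*' 2>/dev/null |
  while read -r f; do
    lines=$(wc -l < "$f")
    base=$(basename "$f" .ts)
    if [ "$lines" -gt 500 ] && \
       ! find . \( -name "${base}.test.*" -o -name "${base}.spec.*" \) \
           ! -path '*/node_modules/*' 2>/dev/null | grep -q .; then
      echo "🟠 HIGH: $f ($lines lines, no tests)"
    fi
  done
}

flag_large_untested
```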
### Auto-Flag as 🟡 Medium

If any of these are true, auto-flag as a medium risk:
- Dependencies > 6 months outdated
- No API documentation for public endpoints
- Missing .env.example or setup documentation
- No runbook for common failure scenarios
- Inconsistent error handling patterns
- Code duplication detected across > 3 files
## 7. Risk Trends
Track how risks change over time:
### Risk Trend: [Risk Title]
| Date | Likelihood | Impact | Score | Change | Notes |
|------|-----------|--------|-------|--------|-------|
| 2026-01-15 | 3 | 4 | 12 🟠 | — | Initial assessment |
| 2026-01-29 | 3 | 4 | 12 🟠 | → | No change, mitigation in progress |
| 2026-02-12 | 2 | 4 | 8 🟡 | ↓ | Tests added, documentation improved |
| 2026-02-19 | 2 | 3 | 6 🟡 | ↓ | Second engineer onboarded to module |
Trend: ↓ Improving
Trend symbols:

- ↑ Worsening (score increased)
- → Stable (no change)
- ↓ Improving (score decreased)
- ⚡ Realized (risk became an actual issue)
- ✅ Closed (risk eliminated, or accepted and documented)
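The arrow between two consecutive reviews follows mechanically from the scores; a minimal sketch (`trend_symbol` is an illustrative name):

```shell
# trend_symbol PREV_SCORE CURR_SCORE: map a score change to a trend arrow
trend_symbol() {
  local prev=$1 curr=$2
  if   [ "$curr" -gt "$prev" ]; then echo "↑ Worsening"
  elif [ "$curr" -lt "$prev" ]; then echo "↓ Improving"
  else                               echo "→ Stable"
  fi
}

trend_symbol 12 8    # the 2026-02-12 row in the example: prints "↓ Improving"
trend_symbol 12 12   # prints "→ Stable"
```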
## 8. Review Cadence
Recommended review schedule:
| Review Type | Frequency | Who | Focus |
|------------|-----------|-----|-------|
| Quick scan | Every sprint | TPM | New risks, status updates, score changes |
| Full review | Monthly | TPM + Tech Lead | All risks, trends, mitigation effectiveness |
| Deep dive | Quarterly | Full team | Architecture risks, strategic risks, historical trends |
| Ad-hoc | As needed | TPM | After incidents, major scope changes, team changes |
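A quick staleness check can back up the cadence table: flag the register when it has not been touched within a sprint. `check_review_due` is a sketch; 14 days is an assumption for a two-week sprint, so pass your own cadence as the second argument.

```shell
# check_review_due FILE [MAX_DAYS]: print "missing", "overdue", or "current"
# based on when the register file was last modified
check_review_due() {
  local reg=$1 max_days=${2:-14}
  if [ ! -f "$reg" ]; then
    echo "missing"
  elif [ -n "$(find "$reg" -mtime +"$max_days" 2>/dev/null)" ]; then
    echo "overdue"
  else
    echo "current"
  fi
}

# example invocation against the most recent register, if one exists
check_review_due "$(ls -t project-decisions/*risk-register*.md 2>/dev/null | head -1)"
```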
## Output Document Template

Save to `project-decisions/YYYY-MM-DD-risk-register.md`:
# Project Risk Register
**Project:** [Project Name]
**Last Updated:** YYYY-MM-DD
**Updated By:** [Name]
**Next Review:** YYYY-MM-DD
**Overall Risk Level:** [🟢 Low / 🟡 Medium / 🟠 High / 🔴 Critical]
---
## Risk Summary
| Severity | Count | Trend |
|----------|-------|-------|
| 🔴 Critical | X | [↑/→/↓] |
| 🟠 High | X | [↑/→/↓] |
| 🟡 Medium | X | [↑/→/↓] |
| 🟢 Low | X | [↑/→/↓] |
| **Total Open** | **X** | |
| Mitigated this period | X | |
| New this period | X | |
| Realized (became issues) | X | |
---
## Risk Heat Map
```
                     IMPACT
            1     2     3     4     5
         ┌─────┬─────┬─────┬─────┬─────┐
       5 │     │     │     │ R03 │     │
L        ├─────┼─────┼─────┼─────┼─────┤
I      4 │     │     │ R07 │ R01 │     │
K        ├─────┼─────┼─────┼─────┼─────┤
E      3 │     │ R09 │ R04 │ R02 │     │
L        ├─────┼─────┼─────┼─────┼─────┤
I      2 │ R10 │ R08 │ R06 │     │     │
H        ├─────┼─────┼─────┼─────┼─────┤
O      1 │     │ R11 │ R05 │     │     │
O        └─────┴─────┴─────┴─────┴─────┘
D
```
---
## Top Risks Requiring Action
| Rank | ID | Risk | Score | Owner | Status | Due |
|------|----|------|-------|-------|--------|-----|
| 1 | R01 | [Title] | 16 🔴 | [Name] | [Status] | [Date] |
| 2 | R02 | [Title] | 12 🟠 | [Name] | [Status] | [Date] |
| 3 | R03 | [Title] | 20 🔴 | [Name] | [Status] | [Date] |
---
## Full Risk Register
### 🔴 Critical Risks
#### RISK-001: [Title]
| Field | Value |
|-------|-------|
| **Category** | [Category] |
| **Description** | [What could happen] |
| **Trigger** | [What would cause this] |
| **Likelihood** | [X] — [Level] |
| **Impact** | [X] — [Level] |
| **Score** | [XX] 🔴 |
| **Response** | [Strategy] |
| **Mitigation** | [Actions] |
| **Contingency** | [Plan B] |
| **Owner** | [Name] |
| **Status** | [Status] |
| **Due Date** | [Date] |
| **Evidence** | [Codebase findings] |
| **Last Reviewed** | [Date] |
**Trend:**
| Date | L | I | Score | Change | Notes |
|------|---|---|-------|--------|-------|
| [Date] | X | X | XX | — | [Notes] |
---
[Repeat for each risk...]
---
### 🟠 High Risks
[Same format...]
### 🟡 Medium Risks
[Same format...]
### 🟢 Low Risks
[Same format...]
---
## Realized Risks (became actual issues)
| ID | Risk | Realized Date | Impact | Incident Link |
|----|------|--------------|--------|--------------|
| R05 | [Title] | YYYY-MM-DD | [Actual impact] | [Link to incident report] |
---
## Closed Risks
| ID | Risk | Closed Date | Reason |
|----|------|------------|--------|
| R12 | [Title] | YYYY-MM-DD | [Mitigated / Accepted / No longer relevant] |
---
## Risk Metrics
| Metric | Current | Previous | Trend |
|--------|---------|----------|-------|
| Total open risks | X | X | [↑/→/↓] |
| Average risk score | X.X | X.X | [↑/→/↓] |
| Critical + High risks | X | X | [↑/→/↓] |
| Risks mitigated this period | X | X | |
| Risks realized this period | X | X | |
| Mean time to mitigate | X days | X days | [↑/→/↓] |
| Overdue mitigations | X | X | [↑/→/↓] |
---
## Upcoming Mitigation Actions
| Risk ID | Action | Owner | Due | Status |
|---------|--------|-------|-----|--------|
| R01 | [Specific action] | [Name] | [Date] | ⬜ TODO |
| R02 | [Specific action] | [Name] | [Date] | 🔄 In Progress |
| R03 | [Specific action] | [Name] | [Date] | ⬜ TODO |
---
## Review Log
| Date | Type | Reviewer | Changes Made |
|------|------|----------|-------------|
| YYYY-MM-DD | Initial creation | [Name] | Created register with X risks |
| YYYY-MM-DD | Sprint review | [Name] | Updated R01, added R15, closed R05 |
| YYYY-MM-DD | Monthly review | [Name] | Full review, re-scored 3 risks |
After saving, update the project-decisions index:
```bash
printf '# Project Decisions\n\n' > project-decisions/README.md
echo "| Date | Decision | Type | Status |" >> project-decisions/README.md
echo "|------|----------|------|--------|" >> project-decisions/README.md
for f in project-decisions/2*.md; do
  date=$(basename "$f" | cut -d'-' -f1-3)
  title=$(head -1 "$f" | sed 's/^# //')
  case "$f" in
    *risk-register*) type="Risk Register" ;;
    *build-vs-buy*)  type="Build vs Buy" ;;
    *incident*)      type="Incident Report" ;;
    *scope*)         type="Scope Check" ;;
    *impact*)        type="Impact Analysis" ;;
    *tech-debt*)     type="Tech Debt Report" ;;
    *pentest*)       type="Pentest Report" ;;
    *)               type="Tech Decision" ;;
  esac
  status=$(grep -E '^\*\*(Status|Overall Risk Level|Last Updated):' "$f" | head -1 | sed 's/\*//g; s/^[^:]*:[[:space:]]*//')
  echo "| $date | [$title](./$(basename "$f")) | $type | $status |" >> project-decisions/README.md
done
```
## Adaptation Rules

- Always save to file — every risk register gets persisted in project-decisions/
- Scan the codebase — don't guess at technical risks; find them with grep, git log, and npm audit
- Be specific — "authService.ts has 0% test coverage and handles password hashing" not "some code is untested"
- Include evidence — every technical risk should reference actual files, metrics, or scan results
- Score consistently — use the same likelihood and impact scales every time
- Track trends — show whether each risk is improving, stable, or worsening
- Update, don't recreate — if a risk register already exists, update it rather than starting from scratch
- Link to other documents — connect realized risks to incident reports, mitigations to tech decisions
- Assign owners — unowned risks don't get mitigated
- Flag overdue mitigations — a mitigation plan that's past due is itself a risk
- Scale to project — small project gets 5-10 risks, large project gets 20-30
- Distinguish symptoms from risks — "slow API" is a symptom, "no caching strategy for growing dataset" is the risk
## Summary

End every risk register with:
- Overall risk level — 🟢/🟡/🟠/🔴 based on highest open risk
- Risk count — total open, by severity
- Top 3 risks — requiring immediate attention
- New risks — added since last review
- Trend — overall trajectory (improving / stable / worsening)
- Overdue actions — mitigations past their due date
- Next review date — when this should be updated
- File saved — confirm the document location