aeo-escalation
# AEO Escalation
**Purpose:** The human-AI interface for deciding when to interrupt autonomous work and involve a human. It presents clear, actionable options and records each decision for learning.
## When to Escalate
Invoke this skill when any of the following triggers fire (a consolidated trigger check is sketched after this list):
- Confidence score < 0.70 (advisory/blocking from aeo-core)
- Spec score < 40 (refuse from aeo-spec-validator)
- QA veto occurs (from aeo-qa-agent)
- Failure pattern can't be resolved (from aeo-failure-patterns)
- Cost limit approaching (if cost-governor enabled)
- Architecture violation detected (if architecture skill enabled)
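The thresholds above (confidence 0.70, spec score 40, ~80% of budget) can be consolidated into a single gate. The sketch below is illustrative only; the `EscalationSignals` shape and `escalationTriggers` function are assumptions about how the upstream skills might expose their signals, not an existing AEO interface.

```typescript
// Hypothetical consolidation of the escalation triggers listed above.
interface EscalationSignals {
  confidence: number;          // from aeo-core
  specScore: number;           // from aeo-spec-validator (0-100)
  qaVeto: boolean;             // from aeo-qa-agent
  unresolvedFailure: boolean;  // from aeo-failure-patterns
  budgetUsedRatio?: number;    // from cost-governor, if enabled (0-1)
  archViolation?: boolean;     // from architecture skill, if enabled
}

function escalationTriggers(s: EscalationSignals): string[] {
  const triggers: string[] = [];
  if (s.confidence < 0.70) triggers.push("low_confidence");
  if (s.specScore < 40) triggers.push("spec_refused");
  if (s.qaVeto) triggers.push("qa_veto");
  if (s.unresolvedFailure) triggers.push("pattern_unknown");
  if ((s.budgetUsedRatio ?? 0) >= 0.80) triggers.push("cost_warning");
  if (s.archViolation) triggers.push("arch_violation");
  return triggers; // empty array means no escalation is needed
}
```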
## Escalation Framework
All escalations must follow this format (a small rendering sketch follows the template):
```text
⚠️ ESCALATION REQUIRED
[Clear one-line summary of the issue]

Context:
• [Why we're here - what triggered escalation]
• [What we've tried so far]
• [What's blocking progress]

Options:
1. [Clear, actionable option] - [Expected outcome]
2. [Clear, actionable option] - [Expected outcome]
3. [Clear, actionable option] - [Expected outcome]

Recommended: [Option X] - [Brief rationale]
Your choice (1-3):
```
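As a minimal sketch, this template can be rendered from structured data as shown below; the `Escalation` shape and `renderEscalation` name are hypothetical, not part of an existing AEO API.

```typescript
// Illustrative renderer for the escalation format above.
interface EscalationOption {
  label: string;   // clear, actionable option
  outcome: string; // expected outcome
}

interface Escalation {
  summary: string;             // one-line summary of the issue
  context: string[];           // why we're here, what's been tried, what's blocking
  options: EscalationOption[]; // 2-4 mutually exclusive options
  recommended: number;         // 1-based index of the recommended option
  rationale: string;           // brief rationale for the recommendation
}

function renderEscalation(e: Escalation): string {
  return [
    "⚠️ ESCALATION REQUIRED",
    e.summary,
    "",
    "Context:",
    ...e.context.map((c) => `• ${c}`),
    "",
    "Options:",
    ...e.options.map((o, i) => `${i + 1}. ${o.label} - ${o.outcome}`),
    "",
    `Recommended: Option ${e.recommended} - ${e.rationale}`,
    `Your choice (1-${e.options.length}):`,
  ].join("\n");
}
```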
## Escalation Scenarios
### Scenario 1: Low Confidence (Blocking)
**Trigger:** Confidence 0.50-0.69
⚠️ CONFIDENCE BELOW THRESHOLD
Confidence: 0.62
Threshold: 0.70 (Autonomous)
Task: Add user authentication with email verification
Context:
• Spec score: 68/100 (minor gaps)
• Risk: Touching authentication system
• Missing: Email service configuration
• Similar tasks: 2 successful, 1 failed
Concerns:
• [Spec] No email service specified
• [Risk] Auth changes require review
• [Context] Need email provider details
Options:
1. Proceed with Resend for emails (default) - Autonomously complete with assumptions
2. Provide email service details first - Improve spec, then execute autonomously
3. Break into smaller tasks - Create subtasks for review and approval
Recommended: Option 2 - Specify email service to improve confidence
Your choice (1-3):
After the user responds:
- Record the decision to $PAI_DIR/MEMORY/aeo-escalations.jsonl (a sketch follows this list)
- Update the confidence model based on the outcome
- Continue with the chosen path
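A hedged sketch of the bookkeeping step above, assuming PAI_DIR resolves to the user's PAI directory (falling back to ~/.claude, the path used later in this document). The record fields mirror the Decision Recording schema; the function name is hypothetical.

```typescript
import { appendFileSync } from "node:fs";

// Append one escalation record per line (JSONL), mirroring the Decision Recording schema.
function recordEscalationDecision(record: {
  escalation_type: string;
  option_chosen: number;
  human_input: string;
  success: boolean;
}): void {
  const baseDir = process.env.PAI_DIR ?? `${process.env.HOME}/.claude`; // assumed fallback
  const line = JSON.stringify({ timestamp: new Date().toISOString(), ...record });
  appendFileSync(`${baseDir}/MEMORY/aeo-escalations.jsonl`, line + "\n");
}

// Scenario 1, option 2: the user supplied the missing email service details.
recordEscalationDecision({
  escalation_type: "low_confidence",
  option_chosen: 2,
  human_input: "User provided email service details",
  success: true,
});
```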
---
### Scenario 2: Spec Refused (Unacceptable)
**Trigger:** Spec score < 40
❌ CANNOT EXECUTE - SPEC TOO UNCLEAR
Spec Score: 28/100
Threshold: 40 (Minimum)
Task: Improve the signup
Why this was refused:
• Objective vague (5/15) - "improve" is subjective
• No acceptance criteria (0/15) - What defines "done"?
• Missing context (0/10) - Which signup flow?
• No dependencies listed (0/10)
• Ambiguity penalty: -10 (vague verb "improve")
What's needed:
1. **Clear Objective:** What specific improvement?
2. **Acceptance Criteria:** How do we measure success?
3. **Context:** Which page/component?
4. **Dependencies:** Libraries, APIs, services?
Good spec example:
"Add email validation to /pages/auth/signup.tsx using validator.js.
Must validate RFC 5322, reject + aliases, show inline errors.
Tests for valid, invalid, duplicate emails."
Options:
1. Provide improved spec - I'll wait for your updated specification
2. Answer clarifying questions - I'll ask specific questions to build spec
3. Use template - I'll provide a spec template for you to fill
Recommended: Option 2 - Answer questions to build spec together
Your choice (1-3):
---
### Scenario 3: QA Veto (Security Issue)
**Trigger:** Security vulnerability detected
🚨 SECURITY VETO - CANNOT PROCEED
Issue: SQL Injection vulnerability
File: /routes/users.js, line 47
Severity: CRITICAL
Current Code:
```javascript
const query = `SELECT * FROM users WHERE id = ${userId}`
```
Why this is dangerous:
• User input is interpolated directly into the SQL query
• An attacker can execute arbitrary SQL commands
• Can expose, modify, or delete all data
Required Fix:
```javascript
const query = 'SELECT * FROM users WHERE id = $1'
await db.query(query, [userId])
```
Action Required:
• This MUST be fixed before commit
• No exceptions - security is non-negotiable
• All changes with security issues are blocked
Options:
1. Fix immediately - I'll apply the required fix now
2. Review fix together - I'll show you the fix for approval
3. Explain more - I'll provide a detailed explanation of the vulnerability
Recommended: Option 1 - Fix immediately (this is a critical issue)
Your choice (1-3):
---
### Scenario 4: Failure Pattern Not Recognized
**Trigger:** Error not in core patterns or project patterns
❌ CANNOT AUTO-FIX - PATTERN NOT RECOGNIZED
Error: TypeError: Cannot read properties of undefined (reading 'data')
Searched:
• 20 core patterns - no match
• 12 project patterns - no match
• Fuzzy match: none with confidence ≥ 0.70 (see the similarity sketch after this scenario)
Context:
• File: /src/services/UserService.js:125
• Function: fetchUserData()
• Error occurred during: API call processing
Stack trace:
    TypeError: Cannot read properties of undefined (reading 'data')
        at UserService.fetchUserData (/src/services/UserService.js:125)
        at async Component.render (/src/components/UserList.tsx:45)
What I've tried:
• Verified module is installed ✓
• Checked imports - all valid ✓
• Reviewed recent changes - no obvious issues ✗
Options:
1. Provide fix - If you know the solution, I'll apply it
2. Investigate together - I'll add logging and we'll debug
3. Create new pattern - After we fix it, I'll save this pattern for the future
Recommended: Option 2 - Investigate together to understand the root cause
Your choice (1-3):
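The fuzzy-match step above can be pictured as a similarity score between the new error and stored failure signatures, reusing the same 0.70 cutoff. This is a hypothetical sketch; the real matching logic and pattern schema belong to aeo-failure-patterns.

```typescript
// Assumed shape for a stored failure pattern; not the actual aeo-failure-patterns schema.
interface FailurePattern {
  signature: string; // canonical error text, e.g. "TypeError: Cannot read properties of undefined"
  fix: string;       // known remediation
}

function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z0-9']+/g) ?? []);
}

// Jaccard similarity between the incoming error and each signature; below 0.70 => escalate.
function bestFuzzyMatch(
  error: string,
  patterns: FailurePattern[],
): { pattern: FailurePattern; confidence: number } | null {
  const errorTokens = tokenize(error);
  let best: { pattern: FailurePattern; confidence: number } | null = null;
  for (const pattern of patterns) {
    const signatureTokens = tokenize(pattern.signature);
    const overlap = [...signatureTokens].filter((t) => errorTokens.has(t)).length;
    const union = new Set([...signatureTokens, ...errorTokens]).size;
    const confidence = union === 0 ? 0 : overlap / union;
    if (!best || confidence > best.confidence) best = { pattern, confidence };
  }
  return best && best.confidence >= 0.70 ? best : null;
}
```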
---
### Scenario 5: Cost Limit Warning
**Trigger:** Budget 80% consumed
⚠️ COST LIMIT APPROACHING
Budget Status:
• Daily budget: $10.00
• Used so far: $8.47 (84.7%)
• Remaining: $1.53
• Tasks completed: 7
Current task estimate: ~$1.75
Projected overage: ~$0.22 if we continue (see the sketch after this scenario)
Options:
1. Continue anyway - Proceed with the current task (may exceed budget)
2. Pause and review - Review completed work, decide what's essential
3. Adjust scope - Complete the current task with reduced scope
Recommended: Option 2 - Review progress before continuing
Your choice (1-3):
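A minimal sketch of the arithmetic behind this warning, assuming the cost-governor exposes the daily budget, spend so far, and a per-task estimate (all figures illustrative):

```typescript
// Budget math for the cost warning above.
function budgetStatus(dailyBudget: number, spent: number, taskEstimate: number) {
  const remaining = dailyBudget - spent;
  return {
    usedPct: (spent / dailyBudget) * 100,                     // 8.47 / 10.00 -> 84.7%
    remaining,                                                // 1.53
    projectedOverage: Math.max(0, taskEstimate - remaining),  // 1.75 - 1.53 -> 0.22
    warn: spent / dailyBudget >= 0.80,                        // 80%-consumed trigger
  };
}

console.log(budgetStatus(10.0, 8.47, 1.75));
```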
---
### Scenario 6: Architecture Violation
**Trigger:** Circular dependency detected
⚠️ ARCHITECTURE VIOLATION DETECTED
Issue: Circular dependency between modules
Circular Path:
→ fileA.js imports fileB.js
→ fileB.js imports fileA.js
→ (cycle detected)
Why this matters:
• Creates tight coupling
• Makes code hard to test
• Can cause runtime errors
• Violates clean architecture principles
Options:
1. Extract shared code - Create a new module for the shared functionality (sketched after this scenario)
2. Refactor dependencies - Restructure to remove the cycle
3. Defer to architect - Let the architecture skill analyze and propose a solution
Recommended: Option 1 - Extract shared code (most common pattern)
Your choice (1-3):
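To make Option 1 concrete, here is a hypothetical before/after sketch; the file names and the shared function are invented for illustration. The piece both modules need moves into its own module, so neither imports the other.

```typescript
// Before: fileA imports fileB and fileB imports fileA (the cycle above).
// After (Option 1): both depend only on a new shared module.

// --- shared.ts (new module owning the code both files needed) ---
export function formatUserName(first: string, last: string): string {
  return `${first} ${last}`.trim();
}

// --- fileA.ts (now depends only on shared.ts) ---
import { formatUserName } from "./shared";
export function greet(first: string, last: string): string {
  return `Hello, ${formatUserName(first, last)}`;
}

// --- fileB.ts (also depends only on shared.ts; the A -> B -> A cycle is gone) ---
import { formatUserName } from "./shared";
export function userLabel(first: string, last: string): string {
  return `User: ${formatUserName(first, last)}`;
}
```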
---
## Decision Recording
After human chooses an option, record to memory:
```bash
# Append to escalation log
# Use a heredoc so $(date ...) expands; jq -c keeps one JSON object per line (JSONL)
jq -c . <<EOF >> ~/.claude/MEMORY/aeo-escalations.jsonl
{
  "timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
  "task_id": "unique-id",
  "escalation_type": "low_confidence|spec_refused|qa_veto|pattern_unknown|cost_warning|arch_violation",
  "confidence_before": 0.62,
  "option_presented": [1, 2, 3],
  "option_chosen": 2,
  "human_input": "User provided email service details",
  "resolution": "Improved spec, confidence increased to 0.85",
  "success": true,
  "learning": "Context details improve confidence by ~0.20"
}
EOF
```
## Learning from Escalations
**Weekly Analysis:**
```bash
# Read escalation patterns
jq -s 'group_by(.escalation_type) | map({type: .[0].escalation_type, count: length})' \
  ~/.claude/MEMORY/aeo-escalations.jsonl
```
Insights to track (a script-form sketch for computing these follows the list):
- Most common escalation types
- Which options are chosen most frequently
- Average confidence after escalation
- Resolution time
- Success rate of different options
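As a complement to the jq one-liner, a TypeScript sketch of the same analysis; it assumes the compact one-object-per-line log produced above and the field names from the Decision Recording schema.

```typescript
import { readFileSync } from "node:fs";
import { homedir } from "node:os";

interface EscalationRecord {
  escalation_type: string;
  option_chosen: number;
  success: boolean;
}

// Read the JSONL log: one escalation record per line.
const logPath = `${homedir()}/.claude/MEMORY/aeo-escalations.jsonl`;
const records: EscalationRecord[] = readFileSync(logPath, "utf8")
  .split("\n")
  .filter((line) => line.trim().length > 0)
  .map((line) => JSON.parse(line));

// Escalation count and success rate per type.
const byType = new Map<string, { count: number; successes: number }>();
for (const record of records) {
  const entry = byType.get(record.escalation_type) ?? { count: 0, successes: 0 };
  entry.count += 1;
  if (record.success) entry.successes += 1;
  byType.set(record.escalation_type, entry);
}

for (const [type, { count, successes }] of byType) {
  const rate = ((successes / count) * 100).toFixed(0);
  console.log(`${type}: ${count} escalations, ${rate}% resolved successfully`);
}
```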
## Integration Flow
1. **Trigger received** from another AEO skill
2. **Format escalation** with clear options
3. **Present to human** with a recommended option
4. **Wait for response**
5. **Record decision** to memory
6. **Execute chosen option**
7. **Follow up** - Was the resolution successful?
8. **Learn** - Update escalation patterns (the full loop is sketched below)
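A self-contained sketch of this loop; the handler parameters are stand-ins for whatever the surrounding skills provide, and only the sequencing mirrors the steps above.

```typescript
type EscalationOutcome = { optionChosen: number; success: boolean };

// Orchestrates steps 1-8 above; each handler is an assumed stand-in, not a real AEO API.
async function runEscalationFlow(
  formatEscalation: () => string,                      // step 2: build the escalation text
  askHuman: (prompt: string) => Promise<number>,       // steps 3-4: present and wait
  recordDecision: (choice: number) => void,            // step 5: persist to memory
  executeChoice: (choice: number) => Promise<boolean>, // step 6: act on the chosen option
  learn: (outcome: EscalationOutcome) => void,         // steps 7-8: follow up and learn
): Promise<EscalationOutcome> {
  const choice = await askHuman(formatEscalation());
  recordDecision(choice);
  const success = await executeChoice(choice);
  const outcome: EscalationOutcome = { optionChosen: choice, success };
  learn(outcome);
  return outcome;
}
```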
## Best Practices
**DO:**
- Always provide 2-4 clear, actionable options
- Include a recommended option with rationale
- Explain context clearly (what led to escalation)
- Keep options mutually exclusive
- Make options specific and actionable
- Record outcome for future learning
**DON'T:**
- Present vague options like "continue or stop"
- Overwhelm with >4 options
- Skip the recommendation
- Forget to explain the context
- Present options that aren't actually available
- Skip recording the outcome
## Example Session
AEO-Core: Confidence 0.52 - below threshold
[Invokes aeo-escalation]
Escalation: ⚠️ CONFIDENCE BELOW THRESHOLD
Confidence: 0.52
Threshold: 0.70
Task: Refactor authentication system
Concerns:
• [Spec] No clear requirements defined
• [Risk] Touching critical auth system
• [Scope] Large, unclear boundaries
Options:
1. Define requirements first - Create detailed spec
2. Break into smaller tasks - Create manageable subtasks
3. Get architect approval - Ensure approach is sound
Recommended: Option 1 - Define requirements
Human: 1
Escalation: [Recording choice]
What specific requirements do you need documented?
Human: We need to support OAuth, JWT, and API keys
Escalation: [Updating spec]
[Recalculating confidence...]
New confidence: 0.78 ✓
Proceeding with implementation.
[Records to memory: escalation_type=low_confidence, option_chosen=1, success=true]
## Escalation Outcomes
Track outcomes to improve escalation quality:
**Successful Outcomes:**
- Human provides needed info → Task completes
- Issue resolved → Continue autonomously
- Learning captured → Better future decisions
**Unsuccessful Outcomes:**
- Wrong option recommended → Adjust recommendation logic
- Unclear options → Improve option clarity
- Missing context → Add more context to escalation
- Repeated escalations → Identify root cause
Use outcomes to refine escalation patterns and recommendations.