confidence-calibration
Confidence Calibration Framework
When This Activates
This skill activates when:
- Expressing uncertainty about a suggestion
- Working in a domain with past errors
- User asks "how confident are you?"
- Making predictions or recommendations
Domain Tracking
The system tracks prediction accuracy across domains:
| Domain Category | Examples |
|---|---|
| Infrastructure | docker, kubernetes, nginx, ci/cd |
| Frontend | react, react-native, nextjs, expo |
| Languages | typescript, javascript, python |
| Backend | firebase, firestore, authentication |
| Operations | testing, git, database, api |
| Optimization | performance, security, caching |
Calibration Data Structure
{
"domain_stats": {
"docker": {
"correct": 12,
"incorrect": 3,
"partial": 2,
"accuracy": 0.71
}
},
"overall": {
"correct": 145,
"incorrect": 23,
"partial": 18
}
}
How to Express Calibrated Confidence
High Confidence (>85% domain accuracy)
"This approach should work well - it follows established patterns."
Medium Confidence (60-85% accuracy)
"This is my best assessment, though you may want to verify [specific aspect]."
Low Confidence (<60% accuracy, or past errors in domain)
"I've had some misses in [domain] before. Let me double-check this..."
"I'm less certain here - consider testing thoroughly before proceeding."
Unknown Domain
"I don't have much track record in [area]. Proceed with appropriate caution."
Self-Awareness Triggers
When working in a domain with past errors:
- Check track record before making recommendations
- Acknowledge past mistakes if relevant: "I've gotten Docker networking wrong before..."
- Suggest verification for uncertain areas
- Ask clarifying questions rather than guessing
Recording Outcomes
When the user indicates an outcome:
Success signals:
- "That worked!"
- "Perfect"
- "Thanks, it's fixed"
Failure signals:
- "That didn't work"
- "Still broken"
- "Wrong"
Partial signals:
- "Almost"
- "Partly fixed"
- "One issue remaining"
Domain Detection Keywords
DOMAIN_KEYWORDS = {
"docker": ["docker", "container", "dockerfile", "compose"],
"react": ["react", "component", "jsx", "hooks", "useState"],
"react-native": ["react native", "expo", "metro"],
"nextjs": ["next.js", "nextjs", "getServerSideProps"],
"typescript": ["typescript", "type", "interface"],
"firebase": ["firebase", "firestore"],
"authentication": ["auth", "login", "token", "jwt"],
"testing": ["test", "jest", "mock", "coverage"],
"git": ["git", "commit", "branch", "merge"],
"performance": ["slow", "optimize", "cache", "memory"]
}
Integration with Learning System
Confidence data feeds into:
<semantic-memory>context injection- ReasoningBank for pattern matching
- Preference learner for style calibration
Example Workflow
User: "Set up Docker networking between containers"
1. Detect domain: docker
2. Check calibration: docker accuracy = 71%
3. Check past corrections: "Docker can't use Metal GPU on Mac"
4. Respond with calibrated confidence:
"For container networking, you'll want a bridge network.
Note: I've had some edge cases with Docker networking before,
so if this doesn't work immediately, the issue is usually
DNS resolution between containers."
More from jamelna-apps/claude-dash
cost-tracking
When user mentions "spending", "usage", "tokens", "API cost", "budget", "expensive", or wants to understand Claude API costs. Provides cost awareness and optimization guidance.
11page-cro
When the user mentions "conversion", "CRO", "landing page", "not converting", "bounce rate", "optimize page", or asks about improving page performance.
7session-handoff
When user says "continue", "pick up where we left off", "last time", "previous session", "what were we doing", or wants explicit session continuity. Provides structured context handoff between sessions.
4error-diagnosis
When user encounters "error", "exception", "failed", "stack trace", "crashed", or needs error categorization. Provides structured root cause analysis and prevention strategies.
4code-review
When the user mentions "review", "PR", "pull request", "code review", "check my code", "feedback on", or asks for code quality assessment.
3smart-routing
When deciding which Claude model (Opus/Sonnet/Haiku) to use, or when "route", "which model", "complex task", "multi-file", "architectural", or "deep debugging" is mentioned. Guides quality-first model selection.
3