# Response Rater Skill
<execution_process>

## Step 1: Define Rating Rubric

Use the rubric that matches the content type:
**For Plans:**
| Dimension | Weight | Description |
|---|---|---|
| Completeness | 20% | All required sections present |
| Feasibility | 20% | Plan is realistic and achievable |
| Risk Mitigation | 20% | Risks identified with mitigations |
| Agent Coverage | 20% | Appropriate agents assigned |
| Integration | 20% | Fits with existing systems |
**For Responses:**
| Dimension | Weight | Description |
|---|---|---|
| Correctness | 25% | Technically accurate |
| Completeness | 25% | Addresses all requirements |
| Clarity | 25% | Easy to understand |
| Actionability | 25% | Provides clear next steps |
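
As a rough illustration, each rubric can be captured as a map from dimension to weight, with weights required to sum to 100%. This is a minimal sketch; the names `PLAN_RUBRIC`, `RESPONSE_RUBRIC`, and `validate_rubric` are illustrative assumptions, not part of the skill spec:

```python
# Illustrative representation of the two rubrics as weighted dimensions.
# Names and structure are assumptions for this sketch only.
PLAN_RUBRIC = {
    "Completeness": 0.20,
    "Feasibility": 0.20,
    "Risk Mitigation": 0.20,
    "Agent Coverage": 0.20,
    "Integration": 0.20,
}

RESPONSE_RUBRIC = {
    "Correctness": 0.25,
    "Completeness": 0.25,
    "Clarity": 0.25,
    "Actionability": 0.25,
}

def validate_rubric(rubric: dict[str, float]) -> None:
    """Fail fast if the weights do not sum to 100%."""
    total = sum(rubric.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"Rubric weights sum to {total}, expected 1.0")

validate_rubric(PLAN_RUBRIC)
validate_rubric(RESPONSE_RUBRIC)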
## Step 2: Evaluate Each Dimension
Score each dimension 1-10:
```markdown
## Dimension Scores

### Completeness: 8/10
- Has objectives, steps, and timeline
- Missing risk assessment section

### Feasibility: 7/10
- Most steps are achievable
- Step 3 timeline is aggressive

### Risk Mitigation: 5/10
- Only 1 risk identified
- No mitigation strategies

### Agent Coverage: 9/10
- All steps have assigned agents
- Good agent-task matching

### Integration: 8/10
- Uses existing APIs
- Minor compatibility concerns
```
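
One way to guarantee that a score is never recorded without its justification (see Iron Laws below) is to bind them in a single record. The `DimensionScore` shape here is a sketch under that assumption, not a prescribed format:

```python
from dataclasses import dataclass, field

@dataclass
class DimensionScore:
    """A dimension score paired with the evidence that justifies it."""
    name: str
    score: int                               # 1-10
    evidence: list[str] = field(default_factory=list)

scores = [
    DimensionScore("Completeness", 8, ["Has objectives, steps, and timeline",
                                       "Missing risk assessment section"]),
    DimensionScore("Risk Mitigation", 5, ["Only 1 risk identified",
                                          "No mitigation strategies"]),
]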
## Step 3: Calculate Overall Score

Sum the weighted dimension scores:

```
Overall = (8×0.2) + (7×0.2) + (5×0.2) + (9×0.2) + (8×0.2) = 7.4/10
```
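
A minimal sketch of the same calculation, using the scores from the worked example above (all plan dimensions carry equal 20% weights):

```python
# Weighted sum of dimension scores from the worked example.
WEIGHTS = {"Completeness": 0.20, "Feasibility": 0.20, "Risk Mitigation": 0.20,
           "Agent Coverage": 0.20, "Integration": 0.20}
SCORES = {"Completeness": 8, "Feasibility": 7, "Risk Mitigation": 5,
          "Agent Coverage": 9, "Integration": 8}

overall = sum(SCORES[dim] * weight for dim, weight in WEIGHTS.items())
print(f"Overall = {overall:.1f}/10")  # Overall = 7.4/10
```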
## Step 4: Generate Recommendations
Provide actionable improvements:
```markdown
## Recommendations

### High Priority
1. Add risk assessment section with 3-5 risks
2. Include mitigation strategies for each risk

### Medium Priority
3. Extend Step 3 timeline by 2 days
4. Add fallback plan for external API dependency

### Low Priority
5. Add success metrics for each step
```
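
To keep the ordering mechanical rather than ad hoc, each recommendation can carry an explicit priority tag tied to its impact. The `Priority` enum and tuple layout below are illustrative assumptions, not part of the skill spec:

```python
from enum import Enum

class Priority(Enum):
    HIGH = 1    # would change the pass/fail outcome
    MEDIUM = 2  # materially improves a dimension score
    LOW = 3     # polish only

# (priority, dimension addressed, concrete change required)
recommendations = [
    (Priority.LOW, "Completeness", "Add success metrics for each step"),
    (Priority.HIGH, "Risk Mitigation", "Add risk assessment section with 3-5 risks"),
    (Priority.MEDIUM, "Feasibility", "Extend Step 3 timeline by 2 days"),
]
recommendations.sort(key=lambda rec: rec[0].value)  # high-impact items first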
## Step 5: Make Pass/Fail Decision
Apply minimum score thresholds:
| Task Type | Minimum Score |
|---|---|
| Standard | 7/10 |
| Enterprise | 8/10 |
| Critical | 9/10 |
```markdown
## Decision

**Score**: 7.4/10
**Threshold**: 7/10 (Standard)
**Result**: PASS ✅

Plan approved with recommendations for improvement.
```
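
The gate itself reduces to a lookup and a comparison. This sketch assumes the task type is known up front and uses the thresholds from the table above:

```python
# Pass/fail gate using the defined minimum score thresholds.
THRESHOLDS = {"standard": 7.0, "enterprise": 8.0, "critical": 9.0}

def decide(overall: float, task_type: str) -> str:
    minimum = THRESHOLDS[task_type]
    return "PASS" if overall >= minimum else "FAIL"

print(decide(7.4, "standard"))    # PASS
print(decide(7.4, "enterprise"))  # FAIL
```

Note that the same 7.4 plan passes the standard gate but fails the enterprise gate, which is why the thresholds must be fixed up front rather than chosen per session.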
</execution_process>
<best_practices>
- **Consistent Rubrics**: Use the same dimensions for comparable content
- **Clear Thresholds**: Define pass/fail criteria upfront
- **Actionable Feedback**: Provide specific improvements
- **Prioritize Issues**: Order recommendations by impact
- **Document Rationale**: Explain score justifications
</best_practices>
<usage_example>
**Prompt**: Rate this plan for user authentication implementation

**Response**:
```markdown
## Plan Rating: User Authentication

### Scores
- Completeness: 8/10 - Good structure, missing tests section
- Feasibility: 9/10 - Realistic timeline and approach
- Risk Mitigation: 6/10 - Only 1 risk identified
- Agent Coverage: 9/10 - Good agent assignments
- Integration: 8/10 - Compatible with existing auth

### Overall: 8.0/10 ✅ PASS

### Top Recommendations
1. Add risk section with security and dependency risks
2. Include test plan for each authentication flow
3. Add rollback procedure for failed deployment
```
</usage_example>
## Iron Laws
- ALWAYS use the same rubric dimensions when rating comparable content — inconsistent dimensions make scores meaningless and prevent valid comparison across sessions.
- NEVER issue a pass/fail decision without documenting score justification for each dimension — unjustified scores cannot be reviewed, challenged, or improved.
- ALWAYS apply defined minimum thresholds (7/10 standard, 8/10 enterprise, 9/10 critical) — ad-hoc thresholds produce inconsistent approval gates that erode trust in the rating system.
- NEVER provide vague recommendations — every recommendation must reference the specific dimension it addresses and state the concrete change required.
- ALWAYS prioritize recommendations by impact — high-priority items that would materially improve the score must be clearly distinguished from low-impact suggestions.
## Anti-Patterns
| Anti-Pattern | Why It Fails | Correct Approach |
|---|---|---|
| Using different rubric dimensions for comparable content | Scores cannot be compared across sessions; the rating loses its evaluative value | Always use the same rubric (plans rubric for plans, responses rubric for responses) |
| Omitting score justification for individual dimensions | Scores without justification cannot be reviewed, verified, or acted upon | Document specific evidence for each dimension score (what was present, what was missing) |
| Setting thresholds arbitrarily per session | Inconsistent thresholds invalidate the pass/fail gate; teams lose confidence in approvals | Always apply the defined thresholds: 7/10 standard, 8/10 enterprise, 9/10 critical |
| Providing vague recommendations ("improve quality", "add more detail") | Vague feedback cannot be acted upon; no change results from the review | Reference the specific dimension, score gap, and required concrete change for each recommendation |
| Listing recommendations without priority ordering | Equal-weight feedback causes raters to address low-impact items first | Always order by impact: High (affects pass/fail threshold) before Medium before Low |
## Memory Protocol (MANDATORY)

Before starting:

```
cat .claude/context/memory/learnings.md
```

After completing:
- New pattern -> .claude/context/memory/learnings.md
- Issue found -> .claude/context/memory/issues.md
- Decision made -> .claude/context/memory/decisions.md

ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.
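
The protocol specifies only the file paths, not any tooling. As a purely hypothetical illustration of the append step, a helper might look like this (the `record` function and `MEMORY` map are assumptions for this sketch):

```python
from datetime import date
from pathlib import Path

# Memory file paths from the protocol above.
MEMORY = {
    "learning": ".claude/context/memory/learnings.md",
    "issue": ".claude/context/memory/issues.md",
    "decision": ".claude/context/memory/decisions.md",
}

def record(kind: str, note: str) -> None:
    """Append a dated entry to the matching memory file."""
    path = Path(MEMORY[kind])
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as f:
        f.write(f"- {date.today().isoformat()}: {note}\n")

record("issue", "Plan lacked mitigation strategies for identified risks")
```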