# Response Rater Skill

## Step 1: Define Rating Rubric

Use the appropriate rubric for the content type:

**For Plans:**

| Dimension | Weight | Description |
|---|---|---|
| Completeness | 20% | All required sections present |
| Feasibility | 20% | Plan is realistic and achievable |
| Risk Mitigation | 20% | Risks identified with mitigations |
| Agent Coverage | 20% | Appropriate agents assigned |
| Integration | 20% | Fits with existing systems |

**For Responses:**

| Dimension | Weight | Description |
|---|---|---|
| Correctness | 25% | Technically accurate |
| Completeness | 25% | Addresses all requirements |
| Clarity | 25% | Easy to understand |
| Actionability | 25% | Provides clear next steps |
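The two rubrics above can be captured as a small data structure. This is a sketch: the dimension names and weights come from the tables, while the `RUBRICS` name and the dict layout are illustrative choices.

```python
# Rubrics keyed by content type; weights are fractions that sum to 1.0.
# Dimension names and weights are taken directly from the rubric tables.
RUBRICS = {
    "plan": {
        "Completeness": 0.20,
        "Feasibility": 0.20,
        "Risk Mitigation": 0.20,
        "Agent Coverage": 0.20,
        "Integration": 0.20,
    },
    "response": {
        "Correctness": 0.25,
        "Completeness": 0.25,
        "Clarity": 0.25,
        "Actionability": 0.25,
    },
}

# Sanity check: each rubric's weights must sum to 1.0,
# otherwise the weighted score in Step 3 is not on a 1-10 scale.
for name, rubric in RUBRICS.items():
    assert abs(sum(rubric.values()) - 1.0) < 1e-9, f"bad weights for {name}"
```

Keeping both rubrics in one structure makes Iron Law 1 (same dimensions for comparable content) easy to enforce mechanically.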

## Step 2: Evaluate Each Dimension

Score each dimension 1-10:

## Dimension Scores

### Completeness: 8/10

- Has objectives, steps, and timeline
- Missing risk assessment section

### Feasibility: 7/10

- Most steps are achievable
- Step 3 timeline is aggressive

### Risk Mitigation: 5/10

- Only 1 risk identified
- No mitigation strategies

### Agent Coverage: 9/10

- All steps have assigned agents
- Good agent-task matching

### Integration: 8/10

- Uses existing APIs
- Minor compatibility concerns

## Step 3: Calculate Overall Score

Compute the weighted sum of the dimension scores (with equal weights, this equals the simple average):

Overall = (8×0.2) + (7×0.2) + (5×0.2) + (9×0.2) + (8×0.2) = 7.4/10
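The calculation above can be sketched in a few lines. The `overall_score` helper is hypothetical; the scores and weights are the example values from Step 2.

```python
def overall_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of dimension scores; weights are assumed to sum to 1.0."""
    missing = set(weights) - set(scores)
    if missing:
        # Iron Law 2: every rubric dimension must be scored.
        raise ValueError(f"unscored dimensions: {sorted(missing)}")
    return round(sum(scores[dim] * w for dim, w in weights.items()), 2)

plan_weights = {dim: 0.20 for dim in
                ["Completeness", "Feasibility", "Risk Mitigation",
                 "Agent Coverage", "Integration"]}
plan_scores = {"Completeness": 8, "Feasibility": 7, "Risk Mitigation": 5,
               "Agent Coverage": 9, "Integration": 8}

print(overall_score(plan_scores, plan_weights))  # 7.4
```

Raising on an unscored dimension keeps a silently-missing score from inflating or deflating the overall result.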

## Step 4: Generate Recommendations

Provide actionable improvements:

## Recommendations

### High Priority

1. Add risk assessment section with 3-5 risks
2. Include mitigation strategies for each risk

### Medium Priority

3. Extend Step 3 timeline by 2 days
4. Add fallback plan for external API dependency

### Low Priority

5. Add success metrics for each step
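Recommendations can be kept machine-sortable so high-impact items always surface first. This is a sketch; the `Recommendation` fields and `PRIORITY_ORDER` mapping are illustrative, not part of the skill's defined interface.

```python
from dataclasses import dataclass

PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}

@dataclass
class Recommendation:
    priority: str   # "high" | "medium" | "low"
    dimension: str  # rubric dimension the fix addresses (per Iron Law 4)
    change: str     # the concrete change required

recs = [
    Recommendation("low", "Completeness", "Add success metrics for each step"),
    Recommendation("high", "Risk Mitigation", "Add 3-5 risks with mitigations"),
    Recommendation("medium", "Feasibility", "Extend Step 3 timeline by 2 days"),
]

# Iron Law 5: order by impact before presenting.
recs.sort(key=lambda r: PRIORITY_ORDER[r.priority])
print([r.priority for r in recs])  # ['high', 'medium', 'low']
```

Tying each recommendation to a dimension also makes it easy to verify that every low-scoring dimension received at least one fix.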

## Step 5: Make Pass/Fail Decision

Apply minimum score thresholds:

| Task Type | Minimum Score |
|---|---|
| Standard | 7/10 |
| Enterprise | 8/10 |
| Critical | 9/10 |

## Decision

**Score**: 7.4/10
**Threshold**: 7/10 (Standard)
**Result**: PASS ✅

Plan approved with recommendations for improvement.
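The threshold gate can be expressed directly. The thresholds are the ones defined in Step 5; the `decide` function name is illustrative.

```python
# Minimum scores from the Step 5 threshold table.
THRESHOLDS = {"standard": 7.0, "enterprise": 8.0, "critical": 9.0}

def decide(score: float, task_type: str = "standard") -> str:
    """Return PASS/FAIL against the defined minimum for the task type."""
    minimum = THRESHOLDS[task_type]  # KeyError on unknown type is deliberate
    return "PASS" if score >= minimum else "FAIL"

print(decide(7.4, "standard"))    # PASS
print(decide(7.4, "enterprise"))  # FAIL
```

Note how the same 7.4 score passes a standard gate but fails an enterprise one, which is why Iron Law 3 forbids ad-hoc thresholds.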

</execution_process>

<best_practices>

  1. Consistent Rubrics: Use same dimensions for comparable content
  2. Clear Thresholds: Define pass/fail criteria upfront
  3. Actionable Feedback: Provide specific improvements
  4. Prioritize Issues: Order recommendations by impact
  5. Document Rationale: Explain score justifications

</best_practices>

Rate this plan for user authentication implementation

Response:

## Plan Rating: User Authentication

### Scores

- Completeness: 8/10 - Good structure, missing tests section
- Feasibility: 9/10 - Realistic timeline and approach
- Risk Mitigation: 6/10 - Only 1 risk identified
- Agent Coverage: 9/10 - Good agent assignments
- Integration: 8/10 - Compatible with existing auth

### Overall: 8.0/10 ✅ PASS

### Top Recommendations

1. Add risk section with security and dependency risks
2. Include test plan for each authentication flow
3. Add rollback procedure for failed deployment

</usage_example>

## Iron Laws

  1. ALWAYS use the same rubric dimensions when rating comparable content — inconsistent dimensions make scores meaningless and prevent valid comparison across sessions.
  2. NEVER issue a pass/fail decision without documenting score justification for each dimension — unjustified scores cannot be reviewed, challenged, or improved.
  3. ALWAYS apply defined minimum thresholds (7/10 standard, 8/10 enterprise, 9/10 critical) — ad-hoc thresholds produce inconsistent approval gates that erode trust in the rating system.
  4. NEVER provide vague recommendations — every recommendation must reference the specific dimension it addresses and state the concrete change required.
  5. ALWAYS prioritize recommendations by impact — high-priority items that would materially improve the score must be clearly distinguished from low-impact suggestions.

## Anti-Patterns

| Anti-Pattern | Why It Fails | Correct Approach |
|---|---|---|
| Using different rubric dimensions for comparable content | Scores cannot be compared across sessions; the rating loses its evaluative value | Always use the same rubric (plans rubric for plans, responses rubric for responses) |
| Omitting score justification for individual dimensions | Scores without justification cannot be reviewed, verified, or acted upon | Document specific evidence for each dimension score (what was present, what was missing) |
| Setting thresholds arbitrarily per session | Inconsistent thresholds invalidate the pass/fail gate; teams lose confidence in approvals | Always apply the defined thresholds: 7/10 standard, 8/10 enterprise, 9/10 critical |
| Providing vague recommendations ("improve quality", "add more detail") | Vague feedback cannot be acted upon; no change results from the review | Reference the specific dimension, score gap, and required concrete change for each recommendation |
| Listing recommendations without priority ordering | Equal-weight feedback causes raters to address low-impact items first | Always order by impact: High (affects pass/fail threshold) before Medium before Low |

## Memory Protocol (MANDATORY)

Before starting:

```shell
cat .claude/context/memory/learnings.md
```

After completing:

  • New pattern -> .claude/context/memory/learnings.md
  • Issue found -> .claude/context/memory/issues.md
  • Decision made -> .claude/context/memory/decisions.md

ASSUME INTERRUPTION: Your context may reset. If it's not in memory, it didn't happen.
