# LLM Evaluation
Evaluate and validate LLM outputs for quality assurance using RAGAS and LLM-as-judge patterns.
## Quick Reference

### LLM-as-Judge Pattern
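```python
import re

# Assumes `llm` is an async chat client (defined elsewhere) whose chat() returns an object with `.content`.
async def evaluate_quality(input_text: str, output_text: str, dimension: str) -> float:
    response = await llm.chat([{
        "role": "user",
        "content": f"""Evaluate for {dimension}. Score 1-10.
Input: {input_text[:500]}
Output: {output_text[:1000]}
Respond with just the number."""
    }])
    # Judges often reply "8/10" or "Score: 8"; take the first integer instead of int() on the raw text.
    match = re.search(r"\d+", response.content)
    if match is None:
        raise ValueError(f"Judge returned no score: {response.content!r}")
    # Clamp to the 1-10 scale, then normalize to 0.1-1.0.
    return min(max(int(match.group()), 1), 10) / 10
```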
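A common extension of this pattern is to score the same output on several dimensions. The following minimal sketch is self-contained and runs offline. `StubJudge` is a hypothetical stand-in for a real async chat client: it only needs a `chat()` coroutine that returns an object with `.content`, which is the same shape the `llm.chat` call in the pattern assumes. `ChatResponse`, `parse_score`, `judge_dimension`, and `judge_all` are illustrative names, not part of any library. Each dimension gets its own judge call, and the calls run concurrently with `asyncio.gather`.

```python
import asyncio
import re
from dataclasses import dataclass


@dataclass
class ChatResponse:
    content: str


class StubJudge:
    """Hypothetical offline stand-in for a real async chat client.

    Returns canned replies keyed by dimension so the parsing and
    aggregation logic can be exercised without calling a model.
    """

    def __init__(self, replies: dict[str, str]) -> None:
        self.replies = replies

    async def chat(self, messages: list[dict[str, str]]) -> ChatResponse:
        prompt = messages[0]["content"]
        for dimension, reply in self.replies.items():
            if prompt.startswith(f"Evaluate for {dimension}."):
                return ChatResponse(reply)
        return ChatResponse("no score")


def parse_score(text: str) -> float:
    """Take the first integer in the reply, clamp it to 1-10, normalize to 0.1-1.0."""
    match = re.search(r"\d+", text)
    if match is None:
        raise ValueError(f"Judge returned no score: {text!r}")
    return min(max(int(match.group()), 1), 10) / 10


async def judge_dimension(llm, input_text: str, output_text: str, dimension: str) -> float:
    # Same prompt shape as the LLM-as-judge pattern: truncated input/output, bare-number reply.
    response = await llm.chat([{
        "role": "user",
        "content": f"Evaluate for {dimension}. Score 1-10.\n"
                   f"Input: {input_text[:500]}\n"
                   f"Output: {output_text[:1000]}\n"
                   "Respond with just the number.",
    }])
    return parse_score(response.content)


async def judge_all(llm, input_text: str, output_text: str, dimensions: list[str]) -> dict[str, float]:
    # One judge call per dimension, awaited concurrently; results keep the input order.
    scores = await asyncio.gather(
        *(judge_dimension(llm, input_text, output_text, d) for d in dimensions)
    )
    return dict(zip(dimensions, scores))


if __name__ == "__main__":
    judge = StubJudge({"relevance": "8", "accuracy": "Score: 9/10"})
    result = asyncio.run(judge_all(judge, "What is 2+2?", "4", ["relevance", "accuracy"]))
    print(result)  # {'relevance': 0.8, 'accuracy': 0.9}
```

Replacing `StubJudge` with a real client leaves the parsing and aggregation unchanged. Keep in mind that cost and latency grow with the number of dimensions, because each dimension is a separate judge call.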