testing-llm
LLM & AI Testing Patterns
Patterns and tools for testing LLM integrations, evaluating AI output quality, mocking responses for deterministic CI, and applying agentic test workflows (planner, generator, healer).
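A minimal sketch of mocking an LLM response for deterministic CI, using only the standard library's `unittest.mock`. The `classify_sentiment` function and the client's `complete(prompt)` method are hypothetical stand-ins for your own integration code. The JSON checks are a plain-Python substitute for the Pydantic schema validation the rules files describe, so the sketch runs without extra dependencies.

```python
import json
import unittest
from unittest.mock import MagicMock


# Hypothetical application code: asks an LLM client for a JSON sentiment label.
# The client interface (a `complete(prompt) -> str` method) is an assumption.
def classify_sentiment(client, text: str) -> dict:
    raw = client.complete(f"Return JSON with keys 'label' and 'score' for: {text}")
    data = json.loads(raw)
    # Validate the structured output before trusting it (stand-in for Pydantic).
    if data.get("label") not in {"positive", "negative", "neutral"}:
        raise ValueError(f"unexpected label: {data.get('label')!r}")
    if not 0.0 <= float(data["score"]) <= 1.0:
        raise ValueError(f"score out of range: {data['score']!r}")
    return data


class ClassifySentimentTest(unittest.TestCase):
    def test_parses_mocked_response(self):
        # The mock replaces the real LLM, so the test is offline and repeatable.
        client = MagicMock()
        client.complete.return_value = '{"label": "positive", "score": 0.92}'
        result = classify_sentiment(client, "Great product!")
        self.assertEqual(result, {"label": "positive", "score": 0.92})
        client.complete.assert_called_once()

    def test_rejects_unknown_label(self):
        # A malformed model response should fail validation, not pass silently.
        client = MagicMock()
        client.complete.return_value = '{"label": "ecstatic", "score": 0.5}'
        with self.assertRaises(ValueError):
            classify_sentiment(client, "Great product!")
```

Run it with `python -m unittest` or pytest. Because the model is never called, every run sees the same response, and the second test pins down how the code behaves when the model returns something off-schema.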
Quick Reference
| Area | File | Purpose |
|---|---|---|
| Rules | rules/llm-evaluation.md | DeepEval quality metrics, Pydantic schema validation, timeout testing |
| Rules | rules/llm-mocking.md | Mock LLM responses, VCR.py recording, custom request matchers |
| Reference | references/deepeval-ragas-api.md | Full API reference for DeepEval and RAGAS metrics |
| Reference | references/generator-agent.md | Transforms Markdown specs into Playwright tests |
| Reference | references/healer-agent.md | Auto-fixes failing tests (selectors, waits, dynamic content) |
| Reference | references/planner-agent.md | Explores app and produces Markdown test plans |
| Checklist | checklists/llm-test-checklist.md | Complete LLM testing checklist (setup, coverage, CI/CD) |
| Example | examples/llm-test-patterns.md | Full examples: mocking, structured output, DeepEval, VCR, golden datasets |
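Timeout testing can be sketched in the same deterministic style: a fake LLM that sleeps stands in for a hung provider, and the test asserts that the caller falls back quickly instead of blocking. `generate_with_timeout`, `fake_slow_llm`, and `fake_fast_llm` are hypothetical names, and the thread-pool approach is one assumption among several (an async client wrapped in `asyncio.wait_for` would work similarly).

```python
import time
import unittest
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeout
from typing import Callable


# Hypothetical helper: run an LLM call in a worker thread and give up after
# `timeout_s` seconds, returning `fallback` instead of hanging the caller.
def generate_with_timeout(
    llm_call: Callable[[str], str], prompt: str, timeout_s: float, fallback: str
) -> str:
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(llm_call, prompt)
    try:
        return future.result(timeout=timeout_s)
    except FuturesTimeout:
        return fallback
    finally:
        # Don't block on the abandoned call; the worker finishes in the background.
        pool.shutdown(wait=False)


def fake_slow_llm(prompt: str) -> str:
    time.sleep(0.5)  # Simulates a hung or very slow provider.
    return "late answer"


def fake_fast_llm(prompt: str) -> str:
    return f"echo: {prompt}"


class TimeoutTest(unittest.TestCase):
    def test_slow_call_returns_fallback_quickly(self):
        start = time.monotonic()
        result = generate_with_timeout(fake_slow_llm, "hi", timeout_s=0.05, fallback="FALLBACK")
        elapsed = time.monotonic() - start
        self.assertEqual(result, "FALLBACK")
        self.assertLess(elapsed, 0.3)  # Caller was not held for the full 0.5 s.

    def test_fast_call_returns_model_output(self):
        result = generate_with_timeout(fake_fast_llm, "hi", timeout_s=1.0, fallback="FALLBACK")
        self.assertEqual(result, "echo: hi")
```

`shutdown(wait=False)` keeps the caller from waiting on the abandoned call, but the worker thread is not killed, so a real client should also set its own request timeout.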