skill-testing
# Skills Testing
Evaluate a skill project's effectiveness by generating test cases, executing them, and producing an expert assessment report.
## Workflow
Testing a skill project involves these steps:
- Discover the target skill project
- Deep-read and understand the skill
- Generate test cases
- Execute each test case against the skill
- Evaluate results and produce report
## Step 1: Discover the Target Skill Project
Locate the skill being tested. The target is the current project (workspace), not a skill already in the skill list.
- Look for `SKILL.md` in the workspace root or common skill locations (`.cursor/skills/*/`, `.agents/skills/*/`, `skills/*/`).
- If multiple skill projects exist, ask the user which one to test.
- If no `SKILL.md` is found, inform the user and stop.
Critical: The skill under test is the one being developed in the workspace, not one from the installed skill list.
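A minimal sketch of this discovery step, assuming a Python 3.9+ environment (the glob patterns come from the list above; the search order and helper name are illustrative only):

```python
from pathlib import Path

# Candidate locations from the list above; the search order is an assumption.
CANDIDATE_GLOBS = [
    "SKILL.md",
    ".cursor/skills/*/SKILL.md",
    ".agents/skills/*/SKILL.md",
    "skills/*/SKILL.md",
]

def discover_skills(workspace: Path) -> list[Path]:
    """Return every SKILL.md found in the common skill locations."""
    found: list[Path] = []
    for pattern in CANDIDATE_GLOBS:
        found.extend(sorted(workspace.glob(pattern)))
    return found

skills = discover_skills(Path("."))
if not skills:
    print("No SKILL.md found - inform the user and stop.")
elif len(skills) > 1:
    print("Multiple skill projects found - ask the user which one to test.")
else:
    print(f"Target skill: {skills[0]}")
```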
## Step 2: Deep-Read and Understand the Skill
Read all key files of the target skill project:
- `SKILL.md` — Parse frontmatter (`name`, `description`) and body (instructions, workflow, examples).
- `references/` — Read all reference docs to understand domain knowledge and supplementary guidance.
- `scripts/` — Read all scripts to understand automation and tooling capabilities.
- `assets/` — List asset files (do not read binary files) to understand templates and resources.
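For the `SKILL.md` parse, a rough sketch assuming conventional `---`-delimited YAML-style frontmatter (field names other than `name` and `description` are not guaranteed):

```python
from pathlib import Path

def read_skill_frontmatter(skill_md: Path) -> tuple[dict, str]:
    """Split SKILL.md into (frontmatter fields, markdown body)."""
    text = skill_md.read_text(encoding="utf-8")
    meta: dict[str, str] = {}
    body = text
    if text.startswith("---"):
        header, sep, rest = text[3:].partition("\n---")
        if sep:  # closing delimiter found
            body = rest
            for line in header.strip().splitlines():
                key, colon, value = line.partition(":")
                if colon:
                    meta[key.strip()] = value.strip()
    return meta, body.strip()

meta, body = read_skill_frontmatter(Path("SKILL.md"))
print(meta.get("name"), "|", meta.get("description"))
```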
Build a mental model of:
- What the skill does (purpose and scope)
- When it triggers (description triggers)
- How it works (workflow, instructions, decision points)
- What resources it uses (scripts, references, assets)
- What output it produces
## Step 3: Generate Test Cases
Generate 2-3 test cases from different perspectives. Each test case must include:
- Name: Short descriptive title
- Scenario: What the user is trying to accomplish
- Input: Concrete user message or request that should trigger the skill
- Expected Behavior: What the skill should do (steps, resources used, output)
### Test Case Selection Strategy
Choose test cases that cover different dimensions:
- Happy path — A straightforward use case that matches the skill's primary purpose. Tests whether core functionality works correctly.
- Edge case / variation — A less common but valid use case. Tests the skill's flexibility and handling of variations.
- Boundary / stress — A request at the boundary of the skill's scope, or one requiring multiple features together. Tests robustness and completeness.
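As an illustration only, the three dimensions might map onto test cases like the following for a purely hypothetical "csv-reporting" skill (the real cases must come from the skill understanding built in Step 2):

```python
# Hypothetical examples for an imaginary "csv-reporting" skill; field names
# mirror the required test case structure above.
test_cases = [
    {   # happy path
        "name": "Summarize a sales CSV",
        "scenario": "User wants a summary report from a clean CSV file",
        "input": "Summarize sales.csv into a short report",
        "expected_behavior": "Reads the CSV, follows the skill workflow, emits a report",
    },
    {   # edge case / variation
        "name": "CSV with a non-standard delimiter",
        "scenario": "Valid but unusual input the skill should still handle",
        "input": "Summarize data.tsv (tab-separated) into a report",
        "expected_behavior": "Handles the delimiter variation and still produces the report",
    },
    {   # boundary / stress
        "name": "Huge CSV plus chart request",
        "scenario": "Request at the edge of scope, combining multiple features",
        "input": "Summarize this 2 GB CSV and include charts",
        "expected_behavior": "Handles or gracefully scopes the request without failing silently",
    },
]
```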
### Using Project Test Cases
Check for user-provided test case files:
- If the user specifies files to use as test cases, use those.
- Otherwise, look for a `/test-case` or `/test-cases` directory in the project root.
- If test case files are found, read them and incorporate their content into the generated test cases.
- If no test case files exist, generate all test cases from the skill understanding built in Step 2.
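A small sketch of that lookup (the directory names come from the list above; everything else is an assumption):

```python
from pathlib import Path

def find_project_test_cases(root: Path) -> list[Path]:
    """Return test case files from /test-case or /test-cases, if either exists."""
    for dirname in ("test-case", "test-cases"):
        candidate = root / dirname
        if candidate.is_dir():
            return sorted(p for p in candidate.iterdir() if p.is_file())
    return []

print(find_project_test_cases(Path(".")))
```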
### Web Search for Test Cases (when applicable)
If the skill involves a specific domain, library, or technology where real-world context would improve test quality:
- Use available web search tools (tavily, brave, linkup.so, exa, serpapi, etc.) to find realistic scenarios.
- Use available URL fetch tools (fetch, tavily, jina reader, etc.) to retrieve relevant page content.
- Only do this when it meaningfully improves test case realism.
## Step 4: Execute Test Cases
For each test case, simulate the skill execution:
- Pretend to be a user sending the test case's input message.
- Follow the skill's instructions exactly as another Claude instance would — read the SKILL.md, follow the workflow, use referenced scripts/resources.
- Execute any scripts referenced in the skill using the shell. Capture output and errors.
- Produce the output the skill would generate for the user.
- Record the full execution trace: which files were read, which scripts were run, what decisions were made, and the final output.
Important: Execute the skill faithfully. Do not shortcut or skip steps. The goal is to see how the skill actually performs, not how it ideally should perform.
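Where the skill's instructions reference scripts, running one through the shell and capturing its output for the trace might look roughly like this (the script path and arguments are hypothetical; real invocations come from the skill itself):

```python
import subprocess

def run_skill_script(command: list[str]) -> dict:
    """Run one referenced script and capture what the execution trace needs."""
    # timeout is an arbitrary safety limit; raises TimeoutExpired if exceeded.
    result = subprocess.run(command, capture_output=True, text=True, timeout=300)
    return {
        "command": " ".join(command),
        "exit_code": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }

# Hypothetical script name; use whatever the skill's SKILL.md actually references.
trace_entry = run_skill_script(["python", "scripts/convert.py", "--input", "sample.md"])
print(trace_entry["exit_code"])
```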
## Step 5: Evaluate and Report
Read references/evaluation-criteria.md for the evaluation rubric and report template.
Evaluate each test case execution against the 7 criteria:
- Triggering & Description Quality
- Instruction Clarity & Completeness
- Degree-of-Freedom Calibration
- Resource Organization
- Script Quality (if applicable)
- Output Quality
- Error Handling & Edge Cases
## Output: `test-report/` Folder
Create a test-report/ folder in the project root with all test artifacts:
```
test-report/
├── TEST-REPORT.md    # Main evaluation report
├── test-case-1.md    # Test case 1 definition + execution result
├── test-case-2.md    # Test case 2 definition + execution result
└── test-case-3.md    # Test case 3 definition + execution result (if applicable)
```
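One way that structure could be laid down, as a sketch only (the actual report contents come from Step 5):

```python
from pathlib import Path

report_dir = Path("test-report")
report_dir.mkdir(exist_ok=True)  # reuse the folder if it already exists

# Placeholder bodies; the real text is the evaluation produced in Step 5.
(report_dir / "TEST-REPORT.md").write_text("# Test Report\n", encoding="utf-8")
for n in range(1, 4):  # one file per executed test case
    (report_dir / f"test-case-{n}.md").write_text(f"# Test Case {n}\n", encoding="utf-8")
```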
### Strict Output Rules

MANDATORY: The `test-report/` folder and its `.md` files are the sole deliverables of this skill. You MUST follow these rules exactly:
- Only create the `test-report/` folder — do NOT create any other folders (e.g., no `output/`, `results/`, `reports/`, `logs/`, `tmp/`, etc.).
- Only create `.md` files inside `test-report/` — do NOT create any other file types (no `.json`, `.html`, `.txt`, `.csv`, `.yaml`, `.log`, or any other format).
- Only create the files listed above — `TEST-REPORT.md` and `test-case-N.md` files. Do NOT create extra files like `summary.md`, `index.md`, `raw-data.md`, or anything beyond the specified structure.
- Do NOT modify any existing project files — this skill is read-only with respect to the skill project under test. The only writes allowed are creating the `test-report/` folder and its `.md` files.
- Do NOT create scripts, configs, or helper files — no shell scripts, no temporary files, no intermediate artifacts. All analysis and evaluation must be written directly into the markdown reports.
- If the `test-report/` folder already exists, overwrite its contents rather than creating a differently named folder.
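The rules above amount to a simple invariant on the deliverables. Purely as an illustration (the skill itself must not create helper scripts, so this logic would only ever be applied mentally or inline), a final self-check might look like:

```python
from pathlib import Path

# Allowed file names; the upper bound on test case numbers is an assumption.
ALLOWED = {"TEST-REPORT.md"} | {f"test-case-{n}.md" for n in range(1, 10)}

def deliverables_ok(report_dir: Path) -> bool:
    """True when test-report/ contains only the allowed markdown files."""
    if not report_dir.is_dir():
        return False
    entries = list(report_dir.iterdir())
    return all(p.is_file() and p.suffix == ".md" and p.name in ALLOWED for p in entries)

print(deliverables_ok(Path("test-report")))
```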
### Test case files (`test-case-N.md`)
Each test case file must contain:
```markdown
# Test Case N: [Name]

## Definition
- **Scenario**: [description]
- **Input**: [user message]
- **Expected Behavior**: [what should happen]

## Execution Trace
[Which files were read, scripts run, decisions made — step by step]

## Output
[The actual output the skill produced]
```
### Main report (`TEST-REPORT.md`)
Follow the template in references/evaluation-criteria.md. The report must include:
- Per-test-case results with ratings (referencing the individual test case files)
- Evaluation summary table
- Strengths — what the skill does well
- Weaknesses — where it falls short
- Characteristics — notable traits or patterns observed
- Optimization Suggestions — concrete, actionable improvements ranked by priority
## References
- Evaluation rubric and report template: See references/evaluation-criteria.md