Subagent Testing - TDD for Skills

Test skills with fresh subagent instances to prevent priming bias and validate effectiveness.

Overview

Fresh instances prevent priming: each test runs in a new Claude conversation, so you measure the skill's impact rather than the effects of prior conversation history.

Why Fresh Instances Matter

The Priming Problem

Running tests in the same conversation creates bias:

  • Prior context influences responses
  • Skill effects get mixed with conversation history
  • Can't isolate skill's true impact

Fresh Instance Benefits

  • Isolation: Each test starts clean
  • Reproducibility: Consistent baseline state
  • Measurement: Clear before/after comparison
  • Validation: Proves skill effectiveness, not priming

Testing Methodology

Three-phase TDD-style approach:

Phase 1: Baseline Testing (RED)

Test without skill to establish baseline behavior.

Phase 2: With-Skill Testing (GREEN)

Test with skill loaded to measure improvements.

Phase 3: Rationalization Testing (REFACTOR)

Test skill's anti-rationalization guardrails.
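The RED/GREEN comparison at the heart of phases 1 and 2 can be sketched as a small scoring harness. This is a minimal sketch under assumptions: the scores would come from a rubric you apply to real transcripts from fresh instances, and `improvement` is an illustrative helper, not part of the skill spec.

```python
# Hypothetical sketch: compare baseline vs. with-skill rubric scores.
# The score lists are placeholder data; in practice each entry would be
# a rubric score for one fresh-instance transcript.

def improvement(baseline_scores, with_skill_scores):
    """Percent improvement of with-skill runs over baseline, averaged per scenario."""
    assert len(baseline_scores) == len(with_skill_scores)
    base = sum(baseline_scores) / len(baseline_scores)
    skilled = sum(with_skill_scores) / len(with_skill_scores)
    return (skilled - base) / base * 100

baseline = [3, 4, 2, 3, 4]    # Phase 1 (RED): scores without the skill
with_skill = [7, 8, 6, 7, 8]  # Phase 2 (GREEN): identical prompts, skill loaded
print(f"improvement: {improvement(baseline, with_skill):.0f}%")  # → improvement: 125%
```

Because every run starts from a clean conversation, the delta between the two score lists is attributable to the skill itself rather than to accumulated context.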

Quick Start

# 1. Create baseline tests (without skill)
# Use 5 diverse scenarios
# Document full responses

# 2. Create with-skill tests (fresh instances)
# Load skill explicitly
# Use identical prompts
# Compare to baseline

# 3. Create rationalization tests
# Test anti-rationalization patterns
# Verify guardrails work

Detailed Testing Guide

For complete testing patterns, examples, and templates, see the detailed testing guide.

Success Criteria

  • Baseline: Document 5+ diverse baseline scenarios
  • Improvement: ≥50% improvement in skill-related metrics
  • Consistency: Results reproducible across fresh instances
  • Rationalization Defense: Guardrails prevent ≥80% of rationalization attempts
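The criteria above can be expressed as a simple pass/fail gate. A minimal sketch, assuming you have already computed the three inputs from your test runs; the function name and signature are illustrative.

```python
# Hypothetical gate mirroring the success criteria list above.
def meets_criteria(n_baseline: int, improvement_pct: float, defense_rate: float) -> bool:
    return (
        n_baseline >= 5             # Baseline: 5+ diverse scenarios documented
        and improvement_pct >= 50   # Improvement: >=50% on skill-related metrics
        and defense_rate >= 0.8     # Rationalization defense: >=80% of attempts blocked
    )

print(meets_criteria(5, 125.0, 0.9))  # → True
print(meets_criteria(5, 30.0, 0.9))   # → False (improvement below 50%)
```

Consistency is the one criterion this gate cannot check on its own: rerun the suite across several fresh instances and confirm the same verdict each time.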

See Also

  • skill-authoring: Creating effective skills
  • bulletproof-skill: Anti-rationalization patterns
  • test-skill: Automated skill testing command