Skill Tester & Analyzer
A meta-skill for deeply testing and auditing other Claude skills. It instruments test runs to capture raw API call traces, records all script stdin/stdout/stderr with timing, and runs deterministic security scans followed by dedicated security and code review subagents against any scripts embedded in the skill.
Session Directory Layout
<report_root>/<skill_name>_<YYYYMMDD_HHMMSS>/
├── manifest.json # Validation results and session metadata (created by setup_test_env.py)
├── sandbox/ # Isolated workspace for script execution
├── inventory.json # Skill structure scan
├── scan_results.json # Deterministic security findings (B9 — runs first)
├── prompt_lint.json # Deterministic prompt quality findings (B11 — runs first)
├── prompt_review.json # AI prompt quality analysis (receives prompt_lint as input)
├── api_log.jsonl # All Claude API calls (one JSON object per line)
├── script_runs.jsonl # All script executions with I/O
├── security_report.json # AI security analysis (receives scan_results as input)
├── code_review.json # Code quality review
├── session_report.html # Claude Code session trace (API calls, tool use, conversation)
└── report.html # Unified interactive HTML report
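As a minimal sketch of how this layout could be consumed, the snippet below builds a directory name following the `<skill_name>_<YYYYMMDD_HHMMSS>` pattern and reads a `.jsonl` log such as `api_log.jsonl` one JSON object per line. The helper names `session_dir_name` and `read_jsonl` are hypothetical and not part of the skill; the record fields inside each log line are not documented here, so the reader returns plain dictionaries.

```python
import json
from datetime import datetime
from pathlib import Path


def session_dir_name(skill_name: str, started: datetime) -> str:
    """Build a name following the <skill_name>_<YYYYMMDD_HHMMSS> pattern.

    Hypothetical helper: the skill's own naming code is not shown in the source.
    """
    return f"{skill_name}_{started.strftime('%Y%m%d_%H%M%S')}"


def read_jsonl(path: Path) -> list[dict]:
    """Parse a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                records.append(json.loads(line))
    return records
```

With a session directory in hand, `read_jsonl(session / "api_log.jsonl")` and `read_jsonl(session / "script_runs.jsonl")` would load both logs the same way.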
Modes
| Mode | Description | Phases Run | Command |
|---|---|---|---|
| Full (default) | Complete analysis: scan → prompt-lint → test → security → review → report | All (2-9) | /st:run |
| Audit | Static analysis only, no test execution | 2-4, 6-7, 9 | /st:audit |
| Trace | Runtime capture only, no security/code review | 2, 5, 8, 9 | /st:trace |
| Report | Re-generate HTML from existing session data | 9 only | /st:report |
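The phase lists in the table can be read as compact range specifications. The following sketch, which assumes nothing about how the skill itself dispatches phases, expands a specification such as `2-4, 6-7, 9` into explicit phase numbers. `MODE_PHASES` and `expand_phases` are illustrative names only.

```python
# Phase specifications copied from the Modes table (Full covers phases 2-9).
MODE_PHASES = {
    "full": "2-9",
    "audit": "2-4, 6-7, 9",
    "trace": "2, 5, 8, 9",
    "report": "9",
}


def expand_phases(spec: str) -> list[int]:
    """Expand a spec like '2-4, 6-7, 9' into [2, 3, 4, 6, 7, 9]."""
    phases: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-")
            phases.extend(range(int(start), int(end) + 1))  # inclusive range
        else:
            phases.append(int(part))
    return phases
```

Expanding the Audit entry this way makes it easy to see that phases 5 and 8, the runtime steps used by Trace mode, are the ones it skips.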
Commands
| Command | Mode | Phases | Purpose |
|---|---|---|---|
| /st:init | All | 1 | Set up session: target, mode, prompts, report location |
| /st:run | Full | 2-9 | Execute all analysis phases |
| /st:audit | Audit | 2-4, 6-7, 9 | Static analysis only |
| /st:trace | Trace | 2, 5, 8, 9 | Runtime capture only |
| /st:report | Report | 9 | Regenerate HTML from session data |
| /st:status | N/A | N/A | Show session state |
| /st:resume | Any | Variable | Resume interrupted session |
Interpreting Results
Security Severity Levels
| Level | Meaning | Action |
|---|---|---|
| CRITICAL | Active exploit risk (e.g., shell injection, RCE, hardcoded production key) | Block: do not use the skill; fix immediately |
| HIGH | Likely data exposure or privilege escalation | Fix before production |
| MEDIUM | Defense-in-depth gap; not immediately exploitable | Fix in next iteration |
| LOW | Style/practice issue with minor security implications | Note in report |
| INFO | Observation, no risk | Informational only |
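One way to act on these levels is to count findings per severity and surface the action for the worst level present. The sketch below assumes each finding is a dictionary with a `severity` key holding one of the five level names; the actual schema of `security_report.json` is not documented here, so treat that field name, along with `summarize`, as hypothetical.

```python
from collections import Counter

# Ordered from most to least severe, as in the table above.
SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW", "INFO"]

# Recommended actions, taken from the table above.
ACTIONS = {
    "CRITICAL": "Block: do not use the skill; fix immediately",
    "HIGH": "Fix before production",
    "MEDIUM": "Fix in next iteration",
    "LOW": "Note in report",
    "INFO": "Informational only",
}


def summarize(findings: list[dict]) -> dict:
    """Count findings per level and report the worst level with its action.

    Assumes each finding has a 'severity' key (hypothetical schema).
    """
    counts = Counter(finding["severity"] for finding in findings)
    worst = next((level for level in SEVERITY_ORDER if counts[level]), None)
    return {
        "counts": {level: counts[level] for level in SEVERITY_ORDER},
        "worst": worst,
        "action": ACTIONS.get(worst, "No findings"),
    }
```

Reporting only the worst level keeps the summary actionable: a single CRITICAL finding blocks use of the skill regardless of how many lower-severity items accompany it.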
Code Quality Score (0–10)
| Range | Interpretation |
|---|---|
| 9–10 | Production-ready |
| 7–8 | Minor improvements needed |
| 5–6 | Significant gaps — refactoring advised |
| < 5 | Major issues — rework required |
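The score bands can be expressed as a small lookup. This is a sketch under one assumption the table does not settle: fractional scores such as 8.5 fall into the band whose lower bound they meet, so 8.5 counts as "Minor improvements needed". The function name `interpret_score` is illustrative.

```python
def interpret_score(score: float) -> str:
    """Map a 0-10 code quality score to the interpretation in the table.

    Assumption: a fractional score belongs to the band whose lower
    bound it reaches (e.g. 8.5 is treated as part of the 7-8 band).
    """
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    if score >= 9:
        return "Production-ready"
    if score >= 7:
        return "Minor improvements needed"
    if score >= 5:
        return "Significant gaps, refactoring advised"
    return "Major issues, rework required"
```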
Repository: ddunnock/claude-plugins