skill-ab-test
Pass
Audited by Gen Agent Trust Hub on Mar 14, 2026
Risk Level: SAFE (COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, PROMPT_INJECTION)
Full Analysis
- [COMMAND_EXECUTION]: Executes local shell commands including `git`, `tar`, `mkdir`, and `diff` to create workspaces and compare versions of the skill under test.
- [COMMAND_EXECUTION]: The report generator script (`scripts/generate-report.js`) uses `execSync` to manage local processes on the specified port and open the system's default browser.
- [REMOTE_CODE_EXECUTION]: Orchestrates the evaluation of external skill logic by spawning sub-agents to process the "new" and "baseline" versions of a target skill.
- [PROMPT_INJECTION]: Exhibits an indirect prompt injection surface by design, as it must ingest and follow instructions from the skill files being tested.
- Ingestion points: Reads the content of `SKILL.md` and referenced files from the local directory identified for testing.
- Boundary markers: Uses instructional framing to delimit the ingested skill content for sub-agents (e.g., "Read the skill at {skill-path}/SKILL.md and follow it").
- Capability inventory: The skill has access to shell commands, local file management, and the ability to spawn/interact with sub-agents.
- Sanitization: Does not explicitly sanitize or filter the instructions contained within the target skill files, which is expected for an evaluation tool.
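
The instructional framing described under "Boundary markers" can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `buildSubAgentPrompt`, the `workspace/new` and `workspace/baseline` paths, and the task delimiters do not come from the audited skill. Only the "Read the skill at ... and follow it" wording is taken from the report. As the audit notes, framing of this kind delimits the ingested content but does not sanitize it.

```javascript
// Hypothetical sketch: names, paths, and delimiters are illustrative,
// not taken from the audited skill's source.
const path = require("node:path");

// Build the instruction handed to one sub-agent for a given skill variant.
function buildSubAgentPrompt(skillPath, task) {
  const skillFile = path.posix.join(skillPath, "SKILL.md");
  return [
    // Framing sentence quoted in the audit report.
    `Read the skill at ${skillFile} and follow it.`,
    // Assumed delimiters that separate the evaluation task from the framing.
    "--- TASK START ---",
    task,
    "--- TASK END ---",
  ].join("\n");
}

// Assumed workspace layout: one directory per variant under test.
const variants = {
  new: "workspace/new",
  baseline: "workspace/baseline",
};

for (const [label, skillPath] of Object.entries(variants)) {
  console.log(`[${label}]`);
  console.log(buildSubAgentPrompt(skillPath, "Summarize README.md"));
}
```

Both variants receive the same task text, so any difference in sub-agent output can be attributed to the skill files themselves rather than to the prompt.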
Audit Metadata