ac-criteria-validator

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTION
Full Analysis
  • [COMMAND_EXECUTION] (HIGH): The skill's primary function involves running test suites and capturing output ('Run associated test files', 'Execute: Run test suite'). This typically involves spawning subprocesses (e.g., pytest, npm test) which can be manipulated to execute arbitrary shell commands.
  • [REMOTE_CODE_EXECUTION] (HIGH): Because the skill processes and executes code within the project directory, it acts as a vector for RCE if that directory contains untrusted content from external repositories or user submissions.
  • [Indirect Prompt Injection] (HIGH): The skill exhibits a high-risk vulnerability surface by ingesting untrusted data and possessing execution capabilities. Ingestion points: Project directory and test files. Boundary markers: None present; the skill treats all local test files as trusted. Capability inventory: High-privilege execution via subprocess calls for test runners. Sanitization: None detected; the skill does not validate the safety of the scripts it executes.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 16, 2026, 07:12 AM