llm-judge
Warn
Audited by Gen Agent Trust Hub on Apr 12, 2026
Risk Level: MEDIUMREMOTE_CODE_EXECUTIONCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
- [REMOTE_CODE_EXECUTION]: The workflow defined in
references/repo-agent.mdinstructs agents to execute test suites using commands likepytest,npm test, andgo testwithin target repositories. Because these repositories are provided by the user and are inherently untrusted, this allows for the execution of arbitrary code contained within the test files of those repositories.\n- [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection. It reads data from external specification files and repository source code and interpolates this content directly into the prompts for sub-agents inSKILL.md(specifically Step 5 and Step 7).\n - Ingestion points: The contents of the specification file (
$SPEC_CONTENT) and the target repositories ($REPO_PATH) are read and injected into agent instructions.\n - Boundary markers: The skill uses markdown headers (e.g.,
**Spec Document:**) but lacks explicit delimiters or instructions telling the model to ignore potential commands embedded in the data.\n - Capability inventory: The agents can execute shell commands (
git,pytest), perform file system operations, and load additional skills (e.g.,beagle-core:llm-artifacts-detection).\n - Sanitization: No sanitization, validation, or escaping of the external content is performed before it is added to the agent's context.\n- [COMMAND_EXECUTION]: The skill uses shell commands like
git -C,cat, andmkdirinSKILL.mdandreferences/repo-agent.mdinvolving user-supplied paths ($REPO_PATH,$SPEC_PATH). Although existence checks are performed, the use of variables in shell strings can lead to command injection if the environment does not properly sanitize these inputs.
Audit Metadata