llm-judge

Warn

Audited by Gen Agent Trust Hub on Apr 12, 2026

Risk Level: MEDIUMREMOTE_CODE_EXECUTIONCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [REMOTE_CODE_EXECUTION]: The workflow defined in references/repo-agent.md instructs agents to execute test suites using commands like pytest, npm test, and go test within target repositories. Because these repositories are provided by the user and are inherently untrusted, this allows for the execution of arbitrary code contained within the test files of those repositories.\n- [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection. It reads data from external specification files and repository source code and interpolates this content directly into the prompts for sub-agents in SKILL.md (specifically Step 5 and Step 7).\n
  • Ingestion points: The contents of the specification file ($SPEC_CONTENT) and the target repositories ($REPO_PATH) are read and injected into agent instructions.\n
  • Boundary markers: The skill uses markdown headers (e.g., **Spec Document:**) but lacks explicit delimiters or instructions telling the model to ignore potential commands embedded in the data.\n
  • Capability inventory: The agents can execute shell commands (git, pytest), perform file system operations, and load additional skills (e.g., beagle-core:llm-artifacts-detection).\n
  • Sanitization: No sanitization, validation, or escaping of the external content is performed before it is added to the agent's context.\n- [COMMAND_EXECUTION]: The skill uses shell commands like git -C, cat, and mkdir in SKILL.md and references/repo-agent.md involving user-supplied paths ($REPO_PATH, $SPEC_PATH). Although existence checks are performed, the use of variables in shell strings can lead to command injection if the environment does not properly sanitize these inputs.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Apr 12, 2026, 09:29 AM