llm-judge

Warn

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: MEDIUMCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
  • COMMAND_EXECUTION (MEDIUM): The skill executes potentially dangerous shell commands to analyze repositories and run tests. Evidence: references/repo-agent.md contains logic to run git, pytest, npm test, and go test within the provided $REPO_PATH. Risk: Malicious repositories can exploit these tools to execute arbitrary code on the host machine via package.json scripts, conftest.py files, or similar build/test hooks.\n- PROMPT_INJECTION (LOW): The skill is vulnerable to Indirect Prompt Injection (Category 8). Ingestion points: Phase 1 agents read untrusted code from the repository and the specification document as defined in references/repo-agent.md. Boundary markers: None identified. The prompt templates in SKILL.md interpolate external content ($SPEC_CONTENT and repository data) directly into instructions without delimiters or safety warnings. Capability inventory: The agent possesses capabilities to run shell commands and read local file systems. Sanitization: No sanitization, escaping, or validation is performed on the ingested code or specification text before it is processed by the LLM.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Feb 17, 2026, 05:25 PM