codebase-review

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGHCOMMAND_EXECUTIONEXTERNAL_DOWNLOADSREMOTE_CODE_EXECUTION
Full Analysis
  • [Category 4] (MEDIUM): Unverifiable Dependencies. The skill installs Python packages (vulture, bandit, radon, pip-audit) at runtime in the webapp container without version pinning or integrity checks, exposing the environment to supply chain risks from untrusted sources (PyPI).
  • [Category 8] (HIGH): Indirect Prompt Injection Surface. The skill is designed to ingest and process external content (the entire codebase) while possessing high-impact capabilities. \n
  • Ingestion points: Reads all files in the repository using grep, bandit, and vulture (SKILL.md, references/review-dimensions.md). \n
  • Boundary markers: Absent. There are no delimiters or instructions to treat codebase content as untrusted data rather than instructions. \n
  • Capability inventory: Execution of arbitrary shell commands via docker compose exec, runtime software installation, and the ability to create GitHub issues (SKILL.md, references/report-template.md). \n
  • Sanitization: Absent. Output from tools and codebase content is passed directly to the agent context.
  • [Category 10] (MEDIUM): Dynamic Execution. The skill executes multi-line Python strings dynamically using python -c inside the webapp container (references/gts-specifics.md), which can be manipulated if the container environment is compromised.
  • [Category 5] (MEDIUM): Excessive Privileges. The extensive use of docker compose exec assumes the host environment grants the agent permission to perform arbitrary operations inside running containers, which can be used to bypass local security controls.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 16, 2026, 12:43 PM