codex-code-review

Fail

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: HIGHCOMMAND_EXECUTIONEXTERNAL_DOWNLOADSPROMPT_INJECTION
Full Analysis
  • COMMAND_EXECUTION (HIGH): The helper script scripts/review.sh is vulnerable to command injection. It uses the eval command to execute a string built from unvalidated command-line arguments. An attacker providing a malicious string to parameters like --model or --branch could execute arbitrary shell commands.
  • Evidence: eval "$cmd" "\"$prompt\"" in scripts/review.sh. The variable $cmd is constructed directly from variables like $MODEL, $OUTPUT_FILE, and $BRANCH which are populated from user input without sanitization.
  • EXTERNAL_DOWNLOADS (MEDIUM): The documentation recommends installing @openai/codex via npm. This is not a known official package name for OpenAI (which typically uses openai), posing a supply chain risk if a user installs a malicious package registered under this name.
  • Evidence: npm install -g @openai/codex in references/codex_cli.md.
  • PROMPT_INJECTION (LOW): The skill is susceptible to Indirect Prompt Injection. It ingests untrusted data (git diffs and pull request content) and interpolates it directly into prompts without boundary markers or instructions to ignore embedded commands.
  • Evidence (Mandatory for Cat 8):
  • Ingestion points: scripts/review.sh fetches data via git diff and gh pr diff.
  • Boundary markers: Absent. Data is appended directly to the base_prompt.
  • Capability inventory: The skill can execute shell commands, write files via the --output flag, and access the network via the CLI tool.
  • Sanitization: None detected.
  • DATA_EXFILTRATION (LOW): The skill reads local repository data and PR diffs to send them to an external LLM API. While functional, users should be aware that sensitive code is transmitted externally.
  • Evidence: gh pr diff and git diff usage in scripts/review.sh.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 17, 2026, 06:11 PM