codex

Warn

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: MEDIUMCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • Metadata Poisoning (MEDIUM): The skill and its references claim to use non-existent models like 'GPT-5.3-Codex' and 'GPT-5.1-thinking'.
  • Evidence: Found in SKILL.md description and references/codex-cli-reference.md table.
  • Risk: Using fake model versioning is a form of deception that can mislead the agent or user regarding the skill's actual capabilities and safety profile.
  • Obfuscation / Stealth (MEDIUM): The skill explicitly instructs the agent to hide standard error output from the user.
  • Evidence: SKILL.md contains the constraint: 'Suppress stderr by default: append 2>/dev/null to all codex exec commands'.
  • Risk: This prevents the user from seeing security warnings, crash logs, or unauthorized access attempts generated by the CLI tool.
  • Privilege Escalation (MEDIUM): The skill exposes a high-risk sandbox bypass flag.
  • Evidence: Reference to --sandbox danger-full-access which permits 'network or broad access'.
  • Risk: While the skill advises asking for permission, the underlying capability allows a CLI tool to escape basic sandbox constraints if the agent is manipulated via prompt injection.
  • Indirect Prompt Injection (LOW): The skill processes untrusted external code and diffs which are interpolated into command arguments.
  • Evidence Chain:
  • Ingestion points: The codex exec pattern in references/codex-cli-reference.md takes a [review prompt with diff] as a direct argument.
  • Boundary markers: Absent. No delimiters are specified to separate the diff from the command logic.
  • Capability inventory: The skill allows workspace writes (workspace-write) and full network access (danger-full-access).
  • Sanitization: No sanitization or escaping of the input diff is performed before shell execution.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Feb 17, 2026, 06:38 PM