codex
Warn
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: MEDIUMCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
- Metadata Poisoning (MEDIUM): The skill and its references claim to use non-existent models like 'GPT-5.3-Codex' and 'GPT-5.1-thinking'.
- Evidence: Found in
SKILL.mddescription andreferences/codex-cli-reference.mdtable. - Risk: Using fake model versioning is a form of deception that can mislead the agent or user regarding the skill's actual capabilities and safety profile.
- Obfuscation / Stealth (MEDIUM): The skill explicitly instructs the agent to hide standard error output from the user.
- Evidence:
SKILL.mdcontains the constraint: 'Suppress stderr by default: append 2>/dev/null to all codex exec commands'. - Risk: This prevents the user from seeing security warnings, crash logs, or unauthorized access attempts generated by the CLI tool.
- Privilege Escalation (MEDIUM): The skill exposes a high-risk sandbox bypass flag.
- Evidence: Reference to
--sandbox danger-full-accesswhich permits 'network or broad access'. - Risk: While the skill advises asking for permission, the underlying capability allows a CLI tool to escape basic sandbox constraints if the agent is manipulated via prompt injection.
- Indirect Prompt Injection (LOW): The skill processes untrusted external code and diffs which are interpolated into command arguments.
- Evidence Chain:
- Ingestion points: The
codex execpattern inreferences/codex-cli-reference.mdtakes a[review prompt with diff]as a direct argument. - Boundary markers: Absent. No delimiters are specified to separate the diff from the command logic.
- Capability inventory: The skill allows workspace writes (
workspace-write) and full network access (danger-full-access). - Sanitization: No sanitization or escaping of the input diff is performed before shell execution.
Audit Metadata