codex
Fail
Audited by Gen Agent Trust Hub on Mar 12, 2026
Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION]: The skill facilitates the execution of high-privilege system operations.
- Evidence: The documentation specifies that the
--sandbox danger-full-accessflag is used for "Installing packages, running tests, system operations." - [COMMAND_EXECUTION]: The skill allows for autonomous actions without user oversight.
- Evidence: The
--full-autoflag is explicitly designed to "Allow file edits without confirmation prompts" and for "unattended execution" when combined with sandbox levels. - [REMOTE_CODE_EXECUTION]: The skill provides a mechanism for arbitrary task execution based on natural language.
- Evidence: The
codex exec "task description"command allows the agent to execute complex tasks which, when paired with high-privilege sandbox levels, can result in arbitrary code execution on the host system. - [PROMPT_INJECTION]: The skill processes external, potentially untrusted data that could contain malicious instructions.
- Ingestion points: The skill ingests uncommitted changes, branch diffs, and specific commit data via
codex review. - Boundary markers: No boundary markers or "ignore previous instructions" warnings are included in the skill definition to protect against instructions embedded in the code being reviewed.
- Capability inventory: The agent has the capability to write to the workspace (
workspace-write) and execute system commands (danger-full-access). - Sanitization: There is no evidence of input sanitization or validation performed on the code content before it is processed by the AI.
Recommendations
- AI detected serious security threats
Audit Metadata