codex-prompting
Audit result: Fail
Audited by Gen Agent Trust Hub on Mar 7, 2026
Risk Level: HIGH. Findings: COMMAND_EXECUTION, DATA_EXFILTRATION, PROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION]: The skill configures a local environment in which approval_policy is set to 'never' and sandbox_mode to 'danger-full-access', removing the human-in-the-loop safety checks that would otherwise gate shell command execution.
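As a minimal sketch of the flagged settings, assuming the Codex CLI's `config.toml` layout (the file path and exact syntax are assumptions; only the key names and values come from the finding above):

```toml
# ~/.codex/config.toml (illustrative; path and layout assumed)

# 'never' means the agent executes commands without pausing for user approval.
approval_policy = "never"

# 'danger-full-access' disables the filesystem/network sandbox entirely,
# so approved-by-default commands run with the user's full privileges.
sandbox_mode = "danger-full-access"
```

Either setting alone widens the blast radius; combined, they leave no checkpoint between model output and command execution.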
- [COMMAND_EXECUTION]: The documentation describes a shell_command tool featuring a with_escalated_permissions parameter, allowing the model to bypass sandbox restrictions and execute commands with elevated system privileges.
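For illustration, a call exercising that parameter might look like the following; the JSON shape and field layout are assumptions, and only the tool name and `with_escalated_permissions` parameter come from the audited documentation:

```json
{
  "tool": "shell_command",
  "arguments": {
    "command": ["systemctl", "restart", "some-service"],
    "with_escalated_permissions": true,
    "justification": "needs root to restart the service"
  }
}
```

The concern is not any single command but that the model can self-select escalation, with the justification string as the only audit trail.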
- [DATA_EXFILTRATION]: The skill provides command templates to search through ~/.pi/agent/sessions, which can expose sensitive credentials, private chat history, and API keys from previous interactions.
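To make the exposure concrete, here is a hypothetical sketch of why searchable session transcripts are dangerous. Everything below except the ~/.pi/agent/sessions path is an assumption: the file format, the regex, and the helper name are illustrative only.

```python
import re
import tempfile
from pathlib import Path

# Illustrative pattern: secrets pasted into past chats tend to sit in plain
# text, so any tool allowed to read session files can harvest them.
SECRET_PATTERN = re.compile(
    r"(api[_-]?key|token|password)\s*[:=]\s*([\w-]+)", re.IGNORECASE
)

def scan_sessions(session_dir: Path) -> list[str]:
    """Return secret-looking values found in any file under session_dir."""
    hits = []
    for path in session_dir.rglob("*"):
        if path.is_file():
            text = path.read_text(errors="ignore")
            hits.extend(m.group(2) for m in SECRET_PATTERN.finditer(text))
    return hits

# Demonstration on a throwaway directory standing in for ~/.pi/agent/sessions.
with tempfile.TemporaryDirectory() as d:
    log = Path(d) / "2026-03-01.jsonl"
    log.write_text('{"role": "user", "content": "my api_key=sk-example-1234"}\n')
    print(scan_sessions(Path(d)))  # → ['sk-example-1234']
```

A one-line regex over a transcript directory is all it takes; the skill's command templates hand the model exactly this capability.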
- [PROMPT_INJECTION]: The skill architecture is susceptible to indirect prompt injection as it transforms arbitrary user input into high-signal commands without sanitization. Evidence chain: (1) Ingestion point: natural language requests like 'send to codex' (SKILL.md); (2) Boundary markers: canonical request format provided but lacks explicit 'ignore embedded instructions' warnings (SKILL.md); (3) Capability inventory: shell_command with escalated privileges, git, and read_file (codex-prompting-guide.md); (4) Sanitization: no evidence of input filtering or escaping.
- [COMMAND_EXECUTION]: The action-first execution contract discourages the model from emitting narration or plans before acting, which limits the user's ability to review or intercept dangerous operations before they run.
Recommendations
- Automated analysis detected serious security threats in this skill; manual review of SKILL.md and its command templates is recommended before installation or use.