codex-prompting

Fail

Audited by Gen Agent Trust Hub on Mar 7, 2026

Risk Level: HIGHCOMMAND_EXECUTIONDATA_EXFILTRATIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill configures a local environment where approval_policy is set to 'never' and sandbox_mode is 'danger-full-access', effectively removing human-in-the-loop safety checks for shell command execution.
  • [COMMAND_EXECUTION]: The documentation describes a shell_command tool featuring a with_escalated_permissions parameter, allowing the model to bypass sandbox restrictions and execute commands with elevated system privileges.
  • [DATA_EXFILTRATION]: The skill provides command templates to search through ~/.pi/agent/sessions, which can expose sensitive credentials, private chat history, and API keys from previous interactions.
  • [PROMPT_INJECTION]: The skill architecture is susceptible to indirect prompt injection as it transforms arbitrary user input into high-signal commands without sanitization. Evidence chain: (1) Ingestion point: natural language requests like 'send to codex' (SKILL.md); (2) Boundary markers: canonical request format provided but lacks explicit 'ignore embedded instructions' warnings (SKILL.md); (3) Capability inventory: shell_command with escalated privileges, git, and read_file (codex-prompting-guide.md); (4) Sanitization: no evidence of input filtering or escaping.
  • [COMMAND_EXECUTION]: The action-first execution contract prevents the model from providing narration or plans, hindering the user's ability to review or intercept dangerous operations.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Mar 7, 2026, 02:18 AM