codex-safe-experiment
Fail
Audited by Gen Agent Trust Hub on Feb 16, 2026
Risk Level: HIGH (PROMPT_INJECTION, COMMAND_EXECUTION)
Full Analysis
- [PROMPT_INJECTION] (HIGH): The skill uses a 'LIBRARY-FIRST PROTOCOL (MANDATORY)' directive and Turkish trigger phrases such as 'Kanitsal Cerceve' (roughly "evidential framework") to override the agent's standard operating procedures and force a specific behavioral frame.
- [DATA_EXFILTRATION] (HIGH): In SKILL.md, 'Step 3: Existing Projects' explicitly commands the AI to scan 'D:\Projects*' and 'EXTRACT' data. This is a severe local data exposure risk that directly contradicts the README's claim of 'CWD only' filesystem access.
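- [COMMAND_EXECUTION] (HIGH): The skill repeatedly instructs the agent to run commands via 'bash -lc'. The '-l' flag starts a login shell, which sources profile scripts such as /etc/profile and ~/.bash_profile before running the command. Those scripts can reset PATH and other environment settings, potentially bypassing restrictions and security controls applied to the agent's default restricted shell.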
- [INDIRECT_PROMPT_INJECTION] (HIGH): The skill has a high-privilege attack surface (Bash, Write, Edit tools) and ingests untrusted 'experiment descriptions' from users. This creates a vulnerability where malicious input could escape the claimed 'sandbox' via the command-line execution patterns provided.
- [METADATA_POISONING] (MEDIUM): The README and SKILL.md claim 'Network DISABLED' and 'CWD only' protection layers, yet the internal instructions ('D:\Projects*', 'bash -lc') actively encourage the agent to violate these exact security boundaries.
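To make the contradiction in these findings concrete, here is a minimal sketch of a static check one could run over a skill's instruction text. The function name find_boundary_violations and both regular expressions are hypothetical illustrations, not Gen Agent Trust Hub's actual rules. The sketch only catches the literal 'bash -lc' form and Windows drive-letter paths such as 'D:\Projects*', so variants like 'bash -l -c' or POSIX absolute paths would slip through.

```python
import re

# Hypothetical patterns modeled on the findings above; not the auditor's real rule set.
LOGIN_SHELL = re.compile(r"\bbash\s+-lc\b")           # literal 'bash -lc' only
DRIVE_PATH = re.compile(r"\b[A-Za-z]:\\[^\s'\"]*")    # e.g. D:\Projects*


def find_boundary_violations(skill_text: str) -> list[str]:
    """Flag lines that contradict 'CWD only' / restricted-shell claims."""
    findings = []
    for lineno, line in enumerate(skill_text.splitlines(), start=1):
        if LOGIN_SHELL.search(line):
            findings.append(f"line {lineno}: login shell 'bash -lc'")
        for match in DRIVE_PATH.findall(line):
            # An absolute drive path cannot be assumed to stay inside the CWD.
            findings.append(f"line {lineno}: absolute drive path {match}")
    return findings


if __name__ == "__main__":
    sample = "Step 3: scan 'D:\\Projects*' and EXTRACT data\nRun: bash -lc 'pytest'\n"
    for finding in find_boundary_violations(sample):
        print(finding)
```

Run on the two-line sample, the check reports the drive path on line 1 and the login shell on line 2. A regex scan like this is only a first pass; it cannot tell whether an instruction is actually reachable or how untrusted input flows into it.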
Recommendations
- AI detected serious security threats
Audit Metadata