internalize
Pass
Audited by Gen Agent Trust Hub on Mar 8, 2026
Risk Level: SAFE, PROMPT_INJECTION
Full Analysis
- [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection during Step 1 (Receive). By scanning the conversation history for 'intervention signals,' the agent could inadvertently ingest malicious instructions previously introduced by an untrusted source (e.g., content from a website or document the agent read earlier) and attempt to 'internalize' them as permanent directives.
- [PROMPT_INJECTION]: The skill possesses the authority to modify persistent instruction files such as .cursorrules, CLAUDE.md, and codex.md. These files govern the agent's long-term behavior and security constraints for the entire repository. Malicious modifications to these files could lead to persistent behavioral compromises.
- [SAFE]: The skill implements a mandatory human-in-the-loop (HITL) verification step. In Step 5 (Draft), the agent must present the proposed directive and a diff-style preview to the user, requiring explicit confirmation through the AskUserQuestion tool before any file edits are performed in Step 6.
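
The confirmation gate described in the [SAFE] finding can be illustrated with a minimal Python sketch. This is not the skill's actual implementation: the function names `preview_diff` and `apply_directive` are hypothetical, and the `confirm` callback is an assumed stand-in for the AskUserQuestion tool. The sketch only shows the general pattern of building a diff-style preview and refusing to write unless the user explicitly approves.

```python
import difflib
from pathlib import Path
from typing import Callable


def preview_diff(path: str, old_text: str, new_text: str) -> str:
    """Return a unified diff of the proposed change, for display to the user."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"{path} (current)",
        tofile=f"{path} (proposed)",
    )
    return "".join(diff)


def apply_directive(path: Path, directive: str, confirm: Callable[[str], bool]) -> bool:
    """Append a directive to an instruction file only after explicit approval.

    `confirm` is a hypothetical callback that shows the preview to the user
    and returns True only on an explicit "yes".
    """
    old_text = path.read_text(encoding="utf-8") if path.exists() else ""
    # Make sure the new directive starts on its own line.
    sep = "" if old_text == "" or old_text.endswith("\n") else "\n"
    new_text = old_text + sep + directive.rstrip("\n") + "\n"

    if not confirm(preview_diff(str(path), old_text, new_text)):
        return False  # Declined: the file is left untouched.

    path.write_text(new_text, encoding="utf-8")
    return True
```

A gate like this limits the second [PROMPT_INJECTION] finding, since no write reaches the instruction file without the user's approval. It does not address the first finding, because an injected directive can still appear in the preview and relies on the user to spot it.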
Audit Metadata