internalize

Pass

Audited by Gen Agent Trust Hub on Mar 8, 2026

Risk Level: SAFE
PROMPT_INJECTION
Full Analysis
  • [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection during Step 1 (Receive). By scanning the conversation history for 'intervention signals,' the agent could inadvertently ingest malicious instructions previously introduced by an untrusted source (e.g., content from a website or document the agent read earlier) and attempt to 'internalize' them as permanent directives.
  • [PROMPT_INJECTION]: The skill can modify persistent instruction files such as .cursorrules, CLAUDE.md, and codex.md. These files govern the agent's long-term behavior and security constraints for the entire repository, so a malicious modification could compromise the agent's behavior persistently.
  • [SAFE]: The skill implements a mandatory human-in-the-loop (HITL) verification step. In Step 5 (Draft), the agent must present the proposed directive and a diff-style preview to the user, requiring explicit confirmation through the AskUserQuestion tool before any file edits are performed in Step 6.
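
The confirmation gate described in the [SAFE] finding can be sketched as follows. This is a minimal illustration, not the skill's actual code: the function names `preview_diff` and `apply_with_confirmation` are hypothetical, and the `confirm` callback is an assumed stand-in for the AskUserQuestion prompt. It shows the general pattern of building a diff-style preview and writing the file only after explicit approval.

```python
import difflib
from pathlib import Path
from typing import Callable


def preview_diff(path: Path, proposed: str) -> str:
    """Return a unified diff between the file's current text and the proposed text."""
    current = path.read_text(encoding="utf-8") if path.exists() else ""
    diff = difflib.unified_diff(
        current.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile=f"{path} (current)",
        tofile=f"{path} (proposed)",
    )
    return "".join(diff)


def apply_with_confirmation(
    path: Path,
    proposed: str,
    confirm: Callable[[str], bool],  # hypothetical stand-in for AskUserQuestion
) -> bool:
    """Write `proposed` to `path` only if the user approves the diff preview.

    Returns True when the file was written, False otherwise.
    """
    diff = preview_diff(path, proposed)
    if not diff:
        return False  # nothing would change, so there is nothing to confirm
    if not confirm(diff):
        return False  # user declined: the instruction file stays untouched
    path.write_text(proposed, encoding="utf-8")
    return True
```

In this sketch the write is unreachable unless `confirm` returns True, which mirrors the ordering the audit describes: preview and confirmation in Step 5, file edits in Step 6. The gate only helps if the user actually reads the preview, so an injected directive would still need to be recognized as such at that point.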
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 8, 2026, 01:49 PM