reflect

Pass

Audited by Gen Agent Trust Hub on Mar 1, 2026

Risk Level: SAFEPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
  • [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection (Category 8) because it ingests untrusted conversation data to define permanent behavioral rules.\n
  • Ingestion points: Conversation history is scanned for signals as described in SKILL.md and data/signal_patterns.md.\n
  • Boundary markers: Absent; the skill uses regex patterns to extract rules from raw message text without delimiters or instructions to ignore embedded commands.\n
  • Capability inventory: The skill uses Edit, Write, and Bash tools to modify persistent agent configuration files (e.g., ~/.claude/agents/, CLAUDE.md).\n
  • Sanitization: Relies on human-in-the-loop approval, which mitigates but does not fully prevent the persistence of harmful instructions if a user is misled by the generated proposal.\n- [COMMAND_EXECUTION]: The skill requests and utilizes the Bash tool to apply configuration changes and perform git commits. This provides a mechanism for any successfully injected instructions to interact with the underlying system and persist changes across sessions.\n- [PROMPT_INJECTION]: The use of high-confidence triggers like 'never', 'always', and 'the rule is' in data/signal_patterns.md allows attackers to easily influence the agent's logic. This can lead to behavioral overrides or the creation of malicious new skills through the 'Skill-Worthy' discovery logic defined in SKILL.md.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 1, 2026, 06:06 PM