self-improvement
Pass
Audited by Gen Agent Trust Hub on Mar 23, 2026
Risk Level: SAFE (PROMPT_INJECTION)
Full Analysis
- [PROMPT_INJECTION]: The skill creates a surface for indirect prompt injection by ingesting untrusted user feedback and persisting it into the agent's instruction set and documentation.
- Ingestion points: User-provided feedback and corrections captured via conversational triggers such as "remember this", "update the skill", and "actually it should be" (defined in SKILL.md frontmatter).
- Boundary markers: Present. Phase 3 (User Confirmation) explicitly requires the use of AskUserQuestion to show the proposed update to a human user for approval before any changes are applied.
- Capability inventory: The skill identifies and modifies CLAUDE.md and SKILL.md files (Phase 4). Since skills are instruction sets for the agent, this is a form of self-modification capability.
- Sanitization: Absent. The skill instructions focus on formatting the feedback into documentation but do not specify any validation, filtering, or escaping of the user-provided text to prevent malicious instructions from being stored.
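To illustrate the sanitization gap noted above, here is a minimal Python sketch of the kind of validation and escaping step that could run before feedback is written to an instruction file. It is not part of the audited skill: the function names, the pattern list, and the length limit are all hypothetical, and a small pattern check like this would reduce rather than eliminate the risk.

```python
import re
from typing import List

# Hypothetical deny-list; these phrases are illustrative assumptions,
# not rules taken from the audited skill.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"\byou are now\b", re.IGNORECASE),
    re.compile(r"\bsystem prompt\b", re.IGNORECASE),
]

# Assumed upper bound on a single piece of feedback.
MAX_FEEDBACK_CHARS = 2000


def find_suspicious_phrases(feedback: str) -> List[str]:
    """Return the matched text for each deny-list pattern found in feedback."""
    hits = []
    for pattern in SUSPICIOUS_PATTERNS:
        match = pattern.search(feedback)
        if match:
            hits.append(match.group(0))
    return hits


def prepare_feedback_for_storage(feedback: str) -> str:
    """Validate feedback, then return it as a clearly marked quoted block.

    Raises ValueError if the text is too long or matches a deny-list pattern.
    """
    if len(feedback) > MAX_FEEDBACK_CHARS:
        raise ValueError("feedback exceeds the allowed length")

    hits = find_suspicious_phrases(feedback)
    if hits:
        raise ValueError(f"feedback contains instruction-like text: {hits}")

    # Replace code fences so the stored text cannot open or close a fenced block.
    escaped = feedback.replace("```", "'''")

    # Quote every line so the text reads as reported feedback, not as instructions.
    quoted = "\n".join("> " + line for line in escaped.splitlines())
    return "<!-- user feedback (untrusted, quoted verbatim) -->\n" + quoted
```

A check of this kind would complement, not replace, the Phase 3 confirmation step: the human reviewer still approves the change, but the stored text arrives already length-limited, screened, and quoted.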
Audit Metadata