self-improve
Warn
Audited by Gen Agent Trust Hub on Apr 29, 2026
Risk Level: MEDIUM
COMMAND_EXECUTION, DATA_EXFILTRATION, PROMPT_INJECTION, REMOTE_CODE_EXECUTION
Full Analysis
- [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection because it extracts 'lessons' from the conversation history and uses them to modify its persistent configuration. An attacker could embed malicious instructions in a conversation or PR review, and the agent could then adopt them as a permanent rule.
  - Ingestion points: Processes current conversation history and external PR review comments (via `gh api`).
  - Boundary markers: Absent; no delimiters are used to separate 'lessons' from potentially malicious instructions in the source text.
  - Capability inventory: Employs `/create-skill` to update or create skills and `/note-improvement` to modify project files.
  - Sanitization: Absent; content is filtered for utility but not for security or malicious intent.
- [REMOTE_CODE_EXECUTION]: The skill implements a self-modification mechanism by dynamically generating or updating its own instructions (SKILL.md) using the `/create-skill` tool based on inputs from the conversation.
- [DATA_EXFILTRATION]: Accesses sensitive directories and files containing agent state, configuration, and memory.
  - Evidence: Reads from `~/.claude/projects/*/memory/`, `~/.turbo/config.json`, and `~/.claude/skills/`.
- [COMMAND_EXECUTION]: Executes system commands and interacts with external APIs to verify environment state and fetch metadata.
  - Evidence: Uses `test -d` for directory validation and `gh api` to query repository collaborators.
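
To make the missing boundary markers and sanitization in the prompt-injection finding concrete, the following is a minimal Python sketch of what such controls could look like. It is not taken from the skill: the function names `wrap_untrusted` and `is_suspicious`, the marker format, and the pattern list are all hypothetical, and a simple pattern screen like this would not catch every injection attempt.

```python
import re
import secrets

# Hypothetical screen for instruction-like text; illustrative, not exhaustive.
SUSPICIOUS = re.compile(
    r"ignore (all|any|previous) (instructions|rules)"
    r"|curl\s+\S+\s*\|\s*(sh|bash)",
    re.IGNORECASE,
)


def wrap_untrusted(text: str, source: str) -> str:
    """Fence untrusted text between markers carrying a random token.

    The token is generated per call, so text written in advance by an
    attacker cannot contain a matching closing marker.
    """
    token = secrets.token_hex(8)
    return (
        f"<<UNTRUSTED {source} {token}>>\n"
        f"{text}\n"
        f"<<END UNTRUSTED {token}>>"
    )


def is_suspicious(lesson: str) -> bool:
    """Return True when a candidate lesson matches an instruction-like pattern."""
    return SUSPICIOUS.search(lesson) is not None
```

Under these assumptions, a candidate lesson drawn from a PR review comment would be passed through `is_suspicious` before being persisted, and any quoted source text would be wrapped with `wrap_untrusted` so that it is treated as data rather than as instructions.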
Audit Metadata