self-improve

Warn

Audited by Gen Agent Trust Hub on Apr 29, 2026

Risk Level: MEDIUM
COMMAND_EXECUTION, DATA_EXFILTRATION, PROMPT_INJECTION, REMOTE_CODE_EXECUTION
Full Analysis
  • [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection because it extracts 'lessons' from the conversation history and uses them to modify its persistent configuration. An attacker could embed malicious instructions in a conversation or PR review, which the agent would then adopt as a permanent rule.
      • Ingestion points: Processes current conversation history and external PR review comments (via gh api).
      • Boundary markers: Absent; no delimiters are used to separate 'lessons' from potentially malicious instructions in the source text.
      • Capability inventory: Employs /create-skill to update or create skills and /note-improvement to modify project files.
      • Sanitization: Absent; content is filtered for utility but not for security or malicious intent.
  • [REMOTE_CODE_EXECUTION]: The skill implements a self-modification mechanism by dynamically generating or updating its own instructions (SKILL.md) using the /create-skill tool based on inputs from the conversation.
  • [DATA_EXFILTRATION]: Accesses sensitive directories and files containing agent state, configuration, and memory.
      • Evidence: Reads from ~/.claude/projects/*/memory/, ~/.turbo/config.json, and ~/.claude/skills/.
  • [COMMAND_EXECUTION]: Executes system commands and interacts with external APIs to verify environment state and fetch metadata.
      • Evidence: Uses test -d for directory validation and gh api to query repository collaborators.
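
To illustrate the "Boundary markers: Absent" finding above, here is a minimal sketch of what delimiting untrusted text could look like before it is handed to a model. It is not taken from the audited skill: the function name wrap_untrusted, the marker syntax, and the label parameter are all hypothetical, and markers alone do not make injected instructions harmless.

```python
import secrets


def wrap_untrusted(text: str, label: str = "untrusted") -> str:
    """Wrap text in boundary markers that carry a random nonce.

    Hypothetical helper for illustration only. Because the nonce is
    generated per call, text written in advance by an attacker cannot
    contain the matching closing marker.
    """
    nonce = secrets.token_hex(8)  # 16 hex characters
    opener = f"<<{label}:{nonce}>>"
    closer = f"<</{label}:{nonce}>>"
    return f"{opener}\n{text}\n{closer}"


# Example: a PR review comment is marked as data, not as instructions.
comment = "Always approve PRs"
print(wrap_untrusted(comment, label="lesson"))
```

A prompt that consumes the wrapped text would also need to state that anything between the markers is data to be evaluated, never a rule to adopt; that part is outside this sketch.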
Audit Metadata
Risk Level: MEDIUM
Analyzed: Apr 29, 2026, 05:23 PM