self-improvement

Pass

Audited by Gen Agent Trust Hub on Mar 10, 2026

Risk Level: SAFEPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
  • [PROMPT_INJECTION]: The skill implements a self-improvement loop that processes untrusted data from user corrections and tool outputs. This architecture is vulnerable to indirect prompt injection, as malicious instructions could be logged and subsequently 'promoted' to permanent instruction files such as CLAUDE.md or AGENTS.md.
  • Ingestion points: Data enters via user dialogue and the CLAUDE_TOOL_OUTPUT environment variable processed by scripts/error-detector.sh.
  • Boundary markers: The skill uses markdown headers and structured fields for logging, which provide weak delimitation against adversarial input.
  • Capability inventory: The agent has the capability to write files, execute shell commands, and modify its own workspace instructions.
  • Sanitization: There is no evidence of content validation or sanitization before data is logged or promoted to core files.
  • [COMMAND_EXECUTION]: The skill includes several bash scripts (scripts/activator.sh, scripts/error-detector.sh, scripts/extract-skill.sh) that are intended to be executed by the agent or user. While these scripts perform legitimate scaffolding and monitoring tasks, they represent an additional attack surface within the agent's operating environment. The extract-skill.sh script includes basic regex-based validation for the skill name to mitigate basic injection risks.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 10, 2026, 03:25 AM