self-improve

Fail

Audited by Gen Agent Trust Hub on Apr 28, 2026

Risk Level: HIGHPROMPT_INJECTIONREMOTE_CODE_EXECUTIONCOMMAND_EXECUTION
Full Analysis
  • [PROMPT_INJECTION]: The skill instructions explicitly override standard interaction safety protocols. The 'Autonomous Execution Policy' section commands the agent to 'NEVER stop or pause to ask the user' and 'not ask for confirmation between iterations' once the loop begins, effectively suppressing the user's ability to review or intercept potentially dangerous actions.\n- [REMOTE_CODE_EXECUTION]: The improvement loop involves an automated cycle where an executor agent implements code changes and immediately executes them during the benchmarking phase. This results in the execution of LLM-generated code without human-in-the-loop review, creating a significant risk of arbitrary code execution if the model generates malicious logic.\n- [COMMAND_EXECUTION]: The skill relies on the autonomous execution of system-level commands, including Git operations (merges, tags, worktree management) and benchmark suites. These are performed without individual transaction approval after the initial setup phase.\n- [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection due to its processing of untrusted repository data. Ingestion points: The @explore and @architect agents read the codebase from the target repository path specified in SKILL.md. Boundary markers: The skill does not define clear delimiters or instructions to ignore embedded commands within the processed files. Capability inventory: The skill has extensive capabilities including file system modification and command execution. Sanitization: No sanitization or validation of the repository content is specified before it influences the planning and execution phases.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Apr 28, 2026, 10:25 AM