auto-dev

Fail

Audited by Gen Agent Trust Hub on Mar 14, 2026

Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The generated autodev.sh script (based on references/autodev-template.md) executes the Claude CLI using the --dangerously-skip-permissions flag. This configuration grants the AI agent the ability to execute any shell command on the host machine without human oversight or confirmation prompts.
  • [REMOTE_CODE_EXECUTION]: The automation pipeline in autodev.sh repeatedly executes shell-based test commands (identified as {TEST_CMD}) during the TDD closure and 'Bug Hunt' phases. This behavior allows for the automatic execution of potentially malicious code if the test definitions are manipulated or if the project environment is untrusted.
  • [PROMPT_INJECTION]: The references/system-prompt-template.md contains instructions that explicitly bypass standard AI safety protocols by stating 'This session has no humans present. All scenarios requiring human confirmation are replaced by AI mutual confirmation.' This removes the human-in-the-loop safety barrier.
  • [INDIRECT_PROMPT_INJECTION]: The skill architecture is highly vulnerable to indirect injection attacks.
  • Ingestion points: The skill reads and processes todolist.md (untrusted input) and project source files to generate task cards and prompts.
  • Boundary markers: There are no explicit boundary markers or 'ignore embedded instructions' filters applied when interpolating user data into the AI prompts.
  • Capability inventory: The pipeline has full shell execution capabilities via the claude CLI and the {TEST_CMD} execution loop.
  • Sanitization: No evidence of sanitization or validation of the input todolist.md or test outputs before they are fed back into the AI sessions.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Mar 14, 2026, 03:43 AM