gepetto

Fail

Audited by Gen Agent Trust Hub on Feb 19, 2026

Risk Level: HIGHCOMMAND_EXECUTIONEXTERNAL_DOWNLOADSPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION] (HIGH): The skill uses shell interpolation to pass the contents of claude-plan.md as an argument to external CLIs.
  • Evidence: In references/external-review.md, the command gemini ... "$(cat '<planning_dir>/claude-plan.md')" executes the output of the cat command within a shell string. If the plan file contains shell metacharacters like backticks, semicolons, or dollar signs, they will be evaluated and executed by the host shell.
  • Evidence: The use of --approval-mode yolo in the Gemini command explicitly bypasses safety confirmations, increasing the impact of a successful injection.
  • [PROMPT_INJECTION] (LOW): The skill implements a web research protocol that ingests untrusted data, creating an indirect prompt injection surface.
  • Evidence (Ingestion points): references/research-protocol.md defines a workflow using WebSearch and WebFetch to collect data from the internet.
  • Evidence (Boundary markers): There are no defined boundary markers or instructions to the agent to ignore embedded commands within the fetched web content.
  • Evidence (Capability inventory): The gathered research is used to inform implementation plans and interview questions, which are then processed by other subagents.
  • Evidence (Sanitization): No sanitization or validation of the fetched HTML/markdown content is performed before it is synthesized into the research report.
  • [EXTERNAL_DOWNLOADS] (LOW): The skill documentation encourages the use of external, non-standard CLI tools for its core functionality.
  • Evidence: references/external-review.md references gemini and codex CLIs which must be 'installed and configured separately by the user'. While the skill doesn't download them itself, it relies on the presence of these external binaries which may have their own security implications.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 19, 2026, 11:27 PM