tool-creator

Fail

Audited by Gen Agent Trust Hub on Mar 26, 2026

Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONDATA_EXFILTRATIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The example implementation of bash_tool.py in the references folder uses subprocess.run(cmd, shell=True). This allows for the execution of arbitrary shell commands with the full privileges of the agent process, presenting a high risk of system compromise.
  • [REMOTE_CODE_EXECUTION]: The python_repl_tool.py reference example facilitates arbitrary code execution by invoking the Python interpreter with user-provided command strings via subprocess. This provides a direct path for remote code execution on the host environment.
  • [DATA_EXFILTRATION]: The agent_tool_template.py and associated examples access _global_node_states to retrieve sensitive internal agent context, including the original request prompt, full execution plans, and accumulated clues. This exposes the entire conversation history and internal reasoning logic to any tool created using this template.
  • [PROMPT_INJECTION]: The skill establishes an indirect prompt injection surface by processing user-supplied task strings and interpolating them into agent messages without explicit sanitization or strict boundary enforcement. Evidence Chain: 1. Ingestion point: The task parameter in the TOOL_SPEC defined in the agent tool template. 2. Boundary markers: While XML tags are used for some output formatting, the input task is passed directly into the agent context without distinct delimiters or instructions to ignore embedded commands. 3. Capability inventory: Generated tools have access to high-risk capabilities like shell and python execution. 4. Sanitization: There is no evidence of input validation, escaping, or filtering performed on the task variable before it is processed by the LLM.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Mar 26, 2026, 04:53 PM