NYC

python-sdk

Fail

Audited by Gen Agent Trust Hub on Feb 18, 2026

Risk Level: HIGHREMOTE_CODE_EXECUTIONCOMMAND_EXECUTIONDATA_EXFILTRATIONPROMPT_INJECTION
Full Analysis
  • REMOTE_CODE_EXECUTION (HIGH): Multiple examples in references/tool-builder.md demonstrate using the eval() function on arguments provided by the AI agent (e.g., eval(call.args['expression'])). Since agent output can be influenced by malicious actors through prompt injection, this allows for arbitrary Python code execution on the system running the agent.
  • COMMAND_EXECUTION (MEDIUM): The references/agent-patterns.md file highlights how to enable the code_execution internal tool. While part of the SDK functionality, this grants an LLM the capability to generate and execute code, which requires extreme caution and sandboxing.
  • DATA_EXFILTRATION (LOW): Examples in references/tool-builder.md and references/files.md include network-enabled features like webhook_tool and requests.get, which could be used to send sensitive data to external endpoints.
  • PROMPT_INJECTION (LOW): The skill documentation describes agents that ingest untrusted data from web searches and user messages while possessing powerful capabilities like code execution. Evidence: 1. Ingestion points: agent.send_message and search tool outputs. 2. Boundary markers: Absent in examples. 3. Capability inventory: eval() usage and code_execution tool. 4. Sanitization: Absent in the provided examples.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 18, 2026, 10:12 AM