OpenAI Agents SDK Development

Warn

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: MEDIUMREMOTE_CODE_EXECUTIONPROMPT_INJECTIONDATA_EXFILTRATION
Full Analysis
  • REMOTE_CODE_EXECUTION (MEDIUM): In examples/basic_agent.py, the calculate tool uses Python's eval() function to evaluate mathematical expressions provided by the user. While the tool employs a character whitelist (0123456789+-*/.() ) to restrict input to basic arithmetic, which significantly reduces the risk of arbitrary code execution, eval() remains an inherently risky function for processing dynamic user input.\n- PROMPT_INJECTION (LOW): The skill demonstrates an architecture vulnerable to Indirect Prompt Injection due to the processing of untrusted user input that can trigger sensitive tool actions.\n
  • Ingestion points: The message parameter in the handle_customer function within examples/multi_agent_triage.py.\n
  • Boundary markers: Agent instructions in examples/multi_agent_triage.py do not use explicit delimiters (e.g., triple quotes or XML tags) to isolate user-provided text from system instructions.\n
  • Capability inventory: The skill includes capabilities to process refunds (process_refund), create tickets (create_support_ticket), and evaluate code (eval).\n
  • Sanitization: The skill mitigates this risk by implementing an LLM-based Content Moderator agent as an InputGuardrail to filter malicious input.\n- DATA_EXFILTRATION (LOW): Documentation in references/tools.md provides examples using the httpx library to perform network requests to non-whitelisted domains (e.g., api.example.com). This pattern could be exploited for Server-Side Request Forgery (SSRF) if destination validation is not implemented in the actual tool logic.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Feb 17, 2026, 06:40 PM