arxiv

Fail

Audited by Gen Agent Trust Hub on May 5, 2026

Risk Level: HIGHCOMMAND_EXECUTIONPROMPT_INJECTIONEXTERNAL_DOWNLOADS
Full Analysis
  • [COMMAND_EXECUTION]: Potential shell command injection in search and fetching steps. The user-provided search query and paper IDs are interpolated directly into shell-executed commands (python3 "$SCRIPT" search "QUERY"). An attacker can use shell metacharacters like backticks or semicolons to break out of the quotes and execute arbitrary commands.
  • [COMMAND_EXECUTION]: Python code injection in fallback routines. The skill uses HEREDOCs to generate and execute Python code on the fly. User-controlled variables are interpolated into the source code (query = urllib.parse.quote("QUERY")), which can be exploited to execute arbitrary Python logic within the agent's environment.
  • [COMMAND_EXECUTION]: Command injection in Step 6. The skill iterates over arxiv_id values retrieved from the external arXiv XML response and interpolates them directly into a shell command for the research wiki helper. If a malicious paper record is fetched from the API with an ID containing shell metacharacters, it can result in unauthorized code execution.
  • [PROMPT_INJECTION]: Vulnerability to indirect prompt injection (Category 8).
  • Ingestion points: Academic paper titles and abstracts are fetched from the external arXiv API and displayed to the agent in Step 5.
  • Boundary markers: The skill lacks delimiters or "ignore embedded instructions" warnings when presenting the untrusted content to the agent.
  • Capability inventory: The skill has Bash and Write capabilities, meaning malicious instructions embedded in an arXiv abstract could trick the agent into performing dangerous file system or shell operations during the summarization phase.
  • Sanitization: There is no sanitization of the fetched text beyond basic newline removal.
  • [EXTERNAL_DOWNLOADS]: Fetches metadata and PDF files from arXiv's official and well-known academic repository services.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
May 5, 2026, 06:32 PM