hypothesis-generation

Pass

Audited by Gen Agent Trust Hub on Apr 12, 2026

Risk Level: SAFECOMMAND_EXECUTIONEXTERNAL_DOWNLOADSDATA_EXFILTRATIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill workflow requires the execution of multiple system commands via the Bash tool to automate document generation and image processing.
  • SKILL.md instructs the agent to run python scripts/generate_schematic.py for diagram generation and xelatex/bibtex for document compilation.
  • scripts/generate_schematic.py uses the subprocess module to orchestrate the internal execution of scripts/generate_schematic_ai.py using a list-based argument structure that avoids shell injection risks.
  • [EXTERNAL_DOWNLOADS]: The skill relies on network access to the OpenRouter API (https://openrouter.ai/api/v1) to perform core functions.
  • scripts/generate_schematic_ai.py uses the requests library to communicate with OpenRouter for generating and reviewing scientific schematics via Google Gemini models.
  • [DATA_EXFILTRATION]: User-supplied scientific observations and diagram descriptions are transmitted to an external service provider (OpenRouter).
  • This transmission is limited to the data necessary for generating and refining the AI-powered diagrams requested by the user.
  • [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection as it is designed to ingest and process untrusted scientific data from the open web.
  • Ingestion points: External data is collected via WebFetch (specifically targeting PubMed URLs) and general WebSearch results during the literature synthesis phase.
  • Boundary markers: The skill documentation lacks explicit instructions for the agent to use XML tags or specific delimiters to separate gathered research data from internal processing instructions.
  • Capability inventory: The agent has access to powerful tools including Bash (for LaTeX/Python execution) and the ability to Write and Edit files within the workspace.
  • Sanitization: There is no documented mechanism to sanitize or filter potential instructional triggers embedded within research papers or web content before they are integrated into the hypothesis generation logic.
Audit Metadata
Risk Level
SAFE
Analyzed
Apr 12, 2026, 08:27 AM