research-pipeline

Warn

Audited by Gen Agent Trust Hub on Mar 17, 2026

Risk Level: MEDIUMCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill uses the Bash(*) tool, allowing for arbitrary command execution on the host system. It explicitly describes launching experiments in screen sessions and checking GPU availability on remote servers via shell commands.
  • [REMOTE_CODE_EXECUTION]: The workflow involves generating code (Stage 2) and subsequently deploying and executing it (Stage 3) on local or remote servers. While this is the primary purpose of the skill, the ability to run agent-generated code without mandatory human review (controlled by AUTO_PROCEED) is a significant security risk.
  • [PROMPT_INJECTION]: The skill is vulnerable to Indirect Prompt Injection. It ingests untrusted data from the web (WebSearch, WebFetch) and the arXiv API to generate research ideas and implementation details. There are no documented sanitization steps or boundary markers to prevent malicious instructions embedded in research papers or web content from influencing the generated code or agent behavior.
  • Ingestion points: WebSearch, WebFetch, and arXiv API results are processed to create IDEA_REPORT.md.
  • Boundary markers: None identified in the provided instructions to separate user intent from retrieved data.
  • Capability inventory: The agent has access to Bash(*), Edit, Write, and can invoke other agents/skills to execute code.
  • Sanitization: The instructions do not specify any validation or filtering for data retrieved from external sources.
  • [EXTERNAL_DOWNLOADS]: The skill fetches metadata and potentially PDF files from arXiv. While arXiv is a well-known and reputable service, the autonomous processing of these files introduces the aforementioned injection risks.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Mar 17, 2026, 07:04 AM