experiment-pipeline

Warn

Audited by Gen Agent Trust Hub on Mar 17, 2026

Risk Level: MEDIUM
REMOTE_CODE_EXECUTION, EXTERNAL_DOWNLOADS, COMMAND_EXECUTION, PROMPT_INJECTION
Full Analysis
  • [REMOTE_CODE_EXECUTION]: The skill explicitly directs the agent to locate, implement, and run executable code from external repositories (official or community re-implementations) to establish baselines.
    • Evidence: SKILL.md, Stage 1: "Find or generate executable baseline code... Resolve dependencies, fix compatibility issues... Run and compare metrics."
  • [EXTERNAL_DOWNLOADS]: The workflow involves fetching external codebases and resolving dependencies from third-party sources.
    • Evidence: SKILL.md, Stage 1: "Find the original baseline code (official repo, re-implementations...)" and "resolve dependencies".
  • [COMMAND_EXECUTION]: The agent is instructed to use the execute tool to run code changes, training scripts, and experiments throughout the 4-stage process.
    • Evidence: SKILL.md, Stage Loop: "Execute: Run the experiment. Record exact configuration, code changes, and runtime."
  • [PROMPT_INJECTION]: The skill has a broad surface for indirect prompt injection because it ingests and executes untrusted external code and paper descriptions.
    • Ingestion points: External repositories and research paper descriptions (SKILL.md, Stage 1).
    • Boundary markers: No explicit instructions are provided to delimit or ignore instructions within external content.
    • Capability inventory: The agent has access to the execute, write_file, and edit_file tools.
    • Sanitization: No sanitization or validation protocols are specified for external content.
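
To make the "Boundary markers" and "Sanitization" gaps concrete, here is a minimal sketch of what delimiting and screening external content could look like. It is not taken from SKILL.md or from the audit: the marker strings, the function names `wrap_untrusted` and `flag_suspicious`, and the pattern list are all hypothetical, and a simple phrase filter like this would only catch the most obvious injected instructions.

```python
import re
from typing import List

# Hypothetical delimiters; any strings unlikely to occur in real content would do.
BEGIN = "<<<UNTRUSTED_EXTERNAL_CONTENT"
END = "UNTRUSTED_EXTERNAL_CONTENT>>>"

# Hypothetical, deliberately small list of instruction-like phrases.
SUSPICIOUS = re.compile(
    r"(ignore (all )?(previous|prior) instructions|you are now|system prompt)",
    re.IGNORECASE,
)


def wrap_untrusted(text: str, source: str) -> str:
    """Delimit external content so it can be treated as data, not instructions."""
    # Remove marker look-alikes so the content cannot close the block early.
    cleaned = text.replace(BEGIN, "[marker removed]").replace(END, "[marker removed]")
    return f"{BEGIN} source={source!r}\n{cleaned}\n{END}"


def flag_suspicious(text: str) -> List[str]:
    """Return instruction-like phrases found in external content, for review."""
    return [m.group(0) for m in SUSPICIOUS.finditer(text)]
```

In this sketch, a README or paper excerpt fetched in Stage 1 would pass through `flag_suspicious` first and then be wrapped with `wrap_untrusted` before reaching the agent. The markers only help if the agent is also told to treat everything between them as data.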
Audit Metadata
Risk Level: MEDIUM
Analyzed: Mar 17, 2026, 03:10 PM