reproduce
Warn
Audited by Gen Agent Trust Hub on Apr 30, 2026
Risk Level: MEDIUMEXTERNAL_DOWNLOADSCOMMAND_EXECUTIONREMOTE_CODE_EXECUTION
Full Analysis
- [EXTERNAL_DOWNLOADS]: The skill fetches external content from arXiv via Tavily and clones code repositories from GitHub using
git clone. It also useswget,curl, andaria2cto download datasets from potentially untrusted URLs. - [COMMAND_EXECUTION]: The workflow relies heavily on shell commands for environment setup and task execution, including
uv venvfor virtual environments,ghfor GitHub interactions, andpythonfor running scripts. - [REMOTE_CODE_EXECUTION]: The skill's core purpose is to download and execute code from third-party repositories. This behavior inherits the security risks of the source content, as any malicious code within a cloned repository would be executed during the reproduction process.
- [DYNAMIC_EXECUTION]: The agent is instructed to write custom Python scripts (e.g.,
train.py,smoke_forward.py,optim_factory.py) and then execute them to verify the model's behavior. - [INDIRECT_PROMPT_INJECTION]:
- Ingestion points: Untrusted data enters the context through arXiv HTML/PDF extracts and GitHub repository content (READMEs, code comments).
- Boundary markers: The instructions recommend verbatim extraction but lack explicit delimiters or instructions to the agent to disregard malicious directives embedded within the paper text.
- Capability inventory: The agent possesses full file-system access, network capabilities, and the ability to execute arbitrary shell commands.
- Sanitization: No sanitization or validation is performed on the content extracted from external papers before it is used to guide the implementation and training stages.
Audit Metadata