openrlhf-training

Fail

Audited by Gen Agent Trust Hub on Mar 28, 2026

Risk Level: HIGH
Tags: COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, PROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill requires running sudo to uninstall Python packages in the environment, which demands elevated privileges on the host.
  • [COMMAND_EXECUTION]: The documentation instructs users to run Docker containers with the --cap-add=SYS_ADMIN flag. This grants the container significant root-level capabilities over the host system, which is a high-risk security configuration.
  • [REMOTE_CODE_EXECUTION]: The framework allows for the dynamic loading and execution of local Python scripts via the --remote_rm_url and --agent_func_path arguments. While these target local paths, they provide a direct mechanism for executing arbitrary code.
  • [REMOTE_CODE_EXECUTION]: The 'Custom Reward Functions' guide provides an example of 'Reinforced Fine-Tuning' that uses subprocess.run to execute model-generated Python code using pytest. This creates a vulnerability where malicious code generated by a model could be executed on the training infrastructure.
  • [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection through its ingestion of external datasets (via --dataset and --prompt_data). Malicious instructions embedded in these datasets could attempt to exploit the code execution capabilities of the reward function environment or influence the training outcome.
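The dynamic-loading risk flagged above can be illustrated with a minimal sketch. `load_module_from_path` below is a hypothetical stand-in for what a framework does internally when handed a path such as `--agent_func_path`; it is not OpenRLHF's actual loader, but it shows why importing an untrusted file is equivalent to executing it:

```python
import importlib.util
import os
import tempfile


def load_module_from_path(path: str):
    """Load a Python file as a module (hypothetical sketch).

    Any top-level code in the file runs during exec_module, so loading
    an attacker-controlled path means executing attacker-controlled code.
    """
    spec = importlib.util.spec_from_file_location("user_module", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # <-- arbitrary code runs here
    return module


# Benign demonstration file; a malicious file could do anything at this point.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("MESSAGE = 'module-level code executed at load time'\n")
    path = f.name

mod = load_module_from_path(path)
print(mod.MESSAGE)  # prints: module-level code executed at load time
os.remove(path)
```

Because the loaded file is chosen by a command-line argument, anyone who can influence that argument (or write to the referenced path) controls what runs on the training host.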
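The 'Reinforced Fine-Tuning' pattern can likewise be sketched. `unit_test_reward` below is a hypothetical illustration, not OpenRLHF's documented reward code, and it invokes a plain interpreter in place of pytest to stay self-contained; the vulnerability it demonstrates is the same: model output is written to disk and executed with the trainer's privileges, with no sandboxing.

```python
import os
import subprocess
import sys
import tempfile


def unit_test_reward(generated_code: str, test_code: str) -> float:
    """Return 1.0 if the model-generated code passes the tests, else 0.0.

    Hypothetical sketch of the flagged pattern: the untrusted string
    `generated_code` is executed directly on the training host.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "solution.py")
        with open(src, "w") as f:
            f.write(generated_code + "\n" + test_code + "\n")
        # Executing untrusted, model-generated code with no isolation.
        result = subprocess.run(
            [sys.executable, src], capture_output=True, timeout=30
        )
        return 1.0 if result.returncode == 0 else 0.0


# Benign demonstration; a hostile completion could just as easily
# import os and run arbitrary shell commands inside this subprocess.
code = "def add(a, b):\n    return a + b"
tests = "assert add(2, 3) == 5"
print(unit_test_reward(code, tests))  # prints: 1.0
```

A payload embedded in a model completion (or injected via a poisoned dataset, per the PROMPT_INJECTION finding) would run inside that subprocess with full access to the training infrastructure unless the execution is containerized or otherwise sandboxed.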
Recommendations
  • Automated analysis detected serious security threats; review the findings above and avoid granting elevated privileges, broad container capabilities (such as SYS_ADMIN), or unsandboxed execution of model-generated code before deploying this skill.
Audit Metadata
  • Risk Level: HIGH
  • Analyzed: Mar 28, 2026, 06:06 PM