openrlhf-training
Fail
Audited by Gen Agent Trust Hub on Mar 28, 2026
Risk Level: HIGHCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION]: The skill requires the use of
sudoto uninstall Python packages within the environment, which involves elevated privileges. - [COMMAND_EXECUTION]: The documentation instructs users to run Docker containers with the
--cap-add=SYS_ADMINflag. This grants the container significant root-level capabilities over the host system, which is a high-risk security configuration. - [REMOTE_CODE_EXECUTION]: The framework allows for the dynamic loading and execution of local Python scripts via the
--remote_rm_urland--agent_func_patharguments. While these target local paths, they provide a direct mechanism for executing arbitrary code. - [REMOTE_CODE_EXECUTION]: The 'Custom Reward Functions' guide provides an example of 'Reinforced Fine-Tuning' that uses
subprocess.runto execute model-generated Python code usingpytest. This creates a vulnerability where malicious code generated by a model could be executed on the training infrastructure. - [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection through its ingestion of external datasets (via
--datasetand--prompt_data). Malicious instructions embedded in these datasets could attempt to exploit the code execution capabilities of the reward function environment or influence the training outcome.
Recommendations
- AI detected serious security threats
Audit Metadata