The Agent Skills Directory

[COMMAND_EXECUTION]: The skill requires the use of sudo to uninstall Python packages within the environment, which involves elevated privileges.
[COMMAND_EXECUTION]: The documentation instructs users to run Docker containers with the --cap-add=SYS_ADMIN flag. This grants the container significant root-level capabilities over the host system, which is a high-risk security configuration.
[REMOTE_CODE_EXECUTION]: The framework allows for the dynamic loading and execution of local Python scripts via the --remote_rm_url and --agent_func_path arguments. While these target local paths, they provide a direct mechanism for executing arbitrary code.
[REMOTE_CODE_EXECUTION]: The 'Custom Reward Functions' guide provides an example of 'Reinforced Fine-Tuning' that uses subprocess.run to execute model-generated Python code using pytest. This creates a vulnerability where malicious code generated by a model could be executed on the training infrastructure.
[PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection through its ingestion of external datasets (via --dataset and --prompt_data). Malicious instructions embedded in these datasets could attempt to exploit the code execution capabilities of the reward function environment or influence the training outcome.

openrlhf-training