openrlhf-training
Fail
Audited by Gen Agent Trust Hub on Feb 16, 2026
Risk Level: HIGH (Command Execution, External Downloads, Prompt Injection)
Full Analysis
- Command Execution (HIGH): The skill instructs the user to run `sudo pip uninstall`, which grants unnecessary elevated permissions on the host system. Found in SKILL.md: `sudo pip uninstall xgboost transformer_engine flash_attn pynvml -y`.
- Command Execution (HIGH): The Docker command includes `--cap-add=SYS_ADMIN`, which gives the container excessive privileges and could enable container escape or host compromise. Found in SKILL.md: `docker run ... --cap-add=SYS_ADMIN`.
- External Downloads (MEDIUM): The skill installs the `openrlhf` package from PyPI and pulls a Docker image from `nvcr.io/nvidia`. Neither source is in the trusted list. Found in SKILL.md: `pip install openrlhf[vllm]` and `docker run ... nvcr.io/nvidia/pytorch:25.02-py3`.
- Prompt Injection (HIGH): The skill ingests untrusted datasets from external sources (Hugging Face) and uses them for model training, creating a significant indirect prompt injection surface with high-impact side effects (model behavior modification).
  1. Ingestion points: the `--dataset` and `--pretrain` flags in SKILL.md.
  2. Boundary markers: absent.
  3. Capability inventory: command execution via `ray job submit` and `deepspeed`.
  4. Sanitization: absent.
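The command-execution findings above can be reproduced with simple pattern triage. The sketch below is a hypothetical illustration, not the auditor's actual tooling: the SKILL.md path, the sample contents, and the pattern list are all assumptions drawn from the findings.

```shell
# Hypothetical triage sketch: flag the privileged-command patterns this audit reported.
# /tmp/SKILL.md is a stand-in populated with the exact commands quoted in the findings.
cat > /tmp/SKILL.md <<'EOF'
sudo pip uninstall xgboost transformer_engine flash_attn pynvml -y
docker run --cap-add=SYS_ADMIN nvcr.io/nvidia/pytorch:25.02-py3
pip install openrlhf[vllm]
EOF

# Pattern list is an assumption covering the HIGH findings above.
risky='sudo pip|cap-add=SYS_ADMIN|--privileged'
if grep -Eq "$risky" /tmp/SKILL.md; then
  echo "FLAG: privileged command patterns found"
fi
```

A real audit would parse commands rather than grep raw text, but this shows why both HIGH findings trip on a single pass over the file.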
Recommendations
- Automated analysis detected serious security threats; review the findings above before installing or running this skill.
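One way to act on the recommendation is to rerun the flagged setup steps without the privileges the audit objects to. This is a remediation sketch under stated assumptions: the venv path is made up, the package list and image tag come from the findings, and the hardened `docker run` line is left as a comment because it needs a Docker host.

```shell
# Hypothetical remediation sketch: the same setup without the flagged privileges.
# A user-level virtual environment removes any need for `sudo pip`.
python3 -m venv /tmp/openrlhf-venv

# Uninstall the conflicting packages inside the venv only, never system-wide.
# (|| true: the packages may simply not be present in a fresh env.)
/tmp/openrlhf-venv/bin/pip uninstall -y xgboost transformer_engine flash_attn pynvml || true

# Drop --cap-add=SYS_ADMIN entirely; grant nothing beyond Docker's defaults:
# docker run --rm --gpus all nvcr.io/nvidia/pytorch:25.02-py3 bash
```

If a specific kernel feature truly requires `SYS_ADMIN`, that requirement should be documented and scoped, not granted by default.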
Audit Metadata