openrlhf-training

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGH (COMMAND_EXECUTION, EXTERNAL_DOWNLOADS, PROMPT_INJECTION)
Full Analysis
  • Command Execution (HIGH): The skill instructs the user to run pip uninstall with sudo, which grants unnecessary elevated privileges on the host system. Found in SKILL.md: sudo pip uninstall xgboost transformer_engine flash_attn pynvml -y.
  • Command Execution (HIGH): The Docker command includes --cap-add=SYS_ADMIN, which provides the container with excessive privileges, potentially leading to container escape or host compromise. Found in SKILL.md: docker run ... --cap-add=SYS_ADMIN.
  • External Downloads (MEDIUM): The skill installs the openrlhf package from PyPI and pulls a Docker image from nvcr.io/nvidia. Neither source is in the trusted list. Found in SKILL.md: pip install openrlhf[vllm] and docker run ... nvcr.io/nvidia/pytorch:25.02-py3.
  • Prompt Injection (HIGH): The skill ingests untrusted datasets from external sources (Hugging Face) and uses them for model training, creating a significant indirect prompt-injection surface with high-impact side effects (model behavior modification).
    1. Ingestion points: --dataset and --pretrain flags in SKILL.md.
    2. Boundary markers: absent.
    3. Capability inventory: command execution via ray job submit and deepspeed.
    4. Sanitization: absent.
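As an illustration of how findings like those above can be flagged mechanically, the sketch below scans a SKILL.md body for the specific patterns this audit cites (sudo pip, --cap-add=SYS_ADMIN, untrusted pip/Docker sources, external dataset flags). The function name and pattern list are hypothetical examples, not the actual Trust Hub ruleset.

```python
import re

# Hypothetical rules mirroring the findings in this audit; each pairs a
# risk tag with a regex that matches the cited command pattern.
RULES = [
    ("COMMAND_EXECUTION", r"\bsudo\s+pip\b"),
    ("COMMAND_EXECUTION", r"--cap-add=SYS_ADMIN"),
    ("EXTERNAL_DOWNLOADS", r"\bpip install\b|\bdocker run\b.*nvcr\.io"),
    ("PROMPT_INJECTION", r"--(dataset|pretrain)\b"),
]

def scan_skill(text: str) -> list[tuple[str, str]]:
    """Return (tag, matched line) pairs for each risky line found."""
    hits = []
    for line in text.splitlines():
        for tag, pattern in RULES:
            if re.search(pattern, line):
                hits.append((tag, line.strip()))
    return hits

# Sample SKILL.md excerpt built from the commands quoted in the audit.
skill = """
sudo pip uninstall xgboost transformer_engine flash_attn pynvml -y
pip install openrlhf[vllm]
docker run --runtime=nvidia --cap-add=SYS_ADMIN nvcr.io/nvidia/pytorch:25.02-py3
ray job submit -- python train.py --dataset some/hf-dataset --pretrain some/model
"""

for tag, line in scan_skill(skill):
    print(tag, "->", line)
```

Note that the docker run line trips two rules (excess capability and an untrusted registry), matching how the audit reports it under two separate findings.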
Recommendations
  • Automated analysis detected serious security threats; manual review of the findings above is recommended before installing or running this skill.
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 16, 2026, 04:22 AM