openrlhf-training
Fail
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: HIGH
Categories: COMMAND_EXECUTION, EXTERNAL_DOWNLOADS, PROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION] (HIGH): The skill instructs the user to run a Docker container with the SYS_ADMIN capability (--cap-add=SYS_ADMIN). This grants the container excessive privileges on the host system, potentially allowing it to bypass security profiles or access host resources directly (a least-privilege invocation is sketched after this list).
- [COMMAND_EXECUTION] (HIGH): The installation section requires sudo pip uninstall for several packages. Running package managers with root privileges can destabilize the system Python installation and gives an attacker a path to host compromise if the skill or its environment is manipulated (see the virtual-environment sketch after this list).
- [EXTERNAL_DOWNLOADS] (MEDIUM): The skill installs the openrlhf package and its dependencies (Ray, vLLM). These sources are not included in the provided Trusted GitHub Organizations list, and the installations are performed without version pinning or checksum verification (a pinned, hash-verified install is sketched after this list).
- [PROMPT_INJECTION] (LOW) (Category 8 - Indirect Prompt Injection): The skill exposes an attack surface for indirect prompt injection via the processing of untrusted external data (a dataset-scanning sketch follows this list):
- Ingestion points: The skill ingests external datasets from Hugging Face (e.g., OpenRLHF/preference_dataset_mixture2_and_safe_pku) during the Reward Model and PPO training steps in SKILL.md.
- Boundary markers: No boundary markers or 'ignore' instructions are used to separate training data from training logic.
- Capability inventory: The skill has the capability to execute shell commands, perform network operations via Ray, and write to the filesystem.
- Sanitization: No sanitization or validation of the training data content is performed, leaving the agent susceptible to instructions embedded within the datasets.
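For the SYS_ADMIN finding, a minimal hardening sketch follows. The image name, GPU flag, and bash entrypoint are illustrative assumptions; the audited skill's exact docker command is not reproduced in this report.

```bash
# Risky pattern flagged above: SYS_ADMIN gives the container near-root
# control over the host kernel. (Image name is illustrative.)
docker run --gpus all -it --cap-add=SYS_ADMIN nvcr.io/nvidia/pytorch:24.07-py3 bash

# Hardened alternative: start from zero capabilities and forbid privilege
# escalation; add back individual capabilities only if the workload
# demonstrably fails without them.
docker run --gpus all -it \
  --cap-drop=ALL \
  --security-opt no-new-privileges \
  nvcr.io/nvidia/pytorch:24.07-py3 bash
```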
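For the sudo pip finding, one sketch of a root-free setup, assuming a standard Python 3 toolchain; the package names in the uninstall line are illustrative, as the skill's exact list is not quoted here.

```bash
# Isolate the skill in a virtual environment instead of mutating the
# system Python with root privileges (e.g. `sudo pip uninstall ...`).
python3 -m venv ~/.venvs/openrlhf
source ~/.venvs/openrlhf/bin/activate
pip install --upgrade pip

# Package changes now stay inside the venv: no sudo, no system-wide impact.
pip uninstall -y xgboost transformer-engine flash-attn   # names illustrative
```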
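For the unpinned-download finding, a sketch of pip's hash-checking mode. The version numbers and hashes below are placeholders, not real values; in this mode every transitive dependency must also be pinned and hashed (e.g. generated with pip-tools' `pip-compile --generate-hashes`).

```bash
# Pin exact versions and verify artifact hashes so a replaced or tampered
# release cannot be installed silently. X.Y.Z and the hashes are placeholders.
cat > requirements.txt <<'EOF'
openrlhf==X.Y.Z --hash=sha256:<hash-of-audited-wheel>
ray==X.Y.Z --hash=sha256:<hash-of-audited-wheel>
vllm==X.Y.Z --hash=sha256:<hash-of-audited-wheel>
EOF
pip install --require-hashes -r requirements.txt
```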
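For the prompt-injection finding, a rough pre-training screening sketch. The dataset name comes from the finding above; the grep pattern list is a heuristic starting point for review, not a complete injection filter.

```bash
# Download the dataset to local disk first, then scan its text fields for
# instruction-like strings before any training run or agent step touches it.
huggingface-cli download OpenRLHF/preference_dataset_mixture2_and_safe_pku \
  --repo-type dataset --local-dir ./dataset_audit

# grep exits 0 only on a match, so the warning fires only when hits are found.
grep -rniE 'ignore (all |previous |prior )?instructions|disregard .*system prompt|run the following command' \
  ./dataset_audit && echo "WARNING: suspicious strings found; review before training"
```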
Recommendations
- AI analysis detected serious security threats; review the Full Analysis findings above before enabling this skill.
Audit Metadata