openrlhf-training
Audited by Socket on Feb 15, 2026
2 alerts found:
MalwareAnomaly
This skill/documentation appears legitimate and aligned with its stated purpose of distributed RLHF training; no direct malicious code patterns are present in the provided text. The main security concerns are operational: examples that run Docker with the SYS_ADMIN capability, `sudo` uninstall commands, and implicit trust in externally hosted model weights and datasets. Users should avoid blindly running privileged commands, verify the sources of model checkpoints and datasets, and run the training stack in isolated environments. Overall risk is moderate on the operational side but not indicative of malware.
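A minimal sketch of the kind of isolated invocation the alert recommends. The image name and entrypoint here are placeholders, not taken from the audited skill; the point is the hardening flags, which are standard `docker run` options: drop capabilities rather than adding SYS_ADMIN, and cut off network and write access while evaluating untrusted code.

```shell
# Hypothetical hardened invocation (image/command are illustrative):
# drop all capabilities, disable networking, and mount the root
# filesystem read-only so untrusted code cannot modify the host image.
docker run --rm \
  --cap-drop=ALL \
  --network=none \
  --read-only \
  some-rlhf-image:latest \
  python train.py
```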
This file is documentation and examples for implementing reward functions and agent logic. It contains one high-risk pattern: executing model-generated code by writing it to a tempfile and running it via `subprocess.run` with pytest, which enables arbitrary code execution on the host and potential data exfiltration or system modification. The other examples are benign algorithmic reward computations or uses of evaluation models, though logging and model loading can still leak data or cause network activity. No signs of obfuscated or intentionally malicious code were found, but the code-execution example constitutes a significant security hazard if used without sandboxing and careful privilege, network, and logging controls.
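A minimal sketch of the pattern this alert describes, with a hard-coded stand-in for the model's output and a plain Python invocation in place of pytest so the sketch stays self-contained. In the audited examples the code comes from the policy model, so whatever it emits runs with the host's full privileges: it can read files, open sockets, or modify the system.

```python
import os
import subprocess
import sys
import tempfile
import textwrap

# Stand-in for model-generated code; in the audited examples this
# string comes from the policy model, not a trusted source.
generated_code = textwrap.dedent("""
    def test_add():
        assert 1 + 1 == 2
""")

# Write the untrusted code to a tempfile on the host.
with tempfile.NamedTemporaryFile(
    "w", suffix="_test.py", delete=False
) as f:
    f.write(generated_code)
    path = f.name

# The audited pattern runs pytest on the file, e.g.:
#   subprocess.run([sys.executable, "-m", "pytest", path])
# Here we execute it with the interpreter directly to keep the
# sketch dependency-free; the hazard is identical either way.
result = subprocess.run(
    [sys.executable, path], capture_output=True, text=True
)
reward = 1.0 if result.returncode == 0 else 0.0
os.unlink(path)
```

Without a sandbox (container, seccomp profile, network cutoff, unprivileged user), this grants the model arbitrary code execution on the host; the exit-code-to-reward mapping is the usual reason this pattern appears in reward functions.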