grpo-rl-training

Audit Result: Fail

Audited by Gen Agent Trust Hub on Mar 28, 2026

Risk Level: HIGH
Tags: COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, PROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The file 'examples/reward_functions_library.py' includes a 'run_test_cases' function that uses the 'exec()' built-in to execute Python code strings. This function supports the 'code_execution_reward' by testing whether model-generated code is valid.
  • [REMOTE_CODE_EXECUTION]: The skill allows for the execution of untrusted code generated by the AI model during runtime. While training, a model might produce scripts that perform unauthorized file access, network requests, or other malicious actions. The current implementation lacks sandboxing (e.g., Docker or gVisor), making the host environment vulnerable to full compromise by the generated code.
  • [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection through the processing of untrusted training data. Ingestion points include training datasets loaded via 'load_dataset' in 'templates/basic_grpo_training.py' and CSV files in 'SKILL.md'. XML-style tags are used for structure but do not sanitize the content within them. The 'exec()' function provides a high-privilege execution environment. No input validation or code analysis is performed before the 'exec()' call.
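The flagged pattern might look like the following minimal sketch. The function name 'run_test_cases' comes from the audit, but its exact signature and body are assumptions here; the point is that 'exec()' runs the untrusted string with full interpreter privileges:

```python
def run_test_cases(code_str, test_cases):
    """Execute model-generated code and check it against test cases.

    WARNING: exec() runs the untrusted string with the host
    interpreter's full privileges -- this mirrors the unsandboxed
    pattern flagged in the findings above.
    """
    namespace = {}
    exec(code_str, namespace)  # untrusted code runs here, unsandboxed
    # Grab the first user-defined callable from the executed namespace.
    fn = next(v for k, v in namespace.items()
              if callable(v) and not k.startswith("__"))
    return all(fn(*args) == expected for args, expected in test_cases)
```

Any model output reaching 'code_str' can perform file access, network requests, or process spawning before the test cases are even evaluated.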
Recommendations
  • Execute model-generated code only inside an OS-level sandbox (e.g., Docker or gVisor) rather than via 'exec()' on the host.
  • Validate or statically analyze generated code before any execution path reaches it.
  • Treat training datasets ('load_dataset' inputs, CSV files) as untrusted and sanitize their content before it can influence high-privilege execution.
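A first step consistent with these recommendations is process isolation with a timeout. The helper name and interface below are illustrative, and this is deliberately weaker than the Docker/gVisor sandboxing the audit calls for: it limits runtime and crash impact only, not file or network access:

```python
import os
import subprocess
import sys
import tempfile


def run_test_cases_isolated(code_str, test_snippet, timeout=5):
    """Run untrusted code in a separate interpreter process with a timeout.

    Mitigation sketch only: a child process in isolated mode (-I) with a
    time limit. A real deployment should add an OS-level sandbox
    (e.g., Docker or gVisor) around this, per the recommendations above.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code_str + "\n" + test_snippet)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, "-I", path],  # -I ignores env vars and user site
            capture_output=True, text=True, timeout=timeout,
        )
        return proc.returncode == 0  # nonzero exit = failed assertions/crash
    except subprocess.TimeoutExpired:
        return False  # runaway code is killed rather than hanging training
    finally:
        os.unlink(path)
```

A hung or crashing generated script then costs one child process instead of the training run itself.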
Audit Metadata
  • Risk Level: HIGH
  • Analyzed: Mar 28, 2026, 06:07 PM