grpo-rl-training

Warn

Audited by Gen Agent Trust Hub on Feb 15, 2026

Risk Level: MEDIUM | EXTERNAL_DOWNLOADS | PROMPT_INJECTION
Full Analysis
  • Indirect Prompt Injection (MEDIUM): The skill loads external datasets using datasets.load_dataset in templates/basic_grpo_training.py. Maliciously crafted data in the dataset could influence the training process or poison the resulting model. Evidence chain:
      1. Ingestion points: the get_dataset function loads external data ('openai/gsm8k').
      2. Boundary markers: the prompt uses XML tags for structure but lacks explicit instructions to ignore commands embedded in the dataset content.
      3. Capability inventory: the skill performs model training and writes outputs to the filesystem (outputs/grpo-model).
      4. Sanitization: no sanitization of dataset content is performed.
  • External Downloads (LOW): The script downloads models and datasets from Hugging Face and connects to Weights & Biases (wandb) for logging. These involve network operations to external domains. Downloads from Hugging Face are considered LOW risk per [TRUST-SCOPE-RULE].
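
One possible mitigation for the sanitization gap noted in the first finding is sketched below. It is a minimal illustration, not code from the audited skill: the reserved tag names (reasoning, answer), the deny-list pattern, the length limit, and all function names are assumptions made for this example. The "question" field is the one used by 'openai/gsm8k'.

```python
import re

# Assumption: the training prompt delimits sections with these XML-style tags.
RESERVED_TAGS = ("reasoning", "answer")
TAG_PATTERN = re.compile(
    r"</?\s*(?:%s)\s*>" % "|".join(RESERVED_TAGS), re.IGNORECASE
)

# Assumption: a small, illustrative deny-list; it is not exhaustive.
SUSPICIOUS_PATTERN = re.compile(
    r"ignore (?:all |any )?(?:previous|prior|above) instructions",
    re.IGNORECASE,
)

# Assumption: an arbitrary upper bound on question length.
MAX_CHARS = 4000


def sanitize_text(text: str) -> str:
    """Remove reserved tags and non-printable characters, then truncate."""
    text = TAG_PATTERN.sub("", text)
    text = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    return text[:MAX_CHARS].strip()


def is_suspicious(text: str) -> bool:
    """Return True if the text matches the illustrative deny-list."""
    return bool(SUSPICIOUS_PATTERN.search(text))


def keep_example(example: dict) -> bool:
    """Filter predicate: drop rows whose question looks like an injection."""
    return not is_suspicious(example["question"])


def sanitize_example(example: dict) -> dict:
    """Map function: return a copy of the row with a cleaned question."""
    return {**example, "question": sanitize_text(example["question"])}
```

With the Hugging Face datasets library, these helpers could be applied as dataset.filter(keep_example).map(sanitize_example) before the data reaches the trainer. Pattern-based filtering reduces the exposure described above but does not eliminate it.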
Audit Metadata
Risk Level: MEDIUM
Analyzed: Feb 15, 2026, 08:20 PM