grpo-rl-training
Warn
Audited by Gen Agent Trust Hub on Feb 15, 2026
Risk Level: MEDIUMEXTERNAL_DOWNLOADSPROMPT_INJECTION
Full Analysis
- Indirect Prompt Injection (MEDIUM): The skill loads external datasets using
datasets.load_datasetintemplates/basic_grpo_training.py. Maliciously crafted data in the dataset could influence the training process or poison the resulting model. Evidence Chain: 1. Ingestion points:get_datasetfunction loads external data ('openai/gsm8k'). 2. Boundary markers: The prompt uses XML tags for structure but lacks explicit instructions to ignore embedded commands within the dataset content. 3. Capability inventory: Performs model training and writes outputs to the filesystem (outputs/grpo-model). 4. Sanitization: No sanitization of dataset content is performed. - External Downloads (LOW): The script downloads models and datasets from Hugging Face and connects to Weights & Biases (
wandb) for logging. These involve network operations to external domains. Downloads from Hugging Face are considered LOW risk per [TRUST-SCOPE-RULE].
Audit Metadata