simpo-training

Pass

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: SAFE, EXTERNAL_DOWNLOADS, REMOTE_CODE_EXECUTION
Full Analysis
  • EXTERNAL_DOWNLOADS (LOW): The skill instructions involve cloning an external repository to obtain training scripts.
      • Evidence: git clone https://github.com/huggingface/alignment-handbook.git in SKILL.md.
      • Trust Status: Hugging Face (huggingface) is a recognized trusted organization. The finding is downgraded per [TRUST-SCOPE-RULE].
  • REMOTE_CODE_EXECUTION (LOW): The skill executes code from a recently cloned repository and installs it as a package.
      • Evidence: python -m pip install . followed by accelerate launch ... scripts/run_simpo.py in SKILL.md.
      • Context: The execution is performed on code from a trusted source (Hugging Face).
  • INDIRECT_PROMPT_INJECTION (LOW): The training process ingests untrusted datasets from the Hugging Face Hub (e.g., ultrafeedback_binarized).
      • Ingestion points: dataset_mixer entries in YAML configurations within SKILL.md.
      • Boundary markers: Absent; training processes typically ingest raw text.
      • Capability inventory: The skill triggers model training via accelerate launch, which is a high-compute capability.
      • Sanitization: Not explicitly implemented in the provided documentation, relying on the underlying training scripts.
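
The ingestion-point finding above could be mitigated with a pre-flight check on the dataset_mixer entries before launching training. The following is a minimal sketch, not part of the audited skill: it assumes the YAML has already been parsed into a dict whose dataset_mixer key maps dataset names to sampling fractions, and the allowlist contents, the function name unreviewed_datasets, and the full repository id shown are illustrative assumptions.

```python
from typing import List, Mapping, Set

# Hypothetical allowlist of reviewed datasets; the audit does not define one.
# The full repository id below is an assumption, the report only names
# "ultrafeedback_binarized".
REVIEWED_DATASETS: Set[str] = {"HuggingFaceH4/ultrafeedback_binarized"}


def unreviewed_datasets(config: Mapping, allowlist: Set[str]) -> List[str]:
    """Return dataset_mixer entries that are not on the allowlist, sorted."""
    # Assumes dataset_mixer maps dataset name -> sampling fraction.
    mixer = config.get("dataset_mixer") or {}
    return sorted(name for name in mixer if name not in allowlist)


if __name__ == "__main__":
    # Example config as it might look after parsing the YAML (hypothetical values).
    example_config = {
        "dataset_mixer": {
            "HuggingFaceH4/ultrafeedback_binarized": 1.0,
            "someuser/unvetted-prefs": 0.5,  # hypothetical unreviewed dataset
        }
    }
    flagged = unreviewed_datasets(example_config, REVIEWED_DATASETS)
    if flagged:
        print("Review before training:", ", ".join(flagged))
```

A check like this only gates which datasets are ingested; it does not sanitize their contents, so it complements rather than replaces review of the data itself.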
Audit Metadata
Risk Level: SAFE
Analyzed: Feb 17, 2026, 05:56 PM