sparse-autoencoder-training

Pass

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: LOW (EXTERNAL_DOWNLOADS, PROMPT_INJECTION)
Full Analysis
  • [EXTERNAL_DOWNLOADS] (LOW): The skill references the sae-lens and transformer-lens Python packages and includes examples of downloading pre-trained models and datasets from HuggingFace and Neuronpedia. These are standard in the research community but represent external network dependencies.
  • [PROMPT_INJECTION] (LOW): The skill demonstrates processing external text prompts to analyze model activations (Category 8: Indirect Prompt Injection surface). The capabilities associated with this (steering, ablation) primarily influence internal model state and do not escalate to file system or network-level threats in the provided context.
  • [CREDENTIALS_UNSAFE] (INFO): The upload_saes_to_huggingface function includes a placeholder for an API token (hf_token). While safe as a placeholder, it highlights a point where users must manage secrets carefully.
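One way to handle the secret-management point in the CREDENTIALS_UNSAFE finding is to read the token from the environment at call time instead of writing it into the script. The sketch below is illustrative only: the helper names `resolve_hf_token` and `redact` are hypothetical and are not part of the audited skill, and the use of an `HF_TOKEN` environment variable is an assumption about how the user stores the secret. How the resulting value is passed to `upload_saes_to_huggingface` depends on that function's actual signature, which this report does not show.

```python
import os


def resolve_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the token stored in the given environment variable.

    Hypothetical helper: raises instead of falling back to a hard-coded
    value, so a missing secret fails loudly rather than silently.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {env_var} in the environment; do not hard-code tokens."
        )
    return token


def redact(token: str, visible: int = 4) -> str:
    """Mask a secret for logging, keeping only the last few characters."""
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]
```

A caller would obtain the value with `resolve_hf_token()` and, if anything needs to be logged, log `redact(token)` rather than the token itself. Keeping the lookup in one place also makes it easier to swap in a secrets manager later.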
Audit Metadata
Risk Level: LOW
Analyzed: Feb 16, 2026, 03:23 AM