sparse-autoencoder-training
Pass
Audited by Gen Agent Trust Hub on Feb 16, 2026
Risk Level: LOW (EXTERNAL_DOWNLOADS, PROMPT_INJECTION)
Full Analysis
- [EXTERNAL_DOWNLOADS] (LOW): The skill references the sae-lens and transformer-lens Python packages and includes examples of downloading pre-trained models and datasets from HuggingFace and Neuronpedia. These sources are standard in the research community, but they introduce external network dependencies.
- [PROMPT_INJECTION] (LOW): The skill demonstrates passing external text prompts through a model to analyze its activations, which creates an indirect prompt injection surface (Category 8: Indirect Prompt Injection). The associated capabilities (steering, ablation) mainly influence internal model state and, in the provided context, do not escalate to file-system or network-level threats.
- [CREDENTIALS_UNSAFE] (INFO): The upload_saes_to_huggingface function includes a placeholder for an API token (hf_token). The placeholder itself is safe, but it marks a point where users must manage secrets carefully.
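The report does not show the signature of upload_saes_to_huggingface, so the sketch below covers only the token-handling side of the CREDENTIALS_UNSAFE finding. It assumes the token is supplied through an environment variable named HF_TOKEN (the name the huggingface_hub client conventionally reads) rather than pasted into source code. Both get_hf_token and redact are hypothetical helpers written for illustration, not part of sae-lens or any other library.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Hypothetical helper: read a Hugging Face API token from the environment.

    Assumes the token was exported beforehand (e.g. by a shell profile or a
    secrets manager), so it never appears in the script or notebook itself.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(
            f"{env_var} is not set; export it instead of hard-coding the token."
        )
    return token


def redact(token: str, visible: int = 4) -> str:
    """Hypothetical helper: mask a secret for logging, keeping only its tail."""
    if len(token) <= visible:
        # Too short to reveal any characters safely; mask everything.
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]
```

In use, the value returned by get_hf_token() would be passed wherever the skill currently shows the hf_token placeholder, and only redact(token) would ever be printed or logged. Failing loudly when the variable is missing is a deliberate choice: it avoids silently falling back to an anonymous or hard-coded credential.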
Audit Metadata