transformer-lens-interpretability

Pass

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: SAFE
Full Analysis
  • CREDENTIALS_UNSAFE (SAFE): The documentation mentions setting a Hugging Face token via os.environ["HF_TOKEN"] = "your_token". This uses a clear placeholder and is not a credential leak.
  • EXTERNAL_DOWNLOADS (SAFE): The skill references the standard transformer-lens Python package and links to its official GitHub repository and documentation. These are reputable sources within the AI research community.
  • COMMAND_EXECUTION (SAFE): Code snippets include standard Python library usage for loading and analyzing transformer models. No high-risk or arbitrary command execution patterns were found.
  • PROMPT_INJECTION (SAFE): The provided documentation does not contain any instructions aimed at bypassing AI safety filters or overriding system prompts.
Audit Metadata
Risk Level
SAFE
Analyzed
Feb 17, 2026, 04:56 PM