transformer-lens-interpretability

Originally from zechenzhangagi/ai-research-skills

TransformerLens: Mechanistic Interpretability for Transformers

TransformerLens is the de facto standard library for mechanistic interpretability research on GPT-style language models. Created by Neel Nanda and maintained by Bryce Meyer, it provides clean interfaces to inspect and manipulate model internals via HookPoints on every activation.
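As a minimal sketch of what inspecting internals through hook points can look like, the snippet below runs one forward pass and reads back a cached attention pattern. The model name "gpt2", the function name `cache_attention_pattern`, and the helper `pattern_hook_name` are illustrative assumptions, not part of the original skill. The helper only spells out the hook naming convention; the library's own `transformer_lens.utils.get_act_name` serves the same purpose.

```python
def pattern_hook_name(layer: int) -> str:
    """Hypothetical helper: name of the attention-pattern hook point for one layer."""
    return f"blocks.{layer}.attn.hook_pattern"


def cache_attention_pattern(prompt: str, layer: int = 0, model_name: str = "gpt2"):
    """Run one forward pass and return (logits, attention pattern at `layer`).

    Assumes `transformer_lens` is installed and the model weights can be downloaded.
    """
    # Imported lazily so this module can be imported without the library installed.
    from transformer_lens import HookedTransformer

    model = HookedTransformer.from_pretrained(model_name)
    tokens = model.to_tokens(prompt)  # shape: [batch, pos]

    # run_with_cache returns the logits plus a cache of intermediate activations.
    logits, cache = model.run_with_cache(tokens)

    # Attention pattern shape: [batch, head, query_pos, key_pos]
    pattern = cache[pattern_hook_name(layer)]
    return logits, pattern
```

Calling `cache_attention_pattern("The cat sat on the mat")` would return the logits together with the layer-0 attention pattern, which can then be plotted or compared across heads.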

GitHub: TransformerLensOrg/TransformerLens (2,900+ stars)

When to Use TransformerLens

Use TransformerLens when you need to:

Reverse-engineer algorithms learned during training
Perform activation patching / causal tracing experiments
Study attention patterns and information flow
Analyze circuits (e.g., induction heads, IOI circuit)
Cache and inspect intermediate activations
Apply direct logit attribution
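To make the activation patching item above concrete, here is a hedged sketch: cache activations from a clean prompt, then rerun a corrupted prompt while overwriting the residual stream at one layer and position with the clean value. The function names, the "gpt2" default, and the choice of patching `hook_resid_pre` are assumptions made for illustration. The sketch also assumes both prompts tokenize to the same length.

```python
def resid_pre_hook_name(layer: int) -> str:
    """Hypothetical helper: name of the residual-stream hook point entering a layer."""
    return f"blocks.{layer}.hook_resid_pre"


def patch_resid_pre(clean_prompt: str, corrupted_prompt: str,
                    layer: int, pos: int, model_name: str = "gpt2"):
    """Return logits for the corrupted prompt with one clean activation patched in.

    Assumes `transformer_lens` is installed and both prompts have equal token length.
    """
    # Imported lazily so this module can be imported without the library installed.
    from transformer_lens import HookedTransformer

    model = HookedTransformer.from_pretrained(model_name)
    clean_tokens = model.to_tokens(clean_prompt)
    corrupted_tokens = model.to_tokens(corrupted_prompt)
    if clean_tokens.shape != corrupted_tokens.shape:
        raise ValueError("Prompts must tokenize to the same length for patching.")

    # Cache every activation from the clean run.
    _, clean_cache = model.run_with_cache(clean_tokens)

    def patch_hook(activation, hook):
        # activation shape: [batch, pos, d_model]; overwrite a single position
        # with the value recorded at the same hook point in the clean run.
        activation[:, pos, :] = clean_cache[hook.name][:, pos, :]
        return activation

    # Rerun the corrupted prompt with the patch applied at the chosen hook point.
    return model.run_with_hooks(
        corrupted_tokens,
        fwd_hooks=[(resid_pre_hook_name(layer), patch_hook)],
    )
```

Sweeping `layer` and `pos` and comparing the patched logits against the clean and corrupted baselines gives a rough picture of where the relevant information lives in the residual stream.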

Installs: 325
Repository: davila7/claude-…emplates
GitHub Stars: 30.0K
First Seen: Jan 21, 2026
Security Audits: Gen Agent Trust Hub: Pass
