tbench
Warn
Audited by Snyk on Feb 28, 2026
Risk Level: MEDIUM
Full Analysis
MEDIUM W011: Third-party content exposure detected (indirect prompt injection risk).
- Third-party content exposure detected (high risk: 0.70). The SKILL.md workflow explicitly clones and reads a public HuggingFace leaderboard repo ("git clone https://huggingface.co/datasets/alexgshaw/terminal-bench-2-leaderboard") and inspects result.json files from those submissions, so the agent ingests untrusted, user-submitted third-party content that can influence analysis and subsequent decisions.
Audit Metadata