tbench

Warn

Audited by Snyk on Feb 28, 2026

Risk Level: MEDIUM
Full Analysis

MEDIUM W011: Third-party content exposure detected (indirect prompt injection risk).

  • Third-party content exposure detected (high risk: 0.70). The SKILL.md workflow explicitly clones and reads a public HuggingFace leaderboard repo ("git clone https://huggingface.co/datasets/alexgshaw/terminal-bench-2-leaderboard") and inspects result.json files from those submissions, so the agent ingests untrusted, user-submitted third-party content that can influence analysis and subsequent decisions.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Feb 28, 2026, 09:57 AM