tbench
Fail
Audited by Socket on Feb 28, 2026
1 alert found:
Malware (HIGH): SKILL.md
The code fragment describes a legitimate Terminal-Bench integration for Mux agent benchmarking with CI/CD workflows. The capabilities (environment-driven configuration, task selection, concurrency control, timeouts, result collection, and leaderboard submission) are coherent with the stated purpose. There are no suspicious download/execute patterns, no hardcoded secrets, and no anomalous credential access. Overall, the asset is benign with low security risk and clear, purpose-aligned data flows.
Confidence: 95%
Severity: 90%
Audit Metadata