evaluating-code-models

Fail

Audited by Socket on Feb 15, 2026

1 alert found:

Obfuscated File (HIGH)
SKILL.md

The evaluated document describes a legitimate benchmarking tool whose capabilities (model downloading, generation, and optional execution of generated code) are necessary for computing pass@k metrics. I found no hardcoded secrets, obfuscated payloads, or direct malware in the provided text. The primary security concerns are operational: enabling --trust_remote_code and --allow_code_execution, especially in combination with --use_auth_token or with host files mounted into containers, can lead to arbitrary code execution and credential exposure when used with untrusted models or in non-isolated environments. Recommended mitigations: prefer generation-only runs on the host and execute tests inside vetted containers or isolated VMs; run containers with restricted networking (--network=none) and minimal mounts; avoid --trust_remote_code for third-party models; and avoid exposing long-lived tokens in the runtime. Overall, the package is functionally appropriate but operationally risky without careful isolation.
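The mitigations above could be scripted roughly as follows. This is a minimal sketch, not the tool's own workflow: the helper names `find_risky_flags` and `build_isolated_docker_cmd`, the image name, the `/data/` mount point, and the generations file name are all hypothetical. Only the three harness flags and `--network=none` come from the audit text; `docker run --rm` and the `:ro` mount suffix are standard Docker options.

```python
from pathlib import Path

# Flags the audit names as risky with untrusted models or non-isolated hosts.
RISKY_FLAGS = ("--trust_remote_code", "--allow_code_execution", "--use_auth_token")


def find_risky_flags(argv):
    """Return the risky flags present in argv, in the order of RISKY_FLAGS.

    Handles both `--flag` and `--flag=value` spellings.
    """
    present = {arg.split("=", 1)[0] for arg in argv}
    return [flag for flag in RISKY_FLAGS if flag in present]


def build_isolated_docker_cmd(image, generations_file, harness_args):
    """Build a `docker run` argv with no network and a single read-only mount.

    `image` and `harness_args` are supplied by the caller; nothing here is
    specific to any particular evaluation harness (hypothetical helper).
    """
    host_path = Path(generations_file).resolve()
    container_path = "/data/" + host_path.name  # assumed mount point
    return [
        "docker", "run", "--rm",
        "--network=none",                           # no network inside the container
        "-v", f"{host_path}:{container_path}:ro",   # only mount: generations, read-only
        image,
        *harness_args,
    ]


if __name__ == "__main__":
    # Hypothetical image and file names, for illustration only.
    args = ["--allow_code_execution"]
    print("risky flags:", find_risky_flags(args))
    print(" ".join(build_isolated_docker_cmd(
        "example/eval-harness:latest", "generations.json", args)))
```

The command is returned as an argument list rather than executed, so it can be reviewed or logged before being passed to something like `subprocess.run`. No token is forwarded into the container, which keeps long-lived credentials out of the environment where generated code runs.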

Confidence: 98%
Audit Metadata
Analyzed At: Feb 15, 2026, 09:41 PM
Package URL: pkg:socket/skills-sh/orchestra-research%2Fai-research-skills%2Fevaluating-code-models%2F@c885cfe00d7f9329d4c9819e2d236ce6f7109811