huggingface-community-evals

Pass

Audited by Gen Agent Trust Hub on Mar 23, 2026

Risk Level: SAFECOMMAND_EXECUTIONEXTERNAL_DOWNLOADS
Full Analysis
  • [Shell Command Execution]: The skill uses subprocess.run() to execute evaluation CLI tools like inspect and lighteval. This is a standard pattern for Python scripts acting as wrappers for CLI frameworks.
  • Evidence found in scripts/inspect_eval_uv.py, scripts/inspect_vllm_uv.py, and scripts/lighteval_vllm_uv.py.
  • [Remote Code Execution via Model Loading]: The scripts include a --trust-remote-code flag. When enabled by the user, this allows the underlying libraries (transformers, vllm) to execute custom code provided by model authors on the Hugging Face Hub. This is a common but sensitive feature in the machine learning ecosystem required for certain model architectures.
  • Documentation and implementation found in SKILL.md and all script files.
  • [Credential Management]: The skill correctly advises users to manage their Hugging Face tokens via environment variables (HF_TOKEN) and provides a template in examples/.env.example. This follows security best practices for secret management by avoiding hardcoded keys.
  • Evidence in examples/.env.example and environment setup functions in scripts.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 23, 2026, 10:18 PM