hugging-face-evaluation

Pass

Audited by Gen Agent Trust Hub on Mar 10, 2026

Risk Level: SAFECOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONCREDENTIALS_UNSAFE
Full Analysis
  • [COMMAND_EXECUTION]: The skill uses uv run to execute local Python scripts located in the scripts/ directory for tasks such as parsing README tables and managing model metadata.
  • [REMOTE_CODE_EXECUTION]: It leverages the Hugging Face Jobs infrastructure via hf jobs uv run to perform model evaluations on remote CPU and GPU hardware provisioned by Hugging Face.
  • [CREDENTIALS_UNSAFE]: The skill requires HF_TOKEN for Hugging Face repository access and AA_API_KEY for the Artificial Analysis API. It provides instructions for managing these via environment variables and passing them as secrets to remote jobs, which is consistent with standard platform usage.
  • [COMMAND_EXECUTION]: The skill documents the use of the --trust-remote-code flag for models requiring custom code execution. This is a standard feature of the Hugging Face Transformers and vLLM libraries that allows the execution of arbitrary Python code defined within a model's repository.
  • [DATA_EXFILTRATION]: It performs automated extraction of evaluation data from Hugging Face README files and fetches benchmark scores from the Artificial Analysis API to populate model metadata.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 10, 2026, 01:50 PM