hugging-face-evaluation
Pass
Audited by Gen Agent Trust Hub on Mar 10, 2026
Risk Level: SAFE
COMMAND_EXECUTION, REMOTE_CODE_EXECUTION, CREDENTIALS_UNSAFE
Full Analysis
- [COMMAND_EXECUTION]: The skill uses `uv run` to execute local Python scripts located in the `scripts/` directory for tasks such as parsing README tables and managing model metadata.
- [REMOTE_CODE_EXECUTION]: It leverages the Hugging Face Jobs infrastructure via `hf jobs uv run` to perform model evaluations on remote CPU and GPU hardware provisioned by Hugging Face.
- [CREDENTIALS_UNSAFE]: The skill requires `HF_TOKEN` for Hugging Face repository access and `AA_API_KEY` for the Artificial Analysis API. It provides instructions for managing these via environment variables and passing them as secrets to remote jobs, which is consistent with standard platform usage.
- [COMMAND_EXECUTION]: The skill documents the use of the `--trust-remote-code` flag for models requiring custom code execution. This is a standard feature of the Hugging Face Transformers and vLLM libraries that allows the execution of arbitrary Python code defined within a model's repository.
- [DATA_EXFILTRATION]: It performs automated extraction of evaluation data from Hugging Face README files and fetches benchmark scores from the Artificial Analysis API to populate model metadata.
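The credential handling described in the CREDENTIALS_UNSAFE finding can be illustrated with a minimal sketch. It assumes the two variable names from the finding (`HF_TOKEN`, `AA_API_KEY`); the helper names `missing_secrets` and `redact` are hypothetical and are not taken from the skill's own scripts. The idea is to confirm the secrets are present in the environment without ever printing their values.

```python
import os

# Variable names come from the audit finding; everything else is illustrative.
REQUIRED_SECRETS = ("HF_TOKEN", "AA_API_KEY")


def missing_secrets(env=None, required=REQUIRED_SECRETS):
    """Return the names of required secrets that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


def redact(value, keep=4):
    """Mask a secret for logging, keeping only the last `keep` characters."""
    if len(value) <= keep:
        return "*" * len(value)
    return "*" * (len(value) - keep) + value[-keep:]


if __name__ == "__main__":
    missing = missing_secrets()
    if missing:
        # Report only the variable names, never the values.
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required secrets are set.")
```

Running a check like this before launching a job reports only which variable names are missing, so a misconfigured shell does not leak a token into logs.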
Audit Metadata