hugging-face-evaluation
Pass
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: SAFEREMOTE_CODE_EXECUTIONCOMMAND_EXECUTIONEXTERNAL_DOWNLOADSPROMPT_INJECTION
Full Analysis
- [REMOTE_CODE_EXECUTION] (LOW): Several scripts, including
inspect_vllm_uv.pyandlighteval_vllm_uv.py, include an optional--trust-remote-codeflag. This allows the execution of arbitrary Python code defined within a model's repository on the Hugging Face Hub. This is a standard but high-risk feature in the ML ecosystem; it is appropriately defaulted to disabled and requires explicit user activation. - [COMMAND_EXECUTION] (SAFE): The skill invokes external CLI tools like
inspectandlightevalusingsubprocess.runwith list-based arguments. This method is secure as it avoids shell interpretation and prevents command injection from user-provided model or task names. - [EXTERNAL_DOWNLOADS] (SAFE): The skill communicates with
huggingface.co(a trusted domain) andartificialanalysis.aito retrieve model metadata and benchmark scores. These interactions are consistent with the skill's purpose and use legitimate APIs. - [PROMPT_INJECTION] (LOW): The skill contains an indirect prompt injection surface by parsing external README files from the Hub. 1. Ingestion points:
evaluation_manager.pyparses README markdown content. 2. Boundary markers: No explicit delimiters are used to wrap the extracted data. 3. Capability inventory: The skill can update model repositories and trigger evaluation jobs. 4. Sanitization: The logic focuses on extracting numeric data from tables, which provides some structural filtering, but it does not perform exhaustive sanitization of the strings that may be incorporated into model metadata or future prompts.
Audit Metadata