evaluation-metrics
Warn
Audited by Gen Agent Trust Hub on Mar 1, 2026
Risk Level: MEDIUMCOMMAND_EXECUTIONPROMPT_INJECTIONEXTERNAL_DOWNLOADS
Full Analysis
- [COMMAND_EXECUTION]: The
evaluate_humanevalfunction inSKILL.mdutilizes thehuman_eval.execution.check_correctnessmethod. This method is designed to execute Python code generated by a language model to verify its functional correctness. Executing unverified model output on a host system poses a significant security risk, as a malicious or compromised model could generate code that performs unauthorized file access, network operations, or other harmful actions.- [PROMPT_INJECTION]: Several components, includingRAGMetricsandHallucinationDetector, are vulnerable to Indirect Prompt Injection. These components interpolate untrusted data (such as model predictions and retrieved contexts) directly into instructions given to another LLM without proper sanitization. - Ingestion points: The variables
prediction,context, andanswerused inSKILL.mdandscripts/llm_evaluator.pyare populated from potentially untrusted external sources. - Boundary markers: The evaluation prompts lack clear delimiters (like XML tags or triple quotes) and instructions to ignore any commands embedded within the data variables.
- Capability inventory: The skill possesses the capability to generate text via an LLM and, critically, execute code via the HumanEval benchmark logic.
- Sanitization: There is no evidence of filtering, escaping, or validation of the input data before it is interpolated into the evaluation prompts.- [EXTERNAL_DOWNLOADS]: The skill fetches evaluation metric scripts and pre-trained models (such as BERTScore models) from Hugging Face's official repositories using the
evaluateandtransformerslibraries.
Audit Metadata