evaluating-llms-harness
Warn
Audited by Snyk on Mar 28, 2026
Risk Level: MEDIUM
Full Analysis
MEDIUM W011: Third-party content exposure detected (indirect prompt injection risk).
- Third-party content exposure detected (high risk: 0.90). The skill explicitly fetches and ingests untrusted third-party content: pretrained models and datasets from HuggingFace, plus external APIs, as shown in SKILL.md and references/api-evaluation.md (for example, model args such as pretrained=meta-llama/... and dataset_path: squad or other local/HuggingFace datasets). It also runs and evaluates model-generated code (see "HumanEval not executing code" / --allow_code_execution and references/custom-tasks.md execute_code). As a result, external or user-generated content can be executed or can otherwise materially influence runtime behavior.
Issues (1)
W011
MEDIUM: Third-party content exposure detected (indirect prompt injection risk).
Audit Metadata