llm-evaluation

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGH (EXTERNAL_DOWNLOADS, REMOTE_CODE_EXECUTION, PROMPT_INJECTION, COMMAND_EXECUTION)
Full Analysis
  • Dynamic Execution (HIGH): The skill documentation and templates (e.g., templates/promptfooconfig.yaml, references/evaluation-metrics.md) explicitly show how to use javascript and python assertions. This mechanism executes arbitrary code defined within YAML configuration files, posing a direct RCE risk if those files are sourced from untrusted PRs.
  • Indirect Prompt Injection (HIGH): The skill possesses a high-risk vulnerability surface when used in CI/CD quality gates as described in references/ci-cd-integration.md.
      • Ingestion points: Processes external prompt files, YAML configs, and CSV/JSON datasets (e.g., tests/cases.csv).
      • Boundary markers: Lacks explicit sanitization or boundary markers for test data.
      • Capability inventory: Promptfoo provides arbitrary JS/Python code execution (Cat 10), subprocess calls (npx), and network access for API requests.
      • Sanitization: No input sanitization is implemented for test variables interpolated into prompts.
      • Risk: A crafted PR could use malicious prompts or assertions to trigger unintended code execution or exfiltrate secrets (like API keys) from the CI environment.
  • Unverifiable Dependencies (MEDIUM): The quick start and core guides recommend npx promptfoo@latest, which downloads and executes code from a third-party npm package at runtime. This bypasses supply chain security best practices like version pinning and lockfile verification for a package not included in the trusted sources list.
  • Command Execution (MEDIUM): The skill provides multiple examples of executing shell commands and scripts (e.g., npx promptfoo@latest eval, redteam run) which, when combined with the dynamic assertion capability, increases the potential for unauthorized command execution on the host system.
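To make the Dynamic Execution finding concrete, the following is a minimal sketch of a pre-merge check that lists the javascript and python assertion entries in a config so a reviewer can inspect them before CI runs the file. It is not part of the audited skill or of Promptfoo: the function name `find_code_assertions` and the sample config are hypothetical, and the scan is a line-based heuristic rather than a real YAML parse, so it can miss unusual layouts.

```python
import re

# Assertion types the audit flags as executing code embedded in the config.
CODE_ASSERTION_TYPES = {"javascript", "python"}

# Heuristic: matches lines such as "- type: javascript" or "type: 'python'".
_TYPE_LINE = re.compile(r"""^\s*(?:-\s*)?type:\s*['"]?([A-Za-z_-]+)['"]?\s*(?:#.*)?$""")


def find_code_assertions(config_text: str) -> list[tuple[int, str]]:
    """Return (line_number, assertion_type) for each code-executing assertion found."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        match = _TYPE_LINE.match(line)
        if match and match.group(1).lower() in CODE_ASSERTION_TYPES:
            hits.append((number, match.group(1).lower()))
    return hits


if __name__ == "__main__":
    # Hypothetical config fragment, used only for illustration.
    sample = """\
tests:
  - vars:
      question: What is 2 + 2?
    assert:
      - type: contains
        value: "4"
      - type: javascript
        value: output.length < 100
"""
    for number, kind in find_code_assertions(sample):
        print(f"line {number}: {kind} assertion runs code; review before CI executes it")
```

A check like this only surfaces candidates for human review. It does not make an untrusted config safe to execute.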
Recommendations
  • AI detected serious security threats
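For the Unverifiable Dependencies finding, one possible mitigation is to reject CI scripts that invoke Promptfoo through npx without an exact version. The sketch below is an illustration under stated assumptions, not an official tool: `unpinned_invocations` is a hypothetical helper, the version `1.2.3` is a placeholder rather than a real release, and anything other than a plain `MAJOR.MINOR.PATCH` string is treated as unpinned.

```python
import re

# "promptfoo" followed by "@<tag-or-version>", or standing alone as a package name.
# The lookahead keeps file names such as promptfooconfig.yaml from matching.
_PROMPTFOO_REF = re.compile(r"""\bpromptfoo(?:@([^\s"']+)|(?![\w.-]))""")

# Assumption: only an exact MAJOR.MINOR.PATCH version counts as pinned.
_EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+$")


def unpinned_invocations(script_text: str) -> list[tuple[int, str]]:
    """Return (line_number, reference) for npx calls to promptfoo without an exact version."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        if "npx" not in line:
            continue
        for match in _PROMPTFOO_REF.finditer(line):
            version = match.group(1)
            if version is None or not _EXACT_VERSION.match(version):
                hits.append((number, match.group(0)))
    return hits


if __name__ == "__main__":
    # Hypothetical CI script lines; 1.2.3 is a placeholder version.
    script = "npx promptfoo@latest eval\nnpx promptfoo@1.2.3 eval -c promptfooconfig.yaml\n"
    for number, ref in unpinned_invocations(script):
        print(f"line {number}: '{ref}' is not pinned to an exact version")
```

Pinning alone does not cover the lockfile verification the audit also mentions. That would need a committed lockfile and an install step that honors it.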
Audit Metadata
Risk Level: HIGH
Analyzed: Feb 16, 2026, 03:34 AM