llm-evaluation
Fail
Audited by Gen Agent Trust Hub on Feb 16, 2026
Risk Level: HIGH
Findings: EXTERNAL_DOWNLOADS, REMOTE_CODE_EXECUTION, PROMPT_INJECTION, COMMAND_EXECUTION
Full Analysis
- Dynamic Execution (HIGH): The skill documentation and templates (e.g., `templates/promptfooconfig.yaml`, `references/evaluation-metrics.md`) explicitly show how to use `javascript` and `python` assertions. This mechanism executes arbitrary code defined within YAML configuration files, posing a direct RCE risk if those files are sourced from untrusted PRs.
- Indirect Prompt Injection (HIGH): The skill presents a high-risk vulnerability surface when used in CI/CD quality gates as described in `references/ci-cd-integration.md`.
  - Ingestion points: Processes external prompt files, YAML configs, and CSV/JSON datasets (e.g., `tests/cases.csv`).
  - Boundary markers: Lacks explicit sanitization or boundary markers for test data.
  - Capability inventory: Promptfoo provides arbitrary JS/Python code execution (Cat 10), subprocess calls (`npx`), and network access for API requests.
  - Sanitization: No input sanitization is implemented for test variables interpolated into prompts.
  - Risk: A crafted PR could use malicious prompts or assertions to trigger unintended code execution or to exfiltrate secrets (such as API keys) from the CI environment.
- Unverifiable Dependencies (MEDIUM): The quick start and core guides recommend `npx promptfoo@latest`, which downloads and executes code from a third-party npm package at runtime. This bypasses supply-chain security best practices such as version pinning and lockfile verification for a package not included in the trusted sources list.
- Command Execution (MEDIUM): The skill provides multiple examples of executing shell commands and scripts (e.g., `npx promptfoo@latest eval`, `redteam run`) which, when combined with the dynamic assertion capability, increase the potential for unauthorized command execution on the host system.
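To make the Dynamic Execution and Prompt Injection findings concrete, a minimal sketch of a promptfoo config is shown below. The file, prompt, and provider names are illustrative placeholders, not taken from the audited skill; the `javascript`/`python` assertion types and the `defaultTest` and file-based `tests` keys do exist in promptfoo's config schema, which is what makes this surface exploitable:

```yaml
# Hypothetical promptfooconfig.yaml illustrating the attack surface.
prompts:
  - "Summarize the following text: {{document}}"
providers:
  - openai:gpt-4o-mini

# External dataset: each CSV row becomes a test case whose columns are
# interpolated into the prompt without sanitization or boundary markers.
tests: file://tests/cases.csv

defaultTest:
  assert:
    # Both assertion types execute code defined directly in this YAML file,
    # so a malicious PR that edits the config (or the CSV) gains code
    # execution in the CI job that runs the eval.
    - type: javascript
      value: output.length > 0
    - type: python
      value: len(output) > 0
```

Because the assertions run with the CI job's privileges, a crafted `value` could read environment variables such as API keys, which is the exfiltration path described above.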
Recommendations
- AI analysis detected serious security threats in this skill; review the findings above before installing or running it in CI.
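For the Unverifiable Dependencies finding, a possible hardening sketch for a CI step is shown below, assuming a GitHub Actions workflow; the step name and the use of a lockfile are illustrative, not taken from the audited skill:

```yaml
# Illustrative GitHub Actions step: avoid `npx promptfoo@latest` in favor
# of an exact, lockfile-verified version.
- name: Run promptfoo eval with a pinned dependency
  run: |
    npm ci                      # installs exactly what package-lock.json records
    npx --no-install promptfoo eval -c templates/promptfooconfig.yaml
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
```

`npm ci` fails if `package-lock.json` disagrees with `package.json`, and `npx --no-install` refuses to download a package at runtime, which closes the `@latest` supply-chain gap noted in the analysis.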
Audit Metadata