promptfoo-evaluation

Warn

Audited by Gen Agent Trust Hub on Feb 23, 2026

Risk Level: MEDIUMCOMMAND_EXECUTIONEXTERNAL_DOWNLOADSREMOTE_CODE_EXECUTIONDATA_EXFILTRATIONPROMPT_INJECTION
Full Analysis
  • Dynamic Execution (MEDIUM): The skill demonstrates how to configure Promptfoo to load and execute custom Python scripts from the local filesystem for assertions (e.g., file://scripts/metrics.py:custom_assert). This allows for arbitrary code execution in the evaluation environment.\n- External Downloads & Remote Code Execution (LOW): The instructions suggest using npx promptfoo@latest, which downloads and executes the package directly from the npm registry.\n- Data Exposure (LOW): Documentation in SKILL.md reveals a specific local path (/Users/tiansheng/Workspace/prompts/tiaogaoren/), exposing a system username.\n- Indirect Prompt Injection (LOW): The skill identifies a pattern where data is loaded from files and interpolated into LLM prompts without explicit sanitization or boundary markers.\n
  • Ingestion points: promptfooconfig.yaml and tests/cases.yaml use the file:// scheme to import external content.\n
  • Boundary markers: Absent in the provided prompt templates.\n
  • Capability inventory: Promptfoo executes shell commands (npx), runs Python scripts, and makes network requests to external providers.\n
  • Sanitization: Absent.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Feb 23, 2026, 05:28 AM