deepeval-testing

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGHEXTERNAL_DOWNLOADSPROMPT_INJECTION
Full Analysis
  • [Unverifiable Dependencies] (MEDIUM): The scripts/init-deepeval.sh script performs pip install deepeval pytest without version pinning. This makes the environment vulnerable to breaking changes or dependency confusion attacks.
  • [Indirect Prompt Injection] (HIGH): The skill is designed to process untrusted LLM outputs for evaluation, which serves as an injection vector.
  • Ingestion points: In templates/test_basic.py, the actual_output parameter in LLMTestCase is a direct entry point for untrusted data.
  • Boundary markers: There are no boundary markers or instructions to the agent to ignore embedded commands within the processed data.
  • Capability inventory: The skill is granted Bash, Write, and Edit tools in SKILL.md, and includes a shell script (scripts/init-deepeval.sh) for environment modification.
  • Sanitization: No sanitization or safety filtering is applied to the data before it is passed to the DeepEval metrics.
  • [Credential Access] (LOW): templates/conftest.py programmatically accesses the OPENAI_API_KEY environment variable. While expected for an LLM testing tool, it identifies a high-value target for any potential exfiltration payload.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 16, 2026, 09:36 AM