deepeval-testing
Fail
Audited by Gen Agent Trust Hub on Feb 16, 2026
Risk Level: HIGHEXTERNAL_DOWNLOADSPROMPT_INJECTION
Full Analysis
- [Unverifiable Dependencies] (MEDIUM): The
scripts/init-deepeval.shscript performspip install deepeval pytestwithout version pinning. This makes the environment vulnerable to breaking changes or dependency confusion attacks. - [Indirect Prompt Injection] (HIGH): The skill is designed to process untrusted LLM outputs for evaluation, which serves as an injection vector.
- Ingestion points: In
templates/test_basic.py, theactual_outputparameter inLLMTestCaseis a direct entry point for untrusted data. - Boundary markers: There are no boundary markers or instructions to the agent to ignore embedded commands within the processed data.
- Capability inventory: The skill is granted
Bash,Write, andEdittools inSKILL.md, and includes a shell script (scripts/init-deepeval.sh) for environment modification. - Sanitization: No sanitization or safety filtering is applied to the data before it is passed to the DeepEval metrics.
- [Credential Access] (LOW):
templates/conftest.pyprogrammatically accesses theOPENAI_API_KEYenvironment variable. While expected for an LLM testing tool, it identifies a high-value target for any potential exfiltration payload.
Recommendations
- AI detected serious security threats
Audit Metadata