evaluations
Pass
Audited by Gen Agent Trust Hub on Apr 25, 2026
Risk Level: SAFEEXTERNAL_DOWNLOADSCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
- [EXTERNAL_DOWNLOADS]: Fetches documentation and technical specifications from official LangWatch domains (
langwatch.ai) to guide the setup of evaluators and experiments. - [COMMAND_EXECUTION]: Instructs the agent to execute shell commands to run evaluation scripts, including
npx tsxfor TypeScript andsubprocess.runwithjupyter nbconvertfor executing Python notebooks. - [PROMPT_INJECTION]: The skill exhibits an attack surface for indirect prompt injection (Category 8) as it processes external data to generate evaluation logic.
- Ingestion points: Reads the agent's codebase, package manifests (package.json, pyproject.toml), git history, and system prompts (SKILL.md).
- Boundary markers: Absent; there are no specific instructions to ignore embedded commands within the analyzed codebase or prompts.
- Capability inventory: The skill can create, write to, and execute local files and scripts.
- Sanitization: No explicit sanitization or validation of the ingested code or prompts is described before they are interpolated into the evaluation scripts.
Audit Metadata