skills/langwatch/skills/evaluations/Gen Agent Trust Hub

evaluations

Pass

Audited by Gen Agent Trust Hub on Apr 25, 2026

Risk Level: SAFEEXTERNAL_DOWNLOADSCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [EXTERNAL_DOWNLOADS]: Fetches documentation and technical specifications from official LangWatch domains (langwatch.ai) to guide the setup of evaluators and experiments.
  • [COMMAND_EXECUTION]: Instructs the agent to execute shell commands to run evaluation scripts, including npx tsx for TypeScript and subprocess.run with jupyter nbconvert for executing Python notebooks.
  • [PROMPT_INJECTION]: The skill exhibits an attack surface for indirect prompt injection (Category 8) as it processes external data to generate evaluation logic.
  • Ingestion points: Reads the agent's codebase, package manifests (package.json, pyproject.toml), git history, and system prompts (SKILL.md).
  • Boundary markers: Absent; there are no specific instructions to ignore embedded commands within the analyzed codebase or prompts.
  • Capability inventory: The skill can create, write to, and execute local files and scripts.
  • Sanitization: No explicit sanitization or validation of the ingested code or prompts is described before they are interpolated into the evaluation scripts.
Audit Metadata
Risk Level
SAFE
Analyzed
Apr 25, 2026, 06:42 PM