dec-bench-evals

Pass

Audited by Gen Agent Trust Hub on Mar 11, 2026

Risk Level: SAFE
REMOTE_CODE_EXECUTION · EXTERNAL_DOWNLOADS · COMMAND_EXECUTION · PROMPT_INJECTION
Full Analysis
  • [REMOTE_CODE_EXECUTION]: The skill's documentation in references/guide.md includes a shell command that installs the tool by piping a remote script into a shell (curl -fsSL https://decbench.ai/install.sh | sh). This pattern runs the downloaded script without review, so it is inherently risky even when used only for installation.
  • [EXTERNAL_DOWNLOADS]: The skill performs external network operations to download the dec-bench CLI and to interact with the decbench.ai registry for publishing and managing evaluation scenarios.
  • [COMMAND_EXECUTION]: The agent is instructed to execute several CLI commands to manage the lifecycle of evaluation scenarios, including dec-bench create, validate, build, run, and registry publish, as well as checking GitHub authentication using gh auth status.
  • [PROMPT_INJECTION]: An indirect prompt injection surface is created as the skill generates markdown prompts and TypeScript assertion scripts that are later ingested and executed by the evaluation framework.
      • Ingestion points: The skill creates files like prompts/naive.md, prompts/savvy.md, and assertions/*.ts that contain instructions processed at runtime.
      • Boundary markers: There are no explicit markers or safety instructions in the generated files to delimit generated content from the agent's core instructions.
      • Capability inventory: The framework has the ability to execute code, manage Docker containers, and perform database queries on ClickHouse and Postgres.
      • Sanitization: The skill does not implement validation or sanitization of the content it authors before it is used by the framework.
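
The boundary-marker and sanitization gaps noted above can be illustrated with a minimal sketch. It is not part of dec-bench or the audited skill: the marker strings, the pattern list, and the function names `find_suspicious` and `wrap_generated` are all hypothetical, and the phrase list is deliberately small and non-exhaustive. The idea is only to show what checking authored content and delimiting it before it is written to a prompt file might look like.

```python
import re

# Hypothetical delimiters; neither dec-bench nor the skill defines these.
BEGIN_MARK = "<<<GENERATED_CONTENT_BEGIN>>>"
END_MARK = "<<<GENERATED_CONTENT_END>>>"

# Illustrative, non-exhaustive phrases that often signal injected instructions.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"disregard (the )?(system|above) (prompt|instructions)", re.IGNORECASE),
]


def find_suspicious(text: str) -> list[str]:
    """Return every substring of text that matches a suspicious pattern."""
    hits = []
    for pattern in SUSPICIOUS_PATTERNS:
        hits.extend(match.group(0) for match in pattern.finditer(text))
    return hits


def wrap_generated(text: str) -> str:
    """Validate generated content, then enclose it in boundary markers."""
    # Reject content that could forge or prematurely close the boundary.
    if BEGIN_MARK in text or END_MARK in text:
        raise ValueError("content already contains a boundary marker")
    hits = find_suspicious(text)
    if hits:
        raise ValueError(f"suspicious phrases found: {hits}")
    return f"{BEGIN_MARK}\n{text}\n{END_MARK}"
```

A pattern list like this catches only the most obvious cases, so it would complement human review of the generated prompts and assertion scripts rather than replace it.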
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 11, 2026, 11:58 PM