dec-bench-evals
Pass
Audited by Gen Agent Trust Hub on Mar 11, 2026
Risk Level: SAFE
Flagged categories: REMOTE_CODE_EXECUTION, EXTERNAL_DOWNLOADS, COMMAND_EXECUTION, PROMPT_INJECTION
Full Analysis
- [REMOTE_CODE_EXECUTION]: The skill's documentation in references/guide.md includes a shell command that installs the tool by piping a remote script into the shell (curl -fsSL https://decbench.ai/install.sh | sh). Although it is intended only for installation, this pattern is inherently risky: whatever the server returns at that moment is executed without review.
- [EXTERNAL_DOWNLOADS]: The skill performs external network operations to download the dec-bench CLI and to interact with the decbench.ai registry, where evaluation scenarios are published and managed.
- [COMMAND_EXECUTION]: The agent is instructed to run several CLI commands that manage the lifecycle of evaluation scenarios, including dec-bench create, validate, build, run, and registry publish, and to check GitHub authentication with gh auth status.
- [PROMPT_INJECTION]: The skill creates an indirect prompt injection surface, because it generates markdown prompts and TypeScript assertion scripts that the evaluation framework later ingests and executes.
  - Ingestion points: The skill creates files such as prompts/naive.md, prompts/savvy.md, and assertions/*.ts, which contain instructions processed at runtime.
  - Boundary markers: The generated files contain no explicit markers or safety instructions that delimit generated content from the agent's core instructions.
  - Capability inventory: The framework can execute code, manage Docker containers, and run database queries against ClickHouse and Postgres.
  - Sanitization: The skill does not validate or sanitize the content it authors before the framework uses it.
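The [REMOTE_CODE_EXECUTION] concern can be narrowed by not piping the script straight into sh: download it to a file first (for example with curl -fsSL https://decbench.ai/install.sh -o install.sh), review it, and run it only if it still matches the SHA-256 digest recorded at review time. The Python sketch below shows only that verification step. It assumes the operator pins the digest themselves; this audit does not say whether decbench.ai publishes an official checksum, and the function names are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of data."""
    return hashlib.sha256(data).hexdigest()


def verify_installer(script: bytes, pinned_sha256: str) -> bytes:
    """Return the script unchanged if it matches the pinned digest.

    Hypothetical helper: raises ValueError on mismatch so that a changed
    or tampered install.sh is never handed to a shell.
    """
    actual = sha256_hex(script)
    # compare_digest compares without short-circuiting on the first difference.
    if not hmac.compare_digest(actual, pinned_sha256.lower()):
        raise ValueError(f"install script digest mismatch: got {actual}")
    return script


if __name__ == "__main__":
    reviewed = b"#!/bin/sh\necho installing dec-bench\n"
    pinned = sha256_hex(reviewed)  # recorded once, after manual review
    verify_installer(reviewed, pinned)  # passes silently
    try:
        verify_installer(reviewed + b"echo unexpected\n", pinned)
    except ValueError as err:
        print("refused:", err)
```

The check fails closed: any change to install.sh, benign or not, blocks execution until someone reviews the script again and updates the pinned digest.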
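For [EXTERNAL_DOWNLOADS], one way to keep the skill's network access within the scope this audit describes is to check every outbound URL against an explicit host allowlist. This is a hypothetical Python sketch: ALLOWED_HOSTS assumes decbench.ai is the only host the skill needs, and a real deployment would have to add any other host it legitimately contacts.

```python
from urllib.parse import urlsplit

# Assumption: decbench.ai (named in the audit) is the only required host.
ALLOWED_HOSTS = frozenset({"decbench.ai"})


def is_allowed_url(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Allow only HTTPS URLs whose host exactly matches an allowlisted host."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # .hostname strips any port and user info and is already lowercased.
    host = parts.hostname or ""
    return host in allowed_hosts
```

Hosts are matched exactly, so a lookalike such as decbench.ai.example.com is rejected; subdomains would have to be listed explicitly.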
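The [COMMAND_EXECUTION] finding lists a small, fixed set of commands. A policy layer in front of the agent could allow exactly those argument prefixes and reject everything else. The Python gate below is a hypothetical sketch, not a dec-bench feature; the prefixes come from the finding, and it deliberately rejects any command containing shell metacharacters so that an allowed prefix cannot be chained with another command.

```python
import shlex

# Prefixes taken from the audit finding; the gate itself is hypothetical.
ALLOWED_PREFIXES = (
    ("dec-bench", "create"),
    ("dec-bench", "validate"),
    ("dec-bench", "build"),
    ("dec-bench", "run"),
    ("dec-bench", "registry", "publish"),
    ("gh", "auth", "status"),
)

# Characters that let a shell chain, redirect, or substitute commands.
SHELL_METACHARS = frozenset(";|&`$<>\n")


def is_allowed_command(command: str) -> bool:
    """Return True only if command starts with an allowlisted argv prefix
    and contains no shell metacharacters (even quoted ones)."""
    if any(ch in SHELL_METACHARS for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. an unbalanced quote
        return False
    return any(tuple(argv[: len(prefix)]) == prefix for prefix in ALLOWED_PREFIXES)
```

An accepted command would then be executed as the list returned by shlex.split(command), for example with subprocess.run(argv, check=True), which does not invoke a shell.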
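The [PROMPT_INJECTION] sub-findings on boundary markers and sanitization suggest two minimal checks the skill could apply before handing files to the framework: wrap generated markdown prompt text in explicit delimiters, and refuse to write anywhere other than the ingestion points the audit names. This Python sketch assumes paths are relative to the scenario root and uses hypothetical marker strings. Delimiters help a consuming model tell data from instructions but do not by themselves prevent prompt injection, and they are not suitable for the .ts assertion files, which must remain valid TypeScript.

```python
from pathlib import PurePosixPath

# Hypothetical delimiters; the audit notes that no such markers exist today.
BEGIN = "<<<GENERATED_CONTENT>>>"
END = "<<<END_GENERATED_CONTENT>>>"


def wrap_generated(text: str) -> str:
    """Delimit generated markdown so a consumer can treat it as data.

    Rejects text that already contains a marker, since that could be used
    to close the block early and smuggle in instructions.
    """
    if BEGIN in text or END in text:
        raise ValueError("generated text contains a boundary marker")
    return f"{BEGIN}\n{text}\n{END}\n"


def is_expected_output_path(path: str) -> bool:
    """Accept only the ingestion points named in the audit:
    prompts/naive.md, prompts/savvy.md, and assertions/*.ts."""
    p = PurePosixPath(path)
    if p.is_absolute() or ".." in p.parts:
        return False
    if p.parts in (("prompts", "naive.md"), ("prompts", "savvy.md")):
        return True
    # One level deep only, mirroring the assertions/*.ts pattern.
    return len(p.parts) == 2 and p.parts[0] == "assertions" and p.suffix == ".ts"
```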
Audit Metadata