evaluate-presets
Warn
Audited by Gen Agent Trust Hub on Mar 10, 2026
Risk Level: MEDIUM (COMMAND_EXECUTION, PROMPT_INJECTION)
Full Analysis
- [COMMAND_EXECUTION]: The skill invokes local bash scripts (`./tools/evaluate-preset.sh` and `./tools/evaluate-all-presets.sh`) to perform its primary functions. While these are part of the skill's infrastructure, they allow for arbitrary command execution within the agent's environment.
- [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection because it processes external data which then influences downstream agent actions.
  - Ingestion points: The skill reads test task definitions from `tools/preset-test-tasks.yml` and execution logs from `.eval/logs/` (including `output.log` and `session.jsonl`).
  - Boundary markers: The instructions do not specify any delimiters or safety warnings to prevent the agent from obeying instructions embedded within the test tasks or logs during the triage phase.
  - Capability inventory: The agent has access to the `bash` tool for command execution and specialized sub-agents (`/code-task-generator`, `/code-assist`) for creating and applying code modifications.
  - Sanitization: There is no evidence of sanitization or validation of the content read from the logs or YAML files before it is used to guide the fix implementation process.
- [COMMAND_EXECUTION]: The 'Autonomous Fix Workflow' describes a process where code is generated by an agent and then executed via the evaluation scripts. This dynamic execution of generated content poses a risk if the generation phase is manipulated by malicious input in the logs or task descriptions.
Audit Metadata