evaluate-presets

Warn

Audited by Gen Agent Trust Hub on Mar 10, 2026

Risk Level: MEDIUMCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [COMMAND_EXECUTION]: The skill invokes local bash scripts (./tools/evaluate-preset.sh and ./tools/evaluate-all-presets.sh) to perform its primary functions. While these are part of the skill's infrastructure, they allow for arbitrary command execution within the agent's environment.
  • [PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection because it processes external data which then influences downstream agent actions.
  • Ingestion points: The skill reads test task definitions from tools/preset-test-tasks.yml and execution logs from .eval/logs/ (including output.log and session.jsonl).
  • Boundary markers: The instructions do not specify any delimiters or safety warnings to prevent the agent from obeying instructions embedded within the test tasks or logs during the triage phase.
  • Capability inventory: The agent has access to the bash tool for command execution and specialized sub-agents (/code-task-generator, /code-assist) for creating and applying code modifications.
  • Sanitization: There is no evidence of sanitization or validation of the content read from the logs or YAML files before it is used to guide the fix implementation process.
  • [COMMAND_EXECUTION]: The 'Autonomous Fix Workflow' describes a process where code is generated by an agent and then executed via the evaluation scripts. This dynamic execution of generated content poses a risk if the generation phase is manipulated by malicious input in the logs or task descriptions.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Mar 10, 2026, 09:18 AM