The Agent Skills Directory

[COMMAND_EXECUTION]: The skill invokes local bash scripts (./tools/evaluate-preset.sh and ./tools/evaluate-all-presets.sh) to perform its primary functions. While these are part of the skill's infrastructure, they allow for arbitrary command execution within the agent's environment.
[PROMPT_INJECTION]: The skill is vulnerable to indirect prompt injection because it processes external data which then influences downstream agent actions.
Ingestion points: The skill reads test task definitions from tools/preset-test-tasks.yml and execution logs from .eval/logs/ (including output.log and session.jsonl).
Boundary markers: The instructions do not specify any delimiters or safety warnings to prevent the agent from obeying instructions embedded within the test tasks or logs during the triage phase.
Capability inventory: The agent has access to the bash tool for command execution and specialized sub-agents (/code-task-generator, /code-assist) for creating and applying code modifications.
Sanitization: There is no evidence of sanitization or validation of the content read from the logs or YAML files before it is used to guide the fix implementation process.
[COMMAND_EXECUTION]: The 'Autonomous Fix Workflow' describes a process where code is generated by an agent and then executed via the evaluation scripts. This dynamic execution of generated content poses a risk if the generation phase is manipulated by malicious input in the logs or task descriptions.

evaluate-presets