eval
Warn
Audited by Gen Agent Trust Hub on Mar 17, 2026
Risk Level: MEDIUMCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
- [COMMAND_EXECUTION]: The skill is designed to run a user-defined evaluation command (
{eval_cmd}) within the worktree of an agent. This pattern allows for the execution of arbitrary shell commands if the configuration is sourced from an untrusted or improperly sanitized origin. - [PROMPT_INJECTION]: The skill implements an 'LLM Judge' mode that consumes and processes agent-generated data, creating a surface for indirect prompt injection.
- Ingestion points: The judge reads agent result posts located at
.agenthub/board/results/agent-{i}-result.mdand the output ofgit diffcommands. - Boundary markers: Absent. The description does not mention the use of delimiters or 'ignore' instructions to prevent the LLM from obeying commands embedded in agent results.
- Capability inventory: The skill executes local Python scripts (
result_ranker.py,session_manager.py) to rank results and update session states, which could be influenced by a successful injection. - Sanitization: Absent. There is no indication that agent-generated content is sanitized, escaped, or validated before being passed to the LLM judge.
Audit Metadata