agent-evaluation

Warn

Audited by Gen Agent Trust Hub on Mar 6, 2026

Risk Level: MEDIUMREMOTE_CODE_EXECUTIONCOMMAND_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [REMOTE_CODE_EXECUTION]: The skill provides code templates that utilize the exec() function to perform code-based grading of agent outputs.
  • Evidence: Found in SKILL.md under Example 1 (Simple Coding Agent Eval).
  • Risk: This pattern allows for the execution of arbitrary Python code. If the agent evaluates untrusted output without a strictly enforced sandbox, it can lead to code execution on the host environment.
  • [COMMAND_EXECUTION]: The skill recommends using subprocess.run() to execute terminal commands such as pytest for grading coding tasks.
  • Evidence: Found in SKILL.md in the grade_swe_bench and grade_coding_agent function templates.
  • Risk: If inputs such as repo_path or test_file are derived from untrusted sources without validation, it could lead to command injection or unauthorized filesystem access.
  • [PROMPT_INJECTION]: The skill is susceptible to indirect prompt injection due to its core functionality of ingesting and analyzing untrusted data from other agents.
  • Ingestion points: The skill processes external data via outcome["code"], qa_case["input"], and agent transcripts in SKILL.md.
  • Boundary markers: The provided grader templates do not include delimiters or instructions to ignore embedded commands in the data being processed.
  • Capability inventory: The skill utilizes Shell, Write, and Read tools, and includes instructions for command and code execution.
  • Sanitization: No explicit validation or sanitization of the ingested content is shown in the provided grading examples.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Mar 6, 2026, 07:03 AM