receiving-code-review

Fail

Audited by Gen Agent Trust Hub on Feb 16, 2026

Risk Level: HIGHPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
  • PROMPT_INJECTION (HIGH): The skill is highly susceptible to Indirect Prompt Injection. It is designed to ingest and act upon 'External Reviewer' feedback, which represents untrusted data entering the agent's context. 1. Ingestion points: Feedback from external reviewers via GitHub comments or text. 2. Boundary markers: No delimiters or ignore-instructions are specified. 3. Capability inventory: Significant write capabilities including codebase modification ('IMPLEMENT') and GitHub API operations ('gh api'). 4. Sanitization: No sanitization or validation of external feedback is mentioned. An attacker could provide a code review comment containing instructions to exfiltrate data or inject vulnerabilities, which the agent is directed to evaluate and potentially implement.
  • COMMAND_EXECUTION (MEDIUM): The skill directs the agent to perform GitHub API operations (gh api repos/{owner}/{repo}/pulls/{pr}/comments/{id}/replies). If the parameters for these commands are derived from untrusted metadata provided by an external reviewer, it could lead to unauthorized API interactions or command manipulation.
  • PROMPT_INJECTION (LOW): The skill uses strong behavioral steering (e.g., 'NEVER', 'FORBIDDEN') to strictly control agent responses and suppress default behaviors like expressions of gratitude. While intended to enforce technical rigor, this pattern of overriding model personality can be exploited to bypass conversational safety filters.
Recommendations
  • AI detected serious security threats
Audit Metadata
Risk Level
HIGH
Analyzed
Feb 16, 2026, 01:15 PM