interview
Fail
Audited by Gen Agent Trust Hub on Feb 15, 2026
Risk Level: HIGHPROMPT_INJECTION
Full Analysis
- Prompt Injection (HIGH): The skill uses the
$ARGUMENTSplaceholder directly within an<instructions>block. A malicious user or an upstream tool providing these arguments could include commands that bypass the intended logic or safety filters. - Indirect Prompt Injection (HIGH):
- Ingestion points: The skill ingests untrusted data from two primary sources: the
$ARGUMENTSvariable and the local codebase via theTasktool withsubagent_type=Explore. - Boundary markers: The skill uses minimal delimiters (
<instructions>) for user arguments, which are easily escapable. No boundary markers are defined for the content extracted from the codebase during Phase 1. - Capability inventory: The skill is granted the
Writetool (enabling filesystem modification) and theTasktool (enabling the spawning of further subagents). - Sanitization: No sanitization, escaping, or validation logic is present to filter malicious instructions embedded within the codebase files or user arguments.
- Evidence: Phase 1 performs a 'deep-dive on the codebase' and Phase 3 uses the
Writetool. If a file in the explored codebase contains a hidden instruction (e.g., in a comment), it could influence the agent to write a backdoor or exfiltrate data during the specification phase.
Recommendations
- AI detected serious security threats
Audit Metadata