paper-revision
Pass
Audited by Gen Agent Trust Hub on Feb 20, 2026
Risk Level: SAFEPROMPT_INJECTION
Full Analysis
- PROMPT_INJECTION (LOW): Detected surface for Indirect Prompt Injection (Category 8) where malicious instructions in reviewer comments can manipulate agent behavior.
- Evidence Chain:
- Ingestion points: The skill accepts untrusted external data via
$0(reviewer comments) as defined inSKILL.md. - Boundary markers: Absent. The prompts in
references/revision-prompts.mdinterpolate the{reviewer_comment}and{concern}variables directly into instructions without delimiters or 'ignore embedded instructions' warnings. - Capability inventory: The skill performs file-writing operations to
.texfiles and invokes high-capability downstream skills such asexperiment-code, which likely executes shell commands or Python scripts. - Sanitization: No sanitization, validation, or escaping of the input reviewer comments is present in the workflow.
- SAFE (SAFE): No evidence of hardcoded credentials, malicious obfuscation, persistence mechanisms, or direct unauthorized network exfiltration was found in the provided files.
Audit Metadata