paper-revision

Pass

Audited by Gen Agent Trust Hub on Feb 20, 2026

Risk Level: SAFEPROMPT_INJECTION
Full Analysis
  • PROMPT_INJECTION (LOW): Detected surface for Indirect Prompt Injection (Category 8) where malicious instructions in reviewer comments can manipulate agent behavior.
  • Evidence Chain:
  • Ingestion points: The skill accepts untrusted external data via $0 (reviewer comments) as defined in SKILL.md.
  • Boundary markers: Absent. The prompts in references/revision-prompts.md interpolate the {reviewer_comment} and {concern} variables directly into instructions without delimiters or 'ignore embedded instructions' warnings.
  • Capability inventory: The skill performs file-writing operations to .tex files and invokes high-capability downstream skills such as experiment-code, which likely executes shell commands or Python scripts.
  • Sanitization: No sanitization, validation, or escaping of the input reviewer comments is present in the workflow.
  • SAFE (SAFE): No evidence of hardcoded credentials, malicious obfuscation, persistence mechanisms, or direct unauthorized network exfiltration was found in the provided files.
Audit Metadata
Risk Level
SAFE
Analyzed
Feb 20, 2026, 05:23 AM