paper-revision

Pass

Audited by Gen Agent Trust Hub on Feb 22, 2026

Risk Level: SAFEPROMPT_INJECTIONCOMMAND_EXECUTION
Full Analysis
  • PROMPT_INJECTION (LOW): The skill is vulnerable to Indirect Prompt Injection because it ingests and acts upon untrusted reviewer feedback ($0).
  • Ingestion points: Reviewer comments input ($0) defined in SKILL.md and processed throughout the workflow.
  • Boundary markers: Absent. The prompts in references/revision-prompts.md (e.g., in the 'Targeted Edit Prompts' and 'For Missing Experiment Concerns' sections) interpolate the {reviewer_comment} directly into the agent's instructions without delimiters or clear separation between system instructions and untrusted data.
  • Capability inventory: The skill is capable of modifying local files (paper drafts) and invoking the experiment-code skill, which involves dynamic code generation and execution.
  • Sanitization: No sanitization or validation logic is present to filter malicious instructions embedded within reviewer feedback.
  • COMMAND_EXECUTION (LOW): The skill workflow explicitly directs the agent to generate and run code using the experiment-code skill when "additional experiments" are requested by reviewers. While this is part of the intended autonomous research use case, it creates a risk if the reviewer feedback contains malicious instructions designed to exploit the code execution environment.
  • DATA_EXPOSURE (SAFE): The skill accesses local paper drafts ($1) and skill-specific reference files. No evidence of hardcoded credentials, sensitive system file access, or unauthorized network exfiltration was detected.
Audit Metadata
Risk Level
SAFE
Analyzed
Feb 22, 2026, 05:00 AM