paper2code

Fail

Audited by Gen Agent Trust Hub on Feb 15, 2026

Risk Level: CRITICAL (REMOTE_CODE_EXECUTION, COMMAND_EXECUTION, PROMPT_INJECTION, EXTERNAL_DOWNLOADS)
Full Analysis
  • [PROMPT_INJECTION] (HIGH): The file 04_implementation_guide.md contains instructions that tell the agent to bypass human-in-the-loop oversight. Specifically, it states, in Korean, "Do not ask permission between files" (파일 간 허락을 구하지 말 것) and "Continue implementing until finished" (완료될 때까지 계속 구현). This is a direct attempt to override standard AI safety guardrails for tool usage.
  • [REMOTE_CODE_EXECUTION] (HIGH): The skill's primary function is to generate and execute code based on external, untrusted research papers. Automated scans identified an RCE pattern where a file is downloaded from a remote source (arXiv) and subsequently staged for execution via the agent's bash and python tools.
  • [COMMAND_EXECUTION] (HIGH): The skill relies on direct shell command execution for environment setup and file management, autonomously running commands such as uv init, uv add, and cat. Phase 4 mandates using these tools to 'directly replicate' papers without user intervention.
  • [EXTERNAL_DOWNLOADS] (MEDIUM): Phase 0 (05_reference_search.md) directs the agent to search for and evaluate third-party code repositories on GitHub. While it includes basic advice on checking licenses, it encourages the ingestion and potential execution of unverified external code.
  • [INDIRECT_PROMPT_INJECTION] (HIGH): The skill is highly vulnerable to indirect prompt injection because it consumes untrusted data (papers) and possesses high-privilege write/execute capabilities. It lacks boundary markers and sanitization, and requires the agent to 'accurately copy' technical details, which could include malicious payloads. (Ingestion: Research papers; Boundaries: Missing; Capabilities: Shell/Python execution; Sanitization: None).
Recommendations
  • CRITICAL: Downloads and executes remote code from untrusted source(s): https://arxiv.org/pdf/xxxx.xxxxx.pdf - DO NOT USE
  • AI detected serious security threats
Audit Metadata
Risk Level: CRITICAL
Analyzed: Feb 15, 2026, 04:09 PM