auto-benchmark
Pass
Audited by Gen Agent Trust Hub on Mar 18, 2026
Risk Level: SAFEPROMPT_INJECTIONEXTERNAL_DOWNLOADSCOMMAND_EXECUTIONREMOTE_CODE_EXECUTION
Full Analysis
- [PROMPT_INJECTION]: The skill is susceptible to Indirect Prompt Injection (Category 8).\n
- Ingestion points: Competitive leaderboard URLs (Phase 2.1), arXiv queries, and competitor research blogs (Phase 3.1).\n
- Boundary markers: None. The skill lacks instructions to sanitize or ignore instructions embedded within the ingested research content.\n
- Capability inventory: The skill manages an automated 'Runner' (Phase 5.2) that executes code and evaluation suites based on findings.\n
- Sanitization: None. Techniques extracted from external papers are queued for implementation and execution without an explicit validation step.\n- [EXTERNAL_DOWNLOADS]: The skill performs automated scraping of leaderboard data and research content from external domains.\n- [COMMAND_EXECUTION]: The workflow involves running training and evaluation processes via the command line (Phase 5.2).\n- [REMOTE_CODE_EXECUTION]: Ingesting and acting upon 'techniques' from untrusted sources (Phase 3.2 to 5.2) creates a path for executing malicious logic if an attacker controls the source content (e.g., a poisoned research paper).
Audit Metadata