archive-crawler

Pass

Audited by Gen Agent Trust Hub on May 4, 2026

Risk Level: SAFEDATA_EXFILTRATIONCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
  • [DATA_EXFILTRATION]: The skill is designed to access and read sensitive personal data, including journals, emails (.mbox, .pst), and letters. Although the skill implements a mandatory allow-list safety gate in gbrain.yml and explicitly skips system/config files, the ingestion of personal archives constitutes a significant data exposure surface.
  • [COMMAND_EXECUTION]: The skill invokes external CLI tools (antiword, catdoc, readpst) to process legacy document and email formats. Executing these parsers on untrusted or potentially malformed archive content could expose the system to vulnerabilities within those tools.
  • [REMOTE_CODE_EXECUTION]: The skill executes inline Python scripts (python3 -c) to parse .docx files and validate .pst archives. This involves dynamic execution of code logic on data provided by external sources.
  • [PROMPT_INJECTION]: The skill has an attack surface for indirect prompt injection as it processes untrusted content from archives and emails that could contain hidden instructions for the agent.
  • Ingestion points: Content extracted from crawled files, including HTML, emails, and documents (SKILL.md).
  • Boundary markers: The skill lacks explicit delimiters or instructions to the agent to ignore instructions embedded within the processed archive data.
  • Capability inventory: The skill possesses the ability to execute shell commands, run Python code, and write new content to the agent's persistent storage (writes_pages: true).
  • Sanitization: Sanitization is limited to stripping HTML tags for display, which does not mitigate the risk of natural language prompt injection.
Audit Metadata
Risk Level
SAFE
Analyzed
May 4, 2026, 07:01 AM