archive-crawler
Pass
Audited by Gen Agent Trust Hub on May 4, 2026
Risk Level: SAFEDATA_EXFILTRATIONCOMMAND_EXECUTIONREMOTE_CODE_EXECUTIONPROMPT_INJECTION
Full Analysis
- [DATA_EXFILTRATION]: The skill is designed to access and read sensitive personal data, including journals, emails (.mbox, .pst), and letters. Although the skill implements a mandatory allow-list safety gate in
gbrain.ymland explicitly skips system/config files, the ingestion of personal archives constitutes a significant data exposure surface. - [COMMAND_EXECUTION]: The skill invokes external CLI tools (
antiword,catdoc,readpst) to process legacy document and email formats. Executing these parsers on untrusted or potentially malformed archive content could expose the system to vulnerabilities within those tools. - [REMOTE_CODE_EXECUTION]: The skill executes inline Python scripts (
python3 -c) to parse.docxfiles and validate.pstarchives. This involves dynamic execution of code logic on data provided by external sources. - [PROMPT_INJECTION]: The skill has an attack surface for indirect prompt injection as it processes untrusted content from archives and emails that could contain hidden instructions for the agent.
- Ingestion points: Content extracted from crawled files, including HTML, emails, and documents (SKILL.md).
- Boundary markers: The skill lacks explicit delimiters or instructions to the agent to ignore instructions embedded within the processed archive data.
- Capability inventory: The skill possesses the ability to execute shell commands, run Python code, and write new content to the agent's persistent storage (
writes_pages: true). - Sanitization: Sanitization is limited to stripping HTML tags for display, which does not mitigate the risk of natural language prompt injection.
Audit Metadata