mapping-documents

Pass

Audited by Gen Agent Trust Hub on May 7, 2026

Risk Level: SAFE
Full Analysis
  • [SAFE]: The skill uses well-established libraries like pdfplumber and the official anthropic SDK, which are standard for document processing and AI integration tasks.\n- [EXTERNAL_DOWNLOADS]: The skill connects to the Anthropic API for its primary functionality of semantic extraction. Anthropic is a well-known service provider, and this communication is necessary and intended for the skill's stated purpose.\n- [DATA_EXFILTRATION]: The skill handles API keys according to security best practices, allowing keys to be passed via environment variables or CLI flags rather than being hardcoded in the scripts.\n- [INDIRECT_PROMPT_INJECTION]: As the skill ingests and processes content from untrusted PDF files, it possesses an indirect prompt injection surface. A malicious PDF could attempt to influence the agent by embedding instructions in the extracted text. The skill implements several mitigation strategies:\n
  • Ingestion points: scripts/docmap.py reads PDF content using pdfplumber.open.\n
  • Boundary markers: The semantic extraction prompts use structured labels (SECTION, PAGES, TEXT) and enforce JSON response schemas to maintain separation between data and instructions.\n
  • Capability inventory: The skill is restricted to file operations (writing maps and indexes) and network communication with the Anthropic API; it does not execute code extracted from the documents.\n
  • Sanitization: The _normalize_symbol function uses Unicode NFKC normalization to ensure that extracted symbols are consistent and to prevent homoglyph substitution attacks.
Audit Metadata
Risk Level
SAFE
Analyzed
May 7, 2026, 04:36 AM