extracting-pdf-text
Pass
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: SAFEPROMPT_INJECTION
Full Analysis
- Indirect Prompt Injection (LOW): The skill converts untrusted PDF content into text for LLMs, which could contain malicious instructions designed to hijack the agent. 1. Ingestion points: Input PDF files (local or URL) processed by scripts in the scripts/ directory. 2. Boundary markers: Absent; extracted text is returned as raw strings without delimiters or instructions to the LLM to ignore embedded commands. 3. Capability inventory: Scripts can write files to the local system and send data to the Mistral OCR API. 4. Sanitization: Absent; the tools extract and return exactly what is found in the document.
- Data Exposure & Exfiltration (LOW): The extract_mistral_ocr.py script transmits PDF data to api.mistral.ai. This is the documented primary purpose of the script and requires a user-provided API key, making it a legitimate network operation rather than a malicious leak.
Audit Metadata