pdf-text-extractor
Pass
Audited by Gen Agent Trust Hub on Mar 13, 2026
Risk Level: SAFE
Flagged categories: EXTERNAL_DOWNLOADS, DATA_EXFILTRATION, PROMPT_INJECTION
Full Analysis
- [EXTERNAL_DOWNLOADS]: The skill fetches PDF files from remote servers using URLs listed in the papers/core_set.csv input file. Although it handles well-known sources such as arXiv, it accepts arbitrary URLs, which makes it an external data ingestion point.
- [DATA_EXFILTRATION]: The skill performs network GET requests via the urllib.request library to retrieve content from remote providers.
- [PROMPT_INJECTION]: The skill extracts text from untrusted PDF files, creating a surface for indirect prompt injection.
  - Ingestion points: Remote PDF files and papers/core_set.csv (scripts/run.py).
  - Boundary markers: None are implemented to delimit extracted text from agent instructions in the output files.
  - Capability inventory: Filesystem write access (papers/fulltext/) and network retrieval (scripts/run.py).
  - Sanitization: Only basic whitespace normalization is applied to the extracted text; no content filtering or instruction-detection logic is performed.
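The download path flagged under EXTERNAL_DOWNLOADS and DATA_EXFILTRATION can be sketched roughly as follows. This is not the code in scripts/run.py: it assumes the CSV has a column named url, and the host allowlist, timeout, and size cap are hypothetical additions showing one way the arbitrary-URL ingestion point could be narrowed.

```python
import csv
import io
import urllib.parse
import urllib.request

# Hypothetical allowlist: the audited skill accepts arbitrary URLs.
ALLOWED_HOSTS = {"arxiv.org", "export.arxiv.org"}


def read_pdf_urls(csv_text: str, column: str = "url") -> list[str]:
    """Return non-empty values from one CSV column (the column name is assumed)."""
    urls = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = (row.get(column) or "").strip()
        if url:
            urls.append(url)
    return urls


def is_allowed(url: str, allowed_hosts: set[str] = ALLOWED_HOSTS) -> bool:
    """Accept only HTTPS URLs whose exact hostname is on the allowlist."""
    parts = urllib.parse.urlsplit(url)
    return parts.scheme == "https" and parts.hostname in allowed_hosts


def fetch_pdf(url: str, timeout: float = 30.0, max_bytes: int = 50_000_000) -> bytes:
    """GET a PDF with urllib.request, enforcing the allowlist and a size cap."""
    if not is_allowed(url):
        raise ValueError(f"refusing non-allowlisted URL: {url}")
    req = urllib.request.Request(url, headers={"User-Agent": "pdf-fetch-sketch"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # urlopen follows redirects; reject if the final URL left the allowlist.
        if not is_allowed(resp.url):
            raise ValueError(f"redirected to non-allowlisted URL: {resp.url}")
        data = resp.read(max_bytes + 1)
    if len(data) > max_bytes:
        raise ValueError("response exceeds max_bytes")
    return data
```

Because urlopen follows redirects by default, the request to a redirect target has already been sent by the time resp.url is checked; a stricter variant would refuse redirects outright with a custom redirect handler.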
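The missing boundary markers noted under PROMPT_INJECTION could take a form like the sketch below. The marker strings and the wrap_untrusted helper are hypothetical, and normalize_whitespace is only a guess at what "basic whitespace normalization" means in this skill. Markers label extracted text as untrusted data; they do not remove instructions from it.

```python
import re

# Hypothetical marker strings; the audited skill writes no markers at all.
BEGIN = "<<<UNTRUSTED_PDF_TEXT>>>"
END = "<<<END_UNTRUSTED_PDF_TEXT>>>"


def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces/tabs and runs of 3+ newlines, then trim."""
    text = re.sub(r"[ \t]+", " ", text)
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


def wrap_untrusted(text: str, source: str) -> str:
    """Delimit extracted PDF text so downstream agents can treat it as data."""
    body = text
    # Remove forged markers repeatedly: a single pass could reassemble one
    # from nested fragments. Each pass shortens the string, so it terminates.
    while BEGIN in body or END in body:
        body = body.replace(BEGIN, "").replace(END, "")
    body = normalize_whitespace(body)
    return f"{BEGIN} source={source}\n{body}\n{END}\n"
```

For example, wrap_untrusted("Hello \t world", "paper.pdf") yields the text "Hello world" between one BEGIN line and one END line, and any marker strings embedded in the PDF text are dropped so the document cannot close the block early.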
Audit Metadata