pdf-text-extractor

Pass

Audited by Gen Agent Trust Hub on Mar 13, 2026

Risk Level: SAFE (EXTERNAL_DOWNLOADS, DATA_EXFILTRATION, PROMPT_INJECTION)
Full Analysis
  • [EXTERNAL_DOWNLOADS]: The skill fetches PDF files from remote servers using URLs found in the papers/core_set.csv input file. While it handles well-known sources like ArXiv, it allows arbitrary URLs, representing an external data ingestion point.
  • [DATA_EXFILTRATION]: The skill performs network GET requests via the urllib.request library to retrieve external content from remote providers.
  • [PROMPT_INJECTION]: The skill extracts text from untrusted PDF files, creating a surface for indirect prompt injection.
      • Ingestion points: Remote PDF files and papers/core_set.csv (scripts/run.py).
      • Boundary markers: None implemented to delimit extracted text from agent instructions in the output files.
      • Capability inventory: Filesystem write access (papers/fulltext/) and network retrieval capabilities (scripts/run.py).
      • Sanitization: Only basic whitespace normalization is applied to the extracted text; no content filtering or instruction-detection logic is performed.
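
The findings above name two gaps: arbitrary URLs are accepted, and extracted text is written without boundary markers. The following is a minimal sketch of how such gaps might be narrowed, not the skill's actual code. The host allowlist, the marker strings, and the function names `is_allowed_url` and `wrap_untrusted` are all hypothetical and do not come from scripts/run.py.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: the audited skill accepts arbitrary URLs.
ALLOWED_HOSTS = {"arxiv.org", "export.arxiv.org"}

# Hypothetical boundary markers: the audit reports none are implemented.
BEGIN_MARK = "<<<BEGIN_UNTRUSTED_PDF_TEXT>>>"
END_MARK = "<<<END_UNTRUSTED_PDF_TEXT>>>"


def is_allowed_url(url: str) -> bool:
    """Return True only for https URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    # parsed.hostname is lowercased, or None when the URL has no host.
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS


def wrap_untrusted(text: str) -> str:
    """Delimit extracted text so a downstream agent can treat it as data."""
    # Replace marker strings already present in the text with a placeholder
    # so that a crafted PDF cannot close the delimited block early.
    cleaned = text.replace(BEGIN_MARK, "[marker removed]")
    cleaned = cleaned.replace(END_MARK, "[marker removed]")
    return f"{BEGIN_MARK}\n{cleaned}\n{END_MARK}"
```

A caller would check `is_allowed_url(url)` before fetching and pass the extracted text through `wrap_untrusted` before writing it under papers/fulltext/. Delimiters alone do not detect injected instructions. They only give the consuming agent a clear boundary to respect.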
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 13, 2026, 01:53 AM