The Agent Skills Directory

[EXTERNAL_DOWNLOADS]: The skill fetches PDF files from remote servers using URLs read from the papers/core_set.csv input file. It handles well-known sources such as arXiv, but it also accepts arbitrary URLs, which makes the CSV an external data ingestion point.
[DATA_EXFILTRATION]: Performs network GET requests via the urllib.request library to retrieve external content from remote providers.
[PROMPT_INJECTION]: Extracts text from untrusted PDF files, creating a surface for indirect prompt injection.
Ingestion points: Remote PDF files and papers/core_set.csv (scripts/run.py).
Boundary markers: None implemented to delimit extracted text from agent instructions in the output files.
Capability inventory: Filesystem write access (papers/fulltext/) and network retrieval capabilities (scripts/run.py).
Sanitization: Only basic whitespace normalization is applied to the extracted text; no content filtering or instruction-detection logic is performed.
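
The gaps noted above (arbitrary URLs, no boundary markers, no instruction detection) could be narrowed along the following lines. This is a minimal sketch, not code from the skill: the names ALLOWED_HOSTS, BEGIN_MARK, END_MARK, SUSPICIOUS, is_allowed_url, wrap_untrusted and flag_instructions are all hypothetical, the host list is an assumption, and the phrase patterns are illustrative only and easy to evade.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist (assumption); the audited skill accepts arbitrary URLs.
ALLOWED_HOSTS = {"arxiv.org", "export.arxiv.org"}

# Hypothetical marker strings (assumption); the audited skill emits none.
BEGIN_MARK = "<<<BEGIN_UNTRUSTED_PDF_TEXT>>>"
END_MARK = "<<<END_UNTRUSTED_PDF_TEXT>>>"

# Illustrative patterns only; a short phrase list will miss many injections.
SUSPICIOUS = re.compile(
    r"ignore (all |any )?(previous|prior|above) instructions"
    r"|disregard the above"
    r"|you are now",
    re.IGNORECASE,
)


def is_allowed_url(url: str) -> bool:
    """Accept only https URLs whose exact host is on the allowlist."""
    parts = urlparse(url)
    return parts.scheme == "https" and parts.hostname in ALLOWED_HOSTS


def wrap_untrusted(text: str) -> str:
    """Delimit extracted text so a consumer can tell it apart from instructions."""
    cleaned = text
    # Strip marker strings repeatedly, so that nested fragments cannot
    # reassemble into a marker and close the block early.
    while BEGIN_MARK in cleaned or END_MARK in cleaned:
        cleaned = cleaned.replace(BEGIN_MARK, "").replace(END_MARK, "")
    return f"{BEGIN_MARK}\n{cleaned}\n{END_MARK}"


def flag_instructions(text: str) -> list[str]:
    """Return the instruction-like phrases found in the text, in order."""
    return [match.group(0) for match in SUSPICIOUS.finditer(text)]
```

In such a setup, is_allowed_url would be checked before any download, and wrap_untrusted would be applied to the extracted text before it is written under papers/fulltext/. A non-empty result from flag_instructions is best treated as a signal for review rather than as proof of an attack.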

pdf-text-extractor