pdf-extractor
Pass
Audited by Gen Agent Trust Hub on Mar 10, 2026
Risk Level: SAFE | EXTERNAL_DOWNLOADS | COMMAND_EXECUTION | PROMPT_INJECTION
Full Analysis
- [EXTERNAL_DOWNLOADS]: Fetches pre-trained machine learning models for the docling and marker backends from repositories hosted by IBM and datalab-to upon first use.
- [COMMAND_EXECUTION]: Provides a command-line interface, extract-pdfs, and uses system tools such as pdftotext for specific extraction tasks.
- [PROMPT_INJECTION]: The skill processes PDF documents, which creates an attack surface for indirect prompt injection. If an agent extracts content from a document containing hidden instructions, it may attempt to follow them.
  - Ingestion points: Reads local PDF files provided by the user or external sources (extractors.py).
  - Boundary markers: Absent; the skill does not wrap extracted text in delimiters to segregate it from agent instructions.
  - Capability inventory: The skill has filesystem write access in extractors.py and can execute subprocesses through backends.py.
  - Sanitization: Absent; extracted text is not validated or sanitized before being returned to the agent context.
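The two absent controls noted above (boundary markers and sanitization) could be approximated along the following lines. This is a minimal sketch, not code from the audited skill: the function names `sanitize_extracted_text` and `wrap_untrusted`, the marker format, and the set of characters stripped are all assumptions chosen for illustration.

```python
import re
import secrets

# Hypothetical helpers for illustration; none of these names exist in the skill.
# Matches ASCII control characters (except tab, newline, carriage return) plus
# zero-width and bidirectional-override characters that can hide text.
_HIDDEN_CHARS = re.compile(
    r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f"
    r"\u200b-\u200f\u202a-\u202e\u2060-\u2064\ufeff]"
)


def sanitize_extracted_text(text: str) -> str:
    """Remove control and invisible characters from extracted PDF text."""
    return _HIDDEN_CHARS.sub("", text)


def wrap_untrusted(text: str, source: str) -> str:
    """Sanitize text and enclose it in delimiters carrying a random token.

    The per-call token means a document cannot forge the closing marker,
    because it cannot know the token in advance.
    """
    token = secrets.token_hex(8)
    cleaned = sanitize_extracted_text(text)
    return (
        f"<<<UNTRUSTED_PDF_CONTENT id={token} source={source!r}>>>\n"
        f"{cleaned}\n"
        f"<<<END_UNTRUSTED_PDF_CONTENT id={token}>>>"
    )
```

Delimiters of this kind only help if the agent's instructions also state that anything between the markers is data rather than instructions. Stripping invisible characters reduces one way of hiding text, but it does not detect injected instructions written in plain visible text.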
Audit Metadata