doc-to-vector-dataset-generator
Pass
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: SAFE
Full Analysis
- [Data Exposure & Exfiltration] (SAFE): The skill performs file reading and writing operations on local paths provided to the processing function. These operations are essential to its function and do not target sensitive system directories or involve external network communication.
- [Indirect Prompt Injection] (SAFE): The skill acts as a data processing pipeline for untrusted external documents.
  - Ingestion points: Documents are read from the file system via `extract_markdown` and `extract_pdf`.
  - Boundary markers: The output JSONL format provides logical separation, but no specific prompt delimiters or instructions are added to the extracted text chunks.
  - Capability inventory: File system access (read/write).
  - Sanitization: Basic text cleaning and quality checks (length and alpha-character ratio) are performed, though they do not specifically target adversarial prompt injection content.
- [Unverifiable Dependencies & Remote Code Execution] (SAFE): The skill utilizes standard packages like `pymupdf` and `scikit-learn`. It does not perform dynamic package installation or execute remote scripts.
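To illustrate the kind of sanitization described in this analysis, here is a minimal sketch of a length and alpha-character-ratio check followed by JSONL serialization. It is not the skill's actual code: the function names (`clean_text`, `passes_quality`, `chunks_to_jsonl`), the thresholds, and the `"text"` field name are all assumptions chosen for illustration. Consistent with the finding above, a filter like this rejects short or mostly non-alphabetic chunks but does nothing to detect adversarial instructions embedded in otherwise normal prose.

```python
import json
import re

# Hypothetical thresholds; the audit does not state the skill's real values.
MIN_CHARS = 50
MIN_ALPHA_RATIO = 0.6


def clean_text(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def passes_quality(text: str, min_chars: int = MIN_CHARS,
                   min_alpha_ratio: float = MIN_ALPHA_RATIO) -> bool:
    """Return True if the chunk is long enough and mostly alphabetic."""
    if not text or len(text) < min_chars:
        return False
    alpha_count = sum(ch.isalpha() for ch in text)
    return alpha_count / len(text) >= min_alpha_ratio


def chunks_to_jsonl(chunks: list[str]) -> str:
    """Serialize chunks that pass the checks, one JSON object per line.

    The "text" key is an assumed field name, not the skill's schema.
    """
    lines = []
    for chunk in chunks:
        cleaned = clean_text(chunk)
        if passes_quality(cleaned):
            lines.append(json.dumps({"text": cleaned}))
    return "\n".join(lines)
```

Each JSONL record keeps chunks logically separate, but the text inside a record is passed through unchanged, so any downstream consumer that feeds these chunks to a model would still need its own delimiters or injection handling.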
Audit Metadata