doc-to-vector-dataset-generator

Pass

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: SAFE
Full Analysis
  • [Data Exposure & Exfiltration] (SAFE): The skill performs file reading and writing operations on local paths provided to the processing function. These operations are essential to its function and do not target sensitive system directories or involve external network communication.
  • [Indirect Prompt Injection] (SAFE): The skill acts as a data processing pipeline for untrusted external documents.
      • Ingestion points: Documents are read from the file system via extract_markdown and extract_pdf.
      • Boundary markers: The output JSONL format provides logical separation, but no specific prompt delimiters or instructions are added to the extracted text chunks.
      • Capability inventory: File system access (read/write).
      • Sanitization: Basic text cleaning and quality checks (length and alpha-character ratio) are performed, though they do not specifically target adversarial prompt injection content.
  • [Unverifiable Dependencies & Remote Code Execution] (SAFE): The skill utilizes standard packages like pymupdf and scikit-learn. It does not perform dynamic package installation or execute remote scripts.
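The sanitization and boundary-marker findings above can be illustrated with a minimal sketch. It is not the skill's actual code: the thresholds, the function names passes_quality and to_jsonl_record, and the <untrusted_document> delimiter are all hypothetical, chosen only to show what a length and alpha-character ratio check looks like and how a chunk could be wrapped in explicit delimiters before being written as JSONL.

```python
import json

# Hypothetical thresholds; the audited skill's real values are not stated.
MIN_CHARS = 50
MIN_ALPHA_RATIO = 0.6


def passes_quality(chunk: str) -> bool:
    """Return True if the chunk is long enough and mostly alphabetic."""
    text = chunk.strip()
    if len(text) < MIN_CHARS:
        return False
    alpha = sum(ch.isalpha() for ch in text)
    return alpha / len(text) >= MIN_ALPHA_RATIO


def to_jsonl_record(chunk: str, source: str) -> str:
    """Serialize one chunk as a JSONL line with explicit boundary markers.

    The <untrusted_document> tag is an illustrative delimiter, not
    something the audited skill emits.
    """
    wrapped = f"<untrusted_document>\n{chunk.strip()}\n</untrusted_document>"
    return json.dumps({"source": source, "text": wrapped}, ensure_ascii=False)
```

A check of this kind filters out noise such as page numbers or table debris, but it passes any well-formed sentence, including an instruction planted in a document. Delimiters make the boundary of untrusted text explicit to a downstream consumer; they reduce the risk of injected instructions being followed but do not remove it.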
Audit Metadata
Risk Level: SAFE
Analyzed: Feb 17, 2026, 05:57 PM