nemo-curator
Pass
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: SAFE
Full Analysis
- [Prompt Injection] (SAFE): No instructions to override behavior or bypass safety filters were found. The skill is purely technical documentation.
- [Data Exposure & Exfiltration] (SAFE): No hardcoded secrets or access to sensitive local paths. It follows standard data processing workflows for reading and writing Parquet files.
- [Obfuscation] (SAFE): No hidden or encoded malicious content detected.
- [Unverifiable Dependencies & Remote Code Execution] (SAFE): Uses reputable libraries (nemo-curator, dask, rapids) and downloads models from established sources (Hugging Face, NVIDIA).
- [Indirect Prompt Injection] (LOW): As a data curation tool, it processes untrusted web data. While this is an ingestion surface, the skill is specifically built to mitigate these risks through sanitization modules.
  - Ingestion points: S3 and local files processed via DocumentDataset.read_parquet.
  - Boundary markers: Implicitly handled by the curation logic.
  - Capability inventory: File system writing and distributed execution capabilities.
  - Sanitization: Includes PII redaction, NSFW classifiers, and quality filters to clean external data.
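
To make the sanitization finding concrete, the following is a minimal standard-library sketch of what a redact-then-filter pass over untrusted text can look like. It is not nemo-curator's implementation and does not use its API: the names `EMAIL_RE`, `redact_pii`, `passes_quality`, and `sanitize`, the email-only pattern, and the 5-word threshold are all assumptions chosen for illustration.

```python
import re

# Hypothetical pattern for illustration only; real PII detection covers far
# more than email addresses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_pii(text: str) -> str:
    """Replace email-like substrings with a fixed placeholder."""
    return EMAIL_RE.sub("<EMAIL>", text)


def passes_quality(text: str, min_words: int = 5) -> bool:
    """Toy quality filter: keep documents with at least min_words words."""
    return len(text.split()) >= min_words


def sanitize(docs: list[str], min_words: int = 5) -> list[str]:
    """Redact PII first, then drop documents that fail the quality filter."""
    redacted = (redact_pii(d) for d in docs)
    return [d for d in redacted if passes_quality(d, min_words)]
```

Redaction runs before filtering in this sketch so that any document that is kept has already been cleaned; a production pipeline would likely add classifiers and more robust detectors at the same stage.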
Audit Metadata