nemo-curator
Warn
Audited by Snyk on Feb 15, 2026
Risk Level: MEDIUM
Full Analysis
MEDIUM W011: Third-party content exposure detected (indirect prompt injection risk).
- Third-party content exposure detected (high risk: 1.00). Yes — the skill explicitly ingests open web-scraped, user-generated corpora (e.g., "Use NeMo Curator when: Preparing LLM training data from web scrapes (Common Crawl)" and code like DocumentDataset.read_parquet("common_crawl/*.parquet"), plus references to RedPajama v2/The Pile), so it reads untrusted third-party web content as part of its workflow.
Audit Metadata