The Agent Skills Directory

[EXTERNAL_DOWNLOADS] (LOW): The skill downloads and installs Python packages (nemo-curator, cudf, dask, rapids) and pre-trained models from Hugging Face and NVIDIA's registries. While Hugging Face is a trusted source, the use of external code and models at runtime represents a known security surface.
[PROMPT_INJECTION] (LOW): The skill has an indirect prompt injection surface (Category 8) because it is designed to ingest and process large, untrusted datasets from external sources. • Ingestion points: The skill reads data from 'common_crawl/.parquet' and 's3://large_dataset/.parquet' as described in SKILL.md. • Boundary markers: No specific delimiters or markers are implemented to isolate untrusted data during processing. • Capability inventory: The skill performs data filtering, deduplication, and PII redaction; no high-risk capabilities like shell command execution or system file modification were found. • Sanitization: The skill includes features like PIIRedactor and NSFWClassifier to help filter sensitive or inappropriate content, which mitigates some ingestion risks.

nemo-curator