nanogpt
Pass
Audited by Gen Agent Trust Hub on Mar 28, 2026
Risk Level: SAFE | EXTERNAL_DOWNLOADS | PROMPT_INJECTION
Full Analysis
- [EXTERNAL_DOWNLOADS]: The skill fetches training datasets from external repositories and well-known services. It downloads the TinyShakespeare dataset from a public GitHub repository and uses the HuggingFace datasets library to access OpenWebText and Wikipedia.
- [PROMPT_INJECTION]: The skill presents a surface for indirect prompt injection because it is designed to ingest and process untrusted text data for model training.
- Ingestion points: The scripts in references/data.md read raw text from downloaded files and remote datasets.
- Boundary markers: Not present; the skill processes text directly into tokenized binary files.
- Capability inventory: Includes local file system writes for datasets and checkpoints, network operations via requests and HuggingFace, and model training execution.
- Sanitization: No sanitization is performed on the training corpus, because the primary intent is tokenization of raw text.
- [SAFE]: No malicious activities such as credential exfiltration, privilege escalation, or persistent access were identified. The included Python code uses standard machine learning libraries and follows established educational workflows.
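The [EXTERNAL_DOWNLOADS] finding describes a download step for TinyShakespeare. Here is a minimal sketch of such a fetch-once step. It assumes the TinyShakespeare URL used by upstream nanoGPT, although the skill's references/data.md may point elsewhere. `fetch_text` and `TINY_SHAKESPEARE_URL` are hypothetical names. The audit reports that the skill uses requests. This sketch uses the standard library's urllib.request instead, so it has no third-party dependency.

```python
import os
import urllib.request

# Assumption: the TinyShakespeare copy used by upstream nanoGPT's
# shakespeare_char prepare script; the skill's own URL may differ.
TINY_SHAKESPEARE_URL = (
    "https://raw.githubusercontent.com/karpathy/char-rnn/master/"
    "data/tinyshakespeare/input.txt"
)


def fetch_text(path: str, url: str = TINY_SHAKESPEARE_URL) -> str:
    """Download `url` to `path` once, then return the file's text.

    If `path` already exists the download is skipped, so only the first
    run performs a network request.
    """
    if not os.path.exists(path):
        # External download: the network operation the audit flags.
        with urllib.request.urlopen(url, timeout=30) as resp:
            data = resp.read()
        with open(path, "wb") as f:
            f.write(data)
    # The text is returned as-is: nothing here validates its content.
    with open(path, "r", encoding="utf-8") as f:
        return f.read()
```

Because the file is cached on a local path, only the first run touches the network. The OpenWebText and Wikipedia corpora would instead be fetched through the HuggingFace datasets library, which has its own download and cache mechanism.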
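The [PROMPT_INJECTION] finding describes an ingestion path. Here is a minimal sketch of that path. It assumes a character-level tokenizer in the style of upstream nanoGPT's shakespeare_char preparation. The skill may instead use GPT-2 BPE, for example via tiktoken, for OpenWebText. `build_vocab`, `encode_to_bin`, and `decode_bin` are hypothetical names. The sketch is useful mainly for what it lacks: there is no sanitization step and no boundary marker between documents. Instruction-like text in the corpus therefore reaches the tokenized binary file unchanged.

```python
from array import array


def build_vocab(text: str) -> dict[str, int]:
    # Character-level vocabulary: each distinct character gets an integer id.
    return {ch: i for i, ch in enumerate(sorted(set(text)))}


def encode_to_bin(text: str, path: str) -> dict[str, int]:
    """Tokenize raw text character by character and write the ids to `path`.

    No filtering, sanitization, or boundary markers are applied: the raw
    corpus is written to the training file exactly as it was read.
    """
    stoi = build_vocab(text)
    if len(stoi) > 65535:
        raise ValueError("vocabulary too large for 16-bit token ids")
    # 'H' is an unsigned short, i.e. a 16-bit id on common platforms.
    ids = array("H", (stoi[ch] for ch in text))
    with open(path, "wb") as f:
        ids.tofile(f)
    return stoi


def decode_bin(path: str, stoi: dict[str, int]) -> str:
    # Invert the vocabulary and map the stored ids back to characters.
    itos = {i: ch for ch, i in stoi.items()}
    ids = array("H")
    with open(path, "rb") as f:
        ids.frombytes(f.read())
    return "".join(itos[i] for i in ids)
```

Decoding the file reproduces the corpus exactly. This is the audit's point: whatever text the downloaded data contains is preserved verbatim in the token stream the model trains on.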
Audit Metadata