nanogpt

Pass

Audited by Gen Agent Trust Hub on Mar 28, 2026

Risk Level: SAFE, EXTERNAL_DOWNLOADS, PROMPT_INJECTION
Full Analysis
  • [EXTERNAL_DOWNLOADS]: The skill fetches training datasets from external repositories and well-known services. It downloads the TinyShakespeare dataset from a public GitHub repository and uses the HuggingFace datasets library to access OpenWebText and Wikipedia.
  • [PROMPT_INJECTION]: The skill exposes a surface for indirect prompt injection because it is designed to ingest and process untrusted text data for model training.
    • Ingestion points: The scripts in references/data.md read raw text from downloaded files and remote datasets.
    • Boundary markers: Not present; the skill converts text directly into tokenized binary files.
    • Capability inventory: Includes local file system writes for datasets and checkpoints, network operations via requests and HuggingFace, and model training execution.
    • Sanitization: None is performed on the training corpus, as the primary intent is tokenization of raw text.
  • [SAFE]: No malicious activities such as credential exfiltration, privilege escalation, or persistent access were identified. The included Python code uses standard machine learning libraries and follows established educational workflows.
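
The ingestion path described under PROMPT_INJECTION can be sketched as follows. This is a minimal illustration and not the skill's actual code: it assumes a character-level vocabulary and 16-bit token ids, and the names `encode_chars` and `write_token_file` are hypothetical. It shows why the finding lists no boundary markers and no sanitization: every character of the input is mapped to an id and written to disk unchanged.

```python
import array
from pathlib import Path


def encode_chars(text):
    """Build a character-level vocabulary and map the text to integer ids.

    Hypothetical helper: the audited skill may use a different tokenizer.
    """
    vocab = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(vocab)}
    return [stoi[ch] for ch in text], vocab


def write_token_file(text, out_path):
    """Write token ids as raw unsigned 16-bit integers.

    No filtering, escaping, or boundary markers are applied: whatever is in
    the corpus ends up in the binary file.
    """
    ids, vocab = encode_chars(text)
    if len(vocab) > 65536:
        raise ValueError("vocabulary does not fit in 16-bit ids")
    with Path(out_path).open("wb") as f:
        array.array("H", ids).tofile(f)
    return len(ids), vocab
```

Because the output is only a sequence of integers consumed by a training loop, the injected text is never interpreted as instructions at this stage, which is consistent with the overall SAFE rating.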
Audit Metadata
Risk Level: SAFE
Analyzed: Mar 28, 2026, 06:06 PM