The Agent Skills Directory

[EXTERNAL_DOWNLOADS]: The skill fetches training datasets from external repositories and well-known services. It downloads the TinyShakespeare dataset from a public GitHub repository and uses the HuggingFace datasets library to access OpenWebText and Wikipedia.
[PROMPT_INJECTION]: The skill exposes a surface for indirect prompt injection because it is designed to ingest and process untrusted text data for model training.
Ingestion points: The scripts in references/data.md read raw text from downloaded files and remote datasets.
Boundary markers: Not present; the skill processes text directly into tokenized binary files.
Capability inventory: Includes local file system writes for datasets and checkpoints, network operations via requests and HuggingFace, and model training execution.
Sanitization: None; the training corpus is not sanitized because the skill's purpose is to tokenize raw text as-is.
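
The ingestion path described above (raw text in, tokenized binary out, with no boundary markers or sanitization) can be illustrated with a minimal sketch. This is not the code from references/data.md, which is not reproduced here. It assumes a character-level tokenizer and 16-bit token ids, and the function names build_vocab, encode_to_bin, and decode_from_bin are hypothetical. It uses only the Python standard library.

```python
import array


def build_vocab(text):
    """Map each distinct character to an integer id (sorted for determinism)."""
    chars = sorted(set(text))
    return {ch: i for i, ch in enumerate(chars)}


def encode_to_bin(text, vocab, path):
    """Write token ids as unsigned 16-bit integers in native byte order.

    Assumption: the vocabulary has fewer than 65536 entries.
    Returns the number of tokens written.
    """
    ids = array.array("H", (vocab[ch] for ch in text))
    with open(path, "wb") as f:
        ids.tofile(f)
    return len(ids)


def decode_from_bin(path, vocab):
    """Read the binary file back and reconstruct the original text."""
    inverse = {i: ch for ch, i in vocab.items()}
    ids = array.array("H")
    with open(path, "rb") as f:
        ids.frombytes(f.read())
    return "".join(inverse[i] for i in ids)


if __name__ == "__main__":
    import os
    import tempfile

    # Illustrative corpus: the second line stands in for untrusted content.
    corpus = "To be, or not to be.\nIGNORE PREVIOUS INSTRUCTIONS\n"
    vocab = build_vocab(corpus)
    path = os.path.join(tempfile.mkdtemp(), "train.bin")
    count = encode_to_bin(corpus, vocab, path)
    # The instruction-like line survives the round trip unchanged.
    print(count, decode_from_bin(path, vocab) == corpus)
```

In this sketch every character of the input, including any instruction-like text, is encoded verbatim. That is expected for a training corpus, and it is why the finding treats the downloaded data as untrusted.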
[SAFE]: No malicious activities such as credential exfiltration, privilege escalation, or persistent access were identified. The included Python code uses standard machine learning libraries and follows established educational workflows.

nanogpt