NYC

sentencepiece

Warn

Audited by Snyk on Feb 15, 2026

Risk Level: MEDIUM
Full Analysis

MEDIUM W011: Third-party content exposure detected (indirect prompt injection risk).

  • Third-party content exposure detected (high risk: 0.80). The training guide explicitly shows ingesting external, public corpora (e.g., references/training.md's "Training from Python iterator" that calls datasets.load_dataset('wikitext','wikitext-103-raw-v1') and examples using input='corpus.txt'), so the agent would read untrusted, user-generated web content as part of its workflow.
Audit Metadata
Risk Level
MEDIUM
Analyzed
Feb 15, 2026, 09:02 PM