sentencepiece
Warn
Audited by Snyk on Feb 15, 2026
Risk Level: MEDIUM
Full Analysis
MEDIUM W011: Third-party content exposure detected (indirect prompt injection risk).
- Third-party content exposure detected (high risk: 0.80). The training guide explicitly shows ingesting external, public corpora. For example, the "Training from Python iterator" section of references/training.md calls datasets.load_dataset('wikitext', 'wikitext-103-raw-v1'), and other examples pass input='corpus.txt'. As a result, the agent would read untrusted, user-generated web content as part of its workflow.