llamaguard
Pass
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: SAFE
Full Analysis
- [Indirect Prompt Injection] (LOW): The skill is designed to process untrusted user input for the purpose of safety classification. While this provides a surface for indirect prompt injection, it is the intended primary function of the moderation tool. The code follows best practices by using chat templates for input formatting.
- Ingestion points:
check_input,check_output, andmoderate_endpointfunctions. - Boundary markers: Utilizes
tokenizer.apply_chat_templateto separate user content from system instructions. - Capability inventory: Capabilities are limited to local model inference and result classification; no file-writing or arbitrary command execution is present.
- Sanitization: The LlamaGuard model itself serves as the sanitization mechanism.
- [External Downloads] (LOW): The skill downloads model weights from the
meta-llamaorganization on Hugging Face. This organization is listed as a trusted source, and the loading methods (from_pretrained,vllm.LLM) are standard for the field.
Audit Metadata