The Agent Skills Directory

[Indirect Prompt Injection] (LOW): The skill is designed to process untrusted user input for the purpose of safety classification. While this provides a surface for indirect prompt injection, it is the intended primary function of the moderation tool. The code follows best practices by using chat templates for input formatting.
Ingestion points: check_input, check_output, and moderate_endpoint functions.
Boundary markers: Utilizes tokenizer.apply_chat_template to separate user content from system instructions.
Capability inventory: Capabilities are limited to local model inference and result classification; no file-writing or arbitrary command execution is present.
Sanitization: The LlamaGuard model itself serves as the sanitization mechanism.
[External Downloads] (LOW): The skill downloads model weights from the meta-llama organization on Hugging Face. This organization is listed as a trusted source, and the loading methods (from_pretrained, vllm.LLM) are standard for the field.

llamaguard