NYC

llamaguard

Pass

Audited by Gen Agent Trust Hub on Feb 17, 2026

Risk Level: SAFE
Full Analysis
  • [EXTERNAL_DOWNLOADS] (SAFE): The skill directs users to download model weights from huggingface.co/meta-llama. This is the verified official organization for Llama models, and downloading weights via standard libraries like transformers or vLLM from this source is considered safe practice.\n- [COMMAND_EXECUTION] (SAFE): Installation instructions use standard Python package management (pip install) for well-known libraries. Execution patterns follow established documentation for model serving and do not involve suspicious or elevated privileges.\n- [INDIRECT_PROMPT_INJECTION] (LOW): As a moderation tool, this skill is designed to process untrusted user and assistant content.\n
  • Ingestion points: Untrusted data enters the agent context through the user_message and bot_response parameters in the moderation functions and the FastAPI endpoint.\n
  • Boundary markers: The skill relies on tokenizer.apply_chat_template, which provides structural delimiters for the model, helping to differentiate between instructions and untrusted data.\n
  • Capability inventory: The code snippets provided are focused solely on inference and do not possess or grant dangerous capabilities such as arbitrary file writes, network exfiltration, or shell execution based on the processed input.\n
  • Sanitization: Security is enforced by the specialized LlamaGuard model's training, which is specifically aligned to classify and handle harmful content rather than being vulnerable to its instructions.
Audit Metadata
Risk Level
SAFE
Analyzed
Feb 17, 2026, 05:58 PM