prompt-guard

Pass

Audited by Gen Agent Trust Hub on Mar 28, 2026

Risk Level: SAFE
Full Analysis
  • [PROMPT_INJECTION]: The skill includes common prompt injection strings like 'Ignore previous instructions' and 'You are now in developer mode'. However, these are provided as example inputs to demonstrate the detection capabilities of the Prompt Guard model. They are used as data for classification test cases and are not instructions intended to override the agent's behavior.
  • [EXTERNAL_DOWNLOADS]: The skill references the 'meta-llama/Prompt-Guard-86M' model and provides links to official tutorials and inference code on Meta's GitHub repositories. These resources originate from a well-known technology organization and are appropriate for the skill's primary purpose of implementing security filtering.
Audit Metadata
Risk Level
SAFE
Analyzed
Mar 28, 2026, 06:06 PM