skills/fusengine/agents/guardrails/Gen Agent Trust Hub

guardrails

Pass

Audited by Gen Agent Trust Hub on Feb 28, 2026

Risk Level: SAFEPROMPT_INJECTION
Full Analysis
  • [PROMPT_INJECTION]: The skill includes reference lists of common jailbreak phrases and 'ignore instructions' patterns. These are used for building defensive classifiers and are not intended to subvert the agent's behavior.
  • [PROMPT_INJECTION]: Reference templates in references/input-guardrails.md demonstrate the interpolation of untrusted data (e.g., {user_input}) into classification prompts. These templates lack strong boundary markers to isolate user input from system instructions.
  • Ingestion points: references/input-guardrails.md contains prompts for topical and jailbreak detection that directly incorporate user-provided text.
  • Boundary markers: The templates do not utilize delimiters (such as triple quotes or XML tags) or specific escaping logic to encapsulate the {user_input} variable.
  • Capability inventory: The skill only requests the Read tool, which significantly limits the potential for an injection to escalate into harmful actions.
  • Sanitization: The documentation describes PII redaction and regex validation (for emails, SSNs, etc.) as filtering layers, though these do not address instruction-based injection within the prompts themselves.
Audit Metadata
Risk Level
SAFE
Analyzed
Feb 28, 2026, 11:02 AM