guardrails
Pass
Audited by Gen Agent Trust Hub on Feb 28, 2026
Risk Level: SAFEPROMPT_INJECTION
Full Analysis
- [PROMPT_INJECTION]: The skill includes reference lists of common jailbreak phrases and 'ignore instructions' patterns. These are used for building defensive classifiers and are not intended to subvert the agent's behavior.
- [PROMPT_INJECTION]: Reference templates in
references/input-guardrails.mddemonstrate the interpolation of untrusted data (e.g.,{user_input}) into classification prompts. These templates lack strong boundary markers to isolate user input from system instructions. - Ingestion points:
references/input-guardrails.mdcontains prompts for topical and jailbreak detection that directly incorporate user-provided text. - Boundary markers: The templates do not utilize delimiters (such as triple quotes or XML tags) or specific escaping logic to encapsulate the
{user_input}variable. - Capability inventory: The skill only requests the
Readtool, which significantly limits the potential for an injection to escalate into harmful actions. - Sanitization: The documentation describes PII redaction and regex validation (for emails, SSNs, etc.) as filtering layers, though these do not address instruction-based injection within the prompts themselves.
Audit Metadata