The Agent Skills Directory

[PROMPT_INJECTION]: The skill includes reference lists of common jailbreak phrases and 'ignore instructions' patterns. These are used for building defensive classifiers and are not intended to subvert the agent's behavior.
[PROMPT_INJECTION]: Reference templates in references/input-guardrails.md demonstrate the interpolation of untrusted data (e.g., {user_input}) into classification prompts. These templates lack strong boundary markers to isolate user input from system instructions.
Ingestion points: references/input-guardrails.md contains prompts for topical and jailbreak detection that directly incorporate user-provided text.
Boundary markers: The templates do not utilize delimiters (such as triple quotes or XML tags) or specific escaping logic to encapsulate the {user_input} variable.
Capability inventory: The skill only requests the Read tool, which significantly limits the potential for an injection to escalate into harmful actions.
Sanitization: The documentation describes PII redaction and regex validation (for emails, SSNs, etc.) as filtering layers, though these do not address instruction-based injection within the prompts themselves.

guardrails