The Agent Skills Directory

[PROMPT_INJECTION]: The skill includes phrases like 'ignore previous instructions' and 'act as' within a regex list. These are intended for defensive detection by the 'InjectionDetector' class and do not represent a malicious attempt to override agent behavior.
[PROMPT_INJECTION]: The 'ConstitutionalFilter' class presents an indirect prompt injection surface when reprocessing data.
Ingestion points: The 'filter' method in 'SKILL.md' accepts an untrusted 'response' string for evaluation.
Boundary markers: The prompts used for the 'critic' and 'reviser' models do not use delimiters or instructions to prevent the model from obeying commands embedded in the text being analyzed.
Capability inventory: The skill performs additional LLM calls ('critic.generate', 'reviser.generate') based on the content of the 'response' and 'critique' strings.
Sanitization: No injection-specific sanitization is performed to remove potential instructions from the response text before it is interpolated into the filter prompts.

guardrails-safety