skills/pluginagentmarketplace/custom-plugin-ai-red-teaming/input-output-guardrails/Gen Agent Trust Hub
input-output-guardrails
Pass
Audited by Gen Agent Trust Hub on Feb 17, 2026
Risk Level: SAFE
Full Analysis
- [SAFE] (SAFE): The skill contains no malicious code or instructions. All patterns and code samples identified are defensive in nature and intended to improve the security posture of an AI agent by filtering harmful inputs and outputs. The patterns for prompt injection and sensitive data detection are implementation examples for safety filters.\n- [Indirect Prompt Injection] (SAFE): The skill intentionally processes untrusted external data to apply security filters. The logic is benign and aligned with its stated security purpose.\n
- Ingestion points: Untrusted data enters the context through the
user_inputparameter inInputGuardrails.validateand theresponseparameter inOutputGuardrails.filter(SKILL.md).\n - Boundary markers: While explicit boundary markers are not shown in the Python code snippets, the configuration in
assets/guardrails-config.yamlincludes regex patterns for detection of delimiters like[INST]and<|system|>.\n - Capability inventory: The skill has read/write capabilities for local file management to generate reports (found in
scripts/test-guardrails.py) but does not contain dangerous command execution, system modification, or network exfiltration logic.\n - Sanitization: The skill implements multi-layer sanitization through PII redaction, content filtering, and blocklist matching.
Audit Metadata